cd ~/
mkdir flink-sql-libs
cd flink-sql-libs/
wget https://repo1.maven.org/maven2/org/apache/flink/flink-connector-kafka/1.17.2/flink-connector-kafka-1.17.2.jar
wget https://repo1.maven.org/maven2/org/apache/kafka/kafka-clients/3.4.0/kafka-clients-3.4.0.jar
# Only needed if querying WMF event streams.
# https://wikitech.wikimedia.org/wiki/Event_Platform/Stream_Processing/Flink_Catalog#Creating_Tables
# wget https://archiva.wikimedia.org/repository/releases/org/wikimedia/eventutilities-flink/1.3.3/eventutilities-flink-1.3.3-jar-with-dependencies.jar
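As a side note, the Maven Central URLs above all follow the same predictable layout (group path, artifact, version, then `artifact-version.jar`). A small, hypothetical helper to build them, shown here as a sketch rather than anything the original uses:

```python
# Hypothetical helper: build a Maven Central jar URL from coordinates.
# The URL layout matches the wget commands above.
def maven_jar_url(group: str, artifact: str, version: str) -> str:
    group_path = group.replace(".", "/")
    return (f"https://repo1.maven.org/maven2/{group_path}/{artifact}/"
            f"{version}/{artifact}-{version}.jar")

print(maven_jar_url("org.apache.flink", "flink-connector-kafka", "1.17.2"))
# → https://repo1.maven.org/maven2/org/apache/flink/flink-connector-kafka/1.17.2/flink-connector-kafka-1.17.2.jar
```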
spark3-submit --class org.wikimedia.analytics.refinery.job.refine.tool.EvolveHiveTable ./refinery-job/target/refinery-job-0.2.28-SNAPSHOT-shaded.jar --table=event.mediawiki_page_change_v1 --schema_uri=/mediawiki/page/change/latest --dry_run=true
24/01/02 21:49:53 INFO DataFrameToHive: Found difference in schemas for Hive table otto.mw_page_change0
Table schema:
root
 |-- _schema: string (nullable = true)
 |-- changelog_kind: string (nullable = true)
 |-- comment: string (nullable = true)
 |-- created_redirect_page: struct (nullable = true)
 |    |-- is_redirect: boolean (nullable = true)
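The dry run above reports fields present in the event schema but missing from the Hive table. A toy stdlib sketch of that kind of check (this is illustrative only, not the real EvolveHiveTable logic): flatten both schemas to dotted field paths and diff the key sets.

```python
# Toy sketch (NOT the real EvolveHiveTable implementation): flatten nested
# schema dicts of the form {field: type-or-nested-dict} into dotted paths,
# then report fields the event schema has that the table schema lacks,
# i.e. roughly what a dry run would propose to add.
def flatten(schema, prefix=""):
    fields = {}
    for name, typ in schema.items():
        path = f"{prefix}{name}"
        if isinstance(typ, dict):
            fields.update(flatten(typ, prefix=f"{path}."))
        else:
            fields[path] = typ
    return fields

def missing_fields(table_schema, event_schema):
    return sorted(set(flatten(event_schema)) - set(flatten(table_schema)))

table = {"_schema": "string", "comment": "string"}
event = {"_schema": "string", "comment": "string",
         "created_redirect_page": {"is_redirect": "boolean"}}
print(missing_fields(table, event))  # → ['created_redirect_page.is_redirect']
```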
import argparse
import logging
import sys
from pyflink.common import WatermarkStrategy, Encoder, Types
from pyflink.datastream import StreamExecutionEnvironment, RuntimeExecutionMode, ProcessFunction, OutputTag
from pyflink.datastream.connectors.file_system import FileSource, StreamFormat, FileSink, OutputFileConfig, RollingPolicy
FROM docker-registry.wikimedia.org/flink:1.16.0-37
# Add the python script.
USER root
RUN mkdir -p /srv/flink_app
ADD python_demo.py /srv/flink_app/python_demo.py
USER flink
# flake8: noqa
# This file is autogenerated by /metadata-ingestion/scripts/avro_codegen.py
# Do not modify manually!
# pylint: skip-file
# fmt: off
# The SchemaFromJSONData method only exists in avro-python3, but is called make_avsc_object in avro.
# We can use this fact to detect conflicts between the two packages. Pip won't detect those conflicts
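The attribute-name trick described in those comments can be sketched as follows (an illustrative reconstruction, not the autogenerated file's actual code): probe `avro.schema` for whichever function name is present to learn which package won the install.

```python
# Sketch of the detection trick described above: avro-python3 exposes
# SchemaFromJSONData, while the "avro" package names the same function
# make_avsc_object, so the attribute that exists tells you which package
# is actually installed. (Illustrative only.)
def detect_avro_flavor():
    try:
        import avro.schema as avro_schema
    except ImportError:
        return "not installed"
    if hasattr(avro_schema, "SchemaFromJSONData"):
        return "avro-python3"
    if hasattr(avro_schema, "make_avsc_object"):
        return "avro"
    return "unknown"

print(detect_avro_flavor())
```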
# Need to download flink-connector-kafka-1.15.2.jar and kafka-clients-2.4.1.jar
./bin/sql-client.sh -i flink_sql_init.sql -pyfs get_revision_content_udf.py -pyexec /home/otto/pyflink_udf2/bin/python3 -pyclientexec /home/otto/pyflink_udf2/bin/python3 -j /home/otto/flink-connector-kafka-1.15.2.jar -j /home/otto/kafka-clients-2.4.1.jar
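get_revision_content_udf.py itself isn't shown here. For illustration, a hypothetical plain-Python core that such a UDF might wrap (in a real script it would be registered with pyflink's `udf` decorator from `pyflink.table.udf`; the function name and slot layout below are assumptions):

```python
# Hypothetical core logic for a UDF like get_revision_content_udf.py:
# pull the wikitext body out of the "main" content slot, if present.
# In the real script this function would be wrapped with
# @pyflink.table.udf.udf so Flink SQL can call it.
def get_revision_content(content_slots):
    if not content_slots:
        return None
    main = content_slots.get("main")
    if main is None:
        return None
    return main.get("content_body")

slots = {"main": {"slot_role": "main",
                  "content_format": "text/x-wiki",
                  "content_body": "Hello, wiki!"}}
print(get_revision_content(slots))  # → Hello, wiki!
```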
CREATE TEMPORARY TABLE mediawiki_page_change (
  `wiki_id` STRING,
  `meta` ROW<domain STRING>,
  `page_change_kind` STRING,
  `page` ROW<page_id BIGINT, page_title STRING>,
  `revision` ROW<rev_id BIGINT, content_slots MAP<STRING, ROW<slot_role STRING, content_format STRING, content_body STRING>>>
) WITH (
  'connector' = 'kafka',
// Quick and hacky script that will use dyff to show the diff between
// any modified materialized schema version and its previous version.
//
// Defaults to using https://github.com/homeport/dyff, so install that first.
// This could be cleaned up and incorporated into jsonschema-tools itself, and
// then shown in CI.
//
const jsonschema_tools = require('@wikimedia/jsonschema-tools');
const _ = require('lodash');
# Pyflink Streaming Table Env + Event Utilities Event Platform integration.
# Download the Wikimedia Event Utilities Flink jar, e.g.
# https://archiva.wikimedia.org/#artifact-details-download-content/org.wikimedia/eventutilities-flink/1.2.0
# wget https://archiva.wikimedia.org/repository/releases/org/wikimedia/eventutilities-flink/1.2.0/eventutilities-flink-1.2.0-jar-with-dependencies.jar
# Also download the other dependencies we need:
# wget https://repo1.maven.org/maven2/org/apache/flink/flink-connector-kafka/1.15.2/flink-connector-kafka-1.15.2.jar
# wget https://repo1.maven.org/maven2/org/apache/kafka/kafka-clients/3.2.3/kafka-clients-3.2.3.jar
# If you like, use ipython for your REPL, it's nicer!
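One way to hand those downloaded jars to a pyflink environment (an assumption about the setup, not shown in the original) is the `pipeline.jars` config option, which takes a `;`-separated list of `file://` URLs. Building that value with the stdlib:

```python
# Sketch: build the value for pyflink's "pipeline.jars" config option from
# local jar paths. pyflink expects a ";"-separated list of file:// URLs.
# The jar filenames here are just the ones downloaded above.
from pathlib import Path

def pipeline_jars_value(jar_paths):
    return ";".join(Path(p).absolute().as_uri() for p in jar_paths)

print(pipeline_jars_value([
    "eventutilities-flink-1.2.0-jar-with-dependencies.jar",
    "flink-connector-kafka-1.15.2.jar",
]))
```

The resulting string would then be passed via `table_env.get_config().set("pipeline.jars", ...)` in a pyflink session.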