@devender-yadav
Last active August 24, 2018 12:52
Interesting Apache incubator projects

Python-based orchestration engine (STABLE & POPULAR)

Airflow is a platform to programmatically author, schedule, and monitor workflows. Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks.
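
To make the "DAG of tasks" idea concrete, here is a minimal sketch in plain Python. It does not use Airflow's API: the task names and the `run_order` helper are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for a scheduler, only to show how dependencies between tasks constrain the order in which they can run.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
dag = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"extract"},
    "load": {"transform", "validate"},
}


def run_order(graph):
    """Return one valid execution order for the tasks in `graph`.

    Raises graphlib.CycleError if the graph contains a cycle,
    which is why a workflow has to be acyclic.
    """
    return list(TopologicalSorter(graph).static_order())


order = run_order(dag)
print(order)  # every task appears after all of its dependencies
```

A graph with a cycle, such as `{"a": {"b"}, "b": {"a"}}`, makes `run_order` raise `CycleError`, since no valid execution order exists.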

Scala-based serverless, event-driven programming service (STABLE & POPULAR)

OpenWhisk is a cloud-first distributed event-based programming service. It provides a programming model to upload event handlers to a cloud service, and register the handlers to respond to various events.
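
As a rough illustration of what an event handler looks like, OpenWhisk Python actions expose a function named `main` by default, which receives the event parameters as a dictionary and returns a dictionary. The sketch below follows that convention; the `name` parameter and the greeting payload are made up for the example.

```python
def main(params):
    """Action entry point: takes the event parameters, returns a JSON-serializable dict."""
    # "name" is a hypothetical parameter; fall back to a default when it is absent.
    name = params.get("name", "stranger")
    return {"greeting": f"Hello, {name}!"}
```

Because the handler is an ordinary function, it can be exercised locally, for example with `main({"name": "Ada"})`, before it is uploaded and registered against an event source.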

Java-based data ingestion tool (STABLE)

Apache Gobblin is a universal data ingestion framework for extracting, transforming, and loading large volumes of data from a variety of data sources.

Java-based data quality service platform (NEW)

Apache Griffin is a model-driven data quality solution for modern data systems. It provides a standard process to define data quality measures, execute them, and report the results, as well as a unified dashboard across multiple data systems.
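
To give a feel for what a data quality measure can be, the sketch below computes a simple accuracy-style score in plain Python: the fraction of source records that have a matching record in a target dataset. This is not Griffin's API or its measure definition language; the `accuracy` function, the record layout, and the choice of key fields are all assumptions made for illustration.

```python
def accuracy(source, target, key_fields):
    """Return the fraction of `source` records matched in `target` on `key_fields`.

    An empty source is treated as fully accurate (1.0) by convention here.
    """
    if not source:
        return 1.0
    # Index the target by the chosen key fields for constant-time lookups.
    target_keys = {tuple(rec[f] for f in key_fields) for rec in target}
    matched = sum(
        1 for rec in source if tuple(rec[f] for f in key_fields) in target_keys
    )
    return matched / len(source)


# Hypothetical records: one of the four source rows is missing from the target.
source = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
target = [{"id": 1}, {"id": 2}, {"id": 4}]
score = accuracy(source, target, ["id"])
print(score)  # 0.75
```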

Java-based ML Hive UDFs (NEW & PROMISING)

Apache Hivemall is a scalable machine learning library that runs on Apache Hive, Apache Spark, and Apache Pig. Hivemall is designed to be scalable to the number of training instances as well as the number of training features.
