Python based orchestration engine (STABLE & POPULAR)
Airflow is a platform to programmatically author, schedule, and monitor workflows. Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks.
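To make the "DAG of tasks" idea concrete, here is a minimal sketch in plain Python. It does not use Airflow's own API; the task names and the `run_order` helper are hypothetical, and the standard-library `graphlib` module stands in for a scheduler's dependency resolution.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def run_order(deps):
    """Return one valid execution order for a DAG of tasks.

    Raises graphlib.CycleError if the graph is not acyclic.
    """
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

Every task appears after the tasks it depends on, which is the guarantee an orchestration engine needs before it can schedule work.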
Scala based serverless event-based programming service (STABLE & POPULAR)
OpenWhisk is a cloud-first distributed event-based programming service. It provides a programming model to upload event handlers to a cloud service, and register the handlers to respond to various events.
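The handler-registration model described above can be sketched in a few lines of Python. This is an in-process stand-in, not OpenWhisk's API: `register`, `fire`, the `user.created` event name, and the `greet` handler are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], dict]

# Hypothetical in-memory registry: event type -> handlers registered for it.
_handlers: Dict[str, List[Handler]] = {}


def register(event_type: str, handler: Handler) -> None:
    """Register a handler to respond to events of the given type."""
    _handlers.setdefault(event_type, []).append(handler)


def fire(event_type: str, payload: dict) -> List[dict]:
    """Deliver an event to every registered handler and collect the results."""
    return [handler(payload) for handler in _handlers.get(event_type, [])]


def greet(args: dict) -> dict:
    """Example handler: takes a dict of parameters and returns a dict result."""
    name = args.get("name", "stranger")
    return {"greeting": f"Hello, {name}!"}


register("user.created", greet)
results = fire("user.created", {"name": "Ada"})
print(results)
```

In a real deployment the registry and event delivery live in the cloud service; only the handler itself is code the author uploads.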
Java based data ingestion tool (STABLE)
Apache Gobblin is a universal data ingestion framework for extracting, transforming, and loading large volumes of data from a variety of data sources.
Java based data quality service platform (NEW)
Apache Griffin is a model-driven data quality solution for modern data systems. It provides a standard process to define, execute, and report on data quality measures, as well as a unified dashboard across multiple data systems.
Java based ML Hive UDFs (NEW & PROMISING)
Apache Hivemall is a scalable machine learning library that runs on Apache Hive, Apache Spark, and Apache Pig. Hivemall is designed to be scalable to the number of training instances as well as the number of training features.