
@omnisis
Last active March 9, 2017 02:31
Interesting Open Source Projects

Infrastructure

Hystrix is a latency and fault tolerance library designed to isolate points of access to remote systems, services and third-party libraries, stop cascading failures, and enable resilience in complex distributed systems where failure is inevitable.
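The circuit-breaker pattern, one of the mechanisms Hystrix uses to stop cascading failures, can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not Hystrix's Java API: the class name `CircuitBreaker`, its parameters (`failure_threshold`, `reset_timeout`, the injectable `clock`) and the single-probe "half-open" behavior are all choices made for this example. Hystrix additionally isolates calls in thread pools or semaphores and trips on rolling error statistics, none of which is modelled here.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Hypothetical sketch: wraps calls to a remote dependency and fails fast while it is unhealthy."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open before a probe
        self.clock = clock                          # injectable so tests can fake time
        self.failures = 0
        self.opened_at: Optional[float] = None      # None means the circuit is closed

    @property
    def is_open(self) -> bool:
        return self.opened_at is not None

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: skip the remote call entirely and serve the fallback.
                return fallback()
            # Timeout elapsed: "half-open", let this one call through as a probe.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                # Failed probe, or too many consecutive failures: (re)open the circuit.
                self.opened_at = self.clock()
            return fallback()
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

With a fake clock, two failures against a breaker built with `failure_threshold=2` open the circuit; later calls return the fallback without touching the remote service, and the first successful probe after `reset_timeout` closes it again.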

Cloud / Big Data Development

Genie is a completely open source federated job orchestration engine developed by Netflix. Genie provides RESTful APIs to run a variety of big data jobs, such as Hadoop, Pig, Hive, Spark, Presto and Sqoop jobs. It also provides APIs for managing the metadata of many distributed processing clusters and of the commands and applications that run on them.

Conductor is an orchestration engine that runs in the cloud.

A framework for building batch, streaming and API services that deploy machine learning models, using Spark and Akka for compute.

Docker / Containers

A simple Docker client for the JVM.

Garbage collection of Docker containers and images.

Machine Learning / Data Science

A machine learning package built for humans.

Approximate nearest neighbors in C++/Python, optimized for memory usage and for loading/saving indexes to disk.
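One common way to get approximate nearest neighbors with a small memory footprint is a forest of random-projection trees: each tree recursively splits the points by the hyperplane halfway between two randomly chosen points, and a query computes exact distances only for the points that share its leaf in some tree. The Python sketch below shows that general technique; it is an assumption that it resembles the project above, and every name in it (`RPTree`, `ANNIndex`, `n_trees`, `leaf_size`) is hypothetical. It keeps everything in memory and does not persist the index to disk.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def dist(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class RPTree:
    """One random-projection tree over the points whose indices are in `ids`."""

    def __init__(self, points: List[Vector], ids: List[int], leaf_size: int, rng: random.Random) -> None:
        self.ids = ids
        self.left: Optional["RPTree"] = None
        self.right: Optional["RPTree"] = None
        if len(ids) <= leaf_size:
            return
        a, b = (points[i] for i in rng.sample(ids, 2))
        # Hyperplane with normal (a - b) passing through the midpoint of a and b.
        self.normal = [x - y for x, y in zip(a, b)]
        self.offset = dot(self.normal, [(x + y) / 2 for x, y in zip(a, b)])
        left = [i for i in ids if dot(self.normal, points[i]) < self.offset]
        right = [i for i in ids if dot(self.normal, points[i]) >= self.offset]
        if not left or not right:
            return  # degenerate split (duplicate points): keep this node as a leaf
        self.left = RPTree(points, left, leaf_size, rng)
        self.right = RPTree(points, right, leaf_size, rng)

    def leaf_for(self, q: Vector) -> List[int]:
        # Follow the same side-of-hyperplane test used while building the tree.
        node = self
        while node.left is not None and node.right is not None:
            node = node.left if dot(node.normal, q) < node.offset else node.right
        return node.ids


class ANNIndex:
    """A forest of random-projection trees; more trees generally means better recall."""

    def __init__(self, points: Sequence[Vector], n_trees: int = 10, leaf_size: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.points = list(points)
        ids = list(range(len(self.points)))
        self.trees = [RPTree(self.points, ids, leaf_size, rng) for _ in range(n_trees)]

    def query(self, q: Vector, k: int = 1) -> List[int]:
        # Union the leaf buckets the query lands in, then rank those candidates exactly.
        candidates = set()
        for tree in self.trees:
            candidates.update(tree.leaf_for(q))
        return sorted(candidates, key=lambda i: dist(self.points[i], q))[:k]
```

More trees raise recall at the cost of memory and query time. Production implementations usually also explore neighboring branches (for example with a priority queue) rather than following a single path per tree, so this sketch will miss more true neighbors than a real library would.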

Photon Machine Learning (Photon ML) is a machine learning library based upon Apache Spark originally developed by the LinkedIn Machine Learning Algorithms team.

Dataviz / Data Sharing

Superset is a data exploration platform designed to be visual, intuitive and interactive.

The Knowledge Repository project is focused on facilitating the sharing of knowledge between data scientists and other technical roles using data formats and tools that make sense in these professions. It provides various data stores (and utilities to manage them) for "knowledge posts", with a particular focus on notebooks (R Markdown and Jupyter/IPython notebooks) to better promote reproducible research.

It's designed to be flexible, scalable and efficient, while providing handy analytical abilities to help modelers / data scientists make predictions easily and quickly.

Datastores

A scalable time series database based on Bigtable, Cassandra, and Elasticsearch.

A simple constant key/value storage library for read-heavy systems with infrequent, large bulk inserts.
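The read-heavy, bulk-insert workload described above suits an immutable table that is rebuilt on each bulk load and never mutated in place, so readers need no locking. The following Python sketch is a hypothetical illustration of that idea, using a sorted key array with binary search; it does not reflect the library's actual on-disk format or API, and `ConstantKVStore` and `bulk_insert` are invented names.

```python
from bisect import bisect_left
from typing import Dict, Iterable, List, Optional, Tuple


class ConstantKVStore:
    """Hypothetical immutable key/value table: built once from a bulk load, then read-only."""

    def __init__(self, items: Iterable[Tuple[str, bytes]]) -> None:
        merged: Dict[str, bytes] = dict(items)  # if a key repeats, the last value wins
        self._keys: List[str] = sorted(merged)
        self._values: List[bytes] = [merged[k] for k in self._keys]

    def get(self, key: str) -> Optional[bytes]:
        # O(log n) binary search over the sorted keys; no locks needed since nothing mutates.
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def __len__(self) -> int:
        return len(self._keys)

    def bulk_insert(self, items: Iterable[Tuple[str, bytes]]) -> "ConstantKVStore":
        """Infrequent large insert: builds and returns a new store; this one is left unchanged."""
        combined = dict(zip(self._keys, self._values))
        combined.update(items)
        return ConstantKVStore(combined.items())
```

Because `bulk_insert` returns a new store, existing readers keep using the old snapshot until they switch to the new one, which trades extra work on the rare write for simple, fast reads.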

Graph Processing

Quiver is a Scala library that provides support for modeling multi-graphs, which are networks of nodes connected by (possibly multiple) directed edges.
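To make the multi-graph idea concrete, here is a minimal directed multigraph in Python in which the same ordered pair of nodes can be joined by several distinct edges, told apart here by a label. This is not Quiver's Scala API; the `MultiGraph` and `Edge` classes, the edge labels and the per-node adjacency lists are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Generic, Hashable, List, Set, TypeVar

N = TypeVar("N", bound=Hashable)  # node type
E = TypeVar("E")                  # edge label type


@dataclass(frozen=True)
class Edge(Generic[N, E]):
    src: N
    dst: N
    label: E


class MultiGraph(Generic[N, E]):
    """Hypothetical directed multigraph: several edges may connect the same (src, dst) pair."""

    def __init__(self) -> None:
        self._nodes: Set[N] = set()
        # Outgoing adjacency lists; a list (not a set) so parallel edges are all kept.
        self._out: DefaultDict[N, List[Edge[N, E]]] = defaultdict(list)

    def add_node(self, n: N) -> None:
        self._nodes.add(n)

    def add_edge(self, src: N, dst: N, label: E) -> Edge[N, E]:
        self._nodes.update((src, dst))
        edge = Edge(src, dst, label)
        self._out[src].append(edge)
        return edge

    def nodes(self) -> Set[N]:
        return set(self._nodes)

    def out_edges(self, n: N) -> List[Edge[N, E]]:
        return list(self._out.get(n, []))  # .get avoids creating empty entries

    def edges_between(self, src: N, dst: N) -> List[Edge[N, E]]:
        return [e for e in self._out.get(src, []) if e.dst == dst]

    def successors(self, n: N) -> Set[N]:
        return {e.dst for e in self._out.get(n, [])}
```

For example, adding a "road" edge and a "rail" edge from `"a"` to `"b"` keeps both, so `edges_between("a", "b")` returns two edges while `successors("a")` is just `{"b"}`.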

Misc

Knobs is a configuration library for Scala. It is based on the Data.Configurator library for Haskell, but extends it in a number of ways to make it more useful (e.g. loading configuration from ZooKeeper).
