Doug Eisenstein daefresh

@daefresh
daefresh / top_100_open_data_tooling.csv
Created July 11, 2022 11:40
Top 100+ data engineering repos on GitHub for 2022
Repo Name,Stars,GitHub URL,Project URL,Project Description
airbyte,7176,https://github.com/airbytehq/airbyte,https://airbyte.com,Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses lakes and databases.
amundsen,3389,https://github.com/amundsen-io/amundsen,https://www.amundsen.io/amundsen/,Amundsen is a metadata driven application for improving the productivity of data analysts data scientists and engineers when interacting with data.
arangodb,12377,https://github.com/arangodb/arangodb,https://www.arangodb.com,🥑 ArangoDB is a native multi-model database with flexible data models for documents graphs and key-values. Build high performance applications using a convenient SQL-like query language or JavaScript extensions.
arctic,2729,https://github.com/man-group/arctic,https://arctic.readthedocs.io/en/latest/,High performance datastore for time series and tick data
arrow-datafusion,2173,https://github.com/apache/arrow-datafusion,https://arrow.apache.org/datafusion,Apache
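The rows above are a preview of a CSV file. As a minimal sketch, assuming a local copy saved as plain comma-separated values with the header row first and no commas inside any field (true of the rows shown), a short bash function can rank the repos by star count. The function name `top_repos_by_stars` is hypothetical.

```shell
#!/usr/bin/env bash
# Print "repo<TAB>stars" for the N most-starred repos (default 10).
# Assumes: header row first, column 1 = Repo Name, column 2 = Stars,
# and no commas embedded in any field.
top_repos_by_stars() {
  local csv_file="$1" n="${2:-10}"
  # Skip the header, sort numerically and descending on column 2 only,
  # keep the first N rows, then print name and stars separated by a tab.
  tail -n +2 "$csv_file" \
    | sort -t, -k2,2nr \
    | head -n "$n" \
    | cut -d, -f1,2 \
    | tr ',' '\t'
}

# Usage (assuming a local copy saved under the gist filename):
# top_repos_by_stars top_100_open_data_tooling.csv 5
```

Restricting the sort key to `-k2,2nr` means only the Stars column is compared numerically, so the free-text description never affects the order.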
daefresh / advanti_open_data_tools_2021.csv
Last active May 25, 2022 14:25
[Advanti's Top 150+ Open Data Tools on GitHub in 2021.] This is a hand-curated list of tools 🔨 that I refer to when designing data platforms ❤️. Connect with me on LinkedIn if you'd like this! https://www.linkedin.com/in/douglaseisenstein/
Repo Name,Stars,Last Commit Timestamp,GitHub URL,Project URL,Project Description
airbyte,3829,Tue 31 Aug 2021 12:27:10 GMT,https://github.com/airbytehq/airbyte,https://airbyte.io,Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses lakes and databases.
airflow,22946,Tue 31 Aug 2021 13:25:22 GMT,https://github.com/apache/airflow,https://airflow.apache.org/,Apache Airflow - A platform to programmatically author schedule and monitor workflows
amazoncaptcha,140,Sat 17 Jul 2021 02:06:48 GMT,https://github.com/a-maliarov/amazoncaptcha,,Pure Python lightweight Pillow-based solver for Amazon's text captcha.
amundsen,2572,Fri 27 Aug 2021 04:50:38 GMT,https://github.com/amundsen-io/amundsen,https://www.amundsen.io/amundsen/,Amundsen is a metadata driven application for improving the productivity of data analysts data scientists and engineers when interacting with data.
arangodb,11554,Tue 31 Aug 2021 12:03:58 GMT,https://github.com/arangodb/arangodb,https://www.arangodb.com,🥑 Ar
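Both lists share the Repo Name and Stars columns, so the 2021 and 2022 snapshots can be compared. A rough sketch, assuming local copies saved as comma-separated values with the header row first, Repo Name in column 1 and Stars in column 2, and no commas inside any field, joins the two files on repo name with `join`. The function name `star_growth` is hypothetical, and repos missing from either file are silently dropped.

```shell
#!/usr/bin/env bash
# Print "repo<TAB>old_stars<TAB>new_stars<TAB>change" for repos present in both
# CSV snapshots. Assumes column 1 = Repo Name and column 2 = Stars in each file,
# a header row first, and no commas inside any field.
star_growth() {
  local old_csv="$1" new_csv="$2"
  # join needs both inputs sorted on the join field (repo name) with the
  # same collation, so both sorts and the join run in the C locale.
  LC_ALL=C join -t, \
    <(tail -n +2 "$old_csv" | cut -d, -f1,2 | LC_ALL=C sort -t, -k1,1) \
    <(tail -n +2 "$new_csv" | cut -d, -f1,2 | LC_ALL=C sort -t, -k1,1) \
    | awk -F, '{ printf "%s\t%d\t%d\t%d\n", $1, $2, $3, $3 - $2 }'
}

# Usage (assuming local copies saved under the gist filenames):
# star_growth advanti_open_data_tools_2021.csv top_100_open_data_tooling.csv
```

Cutting each file down to its first two columns before the join keeps the differing column layouts (the 2021 file has an extra timestamp column) from mattering.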
# download the openaristos-python 0.1.0.1 source tarball to /tmp and install it into the py35 env
sudo wget "https://docs.google.com/uc?export=download&id=19A4VjqVil4741RU9J7SeGYomwc_ONJad" -O /tmp/openaristos-python-0.1.0.1.tar.gz
sudo /usr/bin/anaconda/envs/py35/bin/pip install /tmp/openaristos-python-0.1.0.1.tar.gz
# external python libraries useful for explorations
sudo /usr/bin/anaconda/envs/py35/bin/python -m pip install --upgrade pip
sudo /usr/bin/anaconda/envs/py35/bin/pip install seaborn
sudo /usr/bin/anaconda/envs/py35/bin/pip install pyarrow
sudo /usr/bin/anaconda/envs/py35/bin/pip install plotly
# external python libraries useful for explorations
sudo /usr/bin/anaconda/envs/pysparksnowflake/bin/pip install seaborn
sudo /usr/bin/anaconda/envs/pysparksnowflake/bin/pip install openaristos-python
sudo /usr/bin/anaconda/envs/pysparksnowflake/bin/pip install pyarrow
sudo /usr/bin/anaconda/envs/pysparksnowflake/bin/pip install plotly
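The commands above call each environment's `pip` once per package. A hedged alternative sketch passes all packages to a single `python -m pip install` call, which runs the pip that belongs to that environment's interpreter and lets pip resolve the packages' dependencies together. The helper name `pip_install_into_env` and the `DRY_RUN` switch are assumptions for illustration, not part of the original setup.

```shell
#!/usr/bin/env bash
# Install one or more packages into the conda env at PREFIX with a single
# pip invocation run through that env's own interpreter.
# Set DRY_RUN=1 to print the command instead of running it.
pip_install_into_env() {
  local prefix="$1"
  shift
  local cmd=(sudo "$prefix/bin/python" -m pip install "$@")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage, mirroring the per-package commands above:
# pip_install_into_env /usr/bin/anaconda/envs/py35 seaborn pyarrow plotly
# pip_install_into_env /usr/bin/anaconda/envs/pysparksnowflake seaborn openaristos-python pyarrow plotly
```

Building the command as a bash array keeps each package name a separate argument, so the same array can be printed for a dry run or executed unchanged.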
# create a new Python 3.7 conda environment at envs/pysparksnowflake
sudo /usr/bin/anaconda/bin/conda create --prefix /usr/bin/anaconda/envs/pysparksnowflake python=3.7 anaconda --yes
sudo /usr/bin/anaconda/envs/pysparksnowflake/bin/conda install pip --yes
sudo /usr/bin/anaconda/envs/pysparksnowflake/bin/python -m pip install --upgrade pip
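Re-running the environment-creation commands above on a machine where `envs/pysparksnowflake` already exists would ask conda to create a prefix that is already there. A minimal sketch, assuming that a `conda-meta/` directory marks an existing conda environment (conda creates that folder in every environment), skips `conda create` in that case. `ensure_conda_env`, `CONDA_BIN`, and `DRY_RUN` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical path to the base conda binary used in the commands above.
CONDA_BIN="${CONDA_BIN:-/usr/bin/anaconda/bin/conda}"

# Create a conda env at PREFIX with the given Python version (default 3.7)
# plus the anaconda metapackage, unless an env already lives there.
# Set DRY_RUN=1 to print the create command instead of running it.
ensure_conda_env() {
  local prefix="$1" python_version="${2:-3.7}"
  # Every conda environment contains a conda-meta/ directory.
  if [[ -d "$prefix/conda-meta" ]]; then
    echo "env exists: $prefix"
    return 0
  fi
  local cmd=(sudo "$CONDA_BIN" create --prefix "$prefix" "python=$python_version" anaconda --yes)
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage:
# ensure_conda_env /usr/bin/anaconda/envs/pysparksnowflake 3.7
```

Checking for `conda-meta/` rather than the bare directory avoids treating an empty or half-deleted folder as a working environment.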