Installing an ML dev stack on macOS

Install conda

Install Miniconda.

Install dependencies

These are my go-to packages for machine learning, geospatial, and NLP (NLTK) work.

conda config --add channels anaconda
conda config --add channels conda-forge   # --add prepends, so add conda-forge last to make it first; check ~/.condarc
conda create -y --name py37-ml python=3.7
conda activate py37-ml

# Note that conda takes a long long looong time to install these
conda install -y qgrid psycopg2 pymssql jupyter ipykernel ipython-sql google-cloud-bigquery pandas 

conda install -y beautifulsoup4 gensim nltk seaborn matplotlib tabulate joblib redis-py

conda install -y keras=2.2.4  # plaidml works up to this version

conda install -y tensorflow python-levenshtein scikit-learn

# Pin jpype1: with jpype1 > 0.6.3, jaydebeapi fails with 'java.sql.Types' has no attribute '__javaclass__'
conda install -y jpype1=0.6.3 jaydebeapi   

conda install -y geopy rtree pyproj shapely geojson krb5 fiona folium geopandas

# PlaidML lets Keras use an AMD GPU/eGPU on macOS (yayyy!)
pip install plaidml plaidml-keras

# Register a Jupyter kernel that uses this environment.
ipython kernel install --user --name py37-ml
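Because the installs above are slow and span several channels, it can help to confirm that everything is importable from inside py37-ml before starting work. A minimal sketch, assuming the package-to-import-name mapping in IMPORT_NAMES (e.g. beautifulsoup4 is imported as bs4, scikit-learn as sklearn, python-levenshtein as Levenshtein); the list and the helper name are hypothetical, and krb5 is skipped because it is a native library with no Python module.

```python
import importlib.util
import sys

# Assumed import names for the packages in the install listing above.
IMPORT_NAMES = [
    "qgrid", "psycopg2", "pymssql", "jupyter_core", "ipykernel", "sql",
    "google.cloud.bigquery", "pandas",
    "bs4", "gensim", "nltk", "seaborn", "matplotlib", "tabulate", "joblib", "redis",
    "keras", "tensorflow", "Levenshtein", "sklearn",
    "jpype", "jaydebeapi",
    "geopy", "rtree", "pyproj", "shapely", "geojson", "fiona", "folium", "geopandas",
    "plaidml", "plaidml.keras",
]


def missing_modules(names):
    """Return the names in `names` that cannot be located on sys.path.

    find_spec only locates a module; it does not import it, so a broken
    native library (like the fiona dylib error) only shows up on a real import.
    """
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # For dotted names, find_spec imports the parent package first,
            # so a missing parent (e.g. "google") raises instead of returning None.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(IMPORT_NAMES)
    print("Python:", sys.prefix)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run it with the environment's python after conda activate py37-ml; anything listed as missing is worth reinstalling before debugging further.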

Notes on PlaidML

PlaidML runs fairly well for developing neural networks with Keras on macOS. I use it with an AMD RX580/Vega 56 in a Sonnet Breakaway eGPU enclosure.

With that said, PlaidML performance is... meh. PlaidML with an AMD GPU speeds up neural net training by only 3-5x over CPU, so I use it for local development.
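Keras picks its backend from the KERAS_BACKEND environment variable when it is first imported, and PlaidML's own instructions use the value plaidml.keras.backend. A minimal sketch of switching a notebook over, assuming plaidml-keras is installed and a device was chosen once with the plaidml-setup command; use_plaidml_backend is a hypothetical helper name.

```python
import os

# Backend module name that plaidml-keras registers for Keras.
PLAIDML_BACKEND = "plaidml.keras.backend"


def use_plaidml_backend(environ=os.environ):
    """Point Keras at PlaidML and return the previous KERAS_BACKEND value.

    Must run before the first `import keras`, because Keras reads
    KERAS_BACKEND only once, at import time.
    """
    previous = environ.get("KERAS_BACKEND")
    environ["KERAS_BACKEND"] = PLAIDML_BACKEND
    return previous


if __name__ == "__main__":
    use_plaidml_backend()
    print("KERAS_BACKEND =", os.environ["KERAS_BACKEND"])
    # import keras  # would now load the PlaidML backend
```

Keeping this in the first cell of a notebook makes it easy to comment out when the same code is moved to a TensorFlow GPU instance.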

I run full training or hyperparameter tuning on a GCP VM instance with a Tesla T4 GPU. This costs ~$0.40/hr, and a 12-hour development/training session costs $5-$6. GCP is great in that, unlike AWS, it lets you attach a GPU to inexpensive instances, but the cost still adds up.
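The session estimate is simple hourly arithmetic: at ~$0.40/hr, 12 hours comes to about $4.80 before anything else is billed. A small sketch for budgeting, assuming the gap up to $5-$6 comes from a separate hourly charge for the VM's CPU, RAM, and disk; the other_rate value used below is purely hypothetical, not a GCP price.

```python
def session_cost(hours, gpu_rate=0.40, other_rate=0.0):
    """Estimated USD cost of a GPU session.

    gpu_rate:   ~$0.40/hr for the Tesla T4 setup described above.
    other_rate: hypothetical extra $/hr for the VM itself (CPU, RAM, disk).
    """
    return round(hours * (gpu_rate + other_rate), 2)


if __name__ == "__main__":
    print(session_cost(12))                   # 4.8
    print(session_cost(12, other_rate=0.08))  # 5.76
```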

There are also bugs in PlaidML. For example, the logcosh loss function and the clipnorm optimizer option raise exceptions.

Still, for quick prototyping and experiments where CPU is too slow, or when GCP GPU instances aren't available (which happens often), PlaidML works well.

Notes about GeoPandas

GeoPandas is cool. It adds geospatial features to pandas: you can do spatial joins, convert easily between EPSG:4326 and EPSG:3857, and manipulate maps and GeoJSON using the familiar DataFrame paradigm.
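In GeoPandas the 4326/3857 switch is a single call, gdf.to_crs(epsg=3857), and to_crs(epsg=4326) to go back; the projection itself is delegated to pyproj. For intuition, here is a standalone sketch of the spherical (Web) Mercator math behind EPSG:3857, assuming plain lon/lat degrees in and metres out; it is only meaningful between roughly ±85.05° latitude, and the function names are hypothetical.

```python
import math

# EPSG:3857 treats the Earth as a sphere with the WGS84 semi-major axis (metres).
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon, lat):
    """Project EPSG:4326 (lon, lat) in degrees to EPSG:3857 (x, y) in metres."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def web_mercator_to_lonlat(x, y):
    """Inverse projection: EPSG:3857 metres back to EPSG:4326 degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat


if __name__ == "__main__":
    x, y = lonlat_to_web_mercator(-122.4194, 37.7749)
    print(x, y)
    print(web_mercator_to_lonlat(x, y))
```

The x axis is linear in longitude while y stretches toward the poles, which is why areas and distances measured in EPSG:3857 are distorted away from the equator.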

This sure beats doing all the extra setup in PostGIS just to run spatial joins. It's really great for geo work in Jupyter/Python.
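A spatial join pairs rows by geometry instead of by key. In GeoPandas a point-in-polygon join looks like gpd.sjoin(points, polygons, how="inner", predicate="within"); older releases, like the ones this environment installs, spelled the keyword op="within". As a pure-Python illustration of what that join computes, here is a sketch using ray casting, assuming simple polygons given as vertex lists; the names are hypothetical, and the nested loop is brute force, whereas GeoPandas prunes candidates with a spatial index (that is what rtree is for).

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count edge crossings of a ray from (x, y) toward +x.

    ring is a list of (x, y) vertices; the closing edge back to ring[0] is implied.
    An odd number of crossings means inside. Points exactly on an edge may go either way.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def naive_spatial_join(points, polygons):
    """Inner 'within' join: (point_id, polygon_id) for every polygon containing the point.

    points: dict id -> (x, y); polygons: dict id -> ring.
    """
    return [
        (pid, gid)
        for pid, (x, y) in points.items()
        for gid, ring in polygons.items()
        if point_in_polygon(x, y, ring)
    ]


if __name__ == "__main__":
    polygons = {
        "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
        "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
    }
    points = {"p1": (5, 5), "p2": (15, 5), "p3": (25, 5)}
    print(naive_spatial_join(points, polygons))  # [('p1', 'A'), ('p2', 'B')]
```

Points that fall in no polygon (p3 here) simply drop out, matching how="inner".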

Alas, GeoPandas is not easy to install. The conda install line above that ends with geopandas (geopy, rtree, pyproj, shapely, geojson, krb5, fiona, folium) is what I came up with to install GeoPandas reliably on macOS Catalina.

GeoPandas is available on conda-forge only, and its dependencies must also come from conda-forge. If you get an error like the one below, run conda list and make sure you don't have packages from different channels installed. If you do, you are better off rebuilding the environment from scratch. Make sure ~/.condarc lists conda-forge as the first channel.

ImportError: dlopen(/anaconda3/lib/python3.6/site-packages/fiona/ogrext.cpython-36m-darwin.so, 2): Library not loaded: @rpath/libkea.1.4.7.dylib
Referenced from: /anaconda3/lib/libgdal.20.dylib
Reason: image not found
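Scanning conda list by eye for channel mix-ups gets tedious. A minimal sketch that checks both things mentioned above, assuming conda list --json prints a list of objects with "name" and "channel" keys (pip-installed packages reporting channel "pypi") and that ~/.condarc uses the simple "channels:" list layout conda config writes; the helper names are hypothetical.

```python
import json
import os
import shutil
import subprocess


def non_forge_packages(entries, allowed=("conda-forge", "pypi")):
    """Return sorted (name, channel) pairs whose channel is not in `allowed`."""
    return sorted(
        (e["name"], e.get("channel", "?"))
        for e in entries
        if e.get("channel") not in allowed
    )


def first_channel(condarc_text):
    """Return the first item under 'channels:' in a .condarc, or None."""
    in_channels = False
    for line in condarc_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("channels:"):
            in_channels = True
        elif in_channels and stripped.startswith("- "):
            return stripped[2:].strip()
        elif in_channels and stripped:
            break  # reached the next key without finding a list item
    return None


if __name__ == "__main__":
    condarc = os.path.expanduser("~/.condarc")
    if os.path.exists(condarc):
        with open(condarc) as f:
            print("First channel:", first_channel(f.read()))
    if shutil.which("conda"):
        out = subprocess.run(["conda", "list", "--json"],
                             capture_output=True, text=True, check=True).stdout
        for name, channel in non_forge_packages(json.loads(out)):
            print(f"{name}: {channel}")
```

Run it inside the activated environment; any package it lists (for example something from pkgs/main or anaconda) is a candidate for the mixed-channel dylib errors shown above.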