Install Miniconda.
These are the go-to packages I use for Machine Learning/Geo/NLTK work.
conda config --add channels conda-forge # --add prepends; check ~/.condarc and make sure conda-forge is the first channel
conda config --append channels anaconda # --append keeps anaconda below conda-forge
conda create -y --name py37-ml python=3.7
conda activate py37-ml
# Note that conda takes a long long looong time to install these
conda install -y qgrid psycopg2 pymssql jupyter ipykernel ipython-sql google-cloud-bigquery pandas
conda install -y beautifulsoup4 gensim nltk seaborn matplotlib tabulate joblib redis-py
conda install -y keras=2.2.4 # plaidml works up to this version
conda install -y tensorflow python-levenshtein scikit-learn
# with jpype1 newer than 0.6.3 you'll get: 'java.sql.Types' has no attribute '__javaclass__'
conda install -y jpype1=0.6.3 jaydebeapi
conda install -y geopy rtree pyproj shapely geojson krb5 fiona folium geopandas
# PlaidML lets Keras use an AMD GPU/eGPU on macOS (yayyy!)
pip install plaidml plaidml-keras
# Build a jupyter kernel that uses this environment.
ipython kernel install --user --name py37-ml
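To check that the new kernel really sees the packages, a small sanity check can be run inside it. This is a minimal sketch: the import names in `IMPORT_NAMES` are my assumption of how a few of the packages above are imported (for example `sklearn` for scikit-learn and `bs4` for beautifulsoup4), and `missing_modules` is a hypothetical helper, not part of any library.

```python
import importlib.util

# Assumed top-level import names for a few of the packages installed above.
IMPORT_NAMES = ["pandas", "geopandas", "sklearn", "nltk", "gensim", "bs4", "keras", "plaidml"]


def missing_modules(names):
    """Return the names that the current interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means every module was found in this environment.
print(missing_modules(IMPORT_NAMES))
```

This only checks the interpreter it runs in, so run it from a notebook using the py37-ml kernel rather than from the base environment.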
PlaidML runs fairly well for developing neural networks with Keras on macOS. I use it with an AMD RX580/Vega 56 + Sonnet Breakaway eGPU.
With that said, PlaidML performance is... meh. PlaidML + an AMD GPU speeds up neural net training by only 3-5x over CPU. I use PlaidML for developing locally.
I run full training and hyperparameter tuning on a GCP VM instance with a Tesla T4 GPU. This costs ~$0.40/hr. A 12-hour developing/training session costs $5-$6. GCP is great in that, unlike AWS, you can attach a GPU to inexpensive instances, but this still adds up.
There are also bugs with PlaidML. For example, the logcosh loss function and the clipnorm optimizer option result in exceptions.
Still, for quick prototyping/experiments where CPU is too slow, or when GCP GPU instances aren't available (happens often), running with PlaidML works well.
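For reference, Keras is switched to PlaidML by setting the `KERAS_BACKEND` environment variable before Keras is first imported. A minimal sketch, assuming `plaidml-setup` has already been run once to pick the GPU; `select_plaidml_backend` is a hypothetical helper name used only for illustration.

```python
import os


def select_plaidml_backend():
    """Point Keras at PlaidML. Must run before the first `import keras`."""
    os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"
    return os.environ["KERAS_BACKEND"]


select_plaidml_backend()

# After this, import Keras as usual. Left commented so the sketch runs
# even where Keras and PlaidML are not installed.
# import keras
```

Setting the variable in code keeps the choice local to one notebook, so the same environment can still use the default backend elsewhere.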
GeoPandas is cool. It adds geo features to Pandas. You can do spatial joins, go back and forth between EPSG:4326 and EPSG:3857 easily, and do various map and GeoJSON manipulations using the familiar DataFrame paradigm.
This sure beats having to do all the extra setup in PostGIS just to do spatial joins. It's really great for doing Geo work in Jupyter/Python.
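To make the two coordinate systems concrete: EPSG:4326 is plain longitude/latitude in degrees, and EPSG:3857 is spherical Web Mercator in meters. The sketch below writes out that projection by hand using only the standard library. It illustrates the math and is not how GeoPandas implements it; the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857 (WGS84 semi-major axis)


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """EPSG:4326 (degrees) -> EPSG:3857 (meters), spherical Mercator."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def web_mercator_to_lonlat(x, y):
    """EPSG:3857 (meters) -> EPSG:4326 (degrees)."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat


x, y = lonlat_to_web_mercator(-122.4, 37.8)
print(x, y)
print(web_mercator_to_lonlat(x, y))
```

With GeoPandas you would not write this yourself: `gdf.to_crs(epsg=3857)` reprojects a whole GeoDataFrame, and `geopandas.sjoin(left, right)` does the spatial join.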
Alas, GeoPandas is not easy to install. The last conda install line above is what I came up with to install GeoPandas reliably on macOS (Catalina).
GeoPandas is available on conda-forge only. Its dependencies must come from conda-forge too. If you get an error like the one below, run conda list and make sure you don't have packages from different channels installed. If you do, you are better off rebuilding the environment from scratch. Make sure ~/.condarc lists conda-forge as the first channel.
ImportError: dlopen(/anaconda3/lib/python3.6/site-packages/fiona/ogrext.cpython-36m-darwin.so, 2): Library not loaded: @rpath/libkea.1.4.7.dylib
Referenced from: /anaconda3/lib/libgdal.20.dylib
Reason: image not found
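Scanning conda list by eye for stray channels gets tedious, so the check can be scripted. A rough sketch, assuming `conda list --json` reports a `channel` field per package and that pip-installed packages show up with channel `pypi`; both helper names are hypothetical.

```python
import json
import subprocess


def packages_outside_channel(packages, wanted="conda-forge", ignore=("pypi",)):
    """Return (name, channel) pairs for packages not installed from `wanted`."""
    return [
        (pkg["name"], pkg.get("channel", ""))
        for pkg in packages
        if pkg.get("channel", "") not in (wanted, *ignore)
    ]


def list_env_packages(env_name):
    """Parse `conda list --json` for the given environment."""
    result = subprocess.run(
        ["conda", "list", "--name", env_name, "--json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Usage, left commented because it needs conda on the PATH:
# print(packages_outside_channel(list_env_packages("py37-ml")))
```

Anything this prints is a candidate for the kind of mixed-channel library mismatch shown in the error above.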