@dkapitan
Last active October 4, 2015 13:53
OSX for data science

Aim: A manageable and transparent OSX setup for all things data at NL Healthcare. This OSX configuration also aims to mimic the server environment (CentOS 7) as closely as possible, for maximum portability of the data analytics stack.

Main stack:

  1. OSX (Yosemite at time of writing, El Capitan has just been released)
  2. Python 3.4 via the conda package manager. The Anaconda distribution by Continuum Analytics is our standard base Python stack. Jupyter notebook is our preferred environment for interactive computing in various languages
  3. PostgreSQL 9.3, using MacPorts as the preferred package manager. If an app is not available in MacPorts, we use Homebrew as a fall-back package manager
  4. Optional interactive computing languages: R, Julia or any of the other languages supported by Jupyter, incl. bash
1. OSX itself
1.1. Must-have system preferences to ensure a secure and stable setup
  • Encrypt the hard drive with FileVault: System Preferences > Security & Privacy > FileVault
  • Turn the firewall on: System Preferences > Security & Privacy > Firewall
  • Install the Xcode libraries (command line tools, e.g. with xcode-select --install), since the data science tools that we use require them
  • I advise sticking to international encodings and character sets as much as possible, i.e. en_US.UTF-8, and transforming to local settings (e.g. ',' as decimal separator) as late as possible in your workflow.
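The locale advice above can be sketched in shell. This is a minimal illustration, not a prescribed setup: the exports would typically go in your shell profile, and to_local_decimal is a hypothetical helper name introduced here. It blindly swaps every '.' for ',', so it only suits a column of plain decimal numbers.

```shell
#!/bin/sh
# Keep the shell in an international UTF-8 locale (e.g. put these in ~/.bash_profile).
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8

# Hypothetical helper: convert the decimal point to a decimal comma.
# Apply it only at the very end of the workflow, when writing output for local use.
to_local_decimal() {
  tr '.' ','
}

# All processing happens with '.' as the separator; convert on the way out.
printf '%s\n' "3.14" | to_local_decimal   # prints 3,14
```

Keeping the conversion in one small function at the output step means everything upstream (Python, PostgreSQL, R) sees the same number format.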
1.2. Nice-to-haves settings to your own liking
  • Turn on Tab for dialog boxes: System Preferences > Keyboard > Shortcuts > All controls
  • Silence the startup sound: sudo nvram SystemAudioVolume="%80"
1.3. Apps and utilities (licensed, provided for by NLHC)
1.4. Apps and utilities (licensed, buy your own)
  • Alfred, a great productivity app for keyboard shortcuts and automation
  • CleanMyMac to keep your prized MacBook snappy and clean
1.5. Apps and utilities (open source)
  • jupyter notebook for interactive computing
  • Atom or TextWrangler for basic text editing
  • GitHub, our main versioning and online collaboration platform, and, if you like, GitHub Desktop
  • Google Chrome as the standard browser, particularly because of the many apps and extensions
  • Speed Dial 2 for managing bookmarks and start-up tab
  • Journey as your cross-platform notepad with Markdown for easy publishing to websites and internal wikis
  • Ghostery for blocking pop-ups and trackers
  • iTerm2 to replace the standard Terminal with more features
  • yEd for diagramming
  • Archi for developing Archimate models of the enterprise architecture
1.6. Read more on setting up your Mac
2. Installing Python
3. Installing and setting up PostgreSQL on OSX
3.1. Install PostgreSQL 9.3 with MacPorts
  • Install MacPorts. Note that MacPorts puts everything in /opt (under /opt/local by default). We will use this location to install everything that is managed by MacPorts and conda
  • Install PostgreSQL 9.3 with MacPorts as described here. Note the following:
    • Take note of the layout for the database files. Following MacPorts' standard for Postgres, I chose to put the database files in /opt/local/var/db/postgresql93/defaultdb
    • When this process has completed, you need to start the server
    • By default, a superuser 'postgres' with no password has been configured with default database 'postgres'
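The steps above can be sketched as a dry-run script. It only prints the commands unless DRY_RUN=0 is set. The port names (postgresql93, postgresql93-server) and the PG_BIN path are assumptions based on the usual MacPorts layout and should be checked against the linked instructions; run and pg_setup_steps are hypothetical helper names.

```shell
#!/bin/sh
# Dry-run sketch of the MacPorts PostgreSQL 9.3 setup.
# Prints each command by default; set DRY_RUN=0 to execute (needs MacPorts and sudo).
DRY_RUN="${DRY_RUN:-1}"
PGDATA_DIR="/opt/local/var/db/postgresql93/defaultdb"  # data directory named in the text
PG_BIN="/opt/local/lib/postgresql93/bin"               # assumed MacPorts binary location

# Print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

pg_setup_steps() {
  run sudo port install postgresql93 postgresql93-server
  run sudo mkdir -p "$PGDATA_DIR"
  run sudo chown postgres:postgres "$PGDATA_DIR"
  # Initialise the database cluster as the postgres system user.
  run sudo su postgres -c "$PG_BIN/initdb -D $PGDATA_DIR"
  # Start the server.
  run sudo port load postgresql93-server
  # Connect as the default superuser to the default database.
  run "$PG_BIN/psql" -U postgres -d postgres -c 'SELECT version();'
}

pg_setup_steps
```

Reviewing the printed commands first is a cheap safeguard, since several of them run with sudo.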
3.2. Compile and install pgloader (optional)
  • Make sure you have already installed PostgreSQL via MacPorts and that the server is running. This prevents brew from installing another PostgreSQL version
  • We follow the installation procedure from https://github.com/dimitri/pgloader/blob/master/INSTALL.md
    • NB1: install clozure-cl with brew
    • NB2: compiling with 'make CL=ccl' did the trick. Compiling with sbcl (installed with MacPorts) failed because it couldn't find certain libraries
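Before building, it can help to confirm that the prerequisites are in place. The following is a minimal pre-flight sketch, assuming the Clozure CL binary is on your PATH as ccl (which is what 'make CL=ccl' invokes); missing_tools is a hypothetical helper name. It does not build anything itself.

```shell
#!/bin/sh
# Pre-flight check before building pgloader (sketch only).

# Print every command from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# ccl is the Clozure CL binary used by 'make CL=ccl'.
missing="$(missing_tools git make ccl)"

if [ -n "$missing" ]; then
  echo "Install these before building pgloader:"
  echo "$missing"
else
  echo "Build tools found. From a pgloader checkout, run: make CL=ccl"
fi

# pg_isready ships with PostgreSQL 9.3 and later; skip the check if it is not on PATH.
if command -v pg_isready >/dev/null 2>&1; then
  pg_isready || echo "PostgreSQL does not appear to be accepting connections"
fi
```

With MacPorts the PostgreSQL binaries may not be on your PATH by default, in which case the last check is silently skipped.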
4. Optional interactive languages:
  • Also install IRkernel to use R with Jupyter
  • Alternatively, use RStudio as a dedicated R IDE
  • Fix locale for RStudio: defaults write org.R-project.R force.LANG en_US.UTF-8

http://ianlunn.co.uk/articles/quickly-showhide-hidden-files-mac-os-x-mavericks/ shows how you can add aliases to your shell to quickly show or hide system files in OSX:

alias showFiles='defaults write com.apple.finder AppleShowAllFiles YES; killall Finder /System/Library/CoreServices/Finder.app'
alias hideFiles='defaults write com.apple.finder AppleShowAllFiles NO; killall Finder /System/Library/CoreServices/Finder.app'