#“A data scientist is someone who knows more statistics than a computer scientist and more computer science than a statistician.” -Josh Blumenstock
#"A geo-data scientist is someone who knows more about GIS than either of those guys." -Tyler Dahlberg
#Going Geo Open Source
In all likelihood a list like this has been written somewhere, by someone, for some reason. I know I'm not breaking any ground here; I'm just trying to organize on paper what's been going through my head ever since I got out of grad school.
As someone trained in traditional GIS (i.e., ArcGIS), I have a lot of hurdles to clear when it comes to 'open-sourcing' myself. Don't get me wrong, I love (and love-hate) ESRI's product. Using ArcGIS is like curling up by a nice warm fire in the middle of a snowstorm or looking out the window from a cozy couch on a rainy day. It'll probably always be there, and years from now it'll be just like you remember. But today's world is changing faster than ESRI can keep up with. GIS is expanding into web mapping, spatial apps, and new applications of location information that spatial analysts didn't even dream of two or three years ago, let alone five.
##The Geo Data Scientist's Toolkit:
###QGIS
QGIS doesn't replicate everything in ArcGIS, and its unfamiliar interface is hard to get used to, but it's the only FOSS product that comes close as a desktop GIS solution. Bonus: Many of QGIS' operations run on GDAL, and the tool dialogs even display the underlying GDAL command below the tool as you use it. Two birds!
###GDAL
GDAL is a library with a set of command-line tools that has become ubiquitous in the open source geo world. It's great for reprojecting, clipping, transforming, and converting spatial files.
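
To make "converting and reprojecting" a bit more concrete, here's a minimal sketch that builds an `ogr2ogr` command (the vector half of GDAL's command-line tools) from Python. The file names `parcels.shp` and `parcels.geojson` and the helper `build_ogr2ogr_command` are made up for illustration; the `-f` and `-t_srs` flags are standard `ogr2ogr` options. The sketch only assembles and prints the command, so it runs even without GDAL installed.

```python
import shlex


def build_ogr2ogr_command(src_path, dst_path,
                          target_srs="EPSG:4326", out_format="GeoJSON"):
    """Build the argument list for an ogr2ogr convert-and-reproject call.

    The paths are hypothetical; nothing is executed here.
    """
    return [
        "ogr2ogr",
        "-f", out_format,      # output driver (file format)
        "-t_srs", target_srs,  # reproject to this coordinate system
        dst_path,              # ogr2ogr expects the destination first...
        src_path,              # ...and the source second
    ]


cmd = build_ogr2ogr_command("parcels.shp", "parcels.geojson")
print(shlex.join(cmd))
```

If GDAL is installed and the input file exists, passing `cmd` to `subprocess.run(cmd, check=True)` should perform the actual conversion.
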
###Python
Python is incredibly easy (well, for a programming language anyway) to pick up. Its syntax is natural and easy to read, and it's already widely used and accepted in the geospatial and statistics communities as a means of processing spatial data.
- Tools:
  - IPython Notebook: A means to share and perform Python analysis.
  - Pandas: A Python data processing library that makes manipulating data structures easier.
  - Anaconda: A Python distribution that includes just about every data analysis package out there.
- Learning Resources:
  - Learn Python The Hard Way: A long-standing and useful place to put yourself through the Python paces, from a position of zero knowledge.
  - Think Python: A great open-source Python learning book.
  - The Hitchhiker's Guide to Python: This seems like a great place to get help with the fundamentals of developing with Python.
###JavaScript
If you want to be able to share your maps on the web, and you want to be free to innovate beyond the rails present in ArcGIS Online, CartoDB, and Mapbox, you NEED to know JavaScript.
- Tools:
- Learning Resources
###R
Command-line statistical software with an incredible array of packages that can do bonkers stuff.
- R Spatial Cheatsheet
- Owen's 'The R Guide': Great starter documentation for R, from R.
- R for Cats: A fun, friendly intro written as if you have no familiarity with programming. Based on JS for Cats.
###Postgres+PostGIS
This one is interesting. For me it represents an entirely different paradigm for thinking about spatial data (as databases, rather than as a bunch of flat proprietary files). The ability to feed in data and perform rapid spatial queries with near-instant results is essential.
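
As a rough illustration of what a "rapid spatial query" can look like, here's a sketch of a PostGIS radius search prepared from Python. The table `places` and its columns `name` and `geom` are hypothetical, and I'm assuming `geom` is stored as longitude/latitude (SRID 4326). `ST_DWithin`, `ST_SetSRID`, and `ST_MakePoint` are real PostGIS functions; the `%s` placeholders follow the style used by drivers such as psycopg2. Nothing here connects to a database; it only prepares the query and its parameters.

```python
# Hypothetical table "places" with a text column "name" and a geometry
# column "geom" in SRID 4326 (an assumption, not something from this post).
NEARBY_SQL = """
SELECT name
FROM places
WHERE ST_DWithin(
    geom::geography,
    ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography,
    %s
)
"""


def nearby_query(lon, lat, radius_m):
    """Return (sql, params) for rows within radius_m meters of a point.

    ST_MakePoint takes x (longitude) first, then y (latitude).
    Casting to geography makes ST_DWithin measure distance in meters.
    """
    return NEARBY_SQL, (float(lon), float(lat), float(radius_m))


sql, params = nearby_query(-71.06, 42.36, 500)
print(params)
```

With a live connection, something like `cursor.execute(sql, params)` would run it, and a spatial index on the geometry column is what usually keeps queries like this fast.
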
###Courses & MOOCs:
- Codecademy: Totally free, with web design, Python, and Ruby resources. Very nice interactive tutorials, but it's perhaps a bit too easy to follow along. Not sure if the learning sticks.
- Codeschool: Low monthly fee. Mostly web development, with some miscellaneous lessons on Git and R. Courses feel more like real courses than Codecademy's tutorials.
- Software Carpentry: Great place to get your feet wet with good software practices for science and analysis with bash, Python, R, and version control.
- Open Data Science Masters
- Harvard's Intro to Computer Science Course
- Harvard's Data Science Course
- Foss4G's Open Source GIS Courses
- QGIS Bootcamp
- Udemy: A technology-focused MOOC.
- Coursera: A broadly focused MOOC with some good technology courses.
###Data Tools
- Enigma.io: Great place to search for public data sets from all over the world
- Import.io: Data-scraping & API-making website. Desktop app too.
- Data Science Toolbox: A cool toolbox of data science libraries in Python and R that you can install on any machine with Amazon EC2 or Vagrant.
- Geojson.io: Create geojson point, line, and polygon files from scratch. Great web mapping tool.
- Geomancer.io: Upload a CSV with some sort of geographic identifier column, get out census data. It's magic.
- A Paragraph: Literate Mapping: Similar to Geojson.io, but lets you use natural language.
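
For a sense of what a tool like Geojson.io produces, here's a minimal sketch that builds a one-point GeoJSON file by hand using only Python's standard library. The helper names and the sample coordinates are mine, purely for illustration; the structure (a `FeatureCollection` of `Feature` objects, with coordinates in longitude, latitude order) follows the GeoJSON format.

```python
import json


def point_feature(lon, lat, properties=None):
    """Build a GeoJSON Feature holding a single point.

    GeoJSON coordinates are ordered [longitude, latitude].
    """
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties or {},
    }


def feature_collection(features):
    """Wrap a list of features in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


# Sample point (illustrative values only)
collection = feature_collection(
    [point_feature(-71.0589, 42.3601, {"name": "Boston"})]
)
geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```

Saving `geojson_text` to a `.geojson` file should give you something you can open in QGIS or paste into Geojson.io.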