@bjcubsfan
Last active August 29, 2015 14:03

Tutorials

Reproducible Science

Nullius in verba (take nobody's word for it): Royal Society motto

Testing Speed

  • on wifi: 7.52 down 11.64 up
  • on cable: 75.96 down 68.40 up

GitHub

JC? from GitHub.

Showed a nice GitHub Flow guide.

How to make your code citable

I got this working by logging into Zenodo with GitHub. I couldn't push to get the badge because my SSH key identifies me as waas-bot, not B. J. Potter.

Regression Testing

JC is giving this, I believe. He talks about SSCCE (Short, Self Contained, Correct (Compilable), Example).
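In the spirit of an SSCCE, a regression test can be a few lines that pin down a known-good result. A minimal sketch in pytest style (plain `assert`, so it also runs without pytest); the `trapezoid` function and its baseline value are illustrative, not from the talk:

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal intervals on [a, b]."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * (f(a) + f(b)) + interior)


# Known-good value, worked out by hand for x**2 on [0, 1] with 4 intervals.
BASELINE = 0.34375


def test_trapezoid_regression():
    # Fails loudly if a later change to trapezoid() alters this result.
    result = trapezoid(lambda x: x * x, 0.0, 1.0, 4)
    assert math.isclose(result, BASELINE, rel_tol=1e-12), result


if __name__ == "__main__":
    test_trapezoid_regression()
    print("regression test passed")
```

The baseline should come from a run you trust; the test's only job is to notice when that result changes.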

Literate programming

Ana, using Dexy to make reproducible documents.

They have a LaTeX document that also uses Jinja to insert content dynamically. I believe Dexy then builds the final document from all of the pieces.
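Dexy plus Jinja is the toolchain in the talk; as a dependency-free stand-in, the same idea (compute results, then splice them into LaTeX) can be sketched with Python's standard `string.Template`. The placeholder delimiter is switched to `@` so LaTeX's `$...$` math passes through untouched. The template text and variable names are hypothetical.

```python
import statistics
from string import Template


class LatexTemplate(Template):
    # Use "@" as the placeholder marker so LaTeX math ($...$) is left alone.
    delimiter = "@"


REPORT = r"""\documentclass{article}
\begin{document}
We ran @n_trials trials; the mean error was $\epsilon = @mean_error$.
\end{document}
"""


def render_report(errors):
    """Compute summary numbers and splice them into the LaTeX source."""
    return LatexTemplate(REPORT).substitute(
        n_trials=len(errors),
        mean_error=f"{statistics.mean(errors):.3f}",
    )


print(render_report([0.2, 0.3, 0.25]))
```

Because the numbers are computed at build time, rerunning the build regenerates the document from the data rather than from copied-and-pasted values.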

"A big barrier to reproducibility is that you don't remember what you have installed. Docker forces you to have that base."

Introduction to Julia

I like the string interpolation syntax:

```julia
μ = 3
println("The sine of $μ is $(sin(μ))")
# prints: The sine of 3 is 0.1411200080598672
```
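For comparison, Python 3.6+ f-strings do the same kind of interpolation, including arbitrary expressions inside the braces:

```python
import math


def describe(mu):
    # Expressions inside {} are evaluated and formatted in place.
    return f"The sine of {mu} is {math.sin(mu)}"


print(describe(3))  # The sine of 3 is 0.1411200080598672
```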

Conference Day One 2014-07-08

Intro and keynote

Largest SciPy ever, at 450 attendees.

Geospatial is a theme.

Lorena Barba is giving the keynote. She's an ME from Chile, working on the flipped classroom: the homework is to watch a video lecture, and class time is spent working problems.

TED-Ed makes a lesson from a video. This learning technique is supported by research.

Piazza hosts discussion for her class.

The IP for her code is open because that's written into her grant; AAUP guidelines suggest that copyright for instructional material belongs to the faculty member. She's running her own MOOC in the fall.

GeoPandas

By Kelsey Jordahl, Enthought

Started in his hotel at the last conference; uses one codebase for Python 2 and 3. Needs:

  • pandas 0.13+
  • Shapely (GEOS)
  • Fiona (GDAL/OGR)
  • pyproj (PROJ.4)
  • Matplotlib
  • psycopg2, SQLAlchemy, geopy
  • Outputs GeoJSON. Maybe I could use this on the back end and serve up the GeoJSON for plotting on a web map.
  • I need to look into pandas; it's reportedly very powerful.
  • mplleaflet takes an existing plot and gives you an interactive map.
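GeoPandas can write GeoJSON directly (for example `GeoDataFrame.to_json()`). To make the "serve GeoJSON to a web map" idea concrete without that dependency, here is a stdlib sketch of the FeatureCollection structure a web map would consume; the helper name and the sample point are illustrative.

```python
import json


def to_feature_collection(rows):
    """Build a GeoJSON FeatureCollection from (name, lon, lat) tuples."""
    features = [
        {
            "type": "Feature",
            # GeoJSON coordinate order is [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": name},
        }
        for name, lon, lat in rows
    ]
    return {"type": "FeatureCollection", "features": features}


sites = [("Austin", -97.74, 30.27)]  # approximate, for illustration only
print(json.dumps(to_feature_collection(sites), indent=2))
```

A back end would return this JSON from an HTTP endpoint, and a web-map library on the front end would draw the features.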

Common Scientific Compute Environment

by a guy from the Berkeley D-Lab, which supports computational social sciences

  • motivated by JiffyLab

  • their VirtualBox VM

  • Packer coordinates building a VirtualBox image; the burden is only on the person making the box.

  • Vagrant requires an extra step

  • Ansible adds complexity compared to shell scripts

  • Docker adds complexity

  • they optimize for seamless installation and use

  • GUI-only install

  • Packer! Builds for anything: EC2, VirtualBox

Real-time events with Python

by Carson Farmer

Looking at Twitter topics in geo-located tweets

  • normalize by population
  • look at content of geo-located tweets only
  • what topics are popular where
  • has to support streaming (no batch processes)
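A minimal sketch of the streaming constraint: counts are updated one tweet at a time, and topic rates are normalized by population on demand, with no batch pass. It assumes each tweet has already been geocoded to a region and tagged with a topic; the class, region names, and populations below are hypothetical.

```python
from collections import Counter, defaultdict


class StreamingTopicRates:
    """Per-region topic counts, updated one tweet at a time."""

    def __init__(self, population):
        self.population = population        # region -> number of residents
        self.counts = defaultdict(Counter)  # region -> Counter of topics

    def update(self, region, topic):
        # Constant work per incoming tweet; nothing is reprocessed in batch.
        self.counts[region][topic] += 1

    def rate(self, region, topic, per=100_000):
        # Tweets about `topic` per `per` residents of `region`.
        return self.counts[region][topic] * per / self.population[region]

    def top_topics(self, region, n=3):
        return self.counts[region].most_common(n)


# Hypothetical stream of (region, topic) pairs and made-up populations.
stream = [("city_a", "bbq"), ("city_a", "bbq"), ("town_b", "bbq"), ("city_a", "scipy")]
rates = StreamingTopicRates({"city_a": 1_000_000, "town_b": 10_000})
for region, topic in stream:
    rates.update(region, topic)

# Raw counts favour city_a, but per capita town_b talks about bbq far more.
print(rates.rate("city_a", "bbq"), rates.rate("town_b", "bbq"))
```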

TODO: look at Pico, which lets you call Python routines on your server from JavaScript.

Matplotlib BOF

by Benjamin Root

imshow makes 3 copies of an image, which hurts memory usage.

Feature request: metadata in output images.

WCSAxes

@astrofrog

plots geospatial and astro data.

WCS (World Coordinate System) is an astronomy standard for putting spherical data on a plane; the headers describe all of the projection data.
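As one concrete example of putting spherical data on a plane, the gnomonic (tangent-plane) projection, which WCS calls `TAN`, maps a sky position to offsets from a reference point. This is a simplified sketch: it ignores the pixel scale, rotation, and reference-pixel terms a full WCS header also carries, and it is not WCSAxes' code.

```python
import math


def gnomonic(ra_deg, dec_deg, ra0_deg, dec0_deg):
    """Project (ra, dec) onto the plane tangent to the sphere at (ra0, dec0).

    Returns standard coordinates (xi, eta) in degrees; xi grows with RA.
    """
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    ra0, dec0 = math.radians(ra0_deg), math.radians(dec0_deg)
    dra = ra - ra0
    # Cosine of the angular distance between the point and the tangent point.
    cos_c = math.sin(dec0) * math.sin(dec) + math.cos(dec0) * math.cos(dec) * math.cos(dra)
    if cos_c <= 0:
        raise ValueError("point is 90 degrees or more from the tangent point")
    xi = math.cos(dec) * math.sin(dra) / cos_c
    eta = (math.cos(dec0) * math.sin(dec)
           - math.sin(dec0) * math.cos(dec) * math.cos(dra)) / cos_c
    return math.degrees(xi), math.degrees(eta)


print(gnomonic(10.0, 20.1, 10.0, 20.0))  # a point 0.1 degrees north of the center
```

Near the tangent point the offsets are close to the angular separations, which is why this projection suits small fields of view.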

ClimatePipes Tools

by a Kitware guy

  • GeoJS is the main front-end tool
  • They are working on a Python back end and a JS front end.
  • They use decorators to choose which functions to make accessible over the web.
  • They use Celery for back-end tasks that may be long-running.
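A minimal sketch of the decorator pattern described above: a decorator registers opted-in functions, and a dispatcher only calls registered names. The names `expose`, `dispatch`, and `mean_temperature` and the JSON request shape are hypothetical, not ClimatePipes' API, and the HTTP layer is left out.

```python
import json

# Functions that have explicitly opted in to being callable over the web.
EXPOSED = {}


def expose(func):
    """Decorator: register func under its name and return it unchanged."""
    EXPOSED[func.__name__] = func
    return func


@expose
def mean_temperature(values):
    return sum(values) / len(values)


def _internal_helper():
    # Not decorated, so dispatch() will refuse to call it.
    return "secret"


def dispatch(request_body):
    """Handle a JSON request such as {"function": "mean_temperature", "args": [[10, 20]]}."""
    request = json.loads(request_body)
    name = request.get("function")
    if name not in EXPOSED:
        return json.dumps({"error": f"{name!r} is not exposed"})
    result = EXPOSED[name](*request.get("args", []))
    return json.dumps({"result": result})


print(dispatch('{"function": "mean_temperature", "args": [[10, 20]]}'))  # {"result": 15.0}
```

Keeping an explicit allow-list means nothing becomes reachable from the web by accident; a long-running function could hand its work to a task queue instead of returning directly.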

OpenClimateGIS from NOAA

  • stand alone library for access and analysis of climate data

Some sources include GeoJS.

Takes care of the projections for you.

They started this because Basemap didn't work.

Reproducible builds and packaging with HashDist

by Aron Ahmadia

AKA: don't fail at packaging

Reproducibility

What if you used a buggy LAPACK? Your results are suspect 10 years down the line.
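One way to make "which LAPACK was that?" answerable years later is to identify every build by a hash of all its inputs, so that changing a dependency changes the ID of everything built on top of it. That is, as I understand it, the idea behind hash-based tools like HashDist and Nix; the `build_id` helper, its fields, the ID format, and the versions and source hashes below are hypothetical.

```python
import hashlib
import json


def build_id(name, version, source_sha256, build_script, dependency_ids=()):
    """Deterministic ID for a build, derived from every input that affects it."""
    spec = {
        "name": name,
        "version": version,
        "source": source_sha256,          # hash of the source tarball
        "script": build_script,           # exact build commands
        "deps": sorted(dependency_ids),   # IDs of already-built dependencies
    }
    blob = json.dumps(spec, sort_keys=True).encode("utf-8")
    return f"{name}/{version}/{hashlib.sha256(blob).hexdigest()[:12]}"


lapack = build_id("lapack", "3.5.0", "aaaa", "./configure && make")
numpy = build_id("numpy", "1.8.1", "bbbb", "python setup.py install", [lapack])

# Rebuilding LAPACK differently yields a new LAPACK ID, and therefore a new NumPy ID.
lapack_patched = build_id("lapack", "3.5.0", "aaaa", "./configure && make FFLAGS=-O1")
numpy_patched = build_id("numpy", "1.8.1", "bbbb", "python setup.py install", [lapack_patched])
print(numpy != numpy_patched)  # True
```

Installing each build under a path named by its ID lets different variants coexist, and the ID records exactly what went into a result.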

Relocatability

  • conda has done a good job
  • Homebrew does this with symlinks

Customizability

hrmm

demo

  • he's reproducing two IPython notebooks
  • he has IPython instructions in the notebook that make it work
  • it reloads the notebook using specific versions
  • lmonade is a competitor he hadn't heard of

Lightning talks

  • completely open journal.
  • free to publish, read, etc.
  • Very high impact score

IPython

  • Why the IPython name if it does Julia, etc.?
  • new name: Jupyter
  • TODO: read Authorea: "Science was always meant to be open"

Colaboratory

Python in Chrome!

Indexing!

Matt Turk - nationaldataservice.org

A work-in-progress library for navigation.

Conference Day Two 2014-07-09

Keynote

  • Packaging, from a Red Hat guy.
  • "The data center installed Linux because they didn't have to ask anyone for money."
  • They need help with documentation.

Google Earth Engine

Real-time Crunching of Petabytes of Geospatial Data with Google Earth Engine, by Tyler Erickson, Google & Randy Sargent, Carnegie Mellon University

  • previously a researcher in environmental science
  • Earth Engine is what he's using
  • USGS Landsat: a series of 7 satellites, a very large data set at 30 m resolution. Their data is hard to access: big and on tape.
  • Barbara Ryan helped make Landsat data freely available. It used to cost money per scene, but she helped make it free. Google now hosts a lot of the data so people can get it for free without going to tape.
  • Medium data: fits on a single machine, but you can't be dumb about it.
  • Earth Engine combines large data sets with geo algorithms to give researchers easy access to this stuff.
  • IPython widget loading tiles from Google and overlaying Landsat data.
  • They got rid of the clouds on Google Earth.
  • It's now being used for deforestation monitoring.

Rasterio

Geospatial Raster Data Access for Programmers and Future Programmers

By Sean Gillies

  • He's the only person at Mapbox who says "raster-eye-oh"; everyone else says "ras-tier-ee-oh".
  • Mapbox is making a cloudless atlas based on current Landsat data.
  • Now they score pixel by pixel and sort the best to the top.
  • The scan line corrector on Landsat 7 failed in 2003. They can't use Landsat 8 because it hasn't been around long enough to get cloudless images everywhere.
  • They also now get seasonal lines because of the broken scan line corrector.
  • They want to fix this because of the visual appeal of the final images.
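The pixel-by-pixel scoring described above can be sketched as a per-pixel argmin over a stack of scenes. The scores are assumed inputs (for example an estimated cloud probability, lower is better); this is not Mapbox's actual scoring.

```python
def best_pixel_composite(scenes, scores):
    """For each pixel, keep the value from whichever scene scores best (lowest) there.

    scenes and scores are lists of equally sized 2-D grids (lists of rows).
    """
    n_rows, n_cols = len(scenes[0]), len(scenes[0][0])
    composite = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Index of the scene with the lowest score at (r, c); ties go to the earlier scene.
            best = min(range(len(scenes)), key=lambda k: scores[k][r][c])
            row.append(scenes[best][r][c])
        composite.append(row)
    return composite


# Two hypothetical 1x2 scenes with per-pixel cloud scores.
scenes = [[[100, 200]], [[110, 210]]]
scores = [[[0.9, 0.1]], [[0.2, 0.5]]]
print(best_pixel_composite(scenes, scores))  # [[110, 200]]
```

Because each pixel can come from a different date, neighbouring pixels may differ in season, which is one source of visible seams in such composites.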

Bokeh

  • pronounced "bow-kay"

  • static plot examples

  • new version 0.5 out today

  • has widgets

  • charts interface

  • browser-based: big data, visualization

  • no need to write JavaScript

  • matplotlib compatible (TODO: try this)

  • other libraries like this: mpld3, plotly

  • working on adding OpenStreetMap

  • big priority is streaming data

  • "We wrote JavaScript so you don't have to."

Conda

Package manager for any binary distribution, by Aaron Meurer from Continuum Analytics

AKA: solving the packaging problem

  • two sides to the problem: installing and building
  • pip is only for Python; in SciPy we use C libraries
  • setuptools is kind of evil
  • the real packaging problem is Windows
  • on Windows, Chocolatey and ???? are package managers
  • conda works on all 3 platforms
  • no admin privileges needed
  • upload to Binstar to share; it's like GitHub for binary packages

Visualization BOF

  • This one is across everything, not matplotlib specific, etc.
  • Tableau is doing interesting things

Geophysical algorithm

by Karl Schleicher

lightning talks

training.enthought.com

Announced today!

Data Carpentry

Conference Day 3: 2014-07-10

Lessons learned by Greg Wilson of the Mozilla Science Lab

  • He wrote a paper
  • Book to read: How Learning Works: Seven Research-Based Principles for Smart Teaching, by Susan A. Ambrose et al.
  • What costs school students learning
  • Mark Guzdial at Georgia Tech: best research on software teaching
  • Wrote a book: Making Software: What Really Works, and Why We Believe It. It has chapter-length essays from people doing research on this.
  • Randomo: a language with a randomly designed syntax. Perl and Java do no better than Randomo.
  • "We see researchers a day a week for the rest of their careers." Scottish verdict: they can say "not proven" in addition to guilty or not guilty. This statement is anecdotal.
  • We do not know how to measure the productivity of programmers or scientists.
  • We need teachers to work like Wikipedia: massive collaboration.
  • "Right now there are thousands of people making slides on how to solve diff eqs; that makes as much sense as thousands of people writing PayPal plugins for Django."
  • We are making this happen.
  • "A week of hard work can save you an hour of thought."
  • We need diff/merge for Word, images, etc. "We're being arrogant. Our way is not the only way."
  • http://nbdiff.org/
  • Build an open-source tool for diff/merge in Excel, HDF5, Word, etc.
  • "We need an incremental path from Word/Excel; now they have to drop everything and come over. That's hard and it's rude."
  • "It's like we got wonderful furniture at IKEA, but can't put it together."
  • Read "Seven Ways to Think Like the Web".
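As a small illustration of diffing something other than lines of text, a cell-by-cell diff of two tables (as you might read from two versions of a spreadsheet) can be sketched as below. The function and sample tables are hypothetical. It compares by position only, so an inserted row shows up as many changed cells; a real tool would align rows first.

```python
def diff_cells(old, new):
    """Compare two tables (lists of rows) cell by cell.

    Returns (row, col, old_value, new_value) for every cell that was
    changed, added (old_value is None), or removed (new_value is None).
    """
    changes = []
    for r in range(max(len(old), len(new))):
        old_row = old[r] if r < len(old) else []
        new_row = new[r] if r < len(new) else []
        for c in range(max(len(old_row), len(new_row))):
            a = old_row[c] if c < len(old_row) else None
            b = new_row[c] if c < len(new_row) else None
            if a != b:
                changes.append((r, c, a, b))
    return changes


old = [["site", "temp"], ["a", 20], ["b", 21]]
new = [["site", "temp"], ["a", 20], ["b", 22], ["c", 19]]
for change in diff_cells(old, new):
    print(change)
```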

Time Series Analysis for Network Security

by Phil Roth, data scientist at Endgame (mrphilroth.com)

  • use HDFS to put stuff in Redis: batch and historical
  • kairos: interface to back-end storage; good with time series
  • pyspark: good engine for large data
  • pandas has a great way to calculate an exponentially weighted moving average: pd.ewma
  • investigating ARIMA autoregressive.... statsmodels.tsa.arima_model
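`pd.ewma` was the pandas function at the time; later pandas versions replaced it with `Series.ewm(...).mean()`. A dependency-free sketch of the recurrence, which should match `Series.ewm(alpha=alpha, adjust=False).mean()` (pandas' default `adjust=True` weights the earliest values differently):

```python
def ewma(values, alpha):
    """Exponentially weighted moving average.

    s_0 = x_0 and s_t = alpha * x_t + (1 - alpha) * s_{t-1}; a larger alpha
    reacts faster to new values, a smaller one smooths more.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in values:
        smoothed.append(x if not smoothed else alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


print(ewma([10, 20, 30], 0.5))  # [10, 15.0, 22.5]
```

For monitoring, one common use is to compare each new value against the running average and flag large deviations.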

Campaign for IT literacy through FOSS and Spoken Tutorials

by Kannan Moudgalya

  • Using the AAKASH tablet. Also running something on a cheap Arduino.

by Paul Ivanov

  • MacVim doesn't have to be linked to the Python you're running; it just has to be able to talk to it.
  • vimception makes IPython more Vim-like.

Zero Dependency Python

by Matthew Turk & Kester Tong

Matthew's Talk

  • happy about his 5 commits to Chrome: "that moment when they realize that they can bend their computer to their will."
  • Dark Sky simulation, run on a six-digit number of cores
  • Gordon Bell prizes
  • hard to get at the data: how can they make it available without hurting the people using it?
  • Python mediates the problem of people accessing their data
  • everyone uses POSIX
  • Pepper API: what can be passed to and from VMs
  • pexe: packaged apps for Chrome
  • python.pexe & kernel.pexe
  • gonacl.com

Kester's Talk

  • Colaboratory: GH/jupyter/Colaboratory
  • talks to Google Drive for the IPython notebook
  • 1-click install for IPython
  • used Google Docs for easy sharing
  • has numpy, pandas, and matplotlib
  • has autocomplete
  • Google Docs features: see other people; cells are marked by who put them there
  • will grab the latest naclports version of dependencies
  • bliss is a C implementation of LAPACK
  • ZeroVM, out of Rackspace

Holography

easy 3D imaging

Circuits in IPython

  • He is doing quantum optics circuits
  • made cirq to visualize circuits in IPython

Interactive Vis with Holo

  • Still learning with interactive vis as grad student at Harvard

How to choose a good colour map

by Damon McDougall (Institute for Computational and Engineering Sciences, UT Austin, USA)

  • this talk is based on a paper.
  • "I'm dumping on jet for the rest of this talk."
  • Colin Ware has a good paper to read
  • Psychological experiments by S. S. Stevens (formerly at Harvard)

Perceptions of matplotlib colormaps

Kristen M. Thyng from Texas A&M

  • Using the CIELAB color space
  • Looking at the matplotlib colormaps. They should behave well in lightness, L*.
  • She has some great examples of the colormaps and how they behave in L*.
  • Weber-Fechner law (and Stevens) about perceived lightness
  • Josef Albers, "Interaction of Color": a cool book on color.
  • ColorBrewer color maps: brewer2mpl gets those into matplotlib
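A sketch of the kind of check the talk describes: convert sRGB colors to CIELAB lightness L* (D65 white, standard sRGB transfer curve) and test whether a colormap's lightness changes monotonically. The function names are hypothetical, and the jet samples are approximate anchor colors (dark blue, blue, cyan, yellow, red, dark red), not matplotlib's exact lookup table.

```python
def srgb_to_lightness(r, g, b):
    """CIELAB lightness L* (0 to 100) of an sRGB color with components in [0, 1]."""
    def linearize(c):
        # Undo the sRGB transfer curve to get linear light.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    # Relative luminance Y, scaled so that white has Y = 1.
    y = 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)
    delta = 6 / 29
    f = y ** (1 / 3) if y > delta ** 3 else y / (3 * delta ** 2) + 4 / 29
    return 116 * f - 16


def is_monotonic_lightness(colors):
    """True if L* strictly increases or strictly decreases along the colors."""
    ls = [srgb_to_lightness(*c) for c in colors]
    pairs = list(zip(ls, ls[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


GRAYS = [(v, v, v) for v in (0.0, 0.25, 0.5, 0.75, 1.0)]
JET_APPROX = [(0, 0, 0.5), (0, 0, 1), (0, 1, 1), (1, 1, 0), (1, 0, 0), (0.5, 0, 0)]

print(is_monotonic_lightness(GRAYS))       # True
print(is_monotonic_lightness(JET_APPROX))  # False: lightness rises, then falls
```

A grayscale ramp passes, while jet's lightness peaks around cyan and yellow and then drops, which is one reason it can suggest features that are not in the data.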