Notes available at http://tinyurl.com/otws-db
What is a data archive?
How might you use a data archive in astronomy?
Interested in the paper Green et al., MNRAS 437, 1070 (2014).
Would be interested in plotting Halpha vs Mgas to test the analysis in the paper.
Type numbers into python from the paper? Ridiculous!!!
Use Vizier to retrieve tabular data for published papers
Go to Vizier!
Search "Green 2014"
Catalog appears (name is J/MNRAS/437/1070) in search results. Looking at the catalogs available, it looks like we want the "sample" catalog.
We could download this file in some format, and then parse it into python...or we could just see if python can get the data directly.
Connecting to Vizier using a script/program
Search "Vizier API" on Google. Looks like there is a python package called "Astroquery" that can directly query Vizier.
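Install astroquery using pip (from the shell, not from inside python):

    # Check that python is correctly set-up
    pip install astroquery

Then, in python:

    from astroquery.vizier import Vizier
    import matplotlib.pyplot as plt

    # By default astroquery returns only the first 50 rows; -1 removes the limit
    Vizier.ROW_LIMIT = -1

    # get_catalogs returns a list of tables, so take the first (and only) one
    cat_list = Vizier.get_catalogs('J/MNRAS/437/1070/sample')
    cat = cat_list[0]
    cat.colnames
    cat['logLIHa']
    cat['Mgas']

    plt.scatter(cat['Mgas'], cat['logLIHa'])
    plt.show()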
Simple data retrieval
Interested in downloading some raw data from the archive. I want NIFS data on the galaxy "GDDS 22-2172" at RA 22:17:39.85, DEC +00:15:26.42.
Go to the Gemini Archive
Type in details to find data. Hit Search. The results view appears below the search fields:
- Files available to download.
- link to download all (selected) at the bottom of the page
- Permanent link to this data.
Using the API
Now what if I want to get data automatically, e.g. as part of an automated data reduction pipeline?
Scroll down to find Python script
Copy and paste it to ipython.
Paste into editor and edit to match our program:
    import json
    import urllib.request

    # Construct the URL. We'll use the jsonsummary service
    url = "https://archive.gemini.edu/jsonsummary/"

    # List the NIFS science files for GDDS-22-2172 in program GN-2008A-Q-18 taken on 2009-08-28
    url += "GN-2008A-Q-18/NIFS/20090828/science/GDDS-22-2172"

    # Open the URL and fetch the JSON document text into a string
    u = urllib.request.urlopen(url)
    jsondoc = u.read()
    u.close()

    # Decode the JSON
    files = json.loads(jsondoc)

    # This is a list of dictionaries each containing info about a file
    total_data_size = 0
    print("%20s %22s %10s %8s %s" % ("Filename", "Data Label", "ObsClass", "QA state", "Object Name"))
    for f in files:
        total_data_size += f['data_size']
        print("%20s %22s %10s %8s %s" % (f['name'], f['data_label'], f['observation_class'], f['qa_state'], f['object']))

    print("Total data size: %d" % total_data_size)
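For an automated pipeline it may be useful to keep only some of the returned records. The following is a minimal sketch, not part of the archive's own example script. The helper names select_files and total_size are hypothetical, the records in example_files are made up for illustration, and the "Pass" QA value is an assumption about what the archive reports. It relies only on the dictionary keys the script above already uses.

```python
def select_files(files, observation_class="science", qa_state="Pass"):
    """Return the records matching the given observation class and QA state."""
    return [
        f for f in files
        if f["observation_class"] == observation_class and f["qa_state"] == qa_state
    ]


def total_size(files):
    """Sum the data_size field over a list of file records."""
    return sum(f["data_size"] for f in files)


# Made-up records for illustration only; real ones come from json.loads(jsondoc)
example_files = [
    {"name": "example_0001.fits", "observation_class": "science",
     "qa_state": "Pass", "data_size": 1000},
    {"name": "example_0002.fits", "observation_class": "science",
     "qa_state": "Fail", "data_size": 2000},
    {"name": "example_0003.fits", "observation_class": "partnerCal",
     "qa_state": "Pass", "data_size": 3000},
]

selected = select_files(example_files)
print([f["name"] for f in selected])
print("Total data size: %d" % total_size(selected))
```

With real data, pass the files list from the script above instead of example_files.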
Paste updates into python.
Open a FITS file from the archive directly in python
Look at download URL from the web page.
Looks something like https://archive.gemini.edu/file/N20090828S0182.fits
So we just need the filename to download the file.
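A minimal sketch of turning a filename into a download, assuming the URL pattern seen on the web page holds for every file. The helper names download_url and download_file are hypothetical, and skipping files that already exist locally is a design choice added here, not something the archive requires.

```python
import urllib.request
from pathlib import Path

# URL prefix taken from the download link on the archive web page
ARCHIVE_FILE_URL = "https://archive.gemini.edu/file/"


def download_url(filename):
    """Build the download URL for one archive filename."""
    return ARCHIVE_FILE_URL + filename


def download_file(filename, directory="."):
    """Download one archive file into directory and return the local path.

    If the file is already present locally it is not fetched again.
    """
    target = Path(directory) / filename
    if target.exists():
        return target
    with urllib.request.urlopen(download_url(filename)) as response:
        target.write_bytes(response.read())
    return target


print(download_url("N20090828S0182.fits"))
```

Calling download_file("N20090828S0182.fits") would then save that file into the current directory.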
The previous script has created a files variable, which we can use to get the filenames of our files. Let's see if we can open one of the files directly in python.
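    import urllib.request
    from io import BytesIO

    import matplotlib.pyplot as plt
    from astropy.io import fits

    # files is a list of dictionaries, so pick one entry before asking for its name
    a_filename = files[0]['name']
    response = urllib.request.urlopen("https://archive.gemini.edu/file/" + a_filename)
    fits_bytes = response.read()
    response.close()

    f = fits.open(BytesIO(fits_bytes))
    f.info()

    # Assuming the usual raw-data layout (check the f.info() output):
    # the primary HDU holds the header, the first extension holds the pixels
    f[0].header
    f[1].data.shape
    plt.imshow(f[1].data, clim=(0, 500), interpolation='nearest')
    plt.show()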
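When files are fetched unattended, a cheap sanity check before parsing can catch the case where the server returned something other than a FITS file, such as an error page. This is a sketch under the assumption that checking the first header card is enough for that purpose. The names looks_like_fits and minimal_fits_header are hypothetical, and the tiny header is built by hand only so the example runs without a network connection.

```python
def looks_like_fits(data):
    """Return True if the bytes start like a FITS file.

    A FITS file is made of 2880-byte blocks and its first 80-character
    card begins with the keyword SIMPLE followed by '='.
    """
    return len(data) >= 2880 and data[:9] == b"SIMPLE  ="


def minimal_fits_header():
    """Build a data-free FITS primary header, for demonstration only."""
    cards = [
        "%-8s= %20s" % ("SIMPLE", "T"),
        "%-8s= %20s" % ("BITPIX", "8"),
        "%-8s= %20s" % ("NAXIS", "0"),
        "END",
    ]
    # Pad each card to 80 characters and the whole header to one 2880-byte block
    block = "".join(card.ljust(80) for card in cards)
    return block.ljust(2880).encode("ascii")


print(looks_like_fits(minimal_fits_header()))       # True
print(looks_like_fits(b"<html>Not found</html>"))   # False
```

In the script above, looks_like_fits(fits_bytes) could be checked before the bytes are passed to fits.open.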