Last active May 3, 2016
AAT Observational Techniques Workshop 2016: Data Archives Notes


Notes available at

What is a data archive?

How might you use a data archive in astronomy?


Interested in the paper Green et al., MNRAS 437, 1070 (2014).

Would be interested in plotting Halpha vs Mgas to test the analysis in the paper.

Type numbers into python from the paper? Ridiculous!!!

Use Vizier to retrieve tabular data for published papers

Go to Vizier!

Search "Green 2014"

Catalog appears (name is J/MNRAS/437/1070) in search results. Looking at the catalogs available, it looks like we want the "sample" catalog.

We could download this file in some format, and then parse it into python...or we could just see if python can get the data directly.

Connecting to Vizier using a script/program

Search "Vizier API" on Google. Looks like there is a python package called "Astroquery" that can directly query Vizier.


Install astroquery using pip (run this in a shell, not inside python)

# Check that python is correctly set-up

pip install astroquery

Then, in python:

from astroquery.vizier import Vizier

cat_list = Vizier.get_catalogs('J/MNRAS/437/1070/sample')

cat = cat_list[0]


import matplotlib.pyplot as plt

plt.scatter(cat['Mgas'], cat['logLIHa'])
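
One thing to watch: astroquery's Vizier class returns at most 50 rows by default (its ROW_LIMIT setting), so a longer table would come back truncated. Here is a minimal sketch of asking for every row. The helper name fetch_full_table is made up for illustration, and calling it needs astroquery installed and network access.

```python
def fetch_full_table(catalog_id):
    """Return the first table of a Vizier catalog with no row cap.

    fetch_full_table is a hypothetical helper, not part of astroquery.
    """
    # Imported here so the function can be defined without astroquery installed.
    from astroquery.vizier import Vizier

    # row_limit=-1 lifts the default 50-row limit.
    vizier = Vizier(row_limit=-1)
    tables = vizier.get_catalogs(catalog_id)
    return tables[0]


# Example (needs network access):
# cat = fetch_full_table('J/MNRAS/437/1070/sample')
# print(len(cat), cat.colnames)
```

Printing cat.colnames is a quick way to check the exact column names before plotting.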

Gemini Archive

Simple data retrieval

Interested in downloading some raw data from the archive. I want NIFS data on the galaxy "GDDS 22-2172" at RA 22:17:39.85, DEC +00:15:26.42.

Go to the Gemini Archive

Type in details to find data. Hit Search. The results view appears below the search fields:

  • Files available to download.
  • Link to download all (selected) files at the bottom of the page.
  • Permanent link to this data.
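
The coordinates quoted above are sexagesimal (hours:minutes:seconds for RA, degrees:arcminutes:arcseconds for Dec). Scripts often want decimal degrees instead, so here is a minimal sketch of the arithmetic. The function name sexagesimal_to_degrees is made up for illustration; it is not something the archive provides.

```python
def sexagesimal_to_degrees(text, hours=False):
    """Convert 'DD:MM:SS.s' (or 'HH:MM:SS.s' if hours=True) to decimal degrees."""
    text = text.strip()
    # Take the sign from the string itself so that '-00:15:26' stays negative.
    sign = -1.0 if text.startswith('-') else 1.0
    first, minutes, seconds = (float(part)
                               for part in text.lstrip('+-').split(':'))
    value = first + minutes / 60.0 + seconds / 3600.0
    if hours:
        value *= 15.0  # 24 hours of right ascension span 360 degrees
    return sign * value


ra_deg = sexagesimal_to_degrees("22:17:39.85", hours=True)
dec_deg = sexagesimal_to_degrees("+00:15:26.42")
print("RA = %.5f deg, Dec = %.5f deg" % (ra_deg, dec_deg))
```

For real work, astropy's SkyCoord handles this parsing, plus frames and units, more robustly than a hand-rolled function.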

Using the API

Now what if I want to get data automatically, e.g. as part of an automated data reduction pipeline?

Look at the Help. Under "Accessing the Archive from scripts and the command line", there is a link to the API Help.

Scroll down to find Python script

Copy and paste it to ipython.

Paste into editor and edit to match our program:

import json
import urllib.request

# Construct the URL. We'll use the jsonfilelist service
url = ""

# List the science files for GDDS-22-2172 in program GN-2008A-Q-18 from 2009-08-28
url += "GN-2008A-Q-18/GMOS-N/NIFS/20090828/science/GDDS-22-2172"

# Open the URL and fetch the JSON document text into a string
u = urllib.request.urlopen(url)
jsondoc = u.read()
u.close()

# Decode the JSON
files = json.loads(jsondoc)

# This is a list of dictionaries each containing info about a file
total_data_size = 0
print("%20s %22s %10s %8s %s" % ("Filename", "Data Label", "ObsClass",
                                 "QA state", "Object Name"))
for f in files:
    total_data_size += f['data_size']
    print("%20s %22s %10s %8s %s" % (f['name'], f['data_label'],
                                     f['observation_class'], f['qa_state'],
                                     f['object']))

print("Total data size: %d" % total_data_size)
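
The selection part of the URL is just a list of selectors joined with "/", and the reply is a list of dictionaries. Here is a rough sketch of both ideas that runs offline. The helper names build_selection and total_size are made up for illustration, and the sample records are invented placeholders that only reuse keys the script above reads; they are not real archive output.

```python
import json


def build_selection(*selectors):
    """Join archive selectors (program ID, instrument, date, ...) with '/'."""
    return "/".join(str(s).strip("/") for s in selectors if s)


def total_size(files, observation_class=None):
    """Sum 'data_size' over file records, optionally for one observation class."""
    return sum(f['data_size'] for f in files
               if observation_class is None
               or f['observation_class'] == observation_class)


selection = build_selection("GN-2008A-Q-18", "NIFS", "20090828", "science")
print(selection)

# Made-up records for illustration only; NOT real archive output.
sample_doc = """[
  {"name": "example_a.fits", "data_size": 1000, "observation_class": "science"},
  {"name": "example_b.fits", "data_size": 2500, "observation_class": "partnerCal"}
]"""
sample_files = json.loads(sample_doc)
print(total_size(sample_files))
print(total_size(sample_files, observation_class="science"))
```

Building the selection from a list of parts makes it easy to loop over several dates or programs in a pipeline.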

Paste updates into python.

Open a FITS file from the archive directly in python

Look at download URL from the web page.

Looks something like

So we just need the filename to download the file.

The previous script has created a files variable, which we can use to get the filenames of our files. Let's see if we can open one of the files directly in python.

import urllib.request

a_filename = files[0]['name']

response = urllib.request.urlopen("" + a_filename)

fits_bytes = response.read()

from io import BytesIO
from astropy.io import fits

f = fits.open(BytesIO(fits_bytes))

plt.imshow(f[1].data, clim=(0,500), interpolation='nearest')
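
Why does BytesIO work here? It makes a string of bytes behave like an open file, so any reader that expects a file object can use it without touching disk. Here is a minimal sketch of the idea using only the standard library. It relies on the FITS convention that a header is a run of 80-character ASCII cards, padded to 2880-byte blocks and terminated by an END card. The helper read_header_cards and the synthetic header are made up for illustration; nothing here is data from the archive.

```python
from io import BytesIO

CARD_LENGTH = 80     # each FITS header card is 80 ASCII characters
BLOCK_LENGTH = 2880  # headers are padded with spaces to a multiple of 2880 bytes


def read_header_cards(fileobj):
    """Read header cards from a file-like object up to (not including) END."""
    cards = []
    while True:
        raw = fileobj.read(CARD_LENGTH)
        if len(raw) < CARD_LENGTH:
            break  # ran out of data before finding END
        card = raw.decode('ascii').rstrip()
        if card == 'END':
            break
        cards.append(card)
    return cards


# Synthetic header for illustration only; not data from the archive.
lines = ["%-8s= %20s" % ("SIMPLE", "T"),
         "%-8s= %20s" % ("BITPIX", "8"),
         "%-8s= %20s" % ("NAXIS", "0"),
         "END"]
header = "".join(line.ljust(CARD_LENGTH) for line in lines)
fake_fits_bytes = header.ljust(BLOCK_LENGTH).encode('ascii')

cards = read_header_cards(BytesIO(fake_fits_bytes))
for card in cards:
    print(card)
```

With real downloaded bytes, read_header_cards(BytesIO(fits_bytes)) would list the primary header cards, assuming the download is an uncompressed FITS file.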