Export Issues from Github repo to CSV (API v3)
"""
Exports Issues from a specified repository to a CSV file
Uses basic authentication (Github username + password) to retrieve Issues
from a repository that username has access to. Supports Github API v3.
"""
import csv
import requests
GITHUB_USER = ''
GITHUB_PASSWORD = ''
REPO = '' # format is username/repo
ISSUES_FOR_REPO_URL = 'https://api.github.com/repos/%s/issues' % REPO
AUTH = (GITHUB_USER, GITHUB_PASSWORD)
def write_issues(response):
    "output a list of issues to csv"
    if not r.status_code == 200:
        raise Exception(r.status_code)
    for issue in r.json():
        labels = issue['labels']
        for label in labels:
            if label['name'] == "Client Requested":
                csvout.writerow([issue['number'], issue['title'].encode('utf-8'), issue['body'].encode('utf-8'), issue['created_at'], issue['updated_at']])

r = requests.get(ISSUES_FOR_REPO_URL, auth=AUTH)
csvfile = '%s-issues.csv' % (REPO.replace('/', '-'))
csvout = csv.writer(open(csvfile, 'wb'))
csvout.writerow(('id', 'Title', 'Body', 'Created At', 'Updated At'))
write_issues(r)

# more pages? examine the 'link' header returned
if 'link' in r.headers:
    pages = dict(
        [(rel[6:-1], url[url.index('<')+1:-1]) for url, rel in
            [link.split(';') for link in
                r.headers['link'].split(',')]])
    while 'last' in pages and 'next' in pages:
        r = requests.get(pages['next'], auth=AUTH)
        write_issues(r)
        if pages['next'] == pages['last']:
            break
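For readers on current Python, here is a minimal Python 3 sketch of the same export under the same assumptions (basic auth, a single page of issues). The REPO value is a hypothetical placeholder, and the row-building helper is split out so the CSV logic can be exercised without a network call:

```python
"""Python 3 sketch of the issue export; assumes the `requests` package."""
import csv

REPO = 'octocat/Hello-World'  # placeholder; format is username/repo
ISSUES_URL = 'https://api.github.com/repos/%s/issues' % REPO


def issue_rows(issues):
    """Turn the decoded JSON issue list into CSV rows (pure, testable)."""
    return [
        [i['number'], i['title'], i['body'], i['created_at'], i['updated_at']]
        for i in issues
    ]


def export_issues(auth=None):
    import requests  # deferred so the helper above works without it installed
    r = requests.get(ISSUES_URL, auth=auth)
    r.raise_for_status()  # clearer than raise Exception(r.status_code)
    rows = [['id', 'Title', 'Body', 'Created At', 'Updated At']]
    rows.extend(issue_rows(r.json()))
    csvfile = '%s-issues.csv' % REPO.replace('/', '-')
    # text mode with newline='' is the Python 3 equivalent of the 'wb' above
    with open(csvfile, 'w', newline='', encoding='utf-8') as f:
        csv.writer(f).writerows(rows)
```

Note this sketch drops the "Client Requested" label filter from the original, so it exports every issue on the page.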
kilirobbs commented Dec 20, 2012

Hey Brian,

I had to update line 21 from r.json to r.json() in order for it to work. Great work and thanks for sharing.

Kilimanjaro

akiradeveloper commented Mar 17, 2013

This code only works in Python 2.7 or above.
In Python 2.6, L.14 and L.27 have to be rewritten so as not to use the format method.

ISSUES_FOR_REPO_URL = 'https://api.github.com/repos/%s/issues' % (REPO)

and

csvfile = '%s-issues.csv' % (REPO.replace('/', '-'))

are correct.

Please see https://code.google.com/p/python-twiggy/issues/detail?id=10

unbracketed (owner) commented Apr 25, 2013

@kilirobbs Thanks, this was running against an older release of requests.

unbracketed (owner) commented Apr 25, 2013

@akiradeveloper Updated for Python < 2.7

Pinwheeler commented Jan 24, 2014

Quick disclaimer for the stupid (like me)

This gist only exports issues with the label "Client Requested". So if you are trying to export all issues, just remove that label-checking for loop.

mblackstock commented Feb 12, 2014

Thanks for this! For my 4 page list of issues I needed to copy pages = dict(...) into the while loop so that the loop would break. --Mike

sidan5 commented Jul 7, 2014

I get this:

Traceback (most recent call last):
  File "export_github_issues.py", line 32, in <module>
    write_issues(r)
  File "export_github_issues.py", line 20, in write_issues
    raise Exception(r.status_code)

ghardy2 commented Jul 18, 2014

sidan5, I get the same issue

dclarktandem commented Aug 18, 2014

I'm getting the same issue

chstewa1 commented Sep 15, 2014

That exception means that you're probably getting a 404 back; check the URL manually (and be careful of the different URL scheme for Enterprise GitHub, if you're using that).

davedyk commented Nov 12, 2014

Thanks Brian, this code is just what I needed. (I'm a beginning Python user, and editing your code to accomplish a little task at work has been my first real project. A big thank-you.)

For those of you running Github Enterprise with 2-factor authentication, you can get a "Personal API token", and filter the responses, with something like this:

#Update your personal API token here
PERSONAL_TOKEN = '[YOUR_TOKEN_HERE]'
headers = {'Authorization': 'token %s' % PERSONAL_TOKEN }

# Update your filter here.  Filter is who owns the issue.  State is open, closed, or all.
params_payload = {'filter' : 'all', 'state' : 'closed' }

# Specify the repo
REPO = 'foo/bar'  # format is username/repo

ISSUES_FOR_REPO_URL = 'https://github.enterprise.com/api/v3/repos/%s/issues' % REPO

# add the def write_issues() here

r = requests.get(ISSUES_FOR_REPO_URL, params=params_payload, headers=headers)
davedyk commented Nov 12, 2014

Also, like @mblackstock, I had trouble with the page=x incrementing. I'm not really sure what that pages dictionary is doing, but I just copied and pasted it into both code blocks, and it seems to work now. Like this:

if 'link' in r.headers:
    pages = dict(
        [(rel[6:-1], url[url.index('<')+1:-1]) for url, rel in
            [link.split(';') for link in
                r.headers['link'].split(',')]])
    while 'last' in pages and 'next' in pages:
        pages = dict(
            [(rel[6:-1], url[url.index('<')+1:-1]) for url, rel in
                [link.split(';') for link in
                    r.headers['link'].split(',')]])
        r = requests.get(pages['next'], headers=headers)

For some reason it doesn't seem to work when I put it only in the while loop, nor does it work when it is only outside of the while loop. Anybody want to chime in with a description of what is going on there?
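To answer the question: the nested comprehension splits the raw Link header (shaped like `<url>; rel="next", <url>; rel="last"`) into a {rel: url} dict; `rel[6:-1]` strips the leading ` rel="` and the trailing quote. Because every response carries its own Link header, the dict has to be rebuilt from each new response or 'next' never advances, which is why it must live inside the loop. Written out as a plain function, the parsing looks roughly like this:

```python
def parse_link_header(value):
    """Parse a GitHub-style Link header into a {rel: url} dict.
    Same job as the nested comprehension, just spelled out."""
    pages = {}
    for link in value.split(','):
        url_part, rel_part = link.split(';')
        # '<https://...>' -> 'https://...'
        url = url_part.strip()[1:-1]
        # 'rel="next"' -> 'next'
        rel = rel_part.strip()[len('rel="'):-1]
        pages[rel] = url
    return pages
```

Newer releases of requests parse this for you: `r.links` is the same mapping keyed by rel, e.g. `r.links['next']['url']`, which avoids hand-parsing entirely.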

js9045 commented Nov 12, 2014

First thanks for the code. It goes into an infinite loop if there are more than two pages. Right now it only works for me if there is only a single page of output.

I made the following change and I can walk through the pages now:

#more pages? examine the 'link' header returned
if 'link' in r.headers:
    pages = dict(
        [(rel[6:-1], url[url.index('<')+1:-1]) for url, rel in
            [link.split(';') for link in
                r.headers['link'].split(',')]])
    print "***"
    print pages
    while 'last' in pages and 'next' in pages:
        print pages['next']
        r = requests.get(pages['next'], auth=OAUTH)
        write_issues(r)
        if pages['next'] == pages['last']:
            break
        pages = dict(
            [(rel[6:-1], url[url.index('<')+1:-1]) for url, rel in
                [link.split(';') for link in
                    r.headers['link'].split(',')]])

and get this

{'last': 'https://api.github.com/repositories/24953865/issues?state=all&page=7', 'next': 'https://api.github.com/repositories/24953865/issues?state=all&page=2'}
https://api.github.com/repositories/24953865/issues?state=all&page=2
https://api.github.com/repositories/24953865/issues?state=all&page=3
https://api.github.com/repositories/24953865/issues?state=all&page=4
https://api.github.com/repositories/24953865/issues?state=all&page=5
https://api.github.com/repositories/24953865/issues?state=all&page=6
https://api.github.com/repositories/24953865/issues?state=all&page=7
js9045 commented Nov 12, 2014

Also, I'm by no means a Python expert, but shouldn't the function use its parameter instead of the global variable r? As in:

def write_issues(response):
    "output a list of issues to csv"
    if not response.status_code == 200:
        raise Exception(response.status_code)
    for issue in response.json():
        csvout.writerow([issue['number'], issue['title'].encode('utf-8'), issue['created_at'], issue['updated_at']])
ppyordanov commented Dec 14, 2014

Hello, can this be extended to save images as well?

lev-dev commented Jan 23, 2015

@js9045, I was thinking about the same issue. 'response' is an argument, but it is never used inside write_issues(). Your version makes sense to me.

markjd84 commented Apr 13, 2015

Thanks original author and @js9045 ...worth adding a couple of tiny things in case anyone comes here and has the same setup as me:

Python 3

You might get some kind of message such as TypeError: 'str' does not support the buffer interface. Change your writer line to (thanks StackO): ...open(csvfile, 'w', newline=''))

Issue types

I needed all issues, not just open. Simply add an argument:

ISSUES_FOR_REPO_URL = 'https://api.github.com/repos/%s/issues' % REPO
ARGS = "?state=all"

...
r = requests.get(ISSUES_FOR_REPO_URL + ARGS, auth=AUTH)
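As an aside, requests can build that query string itself from a params dict, which avoids typos in hand-built URLs. A small sketch of the equivalent encoding (the commented request line assumes the names from the script above):

```python
from urllib.parse import urlencode

# Same effect as ARGS = "?state=all", but encoded for you; per_page
# raises the page size (the v3 API caps it at 100).
params = {'state': 'all', 'per_page': 100}
query = urlencode(params)  # 'state=all&per_page=100'

# With requests, using the names from the script above:
# r = requests.get(ISSUES_FOR_REPO_URL, params=params, auth=AUTH)
```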
hkbai commented May 8, 2015

It seems to run fine, except it only adds "id,Title,Body,Created At,Updated At" to the CSV. How should I fix this?

boombatower commented Sep 25, 2015

Since this is high up in search results... If you're just looking for a json dump (presumably with credentials since public can be done in browser):

#!/bin/bash

repo=$1
filename=$(echo "$repo.json" | tr / -)
echo "Dumping $1 to $filename..."
echo
echo

# remove -u if not private
curl -u "user:pass" \
  "https://api.github.com/repos/$1/issues?per_page=1000&state=all" \
  > $filename

Note you can set per_page to avoid needing to check headers and do multiple requests in most cases.

cbonilla20 commented Jan 11, 2016

Nice @boombatower!

bojanpopic commented Jan 29, 2016

Thanks @boombatower. However, your script tries to get 1000 items and the GitHub API has a max of 100 items per page, so it didn't work for me with 500+ issues. I modified it to use pagination. It's super dirty, but it worked. Here is the gist https://gist.github.com/bojanpopic/2c3025d2952844de1dd0

awneil commented May 19, 2016

I tried removing the 'Client Requested' - per @Pinwheeler, 24 Jan 2014

And copying the 'dict' part - per @davedyk, 12 Nov 2014

And adding 'state=all' - per @markjd84, 13 Apr 2015

And adding per_page=100 (or 1000) - per @boombatower, 25 Sep 2015, and @bojanpopic, 29 Jan 2016.

But none of it gives me a complete list of all issues.

What am I missing?

awneil commented May 19, 2016

It looks like the 2nd (and subsequent?) call(s) to write_issues() do not manage to parse the JSON - so they don't find any issues to put into the CSV ?

I did try using the argument - per @js9045, 12 Nov 2014 - but that didn't help.

jithingangadharan commented Jun 12, 2016

Thank you for sharing this code. How can I retrieve issues from a private repository?

Billy- commented Sep 21, 2016

Hi All. I was running this script (thanks @unbracketed) and found that it would stop writing the data at an odd point. I found that it was because the csv file was not being .close()'d. I have fixed that and also made another couple of changes:

  • It also outputs the response json into a file (this was mainly for debugging, and seeing what data is available to me). This feature is not fully working, each request appends a new json object to the file, which makes the file invalid json. As I say it was just for debugging, so this was not a problem for me
  • It only writes rows which are issues (as opposed to pull requests)
  • It appends a total number of issues on the last row
  • Prints some useful information as it processes the issues

See my fork here

Kebiled commented Sep 26, 2016

I expanded on @Billy-'s fork by adding @mblackstock's solution to ensure the while loop breaks and stops reiterating over the second page.

I also added a file called export_multi_repo_issues_to_csv.py which contains a repository list so you can export issues from multiple repositories into separate CSV files.

Here's my fork.

ukreddy-erwin commented Oct 25, 2016

Can anyone suggest how to export issues from GitHub Enterprise to public GitHub?

Jammizzle commented Nov 2, 2016

I expanded on @Kebiled's fork by using ZenHub's API to add 'Estimate Value' and 'Pipeline' columns, plus a list of labels and assignees. I'm not sure how many people use ZenHub, but the fork is here if anyone does end up wanting to use it.
Might seem a bit funny making one request per issue for the ZenHub API, but that's just the way their API works.

patrickfuller commented Nov 8, 2016

Yet another fork here.

  • Prompts for username/pass instead of raw text
  • Repositories passed as arguments
  • Python3 / PEP8

Usage: python github_to_csv.py --help.

axd1967 commented Dec 22, 2016

(off-topic: this is about gists)

it's striking how many changes are proposed, but how few actually end up somewhere in the associated Git repo... are users here supposed to type in those changes by hand? Maybe Git could help?

I'm a gist n00b, but I can't understand why all the comments suggesting code changes are not accessible as e.g. branches or SHA-1 references (this would require commenters to start by forking, then applying changes, then sharing those changes!).

Just to have an idea I took the trouble to clone this gist, and added a handful of forks as remotes.

for example, @davedyk (I just picked a random commenter) proposes changes, but didn't fork...

(if anyone knows of places where Gist issues are discussed, let me know: a bit similar to https://github.com/dear-github/dear-github)

mfunk commented Mar 1, 2017

Thank you Brian! This snippet helped make writing up known issues for release notes sooooo much easier.

marcelkornblum commented Mar 8, 2017

https://gist.github.com/marcelkornblum/21be3c13b2271d1d5a89bf08cbfa500e

Another fork if it's useful to anyone.

The basic functionality is the same, but reorganised into clearer methods. I've added the various snippets people suggested in the earlier comments, meaning:

  • you can use username/pass or token auth
  • set filters for results (including for labels which is more efficient than the original approach)
  • I added labels to the CSV output
  • Pagination is more clearly handled

Tested on python 2.7

Hope this is useful to someone and thanks @unbracketed

sshaw commented Jul 23, 2017

Here's something else (in Ruby) to export pull requests and issues to a CSV file. Supports GitLab and Bitbucket too: https://github.com/sshaw/export-pull-requests

abevoelker commented Sep 2, 2017

This script comes up high in Google results for certain queries but it's pretty limited in that it only exports the initial issue, not issue comments.

My goal was to backup GitHub data for an organization, and this project worked a lot better for that purpose: https://github.com/josegonzalez/python-github-backup It also lets you back up issue comments, issue events, PRs, PR review comments, wikis, etc.

Vravi123 commented Sep 4, 2017

Hi,
I am trying to export ZenHub issues to CSV using the below code:

REPO = ''
url = "https://github.ibm.com/Webtrans/EOSD-ISA-LocalApps/issues/json?issues=%s" % (REPO)

response = requests.get(url, auth=AUTH)
response.json()  # here I am getting the below error:
JSONDecodeError                           Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 response.json()

C:\Anaconda3\lib\site-packages\requests\models.py in json(self, **kwargs)
    883                 # used.
    884                 pass
--> 885         return complexjson.loads(self.text, **kwargs)
    886
    887     @property

C:\Anaconda3\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    352             parse_int is None and parse_float is None and
    353             parse_constant is None and object_pairs_hook is None and not kw):
--> 354         return _default_decoder.decode(s)
    355     if cls is None:
    356         cls = JSONDecoder

C:\Anaconda3\lib\json\decoder.py in decode(self, s, _w)
    337
    338         """
--> 339         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    340         end = _w(s, end).end()
    341         if end != len(s):

C:\Anaconda3\lib\json\decoder.py in raw_decode(self, s, idx)
    355             obj, end = self.scan_once(s, idx)
    356         except StopIteration as err:
--> 357             raise JSONDecodeError("Expecting value", s, err.value) from None
    358         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Can anyone please help with this?
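That JSONDecodeError means the response body is not JSON at all: the /issues/json URL above is a web page route, not an API endpoint (on GitHub Enterprise the API usually lives under /api/v3/, as in @davedyk's comment). A cheap guard before calling .json() is to check the Content-Type header. A small sketch, where the helper name is my own invention:

```python
def looks_like_json(content_type):
    """True if a Content-Type header value indicates a JSON body."""
    mime = content_type.split(';')[0].strip().lower()
    return mime in ('application/json', 'application/vnd.github+json')

# usage sketch, where response came from requests.get(...):
# if looks_like_json(response.headers.get('Content-Type', '')):
#     data = response.json()
# else:
#     print('Got %s, not JSON; an HTML body usually means a web page URL'
#           % response.headers.get('Content-Type'))
```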

jschristie commented Sep 5, 2017

Hello all,

First, thanks for the code. While trying to run it, I get the below error; could someone please tell me why?

Traceback (most recent call last):
  File "export-issues.py", line 33, in <module>
    write_issues(r)
  File "export-issues.py", line 21, in write_issues
    raise Exception(r.status_code)
Exception: 401

Also, I have tried both Python 3.5 and Python 3.6 and I get the same error.

I run it with the python export-issues.py command in Command Prompt.

Any help on this would be great

amacfie commented Oct 26, 2017

@jschristie usually that means an incorrect password
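A 401 can also show up when two-factor auth is enabled, in which case basic auth with a password will not work. As @davedyk noted above, a personal access token sent as an Authorization header gets around this; a minimal sketch (the token value is a placeholder):

```python
# PERSONAL_TOKEN is a placeholder; generate a real one under
# GitHub Settings -> Personal access tokens.
PERSONAL_TOKEN = 'your-token-here'


def auth_headers(token):
    """Authorization header in the form the v3 API accepts."""
    return {'Authorization': 'token %s' % token}

# r = requests.get(ISSUES_FOR_REPO_URL, headers=auth_headers(PERSONAL_TOKEN))
```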

simard57 commented Nov 9, 2017

I am running on a Windows 7 machine.
The import requests (line 7) is reporting that the module is not found! I just installed Python from python.org -- is there a library I need to get as well?

[update]
I found instructions for requests.py @ http://docs.python-requests.org/en/master/user/install/#install and installed it. I then ran
python.exe getIssues.py and got

Traceback (most recent call last):
  File "getIssues.py", line 30, in <module>
    csvout.writerow(('id', 'Title', 'Body', 'Created At', 'Updated At'))
TypeError: a bytes-like object is required, not 'str'

damithc commented Dec 2, 2017

Traceback (most recent call last):
  File "getIssues.py", line 30, in <module>
    csvout.writerow(('id', 'Title', 'Body', 'Created At', 'Updated At'))
TypeError: a bytes-like object is required, not 'str'

@simard57 I ran into the same problem. I suspect it is an incompatibility between python 2 and 3.
Try using this (worked for me),

csvout = csv.writer(open(csvfile, 'w', newline=''))

instead of this:

csvout = csv.writer(open(csvfile, 'wb'))
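For anyone wondering why the mode matters: Python 2's csv module wrote bytes, so 'wb' was right there, while Python 3's writes text and supplies its own row terminator, hence 'w' with newline=''. A self-contained illustration using an in-memory buffer:

```python
import csv
import io

# Python 3: give csv a text stream; newline='' prevents the csv module's
# '\r\n' row terminator from being translated a second time.
buf = io.StringIO(newline='')
writer = csv.writer(buf)
writer.writerow(('id', 'Title', 'Body', 'Created At', 'Updated At'))
header_line = buf.getvalue()  # 'id,Title,Body,Created At,Updated At\r\n'
```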
PatMcCarthy commented Jan 2, 2018

WRT the script github_to_csv.py, and others here....
So, BLEEDING EDGE NEWBIE here (I can code in everything from COBOL to C#, but today is my first attempt at Python).
Download to windows & Install - smooooth
Copied Python script and ran it.... ummm...
I am getting kicked due to
import requests
ModuleNotFoundError: No module named 'requests'
So..... where can I find this module???
In case it helps: Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:04:45) [MSC v.1900 32 bit (Intel)] on win32

PatMcCarthy commented Jan 2, 2018

Update... Found Requests.
Wish to add it....
Found Install dox...
To install Requests, simply:
.. code-block:: bash
$ pip install requests
@^%^%$!*)@^
Satisfaction guaranteed.

So...

  1. Install Python
  2. Run the following: python -m pip install requests
  3. Run the script, as described above
  4. at this point, YMMV....
kirankodali commented Oct 29, 2018

I am getting the below error, please advise what might be the issue

Traceback (most recent call last):
  File "git_issues.py", line 31, in <module>
    write_issues(r)
  File "git_issues.py", line 19, in write_issues
    raise Exception(r.status_code)
Exception: 404

Process finished with exit code 1

Craigfis commented Aug 14, 2019

Doesn't work with two-factor auth. I ended up just using curl.


gavinr commented Apr 19, 2020

This is a good python script. Thanks for posting it. Here is this concept wrapped in a CLI tool:
https://github.com/gavinr/github-csv-tools
