"""
Exports Issues from a specified repository to a CSV file

Uses basic authentication (Github username + password) to retrieve Issues
from a repository that username has access to. Supports Github API v3.
"""
import csv
import requests

GITHUB_USER = ''
GITHUB_PASSWORD = ''
REPO = ''  # format is username/repo
ISSUES_FOR_REPO_URL = 'https://api.github.com/repos/%s/issues' % REPO
AUTH = (GITHUB_USER, GITHUB_PASSWORD)

def write_issues(response):
    "output a list of issues to csv"
    if not r.status_code == 200:
        raise Exception(r.status_code)
    for issue in r.json():
        labels = issue['labels']
        for label in labels:
            if label['name'] == "Client Requested":
                csvout.writerow([issue['number'], issue['title'].encode('utf-8'), issue['body'].encode('utf-8'), issue['created_at'], issue['updated_at']])

r = requests.get(ISSUES_FOR_REPO_URL, auth=AUTH)
csvfile = '%s-issues.csv' % (REPO.replace('/', '-'))
csvout = csv.writer(open(csvfile, 'wb'))
csvout.writerow(('id', 'Title', 'Body', 'Created At', 'Updated At'))
write_issues(r)

#more pages? examine the 'link' header returned
if 'link' in r.headers:
    pages = dict(
        [(rel[6:-1], url[url.index('<')+1:-1]) for url, rel in
            [link.split(';') for link in
                r.headers['link'].split(',')]])
    while 'last' in pages and 'next' in pages:
        r = requests.get(pages['next'], auth=AUTH)
        write_issues(r)
        if pages['next'] == pages['last']:
            break
Thanks original author and @js9045 ...worth adding a couple of tiny things in case anyone comes here and has same setup as me:
Python 3
You might get some kind of message such as TypeError: 'str' does not support the buffer interface. Change your writer line to (thanks StackO): ...open(csvfile, 'w', newline=''))
Issue types
I needed all issues, not just open. Simply add an argument:
ISSUES_FOR_REPO_URL = 'https://api.github.com/repos/%s/issues' % REPO
ARGS = "?state=all"
...
r = requests.get(ISSUES_FOR_REPO_URL + ARGS, auth=AUTH)
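For anyone adding more than one filter, here is a minimal sketch that builds the query string from keyword arguments instead of concatenating strings. It uses only the standard library. `build_issues_url` and `API_ROOT` are names made up for this example and are not part of the gist. `requests.get` also accepts a `params=` dict that does the same encoding.

```python
from urllib.parse import urlencode

API_ROOT = 'https://api.github.com'  # public GitHub; an assumption for this sketch


def build_issues_url(repo, **params):
    """Return the issues URL for `repo` ('username/repo') plus a query string."""
    url = '%s/repos/%s/issues' % (API_ROOT, repo)
    if params:
        # sorted() keeps the parameter order stable between runs
        url += '?' + urlencode(sorted(params.items()))
    return url


if __name__ == '__main__':
    print(build_issues_url('username/repo', state='all', per_page=100))
```

The resulting URL can be passed to `requests.get(..., auth=AUTH)` exactly as in the snippet above.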
It seems to run fine, except it only adds "id,Title,Body,Created At,Updated At" to the CSV. How should I fix this?
Since this is high up in search results... If you're just looking for a json dump (presumably with credentials since public can be done in browser):
#!/bin/bash
repo=$1
filename=$(echo "$repo.json" | tr / -)
echo "Dumping $1 to $filename..."
echo
echo
# remove -u if not private
curl -u "user:pass" \
"https://api.github.com/repos/$1/issues?per_page=1000&state=all" \
> $filename
Note you can set per_page to avoid needing to check headers and do multiple requests in most cases.
Nice @boombatower!
Thanks @boombatower. However, your script asks for 1000 items and the GitHub API returns at most 100 items per page, so it didn't work for me with 500+ issues. I modified it to use pagination. It's super dirty, but it worked. Here is the gist: https://gist.github.com/bojanpopic/2c3025d2952844de1dd0
I tried removing the 'Client Requested' - per @Pinwheeler, 24 Jan 2014
And copying the 'dict' part - per @davedyk, 12 Nov 2014
And adding 'state=all' - per @markjd84, 13 Apr 2015
And adding per_page=100 (or 1000) - per @boombatower, 25 Sep 2015, and @bojanpopic, 29 Jan 2016.
But none of it gives me a complete list of all issues.
What am I missing?
It looks like the 2nd (and subsequent?) call(s) to write_issues() do not manage to parse the JSON - so they don't find any issues to put into the CSV ?
I did try using the argument - per @js9045,12 Nov 2014 - but that didn't help.
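One possible explanation, offered as a guess rather than a diagnosis: in the script at the top, the `pages` dict is built once from the first response and never refreshed inside the `while` loop, so `pages['next']` keeps pointing at page 2. Below is a sketch that re-reads the `Link` header after every request. `fetch` is a hypothetical callable that returns `(issues, link_header)` for a URL, which lets the loop be tried without touching the network.

```python
import re


def parse_link_header(value):
    """Turn a Link header into a dict such as {'next': url, 'last': url}."""
    links = {}
    for part in (value or '').split(','):
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel="([^"]+)"', part)
        if match:
            links[match.group(2)] = match.group(1)
    return links


def iter_all_issues(first_url, fetch):
    """Yield issues from every page, following rel="next" until it disappears."""
    url = first_url
    while url:
        issues, link_header = fetch(url)
        for issue in issues:
            yield issue
        # Re-parse the header of *this* response; the last page has no "next".
        url = parse_link_header(link_header).get('next')
```

With requests, `fetch` could wrap `requests.get(url, auth=AUTH)` and return `(r.json(), r.headers.get('link'))`. requests also exposes the parsed header as `r.links`, which could replace `parse_link_header`.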
Thank you for sharing this code. How can I retrieve issues from a private repository?
Hi All. I was running this script (thanks @unbracketed) and found that it would stop writing the data at an odd point. I found that it was because the csv file was not being closed with .close(). I have fixed that and also made another couple of changes:
- It also outputs the response json into a file (this was mainly for debugging, and seeing what data is available to me). This feature is not fully working: each request appends a new json object to the file, which makes the file invalid json. As I say it was just for debugging, so this was not a problem for me
- It only writes rows which are issues (as opposed to pull requests)
- It appends a total number of issues on the last row
- Prints some useful information as it processes the issues
See my fork here
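For readers who do not want to open the fork, here is a rough sketch of two of the changes listed above. It is not the fork's actual code. It relies on the fact that the issues endpoint also returns pull requests, which carry a `pull_request` key. The helper names are made up for this example.

```python
import csv


def is_pull_request(issue):
    """Pull requests returned by the issues endpoint carry a 'pull_request' key."""
    return 'pull_request' in issue


def write_issue_rows(issues, out):
    """Write only real issues to `out` (a text file object); return how many."""
    writer = csv.writer(out)
    writer.writerow(('id', 'Title', 'Body', 'Created At', 'Updated At'))
    count = 0
    for issue in issues:
        if is_pull_request(issue):
            continue
        # The body can be null in the API response, so fall back to ''.
        writer.writerow([issue['number'], issue['title'], issue.get('body') or '',
                         issue['created_at'], issue['updated_at']])
        count += 1
    return count
```

Calling it inside `with open(csvfile, 'w', newline='', encoding='utf-8') as f:` (Python 3) closes the file when the block ends, so no explicit `.close()` is needed.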
I expanded on @billy's fork by adding @mblackstock's solution to ensure the while loop breaks and stops reiterating over the second page.
I also added a file called export_multi_repo_issues_to_csv.py which contains a repository list so you can export issues from multiple repositories into separate csv files.
Here's my fork.
Can anyone suggest how to export issues from GitHub Enterprise to public GitHub?
I expanded on @Kebiled's fork by adding the ability to also include ZenHub's API to include 'Estimate Value' and 'Pipeline' and also include a list of Labels and assignees. I'm not sure how many people use ZenHub but the fork is here if anyone does end up wanting to use it .
Might seem a bit funny making one request per issue for the ZenHub API, that's just the
Yet another fork here.
- Prompts for username/pass instead of raw text
- Repositories passed as arguments
- Python3 / PEP8
Usage: python github_to_csv.py --help
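The first two bullets can be sketched with the standard library alone. This is an illustration of the idea, not the fork's actual interface: the `--all` flag, the argument names and both function names are assumptions made for the example.

```python
import argparse
import getpass


def parse_args(argv=None):
    """Parse repositories given on the command line (hypothetical interface)."""
    parser = argparse.ArgumentParser(description='Export GitHub issues to CSV files.')
    parser.add_argument('repositories', nargs='+', metavar='username/repo',
                        help='one or more repositories to export')
    parser.add_argument('--all', action='store_true',
                        help='include closed issues as well as open ones')
    return parser.parse_args(argv)


def prompt_credentials():
    """Ask for credentials at run time instead of storing them in the script."""
    username = input('GitHub username: ')
    # getpass does not echo what is typed
    password = getpass.getpass('Password or token: ')
    return username, password
```

`parse_args()` with no argument reads `sys.argv`, so `python script.py user/repo1 user/repo2 --all` would yield both repositories and `all=True`.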
(off-topic: this is about gists)
it's striking how many changes are proposed, but how few actually end up somewhere in the associated Git repo... users here are apparently supposed to type in those changes by hand? I guess... maybe Git could help?
I'm a gist n00b, but I can't understand why all the comments suggesting code changes are not accessible as e.g. branches or SHA-1 references (this would require commenters to start by forking, then applying changes, then sharing those changes!).
Just to have an idea I took the trouble to clone this gist, and added a handful of forks as remotes.
for example, @davedyk (I just picked a random commenter) proposes changes, but didn't fork...
(if anyone knows of places where Gist issues are discussed, let me know: a bit similar to https://github.com/dear-github/dear-github)
Thank you Brian! This snippet helped make writing up known issues for release notes sooooo much easier.
https://gist.github.com/marcelkornblum/21be3c13b2271d1d5a89bf08cbfa500e
Another fork if it's useful to anyone.
The basic functionality is the same, but reorganised into clearer methods. I've added the various snippets people suggested in the early comments, meaning
- you can use username/pass or token auth
- set filters for results (including for labels which is more efficient than the original approach)
- I added labels to the CSV output
- Pagination is more clearly handled
Tested on python 2.7
Hope this is useful to someone and thanks @unbracketed
Here's something else (in Ruby) to export pull requests and issues to a CSV file. Supports GitLab and Bitbucket too: https://github.com/sshaw/export-pull-requests
This script comes up high in Google results for certain queries but it's pretty limited in that it only exports the initial issue, not issue comments.
My goal was to backup GitHub data for an organization, and this project worked a lot better for that purpose: https://github.com/josegonzalez/python-github-backup It also lets you back up issue comments, issue events, PRs, PR review comments, wikis, etc.
Hi ,
I am trying to export zenhub issues to csv and using the below code
REPO = ''
url = "https://github.ibm.com/Webtrans/EOSD-ISA-LocalApps/issues/json?issues=%s" %(REPO)
response = requests.get(url,auth=AUTH)
response.json() --- here i am getting the below error :
JSONDecodeError Traceback (most recent call last)
in ()
----> 1 response.json()
C:\Anaconda3\lib\site-packages\requests\models.py in json(self, **kwargs)
883 # used.
884 pass
--> 885 return complexjson.loads(self.text, **kwargs)
886
887 @property
C:\Anaconda3\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
352 parse_int is None and parse_float is None and
353 parse_constant is None and object_pairs_hook is None and not kw):
--> 354 return _default_decoder.decode(s)
355 if cls is None:
356 cls = JSONDecoder
C:\Anaconda3\lib\json\decoder.py in decode(self, s, _w)
337
338 """
--> 339 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
340 end = _w(s, end).end()
341 if end != len(s):
C:\Anaconda3\lib\json\decoder.py in raw_decode(self, s, idx)
355 obj, end = self.scan_once(s, idx)
356 except StopIteration as err:
--> 357 raise JSONDecodeError("Expecting value", s, err.value) from None
358 return obj, end
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Can anyone please help with this?
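"Expecting value: line 1 column 1 (char 0)" generally means the response body was not JSON at all, for example an HTML page. The URL above looks like a web page address rather than an API endpoint. On GitHub Enterprise the REST API normally lives under https://HOSTNAME/api/v3/. Here is a small guard that reports what actually came back. `decode_json_body` is a hypothetical helper, not something requests provides.

```python
import json


def decode_json_body(text, content_type=''):
    """Parse `text` as JSON, or raise ValueError with a readable hint."""
    if 'json' not in content_type.lower():
        raise ValueError('expected JSON but got Content-Type %r; body starts with %r'
                         % (content_type, text[:60]))
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError('body is not valid JSON: %s' % err)
```

With requests it would be called as `decode_json_body(response.text, response.headers.get('Content-Type', ''))`.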
Hello all,
First, thanks for the code. When I try to run it I get the error below. Could someone please tell me why?
Traceback (most recent call last):
File "export-issues.py", line 33, in <module>
write_issues(r)
File "export-issues.py", line 21, in write_issues
raise Exception(r.status_code)
Exception: 401
I get the same error with both Python 3.5 and Python 3.6.
I run it with the command python export-issues.py in the Command Prompt.
Any help on this would be great
@jschristie usually that means an incorrect password
I am running on Windows 7 machine.
the import requests (line 7) is reporting the module is not found! I just installed Python from python.org -- is there a library I need to get as well?
[update]
I found instructions for requests.py @ http://docs.python-requests.org/en/master/user/install/#install and installed it. I then ran
python.exe getIssues.py and got
Traceback (most recent call last):
File "getIssues.py", line 30, in <module>
csvout.writerow(('id', 'Title', 'Body', 'Created At', 'Updated At'))
TypeError: a bytes-like object is required, not 'str'
@simard57 I ran into the same problem. I suspect it is an incompatibility between python 2 and 3.
Try using this (worked for me),
csvout = csv.writer(open(csvfile, 'w', newline=''))
instead of this:
csvout = csv.writer(open(csvfile, 'wb'))
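A minimal Python 3 sketch of the same fix, for anyone who wants the whole pattern in one place. `write_rows` is a made-up helper name. On Python 3 the `.encode('utf-8')` calls in the original `writerow` line should also be dropped, otherwise the cells come out as `b'...'`.

```python
import csv


def write_rows(path, rows):
    """Write `rows` (an iterable of sequences) to a CSV file at `path`."""
    # Python 3: open in text mode. newline='' stops the csv module's own
    # line endings being translated again (which shows up as blank lines
    # between rows on Windows).
    with open(path, 'w', newline='', encoding='utf-8') as handle:
        csv.writer(handle).writerows(rows)
```

The `with` block also closes the file, so all rows are flushed to disk when it ends.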
WRT the script github_to_csv.py, and others here....
So, BLEEDING EDGE NEWBIE here (I can code in everything from COBOL to C#, But today is my first attempt at Python)
Download to windows & Install - smooooth
Copied Python script and ran it.... ummm...
I am getting kicked due to
import requests
ModuleNotFoundError: No module named 'requests'
So..... where can I find this module???
In case it helps: Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:04:45) [MSC v.1900 32 bit (Intel)] on win32
Update... Found Requests.
Wish to add it....
Found Install dox...
To install Requests, simply:
.. code-block:: bash
$ pip install requests
@^%^%$!*)@^
Satisfaction guaranteed.
So...
- Install Python
- Run the following: python -m pip install requests
- Run the script, as described above
- at this point, YMMV....
I am getting the below error, please advise what might be the issue
Traceback (most recent call last):
File "git_issues.py", line 31, in <module>
write_issues(r)
File "git_issues.py", line 19, in write_issues
raise Exception(r.status_code)
Exception: 404
Process finished with exit code 1
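The script raises a bare `Exception` carrying only the status code, which makes errors like this hard to read. Here is a sketch of a more talkative check. The hint texts are general observations about the GitHub API, not a diagnosis of this particular run. `check_status` and `HINTS` are names invented for the example.

```python
HINTS = {
    401: 'credentials were rejected (wrong password, or a token is required)',
    403: 'access is forbidden, or the rate limit has been exceeded',
    404: 'repository not found (check REPO is "username/repo"), '
         'or it is private and these credentials cannot see it',
}


def check_status(status_code, url):
    """Raise RuntimeError with a hint unless the request succeeded."""
    if status_code == 200:
        return
    hint = HINTS.get(status_code, 'unexpected response')
    raise RuntimeError('%s returned %d: %s' % (url, status_code, hint))
```

In the original script it could replace the first two lines of `write_issues`, called as `check_status(r.status_code, r.url)`.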
Doesn't work with two-factor auth. I ended up just using curl.
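With two-factor auth enabled the account password is rejected by the API, and as far as I know GitHub has since stopped accepting passwords for API requests altogether. A personal access token works instead. Here is a sketch, assuming the token is kept in an environment variable called `GITHUB_TOKEN`. That variable name and the `auth_headers` helper are choices made for this example only.

```python
import os


def auth_headers(token=None):
    """Build request headers that authenticate with a personal access token."""
    token = token or os.environ.get('GITHUB_TOKEN', '')
    if not token:
        raise RuntimeError('no token given and GITHUB_TOKEN is not set')
    return {
        'Authorization': 'token %s' % token,
        'Accept': 'application/vnd.github+json',
    }
```

Usage with the script would be `requests.get(ISSUES_FOR_REPO_URL, headers=auth_headers())`. Alternatively, keep `auth=AUTH` and put the token where the password goes.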
Forked for Python3: https://gist.github.com/DavidMCook/b31a6721c06c184ed1f2e898ec4e3561
This is a good python script. Thanks for posting it. Here is this concept wrapped in a CLI tool:
https://github.com/gavinr/github-csv-tools
@js9045, I was thinking about the same issue. 'response' is an argument, but it is never used inside write_issues(). Your version makes sense to me.