@sangheestyle
Created March 18, 2014 04:58
python: extract only 'bug fixed' desc
# Python 2 script: keep only English descriptions that mention bug fixes.
import json

import cld

input_json_file = 'apps_public_m_t_desc.json'
output_bugs_desc_file = 'bugs_desc.txt'

json_contents = []
bugs_desc = []
with open(input_json_file) as fp:
    # mongoexport writes one JSON document per line.
    for line in fp:
        contents = json.loads(line)
        desc = contents['desc'].lower()
        if ("bug fixes" in desc) or \
           ("bug fixed" in desc) or \
           ("fixed bug" in desc) or \
           ("known bug" in desc):
            # lang[1] is the top language code; lang[4] lists every
            # language cld detected in the text.
            lang = cld.detect(desc.encode('utf-8'))
            # Keep the text only if English is the sole detected language.
            if lang[1] == 'en' and len(lang[4]) == 1:
                json_contents.append(line)
                bugs_desc.append(desc)

print "total:", len(json_contents)

with open(output_bugs_desc_file, "wb") as fp:
    for line in bugs_desc:
        fp.write("%s\n" % line.encode('utf8'))
$ mongoexport -h guitarxx.cs.xxx.xxx -d apps -c public -f m,t,desc -o apps_public_m_t_desc.json
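By default mongoexport writes one JSON document per line, which is why the script parses the file line by line. Below is a rough Python 3 sketch of just that reading step. It assumes some exported documents might lack a desc field (a guess, since the data itself is not shown here), and the helper name iter_descs and the sample records are made up for illustration.

```python
import io
import json


def iter_descs(lines):
    """Yield the lowercased 'desc' of each JSON line that has one."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        desc = doc.get("desc")  # assumption: the field may be absent
        if isinstance(desc, str):
            yield desc.lower()


# Tiny in-memory stand-in for the exported file (made-up records).
sample = io.StringIO(
    '{"m": 1, "t": "App A", "desc": "Bug fixes and more"}\n'
    '{"m": 2, "t": "App B"}\n'
    '\n'
    '{"m": 3, "t": "App C", "desc": "New levels"}\n'
)
descs = list(iter_descs(sample))
print(descs)
```

Using doc.get instead of doc['desc'] means a record without a description is skipped rather than stopping the whole run with a KeyError.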
@sangheestyle
Author

Number of descriptions

a. total: 201,830
b. filter (bug fixes, bug fixed, fixed bug, known bug): 791
c. (b) restricted to English only: 639
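The three counts above form a funnel: every description, then those matching a keyword phrase, then those that are also English. Here is a small Python 3 sketch of how such counts could be gathered in one pass. It is not the script from the gist: the language check is passed in as a function, and ascii_only is a crude hypothetical stand-in for cld so the example runs without that library. The sample descriptions are invented.

```python
KEYWORDS = ("bug fixes", "bug fixed", "fixed bug", "known bug")


def mentions_bug_fix(desc):
    """True if the description contains any keyword phrase."""
    text = desc.lower()
    return any(k in text for k in KEYWORDS)


def funnel(descs, is_english):
    """Return (total, keyword_matches, english_keyword_matches)."""
    total = matched = english = 0
    for desc in descs:
        total += 1
        if mentions_bug_fix(desc):
            matched += 1
            if is_english(desc):
                english += 1
    return total, matched, english


def ascii_only(desc):
    # Hypothetical stand-in for cld: treats pure-ASCII text as English.
    return all(ord(ch) < 128 for ch in desc)


sample = [
    "Bug fixes and performance improvements",
    "New levels added",
    "Fixed bug: se corrigió un error de conexión",
    "Known bug on some tablets",
]
counts = funnel(sample, ascii_only)
print(counts)  # (4, 3, 2)
```

Passing the language check as a parameter keeps the counting logic separate from whichever detector is actually used.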
