@jofish
Last active April 13, 2018 17:39
MOSS: extracting import / include statements to figure out what external tools we use
so first you want to download all the code from mozilla's github repos
(there may be better ways to do this, like using the search api interface, but let's be dumb for now)
right now i'm doing this by just pulling all the python files as an example
mkdir mozilla-github
cd mozilla-github
curl -s "https://api.github.com/orgs/mozilla/repos?per_page=100" | python -c $'import json, sys, os\nfor repo in json.load(sys.stdin): os.system("git clone --depth 1 " + repo["clone_url"])'
[that only gets the first 100 repos; add &page=2, &page=3, ... to the url for the rest]
[see here for discussion of the right thing there: https://gist.github.com/caniszczyk/3856584]
[warning, it's pretty big, as you might expect. several gigs.]
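the github api hands back repos one page at a time, so a single request won't list everything in a big org. here's a rough python 3 sketch that keeps asking for pages until one comes back empty. the org repos endpoint and its per_page / page parameters are github's; the function names (fetch_pages, clone_commands, clone_all) are made up for this example, and unauthenticated requests are rate-limited, so treat it as a starting point and not the right way to do it.

```python
import json
import subprocess
import urllib.request

# github's "list organization repositories" endpoint, 100 repos per page
API = "https://api.github.com/orgs/{org}/repos?per_page=100&page={page}"


def fetch_pages(org):
    """Yield one parsed JSON page of repo records at a time, stopping at the first empty page."""
    page = 1
    while True:
        with urllib.request.urlopen(API.format(org=org, page=page)) as resp:
            repos = json.load(resp)
        if not repos:
            return
        yield repos
        page += 1


def clone_commands(pages):
    """Turn pages of repo records into shallow `git clone` argument lists."""
    for repos in pages:
        for repo in repos:
            yield ["git", "clone", "--depth", "1", repo["clone_url"]]


def clone_all(org):
    """Clone every repo of the org into the current directory (hypothetical helper)."""
    for cmd in clone_commands(fetch_pages(org)):
        subprocess.call(cmd)
```

calling clone_all("mozilla") from inside mozilla-github would start the downloads. passing the arguments as a list avoids building a shell string by hand.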
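now here's some python code to pull out all the packages people are importing *from* or just *importing*. it reads the import lines from imports.txt (one per line) and the standard library names from stdpackages.txt, and stars anything that isn't standard library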
from collections import Counter
from operator import itemgetter

# standard library module names, one per line (see stdpackages.txt below)
with open("stdpackages.txt") as f:
    stdpackages = set(line.strip() for line in f)

packages = Counter()
with open("imports.txt") as f:
    for line in f:
        words = line.split()
        # only "import x" and "from x import y" lines count
        if len(words) < 2 or words[0] not in ("import", "from"):
            continue
        # keep the top-level name: "os.path" -> "os", "os," -> "os"
        package = words[1].split('.')[0].split(',')[0]
        if not package:
            continue  # relative import like "from . import x"
        # star anything that isn't in the standard library
        if package not in stdpackages:
            package = "* " + package
        packages[package] += 1

# least common first, most common last
for k, v in sorted(packages.items(), key=itemgetter(1)):
    print("%s, %d" % (k, v))
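the script above expects imports.txt to already exist, and nothing earlier creates it. one way to build it, as a rough python 3 sketch: walk the cloned tree and copy out every line that starts with import or from. the function names (import_lines, write_imports) and the regex are my own guesses at what's wanted, not something from the original notes. it's a crude line match, so it will also pick up matching lines inside docstrings.

```python
import os
import re

# matches lines like "import os" or "    from foo.bar import baz"
IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+\S")


def import_lines(text):
    """Return the stripped import/from lines found in a chunk of python source."""
    return [line.strip() for line in text.splitlines() if IMPORT_RE.match(line)]


def write_imports(root, out_path="imports.txt"):
    """Walk root and write the import lines of every .py file to out_path."""
    with open(out_path, "w") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            for name in filenames:
                if not name.endswith(".py"):
                    continue
                path = os.path.join(dirpath, name)
                try:
                    # errors="ignore" so one oddly-encoded file doesn't stop the walk
                    with open(path, errors="ignore") as src:
                        text = src.read()
                except OSError:
                    continue  # e.g. a broken symlink
                for line in import_lines(text):
                    out.write(line + "\n")
```

running write_imports("mozilla-github") from the directory that holds the clones would leave imports.txt next to it, ready for the counting script.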
this is the file i called stdpackages.txt (one standard library module name per line):
__future__
__main__
_dummy_thread
_thread
abc
aifc
argparse
array
ast
asynchat
asyncio
asyncore
atexit
audioop
base64
bdb
binascii
binhex
bisect
builtins
bz2
calendar
cgi
cgitb
chunk
cmath
cmd
code
codecs
codeop
collections
colorsys
compileall
concurrent
configparser
contextlib
copy
copyreg
cProfile
crypt 
csv
ctypes
curses 
datetime
dbm
decimal
difflib
dis
distutils
doctest
dummy_threading
email
encodings
ensurepip
enum
errno
faulthandler
fcntl 
filecmp
fileinput
fnmatch
formatter
fpectl 
fractions
ftplib
functools
gc
getopt
getpass
gettext
glob
grp 
gzip
hashlib
heapq
hmac
html
http
imaplib
imghdr
imp
importlib
inspect
io
ipaddress
itertools
json
keyword
lib2to3
linecache
locale
logging
lzma
macpath
mailbox
mailcap
marshal
math
mimetypes
mmap
modulefinder
msilib 
msvcrt 
multiprocessing
netrc
nis 
nntplib
numbers
operator
optparse
os
ossaudiodev 
parser
pathlib
pdb
pickle
pickletools
pipes 
pkgutil
platform
plistlib
poplib
posix 
pprint
profile
pstats
pty 
pwd 
py_compile
pyclbr
pydoc
queue
quopri
random
re
readline 
reprlib
resource 
rlcompleter
runpy
sched
secrets
select
selectors
shelve
shlex
shutil
signal
site
smtpd
smtplib
sndhdr
socket
socketserver
spwd 
sqlite3
ssl
stat
statistics
string
stringprep
struct
subprocess
sunau
symbol
symtable
sys
sysconfig
syslog 
tabnanny
tarfile
telnetlib
tempfile
termios 
test
textwrap
threading
time
timeit
tkinter
token
tokenize
trace
traceback
tracemalloc
tty 
turtle
turtledemo
types
typing
unicodedata
unittest
urllib
uu
uuid
venv
warnings
wave
weakref
webbrowser
winreg 
winsound 
wsgiref
xdrlib
xml
xmlrpc
zipapp
zipfile
zipimport
zlib
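if you'd rather not copy the list by hand, newer pythons can print one. a small sketch, assuming python 3.10+ where sys.stdlib_module_names exists (on older versions it falls back to sys.builtin_module_names, which is only a partial list). stdlib_names and write_stdpackages are names invented here. the output follows whichever python runs it, so it includes underscore-prefixed internals and won't match the list above line for line.

```python
import sys


def stdlib_names():
    """Return sorted top-level standard library module names for the running python."""
    # sys.stdlib_module_names only exists on python 3.10 and later
    names = getattr(sys, "stdlib_module_names", None)
    if names is None:
        # older interpreters: only the compiled-in modules are known this way
        names = sys.builtin_module_names
    return sorted(names)


def write_stdpackages(path="stdpackages.txt"):
    """Write one module name per line, the format the counting script reads."""
    with open(path, "w") as out:
        for name in stdlib_names():
            out.write(name + "\n")
```

write_stdpackages() with no arguments would overwrite stdpackages.txt in the current directory.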