@pandanote-info
Created January 7, 2022 07:45
A Python 3 program fragment for vectorizing inverted-index-like data.
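import json
from scipy.sparse import lil_matrix

# inputfile must be defined beforehand: the path to a JSON file that maps
# each word to a list of "articleid,frequency" strings.
with open(inputfile, encoding='utf-8') as fh:
    freqlist = json.load(fh)

# Collect the vocabulary and the sorted, de-duplicated article IDs.
words = list(freqlist)
articleids = sorted({int(vv.split(",")[0]) for v in freqlist.values() for vv in v})
alen = len(articleids)
wlen = len(words)

# Map each article ID to its row index to avoid repeated list.index() scans.
rowindex = {a: i for i, a in enumerate(articleids)}

# Fill the bag-of-words matrix: rows are articles, columns are words.
bow = lil_matrix((alen, wlen))
for wi, v in enumerate(freqlist.values()):
    for vv in v:
        a, f = map(int, vv.split(","))
        bow[rowindex[a], wi] = f

# Print the number of articles, words, and nonzero elements.
nzelemnum = len(bow.nonzero()[0])
print("{0:d} {1:d} {2:d}".format(alen, wlen, nzelemnum))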
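
The input format is only implied by the snippet, so a small worked example may help. The sketch below is not part of the original gist: the sample data and the `vectorize` helper are hypothetical, and it deliberately swaps `lil_matrix` for plain nested lists so that it runs with the standard library alone. It assumes, as the snippet does, that each word maps to a list of "articleid,frequency" strings.

```python
import json

# Hypothetical sample input: each word maps to "articleid,frequency" strings.
sample_json = '{"panda": ["3,2", "10,1"], "note": ["10,4"], "python": ["3,1", "7,5"]}'


def vectorize(freqlist):
    """Return (articleids, words, rows), where rows[i][j] is the frequency
    of words[j] in the article articleids[i]."""
    words = list(freqlist)
    # Sorted, de-duplicated article IDs define the row order.
    articleids = sorted(
        {int(p.split(",")[0]) for postings in freqlist.values() for p in postings}
    )
    rowindex = {a: i for i, a in enumerate(articleids)}
    # Dense stand-in for the sparse matrix used in the gist.
    rows = [[0] * len(words) for _ in articleids]
    for wi, postings in enumerate(freqlist.values()):
        for p in postings:
            a, f = map(int, p.split(","))
            rows[rowindex[a]][wi] = f
    return articleids, words, rows


articleids, words, rows = vectorize(json.loads(sample_json))
nonzero = sum(1 for row in rows for x in row if x != 0)

print(articleids)  # [3, 7, 10]
print(words)       # ['panda', 'note', 'python']
print(rows)        # [[2, 0, 1], [0, 0, 5], [1, 4, 0]]
print(len(articleids), len(words), nonzero)  # 3 3 5
```

The last line mirrors the summary printed by the snippet: the number of articles, the number of words, and the number of nonzero elements. For real data the sparse `lil_matrix` is the better fit, since most entries of a bag-of-words matrix are zero.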