@kLiHz
Last active July 29, 2023 11:27
Parse `tree` command txt output

Steps to reproduce:

Download azw3.txt, epub.txt, mobi.txt and pdf.txt from https://al.chirmyram.com/doc/平台/zlibrary-cn/.

Then run

python3 .\parse.py azw3
python3 .\parse.py epub
python3 .\parse.py mobi
python3 .\parse.py pdf

to get azw3.csv, epub.csv, mobi.csv and pdf.csv.

Use

head -n 1 azw3.csv > combined.csv 
tail -n+2 -q azw3.csv >> combined.csv
tail -n+2 -q epub.csv >> combined.csv
tail -n+2 -q mobi.csv >> combined.csv
tail -n+2 -q pdf.csv >> combined.csv

to combine the four CSV files into combined.csv, keeping a single header row.
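The run commands use Windows-style paths, where `head` and `tail` may not be available. As a rough equivalent, the same merge can be sketched in Python. The helper name `combine_csv` is hypothetical, and the sketch assumes all inputs are UTF-8 and share the same header row.

```python
import csv

def combine_csv(inputs, output):
    """Concatenate CSV files that share a header, writing the header once.

    Hypothetical helper, not part of the original gist.
    """
    header = None
    with open(output, 'w', newline='', encoding='utf-8') as out:
        writer = csv.writer(out)
        for path in inputs:
            with open(path, newline='', encoding='utf-8') as src:
                reader = csv.reader(src)
                first = next(reader, None)
                if first is None:
                    continue  # skip empty files
                if header is None:
                    header = first
                    writer.writerow(header)
                elif first != header:
                    raise ValueError(f'{path}: header differs from the first file')
                writer.writerows(reader)  # remaining rows, header excluded

# Usage (file names taken from the steps above):
# combine_csv(['azw3.csv', 'epub.csv', 'mobi.csv', 'pdf.csv'], 'combined.csv')
```

Unlike the `tail` approach, this checks that every file has the same header before appending its rows.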

Upload the documents to Meilisearch:

curl \
  -X POST 'http://127.0.0.1:7700/indexes/zlibcn/documents?primaryKey=id' \
  -H 'Content-Type: text/csv' \
  -H 'Authorization: Bearer \.@^_^@./' \
  --data-binary @combined.csv
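If curl is not at hand, the same request could be built with the Python standard library. This is only a sketch: `build_upload_request` is a hypothetical helper, the URL, index name and headers are copied from the curl command above, and the API key is passed in by the caller.

```python
import urllib.request

def build_upload_request(csv_path, api_key,
                         base_url='http://127.0.0.1:7700', index='zlibcn'):
    """Build (but do not send) the POST request that uploads a CSV file.

    Hypothetical helper mirroring the curl command; not part of the gist.
    """
    with open(csv_path, 'rb') as fh:
        body = fh.read()  # raw bytes, like curl's --data-binary
    return urllib.request.Request(
        f'{base_url}/indexes/{index}/documents?primaryKey=id',
        data=body,
        method='POST',
        headers={
            'Content-Type': 'text/csv',
            'Authorization': f'Bearer {api_key}',
        },
    )

# To actually send it (requires a running server):
# req = build_upload_request('combined.csv', 'YOUR_API_KEY')
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode())
```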
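
parse.py:

import csv
import sys
import uuid

if len(sys.argv) < 2:
    print('Please specify the filename.')
    sys.exit(1)

t = sys.argv[1]  # file type; also the base name of the input and output files
print(f"Reading: '{t}'")

# Assumes the `tree` output is UTF-8. newline='' is what the csv module
# expects and avoids blank rows in the output on Windows.
with open(f'./{t}.txt', encoding='utf-8') as f, \
        open(f'./{t}.csv', 'w', encoding='utf-8', newline='') as o:
    w = csv.DictWriter(o, ["id", "type", "title", "path1"])
    w.writeheader()

    d = 0   # depth of the previous line
    p = []  # path components from the root down to the current entry

    for line in f:
        line = line.rstrip('\n')

        # Each nesting level is 4 characters wide; the name follows the connector.
        pos = max(line.find('├── '), line.find('└── '))
        pos += 1 if pos == -1 else 4  # no connector (root line): name starts at column 0
        nd = pos // 4 + 1
        n = line[pos:]

        # Entries printed as 'name -> target' (how `tree` shows symlinks)
        # are the only ones written to the CSV.
        k = False
        l = n.split(' -> ')
        if len(l) > 1:
            n = l[0]
            k = True

        if nd > d:
            p.append(n)
        elif nd == d:
            p[-1] = n
        else:
            # Moving up can skip several levels at once, so drop every
            # deeper component instead of popping only one.
            del p[nd:]
            p[-1] = n
        d = nd

        if k:
            w.writerow({
                "id": str(uuid.uuid5(uuid.NAMESPACE_OID, f'zlib-cn-index-{t}:{n}')),
                "type": t,
                "title": n,
                "path1": 'https://al.chirmyram.com/doc/平台/zlibrary-cn/' + '/'.join(p),
            })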
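
To make the depth computation in the script concrete, here is a small sketch that isolates it. The function name `split_tree_line` and the sample lines are made up for illustration; the arithmetic is the same as in the script.

```python
def split_tree_line(line):
    """Return (depth, name) for one line of `tree` output.

    Hypothetical helper: depth 1 is the root line, and each further
    level adds 4 characters of indentation before the name.
    """
    pos = max(line.find('├── '), line.find('└── '))
    pos = 0 if pos == -1 else pos + 4  # skip past the 4-character connector
    return pos // 4 + 1, line[pos:]

# Sample lines (invented, not taken from the real listings):
print(split_tree_line('.'))                          # (1, '.')
print(split_tree_line('├── azw3'))                   # (2, 'azw3')
print(split_tree_line('│   └── book.azw3 -> target'))  # (3, 'book.azw3 -> target')
```

The name still carries the ` -> target` suffix at this point; the script strips it in a separate step.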