Zeqi Lin linzeqipku

  • Microsoft Research
  • Beijing
linzeqipku / docx_to_html.py
Last active June 9, 2021 02:08
Convert .docx files to .html files
import mammoth
from zipfile import BadZipFile
import os

path = 'E:/dc/data/docx'
html_path = 'E:/dc/data/html'


def gci(filepath):
    files = os.listdir(filepath)
    for fi in files:
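The preview above is cut off inside the loop, so the rest of the gist is not shown here. As a rough, independent sketch of what a batch .docx to .html conversion along these lines could look like (not the author's code): the names `mammoth_to_html` and `convert_tree` are hypothetical, the use of `os.walk` instead of `os.listdir` is a swapped-in choice, and skipping files that raise `BadZipFile` is only a guess based on the import in the preview. The only mammoth call used is `mammoth.convert_to_html`, whose result exposes the HTML as `.value`.

```python
import os
from zipfile import BadZipFile


def mammoth_to_html(docx_path):
    """Convert one .docx file to an HTML string (hypothetical helper)."""
    import mammoth  # third-party; imported lazily so the walker runs without it
    with open(docx_path, "rb") as docx_file:
        return mammoth.convert_to_html(docx_file).value


def convert_tree(src_root, dst_root, convert=mammoth_to_html):
    """Walk src_root and write one .html file per .docx file found.

    The directory layout under src_root is mirrored under dst_root.
    Returns the list of HTML paths that were written.
    """
    written = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in sorted(filenames):
            if not name.lower().endswith(".docx"):
                continue
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, os.path.splitext(rel)[0] + ".html")
            try:
                html = convert(src)
            except BadZipFile as exc:
                # Assumption: corrupt or non-zip .docx files are skipped.
                print(f"skipping {src}: {exc}")
                continue
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            with open(dst, "w", encoding="utf-8") as out:
                out.write(html)
            written.append(dst)
    return written
```

With the paths from the preview, a call might look like `convert_tree('E:/dc/data/docx', 'E:/dc/data/html')`. Passing the converter as a parameter keeps the directory walk testable without mammoth installed.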
linzeqipku / readme.md
Last active February 16, 2018 04:56
Extract questions and answers from StackOverflow dump files by tags

Step 1: run so-splitter-Posts.py (config: srcPath, dstPath, tagsPattern)

Step 2: run so-splitter-PostLinks.py (config: srcPath, dstPath)
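The two scripts themselves are not shown on this page, so the following is only a sketch of how the two steps might work, under stated assumptions. It assumes the usual dump layout, where `Posts.xml` holds `<row>` elements with `Id`, `PostTypeId` (`1` for questions), `Tags`, and `ParentId` (on answers), and `PostLinks.xml` holds `<row>` elements with `PostId` and `RelatedPostId`. The function names `split_posts` and `split_post_links` are hypothetical; the parameters mirror the `srcPath`, `dstPath`, and `tagsPattern` settings named above. It also assumes that an answer appears after its question in the file, and it keeps a link only when both endpoints were kept, which is a design choice rather than something the README states.

```python
import re
import xml.etree.ElementTree as ET


def split_posts(src_path, dst_path, tags_pattern):
    """Step 1 sketch: keep questions whose Tags match, plus their answers.

    Returns the set of kept post Ids (as strings).
    """
    pattern = re.compile(tags_pattern)
    kept = set()
    with open(dst_path, "w", encoding="utf-8") as out:
        out.write("<posts>\n")
        # iterparse streams the file instead of loading the whole dump.
        for _event, row in ET.iterparse(src_path, events=("end",)):
            if row.tag != "row":
                continue
            if row.get("PostTypeId") == "1":
                keep = bool(pattern.search(row.get("Tags", "")))
            else:
                # Assumption: the parent question was seen earlier in the file.
                keep = row.get("ParentId") in kept
            if keep:
                kept.add(row.get("Id"))
                out.write("  " + ET.tostring(row, encoding="unicode").strip() + "\n")
            row.clear()  # drop attributes to limit memory use
        out.write("</posts>\n")
    return kept


def split_post_links(src_path, dst_path, kept_ids):
    """Step 2 sketch: keep links whose two endpoints were both kept.

    Returns the number of links written.
    """
    count = 0
    with open(dst_path, "w", encoding="utf-8") as out:
        out.write("<postlinks>\n")
        for _event, row in ET.iterparse(src_path, events=("end",)):
            if row.tag != "row":
                continue
            if row.get("PostId") in kept_ids and row.get("RelatedPostId") in kept_ids:
                out.write("  " + ET.tostring(row, encoding="unicode").strip() + "\n")
                count += 1
            row.clear()
        out.write("</postlinks>\n")
    return count
```

Step 2 depends on the Ids collected in step 1, which is one plausible reason the README lists the steps in this order.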