Lê Anh Duy duylebkHCM

duylebkHCM / book_splitter.py
Last active December 8, 2023 16:27
Automatically split ICDAR proceedings into separate papers
import re
import fitz
from fitz import Page
import argparse
import pandas as pd
from pathlib import Path
from collections import defaultdict
EXCLUDE_KEYWORD = [