jwest75674 / parse_cc_index.py
Created November 5, 2019 22:33 — forked from snakers4/parse_cc_index.py
Plain Common Crawl pre-processing
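The script's imports (`gzip`, `json`, `collections`) suggest that it streams Common Crawl's gzipped index shards. Common Crawl index shards use a CDXJ layout, where each line holds `SURT-key timestamp {JSON}`. The following is a minimal sketch under that assumption, not the author's implementation. `iter_cdx_records` and `count_mime_types` are hypothetical helpers, and only the standard library is used.

```python
import gzip
import json
from collections import Counter
from typing import Iterator, Tuple


def iter_cdx_records(path: str) -> Iterator[Tuple[str, str, dict]]:
    """Hypothetical helper: yield (surt_key, timestamp, fields) per index line.

    Assumes CDXJ lines of the form:
        com,example)/ 20191105223300 {"url": "...", "status": "200", ...}
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Assumption: key and timestamp contain no spaces; the JSON blob may.
            surt_key, timestamp, payload = line.split(" ", 2)
            yield surt_key, timestamp, json.loads(payload)


def count_mime_types(path: str) -> Counter:
    """Hypothetical helper: tally the "mime" field over status-200 captures."""
    counts: Counter = Counter()
    for _, _, fields in iter_cdx_records(path):
        if fields.get("status") == "200":
            counts[fields.get("mime", "unknown")] += 1
    return counts
```

Reading the shard line by line with `gzip.open(..., "rt")` keeps memory usage flat even for multi-gigabyte shards. A pandas DataFrame can then be built from the much smaller aggregated counts rather than from the raw records.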
import gc
import gzip
import time
import json
import shutil
import os
import sys
import tldextract
import collections
import pandas as pd
from tqdm import tqdm