#!/usr/bin/env python
# -*- coding: utf-8 -*-
from argparse import ArgumentParser

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.distributed import DistributedSampler
from transformers import BertForMaskedLM
🌞 Morning 131 commits ███████▏░░░░░░░░░░░░░ 34.3%
🌆 Daytime 179 commits █████████▊░░░░░░░░░░░ 46.9%
🌃 Evening 69 commits ███▊░░░░░░░░░░░░░░░░░ 18.1%
🌙 Night 3 commits ▏░░░░░░░░░░░░░░░░░░░░ 0.8%
from itertools import cycle

from lxml import html
from requests import Response, Session


class RotatingProxySession(Session):
    def __init__(self) -> None:
        super().__init__()
# coding: utf-8
from itertools import cycle
from pathlib import Path
from typing import Optional

from lxml import html
from requests import Response, Session


class RotatingUASession(Session):
import os
from concurrent.futures import Future, ThreadPoolExecutor
from functools import partial
from itertools import repeat
from typing import Any, Dict, List, Optional, Union

from requests import Response, Session


class ConcurrentSession(Session):
#!/bin/sh
set -e

LG=$1
WIKI_DUMP_NAME=${LG}wiki-latest-pages-articles.xml.bz2
WIKI_DUMP_DOWNLOAD_URL=https://dumps.wikimedia.org/${LG}wiki/latest/$WIKI_DUMP_NAME

# download latest Wikipedia dump in chosen language
echo "Downloading the latest $LG-language Wikipedia dump from $WIKI_DUMP_DOWNLOAD_URL..."
wget -c $WIKI_DUMP_DOWNLOAD_URL
import csv
from pathlib import Path
from typing import Optional, Union, Sequence, List, Literal

import pandas as pd
from tqdm import tqdm


def read_csv(
    file: Path,
#!/bin/sh
set -e

WIKI_DUMP_FILE_IN=$1
WIKI_DUMP_FILE_OUT=${WIKI_DUMP_FILE_IN%%.*}.txt

# clone the WikiExtractor repository
git clone https://github.com/attardi/wikiextractor.git

# extract and clean the chosen Wikipedia dump