@stefan-it
Last active September 5, 2023 20:44
Flair Base Model Detector
import flair
import os
import pickle
import shutil

from huggingface_hub import login, HfApi

from flair.models import SequenceTagger
from flair.embeddings import StackedEmbeddings, TransformerWordEmbeddings
from pathlib import Path

# Please adjust!
flair.cache_root = Path("/mnt/datasets/.flair")

errors = []


def determine_base_model(flair_model_name: str) -> str:
    try:
        tagger = SequenceTagger.load(flair_model_name)

        if isinstance(tagger.embeddings, StackedEmbeddings):
            for embedding in tagger.embeddings.embeddings:
                if isinstance(embedding, TransformerWordEmbeddings):
                    return embedding.model.name_or_path
        elif isinstance(tagger.embeddings, TransformerWordEmbeddings):
            return tagger.embeddings.model.name_or_path
    except Exception as e:
        error_message = f"Could not parse Flair model {flair_model_name}: {e}"
        errors.append(error_message)
        print(error_message)
    return ""


hf_token = os.environ.get("HF_TOKEN")

login(token=hf_token, add_to_git_credential=True)

api = HfApi()

base_model_mapping = {}

for flair_model in api.list_models(filter="flair"):
    print("Detecting base model for:", flair_model.modelId)

    base_model = determine_base_model(flair_model.modelId)

    if not base_model:
        continue

    print("Detected base model for", flair_model.modelId, "is:", base_model)

    base_model_mapping[flair_model.modelId] = base_model

with open("base_model_mapping.pkl", "wb") as f_out:
    pickle.dump(base_model_mapping, f_out, pickle.HIGHEST_PROTOCOL)

with open("errors.pkl", "wb") as f_out:
    pickle.dump(errors, f_out, pickle.HIGHEST_PROTOCOL)
stefan-it commented Sep 5, 2023

🤗 Flair Base Model Detector

To get the script working, the following needs to be configured:

  • HF_TOKEN environment variable: visit the Access Tokens page and copy your access token. Then set it via  export HF_TOKEN="<your-token>" on the command line (add a leading space so it won't end up in your shell history).
  • flair.cache_root: set it to a location with enough free space, e.g. NAS storage.
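Since login() is only called once the script starts, a missing token otherwise surfaces late with a vague error. A minimal sketch of an early check (the get_hf_token helper is hypothetical, not part of the script above):

```python
import os

def get_hf_token() -> str:
    # Hypothetical helper: abort with a clear message before any
    # downloads start, instead of letting login() fail later.
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set - export it before running the script")
    return token

# Demo with a dummy value; a real run would use your actual token.
os.environ["HF_TOKEN"] = "hf_dummy_token"
print(get_hf_token())
```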

05.09.2023, 22:31: Stats

The script was initially executed. Runtime was ~1.5 hours and it downloaded ~85GB of data.

The base model table can be created with the following code:

import pickle

from tabulate import tabulate

with open("base_model_mapping.pkl", "rb") as f_p:
    base_model_mapping_loaded = pickle.load(f_p)

headers = ["Model ID", "Base Model ID"]

rows = []

for model_id, base_model_id in base_model_mapping_loaded.items():
    rows.append([f"[{model_id}](https://huggingface.co/{model_id})", f"[{base_model_id}](https://huggingface.co/{base_model_id})"])

print(tabulate(rows, headers=headers, tablefmt="github"))

It then outputs:

| Model ID | Base Model ID |
|----------|---------------|
| Wikidepia/SB-AutoSegment | microsoft/xtremedistil-l12-h384-uncased |
| amtam0/timer-ner-en | distilroberta-base |
| amtam0/timer-ner-fr | camembert-base |
| dbmdz/flair-clef-hipe-german-base | dbmdz/bert-base-german-europeana-cased |
| dbmdz/flair-distilbert-ner-germeval14 | distilbert-base-german-cased |
| flair/ner-dutch-large | xlm-roberta-large |
| flair/ner-english-large | xlm-roberta-large |
| flair/ner-english-ontonotes-large | xlm-roberta-large |
| flair/ner-german-large | xlm-roberta-large |
| flair/ner-spanish-large | xlm-roberta-large |
| qanastek/pos-french-camembert-flair | camembert-base |
| lirondos/anglicisms-spanish-flair-cs | sagorsarker/codeswitch-spaeng-lid-lince |
| Saisam/Inquirer_ner | xlm-roberta-large |
| Saisam/Inquirer_ner_loc | xlm-roberta-large |
| dbmdz/flair-hipe-2022-ajmc-all | dbmdz/bert-base-historic-multilingual-cased |
| abid/indonesia-bioner | xlm-roberta-base |
| helpmefindaname/mini-sequence-tagger-conll03 | hf-internal-testing/tiny-random-bert |
| lighthousefeed/yoda-ner | bert-base-multilingual-cased |
| philschmid/flair-ner-english-ontonotes-large | xlm-roberta-large |
| beki/flair-pii-distilbert | distilbert-base-cased |
| GuiGel/beto-uncased-flert-context-we-finetune-meddocan | dccuchile/bert-base-spanish-wwm-cased |
| UGARIT/flair_grc_multi_ner | UGARIT/grc-alignment |
| UGARIT/flair_grc_bert_ner | pranaydeeps/Ancient-Greek-BERT |
| GuiGel/beto-uncased-flert-finetune-meddocan | dccuchile/bert-base-spanish-wwm-cased |
| GuiGel/beto-uncased-flert-lstm-crf-meddocan | dccuchile/bert-base-spanish-wwm-cased |
| GuiGel/xlm-roberta-large-flert-we-finetune-meddocan | xlm-roberta-large |
| GuiGel/xlm-roberta-large-flert-finetune-meddocan | xlm-roberta-large |
| GuiGel/beto-uncased-flert-context-we-lstm-crf-meddocan | dccuchile/bert-base-spanish-wwm-cased |
| PooryaPiroozfar/Flair-Persian-NER | HooshvareLab/bert-base-parsbert-uncased |
| Jecenia/anglicism-custom-handler | sagorsarker/codeswitch-spaeng-lid-lince |
| aymurai/flair-ner-spanish-judicial | dccuchile/bert-base-spanish-wwm-cased |
| matijap/hera | TamedWicked/AddressBERT |
| AhmedTaha012/Hadith-ner | Davlan/xlm-roberta-base-finetuned-arabic |
| Nara-Lab/History_NER | bert-base-multilingual-cased |
| Geor111y/flair-ner-addresses-extractor | cointegrated/rubert-tiny2 |
| aehrm/redewiedergabe-direct | lkonle/fiction-gbert-large |
| aehrm/redewiedergabe-indirect | lkonle/fiction-gbert-large |
| aehrm/redewiedergabe-reported | lkonle/fiction-gbert-large |
| aehrm/redewiedergabe-freeindirect | lkonle/fiction-gbert-large |
| aehrm/droc-character-recognizer | lkonle/fiction-gbert-large |
| hmteams/flair-hipe-2022-ajmc-en | hmteams/teams-base-historic-multilingual-discriminator |
| hmteams/flair-hipe-2022-ajmc-de | hmteams/teams-base-historic-multilingual-discriminator |
| hmteams/flair-hipe-2022-ajmc-fr | hmteams/teams-base-historic-multilingual-discriminator |
| hmteams/flair-hipe-2022-newseye-fi | hmteams/teams-base-historic-multilingual-discriminator |
| hmteams/flair-hipe-2022-newseye-sv | hmteams/teams-base-historic-multilingual-discriminator |
| hmteams/flair-icdar-nl | hmteams/teams-base-historic-multilingual-discriminator |
| hmteams/flair-icdar-fr | hmteams/teams-base-historic-multilingual-discriminator |
| skulick/xlmb-ck05-yid1 | skulick/xlmb-ybc-ck05 |
| hmteams/flair-hipe-2022-letemps-fr | hmteams/teams-base-historic-multilingual-discriminator |
| hmteams/flair-hipe-2022-topres19th-en | hmteams/teams-base-historic-multilingual-discriminator |
| hmteams/flair-hipe-2022-newseye-de | hmteams/teams-base-historic-multilingual-discriminator |
| hmteams/flair-hipe-2022-newseye-fr | hmteams/teams-base-historic-multilingual-discriminator |
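The errors list saved by the detector script can be inspected the same way as the mapping; a minimal sketch (it writes a dummy errors.pkl first so it runs standalone — the real file comes from the run above):

```python
import pickle

# Write a dummy errors list so this snippet runs standalone;
# the real errors.pkl is produced by the detector script.
with open("errors.pkl", "wb") as f_out:
    pickle.dump(
        ["Could not parse Flair model example/model: ..."],
        f_out,
        pickle.HIGHEST_PROTOCOL,
    )

with open("errors.pkl", "rb") as f_p:
    errors_loaded = pickle.load(f_p)

print(f"{len(errors_loaded)} model(s) could not be parsed")
for message in errors_loaded:
    print("-", message)
```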
