@Katsumata420
Katsumata420 / jaccard_nlp.py
Created December 25, 2023 07:36
Jaccard Score for NLP (with Weighted)
import os
from collections import defaultdict
from typing import List

import fugashi
import ipadic


class JaccardNLP:
    def __init__(self):
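The preview above is cut off before the scoring logic, so the following is only a minimal sketch of what a plain and a weighted Jaccard score over tokens could look like, not the gist's own implementation. The function names `jaccard` and `weighted_jaccard` and the `weights` mapping are hypothetical, and the inputs are assumed to be already tokenized (for Japanese text, for example, with a fugashi tagger) so that the sketch needs only the standard library.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Plain Jaccard score: |A & B| / |A | B| over unique tokens."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Convention chosen for this sketch: two empty inputs count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


def weighted_jaccard(
    a: Iterable[str],
    b: Iterable[str],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted Jaccard: sum of min counts over sum of max counts.

    Each token's contribution is scaled by weights[token] (default 1.0).
    """
    count_a, count_b = Counter(a), Counter(b)
    weights = weights or {}
    numerator = 0.0
    denominator = 0.0
    for token in count_a.keys() | count_b.keys():
        w = weights.get(token, 1.0)
        # Counter returns 0 for tokens missing from one side.
        numerator += w * min(count_a[token], count_b[token])
        denominator += w * max(count_a[token], count_b[token])
    return numerator / denominator if denominator else 1.0
```

With uniform weights and no repeated tokens, `weighted_jaccard` reduces to the plain set-based score; the weights could come from something like IDF values, which is an assumption and not something the preview states.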
Katsumata420 / additional_save_wandb.py
Last active November 22, 2023 08:40
Run open-llm-leaderboard at local
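"""Upload the results of additional lm-evaluation-harness runs to wandb.
Notes:
- batch_size and commit_id must be the values used when lm-evaluation-harness was run
- is_write_out should, if possible, also be the value used when lm-evaluation-harness was run
- average is overwritten with the result that reflects the added tasks
- only the results of the additional lm-evaluation-harness runs are uploaded as the artifact (however, if results from earlier runs remain locally, they are uploaded as well)
- to check older results, select the version in the wandb UI
"""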
import argparse
"""
Optuna example that optimizes a classifier configuration for the Iris dataset using sklearn.
In this example, we optimize a classifier configuration for the Iris dataset. Classifiers are from
scikit-learn. We optimize both the choice of classifier (among SVC and RandomForest) and their
hyperparameters.
"""
import mlflow
Katsumata420 / run_jswag.py
Last active June 15, 2022 02:31
run_swag.py (transformers v4.19.4) for JGLUE (https://github.com/yahoojapan/JGLUE)
#!/usr/bin/env python
# coding=utf-8
# Copyright The HuggingFace Team and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
Katsumata420 / run_jglue.py
Last active June 14, 2022 03:03
run_glue.py (transformers v4.19.4) for JGLUE (https://github.com/yahoojapan/JGLUE)
#!/usr/bin/env python
# coding=utf-8
# Copyright 2020 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
Katsumata420 / brat2conll2003.py
Last active July 1, 2020 01:31
A Python script to convert annotated data in standoff format (brat format) into the IOB1 format (CoNLL-2003 format) for NER training
import argparse
import os
import re
"""
Convert brat standoff annotations to CoNLL-2003 (IOB1) format.
input:
    input_text: brat text file; the file with the same basename + '.ann' is used as the annotation file.
    output_file: path of the output file in CoNLL format;
        if not given, input_text basename + '.conll03' is used.
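
The preview ends before the conversion itself, so the following is only a rough sketch of how brat text-bound annotations might be mapped to IOB1 tags, not the script's actual code. The helper names `parse_ann` and `to_iob1` are hypothetical, tokens are assumed to be whitespace-separated, and discontinuous spans are skipped. In IOB1, a `B-` tag is used only when an entity directly follows another entity of the same type; every other entity token gets `I-`.

```python
import re
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)


def parse_ann(ann_text: str) -> List[Span]:
    """Read text-bound annotations ('T' lines) from brat .ann content."""
    spans = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, events, notes, and blank lines
        _, type_and_offsets, _ = line.split("\t", 2)
        if ";" in type_and_offsets:
            continue  # discontinuous spans are ignored in this sketch
        label, start, end = type_and_offsets.split()
        spans.append((int(start), int(end), label))
    return sorted(spans)


def to_iob1(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Tag whitespace-separated tokens with IOB1 labels."""
    rows = []
    prev_span: Optional[Span] = None
    for match in re.finditer(r"\S+", text):
        # A token belongs to the first span that fully contains it.
        span = next(
            (s for s in spans if s[0] <= match.start() and match.end() <= s[1]),
            None,
        )
        if span is None:
            tag = "O"
        elif prev_span is not None and prev_span != span and prev_span[2] == span[2]:
            tag = "B-" + span[2]  # new entity right after one of the same type
        else:
            tag = "I-" + span[2]
        rows.append((match.group(), tag))
        prev_span = span
    return rows
```

A token with punctuation attached (for example `Paris.`) falls outside its span under this containment check and is tagged `O`, so real data would need a tokenizer that splits punctuation first.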