"""
Generating a dataset for question interpretation:
- what am I analyzing?
- an image | I am analyzing an image
30-08-2020 Added a check that a generated sample is not already present in the qa.txt and interpretations.txt datasets
28-04-2021 Changes to obtain a raw dataset from assertions_1s.txt
"""
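The duplicate check mentioned in the 30-08-2020 note could be sketched roughly as follows. This is a minimal illustration, not the author's code: the helper names `normalize`, `build_known_set`, and `filter_new_samples` are hypothetical, and it assumes the known samples from qa.txt and interpretations.txt are available as one sample per line, which the source does not confirm.

```python
from typing import Iterable, List, Set


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return " ".join(text.lower().split())


def build_known_set(lines: Iterable[str]) -> Set[str]:
    """Collect normalized samples from existing dataset lines (assumed one sample per line)."""
    return {normalize(line) for line in lines if line.strip()}


def filter_new_samples(generated: Iterable[str], known: Set[str]) -> List[str]:
    """Keep generated samples that are neither in the known set nor repeated among themselves."""
    seen = set(known)
    fresh = []
    for sample in generated:
        key = normalize(sample)
        if key and key not in seen:
            seen.add(key)
            fresh.append(sample)
    return fresh
```

In practice the `lines` argument would be the concatenated contents of the existing dataset files; passing plain iterables keeps the sketch independent of the actual file layout.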
import io
import itertools
import pickle
import collections
import glob
import os
import tqdm
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer
"""
Experiment with an asymmetric autoencoder model whose encoder is based on a pretrained BERT model
"""
import io
import os
import random
import numpy as np
import sklearn.model_selection
# -*- coding: utf-8 -*-
"""
Using the BERT model trained by train_bert.py as the encoder
in an autoencoder model.
For experiments studying how decoding quality depends on BERT complexity.
"""
import random
import numpy as np
# -*- coding: utf-8 -*-
"""
Training BERT with a given configuration on a small corpus
"""
import numpy as np
import keras
import os
import collections
import pickle
========== dialog_id=1 ==========
B: Hi, shall we chat?
H: warm greetings
B: what shall we discuss?
H: let's discuss you
B: I am ready to answer questions about myself.
H: who are you?
B: I am a chatbot
B: Tell me, please, who are you?
B: Good day
H: who are you?
B: I am a chatbot
B: Tell me, please, who are you?
H: are you a chatbot?
B: well , to be honest , I am a chatbot too !
H: are you human?
# coding: utf-8
import pandas as pd
import os
import numpy as np
import logging
import gc
from sklearn.metrics import f1_score, accuracy_score, precision_score, recall_score
"""
Training a model that detects whether two phrases are synonymous, using a pretrained BERT model in PyTorch
09.03.2020 Added computation of the MRR (mean reciprocal rank) metric
"""
import pandas as pd
import numpy as np
import random
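For reference, the MRR metric named in the docstring can be computed along these lines. This is a hedged sketch rather than the script's actual implementation: the function name `mean_reciprocal_rank` and the input layout (one list of relevance flags per query, already sorted by descending model score) are assumptions made for illustration.

```python
from typing import Sequence


def mean_reciprocal_rank(rankings: Sequence[Sequence[bool]]) -> float:
    """Average of 1/rank of the first relevant candidate over all queries.

    rankings[i] holds relevance flags for query i, ordered from the
    highest-scored candidate to the lowest. A query with no relevant
    candidate contributes 0 to the sum.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for ranked in rankings:
        for position, is_relevant in enumerate(ranked, start=1):
            if is_relevant:
                total += 1.0 / position
                break
    return total / len(rankings)
```

For a synonymy detector, each query would typically be a phrase, and its candidates would be one true paraphrase mixed with distractors, ranked by the model's similarity score.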
""" Training a model that detects whether two phrases are synonymous (a Siamese recurrent network) in PyTorch """
import io
import pandas as pd
import numpy as np
import itertools
import random
import tqdm
from sklearn.model_selection import train_test_split