Charin cstorm125

## gist:4572eeed9ae02338cde98e3eaae4e26c
"<human>: เล่าเรื่องมังกรให้ฟังหน่อย <bot>: ",
"<human>: อธิบายเกี่ยวกับพระราชาของฝรั่งเศสในปี 2019 <bot>: ",
"<human>: ลิสต์ชื่อจังหวัดทุกจังหวัดในประเทศไทย <bot>: ",
"<human>: แต่งกลอนเกี่ยวกับแมวให้หน่อย <bot>: ",
"<context>: I went to Suankularb School. <human>: แปลเป็นภาษาไทย <bot>: ",
"<context>: ชื่อ AIS เป็นที่รู้จักกันเป็นอย่างดีในฐานะผู้ให้บริการโทรคมนาคมอันดับหนึ่งของไทย แต่หลายปีที่ผ่านมา AIS ก็ได้ทำ Transformation ปรับเปลี่ยนแนวทางของบริษัทกลายเป็น Digitial Life Service Provider ที่ลูกค้าจะสามารถเลือกใช้บริการจาก AIS ได้หลากหลายยิ่งขึ้น ไม่ใช่เพียงแค่การสื่อสารที่เป็นท่อเชื่อมต่อไปยังบริการอื่นๆ เท่านั้น แต่บริการต่างๆ ที่สำคัญในยุคดิจิทัลก็สามารถเลือกใช้ AIS ได้เสมอ และขั้นต่อไปของ AIS คือการก้าวไปสู่องค์กรโทรคมนาคมอัจฉริยะ หรือ Cognitive Tech-Co เป็นการขยายบริการ จากเดิมที่เป็นผู้ให้บริการโทรศัพท์สู่การเป็นผู้ให้บริการที่รู้ว่าลูกค้าต้องการอะไรหรือมีปัญหาการใช้งานอย่างไรบ้าง ผ่านการใช้งานเทคโนโลยีอย่าง AI, Data Analytics, Intelligent IT, และ Autonomous Network <human>: สรุปสั้นๆให้ฟังหน่อย <bot

## th_common_voice70.py
# coding=utf-8
# Copyright 2021 The HuggingFace Datasets Authors and the current dataset script contributor.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software

## fake_berthai.ipynb

      
              1 file
            
          
              1 fork
            
          
              0 comments
            
          
              1 star
            
          
                cstorm125
                / fake_berthai.ipynb
            
            
              Created
              October 14, 2020 15:20
            
              
                fake_berthai.ipynb
              
          
        Loading

      Sorry, something went wrong. Reload?
      Sorry, we cannot display this file.
      Sorry, this file is invalid so it cannot be displayed.
      
          Viewer requires iframe.
      
    
## tokenizer_train.py
from pathlib import Path

from tokenizers import ByteLevelBPETokenizer

paths = [str(x) for x in Path("cleaned_data/oscar").glob("**/*.txt")]

# Initialize a tokenizer
tokenizer = ByteLevelBPETokenizer()

# Customize training

## gist:c4096b32512678269e3170b2b97cbd77
def get_class_weights(label_freq, mu=1., return_log=False):
    total = np.sum(list(label_freq.values()))
    keys = label_freq.keys()
    class_weight = dict()
    class_weight_log = dict()

    for key in keys:
        score = total / float(label_freq[key])
        score_log = np.log(mu * score)
        class_weight[key] = round(score, 2) if score > 1.0 else 1.0
	"<human>: เล่าเรื่องมังกรให้ฟังหน่อย <bot>: ",
	"<human>: อธิบายเกี่ยวกับพระราชาของฝรั่งเศสในปี 2019 <bot>: ",
	"<human>: ลิสต์ชื่อจังหวัดทุกจังหวัดในประเทศไทย <bot>: ",
	"<human>: แต่งกลอนเกี่ยวกับแมวให้หน่อย <bot>: ",
	"<context>: I went to Suankularb School. <human>: แปลเป็นภาษาไทย <bot>: ",
	"<context>: ชื่อ AIS เป็นที่รู้จักกันเป็นอย่างดีในฐานะผู้ให้บริการโทรคมนาคมอันดับหนึ่งของไทย แต่หลายปีที่ผ่านมา AIS ก็ได้ทำ Transformation ปรับเปลี่ยนแนวทางของบริษัทกลายเป็น Digitial Life Service Provider ที่ลูกค้าจะสามารถเลือกใช้บริการจาก AIS ได้หลากหลายยิ่งขึ้น ไม่ใช่เพียงแค่การสื่อสารที่เป็นท่อเชื่อมต่อไปยังบริการอื่นๆ เท่านั้น แต่บริการต่างๆ ที่สำคัญในยุคดิจิทัลก็สามารถเลือกใช้ AIS ได้เสมอ และขั้นต่อไปของ AIS คือการก้าวไปสู่องค์กรโทรคมนาคมอัจฉริยะ หรือ Cognitive Tech-Co เป็นการขยายบริการ จากเดิมที่เป็นผู้ให้บริการโทรศัพท์สู่การเป็นผู้ให้บริการที่รู้ว่าลูกค้าต้องการอะไรหรือมีปัญหาการใช้งานอย่างไรบ้าง ผ่านการใช้งานเทคโนโลยีอย่าง AI, Data Analytics, Intelligent IT, และ Autonomous Network <human>: สรุปสั้นๆให้ฟังหน่อย <bot
	# coding=utf-8
	# Copyright 2021 The HuggingFace Datasets Authors and the current dataset script contributor.
	#
	# Licensed under the Apache License, Version 2.0 (the "License");
	# you may not use this file except in compliance with the License.
	# You may obtain a copy of the License at
	#
	# http://www.apache.org/licenses/LICENSE-2.0
	#
	# Unless required by applicable law or agreed to in writing, software
	from pathlib import Path

	from tokenizers import ByteLevelBPETokenizer

	paths = [str(x) for x in Path("cleaned_data/oscar").glob("*/.txt")]

	# Initialize a tokenizer
	tokenizer = ByteLevelBPETokenizer()

	# Customize training
	def get_class_weights(label_freq, mu=1., return_log=False):
	total = np.sum(list(label_freq.values()))
	keys = label_freq.keys()
	class_weight = dict()
	class_weight_log = dict()

	for key in keys:
	score = total / float(label_freq[key])
	score_log = np.log(mu * score)
	class_weight[key] = round(score, 2) if score > 1.0 else 1.0