Created May 23, 2022 08:55
Gist: titomus/266d819990b21579bd3078ab75fe1f64
Tokenize js
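function tokenize(txt) {
  // Split the text into sentences (one per line) so that generation has several starting points.
  const tokens = [];
  const sentences = txt.split(/\n/).filter((x) => x);
  // Tokenize each sentence by splitting it into whitespace-separated words.
  for (let i = 0; i < sentences.length; i++) {
    // Insert a START marker before each sentence.
    // Assumption: the marker value is not visible in the original; "START" is a placeholder.
    tokens.push("START");
    // match() returns null for a whitespace-only line, so fall back to an empty array.
    const tks = sentences[i].match(/\S+/g) || [];
    tks.forEach((token) => tokens.push(token));
  }
  return tokens;
}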
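
The comments say sentences are split so that generation has several starting points. A minimal sketch of how such a token stream might be consumed is given here, assuming the sentence marker is the string "START" (the marker value is not visible in the snippet) and using a hypothetical helper, buildTransitions, that is not part of the gist:

```javascript
// Hypothetical token stream in the shape tokenize() is assumed to produce.
const tokens = ["START", "the", "cat", "sleeps", "START", "the", "dog", "barks"];

// Map each token to the list of tokens observed immediately after it.
// buildTransitions is an illustrative name, not a function from the gist.
function buildTransitions(tokenList) {
  const transitions = new Map();
  for (let i = 0; i < tokenList.length - 1; i++) {
    const current = tokenList[i];
    const next = tokenList[i + 1];
    // Do not link the last word of one sentence to the next START marker.
    if (next === "START") continue;
    if (!transitions.has(current)) transitions.set(current, []);
    transitions.get(current).push(next);
  }
  return transitions;
}

const transitions = buildTransitions(tokens);
// The followers of "START" are the words a generator could begin a sentence with.
const starters = transitions.get("START");
console.log(starters);
```

Keeping duplicates in each follower list means a random pick from the list is weighted by how often a word was seen in that position.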