Erik Tromp ErikTromp

@ErikTromp
ErikTromp / train_emotions.py
Created March 9, 2020 07:56
text categorization with spacy-transformers
#!/usr/bin/env python
import plac
import re
import random
import json
import pandas as pd
from pathlib import Path
from collections import Counter
import thinc.extra.datasets
import spacy
ErikTromp / gist:6c904e64386114566c3e0c1ed2abe424
Created February 25, 2020 09:27
sample_dutch_emojis.csv
;id;text;hand_over_mouth;sun_with_face;satisfied;cry;two_hearts;upside_down_face;grimacing;innocent;partying_face;flushed;relaxed;sunny;slightly_smiling_face;smirk;pout;yum;female_sign;scream;smile;pray;star_struck;zany_face;tada;smiley;blue_heart;crossed_fingers;roll_eyes;muscle;point_down;grinning;sob;ok_hand;christmas_tree;hugs;sunglasses;smiling_face_with_three_hearts;stuck_out_tongue_winking_eye;see_no_evil;kissing_heart;heart;sweat_smile;thinking;grin;four_leaf_clover;blush;thumbsup;rofl;wink;heart_eyes;joy;labels
343631;343632;Bij mij is een poosje geleden een paar dingen mis gegaan met pampers maandbox. Netjes opgelost door de klantenservice en later ontving ik een pakketje met een knuffelkonijn voor mijn dochter 👌;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
286659;286660;Teleurgesteld😢🤨;0;0;0;1;0;0;0;0;0;0
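The sample above is semicolon-delimited, with an unnamed index column, an `id`, the `text`, and one 0/1 column per emoji. A minimal sketch of reading such a row with Python's standard `csv` module follows. The excerpt is abbreviated to the first four emoji columns of the header, and the helper name `active_labels` is hypothetical, not part of the original gist.

```python
import csv
import io

# Abbreviated excerpt of the sample: header cut down to the first four
# emoji columns, plus the second data row as far as it is shown above.
SAMPLE = (
    ";id;text;hand_over_mouth;sun_with_face;satisfied;cry\n"
    "286659;286660;Teleurgesteld😢🤨;0;0;0;1\n"
)


def active_labels(row, meta_columns=("", "id", "text")):
    """Return the emoji column names whose value is '1' in this row.

    Hypothetical helper for illustration; `meta_columns` lists the
    non-label columns (the unnamed index column has the key "").
    """
    return [
        name
        for name, value in row.items()
        if name not in meta_columns and value == "1"
    ]


# The file uses ';' as the field separator instead of ','.
reader = csv.DictReader(io.StringIO(SAMPLE), delimiter=";")
rows = list(reader)
labels = active_labels(rows[0])
```

Here `labels` holds the names of the emoji columns set to 1 for the first row, which can then serve as the target categories for a multi-label text classifier.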
ErikTromp / doc2vec.scala
Last active October 27, 2020 17:33
doc2vec DL4j
def train(dataset: List[String], labels: Option[List[String]]) = {
  val tokenizer: TokenizerFactory = new DefaultTokenizerFactory()
  val labelsUsed = collection.mutable.ListBuffer.empty[String]
  // Create the labeled documents, unique label for each document
  val docs = dataset.zipWithIndex.map(docWithIndex => {
    val label = labels match {
      case Some(lbl) => lbl(docWithIndex._2)
      case None => "SENT_" + docWithIndex._2
    }