

🍀
Happy. Thinking. Understanding.

Francois Vanderseypen Orbifold

Orbifold / KendoZeppelin.py
Created November 17, 2016 07:54
Using Kendo inside a Zeppelin section is simple.
%angular
<base href="http://demos.telerik.com/kendo-ui/gantt/index">
<style>html { font-size: 14px; font-family: Arial, Helvetica, sans-serif; }</style>
<title></title>
<link rel="stylesheet" href="http://kendo.cdn.telerik.com/2016.3.1028/styles/kendo.common-material.min.css" />
<link rel="stylesheet" href="http://kendo.cdn.telerik.com/2016.3.1028/styles/kendo.material.min.css" />
<link rel="stylesheet" href="http://kendo.cdn.telerik.com/2016.3.1028/styles/kendo.material.mobile.min.css" />
<script src="http://kendo.cdn.telerik.com/2016.3.1028/js/jquery.min.js"></script>
Orbifold / LSTM MxNet.R
Created December 12, 2016 18:43
An LSTM neural network reproducing mini Shakespeare.
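require(mxnet)
batch.size = 32        # sequences per mini-batch
seq.len = 32           # characters per sequence (time steps the LSTM is unrolled over)
num.hidden = 16        # hidden units in each LSTM cell
num.embed = 16         # size of the character embedding
num.lstm.layer = 1     # number of stacked LSTM layers
num.round = 1          # number of training rounds (epochs)
learning.rate = 0.1
wd = 0.00001           # weight decay
clip_gradient = 1      # gradient clipping threshold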
Orbifold / train_ner.py
Created June 20, 2017 15:08
Out-of-core NER trainer for Dutch (trains on a dataset too large to fit in memory).
# http://nlpforhackers.io/training-ner-large-dataset/
# http://gmb.let.rug.nl/data.php
import os
from nltk import conlltags2tree
def to_conll_iob(annotated_sentence):
    """
    `annotated_sentence` = list of triplets [(w1, t1, iob1), ...]
    Transform a pseudo-IOB notation: O, PERSON, PERSON, O, O, LOCATION, O
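The docstring above describes turning pseudo-IOB labels (a bare entity type on every token of an entity) into proper IOB labels, where the first token of an entity gets a `B-` prefix and the tokens that follow get `I-`. The gist preview is cut off, so the block below is only a minimal sketch of that conversion, not the gist's actual code. The function name `pseudo_to_iob` and the example sentence are hypothetical, and the sketch assumes each item is a `(word, pos_tag, label)` triplet with `O` marking non-entity tokens.

```python
def pseudo_to_iob(annotated_sentence):
    """Convert pseudo-IOB labels to proper IOB labels (hypothetical helper).

    annotated_sentence: list of (word, pos_tag, label) triplets, where label
    is 'O' or a bare entity type such as 'PERSON' or 'LOCATION'.
    """
    iob_tokens = []
    previous_label = 'O'
    for word, pos_tag, label in annotated_sentence:
        if label == 'O':
            iob_label = 'O'
        elif label == previous_label:
            # Same entity type as the previous token: continue the entity
            iob_label = 'I-' + label
        else:
            # Entity type differs from the previous token: start a new entity
            iob_label = 'B-' + label
        iob_tokens.append((word, pos_tag, iob_label))
        previous_label = label
    return iob_tokens


sentence = [("Jan", "NNP", "PERSON"), ("Peeters", "NNP", "PERSON"),
            ("woont", "VBZ", "O"), ("in", "IN", "O"), ("Gent", "NNP", "LOCATION")]
print([label for _, _, label in pseudo_to_iob(sentence)])
# → ['B-PERSON', 'I-PERSON', 'O', 'O', 'B-LOCATION']
```

One limitation of pseudo-IOB input: two adjacent entities of the same type cannot be told apart, so this sketch merges them into a single `B-`/`I-` span. The output triplets have the `(word, tag, chunk)` shape that NLTK's `conlltags2tree` expects.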
Orbifold / Gensim.py
Created December 12, 2016 06:52
A Word2Vec experiment on the Bible.
from gensim.utils import simple_preprocess
tokenize = lambda x: simple_preprocess(x)
# tokenize("We can load the vocabulary from the JSON file, and generate a reverse mapping (from index to word, so that we can decode an encoded string if we want)?!")
import os
import json
import numpy as np
from gensim.models import Word2Vec
Orbifold / LinearRegressionThreeWays.ipynb
Created November 4, 2016 06:12
Standard linear regression using TensorFlow, TFLearn and sklearn.
Orbifold / epochs.py
Created January 27, 2018 11:33
Running experiments to improve accuracy without grid search or similar techniques.
import argparse
import matplotlib.pyplot as plt
import numpy as np
from keras.layers import Dense
from keras.models import Sequential
from numpy import array
from scipy import signal
Orbifold / LSTM_translation.py
Created January 27, 2018 12:37
Keras translation network.
# The [Anki repository](http://www.manythings.org/anki/) has many sentence pairs for learning a language, which makes them ideal for training a translation network.
# To judge the quality of a translation it helps to understand both languages a bit, so in my case
# the [Dutch-English](http://www.manythings.org/anki/nld-eng.zip),
# [French-English](http://www.manythings.org/anki/fra-eng.zip)
# and [German-English](http://www.manythings.org/anki/deu-eng.zip) pairs were perfect.
import string
import re
from pickle import dump
from unicodedata import normalize
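The Anki-format downloads from manythings.org are plain text. A minimal sketch of loading and cleaning them with some of the modules imported above, assuming one sentence pair per line with tab-separated fields (English first, then the translation; newer downloads add an attribution column, which is ignored here). The function names `to_pairs` and `clean_sentence` are hypothetical, and the cleaning steps (ASCII folding, punctuation removal, lowercasing) are one common choice rather than the gist's confirmed preprocessing.

```python
import string
from unicodedata import normalize


def to_pairs(text):
    """Split raw Anki text into [english, translation] pairs (hypothetical helper).

    Assumes one pair per line with tab-separated fields; extra columns
    such as attribution are dropped.
    """
    pairs = []
    for line in text.strip().split('\n'):
        fields = line.split('\t')
        if len(fields) >= 2:
            pairs.append(fields[:2])
    return pairs


def clean_sentence(sentence):
    """Fold to ASCII, strip punctuation, lowercase, keep alphabetic tokens only."""
    # Decompose accented characters (e.g. 'è' -> 'e' + combining accent),
    # then drop the non-ASCII combining marks
    ascii_text = normalize('NFD', sentence).encode('ascii', 'ignore').decode('ascii')
    # Remove punctuation characters
    ascii_text = ascii_text.translate(str.maketrans('', '', string.punctuation))
    # Lowercase and drop tokens that are not purely alphabetic (e.g. numbers)
    return ' '.join(w for w in ascii_text.lower().split() if w.isalpha())


raw = "Go.\tGa weg.\tCC-BY 2.0 (France) Attribution\nHi!\tHallo!\n"
print(to_pairs(raw))                      # → [['Go.', 'Ga weg.'], ['Hi!', 'Hallo!']]
print(clean_sentence("Ça va très bien!"))  # → ca va tres bien
```

Folding to ASCII keeps the vocabulary small, at the cost of merging words that differ only by accents; for French or German output you may prefer to keep the accents.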
Orbifold / SemanticsWithPython.ipynb
Created March 13, 2018 08:23
An intro to using RDFLib and triples in Python
Orbifold / Intends.ipynb
Created January 27, 2018 06:43
Mapping language to intents

Intro

Imbalanced data typically refers to classification problems where the classes are not represented equally. For example, you may have a two-class (binary) classification problem with 100 instances (rows), of which 80 are labeled Class-1 and the remaining 20 are labeled Class-2. This is an imbalanced dataset: the ratio of Class-1 to Class-2 instances is 80:20, or more concisely 4:1. Class imbalance can occur in two-class as well as multi-class classification problems, and most techniques can be used on either.
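The class ratio in this example can be checked with a few lines of Python. This is a minimal sketch (the helper names `class_ratio` and `imbalance_ratio` are hypothetical): it counts the labels, reduces the counts by their greatest common divisor to get the concise ratio, and also reports the majority-to-minority ratio. It works unchanged for multi-class labels.

```python
from collections import Counter
from functools import reduce
from math import gcd


def class_ratio(labels):
    """Return the class counts reduced by their greatest common divisor."""
    counts = Counter(labels)
    divisor = reduce(gcd, counts.values())
    return {cls: n // divisor for cls, n in counts.items()}


def imbalance_ratio(labels):
    """Return the size of the largest class divided by the size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# The binary example from the text: 80 Class-1 rows and 20 Class-2 rows
labels = ["Class-1"] * 80 + ["Class-2"] * 20
print(class_ratio(labels))      # → {'Class-1': 4, 'Class-2': 1}
print(imbalance_ratio(labels))  # → 4.0
```

For a three-class dataset with 50, 30 and 20 instances, `class_ratio` gives 5:3:2 and `imbalance_ratio` gives 2.5.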

Most classification data sets do not have an exactly equal number of instances in each class, but a small difference often does not matter.

There are problems where a class imbalance is not just common, it is expected. For example, datasets that characterize fraudulent transactions are imbalanced: the vast majority of the transactions will be in the “Not-Fraud” class and a very small minority will be