Skip to content

Instantly share code, notes, and snippets.

A Few Useful Things to Know about Machine Learning

The paper presents some key lessons and "folk wisdom" that machine learning researchers and practitioners have learnt from experience and which are hard to find in textbooks.

1. Learning = Representation + Evaluation + Optimization

All machine learning algorithms have three components:

  • Representation for a learner is the set if classifiers/functions that can be possibly learnt. This set is called hypothesis space. If a function is not in hypothesis space, it can not be learnt.
  • Evaluation function tells how good the machine learning model is.
  • Optimisation is the method to search for the most optimal learning model.
@EtzAlex
EtzAlex / updating.R
Created July 25, 2015 22:40
Updating priors via likelihood
shotData<- c(1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0,
1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1,
0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0)
#figure 1 from blog, likelihood curve for 58/100 shots
x = seq(.001, .999, .001) ##Set up for creating the distributions
y2 = dbeta(x, 1 + 58, 1 + 42) # data for likelihood curve, plotted as the posterior from a beta(1,1)
@honnibal
honnibal / theano_mlp_small.py
Last active March 1, 2023 15:10
Stripped-down example of Multi-layer Perceptron MLP in Theano
"""A stripped-down MLP example, using Theano.
Based on the tutorial here: http://deeplearning.net/tutorial/mlp.html
This example trims away some complexities, and makes it easier to see how Theano works.
Design changes:
* Model compiled in a distinct function, so that symbolic variables are not in run-time scope.
* No classes. Network shown by chained function calls.
@adewes
adewes / README.md
Last active February 13, 2024 16:39
Ebay Ads - Bot. Because who wants to write messages by hand...

To use this bot:

  • Download ads_bot.py and requirements.txt.
  • Type pip install -r requirements.txt to install the requirements.
  • Fill out the required information in the Python file.
  • Ideally, create a (free) Slack account and set up a web hook to receive notifications from the bot.
  • Run the script :)
  • Relax and be ready to answer incoming calls :D
base_data_sources = make_dict({
'business': BusinessDataSource(clientUS),
'money': USMoneyDataSource(clientUS),
'technology': TechnologyDataSource(clientUS),
'sport': SportUSDataSource(clientUS),
'comment': CommentIsFreeDataSource(clientUS),
'culture': CultureDataSource(clientUS),
'top_stories': TopStoriesDataSource(clientUS),
'video': VideoDataSource(clientUS),
})
@peterkuma
peterkuma / server.py
Created February 10, 2014 14:22
A flask server for serving all JPEG images in a local directory and subdirectories. Uses unveil.js for lazy-loading of images. Thumbnails are created on the fly by PIL.
#!/bin/python
import os
from flask import Flask, Response, request, abort, render_template_string, send_from_directory
import Image
import StringIO
app = Flask(__name__)
WIDTH = 1000
@lebedov
lebedov / multi_pycuda_ipc_demo.py
Created September 2, 2013 00:10
Allocate a GPUArray in one process and access it in another process using IPC handles.
#!/usr/bin/env python
"""
Allocate a GPUArray in one process and access it in another process using IPC
handles.
"""
import multiprocessing as mp
import numpy as np
import zmq
@AlexandreAbraham
AlexandreAbraham / unsupervised_alt.py
Last active February 5, 2023 23:00
These are two implementations of the silhouette score. They are compatible with the scikit learn implementation but offers different drawbacks in term of complexity and memory usage. The slow version needs no memory but is painfully slow and should, I think, not be used. The second one is based on a block strategy: distance between samples and c…
""" Unsupervised evaluation metrics. """
# License: BSD Style.
from itertools import combinations
import numpy as np
from sklearn.utils import check_random_state
from sklearn.metrics.pairwise import distance_metrics
from sklearn.metrics.pairwise import pairwise_distances
@jboner
jboner / latency.txt
Last active May 10, 2024 14:27
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns 14x L1 cache
Mutex lock/unlock 25 ns
Main memory reference 100 ns 20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy 3,000 ns 3 us
Send 1K bytes over 1 Gbps network 10,000 ns 10 us
Read 4K randomly from SSD* 150,000 ns 150 us ~1GB/sec SSD