Blazej Gruszka bgruszka

@bgruszka
bgruszka / sparks.py
Created November 17, 2011 06:42 — forked from stefanv/sparks.py
Command line sparks in Python
#!/usr/bin/python
# coding=utf-8
# Python version of Zach Holman's "spark"
# https://github.com/holman/spark
# by Stefan van der Walt <stefan@sun.ac.za>
"""
USAGE:
Latency Comparison Numbers
--------------------------
L1 cache reference                     0.5 ns
Branch mispredict                        5 ns
L2 cache reference                       7 ns             14x L1 cache
Mutex lock/unlock                       25 ns
Main memory reference                  100 ns             20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy         3,000 ns
Send 1K bytes over 1 Gbps network   10,000 ns   0.01 ms
Read 4K randomly from SSD*         150,000 ns   0.15 ms
These weights are often combined into a tf-idf value, simply by multiplying them together. The best-scoring words under tf-idf are uncommon ones that are repeated many times in the text. This left early web search engines vulnerable to pages stuffed with repeated terms to trick the engines into ranking those pages highly for those keywords. For that reason, more complex weighting schemes are generally used, but tf-idf is still a good first step, especially for systems where no one is trying to game the results.
There are a lot of variations on the basic tf-idf idea, but a straightforward implementation might look like:
<?php
$tfidf = $term_frequency * // tf
log( $total_document_count / $documents_with_term, 2); // idf
?>
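For comparison, the same calculation can be sketched in Python. This is only an illustration: the function name `tf_idf`, the guard against a zero document count, and the sample term frequency of 3 are assumptions and not part of the original snippet.

```python
import math


def tf_idf(term_frequency, total_document_count, documents_with_term):
    """Multiply the term frequency by a base-2 log inverse document frequency."""
    if documents_with_term <= 0:
        # Assumption: a term that appears in no document has no defined idf.
        raise ValueError("documents_with_term must be positive")
    idf = math.log2(total_document_count / documents_with_term)
    return term_frequency * idf


# Hypothetical numbers: a term used 3 times in one document,
# found in 2 of the 50 documents in the collection.
score = tf_idf(3, 50, 2)
print(round(score, 2))  # roughly 13.93
```

A term that occurs in every document gets an idf of log2(1) = 0, so its tf-idf score is 0 regardless of how often it is repeated.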
It's worth repeating that the IDF is the total document count over the count of the ones containing the term. So, if there were 50 documents in the collection, and two of them contained the term in question, the IDF would be 50/2 = 25. To be accurate, we s
"""An lxml Port of Nirmal Patel's port (http://nirmalpatel.com/fcgi/hn.py) of
Arc90's Readability to Python.
"""
import re
from lxml.html import fromstring, tostring
from lxml.html.clean import Cleaner
NEGATIVE = re.compile('comment|meta|footer|footnote|foot')
POSITIVE = re.compile('post|hentry|entry|content|text|body|article')
module.exports = (robot) ->
  robot.respond /deploy to stage/i, (msg) ->
    process.chdir('/your/dir')
    doing = require('child_process').spawn 'phing', ['remotedeploy','-Denv=stage']
    msg.send 'stage deployment request sent'
<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
<!--
Documented at
http://linux.die.net/man/5/fonts-conf
To check font mapping run the command at terminal
$ fc-match 'helvetica Neue'

tmux shortcuts & cheatsheet

start new:

tmux

start new with session name:

tmux new -s myname
bgruszka / gitlab.conf
Last active August 29, 2015 14:26 — forked from sameersbn/gitlab.conf
Nginx reverse proxy configuration for GitLab
upstream gitlab {
  server 172.17.42.1:10080 fail_timeout=0;
}
# let gitlab deal with the redirection
server {
  listen 80;
  server_name git.example.com;
  server_tokens off;
  root /dev/null;
bgruszka / bongo.sh
Last active August 29, 2015 14:26 — forked from smashew/bongo.sh
This one works... Tested
LOADING=false
usage()
{
cat << EOF
usage: $0 [options] dbname
OPTIONS:
-h Show this help.
-l Load instead of export
bgruszka / convert.php
Last active August 29, 2015 14:26 — forked from dantleech/convert.php
Script to convert symfony YAML translation file to XLIFF
<?php
// Script to convert Symfony YAML translation files to XLIFF.
//
// Will add a .xliff version of the given file in its directory.
//
// $ php convert.php path/to/MyBundle.en.yml
$file = $argv[1];