Aleksandr Sytar CrazyLionHeart

  • Sberbank
  • Moscow
@CrazyLionHeart
CrazyLionHeart / Documentation.md
Created November 27, 2016 19:49 — forked from KartikTalwar/Documentation.md
Rsync over SSH - (40MB/s over 1GB NICs)

The fastest remote directory rsync over SSH archival I can muster (40 MB/s over 1 Gb NICs)

This creates an archive that does the following:

rsync (Everyone seems to like -z, but it is much slower for me)

  • a: archive mode - recursive, preserves owner, preserves permissions, preserves modification times, preserves group, copies symlinks as symlinks, preserves device files.
  • H: preserves hard-links
  • A: preserves ACLs
CrazyLionHeart / Linux Static IP
Created November 25, 2016 06:51 — forked from fernandoaleman/Linux Static IP
How To Configure Static IP On CentOS 6
## Configure eth0
#
# vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE="eth0"
NM_CONTROLLED="yes"
ONBOOT=yes
HWADDR=A4:BA:DB:37:F1:04
TYPE=Ethernet
BOOTPROTO=static
#!/bin/bash
################################
# OS X Install ISO Creater #
# #
# Author: shela #
################################
#######################################
# Declarations
CrazyLionHeart / install-gradle-centos.sh
Created August 12, 2016 09:15 — forked from parzonka/install-gradle-centos.sh
Install gradle on redhat/centos linux
# installs to /opt/gradle
# existing versions are not overwritten/deleted
# seamless upgrades/downgrades
# $GRADLE_HOME points to latest *installed* (not released)
gradle_version=2.9
wget -N https://services.gradle.org/distributions/gradle-${gradle_version}-all.zip
sudo unzip -foq gradle-${gradle_version}-all.zip -d /opt/gradle
sudo ln -sfn gradle-${gradle_version} /opt/gradle/latest
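# write via "sudo tee": with "sudo printf ... > file" the redirect runs as the calling user, not root
printf "export GRADLE_HOME=/opt/gradle/latest\nexport PATH=\$PATH:\$GRADLE_HOME/bin\n" | sudo tee /etc/profile.d/gradle.sh > /dev/null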
. /etc/profile.d/gradle.sh
CrazyLionHeart / gist:5813802
Created June 19, 2013 12:11
Function to search for a given street within Moscow and Moscow Oblast
CREATE OR REPLACE FUNCTION search_street(IN street text)
RETURNS TABLE(aoguid uuid, fulladdress text, postalcode integer) AS
$BODY$
WITH RECURSIVE child_to_parents AS (
SELECT aoguid, format('{"%s": %s}', fias_addr_obj.shortname, formalname) AS fulladdress,
postalcode, parentguid, aolevel
FROM fias_addr_obj
WHERE currstatus = 0
AND (regioncode = '50' OR regioncode = '77')
AND formalname ILIKE '%' || format('%s', street) || '%'
CrazyLionHeart / gist:5813733
Created June 19, 2013 11:58
Find the polygon of a street with a house within Moscow or Moscow Oblast using OpenStreetMap
SELECT search_street.way, format('%s, %s, %s', mo_m.fullname, all_other.fullname, search_street.fullname) as fullname
FROM
(
SELECT way, name as fullname
FROM planet_osm_polygon
WHERE admin_level = '4' AND (name = 'Москва' OR name = 'Московская область')
AND boundary = 'administrative'
) as mo_m
INNER JOIN
(
CrazyLionHeart / XMLtoJSON.py
Created September 5, 2012 14:41 — forked from smihica/XMLtoJSON.py
Xml to JSON UTF-8 parser-converter in Python.
#!/usr/bin/env python -S
# -*- coding: utf-8 -*-
import sys
import re
import xml.sax
import io # for 2.6
import StringIO # for 3.0
#
# ** If your Python is 2.x and the XML encoding is UTF-8, set the following.
CrazyLionHeart / 1_netatalk-3-install-on-ubuntu-14.04.sh
Last active September 15, 2015 20:52 — forked from mAAdhaTTah/1_netatalk-3-install-on-ubuntu-14.04.sh
Shell script to install Netatalk 3 on Ubuntu 14.04
# Get root:
sudo su
# Install prerequisites:
apt-get install build-essential pkg-config checkinstall git avahi-daemon libavahi-client-dev libcrack2-dev libwrap0-dev autotools-dev automake libtool libdb-dev libacl1-dev libdb5.1-dev db-util db5.1-util libgcrypt11 libgcrypt11-dev
# Build libevent from source:
cd /usr/local/src
#!/usr/bin/python
import os
import hashlib
import getpass
import base64
password1 = None
password2 = None

Text Classification

To demonstrate text classification with scikit-learn, we'll build a simple spam filter. While the filters in production for services like Gmail are vastly more sophisticated, the model we'll have by the end of this chapter is effective and surprisingly accurate.

Spam filtering is the "hello world" of document classification, but we aren't limited to two classes. The classifier we will be using supports multi-class classification, which opens up possibilities such as author identification and support email routing. In this example, however, we'll stick to two classes: SPAM and HAM.
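
To make the multi-class point concrete, here is a minimal sketch using only the Python standard library. It is not the scikit-learn model this chapter builds: the TinyNaiveBayes class, the tokenize helper, the toy training sentences, and the third "support" class are all assumptions made up for illustration. It only shows that a word-count classifier of this kind handles three labels in the same way it handles two.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


class TinyNaiveBayes:
    """Toy multinomial Naive Bayes with add-one smoothing (illustration only)."""

    def fit(self, documents, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocabulary = set()
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical three-class training data; the chapter itself uses only SPAM and HAM.
training_documents = [
    "win free money now",
    "free prize claim now",
    "meeting agenda for monday",
    "lunch on monday with the team",
    "cannot reset my password",
    "password reset link is broken",
]
training_labels = ["spam", "spam", "ham", "ham", "support", "support"]

model = TinyNaiveBayes().fit(training_documents, training_labels)
print(model.predict("claim your free money"))       # spam
print(model.predict("my password link is broken"))  # support
```

Nothing in the fit or predict logic depends on the number of labels, so adding a class only means adding labelled examples for it.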

For this exercise, we'll be using a combination of the Enron-Spam data sets and the SpamAssassin public corpus. Both are publicly available for download and are retrieved from the internet during the setup phase of the example code that accompanies this chapter.

Loading Examples