Joseph Misiti josephmisiti

@josephmisiti
josephmisiti / AnkiMultipleChoiceTemplate.html
Created July 12, 2019 16:07 — forked from hgiesel/AnkiMultipleChoiceBackTemplate.html
A Multiple Choice Template for Anki Cards
<script>
// MULTIPLE CHOICE TEMPLATE v1.2 {{{
// https://gist.github.com/hgiesel/2e8361afccca5713414a6a4ee66b7ece
const query = 'div#thecard'
const colors = ['orange', 'olive', 'maroon', 'aqua', 'fuchsia', 'navy', 'lime']
const fieldPadding = '4px'
const syntax = {
openDelim: '(^',
closeDelim: '^)',
josephmisiti / gist:10489023
Last active March 2, 2023 13:36
Install Cloudera Impala On Ubuntu 12.04
  1. update apt-get
cd /etc/apt/sources.list.d/
wget http://archive.cloudera.com/impala/ubuntu/precise/amd64/impala/cloudera.list
sudo apt-get update
  2. Install Impala [found here [1]]
josephmisiti / README.md
Last active June 14, 2022 02:00 — forked from mango314/README.md
a map of all 2166 Census Tracts of New York City in Python Matplotlib

Census Tracts of New York City

Here at PyData NYC, I attended a tutorial on how to use NumPy and IPython notebooks. In a previous gist, I drew all the zip codes of the Bronx in d3.js.

This would be great for reproducing infographics like Educational Attainment in New York City -- Brooklyn, which looks a bit like a jigsaw puzzle.

Where to Obtain the Data

josephmisiti / helloevolve.py
Created November 11, 2016 18:19
helloevolve.py - a simple genetic algorithm in Python
"""
helloevolve.py implements a genetic algorithm that starts with a base
population of randomly generated strings, iterates over a certain number of
generations while implementing 'natural selection', and prints out the most fit
string.
The parameters of the simulation can be changed by modifying one of the many
global variables. To change the "most fit" string, modify OPTIMAL. POP_SIZE
controls the size of each generation, and GENERATIONS is the number of
generations that the simulation will loop through before returning the fittest
josephmisiti / chunkify.js
Created March 14, 2017 13:33 — forked from woollsta/chunkify.js
Fixes an issue with Google Chrome Speech Synthesis where long texts pause mid-speaking. The function takes in a speechUtterance object and intelligently chunks it into smaller blocks of text that are strung together one after the other. Basically, you can play any length of text. See http://stackoverflow.com/questions/21947730/chrome-speech-sy…
/**
* Chunkify
* Google Chrome Speech Synthesis Chunking Pattern
* Fixes inconsistencies with speaking long texts in speechUtterance objects
* Licensed under the MIT License
*
* Peter Woolley and Brett Zamir
*/
var speechUtteranceChunker = function (utt, settings, callback) {
{'landmark_name': None, 'street': '857 ocean rd', 'zipcode': '11932', 'city': 'bridgehampton', 'state': 'ny', 'po_box': '', 'full_address': '857 Ocean Rd, Bridgehampton, NY 11932, USA', 'from_email': 'joseph.misiti@gmail.com', 'time_to_fire_station_units': 1.41, 'time_to_fire_station': 4, 'distance_to_coast': 4545, 'distance_to_fire_station': 1.41, 'elevation': 6.028863906860352, 'fema_flood_zone': 'X', 'number_of_buildings': 0, 'year_built': '2000', 'roof_type': None, 'roof_cover': None, 'building_area': 1544, 'land_use': 'Single Family Residential', 'basement': 'Full Basement', 'neptune_quote_response': {'error': False, 'data': {'userID': None, 'brokerId': '9C0CD433-CC15-4053-8579-581F51B11D50', 'agentNo': 'FL0001', 'producerName': None, 'producerLicense': None, 'isDirectToConsumer': False, 'quoteNumber': 'NY0197APZQ3R9', 'status': 'approved', 'isBound': False, 'dateBound': '0001-01-01T00:00:00', 'isLocationBound': False, 'link': 'https://uat.neptuneflood.com/agent-hub/#/quote/NY0197APZQ3R9/auth/eyJhbGciOiJ

Text Classification

To demonstrate text classification with scikit-learn, we'll build a simple spam filter. The filters used in production by services like Gmail are vastly more sophisticated, but the model we'll have by the end of this chapter is effective and surprisingly accurate.

Spam filtering is the "hello world" of document classification, but we aren't limited to two classes. The classifier we will be using supports multi-class classification, which opens up applications such as author identification and support email routing. In this example, however, we'll stick to two classes: SPAM and HAM.

For this exercise, we'll use a combination of the Enron-Spam data sets and the SpamAssassin public corpus. Both are publicly available and are retrieved from the internet during the setup phase of the example code that accompanies this chapter.
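
Before loading the real corpora, it may help to see the overall shape of a two-class text classifier. The following is only a minimal sketch, not the chapter's final code: it assumes a bag-of-words representation (CountVectorizer) feeding a multinomial naive Bayes classifier (MultinomialNB), which is one common choice and not necessarily the classifier this chapter settles on. The six training messages and the classify helper are hypothetical and hand-written for illustration; they are not drawn from Enron-Spam or SpamAssassin.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy messages, written by hand for this sketch only.
messages = [
    "free money now",
    "win a free prize today",
    "cheap pills online",
    "meeting at noon tomorrow",
    "please review the attached report",
    "lunch with the team on friday",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Assumption: word counts plus multinomial naive Bayes. The pipeline
# turns each message into a vector of token counts, then fits the model.
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(messages, labels)


def classify(text):
    """Return the predicted label ('spam' or 'ham') for one message."""
    return str(spam_filter.predict([text])[0])


print(classify("free prize money"))
print(classify("review the report before the meeting"))
```

Because the labels are plain strings, adding a third class would only require more labeled examples, which is what makes the multi-class uses mentioned above possible with the same kind of pipeline.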

Loading Examples

def get_quote(self, address, city, state, zipcode):
    """ """
    if not self.__token:
        self.get_token_bearer()
    assert len(state) == 2
    resp = requests.request(
        "POST",
        '%s/Services/API/v1/Quote' % (self.__base_url),