- Update the apt-get sources
cd /etc/apt/sources.list.d/
sudo wget http://archive.cloudera.com/impala/ubuntu/precise/amd64/impala/cloudera.list
sudo apt-get update
- Install Impala [found here [1]]
<script>
// MULTIPLE CHOICE TEMPLATE v1.2 {{{
// https://gist.github.com/hgiesel/2e8361afccca5713414a6a4ee66b7ece
const query = 'div#thecard'
const colors = ['orange', 'olive', 'maroon', 'aqua', 'fuchsia', 'navy', 'lime']
const fieldPadding = '4px'
const syntax = {
  openDelim: '(^',
  closeDelim: '^)',
Here at PyData NYC, I attended a tutorial on how to use NumPy and IPython notebooks. In a previous gist, I drew all the zip codes of the Bronx in d3.js.
This would be great for reproducing infographics like Educational Attainment in New York City -- Brooklyn, which looks a bit like a jigsaw puzzle:
"""
helloevolve.py implements a genetic algorithm that starts with a base
population of randomly generated strings, iterates over a certain number of
generations while implementing 'natural selection', and prints out the most fit
string.
The parameters of the simulation can be changed by modifying one of the many
global variables. To change the "most fit" string, modify OPTIMAL. POP_SIZE
controls the size of each generation, and GENERATIONS is the number of
generations that the simulation will loop through before returning the fittest
/**
 * Chunkify
 * Google Chrome Speech Synthesis Chunking Pattern
 * Fixes inconsistencies with speaking long texts in speechUtterance objects
 * Licensed under the MIT License
 *
 * Peter Woolley and Brett Zamir
 */
var speechUtteranceChunker = function (utt, settings, callback) {
{'landmark_name': None, 'street': '857 ocean rd', 'zipcode': '11932', 'city': 'bridgehampton', 'state': 'ny', 'po_box': '', 'full_address': '857 Ocean Rd, Bridgehampton, NY 11932, USA', 'from_email': 'joseph.misiti@gmail.com', 'time_to_fire_station_units': 1.41, 'time_to_fire_station': 4, 'distance_to_coast': 4545, 'distance_to_fire_station': 1.41, 'elevation': 6.028863906860352, 'fema_flood_zone': 'X', 'number_of_buildings': 0, 'year_built': '2000', 'roof_type': None, 'roof_cover': None, 'building_area': 1544, 'land_use': 'Single Family Residential', 'basement': 'Full Basement', 'neptune_quote_response': {'error': False, 'data': {'userID': None, 'brokerId': '9C0CD433-CC15-4053-8579-581F51B11D50', 'agentNo': 'FL0001', 'producerName': None, 'producerLicense': None, 'isDirectToConsumer': False, 'quoteNumber': 'NY0197APZQ3R9', 'status': 'approved', 'isBound': False, 'dateBound': '0001-01-01T00:00:00', 'isLocationBound': False, 'link': 'https://uat.neptuneflood.com/agent-hub/#/quote/NY0197APZQ3R9/auth/eyJhbGciOiJ |
To demonstrate text classification with scikit-learn, we'll build a simple spam filter. While the production filters behind services like Gmail are far more sophisticated, the model we'll have by the end of this chapter is effective and surprisingly accurate.
Spam filtering is the "hello world" of document classification, but keep in mind that we aren't limited to two classes. The classifier we will be using supports multi-class classification, which opens up applications such as author identification and support email routing. In this example, however, we'll stick to two classes: SPAM and HAM.
For this exercise, we'll use a combination of the Enron-Spam data sets and the SpamAssassin public corpus. Both are publicly available for download and are retrieved from the internet during the setup phase of the example code that goes with this chapter.
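To make the idea of document classification concrete before touching the real corpora, here is a minimal sketch of a multinomial naive Bayes classifier using only the Python standard library. It is an illustration under assumptions, not the chapter's actual code: the text does not name the classifier it uses, and the `NaiveBayes` class, the `tokenize` helper, and the four toy training messages are all hypothetical. The same code handles more than two labels, which is the multi-class point made above.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Hypothetical multinomial naive Bayes with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)      # documents seen per label
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.vocabulary = set()
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        # Words never seen in training carry no evidence, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Start from the log prior, then add smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data invented for this sketch (not from Enron-Spam or SpamAssassin).
train_docs = [
    "win cash prize now",
    "cheap pills limited offer",
    "meeting agenda for monday",
    "lunch with the project team",
]
train_labels = ["SPAM", "SPAM", "HAM", "HAM"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("claim your cash prize"))
print(model.predict("agenda for the team meeting"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and add-one smoothing keeps a single unseen word from zeroing out a class. A library implementation would add better tokenization and feature weighting on top of the same idea.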
def get_quote(self, address, city, state, zipcode):
    """ """
    if not self.__token:
        self.get_token_bearer()
    assert len(state) == 2
    resp = requests.request(
        "POST",
        '%s/Services/API/v1/Quote' % (self.__base_url),