Winston Robson gumdropsteve

@gumdropsteve
gumdropsteve / zillow.py
Created October 2, 2018 21:23 — forked from scrapehero/zillow.py
Python 3 script to find real estate listings of properties up for sale on zillow.com
from lxml import html
import requests
import unicodecsv as csv
import argparse
def parse(zipcode, filter=None):
    if filter == "newest":
        url = "https://www.zillow.com/homes/for_sale/{0}/0_singlestory/days_sort".format(zipcode)
    elif filter == "cheapest":
gumdropsteve / simple_test.py
Last active October 17, 2020 03:19
Simple test to see if Selenium is correctly installed
from time import sleep
from selenium import webdriver
story = 'https://medium.com/dropout-analytics/selenium-and-geckodriver-on-mac-b411dbfe61bc'
story = story + '?source=friends_link&sk=18e2c2f07fbe1f8ae53fef5ad57dbb12' # 'https://bit.ly/2WaKraO' <- short link
def gecko_test(site_000=story):
    """
    simple overview:
    1) set up webdriver

Keybase proof

I hereby claim:

  • I am gumdropsteve on github.
  • I am winston_ (https://keybase.io/winston_) on keybase.
  • I have a public key ASC6sL40hnMAmWH9Wum_6F-H8L-QvW_0BLvCJfSyQr4r7Ao

To claim this, I am signing this object:

from blazingsql import BlazingContext
import cudf
# cuDF DataFrame from a CSV hosted externally, read via URL
turkey_poll = cudf.read_csv('https://query.data.world/s/ss47hkdmqe5d6353neouv4ourm2ous')
# make columns easier to work with
new_cols = []
for col in turkey_poll.columns:
    # replace spaces w/ underscore and drop question & quotation marks
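The preview stops before the loop body, so the cleanup itself is not shown. As a minimal sketch of what the comment describes (spaces become underscores; question marks and quotation marks are dropped), the plain-Python helper below could be used. The name `clean_column_name`, the sample column names, and the choice to treat both double and single quotes as "quotation marks" are assumptions for illustration, not taken from the gist.

```python
def clean_column_name(col):
    """Replace spaces with underscores and drop question/quotation marks."""
    cleaned = col.replace(' ', '_')
    # assumption: "quotation marks" covers both double and single quotes
    for mark in ('?', '"', "'"):
        cleaned = cleaned.replace(mark, '')
    return cleaned


# hypothetical column names, for illustration only
sample_cols = ['How would you describe where you live?', 'US Region', '"Age"']
new_cols = [clean_column_name(col) for col in sample_cols]
print(new_cols)
```

In the gist's context the cleaned names would presumably be assigned back to the DataFrame's columns, but that step falls outside the preview.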
gumdropsteve / cuml_taxi_fare_prediction.ipynb
Last active December 18, 2019 00:39
cuML_Taxi_Fare_Prediction.ipynb
gumdropsteve / sample_log.py
Last active December 20, 2019 04:50
Example of how to query your BlazingSQL logs
# This query reports the data load time and total time for every query, latest first.
# Load time and total time are each the maximum value reported by any node.
log_query = """
SELECT
    MAX(end_time) AS end_time, query_id,
    MAX(load_time) AS load_time, MAX(total_time) AS total_time
FROM (
    SELECT
        query_id, node_id,
        SUM(CASE WHEN info = 'evaluate_split_query load_data' THEN duration ELSE 0 END) AS load_time,
gumdropsteve / no_pool.py
Last active February 3, 2020 23:04
Test pool vs no pool performance with BlazingSQL
import os
import urllib
from blazingsql import BlazingContext
# set number of times to run each query
n_runs = 3
# let user know
print(f'nruns = {n_runs}')
'''CHECK FOR DATA
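The preview ends before any query is run, so the benchmarking loop is not visible. A minimal sketch of timing a callable `n_runs` times with the standard library might look like the following. `time_runs` and `fake_query` are hypothetical stand-ins; the actual gist presumably times BlazingSQL queries with and without a memory pool, which is not reproduced here.

```python
import time


def time_runs(func, n_runs=3):
    """Call func n_runs times and return the elapsed seconds of each run."""
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return timings


def fake_query():
    # hypothetical placeholder for a real query call
    return sum(range(100_000))


n_runs = 3
timings = time_runs(fake_query, n_runs)
average = sum(timings) / len(timings)
print(f'nruns = {n_runs}, average = {average:.6f}s')
```

Averaging over several runs smooths out one-off costs such as a cold cache, which is likely why the gist sets `n_runs` above 1.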
gumdropsteve / case_taxi.csv
Created February 4, 2020 22:52
For predicting the cost of a ride from Grand Central Station to Samsung Next NYC at the top of each hour 4am-8am 29 Feb 2020
hours days months years longitude_distance latitude_distance passenger_count
4 29 2 2020 0.012727 0.008484 1
5 29 2 2020 0.012727 0.008484 1
6 29 2 2020 0.012727 0.008484 1
7 29 2 2020 0.012727 0.008484 1
8 29 2 2020 0.012727 0.008484 1
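The table holds one row per hour from 4 to 8, with every other feature held constant. A minimal sketch of generating those rows with the standard library is shown below. The values are copied from the table; the helper name `build_case_rows` and the comma-delimited output are assumptions.

```python
import csv
import io

FIELDS = ['hours', 'days', 'months', 'years',
          'longitude_distance', 'latitude_distance', 'passenger_count']


def build_case_rows(start_hour=4, end_hour=8):
    """Build one row per hour (inclusive); all other features stay fixed."""
    return [
        {
            'hours': hour,
            'days': 29,
            'months': 2,
            'years': 2020,
            'longitude_distance': 0.012727,
            'latitude_distance': 0.008484,
            'passenger_count': 1,
        }
        for hour in range(start_hour, end_hour + 1)
    ]


rows = build_case_rows()

# write the rows as CSV text into an in-memory buffer
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```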
gumdropsteve / distributed_datashader_demo.ipynb
Created February 13, 2020 08:46
Visualizing distributed BlazingSQL query results with Datashader
gumdropsteve / sample_taxi.csv
Created February 13, 2020 19:18
Sample of the data used in the BlazingSQL Distributed Taxi demo notebook
"VendorID","tpep_pickup_datetime","tpep_dropoff_datetime","passenger_count","trip_distance","pickup_longitude","pickup_latitude","RateCodeID","store_and_fwd_flag","dropoff_longitude","dropoff_latitude","payment_type","fare_amount","extra","mta_tax","tip_amount","tolls_amount","improvement_surcharge","total_amount"
2,"2015-01-15 19:05:39","2015-01-15 19:23:42",1,1.59,-73.99389648,40.75011063,1,"N",-73.97478485,40.75061798,1,12.0,1.0,0.5,3.25,0.0,0.3,17.05
1,"2015-01-10 20:33:38","2015-01-10 20:53:28",1,3.3,-74.00164795,40.72424316,1,"N",-73.99441528,40.7591095,1,14.5,0.5,0.5,2.0,0.0,0.3,17.8
1,"2015-01-10 20:33:38","2015-01-10 20:43:41",1,1.8,-73.96334076,40.80278778,1,"N",-73.95182037,40.8244133,2,9.5,0.5,0.5,0.0,0.0,0.3,10.8
1,"2015-01-10 20:33:39","2015-01-10 20:35:31",1,0.5,-74.00908661,40.7138176,1,"N",-74.00432587,40.71998596,2,3.5,0.5,0.5,0.0,0.0,0.3,4.8
1,"2015-01-10 20:33:39","2015-01-10 20:52:58",1,3.0,-73.97117615,40.76242828,1,"N",-74.00418091,40.74265289,2,15.0,0.5,0.5,0.0,0.0,0.3,16.3
1,"2015-01-