Jared Winick jaredwinick

@jaredwinick
jaredwinick / README.md
Last active April 21, 2020 10:44
Apache Zeppelin Object Exchange for passing a DataFrame of feature vectors from the Scala Spark interpreter to PySpark to get a numpy array

Apache Zeppelin has a helpful feature in its Spark Interpreter called Object Exchange. It lets you pass objects, including DataFrames, between Scala and Python paragraphs of the same notebook. You can do your data prep and feature engineering with the Scala Spark Interpreter, then hand off a DataFrame containing the features to PySpark for use with libraries like NumPy and scikit-learn. Combined with Zeppelin's support for matplotlib, this gives you a pretty good setup for poking around and testing out machine learning on your data.

jaredwinick / README.md
Last active May 26, 2018 03:43
Visualizing Linear Regression by Gradient Descent

Inspired by Professor Ng's lectures in the Coursera Machine Learning class, these animations visualize one-variable linear regression fitted by gradient descent. The graph on the left shows the data we are trying to fit and the hypothesis line as the parameters θ0 and θ1 converge. The plot on the right shows the value of the cost function. The animation loops forever, each time starting from a "random" value of θ0 and θ1.
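
As a rough illustration of what such an animation steps through, here is a minimal sketch of batch gradient descent for the hypothesis h(x) = θ0 + θ1·x with the usual squared-error cost. It is not the gist's code: the function names, the sample data, the learning rate `alpha`, and the iteration count are all assumptions chosen for the example, and the starting values default to 0 rather than being random.

```javascript
// Squared-error cost J = (1 / 2m) * sum((theta0 + theta1 * x - y)^2).
function cost(points, theta0, theta1) {
  let sum = 0;
  for (const [x, y] of points) {
    const err = theta0 + theta1 * x - y;
    sum += err * err;
  }
  return sum / (2 * points.length);
}

// Batch gradient descent. Returns the fitted parameters and the cost after
// every iteration (the kind of series a cost plot would show).
function gradientDescent(points, alpha, iterations, theta0 = 0, theta1 = 0) {
  const m = points.length;
  const history = [];
  for (let i = 0; i < iterations; i++) {
    let grad0 = 0;
    let grad1 = 0;
    for (const [x, y] of points) {
      const err = theta0 + theta1 * x - y;
      grad0 += err;      // partial derivative with respect to theta0
      grad1 += err * x;  // partial derivative with respect to theta1
    }
    // Both gradients were computed from the old values, so this is a
    // simultaneous update of theta0 and theta1.
    theta0 -= (alpha * grad0) / m;
    theta1 -= (alpha * grad1) / m;
    history.push(cost(points, theta0, theta1));
  }
  return { theta0, theta1, history };
}

// Hypothetical data lying exactly on y = 1 + 2x.
const points = [[0, 1], [1, 3], [2, 5], [3, 7]];
const fit = gradientDescent(points, 0.1, 5000);
console.log(fit.theta0, fit.theta1);
```

Passing different `theta0` and `theta1` starting values mimics the restart-from-a-random-point behaviour of the animation; with a learning rate this small the cost should fall on every iteration for this data.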

jaredwinick / top_inbound_links
Created November 18, 2013 04:12
Calculate the number of unique inbound and outbound links between subdomains. Store the top 25 of each.
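
The gist's own code is not shown in this listing, so the following is only a sketch of one way to read that description. It assumes the input is a list of `[source, target]` subdomain pairs, that "unique" means distinct peer subdomains, and that self-links are ignored; the function name `topLinkCounts` and the sample hostnames are hypothetical.

```javascript
// Count distinct inbound and outbound peers per subdomain, then keep the top n.
function topLinkCounts(links, n = 25) {
  const inbound = new Map();  // target -> Set of distinct source subdomains
  const outbound = new Map(); // source -> Set of distinct target subdomains

  for (const [source, target] of links) {
    if (source === target) continue; // assumption: self-links do not count
    if (!inbound.has(target)) inbound.set(target, new Set());
    inbound.get(target).add(source);
    if (!outbound.has(source)) outbound.set(source, new Set());
    outbound.get(source).add(target);
  }

  // Sort by count descending, break ties by name, and keep the first n.
  const top = (map) =>
    [...map.entries()]
      .map(([subdomain, peers]) => ({ subdomain, count: peers.size }))
      .sort((a, b) => b.count - a.count || a.subdomain.localeCompare(b.subdomain))
      .slice(0, n);

  return { topInbound: top(inbound), topOutbound: top(outbound) };
}

// Hypothetical sample: the duplicate a -> b pair is only counted once.
const sampleLinks = [
  ["a.example.com", "b.example.com"],
  ["a.example.com", "b.example.com"],
  ["c.example.com", "b.example.com"],
  ["a.example.com", "c.example.com"],
];
const linkCounts = topLinkCounts(sampleLinks);
console.log(linkCounts.topInbound, linkCounts.topOutbound);
```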
jaredwinick / README.md
Last active January 26, 2023 21:54
Z-Order Curve with Query

Z-Order curves are used to encode multiple dimensions to one dimension while maintaining locality. This property makes them useful for indexing multidimensional data such as geospatial data. In BigTable-like systems (Accumulo, HBase, Cassandra), a z-order curve index can translate a bounding box query to a single range scan. As this example shows, sometimes the locality properties of the curve are very good and few points outside the bounding box are scanned. Other times, though, many points outside the bounding box are scanned if using a single range.
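
To make the single-range behaviour concrete, here is a small sketch of a two-dimensional Z-order encoding by bit interleaving, plus a helper that counts how many curve positions a single range scan would touch for a bounding box. It is not the code behind the visualization: the function names, the choice of putting x in the even bit positions, and the 8-bit default are assumptions. Coordinates are assumed to be non-negative integers that fit in `bits` bits, with `bits` at most 16.

```javascript
// Interleave the bits of x and y: x bits land on even positions, y bits on odd.
// Arithmetic (not bitwise OR) is used to build z so the result is never
// truncated to a signed 32-bit integer.
function zEncode(x, y, bits = 8) {
  let z = 0;
  for (let i = 0; i < bits; i++) {
    z += ((x >> i) & 1) * 2 ** (2 * i);
    z += ((y >> i) & 1) * 2 ** (2 * i + 1);
  }
  return z;
}

// Inverse of zEncode: pull the even bits back into x and the odd bits into y.
function zDecode(z, bits = 8) {
  let x = 0;
  let y = 0;
  for (let i = 0; i < bits; i++) {
    x += (Math.floor(z / 2 ** (2 * i)) % 2) * 2 ** i;
    y += (Math.floor(z / 2 ** (2 * i + 1)) % 2) * 2 ** i;
  }
  return [x, y];
}

// Treat the bounding box as one range [z(min corner), z(max corner)] and count
// how many positions in that range actually fall inside the box.
function scanStats(minX, minY, maxX, maxY, bits = 8) {
  const zMin = zEncode(minX, minY, bits);
  const zMax = zEncode(maxX, maxY, bits);
  let inside = 0;
  for (let z = zMin; z <= zMax; z++) {
    const [x, y] = zDecode(z, bits);
    if (x >= minX && x <= maxX && y >= minY && y <= maxY) inside++;
  }
  const scanned = zMax - zMin + 1;
  return { zMin, zMax, scanned, inside, wasted: scanned - inside };
}

console.log(scanStats(0, 0, 3, 3)); // box aligned with a 4x4 curve cell
console.log(scanStats(3, 3, 4, 4)); // 2x2 box straddling a cell boundary
```

The single range works because the encoding is monotone in each coordinate, so every point in the box has a z value between those of the two corners. In this sketch the aligned 4x4 box scans 16 positions with none wasted, while the 2x2 box that straddles a cell boundary scans 34 positions to find its 4 points.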

This example was inspired by Mike Bostock's Quadtree example

jaredwinick / index.html
Created March 2, 2013 05:15
Z-Order Curve
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<title>Z-Order Curve</title>
<script src="http://d3js.org/d3.v3.min.js"></script>
<style type="text/css">
body {
background: #fff;