Alexander Mueller dice89
dice89 / gist:2c313bd5cfff0a4fb599
Created February 8, 2015 21:08
Word2Vec Usage from Java with Apache Spark
Word2VecModel model_stemmed = ModelUtil.loadWord2VecModel("/Users/mueller/Coding/Word2Vectors/webbase10p/model_word2vec_stemmed.ser");
Word2VecModel model_unstemmed = ModelUtil.loadWord2VecModel("/Users/mueller/Coding/Word2Vectors/webbase10p/model_word2vec.ser");
System.out.println("Stemmed example");
System.out.println("#############################################");
String term1 = "scholar";
String term2 = "student";
// To stem terms, the Porter stemmer from Apache Lucene is used
double result = Word2VecSim.cousineSimilarityBetweenTerms(model_stemmed, ModelUtil.porter_stem(term1), ModelUtil.porter_stem(term2));
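The call above appears to compute the cosine similarity between the vectors of two stemmed terms. `Word2VecSim` and `ModelUtil` are the gist author's own helpers and are not shown in this preview, so the following is only a minimal sketch of the measure itself, written in Python with hypothetical toy vectors rather than real embeddings:

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional vectors, not taken from any trained model.
scholar = [0.2, 0.7, 0.1]
student = [0.25, 0.6, 0.2]
print(cosine_similarity(scholar, student))
```

Values close to 1 indicate vectors pointing in nearly the same direction, values near 0 indicate unrelated directions.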
dice89 / spatial_radius_query.py
Created August 6, 2017 15:47
How to create a ball-tree based spatial search index and compare it to a brute-force approach
# Prepare the dataset and run the comparison
import numpy as np
import random
import time
from collections import defaultdict
from sklearn.neighbors import BallTree
from sklearn.neighbors import DistanceMetric
n_samples_min = int(1e3)
n_samples_max = int(1e7)
dice89 / Pipeline.ipynb
Last active February 17, 2023 04:09
Spatial Radius Queries
dice89 / geohash_based_spatial_radius_query_approach.py
Last active August 15, 2019 15:06
Spatial Indexing Approaches
RADIANT_TO_KM_CONSTANT = 6367
import proximitypyhash as ppyh
import pygeohash as pgh
from sklearn.neighbors import DistanceMetric
import numpy as np
from collections import defaultdict
class GeoHashIndexer:
    def __init__(self, precision, lat_longs):
dice89 / ball_tree_based_spatial_query_approach.py
Last active August 9, 2017 07:55
A BallTree-based spatial index
from sklearn.neighbors import BallTree
import numpy as np
RADIANT_TO_KM_CONSTANT = 6367
class BallTreeIndex:
    def __init__(self, lat_longs):
        self.lat_longs = np.radians(lat_longs)
        self.ball_tree_index = BallTree(self.lat_longs, metric='haversine')

    def query_radius(self, query, radius):
        radius_km = radius / 1e3
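The preview above is cut off inside `query_radius`. As a rough illustration of how such a radius query can be completed with scikit-learn's `BallTree`, here is a minimal sketch. It assumes that `radius` is given in metres (suggested by the division by `1e3`) and that the gist's `RADIANT_TO_KM_CONSTANT` is used as the Earth radius in kilometres. The function names and sample coordinates are hypothetical and not taken from the gist.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6367  # same value as RADIANT_TO_KM_CONSTANT in the gist


def build_index(lat_longs_deg):
    """Build a haversine BallTree from (lat, lon) pairs given in degrees."""
    return BallTree(np.radians(lat_longs_deg), metric="haversine")


def query_radius_m(tree, query_deg, radius_m):
    """Return sorted indices of indexed points within radius_m metres of query_deg."""
    # Haversine distances in scikit-learn are angles in radians on the unit
    # sphere, so convert metres -> kilometres -> radians.
    radius_rad = (radius_m / 1e3) / EARTH_RADIUS_KM
    query_rad = np.radians(np.asarray(query_deg, dtype=float).reshape(1, -1))
    hits = tree.query_radius(query_rad, r=radius_rad)[0]
    return sorted(hits.tolist())


# Hypothetical sample points as (latitude, longitude) in degrees.
points = np.array([
    [52.5200, 13.4050],  # Berlin
    [52.5300, 13.4100],  # a bit more than 1 km from the first point
    [50.1109, 8.6821],   # Frankfurt
])
tree = build_index(points)
print(query_radius_m(tree, [52.5200, 13.4050], 5000))
```

The tree is built once and can then answer many radius queries without comparing the query against every indexed point.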
dice89 / brute_force_spatial_radius_search.py
Created August 9, 2017 07:57
A simple brute-force spatial radius search using sklearn
from sklearn.neighbors import DistanceMetric
import numpy as np
RADIANT_TO_KM_CONSTANT = 6367
class BruteForce:
    def __init__(self, lat_longs):
        self.haversine = DistanceMetric.get_metric('haversine')
        self.lat_longs = np.radians(lat_longs)

    def query_radius(self, query, radius):
        radius_km = radius / 1e3
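This preview also ends before the distance computation. The gist relies on sklearn's `DistanceMetric`. The sketch below swaps in a plain-Python haversine formula to show the brute-force idea without external dependencies. It assumes `radius` is in metres and reuses 6367 km as the Earth radius. The function names and sample points are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6367  # same value as RADIANT_TO_KM_CONSTANT in the gist


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = math.radians(p[0]), math.radians(p[1])
    lat2, lon2 = math.radians(q[0]), math.radians(q[1])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def brute_force_radius(points, query, radius_m):
    """Return indices of all points within radius_m metres of query."""
    radius_km = radius_m / 1e3
    # Every point is compared against the query: O(n) work per query.
    return [i for i, p in enumerate(points) if haversine_km(p, query) <= radius_km]


# Hypothetical sample points as (latitude, longitude) in degrees.
points = [
    (52.5200, 13.4050),  # Berlin
    (52.5300, 13.4100),  # a bit more than 1 km from the first point
    (50.1109, 8.6821),   # Frankfurt
]
print(brute_force_radius(points, (52.5200, 13.4050), 5000))
```

Because each query scans the full point list, this serves mainly as a baseline against which an index-based approach can be compared.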
dice89 / setup.py
Created August 9, 2017 07:57
Test setup for spatial radius search query experiments
import numpy as np
import random
n_samples_min = int(1e3)
n_samples_max = int(5e7)
lat_longs = np.array([[random.uniform(50, 52),
                       random.uniform(8, 15)]
                      for i in range(n_samples_max)])
dice89 / PyDataRangeQueries.ipynb
Created September 20, 2017 08:40
The code for my talk at PyData in Berlin, September 2017
dice89 / install.sh
Created November 9, 2017 11:25
Proof-of-concept Spark, PySpark, and Cassandra setup
## Install scala 2.11.8
export scalaVer="2.11.8"
sudo apt-get remove scala-library scala
wget www.scala-lang.org/files/archive/scala-"$scalaVer".deb
sudo dpkg -i scala-"$scalaVer".deb
sudo apt-get -y --force-yes update
sudo apt-get -y --force-yes install scala
ssh-keygen -t rsa -b 4096 -C "your_email@example.com"