aslotnick

## point_in_polygon.sql

CREATE FUNCTION point_in_polygon(point_x float, point_y float, polygon_wkt varchar(max))
RETURNS boolean IMMUTABLE AS
$$

### begin section copied from http://geospatialpython.com/2011/08/point-in-polygon-2-on-line.html (I modified to return boolean)
# Improved point in polygon test which includes edge
# and vertex points

def point_in_poly(x,y,poly):
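
The preview stops at the function signature, so the body is not shown here. As a rough illustration of what an edge- and vertex-inclusive test can look like, here is a minimal ray-casting sketch. It is not the code from the linked post or from this gist: the names `point_on_segment` and `point_in_poly_sketch`, the `eps` tolerance, and the assumption that `poly` is a list of `(x, y)` tuples (rather than WKT text) are all hypothetical choices made for the example.

```python
def point_on_segment(px, py, x1, y1, x2, y2, eps=1e-12):
    """Return True if (px, py) lies on the closed segment (x1, y1)-(x2, y2)."""
    # Cross product is zero when the point is collinear with the segment.
    cross = (px - x1) * (y2 - y1) - (py - y1) * (x2 - x1)
    if abs(cross) > eps:
        return False
    # Collinear: the point must also fall inside the segment's bounding box.
    return (min(x1, x2) - eps <= px <= max(x1, x2) + eps and
            min(y1, y2) - eps <= py <= max(y1, y2) + eps)


def point_in_poly_sketch(x, y, poly):
    """Return True if (x, y) is inside poly or on its boundary.

    poly is assumed to be a list of (x, y) vertex tuples; repeating the
    first vertex at the end is optional.
    """
    n = len(poly)
    inside = False
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Points on an edge or vertex count as inside.
        if point_on_segment(x, y, x1, y1, x2, y2):
            return True
        # Ray casting: does a horizontal ray going right cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / float(y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
interior_hit = point_in_poly_sketch(5, 5, square)
edge_hit = point_in_poly_sketch(10, 5, square)
outside_hit = point_in_poly_sketch(15, 5, square)
```

Checking the boundary explicitly before counting crossings is what makes edge and vertex points return a consistent answer; plain ray casting alone can go either way for points that sit exactly on the boundary.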

## Strata+Hadoop World 2016 Notes.md

Last active November 16, 2016 21:34

File format benchmark: Avro, JSON, ORC, and Parquet
(slides: https://cdn.oreillystatic.com/en/assets/1/event/160/File%20format%20benchmark_%20Avro,%20JSON,%20ORC,%20and%20Parquet%20Presentation%201.pptx)

ORC has some built-in tuning for better performance with double and timestamp types
Both ORC and Parquet support predicate pushdown
Avro was a good choice for very wide tables with lots of text fields
For future investigation: look into “schema evolution” for both columnar formats
Snappy is faster than Zlib at the cost of more disk space
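
To make the predicate pushdown note concrete, here is a toy sketch of the general idea: a reader keeps min/max statistics per block of rows and skips any block that cannot contain a match. This is an illustration only, not how the ORC or Parquet libraries are implemented; `RowGroup`, `make_row_groups`, and `scan_greater_than` are hypothetical names invented for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for a block of rows with min/max column statistics.
RowGroup = namedtuple("RowGroup", ["min_value", "max_value", "rows"])


def make_row_groups(values, group_size):
    """Split values into fixed-size groups and record min/max for each."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        groups.append(RowGroup(min(chunk), max(chunk), chunk))
    return groups


def scan_greater_than(groups, threshold):
    """Return (matching values, number of row groups actually read)."""
    matches = []
    groups_read = 0
    for group in groups:
        if group.max_value <= threshold:
            # The statistics prove no row here can match, so skip the block.
            continue
        groups_read += 1
        matches.extend(v for v in group.rows if v > threshold)
    return matches, groups_read


groups = make_row_groups(list(range(100)), 10)
matches, groups_read = scan_greater_than(groups, 84)
```

With sorted input like this, only the last two of the ten groups are read for `value > 84`; on unsorted data the min/max ranges overlap more and fewer groups can be skipped.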

Data science at eHarmony: A generalized framework for personalization
