@bahuljain
Last active December 11, 2015 19:18

# Data Mining

## Knowledge Discovery in Databases

  • Types:
    • Association Rules
    • Causality (Interestingness, Conviction)
    • Clustering
    • Classification
    • Sequential Patterns
  • Association Rules
    • Support
    • Confidence
    • Apriori Algorithm
    • Hierarchical Itemsets
    • Quantitative Fields
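
A minimal sketch of support, confidence and the Apriori level-wise search, assuming transactions are Python sets of item names. The helper names (`support`, `apriori`, `association_rules`) and the basket data are illustrative only. Candidates of size k are built by joining frequent (k-1)-itemsets and pruned with the Apriori property (every subset of a frequent itemset is frequent).

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    # Level 1: frequent single items.
    current = {frozenset([i]) for i in items
               if support(frozenset([i]), transactions) >= min_support}
    frequent, k = {}, 1
    while current:
        for c in current:
            frequent[c] = support(c, transactions)
        k += 1
        # Join: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c, transactions) >= min_support}
    return frequent

def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules lhs -> rhs."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                # confidence(lhs -> rhs) = support(lhs U rhs) / support(lhs)
                conf = sup / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, conf))
    return rules

# Small toy basket data, for illustration only.
baskets = [{"bread", "milk"}, {"bread", "diaper", "beer", "eggs"},
           {"milk", "diaper", "beer", "cola"}, {"bread", "milk", "diaper", "beer"},
           {"bread", "milk", "diaper", "cola"}]
frequent = apriori(baskets, min_support=0.6)
found_rules = association_rules(frequent, min_confidence=0.7)
```

Here `{diaper} -> {beer}` has support 0.6 and confidence 0.6 / 0.8 = 0.75, while `{bread, milk, diaper}` is dropped because only 2 of 5 baskets contain it.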

## OLAP

  • Decision Support System
  • Data Warehouse, ETL, Metadata Repository
  • Dimensions, facts, measures
  • OLAP, OLTP
  • Roll-up / Drill-down, Pivoting, Cross Tabulation, Slicing, Dicing
  • MOLAP
  • ROLAP
  • Schema
    • Star
    • Fact Constellation
    • Snowflake
  • Performance Considerations
    • Bitmap Index – one bit vector per distinct column value; best suited to low-cardinality columns
    • Join Index
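
A bitmap index can be sketched with Python integers as bit vectors: bit r of the vector for value v is set when row r has that value, so a conjunctive selection becomes a bitwise AND. The fact table, column names and helpers below are hypothetical, chosen only to show the idea.

```python
from collections import defaultdict

def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to a bit vector (a Python int):
    bit r is set iff rows[r][column] == value."""
    index = defaultdict(int)
    for r, row in enumerate(rows):
        index[row[column]] |= 1 << r
    return dict(index)

def row_ids(bits):
    """Decode a bit vector into the list of row ids whose bit is set."""
    return [r for r in range(bits.bit_length()) if (bits >> r) & 1]

# Hypothetical fact table (illustration only): two dimensions and one measure.
sales = [
    {"region": "East", "quarter": "Q1", "amount": 100},
    {"region": "West", "quarter": "Q1", "amount": 80},
    {"region": "East", "quarter": "Q2", "amount": 120},
    {"region": "East", "quarter": "Q1", "amount": 50},
]
region_idx = build_bitmap_index(sales, "region")
quarter_idx = build_bitmap_index(sales, "quarter")

# WHERE region = 'East' AND quarter = 'Q1' reduces to one bitwise AND.
hits = row_ids(region_idx["East"] & quarter_idx["Q1"])
east_q1_total = sum(sales[r]["amount"] for r in hits)
```

With few distinct values per column, the index is just a handful of short bit vectors; real systems typically also compress them.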

## Time-Series

#### Similarity

  • Euclidean Distance
  • Cross-correlation measure
  • Dynamic Time Warping distance
    • Warping path properties: Continuity, Boundary Constraint, Monotonicity
    • Cumulative distance: $\gamma(i,j) = d(q_i, c_j) + \min\{\gamma(i-1,j-1), \gamma(i-1,j), \gamma(i,j-1)\}$
    • Global constraints – Sakoe-Chiba Band, Itakura Parallelogram
  • Symbolic Aggregate Approximation (SAX)
    • Distance between SAX words lower-bounds the Euclidean and DTW distances on the original series
    • Piecewise Aggregate Approximation (PAA)
    • PAA -> Symbols
  • Feature-Based Similarity (μ, σ, Kurtosis, and Skew…)
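
A minimal sketch of three of the measures above: Euclidean distance, DTW computed with the cumulative-distance recurrence (optionally restricted to a Sakoe-Chiba band of half-width `window`), and SAX built from PAA. Assumptions, not taken from the notes: the point distance is d(q_i, c_j) = (q_i - c_j)^2 with a square root taken at the end, the SAX alphabet has 4 symbols with breakpoints that cut N(0, 1) into equiprobable regions, and the series length is divisible by the number of PAA segments. The SAX lower-bounding distance (MINDIST) is not shown.

```python
import math
from statistics import NormalDist

def euclidean(q, c):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))

def dtw(q, c, window=None):
    """DTW with d(q_i, c_j) = (q_i - c_j)^2; `window` is a Sakoe-Chiba band
    half-width (None = unconstrained). The band is widened to |n - m| so a path exists."""
    n, m = len(q), len(c)
    w = max(n, m) if window is None else max(window, abs(n - m))
    g = [[math.inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            # Continuity + monotonicity: only diagonal, vertical, horizontal steps.
            g[i][j] = cost + min(g[i - 1][j - 1], g[i - 1][j], g[i][j - 1])
    return math.sqrt(g[n][m])

def znorm(x):
    """Z-normalize (SAX assumes roughly N(0, 1) values)."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sd for v in x] if sd > 0 else [0.0] * len(x)

def paa(x, segments):
    """Piecewise Aggregate Approximation: mean of each equal-width segment."""
    size = len(x) // segments
    return [sum(x[k * size:(k + 1) * size]) / size for k in range(segments)]

def sax(x, segments, alphabet="abcd"):
    """Map each PAA value to a symbol using Gaussian breakpoints."""
    a = len(alphabet)
    cuts = [NormalDist().inv_cdf(k / a) for k in range(1, a)]
    return "".join(alphabet[sum(v > b for b in cuts)] for v in paa(znorm(x), segments))
```

For example, `sax([0, 0, 0, 0, 10, 10, 10, 10], 2)` yields the word `"ad"`, and with `window=0` on equal-length series DTW collapses to the Euclidean distance.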

#### Pre-Processing (Removing Distortions)

  • Offset Translation (Mean)
  • Amplitude Scaling (Standard Deviation)
  • Linear Trend
  • Noise
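
One possible sketch of the four corrections, in the order listed: subtract the mean (offset), divide by the standard deviation (amplitude), subtract a least-squares line (linear trend), and smooth with a centered moving average. The moving average is just one simple choice of noise filter, assumed here for illustration; the function names are hypothetical.

```python
import math

def offset_translation(x):
    """Subtract the mean so series with different baselines can be compared."""
    mu = sum(x) / len(x)
    return [v - mu for v in x]

def amplitude_scaling(x):
    """Subtract the mean and divide by the (population) standard deviation."""
    centered = offset_translation(x)
    sd = math.sqrt(sum(v * v for v in centered) / len(x))
    return [v / sd for v in centered] if sd > 0 else centered

def remove_linear_trend(x):
    """Fit v = a + b*t by least squares and return the residuals."""
    n = len(x)
    if n < 2:
        return [0.0] * n
    t_mean, x_mean = (n - 1) / 2, sum(x) / n
    b = (sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
         / sum((t - t_mean) ** 2 for t in range(n)))
    a = x_mean - b * t_mean
    return [v - (a + b * t) for t, v in enumerate(x)]

def moving_average(x, k=3):
    """Centered moving average over k points; the window is truncated at the edges."""
    h = k // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - h):i + h + 1]
        out.append(sum(window) / len(window))
    return out
```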

#### Clustering

  • Hierarchical Clustering
  • K-Means (Partitioning)
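
A minimal k-means (Lloyd's algorithm) sketch for equal-length series under Euclidean distance. Seeding with the first k series is a simplifying assumption made here so the result is deterministic; random or k-means++ seeding is more common in practice.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(series, k, init=None, max_iter=100):
    """Lloyd's k-means on equal-length series; returns (labels, centroids).
    Initial centroids default to the first k series (a simplification)."""
    centroids = [list(s) for s in (init if init is not None else series[:k])]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each series joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: euclidean(s, centroids[j]))
                      for s in series]
        if new_labels == labels:  # converged: no series changed cluster
            break
        labels = new_labels
        # Update step: each centroid becomes the point-wise mean of its members.
        for j in range(k):
            members = [s for s, lab in zip(series, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

labels, centroids = kmeans([[0, 0, 0], [0, 1, 0], [10, 10, 10], [9, 10, 11]], k=2)
```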

#### Classification

  • Nearest Neighbor Classification
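
A 1-nearest-neighbor classifier sketch: the query takes the label of the closest training series. The distance function is a parameter, so a DTW function could be passed in place of the default Euclidean distance; the training data and labels are made up for illustration.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nn_classify(query, train, distance=euclidean):
    """Return the label of the training series closest to `query`.
    `train` is a list of (series, label) pairs."""
    best_label, best_dist = None, math.inf
    for series, label in train:
        d = distance(query, series)
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label

train = [([0, 0, 0, 0], "flat"), ([0, 1, 2, 3], "rising"), ([3, 2, 1, 0], "falling")]
```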

## Spatial Data Management

  • Data: Point Data, Raster Data, Region Data, Vector Data
  • Queries: Range Queries, Nearest Neighbor Queries, Spatial Join Queries
  • B+ Tree Index vs. Spatial Index
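
To pin down the three query types on 2-D point data, here they are as plain linear scans. This is a baseline sketch with hypothetical helper names; a spatial index exists precisely to avoid scanning every point.

```python
import math

def range_query(points, lo, hi):
    """Points inside the axis-aligned rectangle [lo, hi] (inclusive)."""
    return [p for p in points
            if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]]

def nearest_neighbor(points, q):
    """The point closest to q (Euclidean distance)."""
    return min(points, key=lambda p: math.dist(p, q))

def spatial_join(a, b, eps):
    """All pairs (p, r), p from a and r from b, at distance <= eps."""
    return [(p, r) for p in a for r in b if math.dist(p, r) <= eps]

pts = [(1, 1), (2, 5), (6, 3), (8, 8)]
```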

#### Space Filling Curves

  • Point Data: Z-Ordering and B+ Trees
  • Region Data: Region Quad Trees, Z-Ordering and B+ Trees - $2^k$ regions
  • Querying – range, nearest neighbor, join
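
A sketch of Z-ordering for point data: interleave the bits of x and y into a Z-value and keep points sorted by it. A sorted list searched with `bisect` stands in for the B+ tree here (an assumption for brevity), and coordinates are assumed to be non-negative integers below 2^bits. Because the Z-value never decreases when either coordinate increases, every point in a query box has a Z-value between those of its lower-left and upper-right corners; that interval can also contain points outside the box, so results are filtered.

```python
from bisect import bisect_left, bisect_right

def z_value(x, y, bits=8):
    """Interleave bits: bit i of x goes to position 2i, bit i of y to 2i + 1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

class ZIndex:
    """Points sorted by Z-value; the sorted list plays the role of B+ tree leaves."""
    def __init__(self, points, bits=8):
        self.bits = bits
        self.entries = sorted((z_value(x, y, bits), (x, y)) for x, y in points)
        self.keys = [z for z, _ in self.entries]

    def range_query(self, lo, hi):
        start = bisect_left(self.keys, z_value(lo[0], lo[1], self.bits))
        end = bisect_right(self.keys, z_value(hi[0], hi[1], self.bits))
        # Scan the Z-interval, then drop points that fall outside the box.
        return [p for _, p in self.entries[start:end]
                if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]]

idx = ZIndex([(1, 1), (2, 5), (6, 3), (3, 2)], bits=3)
```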

#### Grid Files

  • Grid Directory, Linear Scale
  • Querying – point, range, nearest neighbor
  • Creation / Insertion of points (Page Capacity, Splitting Policy)
  • Deletion – Convexity Requirement
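
A toy grid file showing how the linear scales and the grid directory cooperate on lookups: each linear scale is a sorted list of split points per dimension, a binary search on each scale yields the cell, and the directory maps the cell to its bucket. Simplifying assumptions: split points are fixed, every cell gets its own bucket (real grid files let several cells share one), and page-capacity splitting and deletion are not modelled. Class and method names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

class GridFile:
    """Two linear scales plus a directory mapping (column, row) -> bucket."""
    def __init__(self, x_scale, y_scale):
        self.x_scale = sorted(x_scale)         # split points on x
        self.y_scale = sorted(y_scale)         # split points on y
        self.directory = defaultdict(list)     # cell -> bucket (list of points)

    def cell(self, p):
        """Locate p's cell with one binary search per linear scale."""
        return (bisect_right(self.x_scale, p[0]), bisect_right(self.y_scale, p[1]))

    def insert(self, p):
        self.directory[self.cell(p)].append(p)

    def point_query(self, p):
        return p in self.directory.get(self.cell(p), [])

    def range_query(self, lo, hi):
        """Visit only the cells overlapping [lo, hi], then filter their buckets."""
        (c0, r0), (c1, r1) = self.cell(lo), self.cell(hi)
        out = []
        for c in range(c0, c1 + 1):
            for r in range(r0, r1 + 1):
                out.extend(p for p in self.directory.get((c, r), [])
                           if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1])
        return out

g = GridFile(x_scale=[5], y_scale=[5])
for p in [(1, 1), (6, 2), (7, 8), (3, 9)]:
    g.insert(p)
```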

#### R-Trees

  • Bounding Box
  • Querying – point, range, nearest neighbor
  • Insertion and Deletion, Optimal Splits
  • R* Trees
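
A sketch of the bounding-box idea behind R-tree search: every node stores the minimum bounding rectangle of its entries, and a range query descends only into children whose rectangle intersects the query (a point query is the same search with a degenerate box). The tree is built by hand here; insertion, node splitting and the R* refinements are not shown, and the names are hypothetical.

```python
def intersects(a, b):
    """Rectangles (xmin, ymin, xmax, ymax) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def bounding_box(rects):
    """Minimum bounding rectangle enclosing all `rects`."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

class Node:
    def __init__(self, entries, leaf):
        # Leaf entries are (rect, object_id); internal entries are child Nodes.
        self.entries, self.leaf = entries, leaf
        rects = [e[0] for e in entries] if leaf else [child.rect for child in entries]
        self.rect = bounding_box(rects)

def search(node, query, out=None):
    """Return ids of all objects whose rectangle intersects `query`."""
    if out is None:
        out = []
    if node.leaf:
        out.extend(obj for rect, obj in node.entries if intersects(rect, query))
    else:
        for child in node.entries:
            # Prune subtrees whose bounding box misses the query.
            if intersects(child.rect, query):
                search(child, query, out)
    return out

leaf1 = Node([((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")], leaf=True)
leaf2 = Node([((10, 10, 11, 11), "c"), ((12, 12, 13, 13), "d")], leaf=True)
root = Node([leaf1, leaf2], leaf=False)
```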