luthfibalaka/spatial_granularity.md Secret

## spatial_granularity.md

      
A. Introduction

Below are the granularities I found in spatial columns from the Chicago Data Portal. Each granularity is organized as follows:
[GRANULARITY]:


[PATTERN 1]


Sample instance(s): [SAMPLE INSTANCE(s)]
Sample column: Link to Column
Regex Pattern: [REGEX]

Explanation: [Explanation of the pattern, from left to right]


...

B. Granularities

Geographic Coordinates:

1. POINT(float [',' or ' ' or ', '] float)

Sample instance(s): POINT(41.919365236,-87.769726946), POINT(41.919365236 -87.769726946)
Sample column(s): Location in Traffic Crashes - Crashes,
Pickup Centroid Location in Taxi Trips
Regex Pattern: POINT\(([-+]?[0-9]*\.?[0-9]+)[ ,]+([-+]?[0-9]*\.?[0-9]+)\)

Explanation: From left to right, it matches "POINT", "(", the first float, a separator (",", " ", or ", "), the second float, and ")". Both floats allow an optional +/- sign.
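
As a quick check of this pattern, here is a minimal Python sketch. It assumes the literal parentheses and the decimal point are meant to be escaped, and that a whole cell should match (hence `fullmatch`). The helper name `parse_point` is hypothetical, not something from the portal.

```python
import re

# Assumed escaped form: "\(" and "\)" are literal parentheses, "\." a literal dot.
POINT_RE = re.compile(r"POINT\(([-+]?[0-9]*\.?[0-9]+)[ ,]+([-+]?[0-9]*\.?[0-9]+)\)")


def parse_point(value):
    """Return the two floats of a POINT(...) cell, or None if it does not match."""
    match = POINT_RE.fullmatch(value.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# The sample instances listed above, plus the ", " separator variant.
samples = [
    "POINT(41.919365236,-87.769726946)",
    "POINT(41.919365236 -87.769726946)",
    "POINT(41.919365236, -87.769726946)",
]
for sample in samples:
    print(sample, "->", parse_point(sample))
```

The two capture groups hold the floats in the order they appear in the cell; the sketch does not decide which one is latitude and which is longitude.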


2. (float° [',' or ' ' or ', '] float°)

Sample instance(s): (41.90643°, -87.703717°)
Sample column(s): Location in Beach Lab Data
Regex Pattern: \((-?\d+(\.\d+)?°)[ ,]+(-?\d+(\.\d+)?°)\)

Explanation: From left to right, it matches "(", the first float followed by "°", a separator (",", " ", or ", "), the second float followed by "°", and ")". Each float allows an optional leading "-" sign; a leading "+" is not matched.
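
A minimal Python sketch of the degree-suffixed form follows. It assumes the outer parentheses and the decimal point are literals, and it moves "°" outside the capture groups so the groups convert directly with `float`. The helper name `parse_degree_pair` is hypothetical.

```python
import re

# Assumed escaped form; "°" sits outside the groups so each group is a plain number.
DEGREE_PAIR_RE = re.compile(r"\((-?\d+(?:\.\d+)?)°[ ,]+(-?\d+(?:\.\d+)?)°\)")


def parse_degree_pair(value):
    """Return the two floats of a "(x°, y°)" cell, or None if it does not match."""
    match = DEGREE_PAIR_RE.fullmatch(value.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


print(parse_degree_pair("(41.90643°, -87.703717°)"))
```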


Street:

1. [NAME] [AVE/BVD/DR/ST/RD]

Sample instance(s): ROOSEVELT RD, CICERO AVE, 23RD ST
Sample column: Street Name in Traffic Crashes - Crashes
Regex Pattern: \b[\w0-9]+ (?i)(AVE|BVD|DR|ST|RD)\b

Explanation: From left to right, it matches the first word (possibly containing digits), a space, and one of the listed street-type suffixes, matched case-insensitively.
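
The sketch below tries this pattern in Python. One caveat: Python 3.11 and later reject an inline `(?i)` that is not at the start of the pattern, so the sketch passes `re.IGNORECASE` instead, which makes the whole pattern case-insensitive. It also writes `[\w0-9]` as `\w`, since `\w` already covers digits. The helper name `is_street_name` and the choice of `fullmatch` are assumptions.

```python
import re

# Same alternatives as the pattern above, with the flag applied globally.
STREET_RE = re.compile(r"\b\w+ (?:AVE|BVD|DR|ST|RD)\b", re.IGNORECASE)


def is_street_name(value):
    """True if the whole cell looks like "[NAME] [suffix]"."""
    return STREET_RE.fullmatch(value.strip()) is not None


for sample in ["ROOSEVELT RD", "CICERO AVE", "23RD ST", "ROOSEVELT"]:
    print(sample, "->", is_street_name(sample))
```

Because the name part is a single `\w+` word, multi-word street names would not match this pattern as written.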


2. [int] [Directional indicator, allowing abbreviation] [strings]

Sample instance(s): 400 E. Lower Wacker, 800 W 111TH ST, 1958 North Milwaukee Avenue
Sample column(s): Towed to Address in Towed Vehicles,
Address in Red Light Camera Violations,
Address in Ward Offices
Regex Pattern: \b\d+\s*(?i)(N|NORTH|SOUTH|S|EAST|E|WEST|W)?(?:\.|\s)\s*(\w|\d| )+\b

Explanation: From left to right, it matches digits, direction (possibly abbreviated, with and without "."), and following strings.
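
Here is a minimal Python sketch that splits an address into its parts. It assumes the character after the direction is meant to be a literal "." or whitespace, uses `re.IGNORECASE` in place of the mid-pattern `(?i)`, and writes the trailing `(\w|\d| )+` as the equivalent `[\w ]+` to avoid overlapping alternatives. The helper name `split_address` and the extra capture groups are hypothetical additions.

```python
import re

# Groups: 1 = house number, 2 = optional direction, 3 = remaining street text.
ADDRESS_RE = re.compile(
    r"\b(\d+)\s*(N|NORTH|SOUTH|S|EAST|E|WEST|W)?(?:\.|\s)\s*([\w ]+)\b",
    re.IGNORECASE,
)


def split_address(value):
    """Return (number, direction or None, rest), or None if the cell does not match."""
    match = ADDRESS_RE.fullmatch(value.strip())
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


for sample in ["400 E. Lower Wacker", "800 W 111TH ST", "1958 North Milwaukee Avenue"]:
    print(sample, "->", split_address(sample))
```

Since the direction is optional, an address such as "123 State St" also matches, with the direction reported as `None`.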


ZIP Code:

1. [five-digit int]

Sample instance(s): 60634, 60637
Sample column: Zipcode in Traffic Crashes - People
Regex Pattern: \b\d{5}\b

Explanation: It matches a standalone five-digit number, delimited by word boundaries.
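
A short Python sketch of how this pattern behaves: with `search`, the word boundaries let it match the first five digits of a hyphenated value such as "60634-1234", whereas `fullmatch` accepts only cells that are exactly five digits. Which behavior is wanted is an open choice; the helper name `is_zip` is hypothetical.

```python
import re

ZIP_RE = re.compile(r"\b\d{5}\b")


def is_zip(value):
    """True only if the whole cell is exactly five digits."""
    return ZIP_RE.fullmatch(value.strip()) is not None


print(is_zip("60634"))                           # whole-cell check
print(ZIP_RE.search("60634-1234") is not None)   # substring check
print(ZIP_RE.search("123456") is not None)       # six digits, no boundary after five
```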


Region/State:

1. [two-letter str]

Sample instance(s): IL, NJ
Sample column: State in Traffic Crashes - People
Regex Pattern: \b[A-Z]{2}\b

Explanation: It matches two-letter strings. (I am worried that some columns (non-states) might also have the same pattern. Do we need to enumerate all 50 states?)
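
One possible way to address that concern, sketched in Python: keep the regex as a cheap first filter and then check matches against an explicit list of the 50 state abbreviations. The function name `looks_like_state_column` and the 0.9 threshold are hypothetical choices for illustration; DC and the territories are left out and would need adding if the data contains them.

```python
import re

TWO_LETTER_RE = re.compile(r"\b[A-Z]{2}\b")

# USPS abbreviations for the 50 states.
US_STATES = frozenset(
    """AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD
       MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC
       SD TN TX UT VT VA WA WV WI WY""".split()
)


def is_state(value):
    """True if the cell is a two-letter string that is also a known state code."""
    value = value.strip()
    return TWO_LETTER_RE.fullmatch(value) is not None and value in US_STATES


def looks_like_state_column(values, threshold=0.9):
    """Hypothetical column-level check: share of cells that are state codes."""
    values = list(values)
    if not values:
        return False
    hits = sum(1 for v in values if is_state(v))
    return hits / len(values) >= threshold


print(looks_like_state_column(["IL", "NJ", "IL", "IN"]))
print(looks_like_state_column(["AM", "PM", "AM", "PM"]))
```

Values such as "AM" or "PM" pass the two-letter regex but fail the list check, which is the kind of non-state column the regex alone cannot rule out.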