Below are the granularities I found in spatial columns from the Chicago Data Portal. Each one is described in the following format:
- [PATTERN 1]
- Sample instance(s): [SAMPLE INSTANCE(s)]
- Sample column: Link to Column
- Regex Pattern: [REGEX]
- Explanation: [Explanation of the pattern, from left to right]
...
1. POINT(float [',' or ' ' or ', '] float)
- Sample instance(s): POINT(41.919365236,-87.769726946), POINT(41.919365236 -87.769726946)
- Sample column(s): Location in Traffic Crashes - Crashes, Pickup Centroid Location in Taxi Trips
- Regex Pattern: POINT\(([-+]?[0-9]*\.?[0-9]+)[ ,]+([-+]?[0-9]*\.?[0-9]+)\)
- Explanation: From left to right, it matches "POINT", "(", the first float, a separator (",", " ", or ", "), the second float, and ")". Both floats allow an optional +/- sign.
2. (float° [',' or ' ' or ', '] float°)
- Sample instance(s): (41.90643°, -87.703717°)
- Sample column(s): Location in Beach Lab Data
- Regex Pattern: \((-?\d+(\.\d+)?°)[ ,]+(-?\d+(\.\d+)?°)\)
- Explanation: From left to right, it matches "(", the first float with "°" and an optional "-" sign, a separator (",", " ", or ", "), the second float with "°" and an optional "-" sign, and ")".
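A minimal Python sketch of how the two coordinate patterns above could be used to pull out the numeric pair. It assumes the literal parentheses and the decimal point are escaped, and it moves "°" outside the capture groups so the captured text can be passed to float(). The names POINT_RE, DEGREE_RE, and parse_pair are hypothetical, not part of any existing code.

```python
import re

# Assumed escaped forms of the two coordinate patterns (hypothetical names).
POINT_RE = re.compile(r"POINT\(([-+]?[0-9]*\.?[0-9]+)[ ,]+([-+]?[0-9]*\.?[0-9]+)\)")
# Variant of pattern 2 with the degree sign outside the capture groups.
DEGREE_RE = re.compile(r"\((-?\d+(?:\.\d+)?)°[ ,]+(-?\d+(?:\.\d+)?)°\)")


def parse_pair(text):
    """Return the two floats in a coordinate string, or None if neither pattern fits."""
    for pattern in (POINT_RE, DEGREE_RE):
        match = pattern.fullmatch(text.strip())
        if match:
            return float(match.group(1)), float(match.group(2))
    return None


print(parse_pair("POINT(41.919365236 -87.769726946)"))
print(parse_pair("(41.90643°, -87.703717°)"))
```

fullmatch is used here on the assumption that each cell holds exactly one coordinate pair; search would be the alternative if the pair is embedded in longer text.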
1. [NAME] [AVE/BLVD/DR/ST/RD]
- Sample instance(s): ROOSEVELT RD, CICERO AVE, 23RD ST
- Sample column: Street Name in Traffic Crashes - Crashes
- Regex Pattern: (?i)\b\w+ (AVE|BLVD|DR|ST|RD)\b
- Explanation: From left to right, it matches one word (possibly containing digits), a space, and a street suffix. The leading (?i) makes the match case-insensitive; it sits at the start because some engines (e.g. Python 3.11+) reject a global flag in the middle of a pattern.
2. [int] [Directional indicator, allowing abbreviation] [strings]
- Sample instance(s): 400 E. Lower Wacker, 800 W 111TH ST, 1958 North Milwaukee Avenue
- Sample column(s): Towed to Address in Towed Vehicles, Address in Red Light Camera Violations, Address in Ward Offices
- Regex Pattern: (?i)\b\d+\s*(N|NORTH|SOUTH|S|EAST|E|WEST|W)?(?:\.|\s)\s*([\w ]+)\b
- Explanation: From left to right, it matches the street number, an optional direction (full or abbreviated, case-insensitive, with or without a trailing "."), and the remaining words of the address.
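A rough Python sketch of the street-name and address patterns above, checked against the sample instances. It assumes re.IGNORECASE in place of the inline (?i) flag, an escaped "." after the direction, and BLVD as the boulevard abbreviation. STREET_RE, ADDRESS_RE, and split_address are hypothetical names.

```python
import re

# Street name: one word plus a suffix (suffix list is an assumption).
STREET_RE = re.compile(r"\b\w+ (AVE|BLVD|DR|ST|RD)\b", re.IGNORECASE)

# Address: number, optional direction, then the rest of the string.
ADDRESS_RE = re.compile(
    r"\b(\d+)\s*(NORTH|SOUTH|EAST|WEST|N|S|E|W)?(?:\.|\s)\s*([\w ]+)\b",
    re.IGNORECASE,
)


def split_address(text):
    """Return (number, direction or None, rest) if the whole string is an address."""
    match = ADDRESS_RE.fullmatch(text.strip())
    return match.groups() if match else None


for sample in ("400 E. Lower Wacker", "800 W 111TH ST", "1958 North Milwaukee Avenue"):
    print(split_address(sample))
```

Note that the street-name pattern only covers a single word before the suffix, so a multi-word name such as "Lake Shore Dr" would match only as "Shore Dr" when used with search.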
1. [five-digit int]
- Sample instance(s): 60634, 60637
- Sample column: Zipcode in Traffic Crashes - People
- Regex Pattern: \b\d{5}\b
- Explanation: It matches any standalone 5-digit number.
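Because \b\d{5}\b also fires on 5-digit numbers inside longer values, one possible way to use it for column detection is to require a full match per cell and look at the share of cells that pass. The sketch below assumes cell values arrive as strings; match_rate and ZIP_RE are hypothetical names.

```python
import re

ZIP_RE = re.compile(r"\d{5}")


def match_rate(values, pattern):
    """Fraction of non-empty values that the pattern matches in full."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return 0.0
    hits = sum(1 for v in cleaned if pattern.fullmatch(v))
    return hits / len(cleaned)


# The word-boundary version still finds a "zipcode" inside other values.
print(re.findall(r"\b\d{5}\b", "60634-1234"))
print(re.findall(r"\b\d{5}\b", "12345 S STATE ST"))

# Full-match rate over a column sample.
print(match_rate(["60634", "60637", " 60601 ", ""], ZIP_RE))
```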
1. [two-letter str]
- Sample instance(s): IL, NJ
- Sample column: State in Traffic Crashes - People
- Regex Pattern: \b[A-Z]{2}\b
- Explanation: It matches any two-letter uppercase string. (I am worried that some non-state columns might share the same pattern. Do we need to enumerate all 50 states?)
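On the open question above, a small sketch of what enumerating the states could look like, for comparison with the generic two-letter pattern. It assumes only the 50 state codes are wanted (DC and territories are left out), and US_STATES and looks_like_state are hypothetical names.

```python
import re

# The 50 two-letter state codes; DC and territories deliberately excluded (assumption).
US_STATES = frozenset(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY".split()
)

TWO_LETTER_RE = re.compile(r"[A-Z]{2}")


def looks_like_state(value):
    """True if the stripped value is one of the enumerated state codes."""
    return value.strip() in US_STATES


for sample in ("IL", "NJ", "NO", "OK"):
    print(sample, bool(TWO_LETTER_RE.fullmatch(sample)), looks_like_state(sample))
```

Enumerating rejects values such as "NO" that the generic pattern accepts, but codes like "OK" or "ID" can still appear in non-state columns, so it narrows the problem rather than removing it.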