Skip to content

Instantly share code, notes, and snippets.

@luthfibalaka
Last active March 3, 2024 07:35
Show Gist options
  • Save luthfibalaka/4845ba0085d6ff5bb222a10112e78b4b to your computer and use it in GitHub Desktop.
Save luthfibalaka/4845ba0085d6ff5bb222a10112e78b4b to your computer and use it in GitHub Desktop.

A. Introduction

Below are the granularities I found in spatial columns from the Chicago Data Portal. Each one of them is organized as follows:

[GRANULARITY]:

  1. [PATTERN 1]
  • Sample instance(s): [SAMPLE INSTANCE(s)]
  • Sample column: Link to Column
  • Regex Pattern: [REGEX]
    • Explanation: [Explanation of the pattern, from left to right]

...


B.Granularities

Geographic Coordinates:

1. POINT(float [',' or ' ' or ', '] float)

  • Sample instance(s): POINT(41.919365236,-87.769726946), POINT(41.919365236 -87.769726946)
  • Sample column(s): Location in Traffic Crashes - Crashes, Pickup Centroid Location in Taxi Trips
  • Regex Pattern: POINT(([-+]?[0-9].?[0-9]+)[ ,]+([-+]?[0-9].?[0-9]+))
    • Explanation: From left to right, it matches "POINT", "(", first float, [",", " ", or ", "], second float, and ")". Both float allows an optional +/- sign.

2. (float° [',' or ' ' or ', '] float°)

  • Sample instance(s): (41.90643°, -87.703717°)
  • Sample column(s): Location in Beach Lab Data
  • Regex Pattern: ((-?\d+(.\d+)?°)[ ,]+(-?\d+(.\d+)?°))
    • Explanation: From left to right, it matches "(", first float with "°" & optional "+/-", [",", " ", or ", "], second float with "°" & optional "+/-", and ")".

Street:

1. [NAME] [AVE/BVD/DR/ST/RD]

  • Sample instance(s): ROOSEVELT RD, CICERO AVE, 23RD ST
  • Sample column: Street Name in Traffic Crashes - Crashes
  • Regex Pattern: \b[\w0-9]+ (?i)(AVE|BVD|DR|ST|RD)\b
    • Explanation: From left to right, it matches the first word (possibly with digits), followed by (case insensitive) suffixes.

2. [int] [Directional indicator, allowing abbreviation] [strings]

ZIP Code:

1. [five-digit int]

Region/State:

1. [two-letter str]

  • Sample instance(s): IL, NJ
  • Sample column: State in Traffic Crashes - People
  • Regex Pattern: \b[A-Z]{2}\b
    • Explanation: It matches two-letter strings. (I am worried that some columns (non-states) might also have the same pattern. Do we need to enumerate all 50 states?)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment