@chriscalip
Last active September 18, 2019 21:43
Proposed draft for an address parser built as a conveyor belt of processors.
<?php
/**
 * Description:
 *
 * Draft process for extracting structured address components (street, street2, city, state, zipcode, country, building) from an unstructured address string.
 *
 * Legend:
 *
 * Address String - an unstructured address string, e.g. "123 Main St, Chicago, IL, 60140" or "123 Main Street Chicago Illinois 60640-1242".
 * Address Components - structured data representing an address (street, street2, city, state, zipcode, country, building); the required input of Address Standardizers.
 * Standardized Address Components - similar to Address Components, but usually sourced from commercial USPS databases; includes the POSTNET barcode.
 * Parser - processor that accepts an Address String and, if it succeeds, returns Address Components.
 * Preparser - processor that accepts an Address String and returns a rewritten Address String that is easier for parsers to handle.
 * Standardizer - processor that commonly accepts Address Components and returns USPS-standardized address components (usually including a POSTNET barcode).
 * Parser Cache - custom parser that accepts an Address String and returns both Address Components and Standardized Address Components, using a cache of previous attempts.
 *
 * Pseudo Code Scenarios:
 *
 * - A Parser accepts an Address String and returns Address Components (successful process).
 *
 * - A Parser accepts an Address String but fails to parse Address Components from it; the Parser returns empty (null).
 *
 * - A collection of parsers tries to process an Address String and returns Address Components.
 *
 * - Given that previous attempts with a collection of parsers failed, try again to get Address Components from an Address String:
 *   generate a collection of modified Address Strings from the original via a collection of Preparsers,
 *   then run each modified Address String through the collection of parsers.
 *
 */
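// Stub parser: always "succeeds", returning the full set of (empty) Address Components.
$parserA = function (string $address): ?array {
    $addressComponents = [
        'street' => '',
        'street2' => '',
        'city' => '',
        'state' => '',
        'zipcode' => '',
        'country' => '',
        'building' => '',
    ];
    return $addressComponents;
};
// Stub parser: always fails to parse and returns null.
$parserB = function (string $address): ?array {
    return null;
};
// Stub preparsers: return a rewritten Address String, or null when they have nothing to offer.
$preParserA = function (string $address): ?string {
    $output = null;
    return $output;
};
$preParserB = function (string $address): ?string {
    return null;
};
// Parsers are tried in order: two failing attempts ($parserB) precede a successful one ($parserA).
$parsers = [$parserB, $parserB, $parserA];
$preParsers = [
    $preParserA,
    $preParserB,
    // Stub preparser that always rewrites the input to a well-formed Address String.
    function (string $address): ?string {
        return '123 Main St, Chicago, IL, 60640';
    },
];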
// $records = ['123 Main St, Chicago, IL, 60640'];
$records = [
    '123 Main St, Chicago, IL, 60640',
    '123 Main St; Chicago; IL; 60640',
    '123 N Beech Rd, Osceola, IN 46561, USA',
    '123 E. Washington Street, Suite 123, Athens, GA, 30601',
    'Some Department, 123 W Romeo Rd, Romeoville, IL 60446, USA',
    'Some County Court 123 Dr. Martin Luther King Jr. Blvd., White Plains, New York, 10601',
    "123 Girard Street\nBellingham WA 98225", // a line break separates street and city
    '123 Girard Street Bellingham WA 98225',
    'Some Health Department 123 Girard Street Bellingham WA 98225',
    'Some Health Department 123 Girard Street Bellingham Washington 98225',
];
$hits = $fails = [];
// Scenario A: a Parser accepts an Address String and returns Address Components (successful process).
$result = $parserA('123 Main St, Chicago, IL, 60640');
// End of Scenario A.
// Scenario B: a collection of parsers tries to process an Address String and returns Address Components.
$addressString = $records[0] ?? '';
$results = [];
// Every parser's output is collected under the input string; failed attempts appear as null.
foreach ($parsers as $parser) {
    $results[$addressString][] = $parser($addressString);
}
// End of Scenario B.
// Scenario C: given that previous attempts with a collection of parsers failed, try again to get Address Components from an Address String.
// Generate a collection of modified Address Strings from the original via a collection of Preparsers,
// then run each modified Address String through the collection of parsers.
$addressString = '60640, Chicago, IL, 123 Main St.,'; // components in reverse order
$results = $preParsedAttempts = [];
foreach ($preParsers as $preParser) {
    $preParsedAttempts[$addressString][] = $preParser($addressString);
}
// Discard preparsers that produced nothing (array_filter drops falsy values such as null and '').
$preParsedAttempts[$addressString] = array_filter($preParsedAttempts[$addressString]);
foreach ($preParsedAttempts as $initialAddressString => $candidateStrings) {
    foreach ($candidateStrings as $candidateString) {
        // Reuse the same parser collection as Scenario B.
        foreach ($parsers as $parser) {
            $results[$initialAddressString][] = $parser($candidateString);
        }
    }
}
// End of Scenario C.
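The draft's `$parserA` and `$parserB` are stubs. As a rough illustration of what a real Parser on the belt might look like, here is a minimal sketch that assumes the input follows one narrow "street, city, ST, zip" layout. `$commaParser` and its regular expression are hypothetical, not part of the draft. It returns the same component keys as `$parserA`, and null when the string does not fit, so the next parser can take over.

```php
<?php
/**
 * Hypothetical Parser for one narrow format only:
 * "street, city, ST, zip", e.g. "123 Main St, Chicago, IL, 60640".
 * Returns the draft's component shape, or null when the format does not match.
 */
$commaParser = function (string $address): ?array {
    // street, city, two-letter state, optional comma, 5-digit or ZIP+4 zipcode
    $pattern = '/^\s*([^,]+),\s*([^,]+),\s*([A-Z]{2}),?\s*(\d{5}(?:-\d{4})?)\s*$/';
    if (preg_match($pattern, $address, $m) !== 1) {
        return null; // not this parser's format; let the next parser try
    }
    return [
        'street' => trim($m[1]),
        'street2' => '',
        'city' => trim($m[2]),
        'state' => $m[3],
        'zipcode' => $m[4],
        'country' => '',
        'building' => '',
    ];
};

$parsed = $commaParser('123 Main St, Chicago, IL, 60640');
```

A trailing ", USA" or the reversed string from Scenario C falls outside this pattern, which is exactly the kind of failure the preparser step is meant to recover from.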
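Several of the sample `$records` differ from the comma-separated layout only in their separators (semicolons, a line break). A Preparser for that case could look like the following sketch. `$normalizeSeparators` is a hypothetical name, and treating `;` and line breaks as commas is an assumption about what downstream parsers expect. Returning null for input that needs no rewrite fits the `?string` preparser contract and the `array_filter` step in Scenario C.

```php
<?php
/**
 * Hypothetical Preparser: rewrites separator variants into ", " so that a
 * comma-based parser sees one consistent format. Returns null when the input
 * is already normalized (nothing useful to offer).
 */
$normalizeSeparators = function (string $address): ?string {
    // Treat semicolons and line breaks (plus surrounding whitespace) as commas.
    $normalized = preg_replace('/\s*(?:;|\R)\s*/', ', ', trim($address));
    // Collapse any remaining runs of whitespace to a single space.
    $normalized = preg_replace('/\s+/', ' ', $normalized);
    return $normalized === $address ? null : $normalized;
};

$rewritten = $normalizeSeparators('123 Main St; Chicago; IL; 60640');
```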
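Scenario B runs every parser and keeps all results. If the belt should instead stop at the first parser that succeeds, a small helper along these lines might do. `parseWithFirstHit` and the two stand-in closures are hypothetical; the helper treats only null as failure, matching `$parserB`.

```php
<?php
/**
 * Hypothetical helper: runs parsers in order and returns the first non-null
 * result, or null when every parser fails.
 *
 * @param callable[] $parsers each callable(string): ?array
 */
function parseWithFirstHit(array $parsers, string $address): ?array
{
    foreach ($parsers as $parser) {
        $components = $parser($address);
        if ($components !== null) {
            return $components; // stop the belt at the first hit
        }
    }
    return null;
}

// Stand-ins: one parser shaped like the draft's $parserB (always fails),
// one that always succeeds.
$alwaysFails = fn (string $address): ?array => null;
$alwaysSucceeds = fn (string $address): ?array => ['street' => $address];

$hit = parseWithFirstHit([$alwaysFails, $alwaysFails, $alwaysSucceeds], '123 Main St');
```

Compared with collecting every result, this skips parsers that would only be consulted after a success, at the cost of losing their output for comparison.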
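Scenario C can likewise be folded into a single function. This sketch, using the hypothetical name `parseWithPreparsers`, tries the original string first and then each distinct non-null preparser rewrite, returning the first parser hit. Calling every preparser up front, rather than lazily, is a simplification.

```php
<?php
/**
 * Hypothetical helper for Scenario C.
 * Candidates are the original string followed by each distinct non-null
 * preparser output; they are tried in that order, first parser hit wins.
 *
 * @param callable[] $preParsers each callable(string): ?string
 * @param callable[] $parsers    each callable(string): ?array
 */
function parseWithPreparsers(array $preParsers, array $parsers, string $address): ?array
{
    $candidates = [$address];
    foreach ($preParsers as $preParser) {
        $rewritten = $preParser($address);
        if ($rewritten !== null && !in_array($rewritten, $candidates, true)) {
            $candidates[] = $rewritten;
        }
    }
    foreach ($candidates as $candidate) {
        foreach ($parsers as $parser) {
            $components = $parser($candidate);
            if ($components !== null) {
                return $components;
            }
        }
    }
    return null;
}

// Stand-ins: a parser that rejects semicolons, and a preparser that fixes them.
$commaOnlyParser = fn (string $a): ?array => strpos($a, ';') === false ? ['raw' => $a] : null;
$semicolonToComma = fn (string $a): ?string => strpos($a, ';') === false ? null : str_replace(';', ',', $a);

$result = parseWithPreparsers([$semicolonToComma], [$commaOnlyParser], '123 Main St; Chicago; IL; 60640');
```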
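The legend's Parser Cache is not implemented in the draft. One possible shape is a wrapper that memoizes both parser and standardizer output. It assumes an in-memory array is an acceptable cache and that keys may be normalized for whitespace and case. `ParserCache` and its stand-in callables are hypothetical; a real Standardizer would typically rely on USPS data rather than the toy uppercasing used here.

```php
<?php
/**
 * Hypothetical in-memory Parser Cache: wraps a parser and a standardizer and
 * memoizes both outputs per address string, including failed attempts.
 */
final class ParserCache
{
    /** @var array<string, array{components: ?array, standardized: ?array}> */
    private array $entries = [];

    /** @var callable(string): ?array */
    private $parser;

    /** @var callable(array): ?array */
    private $standardizer;

    public function __construct(callable $parser, callable $standardizer)
    {
        $this->parser = $parser;
        $this->standardizer = $standardizer;
    }

    /** @return array{components: ?array, standardized: ?array} */
    public function parse(string $address): array
    {
        // Assumption: whitespace and case differences should share one entry.
        $key = strtolower(preg_replace('/\s+/', ' ', trim($address)));
        if (!array_key_exists($key, $this->entries)) {
            $components = ($this->parser)($address);
            $standardized = $components === null ? null : ($this->standardizer)($components);
            $this->entries[$key] = ['components' => $components, 'standardized' => $standardized];
        }
        return $this->entries[$key];
    }
}

// Stand-ins that count parser calls so cache hits are observable.
$calls = 0;
$parser = function (string $a) use (&$calls): ?array {
    $calls++;
    return ['street' => $a];
};
$standardizer = fn (array $c): ?array => ['street' => strtoupper($c['street'])];

$cache = new ParserCache($parser, $standardizer);
$first = $cache->parse('123 Main St');
$second = $cache->parse('  123  main st ');
```

Failed parses are cached too (with null components), so a string already known to fail is not re-parsed on the next attempt.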