@jpoehls
Created April 28, 2015 14:49
Example based parsing in V5
# In collaboration with Microsoft Research, a new ConvertFrom-String cmdlet has been added.
# This cmdlet supports two modes: basic delimited parsing and auto-generated example-driven parsing.
# Delimited parsing, by default, splits the input at white space and assigns property names (P1, P2, and so on) to the resulting groups. You can customize the delimiter with the -Delimiter parameter. With the default delimiter:
1 [C:\temp]
>> "Hello World" | ConvertFrom-String | Format-Table -Auto
P1    P2
--    --
Hello World
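# As a minimal sketch of customizing the delimiter: the -Delimiter parameter (interpreted as a regular expression) replaces the white-space split, and the -PropertyNames parameter replaces the default P1, P2 names. The input string and the property names Greeting and Target are illustrative only and do not come from the original example.

```powershell
# Split on a comma instead of white space and name the resulting properties.
# 'Greeting' and 'Target' are hypothetical names chosen for illustration.
$record = "Hello,World" | ConvertFrom-String -Delimiter "," -PropertyNames Greeting, Target

# Show the parsed object with its custom property names.
$record | Format-Table -Auto
```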
# The cmdlet also supports auto-generated example-driven parsing based on the FlashExtract research work in Microsoft Research.
# To get started, consider a text-based address book:
# Ana Trujillo
# Redmond, WA
#
# Antonio Moreno
# Renton, WA
#
# Thomas Hardy
# Seattle, WA
#
# Christina Berglund
# Redmond, WA
#
# Hanna Moos
# Puyallup, WA
# Copy a few examples into a file, which you will use as your template:
# Ana Trujillo
# Redmond, WA
#
# Antonio Moreno
# Renton, WA
# Put curly braces around each piece of data that you want to extract, and give it a name. Because the Name property (and the properties associated with it) can appear multiple times, mark it with an asterisk (*) to indicate that each occurrence starts a new record, rather than all of the properties being extracted into a single record:
# {Name*:Ana Trujillo}
# {City:Redmond}, {State:WA}
#
# {Name*:Antonio Moreno}
# {City:Renton}, {State:WA}
# From this set of examples, ConvertFrom-String can now automatically extract object-based output from input files with similar structure.
2 [C:\temp]
>> Get-Content .\addresses.output.txt | ConvertFrom-String -TemplateFile .\addresses.template.txt |
>>> Format-Table -Auto
ExtentText            Name               City     State
----------            ----               ----     -----
Ana Trujillo...       Ana Trujillo       Redmond  WA
Antonio Moreno...     Antonio Moreno     Renton   WA
Thomas Hardy...       Thomas Hardy       Seattle  WA
Christina Berglund... Christina Berglund Redmond  WA
Hanna Moos...         Hanna Moos         Puyallup WA
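# The same extraction can be sketched without separate files, assuming the -TemplateContent parameter, which takes the template as a string instead of a file path. The variable names $template, $addresses, and $people are illustrative, and the input is a shortened copy of the address book above. How well the records are learned depends on the examples in the template, so treat this as a starting point rather than a guaranteed result.

```powershell
# Template: the two annotated examples from above, held in a here-string.
$template = @'
{Name*:Ana Trujillo}
{City:Redmond}, {State:WA}

{Name*:Antonio Moreno}
{City:Renton}, {State:WA}
'@

# Input text with the same structure as the template examples.
$addresses = @'
Ana Trujillo
Redmond, WA

Antonio Moreno
Renton, WA

Thomas Hardy
Seattle, WA
'@

# Parse the input using the in-memory template and keep the resulting objects.
$people = $addresses | ConvertFrom-String -TemplateContent $template

# Display only the named properties from the template.
$people | Format-Table Name, City, State -Auto
```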
# To do additional data manipulation on extracted text, the ExtentText property captures the raw text from which the record was extracted. To provide feedback on this feature, or to share content that you are having difficulty writing examples for, please email psdmfb@microsoft.com.