Gist jpoehls/3e728aec67015016c7d3, created April 28, 2015 14:49
Example-based parsing in PowerShell V5
# In collaboration with Microsoft Research, a new ConvertFrom-String cmdlet has been added.
# This cmdlet supports two modes: basic delimited parsing and auto-generated, example-driven parsing.
# By default, delimited parsing splits the input at white space and assigns the property names P1, P2, and so on to the resulting groups. You can customize the delimiter with the -Delimiter parameter.
1 [C:\temp]
>> "Hello World" | ConvertFrom-String | Format-Table -Auto
P1    P2
--    --
Hello World
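# The following is a minimal sketch of customizing delimited parsing. It assumes Windows PowerShell 5.x, where ConvertFrom-String is available. -Delimiter takes a regular expression, and -PropertyNames replaces the default P1, P2, ... names. The comma-separated sample line is made up from the address book used below.

```powershell
# Split on a comma plus any following white space instead of the default white space.
# -Delimiter is a regular expression; -PropertyNames replaces the default P1, P2, ... names.
$record = "Ana Trujillo, Redmond, WA" |
    ConvertFrom-String -Delimiter ',\s*' -PropertyNames Name, City, State

$record | Format-Table -AutoSize
```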
# The cmdlet also supports auto-generated, example-driven parsing based on FlashExtract, a research project at Microsoft Research.
# To get started, consider a text-based address book:
# Ana Trujillo
# Redmond, WA
#
# Antonio Moreno
# Renton, WA
#
# Thomas Hardy
# Seattle, WA
#
# Christina Berglund
# Redmond, WA
#
# Hanna Moos
# Puyallup, WA
# Copy a few of these records into a file, which you will use as your template:
# Ana Trujillo
# Redmond, WA
#
# Antonio Moreno
# Renton, WA
# Put curly braces around each piece of data that you want to extract, and give it a property name as you do so ({PropertyName:value}). Because the Name property (along with the City and State properties that go with it) appears multiple times, add an asterisk (*) to it. The asterisk indicates that the input yields multiple records, rather than one record holding many properties:
# {Name*:Ana Trujillo}
# {City:Redmond}, {State:WA}
#
# {Name*:Antonio Moreno}
# {City:Renton}, {State:WA}
# From this set of examples, ConvertFrom-String can now automatically extract objects from input files that have a similar structure:
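# To try the template walk-through yourself, first create the two files read by the transcript command. This sketch copies the address book and template above into .\addresses.output.txt and .\addresses.template.txt, which are the file names used in the transcript. Nothing here calls ConvertFrom-String.

```powershell
# Sample input: the full address book, with records separated by blank lines.
@'
Ana Trujillo
Redmond, WA

Antonio Moreno
Renton, WA

Thomas Hardy
Seattle, WA

Christina Berglund
Redmond, WA

Hanna Moos
Puyallup, WA
'@ | Set-Content -Path .\addresses.output.txt

# Template: the first two records, with each field wrapped in {PropertyName:value}.
# The asterisk on Name tells ConvertFrom-String that each Name starts a new record.
@'
{Name*:Ana Trujillo}
{City:Redmond}, {State:WA}

{Name*:Antonio Moreno}
{City:Renton}, {State:WA}
'@ | Set-Content -Path .\addresses.template.txt
```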
2 [C:\temp]
>> Get-Content .\addresses.output.txt | ConvertFrom-String -TemplateFile .\addresses.template.txt |
>>> Format-Table -Auto
ExtentText            Name               City     State
----------            ----               ----     -----
Ana Trujillo...       Ana Trujillo       Redmond  WA
Antonio Moreno...     Antonio Moreno     Renton   WA
Thomas Hardy...       Thomas Hardy       Seattle  WA
Christina Berglund... Christina Berglund Redmond  WA
Hanna Moos...         Hanna Moos         Puyallup WA
# The ExtentText property captures the raw text from which each record was extracted, so you can do additional data manipulation on that text. To provide feedback on this feature, or to share content that you are having difficulty writing examples for, please email psdmfb@microsoft.com.
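# The following sketches how to read ExtentText. It assumes Windows PowerShell 5.x and makes two further assumptions. First, the template is passed inline with -TemplateContent instead of -TemplateFile. Second, -IncludeExtent is specified, because the released cmdlet drops ExtentText unless that switch is given (the preview shown above emitted it by default). The input is the last three records of the address book, so none of them appear in the template.

```powershell
# Same template as addresses.template.txt, held in a variable instead of a file.
$template = @'
{Name*:Ana Trujillo}
{City:Redmond}, {State:WA}

{Name*:Antonio Moreno}
{City:Renton}, {State:WA}
'@

# Records that are not part of the template.
$addresses = @'
Thomas Hardy
Seattle, WA

Christina Berglund
Redmond, WA

Hanna Moos
Puyallup, WA
'@

# -IncludeExtent keeps the ExtentText property on each output object.
$records = $addresses | ConvertFrom-String -TemplateContent $template -IncludeExtent

foreach ($record in $records) {
    # Collapse the line breaks in the raw text so each record prints on one line.
    '{0}: {1}' -f $record.Name, ($record.ExtentText -replace '\s+', ' ').Trim()
}
```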