@senthilsweb
Created March 28, 2021 02:35

lambda.data.extractor.regex

Introduction

Installation Instructions

Developer and End-User Guide

Tips and tricks

  1. Follow the instructions in the simple sample .yml in the Examples section. It has self-explanatory inline comments
  2. Validate the final .yml file with a YAML validator
  3. Use an online RegEx tool to create and test the regular expressions
  4. Follow the file naming convention vendorname_templatename_NN.yml
  5. Prefix every custom field with bp_. All other fields (amount, date, invoice_number) should be treated as dummy values (a workaround for known issues in the underlying library)
  6. During development, keep the .yml file in a designated S3 bucket folder. The path always follows the pattern [bucketname]/templates/[vendorname]/your.yml
  7. Once the .yml file has been tested and certified as working, commit it to source control. The folder structure should match the S3 bucket
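Tips 4 to 6 can be checked mechanically before a template is uploaded. The sketch below is a hypothetical helper, not part of the Lambda function. It assumes that vendor and template names use only letters, digits, or hyphens (no underscores), that NN is exactly two digits, and that amount, date, and invoice_number are the only dummy fields, as in the example template.

```python
import re

# Assumed reading of tip 4: vendorname_templatename_NN.yml, where the names use
# letters, digits, or hyphens (no underscores) and NN is exactly two digits.
TEMPLATE_FILE_NAME = re.compile(
    r"^(?P<vendor>[A-Za-z0-9-]+)_(?P<template>[A-Za-z0-9-]+)_(?P<nn>\d{2})\.yml$"
)

# Fields that only carry dummy values in the example template.
DUMMY_FIELDS = {"amount", "date", "invoice_number"}


def template_key(vendor_folder: str, file_name: str) -> str:
    """Build the S3 object key templates/<vendor>/<file> for a template file (tip 6)."""
    if not TEMPLATE_FILE_NAME.match(file_name):
        raise ValueError(f"{file_name!r} does not follow vendorname_templatename_NN.yml")
    return f"templates/{vendor_folder}/{file_name}"


def unprefixed_fields(field_names):
    """Return custom field names that are missing the bp_ prefix (tip 5)."""
    return [
        name
        for name in field_names
        if name not in DUMMY_FIELDS and not name.startswith("bp_")
    ]


if __name__ == "__main__":
    print(template_key("BREB", "breb_ebbill_01.yml"))
    print(unprefixed_fields(["amount", "date", "invoice_number", "bp_token", "department"]))
```

template_key returns only the key inside the bucket. The bucket name itself is sent separately as bucketName in the request body.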

This lambda function

Examples

Simple example # 1

The same example is available in the root of this Git repository.

Raw example text

"\"DHDHMI6\\nBangladesh Rural\\nElectrification Board.\\nBREB\\nDate 2020-01-05 16:34:48\\nNo 0000000132028173\\nMetar No.: 040530001029\\nCustomer\\n010513063921627\\nNo.\\nYedot-Co\\nCustomer\\nName:\\nBangladesh Co.\\nDepartment: Salma\\nOperator\\nSalma\\nSequence\\n9\\nLtd.\\n37859.95 TK\\n3675.72kWh\\n250 TK\\nEnergy\\nCost:\\nEnergy\\n(10.3/kWh):\\nMeter Rent-\\n3P\\n(250/month):\\nDemand\\nCharge\\n(30/kW):\\nVAT(5%):\\nRebate{ 1%):\\n\\u0410\\u0442\\u0435\\u0430\\u0433\\nRecovery:\\n360 TK\\n1904.76 TK\\n-374.71 TK\\nOTK\\n40000 TK\\nGross\\nAmount:\\n5313-7505-7686-7027-7399\\nPlease press Enter after each 20-0GILS\\nToken chan continue to another new Token\\n\" [*****->Dynamic content (start)] GrandTotal:319.00 InvoiceDate:20-10-2015 InvoiceNo:#BLR_WFLD20151000982590 [<-*****Dynamic content (end)]"
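The sample text ends with the dummy line that the template depends on (see the comment at the end of the .yml file). When OCR output does not already contain that line, a helper like the hypothetical with_dummy_line below could append it before the text is sent. This is a sketch under that assumption. The values on the line are placeholders copied from the sample.

```python
# Dummy line copied from the sample; its values are placeholders, not real data.
DUMMY_LINE = (
    "[*****->Dynamic content (start)] GrandTotal:319.00 "
    "InvoiceDate:20-10-2015 InvoiceNo:#BLR_WFLD20151000982590 "
    "[<-*****Dynamic content (end)]"
)


def with_dummy_line(raw_text: str) -> str:
    """Append DUMMY_LINE (separated by a space) unless it is already present."""
    if DUMMY_LINE in raw_text:
        return raw_text
    return f"{raw_text} {DUMMY_LINE}" if raw_text else DUMMY_LINE


if __name__ == "__main__":
    print(with_dummy_line("Department: Salma"))
```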

RegEx patterns in the `.yml` file

# Mandatory field 1: hard-code the name of the vendor/supplier
issuer: BREB
fields:
  # Mandatory field 2: always hard-code the RegEx pattern below
  amount: GrandTotal:(\d+\.\d+)
  # Mandatory field 3: always hard-code the RegEx pattern below
  date: InvoiceDate:(\d{1,4}\-\d{1,2}\-\d{1,4})
  # Mandatory field 4: always hard-code the RegEx pattern below
  invoice_number: InvoiceNo:(\S+)
  # The actual field extraction regular expressions start here.
  # Tips and key points:
  #    Prefix each field name with "bp_"
  #    If a pattern has no match, the attribute is omitted from the result
  bp_department: (?s)\\nDepartment:(.*)\\nOperator
  bp_token: \d{4}\-\d{4}\-\d{4}\-\d{4}
# "keywords" is mandatory. Add one or more keywords that uniquely identify the bills or documents in question.
keywords:
  - BREB
options:
  remove_whitespace: false

#---------------------------------------------
# Important: for this template to work, the dummy one-line string below must be present in the text to which the RegEx patterns are applied
#[*****->Dynamic content (start)] GrandTotal:319.00 InvoiceDate:20-10-2015 InvoiceNo:#BLR_WFLD20151000982590 [<-*****Dynamic content (end)]
#---------------------------------------------
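Following tip 3, the patterns can also be tested locally before the template is uploaded. The sketch below approximates field extraction with Python's re.findall. It keeps the first match and omits fields that do not match, as described in the template comments. This is an assumption, not invoice2data's exact code, which also parses amounts and dates. SAMPLE_TEXT is a shortened excerpt of the raw example text. In that text, each "\n" is a literal backslash followed by "n", which is why the template uses \\n in bp_department.

```python
import re

# Shortened excerpt of the raw example text. Raw strings keep each "\n" as a
# literal backslash followed by "n", matching the sample data.
SAMPLE_TEXT = (
    r'"DHDHMI6\nBangladesh Rural\nElectrification Board.\nBREB\n'
    r'Department: Salma\nOperator\nSalma\nSequence\n9\n'
    r'5313-7505-7686-7027-7399\n" '
    r'[*****->Dynamic content (start)] GrandTotal:319.00 '
    r'InvoiceDate:20-10-2015 InvoiceNo:#BLR_WFLD20151000982590 '
    r'[<-*****Dynamic content (end)]'
)

# The same patterns as in the .yml template.
PATTERNS = {
    "amount": r"GrandTotal:(\d+\.\d+)",
    "date": r"InvoiceDate:(\d{1,4}\-\d{1,2}\-\d{1,4})",
    "invoice_number": r"InvoiceNo:(\S+)",
    "bp_department": r"(?s)\\nDepartment:(.*)\\nOperator",
    "bp_token": r"\d{4}\-\d{4}\-\d{4}\-\d{4}",
}


def extract(text, patterns):
    """Return the first match of each pattern and skip fields with no match.

    re.findall returns the captured group when the pattern has one group and
    the whole match when it has none (as with bp_token).
    """
    result = {}
    for field, pattern in patterns.items():
        matches = re.findall(pattern, text)
        if matches:
            result[field] = matches[0]
    return result


if __name__ == "__main__":
    for field, value in extract(SAMPLE_TEXT, PATTERNS).items():
        print(f"{field}: {value!r}")
```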

REST End-point

Method

POST

End-point

https://h2iwgv44ul.execute-api.us-east-2.amazonaws.com/dev/regex

Request Body

  • templateFile can be any valid public URL or a path relative to the bucket, as in the example below
{
    "bucketName": "toji.docs.bills-dev",
    "templateFile": "templates/BREB/{templatename.yml}",
    "rawData": "{raw text to which the RegEx patterns are applied}",
    "documentCode": "{any dummy numeric value}"
}
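The following is a minimal standard-library Python client for the endpoint. It assumes that the endpoint accepts a JSON body sent with a Content-Type: application/json header and that documentCode may be sent as a string (the sample response echoes it as "23"). The helper names build_request and call_extractor are hypothetical. The URL is the dev endpoint given above.

```python
import json
import urllib.request

ENDPOINT = "https://h2iwgv44ul.execute-api.us-east-2.amazonaws.com/dev/regex"


def build_request(bucket_name, template_file, raw_data, document_code):
    """Build (but do not send) the POST request described above."""
    body = {
        "bucketName": bucket_name,
        "templateFile": template_file,
        "rawData": raw_data,
        "documentCode": str(document_code),  # assumption: sent as a string
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_extractor(request, timeout=30):
    """Send the request and decode the JSON body.

    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_request("toji.docs.bills-dev", "templates/BREB/your.yml", "raw text here", 23)
    print(req.get_method(), req.full_url)
```

Building the request separately from sending it makes the payload easy to inspect during template development without calling the Lambda function.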

Response Body

  • If the request succeeds, the response is a JSON array containing the result object and the HTTP status code
  • As noted in the template comments, the amount, date, and invoice_number values in the response are dummy values
  • The attributes prefixed with bp_ carry the actual extracted data
[
    {
        "message": "success",
        "data": {
            "issuer": "BREB",
            "amount": 321.0,
            "date": "2015-10-20 00:00:00",
            "invoice_number": "#BLR_WFLD20151000982590",
            "bp_department": " Salma",
            "bp_token": "5313-7505-7686-7027",
            "currency": "EUR",
            "desc": "Invoice from BREB"
        },
        "documentCode": "23",
        "templateFile": "templates/BREB/breb_ebbill_senthil_test.yml",
        "execution_time": "0:00:02"
    },
    200
]
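A caller usually needs only the bp_ attributes. The sketch below assumes the response shape shown above, a two-element array of the result object followed by the status code. real_fields is a hypothetical helper name.

```python
def real_fields(payload):
    """Return the bp_ attributes from a successful extractor response.

    Assumes payload is the decoded two-element array shown above:
    [result_object, http_status].
    """
    result, status = payload
    if status != 200 or result.get("message") != "success":
        raise RuntimeError(
            f"extraction failed: status={status}, message={result.get('message')!r}"
        )
    # Dummy fields (amount, date, invoice_number) and defaults are dropped.
    return {key: value for key, value in result["data"].items() if key.startswith("bp_")}


if __name__ == "__main__":
    sample = [
        {
            "message": "success",
            "data": {"issuer": "BREB", "amount": 321.0,
                     "bp_department": " Salma", "bp_token": "5313-7505-7686-7027"},
            "documentCode": "23",
        },
        200,
    ]
    print(real_fields(sample))
```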

References

The following online blog articles and video tutorials explain how to use invoice2data, the underlying open-source library that was forked to suit our requirements.
