
@Alveel
Last active January 3, 2020 19:19
Multiline regex, end-of-string anchor and newlines
#!/usr/bin/env python3
import re
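# re.VERBOSE ignores unescaped whitespace in the pattern (hence the `\ ` escapes),
# re.MULTILINE lets ^ and $ match at every line boundary instead of only at the string edges,
# and re.ASCII restricts \w, \d and \s to ASCII characters.
volume_statistics_pattern = re.compile(r'''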
^Starting\ time\ of\ crawl:\ (?P<crawl_start>\w{3}\ \w{3}\s+\d{1,2}\ \d{2}:\d{2}:\d{2}\ \d{4})\s+
Ending\ time\ of\ crawl:\ (?P<crawl_end>\w{3}\ \w{3}\s+\d{1,2}\ \d{2}:\d{2}:\d{2}\ \d{4})\s+
Type\ of\ crawl:\ (?P<crawl_type>\w+)\s+
No\.\ of\ entries\ healed:\ (?P<entries_healed>\d+)\s+
No\.\ of\ entries\ in\ split-brain:\ (?P<entries_splitbrain>\d+)\s+
No\.\ of\ heal\ failed\ entries:\ (?P<entries_failed>\d+)$
''', re.MULTILINE | re.VERBOSE | re.ASCII)
# Sample output with Unix line endings (\n)
working_string = "Starting time of crawl: Fri Jan 3 08:43:36 2020\n\nEnding time of crawl: Fri Jan 3 08:43:37 2020\n\nType of crawl: INDEX\nNo. of entries healed: 0\nNo. of entries in split-brain: 0\nNo. of heal failed entries: 0\n\n"
# The same output with Windows line endings (\r\n)
failing_string = "Starting time of crawl: Fri Jan 3 08:43:36 2020\r\n\r\nEnding time of crawl: Fri Jan 3 08:43:37 2020\r\n\r\nType of crawl: INDEX\r\nNo. of entries healed: 0\r\nNo. of entries in split-brain: 0\r\nNo. of heal failed entries: 0\r\n\r\n"
working = volume_statistics_pattern.findall(working_string)
# This prints the matched result
print(working)
failing = volume_statistics_pattern.findall(failing_string)
# This prints an empty list
print(failing)
# If I change the end of the last line of my pattern to `\s*$`, it works.
# But why? By that reasoning, the working string should not have worked either.
# Does `$` handle `\n` differently than `\r\n`?
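# ANSWER: https://docs.microsoft.com/en-us/dotnet/standard/base-types/anchors-in-regular-expressions#end-of-string-or-line-
# Thanks to https://stackoverflow.com/a/31400257/7647292
# Those are the .NET docs, but Python's `re` behaves the same way here: in MULTILINE mode
# `$` matches only at the end of the string or just before `\n`, never before `\r`.
# With `\r\n` line endings, the `\r` sits between the last digits and the `\n`, so `$` fails.
# `\s*$` works because `\s*` consumes the trailing `\r\n\r\n` and `$` then matches at the end of the string.
# So the right way to end the pattern is with `\r?$`, which allows an optional `\r` before the line break.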
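A minimal sketch of the `\r?$` fix, assuming Python 3's standard `re` module. The first three assertions isolate the anchor behavior on one-line strings: under `re.MULTILINE`, `$` matches just before `\n` but not before `\r`. The gist's pattern is then recompiled with only its final anchor changed. `failing_string` is rebuilt from `working_string` by swapping `\n` for `\r\n`, which reproduces the gist's string. The names `unix_matches`, `windows_matches` and `plain_anchor_pattern` are illustrative and not part of the original gist, and the final part (normalizing line endings) is an alternative technique, not the one from the linked answer.

```python
import re

# Under re.MULTILINE, `$` matches at the end of the string or just before "\n".
# "\r" is not treated as a line terminator, so it blocks the match.
assert re.search(r"0$", "entries: 0\n", re.MULTILINE)
assert re.search(r"0$", "entries: 0\r\n", re.MULTILINE) is None
# An optional `\r?` consumes the carriage return, after which `$` can match.
assert re.search(r"0\r?$", "entries: 0\r\n", re.MULTILINE)

# The gist's pattern, with only the final anchor changed from `$` to `\r?$`.
volume_statistics_pattern = re.compile(r'''
^Starting\ time\ of\ crawl:\ (?P<crawl_start>\w{3}\ \w{3}\s+\d{1,2}\ \d{2}:\d{2}:\d{2}\ \d{4})\s+
Ending\ time\ of\ crawl:\ (?P<crawl_end>\w{3}\ \w{3}\s+\d{1,2}\ \d{2}:\d{2}:\d{2}\ \d{4})\s+
Type\ of\ crawl:\ (?P<crawl_type>\w+)\s+
No\.\ of\ entries\ healed:\ (?P<entries_healed>\d+)\s+
No\.\ of\ entries\ in\ split-brain:\ (?P<entries_splitbrain>\d+)\s+
No\.\ of\ heal\ failed\ entries:\ (?P<entries_failed>\d+)\r?$
''', re.MULTILINE | re.VERBOSE | re.ASCII)

working_string = "Starting time of crawl: Fri Jan 3 08:43:36 2020\n\nEnding time of crawl: Fri Jan 3 08:43:37 2020\n\nType of crawl: INDEX\nNo. of entries healed: 0\nNo. of entries in split-brain: 0\nNo. of heal failed entries: 0\n\n"
# Rebuild the Windows-style variant by turning every "\n" into "\r\n";
# this yields the same text as the gist's failing_string.
failing_string = working_string.replace("\n", "\r\n")

unix_matches = volume_statistics_pattern.findall(working_string)
windows_matches = volume_statistics_pattern.findall(failing_string)
# Both line-ending styles now produce the same single match.
assert unix_matches == windows_matches
print(windows_matches)
# → [('Fri Jan 3 08:43:36 2020', 'Fri Jan 3 08:43:37 2020', 'INDEX', '0', '0', '0')]

# Alternative technique (not from the linked answer): keep the plain `$` anchor
# and normalize "\r\n" to "\n" before matching.
plain_anchor_pattern = re.compile(
    volume_statistics_pattern.pattern.replace(r"\r?$", "$"),
    volume_statistics_pattern.flags,
)
# With the plain `$`, the \r\n text fails exactly as in the gist...
assert plain_anchor_pattern.findall(failing_string) == []
# ...but matches once the line endings are normalized.
assert plain_anchor_pattern.findall(failing_string.replace("\r\n", "\n")) == windows_matches
```

The `\r?$` ending keeps a single pattern tolerant of both line-ending styles. Normalizing up front can be simpler when the same text goes through several patterns, but it changes the input rather than the pattern.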