@hamletbatista
Created July 24, 2019 21:55
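# Regex for one access log line (combined log format); try it at https://regex101.com/r/ElmF2y/2/
# Capture groups, in order: ip, date, url, status_code, ua
p = r'^(\S+) \S+ \S+ \[([^\]]+)\] "[A-Z]+\s([^\s]+) [^"]+" (\d+) \d+ "[^"]*" "([^"]*)"$'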
#example CSV output
#ip,date,url,status_code,ua
#66.249.69.196,30/Jun/2019:08:05:31 -0400,/Crawl-Your-Ecommerce-Site-with-Python-Scrapy-,301,Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
#66.249.69.196,30/Jun/2019:08:05:32 -0400,/category/design-development,200,Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
#66.249.69.196,30/Jun/2019:08:05:32 -0400,/Crawl-Your-Ecommerce-Site-with-Python-Scrapy,200,Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
#66.249.69.196,30/Jun/2019:08:05:38 -0400,/wp-content/uploads/2015/05/practical-ecommerce-icon.png-144,404,Googlebot-Image/1.0
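
The gist shows only the pattern and the resulting CSV, not the code in between. The following is a minimal sketch of one way to get from log lines to that output, using only the standard library. The names `parse_line`, `log_to_csv`, and `FIELDS`, and the sample log line itself, are assumptions made for illustration. The sample line was reconstructed to be consistent with the second CSV row above, and its byte count is invented.

```python
import csv
import io
import re

# Pattern from the gist, unchanged.
p = r'^(\S+) \S+ \S+ \[([^\]]+)\] "[A-Z]+\s([^\s]+) [^"]+" (\d+) \d+ "[^"]*" "([^"]*)"$'
LOG_LINE = re.compile(p)

# Column names taken from the example CSV header in the gist.
FIELDS = ["ip", "date", "url", "status_code", "ua"]


def parse_line(line):
    """Return (ip, date, url, status_code, ua), or None if the line does not match."""
    m = LOG_LINE.match(line.rstrip("\r\n"))
    return m.groups() if m else None


def log_to_csv(lines, out):
    """Write matching lines to `out` as CSV and return the number of skipped lines."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FIELDS)
    skipped = 0
    for line in lines:
        row = parse_line(line)
        if row is None:
            skipped += 1  # count unmatched lines instead of failing
            continue
        writer.writerow(row)
    return skipped


# Hypothetical input: one well-formed line (byte count 5120 is made up)
# and one line that should be skipped.
sample = [
    '66.249.69.196 - - [30/Jun/2019:08:05:32 -0400] '
    '"GET /category/design-development HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    'not a log line',
]

buf = io.StringIO()
skipped = log_to_csv(sample, buf)
print(buf.getvalue(), end="")
```

To process a real file, pass an open file object as `lines` and an open output file (opened with `newline=""`) as `out`. The pattern requires digits in the response-size field, so lines where the server logs `-` for the size will not match and will be counted as skipped. Returning the skipped count makes that kind of silent data loss easier to notice.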