@dwgill
Created September 28, 2013 00:10
This is an attempt at a regular expression that matches the headers and footers included at the beginning and end of plain-text (.txt) ebooks hosted on Project Gutenberg (http://www.gutenberg.org/). The intent is that any text matching this regex can be safely replaced with an empty string before computationally processing the…
((\b(Project Gutenberg's)|(The Project Gutenberg EBook))[\s\S]+?\*{3}[\s\S]+?\*{3}(\s+?[pP]roduced by.+))|((\b[Ee][nN][dD] of.+?[Pp]roject [Gg]utenberg[\s\S]+?)?\*{3}\s[eE][nN][dD].+?\*{3}[\s\S]*)
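As a rough illustration of the intended use, here is a minimal Python sketch that replaces every match with an empty string. The pattern is a transcription of the regex above, split across lines for readability; the footer branch is written as `\b[Ee][nN][dD]`, which assumes a word boundary and a case-insensitive "End" were intended there. The function name `strip_gutenberg_boilerplate` and the sample text are hypothetical, and real Project Gutenberg files vary in layout, so this may not catch every header or footer.

```python
import re

# Transcription of the regex above: header branch, then footer branch.
# Assumption: the footer branch is meant to start with \b[Ee][nN][dD].
PG_BOILERPLATE = re.compile(
    r"((\b(Project Gutenberg's)|(The Project Gutenberg EBook))"
    r"[\s\S]+?\*{3}[\s\S]+?\*{3}(\s+?[pP]roduced by.+))"
    r"|"
    r"((\b[Ee][nN][dD] of.+?[Pp]roject [Gg]utenberg[\s\S]+?)?"
    r"\*{3}\s[eE][nN][dD].+?\*{3}[\s\S]*)"
)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Replace header/footer matches with '' and trim surrounding whitespace."""
    return PG_BOILERPLATE.sub("", text).strip()


# Hypothetical, heavily shortened stand-in for a Project Gutenberg .txt file.
SAMPLE = """The Project Gutenberg EBook of An Example Book, by A. N. Author

This eBook is for the use of anyone anywhere at no cost.

*** START OF THIS PROJECT GUTENBERG EBOOK AN EXAMPLE BOOK ***

Produced by Hypothetical Volunteers

CHAPTER I

It was a dark and stormy night.

End of the Project Gutenberg EBook of An Example Book, by A. N. Author

*** END OF THIS PROJECT GUTENBERG EBOOK AN EXAMPLE BOOK ***

This trailing license text should also be removed.
"""

if __name__ == "__main__":
    print(strip_gutenberg_boilerplate(SAMPLE))
```

Note that the header branch only matches when a "Produced by" line follows the `*** START ... ***` marker, so files without that line keep their header. The lazy `[\s\S]+?` runs can also be slow on very large inputs.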