Lorem Ipsum finder for Screaming Frog

Prerequisites / Disclaimer

  • This assumes you have the full licensed version of Screaming Frog. I have not checked whether it works in the free version.
  • The stopwords list is not exhaustive. It is simply a de-duplicated list of words that are clearly not English, taken from the first 5 paragraphs of Lorem Ipsum generated at https://www.lipsum.com/
  • The regex is fairly long and can greatly slow down the crawl process and hog machine resources.
  • This search may not be 100% accurate if you have unusual lorem ipsum text. You might want to consider generating your own stopwords list.
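
If you do want to generate your own stopwords list, the idea can be sketched in Python roughly as follows. This is a minimal sketch, not how the list in this gist was actually produced: `build_stopword_regex`, the `english_words` exclusion set, and the `min_length` cutoff are all hypothetical names and choices introduced for illustration. It collects the unique words from a sample of placeholder text, drops anything you mark as real English, and joins the rest into a single word-boundary alternation.

```python
import re


def build_stopword_regex(sample_text, english_words, min_length=3):
    """Build a word-boundary alternation from the non-English words in sample_text."""
    # Unique lowercase words found in the sample placeholder text.
    words = set(re.findall(r"[a-z]+", sample_text.lower()))
    # Keep words that are long enough and not in the (hypothetical) English exclusion set.
    stopwords = {w for w in words if len(w) >= min_length and w not in english_words}
    # Sort by length, then alphabetically, which mirrors the ordering of the list in this gist.
    ordered = sorted(stopwords, key=lambda w: (len(w), w))
    return r"\b(" + "|".join(ordered) + r")\b"


# Illustrative input only; in practice you would paste in your own lorem ipsum text.
sample = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed non risus."
english = {"sit", "non"}  # assumption: words you consider valid English
pattern = build_stopword_regex(sample, english)
print(pattern)
```

The printed pattern could then be pasted into the Custom Search and Custom Extraction fields in place of the stopwords provided here.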

Custom Search

Sets up a Custom Search in Screaming Frog to find pages with "lorem ipsum" placeholder text - ideal for scanning a website prior to go-live. The search only shows the number of occurrences of possible lipsum text on each page. You need the Custom Extraction to drill down into the specific words found.

  1. Go to Configuration > Custom > Search
  2. Add a new custom search with the following parameters:
    • Name: Lipsum
    • Contains
    • Regex
    • Paste in the full raw content of stopwords below
    • Page Text
  3. Click OK
  4. Run a crawl and the Custom Search tab will show the results
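
To get a feel for what the Custom Search count represents, here is a rough Python sketch that counts stopword matches in plain page text. It uses a shortened subset of the stopwords pattern, and it assumes case-insensitive matching. Whether Screaming Frog applies the pattern in exactly this way is an assumption, and `count_lipsum` and the sample `pages` are made up for illustration.

```python
import re

# Shortened, illustrative subset of the stopwords pattern from this gist.
# Assumption: matching is case-insensitive, so "Lorem" counts as well as "lorem".
LIPSUM_PATTERN = re.compile(
    r"\b(sed|amet|elit|dolor|ipsum|lorem|consectetur|adipiscing)\b",
    re.IGNORECASE,
)


def count_lipsum(page_text):
    """Return the number of stopword matches found in page_text."""
    return sum(1 for _ in LIPSUM_PATTERN.finditer(page_text))


# Hypothetical page text keyed by URL path.
pages = {
    "/about": "We build websites for small businesses.",
    "/services": "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
}

counts = {url: count_lipsum(text) for url, text in pages.items()}
# Pages with at least one match are the ones worth a closer look.
flagged = [url for url, n in counts.items() if n > 0]
print(counts, flagged)
```

A page with a count of one or two may just contain a real word that happens to be in the list, so the count is a prompt to investigate rather than proof of placeholder text.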

Custom Extraction

Sets up a Custom Extraction in Screaming Frog to find pages containing "lorem ipsum" placeholder text and show the matched words in the crawl results.

  1. Go to Configuration > Custom > Extraction
  2. Add a new custom extraction with the following parameters:
    • Name: Lipsum
    • Regex
    • Paste in the full raw content of stopwords below
  3. Click OK
  4. Run a crawl and the Custom Extraction tab will show the results
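
The difference from the Custom Search is that the extraction reports which words matched, not just how many. A small Python sketch of that idea, using a shortened subset of the stopwords pattern, might look like the following. The `extract_lipsum` helper and the sample text are hypothetical, case-insensitive matching is an assumption, and Screaming Frog's own output will be laid out differently.

```python
import re
from collections import Counter

# Shortened, illustrative subset of the stopwords pattern from this gist.
PATTERN = re.compile(r"\b(sed|amet|elit|dolor|ipsum|lorem)\b", re.IGNORECASE)


def extract_lipsum(page_text):
    """Return a Counter of the matched stopwords (lowercased) in page_text."""
    return Counter(m.group(1).lower() for m in PATTERN.finditer(page_text))


# Hypothetical page text.
text = "Lorem ipsum dolor sit amet. Sed lorem elit."
matches = extract_lipsum(text)
for word, n in matches.most_common():
    print(word, n)
```

Seeing the individual words makes it easier to tell a genuine block of lipsum from a single false positive.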

Usage

Generally I like to do the following:

  1. Run a crawl and use the Custom Search to find pages with possible lipsum text.
  2. For any pages found, dig deeper in the Custom Extraction tab.
  3. Assuming you have HTML extraction enabled, you can then search the page content for the offending lipsum stopwords to see the context of the lipsum text and take action accordingly.

Stopwords

\b(dui|hac|nam|nec|sed|sem|vel|amet|arcu|cras|duis|eget|elit|enim|erat|eros|nibh|nisi|nisl|nunc|odio|orci|quam|quis|urna|augue|dolor|donec|etiam|felis|ipsum|justo|lacus|lorem|magna|massa|morbi|neque|nulla|porta|proin|purus|risus|velit|aenean|auctor|congue|lectus|ligula|luctus|mattis|mauris|mollis|nullam|ornare|platea|primis|rutrum|semper|tellus|tortor|turpis|aliquam|aliquet|blandit|commodo|dapibus|egestas|euismod|feugiat|finibus|gravida|iaculis|lacinia|laoreet|posuere|potenti|pretium|quisque|rhoncus|sodales|vivamus|viverra|accumsan|bibendum|dictumst|eleifend|faucibus|interdum|lobortis|maecenas|molestie|pharetra|placerat|praesent|suscipit|ultrices|vehicula|volutpat|consequat|convallis|curabitur|dignissim|efficitur|elementum|facilisis|fermentum|fringilla|habitasse|hendrerit|imperdiet|malesuada|porttitor|tincidunt|tristique|ultricies|venenatis|vulputate|adipiscing|vestibulum|condimentum|consectetur|scelerisque|suspendisse|ullamcorper|pellentesque|sollicitudin)\b