- This assumes you have the full licensed version of Screaming Frog. I have not tested whether it works in the free version.
- The stopwords list is not definitive. It is simply a unique list of definitely non-English words from the first 5 paragraphs of Lorem Ipsum generated at https://www.lipsum.com/
- The regex is fairly long and can greatly slow down the crawl process and hog machine resources.
- This search may not be 100% accurate if you have unusual lorem ipsum text. You might want to consider generating your own stopwords list.
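
If you do want to generate your own stopwords list, the approach described above (unique, non-English words from a block of sample lipsum) can be sketched in Python. This is only a rough illustration: `build_stopwords_regex`, the `english_words` set, and the `min_length` cut-off are hypothetical names and choices, not part of the original list, and you should check that the resulting pattern is accepted by your version of Screaming Frog.

```python
import re


def build_stopwords_regex(lipsum_text, english_words, min_length=4):
    """Build an alternation regex from the unique non-English words in lipsum_text.

    english_words is an assumed, hand-maintained set of words to exclude
    because they also occur in normal English copy (e.g. "sit").
    min_length drops very short words, which are more likely to collide
    with real content.
    """
    words = re.findall(r"[a-z]+", lipsum_text.lower())
    unique = sorted(
        {w for w in words if len(w) >= min_length and w not in english_words}
    )
    # Word boundaries stop "amet" from matching inside a longer word.
    return r"\b(" + "|".join(unique) + r")\b"


sample = "Lorem ipsum dolor sit amet, consectetur adipiscing elit"
pattern = build_stopwords_regex(sample, english_words={"sit", "do", "in", "non"})
print(pattern)
```

A shorter list built this way should also be cheaper to run during a crawl than a very long one, at the cost of catching fewer variants.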
Sets up a Custom Search in Screaming Frog to find pages with "lorem ipsum" placeholder text - ideal for scanning a website prior to go-live. The search only shows the number of occurrences of possible lipsum text on a page. You need the Custom Extraction to drill down into the specific words found.
- Go to Configuration > Custom > Search
- Add a new custom search with the following parameters:
- Name: Lipsum
- Contains
- Regex
- Paste in the full raw content of `stopwords` below
- Page Text
- Click OK
- Run a crawl and the Custom Search tab will show the results
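
To get a feel for what the Custom Search reports, here is a minimal Python sketch that counts stopword matches in a page's text. It is not how Screaming Frog works internally; the shortened `LIPSUM_PATTERN` and the `count_lipsum` helper are hypothetical stand-ins for the full stopwords regex, and the case-insensitive flag is my own choice for the sketch.

```python
import re

# Hypothetical, heavily shortened stand-in for the full stopwords regex.
LIPSUM_PATTERN = re.compile(
    r"\b(lorem|ipsum|dolor|amet|consectetur|adipiscing|elit)\b",
    re.IGNORECASE,
)


def count_lipsum(page_text):
    """Return the number of stopword occurrences in the visible page text."""
    return len(LIPSUM_PATTERN.findall(page_text))


print(count_lipsum("Welcome to our site. Lorem ipsum dolor sit amet."))
print(count_lipsum("Contact us today"))
```

A page with a count above zero is only a candidate: a single match could be a legitimate word, which is why the extraction step is useful afterwards.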
Sets up a Custom Extraction in Screaming Frog that finds "lorem ipsum" placeholder text on pages and shows the matched words in the crawl results.
- Go to Configuration > Custom > Extraction
- Add a new custom extraction with the following parameters:
- Name: Lipsum
- Regex
- Paste in the full raw content of `stopwords` below
- Click OK
- Run a crawl and the Custom Extraction tab will show the results
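
The difference from the Custom Search is that the extraction surfaces the matched words themselves rather than a count. A rough Python equivalent, assuming a shortened hypothetical pattern and a made-up `extract_lipsum` helper, might look like this:

```python
import re

# Hypothetical, shortened stand-in for the full stopwords regex.
LIPSUM_PATTERN = re.compile(r"\b(lorem|ipsum|dolor|amet)\b", re.IGNORECASE)


def extract_lipsum(page_text):
    """Return the distinct stopwords found, lowercased, in order of first appearance."""
    found = []
    for match in LIPSUM_PATTERN.finditer(page_text):
        word = match.group(1).lower()
        if word not in found:
            found.append(word)
    return found


print(extract_lipsum("Lorem ipsum. More lorem here, amet."))
```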
Generally I like to do the following:
- Run a crawl and use the Custom Search to find pages with possible lipsum text.
- For any pages found, dig deeper using the Custom Extraction results.
- Assuming you have HTML extraction enabled, you can then search the page content for the offending lipsum stopwords to see the context of the lipsum text and take action accordingly.
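
If you export the stored HTML and would rather not search it by hand, the last step can be approximated with a small script. The sketch below is an assumption-laden example, not part of Screaming Frog: `lipsum_context` and the shortened pattern are hypothetical, and it simply prints a few characters either side of each match so you can see where the placeholder text sits.

```python
import re

# Hypothetical, shortened stand-in for the full stopwords regex.
LIPSUM_PATTERN = re.compile(r"\b(lorem|ipsum|dolor|amet)\b", re.IGNORECASE)


def lipsum_context(html, width=30):
    """Return a snippet of surrounding HTML for every stopword match."""
    snippets = []
    for match in LIPSUM_PATTERN.finditer(html):
        start = max(0, match.start() - width)
        end = min(len(html), match.end() + width)
        snippets.append(html[start:end])
    return snippets


html = "<p>About us</p><p>Lorem text here</p>"
for snippet in lipsum_context(html, width=3):
    print(snippet)
```

Increasing `width` gives more surrounding markup, which helps when the same placeholder block appears in several templates.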