@ggorlen
Last active July 5, 2024 22:38
Creating Reproducible Browser Automation Examples

Motivation

  • Without a minimal, reproducible example (reprex), in most cases, it's impossible to answer your question.
  • Without a reprex, the question isn't likely to provide value to future visitors with the same problem (the main purpose of Q&A sites like Stack Overflow).

Question Checklist

  1. Include your reprex code as text in the question itself, not as an image or an external link.
    • Can someone copy and paste the code into an editor and run it as-is? If not, it's not complete.
    • Does running the code reproduce the problem? If not, it's not reproducible.
    • Is there anything in the code that can be removed, while still causing the failure? If so, then it's not minimal.
    • Autoformat the code (Prettier for HTML, CSS and JS, Black for Python).
  2. Show any error messages, with the full stack trace, as text, as generated by the reprex code.
  3. Include the site you're automating, preferably with the URL or HTML string in the code itself.
    • If the site is private and you can't provide access, then your problem is not reproducible.
    • Try to come up with a sample site that reproduces the relevant issue, either a custom HTML/JS page, or a public site (preferably a site specifically created for testing browser automation).
  4. Include versions for all packages. Include versions and details for system or environment, if relevant.
  5. Show exact expected output. If you're scraping to JSON or CSV, show a couple of objects or rows so the desired result is clear. If the automation involves filling out a form, show screenshots of the steps and completed form filled manually.
  6. Finally, ask a specific technical question about your code.
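
For item 4, one way to collect versions is to print them from the same environment that runs your script. This is a minimal sketch using only the Python standard library; the `describe_environment` helper and the package names passed to it are placeholders for illustration, so swap in whatever your script actually imports.

```python
import platform
from importlib import metadata


def describe_environment(packages):
    """Return lines reporting the interpreter, OS, and package versions."""
    lines = [
        f"python {platform.python_version()}",
        f"os {platform.system()} {platform.release()}",
    ]
    for name in packages:
        try:
            lines.append(f"{name} {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Report missing packages instead of crashing the report.
            lines.append(f"{name} (not installed)")
    return lines


if __name__ == "__main__":
    # Placeholder package list: replace with the libraries your script uses.
    for line in describe_environment(["playwright", "selenium"]):
        print(line)
```

Paste the printed lines into the question as text, next to the code.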

Examples

Great

OK

Bad

Watch out for XY Problems

Simplifications can be good, but always provide context for what you're trying to achieve. When askers can't (or don't bother to) provide the actual page they're automating, they often simplify the problem in a way that invalidates answers or forces answers to use approaches that don't make sense, frustrating the asker, answerers and future visitors alike.

For example, it's OK to ask "how do I select and click a button on page X?", but also mention your broader goal in clicking the button. If your goal is to scrape some data that can actually be accessed without the DOM, then clicking the button (problem Y) was never necessary to get the actual result (problem X).

Using HTML-only Snippets

If you're providing an HTML snippet, make sure async JS behavior, iframes, shadow roots or Cloudflare blocks aren't the real reason you can't select an element.

Provide at least the full HTML tree up to the document root, trimming irrelevant <div>s where that doesn't change the structure. Often, the correct and best way to locate an element is not using its own attributes, but relying on its ancestor tree.

If an element is part of a list, provide at least two items of the list so it's clear what distinguishes one from the other.
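
To see why the ancestor tree and a second list item matter, here is a rough sketch. The markup is made up for illustration, and it uses the standard library's `xml.etree.ElementTree` as a stand-in for a browser automation library's locators, which only works because the snippet is well-formed. Both buttons are identical, so only their ancestors tell them apart.

```python
import xml.etree.ElementTree as ET

# Made-up snippet: full tree up to the root, with two list items kept.
HTML = """
<html>
  <body>
    <section id="cart">
      <ul>
        <li><span>Apple</span><button>Remove</button></li>
        <li><span>Pear</span><button>Remove</button></li>
      </ul>
    </section>
  </body>
</html>
"""


def find_button_for(root, item_name):
    """Return the button in the cart <li> whose <span> text matches."""
    cart = root.find(".//section[@id='cart']")
    for li in cart.iter("li"):
        if li.findtext("span") == item_name:
            return li.find("button")
    return None


root = ET.fromstring(HTML)
button = find_button_for(root, "Pear")
print(button.text)
```

With only one `<li>` in the snippet, or without the enclosing `<section>`, an answerer couldn't tell which of these details is safe to rely on.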

Providing Screenshots

Screenshots can be misleading if it's not clear what state the page is in when the screenshot was taken, or whether the screenshot is from an automated browser or human session. Things change dramatically when you switch to browser automation from a human browsing session, and even more so when you go headless. Don't assume things are the same.

Use caution with screenshots of dev tools component trees. These are dynamic and may not reflect what you see when your automation script runs.

See Also

Browser Automation Playground Sites

These are nearly perfect for a reprex, because they're unlikely to change and cleanly isolate a single piece of functionality. The only downside is that they sometimes disappear over time, so they're slightly less reliable than making your own simple site.
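
Making your own simple site can be as small as one file. The sketch below assumes a made-up page (a button that fills in some text after a delay) and serves it on localhost using only the Python standard library; the `PAGE`, `PageHandler` and `start_server` names are illustrative, not part of any automation library.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Made-up page: clicking the button fills in the text after a delay.
PAGE = b"""<!DOCTYPE html>
<html>
  <body>
    <button id="load">Load</button>
    <p id="result"></p>
    <script>
      document.querySelector("#load").addEventListener("click", () => {
        setTimeout(() => {
          document.querySelector("#result").textContent = "done";
        }, 500);
      });
    </script>
  </body>
</html>
"""


class PageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the same page for every path.
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the reprex output quiet


def start_server():
    """Serve PAGE on a free localhost port; return (server, url)."""
    server = HTTPServer(("127.0.0.1", 0), PageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}/"


if __name__ == "__main__":
    server, url = start_server()
    # Point your automation script at `url`; http.client is a stand-in here.
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    conn.request("GET", "/")
    print(url, conn.getresponse().status)
    conn.close()
    server.shutdown()
```

Because the page lives in the script, anyone can copy, paste and run the reprex without depending on a third-party site staying online.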

Of course, if the behavior is too complex to capture in a simple site or a browser playground, you can share the actual site as a last resort--better than nothing.
