@SimplyAhmazing
Created October 30, 2017 20:02

Made some updates to this PR after following up on the error Akash got when running the tests locally. Over the past week I have come to learn a lot about Visual Regression Testing that I want to share here and discuss with you all. The things I want to share are organized into these topics:

  • Non-deterministic rendering of images
  • Electron/Chrome non-deterministic states
  • Visual regression workflows
  • Tools & Literature

On Non-deterministic Rendering

I was aware that rendering is non-deterministic across platforms (Windows vs. macOS vs. Linux). It turns out that rendering can also differ on the same platform (e.g. my MacBook Air vs. the Travis OS X build worker). These differences can occur at several layers of the stack (hardware, OS, application GUI framework). Specific causes include GPUs generating different pixel values due to floating-point calculations, display DPI, how anti-aliasing is handled, how fonts are rendered, etc.

From reading around I’ve learned that the best way to control for these sorts of issues is to use near identical environments for generating screenshots and comparisons.

Before, I was using a custom algorithm for comparing screenshots (borrowed from WebTorrent), but I found that the library looks-same is much better and can account for:

  • Anti-aliasing differences
  • Ignoring the caret on input fields (the blinking text cursor)
  • DPI adjustments

looks-same can also generate screenshot diffs to show you where the comparison failed, which was really helpful in debugging.
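To make this concrete, here is a minimal sketch of how looks-same could be wired into a screenshot check. The option names come from the looks-same docs, but the file paths and highlight color are placeholders, and the exact callback shape can vary between looks-same versions:

```js
const looksSame = require('looks-same');

// Compare a freshly captured screenshot against the committed baseline.
// ignoreCaret skips the blinking text cursor; ignoreAntialiasing tolerates
// minor anti-aliasing differences between machines.
looksSame('baseline.png', 'current.png', {ignoreCaret: true, ignoreAntialiasing: true}, (error, equal) => {
  if (error) throw error;
  if (!equal) {
    // Write out a diff image highlighting where the screenshots differ,
    // which makes it much easier to judge whether the failure is real.
    looksSame.createDiff({
      reference: 'baseline.png',
      current: 'current.png',
      diff: 'diff.png',
      highlightColor: '#ff00ff'
    }, (diffError) => {
      if (diffError) throw diffError;
      console.error('Screenshots differ, see diff.png');
    });
  }
});
```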

There’s another popular JS library for generating diffs, Resemble.JS, but it requires Node 8. It may have better anti-aliasing support than looks-same, but I haven’t been able to compare the two thoroughly.
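For reference, the equivalent check with Resemble.JS would look roughly like the sketch below. I haven’t been able to run this (it needs Node 8), so treat the API usage and the 0.1% threshold as assumptions rather than something I’ve verified:

```js
const resemble = require('resemblejs');

// Compare two screenshots and report the mismatch as a percentage.
resemble('baseline.png')
  .compareTo('current.png')
  .ignoreAntialiasing()
  .onComplete((data) => {
    // misMatchPercentage is reported as a string, e.g. "0.05".
    if (Number(data.misMatchPercentage) > 0.1) {
      console.error(`Screenshots differ by ${data.misMatchPercentage}%`);
    }
  });
```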

Electron/Chrome non-deterministic rendering states

Sometimes the Zulip app doesn’t render consistently. Two things I came across are:

  • Scroll-bar auto-hiding - When the automated test added a server URL, the page navigated to the login page for a community. Sometimes the scroll bars in that web view would be hidden and sometimes they would be displayed. In general this is a feature of Chrome, which can auto-hide the scroll bar for you. I wasn’t able to find a setting to force Electron to be consistent about this. I was able to get around it by injecting CSS to hide the scrollbars on the page before every screenshot (see the sketch after this list).
  • Auto-focus location - When you go to the login page, it will not be scrolled to exactly the same location. I was talking to @Lev about this and he says it seems that Zerver auto-focuses on the input element. I noticed that the scroll location can be off by a pixel or two upwards or downwards between test runs. I was able to get around this by telling the webview to scroll to the bottom before a screenshot is taken.
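To illustrate the kind of stabilization step I mean, here is a hypothetical helper. It assumes a Spectron/WebdriverIO-style client that can run code inside the renderer via execute(); the helper name and the exact wiring are made up for illustration and are not what’s in the PR:

```js
// Hypothetical helper; call it right before every screenshot capture.
async function stabilizeForScreenshot(client) {
  // Hide scrollbars so Chrome's auto-hide behaviour can't make two runs differ.
  await client.execute(() => {
    const style = document.createElement('style');
    style.textContent = '::-webkit-scrollbar { display: none; }';
    document.head.appendChild(style);
  });

  // Pin the scroll position so the auto-focus jitter (a pixel or two) is irrelevant.
  await client.execute(() => {
    window.scrollTo(0, document.body.scrollHeight);
  });
}
```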

There may be more such intricacies we’ll have to handle as they come up.

Visual Regression Testing Workflows

My research on visual regression workflows shows that they almost always require human intervention to overcome technical issues in image comparisons. Things like font rendering, blinking cursors, scrollbars / scroll location, and anti-aliasing in sensitive areas such as logos/icons don’t amount to reasonable test failures. For these sorts of differences a human operator can choose to ignore the subtleties reported by the image comparison.

Brett Slatkin has this simple workflow that they used at Google, which I’ve copied here (a small sketch of step 5 follows the list):

  1. Establish a baseline release with an initial set of screenshots of your site.
  2. Create a new release with a new set of screenshots of your new version.
  3. Manually approve or reject each difference the tool finds.
  4. Manually mark the new release as good or bad.
  5. Repeat. Your approved release will become the baseline for the next one.
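As a tiny sketch of what step 5 could look like for us, assuming we keep approved and freshly captured screenshots in two directories (the directory layout and helper name here are just placeholders I made up):

```js
const fs = require('fs');
const path = require('path');

// After a human reviewer has approved the differences (steps 3-4), promote the
// new screenshots so they become the baseline for the next release (step 5).
function promoteToBaseline(currentDir, baselineDir) {
  for (const name of fs.readdirSync(currentDir)) {
    const source = path.join(currentDir, name);
    const destination = path.join(baselineDir, name);
    fs.writeFileSync(destination, fs.readFileSync(source));
  }
}

promoteToBaseline('screenshots/current', 'screenshots/baseline');
```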

There are tools that are used to facilitate this reviewing.

Tools

There are a lot of tools that can integrate with the PR review process to facilitate the manual review of screenshots. A lot of these tools are geared toward the web (e.g. percy.io); some are open source but not feature-complete or usable enough for Zulip (e.g. Pitsa).

I did come across a potentially very promising service: Applitools. Applitools notes that they support desktop and mobile apps. They also claim to have “advanced computer vision and AI” algorithms, so they may be better to use than the aforementioned image comparison libraries.

Conclusion

When I was developing the screenshot tests for Windows/Linux, I was able to see how the app rendered on those platforms using their CIs and test suite. I found this to be much easier than having to set up a VM, clone the project, and run the app to verify things work as they should, a workflow that would otherwise have to be repeated for every platform.

But for visual regression testing we’ll need to use consistent machines for generating screenshots, perhaps CI machines only. Using CI machines as the source of truth for generating screenshots assumes that the CI services use consistent hardware; in the worst case we could use privately hosted machines that we control.

I’m curious about looking into a tool like Applitools that can facilitate the PR review process and the image comparison step. I’m not sure if that would be more tooling/integration than is desired at this point in the project, and it is also a paid service; alternatively, we could consider other free options. In the long term I’m not sure how maintainable it will be to commit images to the repo and try to use diffing algorithms with our own manually tuned thresholds. This may be a hindrance down the road.
