A while back, I was tasked with assessing the viability of automated visual QA within the organization I worked in. The idea was that we would create automated Selenium tests that could perform run-time visual checks on pages and elements as needed, and return a pass or fail indicating whether the content had changed enough to warrant raising a bug.
I assessed a few options. One of the commercial tools considered was Applitools, which integrates seamlessly into Selenium. In fact, their own Applitools driver is merely a custom-built IWebDriver implementation, so it can be wrapped around a Chrome WebDriver just as easily as an Internet Explorer driver.
All in all, Applitools was a good candidate. However, I also wanted to see whether it was feasible to perform the validations ourselves internally, as our company had some concerns about sending screenshots of our test or UAT environments to a third-party provider such as Applitools. After all, while we agree to Applitools' Terms and Conditions, they are not necessarily signing a Non-Disclosure Agreement to ensure that our screenshots remain private and are deleted when no longer needed.
To begin developing our own solution, I looked at some open-source libraries, and AForge looked very promising.
It has imaging libraries designed to give a percentage similarity ranking between images.
All in all, it looked like AForge would be a simple solution to use, but then I encountered other issues.
For any visual comparison, you need two things.
- The image you are comparing
- The exemplar image you are comparing against
For the exemplar, this was going to be a static image of the element or page as it should appear. To ensure that we captured the exemplar in a state that had a chance of matching the image taken at test execution time, we used the test itself to capture it.
This was essentially accomplished by calling a custom screenshot method that I created that would use X,Y coordinates to capture the bounding box of the element we are interested in.
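A minimal sketch of such a method, assuming Selenium's .NET bindings and System.Drawing (the class and method names here are illustrative, not the original implementation):

```csharp
using System.Drawing;
using System.IO;
using OpenQA.Selenium;

public static class ElementScreenshots
{
    // Capture the viewport, then crop to the element's bounding box.
    // Assumes the element is fully inside the visible viewport.
    public static Bitmap CaptureElement(IWebDriver driver, IWebElement element)
    {
        Screenshot screenshot = ((ITakesScreenshot)driver).GetScreenshot();
        using (var stream = new MemoryStream(screenshot.AsByteArray))
        using (var fullView = new Bitmap(stream))
        {
            var bounds = new Rectangle(
                element.Location.X, element.Location.Y,
                element.Size.Width, element.Size.Height);
            return fullView.Clone(bounds, fullView.PixelFormat);
        }
    }
}
```

Note that GetScreenshot captures only the visible viewport on most drivers, so the element needs to be scrolled into view before cropping.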
It should be noted that if you perform your validations at a lower level, you stand a higher chance of getting more accurate results.
Capturing the whole page may be quick, but it can also be dirty: you can end up with false positives being raised due to text differences, offsets, timestamps, and so on.
When using AForge, I discovered that a 'DifferenceMask' was a really good way to perform the comparison. If the comparison fails, it can output a difference image to show the actual differences as they appear on screen. This also allowed me to debug and tinker with the tool to make it more robust and reliable.
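As a rough sketch, building such a difference mask with AForge's Difference filter looks like this (the method and variable names are illustrative, and both bitmaps are assumed to share the same size and pixel format):

```csharp
using System.Drawing;
using AForge.Imaging.Filters;

public static class DifferenceMasks
{
    // Produce a difference mask: matching pixels come out black,
    // differing pixels light up with the magnitude of the difference.
    public static Bitmap BuildDifferenceMask(Bitmap exemplar, Bitmap actual)
    {
        var difference = new Difference(exemplar);
        Bitmap mask = difference.Apply(actual);
        // Saving the mask gives testers a visual record of what changed,
        // e.g. mask.Save("difference.png");
        return mask;
    }
}
```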
I found that when the tool ran on other browsers, or even just remotely on a Selenium Grid node, the images sometimes came back with 1-2 pixel offsets.
1-2 pixels doesn't sound like a whole lot, but it can be enough to make the entire image register as 'different'.
How do we solve that?
The way I solved it was to create an algorithm that would eliminate the false positives while extruding and growing the real issues we care about. This was accomplished by repeatedly brightening the image while blurring it at the same time.
Bear in mind that the image at this stage is a difference-mask image, which is typically inverted before it reaches this step:
public static Bitmap EnlargeDifferences(Bitmap btm)
{
    GaussianBlur filterBlur = new GaussianBlur(3.4D, 800);
    HSLLinear filterBrighten = new HSLLinear();
    // configure the filter
    filterBrighten.InLuminance = new Range(0.00f, 1.00f);
    filterBrighten.OutLuminance = new Range(0.00f, 10.00f);
    // blur first, then brighten whatever survives the blur
    FiltersSequence filterSequence = new FiltersSequence(filterBlur, filterBrighten);
    // Do 5 passes - to try and expand the changes
    FilterIterator fi = new FilterIterator(filterSequence, 5);
    Bitmap bReturn = fi.Apply(btm).To24bppRgbFormat();
    return bReturn;
}
What we aim to accomplish with this code is that 1-2 pixel differences are blurred out of existence, while anything that remains is brightened so that the next blur pass cannot remove it.
An example can be seen below:
Let's imagine our exemplar image is this:
However, at run time, we capture this:
Selenium will have no easy way to determine that the image has not loaded due to a 404 issue.
It will see that a div or img is in the DOM, and assume that it must be fine.
With AForge, however, you can build a difference map.
It then shows something like what you see above.
The 1-2 pixel false positives I mentioned earlier look like this - see below.
To eliminate these false positives, but retain the real difference, namely that the car image has not loaded, we use the process of Blur and Enhance.
Another example of how it might look:
In the above difference map, the car is not meant to be hidden by a popup, but it is.
The important thing with visual QA is not that the tool can understand what the differences are, but that it can spot them and distinguish them from false positives. After all, we only want a boolean result: does it match, or does it not?
For the final comparison, I recommend using an ExhaustiveTemplateMatching from AForge.
Compare your blurred image against a black image of the same dimensions. (Difference mask images are always set against a black background - assuming the images match)
float similarityThreshold = 0f; // report all matches; filter on Similarity afterwards
var tm = new ExhaustiveTemplateMatching(similarityThreshold);
TemplateMatch[] results = tm.ProcessImage(blurred, black.To24bppRgbFormat());
Each TemplateMatch in the results array then exposes a Similarity score, which you can use to pass or fail your test.
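Putting the pieces together, a hedged sketch of the final pass/fail check might look like this (the helper name and the 0.99 threshold are illustrative values you would tune for your own pages):

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using AForge.Imaging;

public static class VisualComparison
{
    // Returns true when the blurred difference mask is effectively all black,
    // i.e. no real differences survived the blur-and-brighten passes.
    // Assumes 'blurred' is already in 24bpp RGB format.
    public static bool ImagesMatch(Bitmap blurred, float threshold = 0.99f)
    {
        // A black reference image the same size as the mask.
        using (var black = new Bitmap(blurred.Width, blurred.Height))
        {
            using (Graphics g = Graphics.FromImage(black))
            {
                g.Clear(Color.Black);
            }
            var tm = new ExhaustiveTemplateMatching(0);
            TemplateMatch[] results = tm.ProcessImage(
                blurred,
                AForge.Imaging.Image.Clone(black, PixelFormat.Format24bppRgb));
            return results.Length > 0 && results[0].Similarity >= threshold;
        }
    }
}
```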
If you really want to identify the causes of the image differences, AForge also lets you detect blobs in your blurred image and draw blob boundaries at the same coordinates.
I would recommend drawing these boxes on the un-blurred image, so the result makes more sense to the tester reviewing it.
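A sketch of that blob-highlighting step using AForge's BlobCounter (the minimum blob size filters and method name are illustrative):

```csharp
using System.Drawing;
using AForge.Imaging;

public static class BlobHighlighter
{
    // Find blobs (clusters of surviving difference pixels) in the blurred mask,
    // then draw their bounding boxes on the un-blurred image so the tester
    // sees the failure in context.
    public static Bitmap HighlightDifferences(Bitmap blurredMask, Bitmap original)
    {
        var blobCounter = new BlobCounter
        {
            FilterBlobs = true, // ignore tiny specks of residual noise
            MinWidth = 5,
            MinHeight = 5
        };
        blobCounter.ProcessImage(blurredMask);

        var annotated = new Bitmap(original);
        using (Graphics g = Graphics.FromImage(annotated))
        using (var pen = new Pen(Color.Red, 2))
        {
            foreach (Rectangle rect in blobCounter.GetObjectsRectangles())
            {
                g.DrawRectangle(pen, rect);
            }
        }
        return annotated;
    }
}
```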