Ninja QA

Visual QA with AForge

A while back, I was tasked with assessing the viability of Automated Visual QA within the organization I worked in. The idea was that we would create automated Selenium tests that could perform run-time visual checks on pages and elements as needed, and return a pass or fail to say whether the content had changed enough to warrant raising a bug.

I assessed a few options. One of the commercial tools considered was Applitools, which integrates seamlessly into Selenium. In fact, their driver is merely a custom-built IWebDriver class, so it can be wrapped around a Chrome WebDriver just as easily as an Internet Explorer driver.

All in all, Applitools was a good candidate. However, I also wanted to see if it was feasible to perform the validations ourselves internally, as our company had some concerns about sending screenshots of our test or UAT environments to a third-party provider such as Applitools. After all, while we would be agreeing to Applitools' Terms and Conditions, they are not necessarily signing a Non-Disclosure Agreement to ensure that our screenshots remain private and are deleted when no longer needed.

So, to begin developing our own solution, I looked at some open source libraries, and AForge looked very promising.
It has imaging libraries designed to give a percentage ranking of the similarity between two images.

All in all, it looked like AForge would be a simple solution to use, but then I encountered other issues.

For any visual comparison, you need two things.

  1. The image you are comparing
  2. The exemplar image you are comparing against

For number 2, this was going to be a static image of the element or page as it should appear. To ensure we captured the exemplar image in a state that gave it a chance of matching the image taken at test execution time, we used the test itself to capture it.
This was accomplished by calling a custom screenshot method I created that uses X,Y coordinates to capture the bounding box of the element we are interested in.
It should be noted that the lower the level at which you perform your validations, the more accurate your results are likely to be.
Capturing the whole page may be quick, but it can also be dirty: you can end up with false positives being raised due to text differences, offsets, date/times, etc.
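The element-capture step might look something like the sketch below. This is illustrative rather than the exact method I used: the helper name is mine, and it assumes the full-page screenshot and the element coordinates share the same origin (browser zoom and device pixel ratio can break that assumption).

```csharp
using System.Drawing;
using System.IO;
using OpenQA.Selenium;

public static class ScreenshotHelper
{
    // Capture the full page, then crop to the element's bounding box.
    public static Bitmap ElementScreenshot(IWebDriver driver, IWebElement element)
    {
        Screenshot screenshot = ((ITakesScreenshot)driver).GetScreenshot();
        using (var ms = new MemoryStream(screenshot.AsByteArray))
        using (var fullPage = new Bitmap(ms))
        {
            var bounds = new Rectangle(
                element.Location.X, element.Location.Y,
                element.Size.Width, element.Size.Height);
            return fullPage.Clone(bounds, fullPage.PixelFormat);
        }
    }
}
```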


When using AForge, I discovered that a 'DifferenceMask' was a really good way to perform the comparison. If the comparison fails, it can output a difference image to show the actual differences as they appear on screen. This also allowed me to debug and tinker with the tool to make it more robust and reliable. 
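For reference, producing such a difference image with AForge can be as simple as applying its Difference filter - a sketch, assuming both bitmaps are the same size and pixel format:

```csharp
using System.Drawing;
using AForge.Imaging.Filters;

public static class DifferenceMask
{
    // Matching pixels subtract to black; any mismatch remains visible.
    public static Bitmap Build(Bitmap exemplar, Bitmap actual)
    {
        var difference = new Difference(exemplar);
        return difference.Apply(actual);
    }
}
```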

I found that when the tool ran on other browsers, or even just remotely on a Selenium Grid node, this would sometimes result in 1-2 pixel offsets in the images.
1-2 pixels doesn't sound like a whole lot, but it can be enough to make the entire image register as 'different'.

How do we solve that?
The way I solved it was to create an algorithm that would eliminate the false positives, but also extrude and grow the real issues we care about. This was accomplished through a sequence of brightening the image while blurring it at the same time.
Bear in mind that the image at this stage is a difference mask image - it is typically inverted before it gets to this stage:

public static Bitmap EnlargeDifferences(Bitmap btm)
        {
            // Blur away 1-2 pixel noise, then brighten whatever survives
            // (AForge requires a small, odd kernel size; 11 is illustrative)
            GaussianBlur filterBlur = new GaussianBlur(3.4D, 11);
            HSLLinear filterBrighten = new HSLLinear();
            // configure the brightening filter
            filterBrighten.InLuminance = new Range(0.00f, 1.00f);
            filterBrighten.OutLuminance = new Range(0.00f, 10.00f);

            // Chain the filters so each pass blurs, then brightens
            FiltersSequence filterSequence = new FiltersSequence(filterBlur, filterBrighten);

            // Do 5 passes - to try and expand the changes
            FilterIterator fi = new FilterIterator(filterSequence, 5);

            Bitmap bReturn = fi.Apply(btm).To24bppRgbFormat();
            return bReturn;
        }

What we are aiming to accomplish with this code is that 1-2 pixel differences will be blurred out of existence, but anything that remains will be brightened so the next blur pass cannot remove it.

An example can be seen below.
Let's imagine our exemplar image is this:

However, at run time, we capture this:

Selenium has no easy way to determine that the image has not loaded due to a 404.
It will see that a div or img is in the DOM, and assume that it must be fine.

With AForge, however, you can build a difference map.

It then shows something like what you see above.

The 1-2 pixel false positives I mentioned look like this - see below.

To eliminate these false positives, but retain the real difference - namely, that the car image has not loaded - we use the process of Blur and Enhance.


Another example of how it might look:

In the above difference map, the car is not meant to be hidden by a popup, but it is.

The important thing with Visual QA is not that the tool can understand what the differences are, but that it can spot them and distinguish them from false positives. After all, we only want a boolean result: does it match, or does it not?

For the final comparison, I recommend using ExhaustiveTemplateMatching from AForge.
Compare your blurred image against a black image of the same dimensions. (Difference mask images are always set against a black background, so if the images match, the mask is pure black.)

// 'blurred' is the processed difference mask; 'black' is an all-black
// bitmap of the same dimensions
var tm = new ExhaustiveTemplateMatching(similarityThreshold);
var results = tm.ProcessImage(blurred, black.To24bppRgbFormat());

Each TemplateMatch in the results array exposes a Similarity value, which you can use to pass or fail your test.
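Putting the final check together might look like this sketch. It assumes 'blurred' is already in 24bpp RGB format; the 0.95f threshold below is purely illustrative.

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using AForge.Imaging;

public static class VisualCheck
{
    // Returns true when the blurred difference mask is close enough to
    // pure black, i.e. the run-time image matched the exemplar.
    public static bool MatchesExemplar(Bitmap blurred, float similarityThreshold)
    {
        // A freshly created bitmap is all zeros, i.e. pure black
        using (var black = new Bitmap(blurred.Width, blurred.Height,
                                      PixelFormat.Format24bppRgb))
        {
            // Matches below the threshold are excluded from the results
            // array, so any returned match means "similar enough".
            var tm = new ExhaustiveTemplateMatching(similarityThreshold);
            TemplateMatch[] results = tm.ProcessImage(blurred, black);
            return results.Length > 0;
        }
    }
}
```

Usage would be, for example, `Assert.IsTrue(VisualCheck.MatchesExemplar(blurred, 0.95f));`.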

If you really want to identify the causes of the image differences, AForge also lets you identify blobs in your blurred image and then draw blob boundaries around the same coordinates.
I recommend drawing these boxes on the un-blurred image, so the result makes more sense to the tester reviewing it.
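The blob-highlighting idea can be sketched with AForge's BlobCounter - find blobs in the blurred mask, then draw their bounding boxes on the original (un-blurred) screenshot. Pen colour and width here are arbitrary choices:

```csharp
using System.Drawing;
using AForge.Imaging;

public static class DifferenceHighlighter
{
    // Draw a red box around every blob found in the blurred mask,
    // directly onto the un-blurred original image.
    public static void HighlightDifferences(Bitmap blurredMask, Bitmap original)
    {
        var blobCounter = new BlobCounter();
        blobCounter.ProcessImage(blurredMask);
        Rectangle[] rects = blobCounter.GetObjectsRectangles();

        using (Graphics g = Graphics.FromImage(original))
        using (var pen = new Pen(Color.Red, 2))
        {
            foreach (Rectangle rect in rects)
                g.DrawRectangle(pen, rect);
        }
    }
}
```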


Page Object Model - with added Synchronization

Perhaps the most common approach to Web UI Automation is the 'Page Object' model: the idea that you can represent the page before you via code, or some other interface, to facilitate interactions with its various elements. For most, the page object approach is a simple case of cramming your objects into a class and using them directly, not caring about the page they are in or any of the logic that may need to be considered beforehand.

For me, however, I have a somewhat different approach.

It is true that Selenium provides an implicit synchronization feature that allows you to 'not care' about waiting for objects to display: Selenium will naturally wait for an object to appear before it can interact with it. While this feature works well, sometimes it is not enough. It can be augmented with page object synchronization.
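For context, the implicit wait referred to here is configured once on the driver, along these lines (the 15-second value is just an example):

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

public static class DriverSetup
{
    public static IWebDriver Create()
    {
        IWebDriver driver = new ChromeDriver();
        // Every FindElement call now polls for up to 15 seconds before
        // throwing NoSuchElementException. (Older Selenium .NET bindings
        // use driver.Manage().Timeouts().ImplicitlyWait(...) instead.)
        driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(15);
        return driver;
    }
}
```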


Firstly, the interface I am going to use for this system is really just added as best practice. It is designed to encourage users to provide a page name and description for the page in question.

//Use of interface helps to prevent user error when creating new page object models
    interface IPage
    {
        void GetPage();
        string GetPageName();
        string GetPageDescription();
    }


When I use this interface to create the base page class, we end up with something like the following. Yes, I am also implementing IDisposable - I will explain why soon.

public class Page : IPage, IDisposable
    {
        /// <summary>
        /// Use this constructor if you are not concerned about synchronization
        /// </summary>
        public Page()
        {
        }

        /// <summary>
        /// Use this constructor if you wish to automatically synchronize on an object on screen.
        /// </summary>
        /// <param name="locator"></param>
        public Page(By locator)
        {
            this.LocatorForPage = locator;
        }

        public void GetPage()
        {
        }

        public string GetPageName()
        {
            return null;
        }

        public string GetPageDescription()
        {
            return null;
        }
    }

The methods that return 'null' above are self-explanatory - it is recommended you populate them with useful information. They come in handy when you want logging and the like to tell the test framework which page it failed on.

Similar to how we use our BaseElement / Specialized object classes, we can also feed in a 'locator' or 'By' class to the constructor for this class. The purpose for this is so we can tell the framework what objects we should 'synchronize' on. There is no point in looking for the Username field on a login form, if the login button has not appeared yet for instance. As a rule of thumb, I recommend your synchronization object should be the last object you anticipate appearing on screen.


IDisposable will require us to implement a dispose method.

public By LocatorForPage { get; set; }

public void Dispose()
{
    LocatorForPage = null;
}

When we use the constructor, we store the By information in this property; when we dispose of the page class, we release it.
.NET should take care of this cleanup for us, but it is generally good practice to dispose of / clean up resources whenever you can.

Once again, similar to the BaseElement class and its 'FindObject' method, the 'GetPage' method is going to do much of the heavy lifting. Because we have already implemented most of the object functionality in our BaseElement class, we can reuse that functionality here.

public void GetPage()
        {
            // Give any registered error-page detection a chance to fail fast
            if (DetectErrorPageEventHandler != null)
                DetectErrorPageEventHandler();

            //If a locator is provided, sync on it, else don't
            if (LocatorForPage != null)
            {
                try
                {
                    //Syncs on the object if found
                    //Else raises an exception
                    BaseElement be = new BaseElement(LocatorForPage, true);
                }
                catch (Exception e)
                {
                    if (GetPageName() == null)
                        this.PageName = this.GetType().Name;

                    throw new Exception("Page not loaded: (" + GetPageName() + "). The object identified by " + LocatorForPage + " was not located within the timeout period. This may indicate that the page has not loaded.", e);
                }
            }
        }

You can see that this method does not concern itself with any timeouts or WebDriverWaits - instead, it simply tries to instantiate the BaseElement class on the locator you have specified for the page. In the event the object is not found within the implicit wait timespan, an exception is thrown to say that the page did not load.

You may be wondering what the DetectErrorPageEventHandler is...
Most company / product websites will have an 'error' page: a catch-all page displayed in the event that something went wrong.
This could be a maintenance page, or perhaps just the bland IIS error page.
When one of these pages appears, you know instantly that your page has not loaded and will not load from that point on. Selenium, however, would continue to wait for the timeout to occur - it has no concept of what an error page looks like.

With event handlers, you can tell it how to recognize these error pages.

Add the following region to your page class.

#region Error Detection
        /// <summary>
        /// This delegate is used to define an Error Page detection method
        /// </summary>
        public delegate void DetectErrorPageDelegate();

        /// <summary>
        /// This event is fired when an error page is detected.
        /// </summary>
        public static event DetectErrorPageDelegate DetectErrorPageEventHandler;
#endregion

First thing to be aware of: these handlers are static, so they apply globally within your test project. You do not set error page detection for a single page class; you set it for all page classes.

How do you inject your detection code into the class?

At the start of your test run, you need to call something like the following.
This is written in the syntax of using Selenium along with SpecFlow. If you are just using NUnit, you might use a 'TestFixtureSetUp' method instead.

public static void SetupErrorDetection()
{
    Page.DetectErrorPageEventHandler += ErrorDetection;
}

public static void ErrorDetection()
{
    BaseElement errorMessage = new BaseElement(By.Id("error_warn"), false);
    // Exists() stands in for however your BaseElement reports presence
    if (errorMessage.Exists())
        throw new Exception("The test environment's error page was detected - please investigate...");
}


This basically means that whenever you instantiate a Page class, it will automatically check for the existence of the 'error_warn' element.
Note the false provided as the second argument to BaseElement. This prevents it from performing the default implicit wait.
We do not want our page object classes waiting 15 seconds for an object that we hope never appears.
This allows the code to instantiate the errorMessage object and then perform a straight assertion on whether it exists.
Note - in Selenium there is a difference between an element existing and being displayed, so you may need to adjust your code to the way your DOM works. More recent technologies such as Bootstrap and jQuery often result in objects that are always present, but not visible.
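The existence-versus-displayed distinction mentioned above can be sketched in raw Selenium, outside of the BaseElement wrapper:

```csharp
using OpenQA.Selenium;

public static class ElementState
{
    // Exists: the element is in the DOM at all.
    // FindElements returns an empty collection rather than throwing.
    public static bool Exists(IWebDriver driver, By locator)
    {
        return driver.FindElements(locator).Count > 0;
    }

    // Displayed: the element is in the DOM *and* visible to the user.
    public static bool IsDisplayed(IWebDriver driver, By locator)
    {
        var matches = driver.FindElements(locator);
        return matches.Count > 0 && matches[0].Displayed;
    }
}
```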

Putting this all together, how does it look in a real, live example?

public class LoginPage : Page
    {
        public LoginPage()
            : base(By.Id("signin_page"))
        {
        }

        public Textbox AccountNumberTextbox()
        {
            return new Textbox(By.Id("AccountNumber"));
        }

        public Element AccountNumberValidationElement()
        {
            return new Element(By.Id("AccountNumberValidation"));
        }
    }


If our login page class looks like the above, then our usage of it could look like...

	using (LoginPage page = new LoginPage())
	{
	    // interact with page.AccountNumberTextbox() etc. here
	}

It is my opinion that the 'using' statement helps keep your test code clean and concise - it prevents you from interacting with objects that exist on 'other' page classes and allows you to synchronize on pages more accurately than simple implicit waits.

C# also allows you to do more complex page object model approaches using inheritance.

Let's imagine you have a menu bar that is accessible on ALL pages within your application.
You could define this as one page object class and then inherit from it - this allows you to merge the contents of two page object classes.

public class MenuBar : Page
{
    public MenuBar() : base(By.Id("menu_bar_authed"))
    {
    }

    public Link AccountInformation()
    {
        return new Link(By.Id("acc_info"));
    }
}


If this is your menu bar page, it synchronizes on an element with the id 'menu_bar_authed' - this is just something I made up...

public class AccountSummary : MenuBar
{
    public AccountSummary() : base(By.Id("acc_summary_index"))
    {
    }

    public Element AccountNumber()
    {
        return new Element(By.Id("acc_number"));
    }
}

You can see in the example above, we are inheriting from MenuBar instead of the 'Page' class.

The side effect of doing this, and then feeding in a locator for something other than the MenuBar locator, is that the 'using' statement will only synchronize on the locator specified by the inheriting class.
E.g. instead of synchronizing on the MenuBar, we synchronize on 'acc_summary_index'.

This is fair enough: in keeping with object-oriented principles, the inheriting class is the more specific one. We do not care about the menu bar at this stage; we want to synchronize on something relevant to the AccountSummary page.

	using (AccountSummary page = new AccountSummary())
	{
	    page.AccountInformation();  // defined on MenuBar
	    page.AccountNumber();       // defined on AccountSummary
	}

Using the approach above, the account information link and the account number element are defined in separate classes, yet accessible within the same using statement via the inheritance model.