Tuesday Tip Day – WebDriver Screenshots
In this new series for the site, I’ll be posting brief articles, each containing a random tip for improving your automation skill set, from Selenium to PowerShell.
In this first week, we’re going to look at taking screenshots in Selenium WebDriver tests, which can be extremely useful for providing evidence for auditing as well as giving more feedback for your test failures.
Some sites will just teach you how to use the basic WebDriver call GetScreenshot(), which, while useful, can be quite limiting when you want to control exactly what is captured. So in this article, we’re going to go in depth to make sure you’re getting the most out of your screenshots when running your tests.
First off, let’s start with that basic WebDriver call:
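    using System.Drawing.Imaging;
    using OpenQA.Selenium;

    // Captures the current browser viewport and saves it as a PNG.
    // Note: the SaveAsFile overload taking an ImageFormat matches older Selenium
    // .NET bindings; later releases changed this signature, so check your version.
    public void TakeFullScreenshot(IWebDriver driver, string filename)
    {
        Screenshot screenshot = ((ITakesScreenshot)driver).GetScreenshot();
        screenshot.SaveAsFile(filename, ImageFormat.Png);
    }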
Calling this saves a screenshot to the file path provided. It’s extremely easy to use and can be called anywhere in your tests. It captures only what is currently visible inside the browser window running the tests, so if you’re planning on capturing evidence for testing, it’s a good idea to maximise the browser window at the start of the test (for example with driver.Manage().Window.Maximize()) to capture as much as possible.
An improvement on this would be to capture the current page in its entirety, no matter the size of the browser. Quite often, even in a maximised window, you won’t be seeing everything if there is content that requires scrolling down or across.
To get the entire web page in a screenshot, we’ll have to write our own screenshot method from scratch. So what does that look like?
    using System.Collections.Generic;
    using System.Drawing;
    using System.IO;
    using OpenQA.Selenium;

    public Image GetEntireScreenshot()
    {
        var js = (IJavaScriptExecutor)_driver;
        var ss = (ITakesScreenshot)_driver;

        // Get the total width and height of both the entire page and the current viewport
        var totalWidth = (int)(long)js.ExecuteScript("return document.body.offsetWidth");
        var totalHeight = (int)(long)js.ExecuteScript("return document.body.parentNode.scrollHeight");
        var viewportWidth = (int)(long)js.ExecuteScript("return document.body.clientWidth");
        var viewportHeight = (int)(long)js.ExecuteScript("return document.body.clientHeight");

        // Always start from the top-left corner of the page
        js.ExecuteScript("window.scrollTo(0, 0)");

        // We only care about stitching multiple images together if the page doesn't already fit
        if (totalWidth <= viewportWidth && totalHeight <= viewportHeight)
            return ScreenshotToImage(ss.GetScreenshot());

        var rectangles = new List<Rectangle>();

        // Loop until the totalHeight is reached
        for (var y = 0; y < totalHeight; y += viewportHeight)
        {
            var newHeight = viewportHeight;

            // Clip the last row if it would run past the bottom of the page
            if (y + viewportHeight > totalHeight)
            {
                newHeight = totalHeight - y;
            }

            // Loop until the totalWidth is reached
            for (var x = 0; x < totalWidth; x += viewportWidth)
            {
                var newWidth = viewportWidth;

                // Clip the last column if it would run past the right edge of the page
                if (x + viewportWidth > totalWidth)
                {
                    newWidth = totalWidth - x;
                }

                // Create and add the rectangle
                rectangles.Add(new Rectangle(x, y, newWidth, newHeight));
            }
        }

        // Build the image
        var stitchedImage = new Bitmap(totalWidth, totalHeight);
        var previous = Rectangle.Empty;

        using (var graphics = Graphics.FromImage(stitchedImage))
        {
            foreach (var rectangle in rectangles)
            {
                // Scroll so the new rectangle sits flush with the bottom-right of the viewport (if needed)
                if (previous != Rectangle.Empty)
                {
                    js.ExecuteScript($"window.scrollBy({rectangle.Right - previous.Right}, {rectangle.Bottom - previous.Bottom})");
                }

                // Calculate which part of the viewport screenshot holds this rectangle
                var sourceRectangle = new Rectangle(viewportWidth - rectangle.Width, viewportHeight - rectangle.Height, rectangle.Width, rectangle.Height);

                // Copy that part into its place in the stitched image
                using (var tile = ScreenshotToImage(ss.GetScreenshot()))
                {
                    graphics.DrawImage(tile, rectangle, sourceRectangle, GraphicsUnit.Pixel);
                }

                previous = rectangle;
            }
        }

        return stitchedImage;
    }

    private Image ScreenshotToImage(Screenshot screenshot)
    {
        using (var memStream = new MemoryStream(screenshot.AsByteArray))
        using (var decoded = Image.FromStream(memStream))
        {
            // An image created from a stream needs that stream to stay open,
            // so return a copy that no longer depends on it
            return new Bitmap(decoded);
        }
    }
There’s quite a lot of code here, and it might seem daunting, but let’s break it down into more manageable chunks.
The first thing we’re doing is taking measurements of the web page. Remember, we’re not just capturing what we can see inside the current window; we’re capturing content that might be off screen as well, so we have to find out the full width and height of the current page. We do that with some JavaScript execution that returns the measurements we’re looking for. Next, we get the height and width of our viewport, which is what can currently be seen in our window.
Once we have those, we use these measurements to break our web page down into rectangles. The reason for this is that the method takes several screenshots and then stitches them together to create our full web page screenshot. It does this by taking a screenshot of what’s currently in view, and then scrolling across, down or both as required to take the next screenshot. Each screenshot is drawn into its place on a single bitmap, so once the entire page has been covered we are left with one big screenshot. To do this, we need to work out what we can currently see against how much there is left to see, so we know exactly how far to move the view before taking the next screenshot.
So our first rectangle is the size of our viewport, and the remaining rectangles are determined by how many same-size rectangles we can fit into what’s left of the web page. Rectangles that would run past the right or bottom edge of the page are trimmed to fit.
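To make the rectangle calculation easier to follow, here is a minimal sketch of that step on its own, with no browser involved. The class name TilePlanner and the method BuildTiles are hypothetical names used only for illustration; the loop itself mirrors the one in GetEntireScreenshot.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

// Hypothetical helper for illustration only: it is not part of Selenium.
public static class TilePlanner
{
    // Splits a page of totalWidth x totalHeight pixels into viewport-sized
    // rectangles, row by row. Rectangles on the right and bottom edges are
    // trimmed so they never extend past the page.
    public static List<Rectangle> BuildTiles(int totalWidth, int totalHeight, int viewportWidth, int viewportHeight)
    {
        if (viewportWidth <= 0 || viewportHeight <= 0)
        {
            throw new ArgumentException("Viewport dimensions must be positive.");
        }

        var tiles = new List<Rectangle>();

        for (var y = 0; y < totalHeight; y += viewportHeight)
        {
            // Full viewport height, or whatever is left at the bottom of the page
            var height = Math.Min(viewportHeight, totalHeight - y);

            for (var x = 0; x < totalWidth; x += viewportWidth)
            {
                // Full viewport width, or whatever is left at the right edge
                var width = Math.Min(viewportWidth, totalWidth - x);

                tiles.Add(new Rectangle(x, y, width, height));
            }
        }

        return tiles;
    }
}
```

For example, calling TilePlanner.BuildTiles(2500, 1800, 1000, 800) gives nine rectangles in three rows of three. The first is a full 1000 x 800 viewport at (0, 0), and the last is the trimmed 500 x 200 corner piece at (2000, 1600).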
Once it has done this, the method works through the rectangles in order, scrolling only when it needs to and taking a screenshot at each position with the same GetScreenshot() call we used earlier. If the whole page already fits in the viewport, it simply takes one screenshot right at the start and returns that.
The remaining code is where the clever bit happens: it builds our Bitmap image and stitches it all together. I’m not going to pretend I understand the technical side of what is going on there. It’s one of those things where you just need to understand that it works, not necessarily how it works.
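For anyone who is curious, the arithmetic behind the stitching can be pulled out and looked at in isolation. The sketch below uses a hypothetical class, StitchMath, with two hypothetical methods that reproduce the scrollBy offsets and the source rectangle from GetEntireScreenshot. It assumes the page is at least as large as the viewport in both directions and that one screenshot pixel equals one page pixel, which may not hold on high-DPI displays.

```csharp
using System.Drawing;

// Hypothetical helper for illustration only: it is not part of Selenium.
public static class StitchMath
{
    // How far to scroll to get from the previous rectangle to the next one.
    // Lining up the bottom-right corners keeps the next rectangle flush with
    // the bottom-right of the viewport, even when it is smaller than the viewport.
    public static Point ScrollDelta(Rectangle previous, Rectangle next)
    {
        return new Point(next.Right - previous.Right, next.Bottom - previous.Bottom);
    }

    // The part of the viewport screenshot that holds the rectangle's pixels.
    // A full-size rectangle uses the whole screenshot; a trimmed one uses
    // only the bottom-right portion.
    public static Rectangle SourceRegion(Rectangle tile, int viewportWidth, int viewportHeight)
    {
        return new Rectangle(
            viewportWidth - tile.Width,
            viewportHeight - tile.Height,
            tile.Width,
            tile.Height);
    }
}
```

Take a 1000 x 800 viewport on a page 2500 pixels wide. Moving from the rectangle at x = 1000 to the trimmed 500-pixel-wide rectangle at x = 2000 gives a scroll of only 500 pixels across, because the page cannot usefully scroll any further. The new content therefore sits in the right-hand half of the screenshot, which is exactly the region SourceRegion returns: it starts at x = 500 and is 500 pixels wide.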
And there you have it: the image returned from this code will now show everything that is on that page. It can be used to complement the view-based screenshot that’s built into Selenium, or instead of it, depending on your evidence requirements.