If the application that you are performance testing will be operating over a WAN, it is important to test whether the reduced bandwidth and higher latency of the network link will increase transaction response times to an unacceptable level.
If you are being thorough, you should also run some tests to determine whether your response times will be impacted by other network traffic on the WAN (and vice versa).
Let the following anecdotes about testing two different applications serve as examples:
The first application was to be used by staff who would be interacting directly with customers. There would be up to 10 concurrent users per site, each spending the majority of their time using this application. In addition to the network traffic from the application, there would be back-office traffic like web browsing, email, and software packages being deployed to workstations. The Project had some concerns about the impact of this traffic (particularly package deployment) on end-user response times for this business-critical application.
A leased line with the same specifications as the (future) Production WAN links was provisioned for the test environment. A LoadRunner agent was set up at the end of the WAN link to generate traffic equivalent to 10 concurrent users. The background network traffic was generated with NetIQ’s Chariot tool.
Chariot sends packets (with a typical packet size for each type of traffic to be emulated) at specified volumes between user-defined ports on each end-point, which sit at either end of the WAN link.
The network team spent several days determining the profile of the background traffic and creating a number of traffic scenarios in Chariot. The different traffic scenarios were run at the same time as a worst-case profile of heavy application usage. The response times from these scenarios were compared with end-user performance from a scenario with no background traffic (and, obviously, against the performance requirements).
The second application was to be used by staff working in a warehouse. There would only be 1 or 2 concurrent users per site, and the traffic would be low volume, mainly related to printing picking slips and labels. This was tested using one of the warehouse WAN links; using real Production infrastructure meant that the tests had to be run ridiculously early in the morning to ensure that they would not interfere with normal Business activities.
The testing was nowhere near as sophisticated as the first example. A user was located at the warehouse with a stopwatch to measure response times. It was not necessary to use LoadRunner as there would only be a small number of users at the site, and LoadRunner would not be able to measure the time it took for output to appear at the printers.
A baseline with no additional network traffic was run. Multiple samples were taken for each business process to provide a meaningful average response time that could be reported on.
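As a rough illustration of how the stopwatch samples might be summarised, here is a minimal Python sketch. The timings and the function name `summarise` are hypothetical (they are not from the original test); the point is only that reporting the spread alongside the average shows how much the manual measurements vary.

```python
import statistics

def summarise(samples):
    """Summarise a list of response times (in seconds) for one business process."""
    return {
        "n": len(samples),
        "mean": statistics.mean(samples),
        # Sample standard deviation needs at least two measurements.
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "min": min(samples),
        "max": max(samples),
    }

# Hypothetical stopwatch timings, in seconds, for one business process.
baseline_samples = [4.2, 3.9, 4.4, 4.1, 4.0]

if __name__ == "__main__":
    s = summarise(baseline_samples)
    print(f"n={s['n']} mean={s['mean']:.2f}s stdev={s['stdev']:.2f}s "
          f"min={s['min']:.1f}s max={s['max']:.1f}s")
```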
To emulate network activity, I ran a batch file that repeatedly copied a zipped file across the WAN (the file was zipped in case any network components were trying to compress traffic). The network team, who were monitoring the link, reported that the link was at 40% utilisation. As file transfers had a lower QoS priority than application or printer traffic, no significant change in response time was expected.
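In this test the utilisation figure came from the network team's monitoring, but a rough cross-check can be worked out from the size of the file and how long each copy takes. The sketch below is only that: the file size, copy time, and link speed are made-up values (the original link specification is not stated here), and the calculation counts payload bytes only, ignoring protocol overhead.

```python
def link_utilisation(bytes_transferred, seconds, link_bits_per_second):
    """Return the fraction of link capacity used by a transfer (payload only)."""
    if seconds <= 0 or link_bits_per_second <= 0:
        raise ValueError("seconds and link_bits_per_second must be positive")
    return (bytes_transferred * 8) / (seconds * link_bits_per_second)

if __name__ == "__main__":
    # Hypothetical values: a 10 MB file copied in 100 s over a 2 Mbit/s link.
    fraction = link_utilisation(10_000_000, 100, 2_000_000)
    print(f"approximate utilisation: {fraction:.0%}")
```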
The business processes were re-run, and results were recorded. As expected, there was no statistically significant difference in response times.
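One simple way to check a claim like this with a handful of stopwatch samples is a permutation test on the difference in mean response time. The sketch below is an assumption about how such a check could be done, not a record of the analysis actually used; the sample values and the function name `permutation_p_value` are hypothetical.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def permutation_p_value(baseline, loaded, rounds=10000, seed=1):
    """Two-sided permutation test on the difference in mean response time.

    Returns the fraction of random relabellings whose difference in means
    is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(_mean(loaded) - _mean(baseline))
    pooled = list(baseline) + list(loaded)
    n = len(baseline)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[n:]) - _mean(pooled[:n]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return hits / rounds

if __name__ == "__main__":
    # Hypothetical timings in seconds: no background traffic vs. file copy running.
    baseline = [4.2, 3.9, 4.4, 4.1, 4.0]
    with_copy = [4.3, 4.0, 4.2, 4.1, 4.4]
    print(f"p-value: {permutation_p_value(baseline, with_copy):.3f}")
```

A large p-value (conventionally above 0.05) means the samples give no evidence that the background traffic changed response times; with only a few manual timings per process, it cannot rule out a small effect.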
Running a second instance of my batch file increased the network link utilisation to 90%. This time response times were higher, but still within requirements.
Just for the sake of completeness, here is the batch file:
:: Copy file.zip from shared drive
:: to the local D:\TEMP directory.
:: Loop forever (or until Ctrl-C).
:: Rename the copied file as file<iteration>.zip.
SET /A i=1
:loop
echo copying file%i%...
copy /Y \\dws0368\Temp\file.zip D:\TEMP\file%i%.zip
SET /A i+=1
GOTO loop
As with any sociability test, you should put in the level of effort needed to reduce the risk of unexpected behaviour in Production to an appropriate level. You could spend a lot of time making your usage profile “perfect” and testing every scenario, but it saves valuable time to concentrate on a probable-worst-case and simplify your usage profile as much as is practical.
Of the two examples, one took a couple of hours to prepare and execute; the other took a few days and involved significant work from other groups. While I would prefer to do this testing the first way if possible, the quick-and-dirty second way is an example of “good enough” testing (and, with historical graphs of network usage, is almost as defensible as the first example).