It seems unnecessarily obvious to even bother making the statement that “errors are bad”; surely this is an idea that everyone agrees with, like “crime is bad” or “you shouldn’t put your underwear on backwards”. But a lot of performance testers that I have worked with don’t seem to care too much about errors. A senior performance tester with more than 15 years’ experience at some of the largest companies in Australia recently said to me:

 
“Yeah, as long as the error rate is below 5%, I don’t bother putting it in my report.”

Are. You. Kidding?

How could someone with so much experience be so wrong? How much bad advice had he given to big IT projects over his career? Did he really have 15 years’ experience, or just the same 6 months repeated over and over again?

Performance Testing is not just about response times

I know that most people call it “performance testing”, but a performance tester’s job is a bit broader than just measuring response times while the system is under load.

If response times are fine when there is no load on the system, but increase when the system is under load, this is a sign that there is resource contention, and threads of execution must queue while waiting for access to a resource (whether this is a slice of CPU time, or an allocation from the database connection pool, or whatever).

Hitting a resource limit (e.g. a max connections configuration setting) doesn’t always cause response times to increase; it can also cause errors, which might only occur for an instant (e.g. while garbage collection runs).

Another kind of problem that becomes apparent under load is related to thread safety. Sometimes code (or an architectural design decision) causes problems that only show up under concurrent use: deadlocks on the database, race conditions, or the use of components that are not designed to be threadsafe. It can be as easy as accidentally putting the modifier “static” before a variable in a Java class, making it a class variable (which is shared) instead of an instance variable (which belongs to a single object).
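
To make the “accidentally shared state” failure mode concrete, here is a minimal sketch in C (my own illustration, not code from any system mentioned in this article): a counter that should be private to each thread is shared between four threads, so the unsynchronised updates race and the result is wrong by a different amount on every run – exactly the kind of intermittent error that only appears under concurrent load.

    #include <pthread.h>
    #include <stdio.h>

    /* Shared mutable state -- analogous to a Java "static" field that
       should really have been per-instance (or per-thread). */
    static long order_count = 0;

    static void *place_orders(void *arg)
    {
        int i;
        (void)arg;
        for (i = 0; i < 100000; i++)
            order_count++;   /* unsynchronised read-modify-write: a race condition */
        return NULL;
    }

    int main(void)
    {
        pthread_t workers[4];
        int i;

        for (i = 0; i < 4; i++)
            pthread_create(&workers[i], NULL, place_orders, NULL);
        for (i = 0; i < 4; i++)
            pthread_join(workers[i], NULL);

        /* Expected 400000, but the actual value is usually lower and varies
           from run to run -- the error is intermittent and load-dependent. */
        printf("order_count = %ld (expected 400000)\n", order_count);
        return 0;
    }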

Lots of people seem to have difficulty understanding errors if they are not consistent. If errors occur intermittently, they have a tendency to ignore them.

Here are some examples of errors I have seen:

  • A system at a large bank that would sometimes throw an XML parsing exception on the steps that did a balance enquiry or a value transaction (a deposit or a withdrawal). Project management wanted to deploy the system, so they created a new requirement: the system could throw errors, but they must occur for less than 0.05% of transactions (5 in 10,000). It was discovered that the system was occasionally getting the wrong message from the MQ queue. The real error rate was actually much higher than 0.05%, because the system would only throw an XML parsing exception when it received the wrong type of message (e.g. a balance enquiry response instead of the response for a value transaction, or vice versa). It would not throw an error when it received a response for the wrong account – as long as it was the right type of response.
  • A system that stores medical records, which reported “data saved for user98”, but the saved record could not actually be found afterwards – a silent failure.
  • A large online retailer where the collection class used for the shopping cart was not threadsafe. Under load, a user would occasionally get an error and a thread would get stuck in an infinite loop, consuming a CPU. When this had happened 4 times on each server (which had 4 CPUs), response times became poor, even with no load on the server.
  • A system where a node would crash under load, and the load balancer would not automatically remove it from the pool. Restarting the node took 15 minutes (they ran a big, inefficient Java app on WebSphere).

These kinds of problems can be difficult to fix, but they are solvable; and the whole point of conducting a performance testing cycle for an IT system is to detect problems like this, and make sure that they are fixed.

The bottom line is: if you are generating load from the internal network (not the unreliable public Internet), then your performance tests should have a 0% error rate.

Severity = probability x impact

Not all software errors are created equal. I will care somewhat more if your aircraft control system has glitches (extreme turbulence, huh?) than if your ticketing system loses my booking.

I once worked on an e-commerce project where the website didn’t work on Safari. The development team said “don’t worry, only about 5% of users have Safari”. The business owner quickly made it clear that he wasn’t keen on an instant 5% drop in orders when the new website was launched.

Even extremely rare errors can be a problem, if the impact is high enough. A 1 in 9 billion bug in the floating point division operation for Pentium processors was enough to force Intel to offer replacements in 1994, causing a loss of $475M.

An error may have consequences for:

  • human life: e.g. if the IT system sometimes mixes up blood test results, then you risk giving a patient a transfusion of the wrong blood type or infecting them with HIV
  • financial
  • legal/regulatory
  • reputation
  • customer annoyance
  • staff annoyance

If the error occurs in an obvious way, there might be a workaround (retry?) which reduces the impact, but it shouldn’t be assumed that a workaround will work until it has been investigated.

A performance tester working on an administration system for hospital patients found that approximately 1% of the “admit patient” transactions were failing with an HTTP 500 error. Without further investigation, it was decided that the impact was low, as the operator could just re-enter the data and re-submit if they got the error. No thought was given to whether it was actually possible to re-enter the data (Record locked? Duplicate record created? Partial record created?) or to the feelings of the operators, who are likely to experience the error a couple of times each week and have to ask the emergency room patient for their name/address/birthdate/medicare number a second time.

Most people are bad at maths

Percentages can be a little bit too abstract for the human brain to really grasp. For example, 1% seems like a trivially small number, but if you have a 1% error rate, and you are processing 50,000 orders/week, then you are going to have to deal with 500 irate customers per week (or 26,000 customers per year) demanding to know why you took their money but didn’t send them their product.
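
If it helps to make the numbers concrete for stakeholders, a trivial sketch of the calculation (using the same 1% error rate and 50,000 orders/week figures as above) looks like this:

    #include <stdio.h>

    int main(void)
    {
        double error_rate      = 0.01;    /* 1% */
        double orders_per_week = 50000;

        double failed_per_week = error_rate * orders_per_week;   /*   500 */
        double failed_per_year = failed_per_week * 52;           /* 26000 */

        printf("Irate customers per week: %.0f\n", failed_per_week);
        printf("Irate customers per year: %.0f\n", failed_per_year);
        return 0;
    }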

People tolerate behaviour in IT systems that they wouldn’t accept from real people. Imagine a business person’s horror if there was an employee who would hang up on other people’s calls every time they walked through the call centre; or who would occasionally visit the fulfilment centre, unload a couple of boxes from the back of the delivery truck, unpack the boxes and put the items back on the shelves in the warehouse.

One of Australia’s largest companies spent billions on a new IT system. During performance testing, a small percentage of messages were timing out in the system and getting stuck on queues. Project management decided to switch over to the new system without waiting to fix the problem. Ever since deployment they have had a staff of 300 people (in a third-world country) whose sole purpose is to track down orders that are lost in the system.

As a performance tester you should help your stakeholders understand your test results and their consequences.

You are probably calculating your error rate the wrong way

Let’s do some maths…

Here is a table showing transaction volumes from a performance test (let’s assume that it has been filtered to show an hour-long steady-state period). For each order that is placed, the virtual user will perform 8 searches, and add 4 items to their shopping cart. Your task is to calculate the error rate.

Copy and paste it into a spreadsheet. Go on, I’ll wait…

Transaction              Pass    Fail
01_load_front_page       1000       0
02_login                 1000       0
03_search_for_product    8000       0
04_add_to_cart           4000       0
05_proceed_to_checkout    900     100
06_confirm_order          900       0
07_logout                 900       0
Total                   16700     100

Did you get around 0.6%? Did you do it by dividing the number of failed transactions by the total number of transactions? 100/(16700 + 100) = 0.00595238095238095238095238095238?

If you did, then you are looking at it the wrong way. Over the hour, users attempted to place 1000 orders. 100 of these failed at the proceed_to_checkout step. At a business-process level, your application actually has a 10% failure rate.
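
Using the figures from the table above, the two calculations look like this side by side (a small sketch with the numbers hard-coded from the table):

    #include <stdio.h>

    int main(void)
    {
        /* Totals from the table above */
        double total_transactions  = 16700 + 100;  /* all steps, pass + fail        */
        double failed_transactions = 100;

        double attempted_orders = 1000;            /* business processes started    */
        double failed_orders    = 100;             /* failed at proceed_to_checkout */

        /* Transaction-level error rate -- the misleading number (~0.6%) */
        printf("Transaction-level error rate:      %.1f%%\n",
               100.0 * failed_transactions / total_transactions);

        /* Business-process-level error rate -- what the business experiences (10%) */
        printf("Business-process-level error rate: %.1f%%\n",
               100.0 * failed_orders / attempted_orders);

        return 0;
    }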

If a business stakeholder is trying to make an informed decision on whether to deploy a new system or to delay (and keep burning money), then feeding them bad data can lead to (commercially) fatal business decisions. Most Performance Test Summary Reports that mention a specific error rate have calculated it the wrong way.

To make it easy to calculate error rates at a business process level, not just at a transaction level, you should put “wrapper” transactions in your LoadRunner scripts, so you can clearly see how many business processes were completed during your test, and how many passed and failed.
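
Here is a rough sketch of what a wrapper transaction might look like in a LoadRunner web script (the transaction names and URL are illustrative, and the error handling is reduced to a single step for brevity): the outer transaction spans the whole business process and is marked as failed if any step inside it fails, so the results show pass/fail counts per business process as well as per step.

    Action()
    {
        int order_ok = 1;

        lr_start_transaction("00_place_order");   /* wrapper: the whole business process */

        lr_start_transaction("05_proceed_to_checkout");
        web_url("proceed_to_checkout",
                "URL=http://www.example.com/checkout",
                LAST);
        if (web_get_int_property(HTTP_INFO_RETURN_CODE) >= 400) {
            order_ok = 0;
            lr_end_transaction("05_proceed_to_checkout", LR_FAIL);
        } else {
            lr_end_transaction("05_proceed_to_checkout", LR_PASS);
        }

        /* ... the other steps of the business process follow the same pattern ... */

        lr_end_transaction("00_place_order", order_ok ? LR_PASS : LR_FAIL);
        return 0;
    }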

As a performance tester you should…

In addition to calculating your error rates correctly, you need to make sure that your automated tests are detecting all the errors that occur. This means:

  • Your scripts have checks for each step of the business process.
  • Your scripts are checking for application error messages (a minimal example of both kinds of check follows this list).
  • You check the server logs for error messages at the end of the test.
  • You perform a reconciliation at the end of the test, to see if all the orders you think you created are actually in the system (maybe the system is failing silently).
  • You don’t keep modifying your test until the errors you see under load “go away”, instead of investigating them.
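
For the first two points, here is a LoadRunner-style sketch of what a content check can look like (the text strings and URL are only examples): web_reg_find() registers a check against the next request, and the step fails if the expected text is missing or if a known error message is found.

    Action()
    {
        /* Fail the step if the expected confirmation text is NOT found ... */
        web_reg_find("Text=Order Confirmation", LAST);

        /* ... and fail it if a known application error message IS found. */
        web_reg_find("Text=An error has occurred", "Fail=Found", LAST);

        web_url("confirm_order",
                "URL=http://www.example.com/confirm_order",
                LAST);

        return 0;
    }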

Even if testers are diligent in putting checks in their scripts, they will sometimes dismiss errors as being “due to the script” and not bother to investigate further. I worked with someone who was seeing a large number of intermittent errors for his SAPGUI scripts during a load test. The errors were ignored for weeks because they were thought to be due to GDI limitations on the load generators. When the tester finally organised more load generators to avoid the GDI limit, most of the errors went away…most, but not all. Legitimate errors had been hidden by the assumption that all errors were due to a known problem.

Don’t assume that you know the root cause of all your errors unless you have investigated each of them individually.

You should not be adding “noise” to your test results by using scripts that are unreliable and generate errors under normal operation. If you have unreliable scripts and are under time pressure, you need to let your manager know that you will need more time to properly finish your scripts.

Here are some examples of things performance testers do that generate errors:

  • Use unclean input data. If you have data that contains bad values, you should run through it once to clean it, roll back your database (if necessary), then use the known-good values in your subsequent tests.
  • “Guess” at values that need to be unique, e.g. generating a customer name or a box label number at random and shrugging that “it’s okay, it doesn’t fail very often”. Don’t generate values at random that will fail some (very small) percentage of the time; treat unique values as consumable data and put them in a data table, otherwise you will be inclined to dismiss the errors you see as “just due to my script” (see the sketch after this list).
  • Write scripts that do not handle the normal range of input values.
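
As an example of the second point, here is the difference between “guessing” a unique value and consuming one from a data table, in LoadRunner terms (the {pOrderRef} parameter is hypothetical; it would be configured in the parameter list with “Select next row: Unique”):

    Action()
    {
        char guessed_ref[32];

        /* Bad: a randomly generated "unique" value will occasionally collide,
           and the resulting error is easy to dismiss as "just due to my script". */
        sprintf(guessed_ref, "ORD%d", rand() % 1000000);

        /* Better: consume a known-good, unique value from a data table.
           {pOrderRef} is a hypothetical file parameter, set to "Select next
           row: Unique", so each iteration gets its own row and no value is
           ever reused. */
        lr_save_string(lr_eval_string("{pOrderRef}"), "OrderRef");

        return 0;
    }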

TL;DR (Summary)

  • Performance testing is not just about response times, or finding the point of total system failure. Testers also need to identify bugs that cause intermittent errors under load.
  • You should aim for a 0% error rate if your load is generated from the internal network (and avoids the unreliable public Internet).
  • Intermittent errors under load have a root cause, and can be solved.
  • Percentages can be hard to grasp. Business people might understand the implications better if you put your error rate in terms of the number of orders (or users) that are expected to have errors in a month or a year, based on your test results.
  • There is no excuse for causing errors yourself. Make sure your scripts are correct, and your input data is clean.
  • If your scripts do not have good checks in them, then you might not detect that errors are occurring. (NOTE: I am planning on writing an article on how to do this…soon)
  • Error rates must be calculated at the business-process level, rather than at the transaction step level, or you will be dramatically under-reporting your true error rate. Add “wrapper” transactions to your LoadRunner scripts.

 

Published On: July 14, 2012

6 Comments

  1. Stuart Moncrieff July 14, 2012 at 2:18 pm

    There is an inherent suspicion of any errors that are not possible to reproduce manually (intermittent or load-related errors). The first assumption is always that the error is being caused by the testing tool.

    No matter how many successful iterations have been performed with the same data, the one unsuccessful iteration is due to the tool.

  2. Tim Koopmans July 23, 2012 at 4:58 pm

    I agree with you wholeheartedly. I think we’re all guilty of this though, particularly when pressed for time.

    Unfortunately a lot of production / test sites carry existing errors before you even begin testing. Rather than 0% tolerance, I prefer to baseline and categorize the type of errors you already have. Get jiggy with grep and awk, clarify error messages with devs, look for significance, establish patterns etc. Then look for relative degradation under load.

    Testers often focus only on script/tool related/generated errors without ever checking things server side! You need to look at both sides of the equation. Poorly defined APIs can easily mask errors behind 200’s and the like, the last point in the chain of events. It’s important to cast the net wider than the first client -> server interaction and look deeper into the guts of what you’re testing.

    Don’t have access or don’t know what to monitor, no excuse!

    Thanks for your post.

  3. Graham Perry July 26, 2012 at 9:24 am

    Stuart – couldn’t agree with you more. Many performance testers pass on error rates to project managers assuming they will explain the impact to business stakeholders. The trouble is project managers can suffer from ‘error rate oversight syndrome’ (EROS – I like that) and fail to articulate the impact to the business.

  4. Stuart Moncrieff July 30, 2012 at 10:02 am

    …and sometimes your testing tool gives you bad advice.

    create_order_basic.c(10): Error -27727: Step download timeout (120 seconds) has expired when downloading resource(s). Set the “Step Timeout caused by resources is a warning” Run-Time Setting to Yes/No to have this message as a warning/error, respectively, Snapshot Info [MSH 2 6]

  5. Alexander B September 28, 2012 at 3:11 am

    Absolutely agree.
    But in real life it rarely happens. Time constraints, lack of access to the environment (besides the front end), lack of cooperation from DBAs, sysadmins, etc.

  6. Sakthi February 25, 2016 at 8:50 pm

    Hi Stuart,
    I am facing one critical issue with my LR scripts. I tried to record a script with the Web HTTP/HTML protocol for my application; it recorded all the events, but when the script is displayed, some objects and item data are missing. Could you please tell me how to resolve this issue?
