As a performance testing consultant, I get to see a lot of the work that other performance testers do…and a lot of the time it horrifies me. If performance testing were a licensed profession, like law or medicine, then it would be necessary to revoke the licenses of 90% of testers. Their work is not just “low quality”, it is actually wrong or misleading, or based on such a shaky foundation that any predictions made from the test results are tenuous at best.

Bad performance testers frequently treat their testing as a ritual rather than as a science, and they either ignore errors or don’t bother to check for them in the first place.

Here is an example that I saw last week…

I was sent a report on a 12-hour Soak Test that had been run for a business-critical, public-facing website with a large userbase – the kind of website that journalists love to write snarky little articles about when it doesn’t work. The report was one of the ones that can be automatically exported from LoadRunner Analysis (this is usually a bad sign). Unfortunately I can’t show you the whole report, but here is the graph of Running Users:

LoadRunner Running Users graph

The Soak Test attempted to maintain a steady-state workload of 21,000 vusers for 12 hours (minus some time for ramp-up/ramp-down) but, over the course of the test, more than 10,000 vusers experienced a fatal error. The test report does not note that the test showed a catastrophic failure, and no severity 1 defect was ever raised.

If they believed that the errors they were seeing during their test were due to problems with their LoadRunner scripts, rather than the system they were testing, then they should have fixed their scripts and re-run the test before releasing results to stakeholders.

Other WTF moments in the report:

  • It was a Soak Test, which typically finds resource-related problems such as memory leaks, but no system monitoring had been set up.
  • The Throughput graph showed periodic spikes of higher throughput, as if their vusers were synchronised and “marching in step”. This is not very realistic.
  • They have included a graph of errors over time, but have only referred to errors by their code. I hope their audience knows what a -26612 or a -27791 error is.
  • They have included a table of transactions with response times (min, average, max, standard deviation, 90th percentile) and transaction counts (pass, fail, stop); but they have only highlighted a single line – the only transaction with an average response time over 2 seconds. They have not highlighted any of the transactions with thousands of errors.
    Sometimes I think that calling what we do “performance testing” encourages people to think that their job is just about measuring response times, and to ignore application stability under load.
  • They have focussed on average response times (measured over the entire test) and only failed one transaction, but there was a 1-hour period of quite bad response times. If the average response time was measured over just that hour, then average response times for most of the transactions would be above the 2-second SLA.
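
To illustrate that last point, here is a minimal sketch of how a whole-test average can hide a bad hour. It is not taken from the report: the sample data is made up, the one-hour window size is an assumption, and the function names (`windowed_averages`, `sla_breaches`) are hypothetical. Only the 2-second SLA comes from the report.

```python
from statistics import mean

SLA_SECONDS = 2.0       # the 2-second SLA mentioned in the report
WINDOW_SECONDS = 3600   # assumed window size: one hour


def windowed_averages(samples, window=WINDOW_SECONDS):
    """Group (elapsed_seconds, response_time) samples into fixed-size
    windows and return {window_index: average_response_time}."""
    buckets = {}
    for elapsed, response_time in samples:
        buckets.setdefault(int(elapsed // window), []).append(response_time)
    return {index: mean(values) for index, values in sorted(buckets.items())}


def sla_breaches(samples, sla=SLA_SECONDS, window=WINDOW_SECONDS):
    """Return the indexes of the windows whose average exceeds the SLA."""
    averages = windowed_averages(samples, window)
    return [index for index, average in averages.items() if average > sla]


# Made-up data for one transaction: a 12-hour test with one sample per
# minute, 1.0 s response times throughout except for one bad hour at 5.0 s.
samples = [
    (minute * 60, 5.0 if 360 <= minute < 420 else 1.0)
    for minute in range(720)
]

overall = mean(response_time for _, response_time in samples)
print(f"whole-test average: {overall:.2f} s")
print(f"windows over SLA:   {sla_breaches(samples)}")
```

With this made-up data the whole-test average stays under the SLA, so the transaction would "pass", while the per-hour view flags the bad window. A real analysis would read the raw transaction data exported from the load testing tool and would probably look at percentiles per window as well as averages.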

I see bad testing everywhere, but I see more of it from low-cost outsourcing companies. I can only assume that CIOs who engage these companies are pleased that they can get an incorrect performance test result for a cheaper price than a correct one.

 

Published On: December 22, 2012

7 Comments

  1. Sam December 23, 2012 at 10:37 am

    I looked up the LoadRunner error codes.
    -26612 is The server responded with the error “INTERNAL SERVER ERROR”.
    -27791 is Server ‘server name’ has shut down the connection prematurely

  2. Kuldeep Arya January 15, 2013 at 7:58 pm

    It seems like a lack of objectivity and ownership.

  3. Anu Sankar S January 24, 2013 at 6:55 pm

    Hi Stuart,
    I think the main issue is that clients who want to test their applications don’t understand the importance of performance correctly. They are very happy if the team says “Performance is good”. These low-cost outsourcing companies don’t want to create issues with these vendors. If they find performance issues over multiple iterations, which can cause schedule slippage, they might get “into trouble” with management, and if they are not good at PE concepts they might lose their credibility. What these low-cost outsourcing companies are trying to do is make the customer happy and get more projects.

    Another main problem is that we (employees working in outsourcing companies) are always compared against functional testing by management. Because of this, employees get misplaced in the wrong roles. This happens more in performance testing/engineering.

    Another misconception that is common in these companies (again, low-cost outsourcing companies) is this: they think that if they deliver a test report with only client-side metrics and graphs, that is performance testing, and if they proactively collect server-side metrics and do actual analysis, that’s performance engineering. Since small vendors usually don’t have a proper understanding of “Actual Performance”, they go with low-cost solutions.

    I like your site very much. This is the only frequently updated “performance testing blog” which has informative posts.

    Thanks
    Anu Sankar S
    (A performance engineer who worked in both Low and Premium Outsourcing companies)

    • Britto April 8, 2014 at 2:50 pm

      @Anu Sankar S: I totally agree with you.

  4. John July 5, 2013 at 2:23 am

    Seriously? Wow!

    Similar to this rant: http://www.perfbytes.com/PERFRANT-WordQualityReport

  5. Vikram Chandna May 2, 2014 at 4:00 pm

    This begs a further question. If there are issues in performance testing, why are they not reported? Results of poor testing usually get pretty high visibility. The issue is the gap between the test team and the production maintenance/operations team. The two teams are usually very isolated unless you have institutionalized DevOps. So the issues from production don’t get reported, or get misreported, unless it is a catastrophic failure.

    Looking at the graph above, this was never supposed to be a soak test if the application could not handle this kind of load, and so there is a mistrial. Regardless, this should have caused a flutter somewhere and hopefully the problem was fixed. Now the net impact was that this caused some effort transfer. Someone in the development or design team would have spent their time trying to ‘interpret’ the problem and hopefully solve it the right way.

    An old story, but a good reminder.
