Archive for the ‘General’ Category

I will be presenting at ANZTB 2010

Sunday, February 28th, 2010

On Tuesday March 2nd, I will be presenting on “The Non-Functional Requirements Every Project Forgets” at the ANZTB 2010 testing conference in Melbourne, Australia. Please come and say “hi”.



New book on Performance Testing

Wednesday, November 28th, 2007

I’ve spent the last year or so writing notes and fleshing out chapters for a book called Performance Testing Web Applications, so imagine my very slight feeling of annoyance when I did a Google search for my book title and found that someone had released a very similar book 3 months earlier…

J.D. Meier, Carlos Farre, Prashant Bansode, Scott Barber, and Dennis Rea have collaborated on a book called Performance Testing Guidance for Web Applications, which is available either as a free download, or in dead-tree format through Amazon.

Performance Testing Guidance for Web Applications

The book is really good, so I highly recommend that you grab a copy and have a read.


James Bach talks to MAST

Thursday, June 7th, 2007

Benny Hill in a beret

James Bach visited Melbourne this week to teach his Rapid Software Testing course. While he was here, he took the time to talk to about 30 developers and testers from the Melbourne XP Enthusiasts Group (MXPEG) and the Melbourne Association of Software Testers (MAST).

James is an accomplished speaker and is passionate about testing. He spent the first few minutes railing against the traditional testing world that is “obsessed with techniquism and artifactism” – blindly following a testing process without thinking (which I guess I talked about a couple of weeks ago in my post on Cargo Cult Testing); and observed that thinking was a skill that could be taught and practiced.

As an example of a training exercise that could develop exploratory testing skills (and therefore thinking skills), James introduced the Art Show game. This game is similar to Mastermind, except played with cards and much more informal. It boils down to teams determining the algorithm inside a black box (an art critic’s preferences) through iterative exploratory tests (art showings).

I won’t spoil the game for you by giving too many hints, but the exploratory testing process matters more than simply getting the correct answer. It gives valuable practice in forming a hypothesis and validating or refuting the hypothesis through testing while optimising for speed, coverage and cost (because you may lose cards each time you run a test).

The highlight of the night was probably seeing James in vigorous debate with someone over ISEB’s definition of exploratory testing as “an ad-hoc technique”.

I am sure that I will not be the only attendee who will be dusting off my art critic’s beret to run this as a training exercise with my team sometime soon.


Cargo Cult Testing (testing as ritual rather than science)

Tuesday, May 15th, 2007

A cargo cult reproduces a landing strip complete with wooden plane.

Richard Feynman popularised the idea of the cargo cult in his essay on cargo cult science, which was “science” that followed all the forms of scientific investigation, but lacked real critical scientific thought. The idea was transferred to the world of software courtesy of the Jargon File with its entry on cargo cult programming.

The Jargon File defines cargo cult programming as:

A style of (incompetent) programming dominated by ritual inclusion of code or program structures that serve no real purpose. A cargo cult programmer will usually explain the extra code as a way of working around some bug encountered in the past, but usually neither the bug nor the reason the code apparently avoided the bug was ever fully understood.

As someone who works in one of the more technical areas of testing, I see the same thing in the testing world too.

Functional testing is pretty straightforward: ask a test manager why they are testing, and they will probably say something like “we are seeing if the business functions work” (really smart test managers will add that they are seeing that the actions that are meant to fail do actually fail too).

But once you add a layer of technology, it becomes a bit harder to grasp. If your functional test regression suite is automated, then the quality of the testing is less easy for most people to see. Were the test cases that were automated good test cases to begin with? After a few months or a year, regression testing becomes a little bit like a ritual; you wave the magical testing tool over a new version of the application and declare it “tested”. Whether it is tested well is an entirely different question…and one which is unlikely to occur to those deeply involved in the ritualistic behaviour.

In the stress/volume/performance/load/reliability testing world things are even worse. Occasionally I visit companies that I have consulted at in the past to see how they are going with their testing. They generally know to run a peak load test for new builds (and platform changes) before applying the change to Prod, but if you ask a question like “have the transaction volumes changed in Production since I wrote the original Detailed Test Plan, and have the LoadRunner scenarios been updated to reflect this?”, you will draw a blank.

Performance testing during the maintenance phase of software frequently seems to degrade to a situation where the testers are more interested in whether the test cases and LoadRunner scripts still run successfully than in whether the test cases reflect reality or even cover the point of change that is driving the current round of testing. And a lack of understanding of what each test case is designed to test leads to wasteful re-running of test cases that are not impacted by a change.

Unfortunately I don’t really have any solutions for this besides companies paying me to occasionally come in and review their performance testing activities (which kind of smacks of self-interest).


The 10 Commandments of Load Testing

Wednesday, May 9th, 2007

I have made a list of the top ten things load testers frequently fail to do that make me feel like smiting them.

Lightning strike (Image ID: nssl0012, National Severe Storms Laboratory)

  1. Thou shalt know how thy test tool works.
    The worst performance testers I have met were always more concerned about whether they could get their scripts to run than about whether the tests they were running were realistic. Read the documentation, practice, spend some time figuring out what all the settings do, then relate how your scripts are running back to how real users exercise your application.
  2. Thou shalt gather realistic usage data.
    Garbage in, garbage out. If your transaction volumes are wrong, then your load test is wrong.
  3. Thou shalt have testable requirements.
    Non-functional requirements (especially load and performance-related requirements) are usually an afterthought for many projects. This shouldn’t stop you from trying to gather the requirements you need for your tests. The business approach of “let us know how fast it is, and we will let you know if that’s okay” isn’t good enough. Get some numbers. The numbers can change in the future (maybe call them “targets” or “guidelines” rather than “requirements”), but you need something to test against before you start.
  4. Thou shalt write a test plan.
    Even if you already know what you’re going to be doing, other people would probably like to know too – they might even be able to help; besides, a signed-off test plan has saved many a tester from the wrath of project management.
  5. Thou shalt test for the worst case.
    Don’t test with transactions from an average day, test for the busiest day your business has ever had. Add a margin for growth. Testing failover? A server doesn’t fall over at midnight when no one is using your application (would we care in this situation anyway?), it falls over in the middle of the day when lots of real people are using it.
  6. Thou shalt monitor your test environment infrastructure.
    I feel that I have to spell this out, because I still see people who don’t do it. Monitoring your servers makes it easier to figure out where a problem is. You can also make neat observations like “response times for the new version of the application are identical to the previous version, but CPU utilisation on the servers has increased by 10%”. When I say “monitor your servers”, this includes your load generators.
  7. Thou shalt enforce change control on your environment.
    The final thing you tested should be what is deployed into Production – same application version, same system configuration. It’s easy to lose track of what you are actually testing against if people are making uncontrolled changes to your environment, or if people are making tuning changes without tracking what they are changing. Keep a list of changes that are made…even if you are in a hurry; and always make sure you know what you are testing against.
  8. Thou shalt use a defect tracking tool.
    An untracked defect is a little like a tree that falls in the forest when no-one is around – no-one cares. Raising defects lets everyone know there is a problem (not just the people who should be working to fix it). It also provides a neat repository to keep track of all the things that have been tried to fix the problem.
  9. Thou shalt rule out thy own errors before raising a defect.
    “Oops, my bad!” is a great way to lose credibility with the people who are going to be fixing your defects. If you don’t have credibility, you are going to have to work much harder to convince people that the problem you are seeing is due to a fault with the system rather than a fault with your test scripts. Don’t be so afraid of making a mistake that you test “around” errors (like people who see HTTP 500 errors under load and “solve” the problem by changing their scripts to put less load on the system). It always helps if you have followed commandment #1: Thou shalt know how thy test tool works.
  10. Thou shalt pass on your knowledge.
    Write a Test Summary Report and let management know what you found (and fixed) during testing, make some PowerPoint slides, hold a meeting. Let the Production monitoring group know which metrics are useful to monitor, let them re-use your LoadRunner scripts for Production monitoring with BAC. Leave some documentation for future testers; don’t make them gather requirements and transaction volumes again, or re-write all your scripts because they don’t understand them. Retain your test results until you are sure that no-one is going to ever ask about the results of that test you ran all those months ago.
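
Commandment 5 comes down to simple arithmetic once you have the usage data from commandment 2. The sketch below is an illustration only: the transaction count, the share of the day's traffic that falls in the peak hour, and the growth margin are all made-up figures, and the class and method names are hypothetical.

```java
public class WorstCaseLoad {
    /**
     * Target transactions per hour for a load test: the busiest hour of the
     * busiest day on record, plus a margin for growth.
     *
     * @param busiestDayTransactions total transactions on the busiest day (assumed input)
     * @param peakHourShare          fraction of that day's traffic in its busiest hour (assumed input)
     * @param growthMargin           growth allowance, e.g. 0.5 for 50% (assumed input)
     */
    public static long targetPerHour(long busiestDayTransactions,
                                     double peakHourShare,
                                     double growthMargin) {
        double peakHour = busiestDayTransactions * peakHourShare;
        // Round up so the target never undershoots the calculated figure.
        return (long) Math.ceil(peakHour * (1.0 + growthMargin));
    }

    public static void main(String[] args) {
        // Invented example figures, not taken from any real system.
        long target = targetPerHour(120_000, 0.25, 0.5);
        System.out.println("Load test target: " + target + " transactions/hour");
    }
}
```

Writing the calculation down like this (in the test plan, if nowhere else) makes it obvious which inputs need revisiting when Production volumes change.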

Java Thread Dump

Tuesday, April 10th, 2007

A Java thread dump is a way of finding out what every thread in the JVM is doing at a particular point in time. This is especially useful if your Java application sometimes seems to hang when running under load, as an analysis of the dump will show where the threads are stuck.

You can generate a thread dump under Unix/Linux by running kill -QUIT <pid>, and under Windows by hitting Ctrl + Break.

A great example of where this would be useful is the well-known Dining Philosophers deadlocking problem. Taking example code from Concurrency: State Models & Java Programs, we can cause a deadlock situation and then create a thread dump.

Dining Philosophers applet screenshot

In the example below (shown using tda), we can see that the 5 Philosopher threads each have a lock on a Fork object and are each waiting to obtain a lock on a second Fork object before they can eat. Unfortunately this never happens and all the philosophers starve.

Thread dump of the Dining Philosophers from Thread Dump Analyzer

Download the thread dump from here (7 KB).

Note that not all hangs are going to be due to deadlocks, and there are many tools (including Eclipse) that will help you analyse thread dumps.
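
The same kind of deadlock can also be spotted from inside the JVM. The sketch below is not the Dining Philosophers code from the book; it is a minimal two-thread stand-in (the class, thread and lock names are made up for illustration) that uses the standard java.lang.management.ThreadMXBean API to find threads deadlocked on object monitors and print what each one is waiting for, which is the information you would otherwise read out of the thread dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockDemo {
    // Two hypothetical "forks"; any two objects locked in opposite order will do.
    private static final Object FORK_A = new Object();
    private static final Object FORK_B = new Object();

    /** Starts two threads that take the forks in opposite order and returns the number of deadlocked threads found. */
    public static int createAndDetectDeadlock() {
        CountDownLatch bothHoldFirstFork = new CountDownLatch(2);
        startPhilosopher("Philosopher-1", FORK_A, FORK_B, bothHoldFirstFork);
        startPhilosopher("Philosopher-2", FORK_B, FORK_A, bothHoldFirstFork);

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Poll for up to roughly five seconds.
        for (int attempt = 0; attempt < 50; attempt++) {
            long[] ids = threads.findMonitorDeadlockedThreads(); // null when none are found
            if (ids != null) {
                for (ThreadInfo info : threads.getThreadInfo(ids)) {
                    if (info != null) {
                        System.out.println(info.getThreadName() + " is waiting for "
                                + info.getLockName() + " held by " + info.getLockOwnerName());
                    }
                }
                return ids.length;
            }
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    private static void startPhilosopher(String name, Object first, Object second,
                                         CountDownLatch bothHoldFirstFork) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirstFork.countDown();
                try {
                    bothHoldFirstFork.await(); // wait until the other thread holds its first fork
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) { // never acquired: the other thread holds it
                    System.out.println(name + " is eating");
                }
            }
        }, name);
        t.setDaemon(true); // lets the JVM exit even though the threads stay stuck
        t.start();
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + createAndDetectDeadlock());
    }
}
```

The latch is only there to make the deadlock happen every time; in a real application the timing is down to luck, which is why these hangs tend to show up only under load.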


Using open source tools for performance testing (Google video)

Thursday, November 2nd, 2006

I just found something interesting on Google Video; it is a 1 hour Google TechTalk presentation by Goranka Bjedov about Using Open Source Tools for Performance Testing.

Goranka has some interesting things to say. She makes the point that there is really no standard terminology in performance testing circles, and goes on to prove this by giving her own definitions of performance, stress, load, scalability, and reliability testing. As an example of reliability testing she notes that “typically, when I was at AT&T, we would run for about a month at a time after everything was done just to find out that the system can actually stand up and can work fine with the load for a prolonged period of time.” In my testing circles, we would call that a soak test, but I would have been interested to hear more about the types of systems she was testing at AT&T.

The main body of her talk is about the different tools available for load and performance testing. These can be broken down into in-house, vendor and open-source tools.

Google has 55 in-house load and performance testing tools that have been developed by different groups for testing different Google products. These are very expensive to maintain and may only be used in-house, which makes any benchmarks impossible for a third party to verify. Goranka says “Before you decide to develop your own, please take a look at what is out there…”

Goranka slams vendor tools (like LoadRunner, SilkPerformer and WebLOAD) for being overly expensive and using proprietary scripting languages. Personally, I have always thought that it was pointless having to learn another language just to use a load testing tool. Unfortunately, she uses LoadRunner’s scripting language as an example (“it’s C, minus the pointers”), which is incorrect: unlike many other tools, Mercury uses standard C (and Java and VBScript).

Her recommended solution is the open source tools – “five years ago they just weren’t there, but today they are.” Her personal preference is JMeter, but she also recommends OpenSTA and The Grinder. Open-source tools have the advantage of being a good price, and having source code available; she also makes the point that they use standard programming languages for scripting (although this is incorrect when talking about OpenSTA).

The disadvantages of open-source tools are that they have a steep learning curve and do not support many protocols. “The vendor tools support far more protocols than the open-source tools, but as long as you are staying in the web space, and you’re looking at HTTP/S, IMAP and POP3, the open-source tools are pretty good”.

Goranka does not say that the open-source tools are free because it is occasionally necessary to write code to extend their features. “Free software is free in the sense that a puppy is free.” Features that Google engineers have written for JMeter have been added back to the main code tree by the people maintaining the project, meaning that Google is at least saved the cost of maintaining their forked code.

She uses JMeter for testing web-based applications through the GUI, uses The Grinder for API-based testing, and does not use OpenSTA because it only works on Windows.

Other points during the presentation:

  • You should use the same monitoring for Load testing that you use for Production monitoring (so you don’t have to account for the differences in load that a different monitoring system will put on the system).
  • If you are running Unix-based systems, don’t sustain CPU above 80%.
  • Google tracks a summary of every performance test in a central database. The database also contains information on every piece of software that is installed on the machines in the test environment.
  • If I am unfamiliar with the system, I don’t trust it. One of the things that I have realised is that
    A) the system will fail in the place where they tell me that nothing could go wrong.
    B) developers are totally delusional about their own software, and frequently they will just forget about things that they’ve done two weeks ago.
  • I run every test 5 times. I want to see that I have some sort of statistical consistency.
  • Performance testing should not be used as a tool to find memory leaks; but it can.
  • Performance testing without monitoring? Don’t bother. Why waste your time?
  • If you are going to do any performance testing, make sure that database sizes are somewhat realistic. They don’t have to be exactly the same, but they have to be the same order of magnitude otherwise the results you are getting are completely off.
  • Execute a stress test. Find how your system is failing. Find where it is failing. Do find out how the system handles overload. There are no good defence mechanisms against people out there, and you can’t predict sudden popularity (e.g. Google Earth).
  • Start a test after a decent warm-up period. Don’t start 100000 users all at once.
  • Quite often people don’t know about everything that is running on a complex system. Maybe there is a low priority process that is running with high priority. This can usually be fixed by niceing the process down. Quite often there are debug things that are still running also.
  • Monitor the machines that are collecting the monitoring data and the load generators (not just the system under test).
  • Performance Testing and QA is about risk analysis. If I believe it is high risk, I want to take a look at it.
  • When I am doing performance testing, the first thing I try to do is eliminate the network. I want to simplify my problem. I am interested in the machines, and my hope is that the network provided will handle everything I need. Once everything is profiled and understood, we will do some tests that include the network. If you can, put everything on the same subnet and same switch. It will make you a much happier performance tester in the first pass. Debugging networking problems is (not) fun.
  • (When talking about testing on smaller sized systems than Production). You can’t test on a 386. Extrapolation will kill you. You will run out of some resource that you never expected, and you can’t predict this ahead of time. For final validation, you really want to get some time on the Production hardware before it goes live. If the system is not being used for Production, it should not be that hard to get hold of it for a week or a weekend.
  • Find more open-source performance tools at opensourcetesting.org
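
The “run every test 5 times” point can be made concrete with a small calculation. The sketch below is an illustration only: the response-time figures are invented, and the 5% threshold on the coefficient of variation is an arbitrary assumption of mine, not something from the talk.

```java
public class RunConsistency {
    /** Arithmetic mean of the samples. */
    public static double mean(double[] samples) {
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Sample standard deviation (divides by n - 1); needs at least two samples. */
    public static double stdDev(double[] samples) {
        double m = mean(samples);
        double sumOfSquares = 0.0;
        for (double s : samples) {
            sumOfSquares += (s - m) * (s - m);
        }
        return Math.sqrt(sumOfSquares / (samples.length - 1));
    }

    /** Coefficient of variation: the standard deviation as a fraction of the mean. */
    public static double coefficientOfVariation(double[] samples) {
        return stdDev(samples) / mean(samples);
    }

    public static void main(String[] args) {
        // Hypothetical response times (seconds) for one transaction across five runs of the same test.
        double[] runs = {2.0, 2.1, 1.9, 2.0, 2.0};
        double cv = coefficientOfVariation(runs);
        System.out.printf("mean=%.3f s, stddev=%.3f s, CV=%.1f%%%n",
                mean(runs), stdDev(runs), cv * 100);
        // 0.05 is an assumed threshold; pick one that suits your system.
        System.out.println(cv <= 0.05
                ? "Runs look consistent"
                : "Runs vary too much; investigate before comparing builds");
    }
}
```

If the runs do not agree with each other, there is little point comparing them against a previous build or a requirement until you know why.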

There is another summary of her talk available on Robert Baillie’s blog.

You might also want to have a look at Becoming a Software Testing Expert; a 1 hour presentation delivered by software testing expert, and author of Lessons Learned in Software Testing, James Bach on June 13, 2006. His presentation is available for download from his website.


Hacking into VMware images

Thursday, October 19th, 2006

Last week I posted a question asking how I could recover or change the forgotten password for a VMware guest operating system (Windows 2000). After receiving no useful suggestions, this week I allocated some time to solving the problem.

Windows password recovery tools usually consist of a bootable CD image containing a version of Linux that will overwrite the NT password with a known value or will extract the hashed password from the filesystem.

To boot your virtual machine from a CD, you must change the boot order in the virtual machine’s BIOS. Press F2 while the VM is starting up to access the BIOS.

VMware BIOS setup

I used the free software available from Windows XP Login Recovery. This is good because it does not try to write to your file system, it just retrieves the hashed password value.

VMware password recovery

Once you have the password hash, you enter it in a form on their website and they look up the hash value in their database and give you a password that matches the hash. Note that even though they ask for your email address, the password is displayed on a web page rather than being sent to your inbox.

VMware password retrieval

After all that effort, I discovered that my password was blank.


SAP Cheat Sheet for Performance Testers

Sunday, October 1st, 2006

Performance testers must switch between different technology platforms almost every time they join a new project. Here are a few things you should know so you don’t look like a complete n00b on your first SAP project.

Transactions and Sessions

Unless a restriction has been put in place, users may create multiple sessions from a single workstation. Using the menu, this can be done through System > Create Session or by left-clicking on the top right of the SAP client window and selecting the Create Session option.

Create new session

This menu is also the only place a transaction may be stopped if a dialog step is taking a long time to complete. Ending a transaction that has stopped responding by using Windows Task Manager to kill your SAP client is a bad solution, and I have seen it leave zombie processes running in the system.

Most of your navigation probably won’t be done using the menu tree, but by entering a transaction code in the command field.

SAP client command field

In the above screenshot I am jumping directly to the Data Browser, which has a transaction code of SE16. There are several other things we can do in the command field:

  • /x is a shortcut for exiting the system
  • A new session can be opened with /o[Transaction Code]. E.g. “/oSE16” would open another session and start the Data Browser
  • If you are already in a transaction and want to jump to another, you would use /n[Transaction Code]

Be aware of where you invoke a transaction in your LoadRunner script. If you are restarting a transaction every time you iterate through the Action section of your script, you may be creating an unrealistic usage pattern (unless your users really will restart the transaction every time they use it).

Extracting Data

If you need to automate anything within SAP, like small-scale data loads or data extractions through the GUI, it is easier to use QuickTest Pro (if you are licensed) rather than CATT/eCATT scripts, which are a big pain in the butt.

Regular cut and paste does not work for all of the SAPGUI widgets. Where this is the case, you will have to use Ctrl-Y to select text before you can copy the text to the clipboard.

If a data grid does not have an export button to allow you to export data to a local file in Excel format, you have another option. Choose System > List > Save > Local File, and choose your preferred file format.

Save to local file

If you have access to the Data Browser (SE16), you can extract data from the database tables directly. Because SAP want their software to be independent of the underlying database, they discourage the use of SQL, so you must query the table using their forms.

SAP data browser (SE16)

Extracting data from tables rapidly becomes painful if your data spans multiple tables, because you can’t do a join. The (hacky) solution to this is to either extract both tables to Excel and join there (small data sets only), or right click in the field in the second table that is the key and choose multiple selection from the context menu. This will allow you to run a query on the table to match all the records that have your key value.

SAP data browser multiple selection

Finding the correct database table can be tricky as most tables and fields have cryptic names (very short German abbreviations). The field and table behind an edit field in your application can be found by placing the cursor in the edit field and pressing F1 (help).

SAP performance assistant

Click on the Technical Information icon (the one with the hammer and spanner), and the technical details behind the field will be displayed.

Technical information

If you are lucky, you will be given a table and field name, otherwise you will be given a Struct. This is a data structure in the ABAP code that may have pulled data from multiple tables (or wherever). You will also be able to see the field in the struct and the underlying field (data element) in the database. Double click on the data element and you will be taken to a data dictionary view. Press the Where-Used List button to see a list of all the tables that have this field. Double click on the field name under the correct table and you will be taken to the data dictionary view of the table. Confirm you have the correct table and then jump back to SE16 to construct your query.

End-user Performance

I’ve talked about this before, but I think that the transaction timer in the SAP client is quite a neat feature.

SAP client response timer

It measures how long each dialog step takes, so if you press a button and you have to wait for a response from the server, the timer will run until the response is received. Note that if multiple dialog steps are executed when you press the button, the timer will measure each individually (causing the timer value to update with a new value for each dialog step), and you will finish with only the last value.

Note from the screenshot, you can also see the transaction code for the current transaction in this menu.

LoadRunner Scripting

SAPGUI is one of the easiest vuser types you will ever use. Read all the LoadRunner documentation and you shouldn’t have many problems. Be aware that you can only run a limited number of virtual users on each generator due to GDI resource limitations. The work-around for this is to open up multiple Terminal Services sessions, each of which has its own allocation of GDI objects. You may need to run fewer virtual users in each TSC session if your application is graphically rich.

Further Reading

  • Make sure to read enough to understand the SAP architecture for your project; any tuning guides are also useful
  • Read the relevant sections in the LoadRunner manual
  • Search the Mercury Support Knowledge Base for SAP-related tips and tricks. Read Problem ID: 11907 – How-to and troubleshooting guide for SAP Vuser
  • Read the slides from my Mercury World Australia presentation on Mercury Diagnostics for SAP
  • Finally, I plan to make a list of SAP transaction codes that are useful to performance testers, so check back soon

Network Sociability Testing

Sunday, September 24th, 2006

If the application that you are performance testing will be operating over a WAN, it is really important to test whether the reduced bandwidth and higher latency of the network link will increase transaction response times to an unacceptable level.

If you are being thorough, you should also run some tests to determine whether your response times will be impacted by other network traffic on the WAN (and vice versa).

Network cable, top view (credit: Quark67, Wikimedia Commons)

Let the following anecdotes about testing two different applications serve as examples:

The first application was to be used by staff who would be interacting directly with customers. There would be up to 10 concurrent users per site, each spending the majority of their time using this application. In addition to the network traffic from the application, there would be back-office traffic like web browsing, email, and software packages being deployed to workstations. The Project had some concerns about the impact of this traffic (particularly package deployment) on end-user response times for this business-critical application.

A leased line with the same specifications as the (future) Production WAN links was provisioned for the test environment. A LoadRunner agent was set up at the end of the WAN link to generate traffic equivalent to 10 concurrent users. The background network traffic was generated with NetIQ’s Chariot tool.

Chariot sends packets (with a typical packet size for each type of traffic to be emulated) at specified volumes between user-defined ports on each end-point, which sit at either end of the WAN link.

The network team spent several days determining the profile of the background traffic and creating a number of traffic scenarios in Chariot. The different traffic scenarios were run at the same time as a worst-case profile of heavy application usage. The response times from these scenarios were compared with end-user performance from a scenario with no background traffic (and, obviously, against the performance requirements).

Network cable, bottom view (credit: Quark67, Wikimedia Commons)

The second application was to be used by staff working in a warehouse. There would only be 1 or 2 concurrent users per site, and the traffic would be low volume – mainly related to printing picking slips and labels. This was tested using one of the warehouse WAN links; using real Production infrastructure meant that the tests had to be run ridiculously early in the morning to ensure that they would not interfere with normal Business activities.

The testing was nowhere near as sophisticated as the first example. A user was located at the warehouse with a stopwatch to measure response times. It was not necessary to use LoadRunner as there would only be a small number of users at the site, and LoadRunner would not be able to measure the time it took for output to appear at the printers.

A baseline with no additional network traffic was run. Multiple samples were taken for each business process to provide a meaningful average response time that could be reported on.

To emulate network activity, I ran a batch file that repeatedly copied a zipped file across the WAN (the file was zipped in case any network components were trying to compress traffic). The network team, who were monitoring the link, reported that the link was at 40% utilisation. As file transfers had a lower QoS priority than application or printer traffic, no significant change in response time was expected.

The business processes were re-run, and results were recorded. As expected, there was no statistically significant difference in response times.

Running a second instance of my batch file increased the network link utilisation to 90%. This time response times were higher, but still within requirements.

Just for the sake of completeness, here is the batch file:

:: Copy file.zip from the shared drive
:: to the local D:\TEMP directory.
:: Loop forever (or until Ctrl-C).
:: Name each copy file<iteration>.zip.
@echo off
SET /A i=1
:copy
echo copying file%i%...
copy /Y \\dws0368\Temp\file.zip D:\TEMP\file%i%.zip
SET /A i+=1
goto copy

As with any sociability test, you can put in a level of effort that is likely to reduce the risk of unexpected behaviour in Production to an appropriate level. You could spend a lot of time making your usage profile “perfect” and testing every scenario, but it saves valuable time to concentrate on a probable-worst-case and simplify your usage profile as much as is practical.

Of the two examples, one took a couple of hours to prepare and execute; the other took a few days and involved significant work from other groups. While I would prefer to do this testing the first way if possible, the quick-and-dirty second way is an example of “good enough” testing (and, with historical graphs of network usage, is almost as defensible as the first example).
