It’s always nice to base your load test on real-world usage patterns rather than someone’s “best guess”. If the system is used internally, Business Analysts are usually quite good at providing this information, but if the system is used by external users (like most high-traffic web applications), then they can typically only tell you “big picture” information; e.g. they know how many orders are created, but not how many searches are made per hour.
Basic web log analysis tools provide low-level technical detail that isn’t very useful – hits per hour. To be useful for load testing, we need to translate the data in the logs into end-user business processes and their respective volumes during a peak hour.
Typically this analysis effort is worthwhile when:
- You are testing a web application that is a replacement for an existing web application, and can therefore be assumed to have a similar usage profile.
- Your web application has gone live, but you want to validate the “best guess” usage profile that was used for load testing against actual usage patterns. If you tested with less load or used a significantly different usage profile, you might want to run some more tests to make sure that you won’t see problems in the future. You will also be able to use this new profile for future regression tests.
An e-commerce website may have a number of common functions – register, login, update details, search for product, browse catalogue, add to basket, checkout, logout, etc. Using an analysis tool such as LogParser (which I have discussed before), we can map the page requests in the log file onto these functions.
If you are in a hurry to get started, the steps are as follows:
- Get a copy of the log files for a peak day.
- Extract the top page requests from the file.
- Determine which page requests map to which business processes.
- Query the log file for the relevant page request that identifies each business process. Pick the busiest hour.
Everyone else, read on…
Hopefully your system administrators are keeping the web server logs for at least a week. Log files will probably be split into a separate file for each day. Ask for the largest log file. Make sure you get the log files for the same day from each web server.
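To get an overview of the most frequently requested pages, run a LogParser query along these lines (ex*.log is a placeholder for your own log file names):
SELECT
    COUNT(cs-uri-stem) AS Hits,
    cs-uri-stem AS URL
FROM ex*.log
WHERE
    sc-status <> 404 AND
    EXTRACT_EXTENSION(cs-uri-stem) NOT LIKE 'jpg' AND
    EXTRACT_EXTENSION(cs-uri-stem) NOT LIKE 'gif' AND
    EXTRACT_EXTENSION(cs-uri-stem) NOT LIKE 'css' AND
    EXTRACT_EXTENSION(cs-uri-stem) NOT LIKE 'ico' AND
    EXTRACT_EXTENSION(cs-uri-stem) NOT LIKE 'js'
GROUP BY cs-uri-stem
ORDER BY Hits DESC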
The overview will generally show that the majority of the requests to the web application are hitting a small number of pages (80/20 rule).
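If you don’t have LogParser to hand, the same overview can be approximated with a few lines of Python. This is only a minimal sketch, assuming W3C extended format logs with a "#Fields:" header line; the function names and the sample log lines are made up for illustration.

```python
from collections import Counter
import os

# Static content that is excluded from the overview, as in the query above.
STATIC_EXTENSIONS = {"jpg", "gif", "css", "ico", "js"}


def parse_w3c(lines):
    """Yield one dict per request, using the '#Fields:' header for column names."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line.split()[1:]
        elif line and not line.startswith("#") and fields:
            values = line.split(" ")
            if len(values) == len(fields):
                yield dict(zip(fields, values))


def top_pages(lines, limit=20):
    """Count hits per cs-uri-stem, skipping 404s and static content."""
    hits = Counter()
    for row in parse_w3c(lines):
        stem = row["cs-uri-stem"]
        extension = os.path.splitext(stem)[1].lstrip(".").lower()
        if row["sc-status"] == "404" or extension in STATIC_EXTENSIONS:
            continue
        hits[stem] += 1
    return hits.most_common(limit)


# Invented sample data; a real log file would be read with open(...).
sample_log = """#Fields: date time cs-method cs-uri-stem cs-uri-query sc-status
2009-03-02 01:00:05 GET /search.php query=foo 200
2009-03-02 01:00:06 GET /images/logo.gif - 200
2009-03-02 01:00:09 GET /search.php query=bar 200
2009-03-02 01:00:12 GET /display.php item=42 200
2009-03-02 01:00:15 GET /missing.php - 404
""".splitlines()

for url, count in top_pages(sample_log):
    print(count, url)
```

For logs from several web servers, the lines from each file can simply be chained together before they are passed to top_pages.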
Next, record the key business processes (that you know of) with VuGen using URL mode. This is a simple way of creating a list of all the URLs that each business process covers. Try to pick a URL that uniquely identifies each business process. Take note of the following attributes that are included in the web server log files:
- cs-uri-stem – usually the filename on the webserver, like “/search.php”
- cs-uri-query – if the URL has any arguments that are passed in after the filename, these are included here (without the leading “?”); like “query=foo&lang=en”
- cs(Referer) – the complete URL of the previous page
- cs-method – the HTTP method of the request, usually GET or POST.
Check that there aren’t any popular requests in your overview list that aren’t covered by the business processes that you have recorded.
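One way to keep track of which request identifies which business process is to write the mapping down as a small set of rules, then check what falls through the cracks. The sketch below is an illustration only: the Rule class, the "Search" rule and the sample requests are hypothetical, and each request is assumed to have already been parsed into a dict keyed by the log field names listed above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Rule:
    """Identifies one business process by the attributes of a single request."""
    process: str
    stem: str
    method: str = "GET"
    query_prefix: str = ""
    referer_prefix: str = ""

    def matches(self, row):
        return (row["cs-uri-stem"] == self.stem
                and row["cs-method"] == self.method
                and row["cs-uri-query"].startswith(self.query_prefix)
                and row["cs(Referer)"].startswith(self.referer_prefix))


# Hypothetical rules; build your own from the URLs recorded for each process.
RULES = [
    Rule("Add to basket", "/addToBasket.php", query_prefix="item=",
         referer_prefix="https://www.store.com/display.php?item="),
    Rule("Search", "/search.php", query_prefix="query="),
]


def coverage(rows, rules=RULES):
    """Return (hits per business process, hits per URL that matched no rule)."""
    covered, uncovered = Counter(), Counter()
    for row in rows:
        process = next((r.process for r in rules if r.matches(row)), None)
        if process:
            covered[process] += 1
        else:
            uncovered[row["cs-uri-stem"]] += 1
    return covered, uncovered


# Invented sample requests.
rows = [
    {"cs-method": "GET", "cs-uri-stem": "/search.php",
     "cs-uri-query": "query=foo&lang=en", "cs(Referer)": "https://www.store.com/"},
    {"cs-method": "GET", "cs-uri-stem": "/addToBasket.php",
     "cs-uri-query": "item=42", "cs(Referer)": "https://www.store.com/display.php?item=42"},
    {"cs-method": "GET", "cs-uri-stem": "/display.php",
     "cs-uri-query": "item=42", "cs(Referer)": "https://www.store.com/search.php?query=foo"},
]

covered, uncovered = coverage(rows)
print(dict(covered))
print(dict(uncovered))
```

Any URL with a large count in the second result is a sign that a business process is missing from your list.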
Finally, write a query using the above information to count the number of times a given business process was run during each hour of the day, so that you can find your peak hour. For example (with ex*.log standing in for your own log file names):
SELECT
    TO_LOCALTIME(QUANTIZE(TO_TIMESTAMP(date, time), 3600)) AS TimeStamp,
    COUNT(cs-uri-stem) AS Hits
FROM ex*.log
WHERE
    cs-uri-stem = '/addToBasket.php' AND
    cs-uri-query LIKE 'item=%' AND
    cs-method = 'GET' AND
    cs(Referer) LIKE 'https://www.store.com/display.php?item=%'
GROUP BY TimeStamp
ORDER BY TimeStamp
…which gives the following output (in CSV format). Note that the original timestamps were in UTC, and TO_LOCALTIME converted them to the time zone of the machine running the query, in this case Australian Eastern Standard Time (UTC+10).
From this, we can easily see that the peak hour for this business process is between 11 and 12, and we can feed the transaction volumes into the usage profile that will be used for load testing. Rinse and repeat for each business process.
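The hourly bucketing and time zone conversion can also be sketched in Python. This is a rough equivalent rather than a replacement for the LogParser query: it assumes you have already filtered the log down to the (date, time) pairs of the requests that identify one business process, it uses a fixed UTC+10 offset rather than the machine’s local time zone, and the sample timestamps are invented.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Fixed offset for Australian Eastern Standard Time (an assumption; no daylight saving).
LOCAL_TZ = timezone(timedelta(hours=10))


def hourly_counts(timestamps_utc, tz=LOCAL_TZ):
    """Count requests per local hour from (date, time) strings logged in UTC."""
    counts = Counter()
    for date_str, time_str in timestamps_utc:
        utc = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S")
        utc = utc.replace(tzinfo=timezone.utc)
        # Convert to local time, then truncate to the start of the hour.
        local_hour = utc.astimezone(tz).replace(minute=0, second=0, microsecond=0)
        counts[local_hour] += 1
    return counts


def peak_hour(counts):
    """Return (hour, hits) for the busiest hour."""
    return max(counts.items(), key=lambda item: item[1])


# Invented sample timestamps for a single business process.
sample_times = [
    ("2009-03-02", "00:59:10"),
    ("2009-03-02", "01:05:00"),
    ("2009-03-02", "01:30:00"),
    ("2009-03-02", "01:59:59"),
    ("2009-03-02", "02:00:00"),
]

hour, hits = peak_hour(hourly_counts(sample_times))
print(hour.strftime("%Y-%m-%d %H:00"), hits)
```

The hits for the busiest hour are the number you would carry into the usage profile for that business process.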