It’s always nice to base your load test on real-world usage patterns rather than someone’s “best guess”. If the system is used internally, Business Analysts are usually quite good at providing this information, but if the system is used by external users (like most high-traffic web applications), then they can typically only tell you “big picture” information; e.g. they know how many orders are created, but not how many searches are made per hour.
Basic web log analysis tools provide low-level technical detail, such as hits per hour, which isn’t very useful on its own. To be useful for load testing, we need to translate the data in the logs into end-user business processes and their respective volumes during a peak hour.
Typically this analysis effort is worthwhile when:
- You are testing a web application that is a replacement for an existing web application, and can therefore be assumed to have a similar usage profile.
- Your web application has gone live, but you want to validate the “best guess” usage profile that was used for load testing against actual usage patterns. If you tested with less load or used a significantly different usage profile, you might want to run some more tests to make sure that you won’t see problems in the future. You will also be able to use this new profile for future regression tests.
An e-commerce website may have a number of common functions – register, login, update details, search for product, browse catalogue, add to basket, checkout, logout, etc. Using an analysis tool such as LogParser (which I have discussed before), we can map the page requests in the log file onto these functions.
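As a rough illustration (the page names here are invented; your application’s URLs will differ), the mapping might look something like this:
- /login.php → Login
- /search.php → Search for product
- /display.php → Browse catalogue
- /addToBasket.php → Add to basket
- /checkout.php → Checkout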
If you are in a hurry to get started, the steps are as follows:
- Get a copy of the log files for a peak day.
- Extract the top page requests from the file.
- Determine which page requests map to which business processes.
- Query the log file for the relevant page request that identifies each business process. Pick the busiest hour.
Everyone else, read on…
Hopefully your system administrators keep the web server logs for at least a week. The logs will probably be split into a separate file for each day. Ask for the busiest day, which will usually also be the largest file. If the application sits behind more than one web server, make sure you get the log files for the same day from every server.
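If nobody can tell you which day was the busiest, a quick LogParser query will show the request count per log file. This is only a sketch: the log path is an example, and LogFilename is a field that LogParser adds to every record to identify the file it came from.
LogParser.exe -i:IISW3C -o:CSV "SELECT LogFilename, COUNT(*) AS Requests FROM C:\TEMP\logs\ex*.log GROUP BY LogFilename ORDER BY Requests DESC"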
Run a LogParser query to extract all the URLs from the log and sort them by popularity. This gives a good overview of system usage. We can safely exclude static content (images, stylesheets, JavaScript files) and requests that returned errors such as 404s. The size of the output file can be reduced substantially by excluding pages with only a handful of hits (we don’t care about low-volume pages anyway).
SELECT
COUNT(cs-uri-stem) AS Hits,
cs-uri-stem AS URL
FROM
C:\TEMP\logs\ex060910_*.log
WHERE
sc-status <> 404 AND
EXTRACT_EXTENSION(cs-uri-stem) NOT LIKE 'jpg' AND
EXTRACT_EXTENSION(cs-uri-stem) NOT LIKE 'gif' AND
EXTRACT_EXTENSION(cs-uri-stem) NOT LIKE 'css' AND
EXTRACT_EXTENSION(cs-uri-stem) NOT LIKE 'ico' AND
EXTRACT_EXTENSION(cs-uri-stem) NOT LIKE 'js'
GROUP BY
URL
HAVING
Hits >= 5
ORDER BY
Hits DESC
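If you haven’t used LogParser from the command line before, save the query to a text file and run it along these lines (the file names are just examples): -i:IISW3C tells LogParser that the input is IIS W3C extended log format, and -o:CSV writes CSV to standard output, which is redirected to a file here.
LogParser.exe -i:IISW3C -o:CSV file:TopPages.sql > TopPages.csv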
The overview will generally show that the majority of the requests to the web application are hitting a small number of pages (80/20 rule).
Next, record the key business processes (that you know of) with VuGen using URL mode. This is a simple way of creating a list of all the URLs that each business process covers. Try to pick a URL that uniquely identifies each business process. Take note of the following attributes that are included in the web server log files:
- cs-uri-stem – usually the filename on the webserver, like “/search.php”
- cs-uri-query – any arguments passed in the URL after the filename, without the leading “?”; e.g. “query=foo&lang=en”
- cs(Referer) – the complete URL of the previous page
- cs-method – the HTTP method of the request, usually GET or POST.
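For reference, a log entry containing those fields might look something like this (a made-up example; the actual columns and their order are listed in the #Fields directive at the top of each log file):
#Fields: date time cs-method cs-uri-stem cs-uri-query sc-status cs(Referer)
2006-09-10 01:15:42 GET /addToBasket.php item=1234 200 https://www.store.com/display.php?item=1234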
Check your overview list for any popular requests that aren’t covered by the business processes you have recorded.
Finally, write a query using the above information to count the number of times a given business process was run during each hour of the day, so you can pick the busiest hour. For example:
SELECT
TO_LOCALTIME(QUANTIZE(TO_TIMESTAMP(date, time), 3600)) AS TimeStamp,
COUNT(cs-uri-stem) AS Hits
FROM
C:\TEMP\logs\ex060910_*.log
WHERE
cs-uri-stem = '/addToBasket.php' AND
cs-uri-query LIKE 'item=%' AND
cs-method = 'GET' AND
cs(Referer) LIKE 'https://www.store.com/display.php?item=%'
GROUP BY
TimeStamp
ORDER BY
TimeStamp
…which gives the following output (in CSV format). Note that the timestamps in the log were in UTC, and TO_LOCALTIME has converted them to Australian Eastern Standard Time (UTC+10).
TimeStamp,Hits
2006-09-10 10:00:00,933
2006-09-10 11:00:00,935
2006-09-10 12:00:00,761
2006-09-10 13:00:00,655
2006-09-10 14:00:00,705
2006-09-10 15:00:00,680
2006-09-10 16:00:00,565
2006-09-10 17:00:00,231
2006-09-10 18:00:00,105
2006-09-10 19:00:00,66
2006-09-10 20:00:00,60
2006-09-10 21:00:00,65
2006-09-10 22:00:00,98
2006-09-10 23:00:00,36
2006-09-11 00:00:00,26
2006-09-11 01:00:00,15
2006-09-11 02:00:00,16
2006-09-11 03:00:00,17
2006-09-11 04:00:00,23
2006-09-11 05:00:00,24
2006-09-11 06:00:00,20
2006-09-11 07:00:00,43
2006-09-11 08:00:00,182
2006-09-11 09:00:00,398
From this, we can easily see that the peak hour for this business process is between 11:00 and 12:00 (935 transactions), and we can feed that volume into the usage profile that will be used for load testing. Rinse and repeat for each business process.
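To put that transaction volume into concrete terms: suppose, purely for illustration, that you plan to run this business process with 50 virtual users. 935 transactions in the peak hour works out to 935 ÷ 50 ≈ 19 iterations per user per hour, which is a pacing of roughly 3600 ÷ 19 ≈ 190 seconds between iterations, on top of whatever think time was recorded.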
One of the most interesting things about doing this for a real web application is finding out that some of the “really important” features (that took so much development and testing effort) are hardly being used at all.