Archive for the ‘LogParser’ Category

Discovering your website usage profile with LogParser

Sunday, September 10th, 2006

It’s always nice to base your load test on real-world usage patterns rather than someone’s “best guess”. If the system is used internally, Business Analysts are usually quite good at providing this information, but if the system is used by external users (like most high-traffic web applications), then they can typically only tell you “big picture” information; e.g. they know how many orders are created, but not how many searches are made per hour.

Basic web log analysis tools report low-level technical detail, such as hits per hour, that isn’t very useful on its own. To be useful for load testing, we need to translate the data in the logs into end-user business processes and their respective volumes during a peak hour.

Typically this analysis effort is worthwhile when:

  • You are testing a web application that is a replacement for an existing web application, and can therefore be assumed to have a similar usage profile.
  • Your web application has gone live, but you want to validate the “best guess” usage profile that was used for load testing against actual usage patterns. If you tested with less load or used a significantly different usage profile, you might want to run some more tests to make sure that you won’t see problems in the future. You will also be able to use this new profile for future regression tests.

An e-commerce website may have a number of common functions: register, login, update details, search for product, browse catalogue, add to basket, checkout, logout, etc. Using an analysis tool such as LogParser (which I have discussed before), we can map the page requests in the log file onto these functions.

If you are in a hurry to get started, the steps are as follows:

  1. Get a copy of the log files for a peak day.
  2. Extract the top page requests from the file.
  3. Determine which page requests map to which business processes.
  4. Query the log file for the relevant page request that identifies each business process. Pick the busiest hour.

Everyone else, read on…

Hopefully your system administrators are keeping the web server logs for at least a week. Log files will probably be split into a separate file for each day. Ask for the largest log file, as it will usually correspond to the busiest day, and make sure you get the log files for the same day from each web server.
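
If the logs sit in a shared folder, the “largest file” check can be scripted. Below is a minimal Python sketch that adds up the log sizes for each day across all web servers and returns the biggest day. It assumes IIS-style daily file names such as ex060910_web1.log. The per-server suffix is a hypothetical naming scheme, so adjust the pattern to suit your servers. File size is only a rough proxy for traffic, so treat the result as a starting point.

```python
import os
import re
from collections import defaultdict

# Assumed naming scheme: "ex" + YYMMDD, an optional "_<server>" suffix, then ".log",
# e.g. "ex060910_web1.log". Change the pattern if your servers name files differently.
LOG_NAME = re.compile(r"^ex(\d{6})(?:_.*)?\.log$", re.IGNORECASE)


def busiest_day(files):
    """Return (day, total_bytes) for the day with the most log data across all servers.

    `files` is an iterable of (filename, size_in_bytes) pairs.
    """
    totals = defaultdict(int)
    for name, size in files:
        match = LOG_NAME.match(name)
        if match:  # ignore anything that is not a daily log file
            totals[match.group(1)] += size
    if not totals:
        raise ValueError("no daily log files found")
    return max(totals.items(), key=lambda item: item[1])


def scan_directory(path):
    """List (filename, size_in_bytes) pairs for the regular files in one directory."""
    return [(entry.name, entry.stat().st_size) for entry in os.scandir(path) if entry.is_file()]
```

For example, busiest_day(scan_directory(r"C:\TEMP\logs")) returns the YYMMDD date of the biggest day together with its combined size in bytes.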

Run a LogParser query to extract all the URLs from the log and sort them by popularity. This gives a good overview of system usage. We can safely exclude any static content (images, stylesheets, JavaScript files) and any errors. The size of the output file can be substantially reduced by excluding pages with a small number of hits (we don’t care about any pages that are low-volume anyway).

SELECT
   COUNT(cs-uri-stem) AS Hits,
   cs-uri-stem AS URL
FROM
   C:\TEMP\logs\ex060910_*.log
WHERE
   sc-status < 400 AND
   EXTRACT_EXTENSION(cs-uri-stem) NOT LIKE 'jpg' AND
   EXTRACT_EXTENSION(cs-uri-stem) NOT LIKE 'gif' AND
   EXTRACT_EXTENSION(cs-uri-stem) NOT LIKE 'css' AND
   EXTRACT_EXTENSION(cs-uri-stem) NOT LIKE 'ico' AND
   EXTRACT_EXTENSION(cs-uri-stem) NOT LIKE 'js'
GROUP BY
   URL
HAVING
   Hits >= 5
ORDER BY
   Hits DESC
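
If LogParser isn’t available, or you want to cross-check its numbers, a short script can produce the same kind of overview. Below is a rough Python sketch, not LogParser itself. It reads W3C extended log lines and takes the column names from the #Fields directive. It then drops static content and error responses (status 400 and above) and keeps pages with at least five hits. It assumes fields are separated by single spaces, which is how IIS writes them.

```python
from collections import Counter

STATIC_EXTENSIONS = {"jpg", "gif", "css", "ico", "js"}


def top_pages(lines, min_hits=5):
    """Return (cs-uri-stem, hits) pairs, busiest first, from W3C extended log lines."""
    fields = []
    hits = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#Fields:"):
            # The #Fields directive names the space-separated columns that follow it.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # other directives, blank lines, or data before any #Fields line
        record = dict(zip(fields, line.split(" ")))
        stem = record.get("cs-uri-stem", "-")
        filename = stem.rsplit("/", 1)[-1]
        extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
        status = record.get("sc-status", "-")
        if extension in STATIC_EXTENSIONS or (status.isdigit() and int(status) >= 400):
            continue  # static content or an error response
        hits[stem] += 1
    # Equivalent of HAVING Hits >= min_hits ORDER BY Hits DESC.
    return [(stem, count) for stem, count in hits.most_common() if count >= min_hits]
```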

The overview will generally show that the majority of the requests to the web application are hitting a small number of pages (80/20 rule).
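
To put a number on that concentration, the small Python sketch below takes the (URL, hits) pairs from the overview. It returns the busiest pages that together make up a given share of the traffic. The 80% default is an illustrative choice. These are the pages worth mapping to business processes first.

```python
def pages_covering(hits, share=0.8):
    """Return the busiest URLs that together account for at least `share` of all hits.

    `hits` is a list of (url, count) pairs, e.g. the rows of the overview query.
    """
    total = sum(count for _, count in hits)
    covered = 0
    busiest = []
    for url, count in sorted(hits, key=lambda pair: pair[1], reverse=True):
        if covered >= share * total:
            break  # the pages collected so far already reach the target share
        busiest.append(url)
        covered += count
    return busiest
```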

Next, record the key business processes (that you know of) with VuGen (LoadRunner’s Virtual User Generator) using URL mode. This is a simple way of creating a list of all the URLs that each business process covers. Try to pick a URL that uniquely identifies each business process. Take note of the following attributes that are included in the web server log files:

  • cs-uri-stem - usually the filename on the web server, like “/search.php”
  • cs-uri-query - if the URL has any arguments that are passed in after the filename, these are included here (IIS logs them without the leading “?”); like “query=foo&lang=en”
  • cs(Referer) - the complete URL of the previous page
  • cs-method - the HTTP method of the request, usually GET or POST.
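
Once you have picked an identifying request for each business process, you can write the rules down as data. The Python sketch below is hypothetical. The process names and rules are illustrative, and the “Add to basket” rule mirrors the query further down. A pattern ending in “*” means “starts with”, much like a trailing % in LogParser’s LIKE. Any field missing from a rule matches anything.

```python
# Hypothetical rules: one identifying request per business process.
# A pattern ending in "*" means "starts with"; other patterns must match exactly.
# Fields left out of a rule are not checked.
PROCESS_RULES = {
    "Add to basket": {
        "cs-method": "GET",
        "cs-uri-stem": "/addToBasket.php",
        "cs-uri-query": "item=*",
        "cs(Referer)": "https://www.store.com/display.php?item=*",
    },
    "Search": {"cs-method": "GET", "cs-uri-stem": "/search.php"},
}


def field_matches(value, pattern):
    """Compare one log field against a rule pattern."""
    if pattern.endswith("*"):
        return value.startswith(pattern[:-1])
    return value == pattern


def classify(record, rules=PROCESS_RULES):
    """Return the first business process whose rule matches a parsed log record, or None."""
    for process, rule in rules.items():
        if all(field_matches(record.get(field, "-"), pattern) for field, pattern in rule.items()):
            return process
    return None
```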

Check that there aren’t any popular requests in your overview list that aren’t covered by the business processes that you have recorded.
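
This check is easy to automate once you have collected the URLs from the recorded scripts. The minimal Python sketch below assumes you have already extracted the cs-uri-stem values from the VuGen scripts into a list. It compares URLs in lowercase, on the assumption that your web server treats paths case-insensitively. Check that this holds for your server.

```python
def coverage_gaps(overview, recorded_urls):
    """Return the overview pages no recorded script requests, and their share of all hits.

    `overview` is a list of (url, hits) pairs; `recorded_urls` holds the
    cs-uri-stem values collected from the recorded scripts.
    """
    # Lowercase both sides, assuming the web server treats paths case-insensitively.
    recorded = {url.lower() for url in recorded_urls}
    gaps = [(url, hits) for url, hits in overview if url.lower() not in recorded]
    total = sum(hits for _, hits in overview)
    missed = sum(hits for _, hits in gaps)
    return gaps, (missed / total if total else 0.0)
```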

Finally, write a query using the above information to count the number of times a given business process was run in each hour of the day, so that you can find its peak hour. For example, to count additions to the shopping basket:

SELECT
   TO_LOCALTIME(QUANTIZE(TO_TIMESTAMP(date, time), 3600)) AS TimeStamp,
   COUNT(cs-uri-stem) AS Hits
FROM
   C:\TEMP\logs\ex060910_*.log
WHERE
   cs-uri-stem = '/addToBasket.php' AND
   cs-uri-query LIKE 'item=%' AND
   cs-method = 'GET' AND
   cs(Referer) LIKE 'https://www.store.com/display.php?item=%'
GROUP BY
   TimeStamp
ORDER BY
   TimeStamp
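
For readers without LogParser, here is a Python sketch of the same hourly bucketing. It applies to records that you have already filtered down to one business process. The sketch floors each UTC timestamp to the hour, like QUANTIZE(..., 3600), and then shifts it to a fixed UTC+10 offset. That approximates TO_LOCALTIME on a machine set to Australian Eastern Standard Time, but it ignores daylight saving.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: log times are UTC and the report uses a fixed UTC+10 offset
# (Australian Eastern Standard Time). A fixed offset ignores daylight saving.
AEST = timezone(timedelta(hours=10))


def hits_per_hour(records, local=AEST):
    """Count records per local hour; records are dicts with W3C "date" and "time" fields."""
    counts = Counter()
    for record in records:
        stamp = datetime.strptime(record["date"] + " " + record["time"], "%Y-%m-%d %H:%M:%S")
        # Floor to the hour in UTC (like QUANTIZE(..., 3600)), then convert (like TO_LOCALTIME).
        hour = stamp.replace(minute=0, second=0, microsecond=0, tzinfo=timezone.utc)
        counts[hour.astimezone(local).strftime("%Y-%m-%d %H:%M:%S")] += 1
    return sorted(counts.items())
```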

…which gives the following output (in CSV format). Note that the original timestamps were in UTC. TO_LOCALTIME converted them to the local time zone of the machine running the query, Australian Eastern Standard Time (UTC+10).

TimeStamp,Hits
2006-09-10 10:00:00,933
2006-09-10 11:00:00,935
2006-09-10 12:00:00,761
2006-09-10 13:00:00,655
2006-09-10 14:00:00,705
2006-09-10 15:00:00,680
2006-09-10 16:00:00,565
2006-09-10 17:00:00,231
2006-09-10 18:00:00,105
2006-09-10 19:00:00,66
2006-09-10 20:00:00,60
2006-09-10 21:00:00,65
2006-09-10 22:00:00,98
2006-09-10 23:00:00,36
2006-09-11 00:00:00,26
2006-09-11 01:00:00,15
2006-09-11 02:00:00,16
2006-09-11 03:00:00,17
2006-09-11 04:00:00,23
2006-09-11 05:00:00,24
2006-09-11 06:00:00,20
2006-09-11 07:00:00,43
2006-09-11 08:00:00,182
2006-09-11 09:00:00,398

From this, we can easily see that the peak hour for this business process is between 11:00 and 12:00. It had 935 hits, only just ahead of the 933 between 10:00 and 11:00. We can then feed the transaction volumes into the usage profile that will be used for load testing. Rinse and repeat for each business process.
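
Picking the peak hour out of each result file is also easy to script. The minimal Python sketch below reads LogParser’s CSV output, with the TimeStamp and Hits headers shown above. It reports the busiest hour for each business process, and the process names you pass in are up to you.

```python
import csv
import io


def peak_hour(csv_text):
    """Return (TimeStamp, Hits) for the busiest hour in LogParser's CSV output."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return max(((row["TimeStamp"], int(row["Hits"])) for row in rows), key=lambda row: row[1])


def peak_hours(results):
    """Map each business process name to its peak hour, given {name: csv_text}."""
    return {process: peak_hour(text) for process, text in results.items()}
```

Applied to the full output above, peak_hour returns ("2006-09-10 11:00:00", 935).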


Analysing Web Server Logs

Friday, January 7th, 2005

Following a load test, I often need to perform some additional analysis on the web server logs. It is not practical to use any commercial tools, and the free tools are all aimed at people who want graphs and statistics for entire weeks or months, rather than a few hours of high load. So far, I have usually been forced into a roll-your-own solution.

While it is easy to create graphs with Microsoft Excel, its 65,536-row limit makes analysing any sort of non-trivial load futile unless the log files have been filtered before importing them. Even with filtering, it is hard to do anything useful with such a small number of records.

Microsoft’s Log Parser tool allows you to perform SQL queries on web log files. If you know exactly what you want, this is an extremely powerful tool. Having a large amount of memory will dramatically improve query times for large files, as will truncating the log files to just the period you are interested in. I found that debugging my queries could be a little painful.
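
Truncating a log to the period of interest doesn’t need a special tool. The minimal Python sketch below assumes W3C extended logs whose date and time fields are in the same time zone as the start and end arguments (UTC for IIS). It keeps the # directive lines so that you can still feed the trimmed file to LogParser.

```python
from datetime import datetime


def truncate_log(lines, start, end):
    """Yield directive lines plus the data lines whose timestamp falls in [start, end).

    Assumes the W3C "date" and "time" fields are present, and that `start` and
    `end` are naive datetimes in the same time zone as the log (UTC for IIS).
    """
    fields = []
    for line in lines:
        if line.startswith("#"):
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            yield line  # keep directives so the trimmed file is still a valid log
            continue
        values = dict(zip(fields, line.split(" ")))
        if "date" not in values or "time" not in values:
            continue  # blank or malformed line
        stamp = datetime.strptime(values["date"] + " " + values["time"], "%Y-%m-%d %H:%M:%S")
        if start <= stamp < end:
            yield line
```

For example, you can pass an open log file with datetime(2006, 9, 10, 1) and datetime(2006, 9, 10, 2) and write the result to a new file with writelines(). The new file keeps only the requests logged between 01:00 and 02:00 UTC.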

The industrial-strength solution is to load the log files into a database and analyse them there.
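
As an illustration of that approach, here is a Python sketch using the standard sqlite3 module. SQLite is simply the choice for this example, and any database will do. The sketch creates one text column per #Fields entry and assumes every file shares the same field list. It then runs a simplified version of the top-pages query from the LogParser post above.

```python
import sqlite3

# A simplified top-pages query for the loaded table (static-file filter omitted).
TOP_PAGES = """
    SELECT "cs-uri-stem" AS URL, COUNT(*) AS Hits
    FROM log
    WHERE CAST("sc-status" AS INTEGER) < 400
    GROUP BY "cs-uri-stem"
    HAVING COUNT(*) >= 5
    ORDER BY Hits DESC
"""


def load_w3c(conn, lines, table="log"):
    """Load W3C extended log lines into an SQLite table with one TEXT column per field."""
    fields, insert = [], None
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            columns = ", ".join(f'"{name}" TEXT' for name in fields)
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
            insert = f'INSERT INTO "{table}" VALUES ({", ".join("?" * len(fields))})'
        elif insert and line and not line.startswith("#"):
            values = line.split(" ")
            if len(values) == len(fields):  # skip truncated or malformed lines
                conn.execute(insert, values)
    conn.commit()
```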
