This blog forms part of the http://performanceGuru.com website which focuses on performance tuning UNIX-like Operating Systems (including Linux).
One of the most common mistakes in the performance field is attempting to diagnose an application performance issue from system-wide aggregated statistics. When a production performance issue is reported, resist the temptation to study system-wide statistics and extrapolate, particularly if they are historic smoothed averages with no indication of standard deviation (e.g. historic sar data sampled every five minutes from cron). It is usually far more fruitful to identify the thread of execution behind the complaint (e.g. the report is running slowly, my screen is updating slowly, etc.). Once the complaint has been translated into a thread of execution, a deterministic profiling method should be used to produce a performance profile of that thread.
Truss & Strace
The simplest tools for doing this are truss (Solaris) and strace (Linux), which trace the system calls made by a process. Later versions of truss can also report user space library calls, and on Linux the companion ltrace tool does the same. Caution should be exercised when tracing user libraries, because the information can be misleading: these tools do not separate time between library routines (i.e. if one library routine calls another, the time is attributed only to the first call). To a greater or lesser extent these tools also introduce the Heisenberg effect (affecting what you are observing). To get an indication of the size of the effect, time an application run with and without the tool. Most of these tools use interposing, breakpoints or watchpoints, all of which can have a noticeable effect on execution flow, as they either single-step code, introduce significant additional code in the interposing path or require complex memory checks (this may change in the future, as most modern processors have specialised debug registers for breakpoints and watchpoints).
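The "time it with and without the tool" check described above can be scripted. The following is a minimal sketch in Python, not part of either tool; the helper names `time_command` and `tracing_overhead` are made up for illustration, and it assumes the workload is repeatable enough that the best of a few runs is representative.

```python
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Return the best wall-clock time in seconds over `runs` executions of argv."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output so terminal I/O does not distort the timing
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def tracing_overhead(argv, tracer_prefix, runs=3):
    """Ratio of traced to untraced run time; close to 1.0 means little overhead."""
    plain = time_command(argv, runs)
    traced = time_command(list(tracer_prefix) + list(argv), runs)
    return traced / plain
```

On a Linux box with strace installed, something like `tracing_overhead(["./report"], ["strace", "-f", "-c"])` would give a rough slowdown factor, where `./report` is a hypothetical stand-in for the application under investigation. A syscall-heavy workload will typically show a much larger factor than a CPU-bound one.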
Dtrace & Systemtap
Whilst truss and strace are useful, far more capable tools have been delivered or are in development. Sun Microsystems have delivered the excellent and proven dtrace tool in their Solaris operating environment. Dtrace provides a scripting language that activates dynamic probes in both user space and kernel space, allowing the exact flow of execution to be traced from user space into kernel space. This is exactly what is required for production performance issues. Dtrace is currently being developed in many interesting areas, including the ability to trace various dynamic languages (Java bytecode, Python, Perl, etc.) and the integration of hardware counters into the dtrace framework. Various vendors are collaborating on a similar tool for Linux called Systemtap. The project is in its infancy and, although it is evolving quickly, it has yet to be proven safe for production use, which is likely to be its biggest challenge. Dtrace and Systemtap are referred to as deterministic profilers, meaning they instrument each function call. Their probes are extremely lightweight when enabled, allowing almost the exact execution path to be profiled.
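To make "deterministic" concrete, here is a minimal sketch of the idea in Python. It is only an analogy and has nothing to do with how dtrace or Systemtap are implemented: it uses Python's own `sys.setprofile` hook to fire a probe on every function entry and return, so every call is counted rather than estimated. The name `deterministic_profile` is made up for illustration.

```python
import sys
import time
from collections import defaultdict


def deterministic_profile(func, *args):
    """Run func(*args), recording every Python-level call and its inclusive time."""
    calls = defaultdict(int)
    elapsed = defaultdict(float)
    stack = []  # (function name, entry timestamp) for calls still in progress

    def probe(frame, event, arg):
        # Fires on every Python function entry and exit; C-level events are ignored
        if event == "call":
            stack.append((frame.f_code.co_name, time.perf_counter()))
        elif event == "return" and stack:
            name, started = stack.pop()
            calls[name] += 1
            elapsed[name] += time.perf_counter() - started

    sys.setprofile(probe)
    try:
        result = func(*args)
    finally:
        sys.setprofile(None)  # always remove the probe, even on error
    return result, dict(calls), dict(elapsed)
```

Because every entry and exit is observed, the call counts are exact. The price is that the probe runs on every call, which is why the cost of each enabled probe matters so much in a tool intended for production use.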
Linux has a more mature tool known as Oprofile that can be used safely on production systems. Oprofile is a statistical profiler, meaning it profiles by taking samples. Various hardware counters can be set, each with a sample rate. When a counter reaches its set sample point an NMI (Non-Maskable Interrupt) is generated and the handler samples the program counter. The use of NMIs means that code running with interrupts disabled can be accurately profiled. The samples are then collected in user space and can be used to generate useful reports. To profile a thread of execution, a time source is generally sampled, which provides a statistical sample of what each thread of execution on the system was doing. The reporting tools can then be used to isolate the thread of execution of interest. Oprofile is currently being extended to understand dynamic code such as Java. The main drawback of Oprofile is that it does not easily identify sleep states such as waiting for I/O or locks (unless they are spinning). In spite of this, Oprofile is by far the best application profiling tool available on Linux, particularly if your application is CPU bound.
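The sampling approach, and its blind spot for sleep states, can be sketched in a few lines of Python. This is an analogy only: where Oprofile samples the program counter from an NMI raised by a hardware counter, this sketch assumes a UNIX-like system and uses a CPU-time interval timer (`ITIMER_PROF`) to sample whichever Python function is running. The names `sample_profile`, `spin` and `nap` are made up for illustration.

```python
import signal
import time
from collections import Counter


def sample_profile(func, interval=0.005):
    """Run func() while a CPU-time timer periodically samples the running function."""
    samples = Counter()

    def on_tick(signum, frame):
        # frame is the Python code that was executing when the timer fired
        if frame is not None:
            samples[frame.f_code.co_name] += 1

    previous = signal.signal(signal.SIGPROF, on_tick)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        func()
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0, 0)  # stop sampling
        signal.signal(signal.SIGPROF, previous)
    return samples


def spin(seconds=0.3):
    """CPU-bound loop: burns CPU time, so it should collect many samples."""
    end = time.process_time() + seconds
    while time.process_time() < end:
        pass


def nap(seconds=0.3):
    """Sleeping consumes almost no CPU time, so the timer barely fires."""
    time.sleep(seconds)
```

Calling `sample_profile(spin)` should attribute most samples to `spin`, while `sample_profile(nap)` should return few or none even though both take about the same wall-clock time. That is the same reason a sampling profiler driven by CPU activity says little about time spent blocked on I/O or locks.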
Whilst Linux has a mature tool in Oprofile and a promising tool in Systemtap, Solaris probably wins the observability stakes at present thanks to dtrace, its proven stability and track record, and its future roadmap. To conclude, when are system-wide statistics (i.e. sar and friends) of most use? Mainly for pre-production tuning, for benchmarking performance simulations where multiple system elements must be optimised together, and for capacity planning exercises.