The general probe point syntax is a dotted-symbol sequence. This divides the event namespace into parts, somewhat like the Domain Name System does on the Internet. Each component identifier may be parametrized by a string or number literal, with a syntax like a function call. A component may include a "*" character, to expand to a set of matching probe points. Probe aliases likewise expand to other probe points. Each resulting probe point must normally resolve to some low-level system instrumentation facility (e.g., a kprobe address, marker, or timer configuration); otherwise, the elaboration phase fails.
However, a probe point may be followed by a "?" character to indicate that it is optional, so that no error results if it fails to resolve. Optionality propagates down through all levels of alias/wildcard expansion. Alternatively, a probe point may be followed by a "!" character to indicate that it is both optional and sufficient. (Think vaguely of the Prolog cut operator.) If it does resolve, then no further probe points in the same comma-separated list are resolved. Therefore, the "!" sufficiency mark only makes sense in a list of probe point alternatives.
Additionally, a probe point may be followed by an "if (expr)" clause to enable or disable the probe point on the fly. If "expr" is false when the probe point is hit, the whole probe body, including any alias bodies, is skipped. The condition is stacked up through all levels of alias/wildcard expansion, so the final condition is the logical AND of the conditions of every expanded alias and wildcard.
These are all syntactically valid probe points. (Whether each one is also semantically valid depends on the contents of the tapsets and the versions of kernel/user software installed.)
kernel.function("foo").return
process("/bin/vi").statement(0x2222)
end
syscall.*
kernel.function("no_such_function") ?
module("awol").function("no_such_function") !
signal.*? if (switch)
kprobe.function("foo")
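The "?", "!", and "if" modifiers can be combined in one script. The following is a minimal sketch, assuming vfs_read, sys_read, and vfs_write are kernel function names present on the running kernel; no_such_function deliberately does not exist, and the global tracing is an illustrative name.

```stap
# Sketch: vfs_read, sys_read, and vfs_write are assumed kernel function
# names; no_such_function deliberately does not exist.
global reads, writes
global tracing = 1

# "!": if vfs_read resolves, the sys_read alternative is not even tried.
probe kernel.function("vfs_read") !, kernel.function("sys_read")
{
  reads++
}

# "if": the handler is skipped whenever tracing is zero.
probe kernel.function("vfs_write") if (tracing)
{
  writes++
}

# "?": an unresolvable probe point is dropped instead of failing elaboration.
probe kernel.function("no_such_function") ?
{
  printf("unexpectedly hit %s\n", pp())
}

# timer.s(2) fires every 2 s; the first firing switches write counting off.
probe timer.s(2) { tracing = 0 }
probe timer.s(5) { exit() }

probe end
{
  printf("reads=%d writes(first 2 s)=%d\n", reads, writes)
}
```

Because tracing is an ordinary global, any handler can flip it at run time, which is what makes the "if" condition useful for enabling and disabling probes on the fly.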
Probes may be broadly classified into "synchronous" and "asynchronous". A "synchronous" event is deemed to occur when any processor executes an instruction matched by the specification. This gives these probes a reference point (instruction address) from which more contextual data may be available. Other families of probe points refer to "asynchronous" events such as timers/counters rolling over, where there is no related fixed reference point. Each probe point specification may match multiple locations (for example, using wildcards or aliases), and all of them are then probed. A probe declaration may also contain several comma-separated specifications, all of which are probed.
The probe points begin and end are defined by the translator to refer to the time of session startup and shutdown. All "begin" probe handlers are run, in some sequence, during the startup of the session. All global variables will have been initialized prior to this point. All "end" probes are run, in some sequence, during the normal shutdown of a session, such as in the aftermath of an exit() function call, or an interruption from the user. In the case of an error-triggered shutdown, "end" probes are not run. There are no target variables available in either context.
If the order of execution among "begin" or "end" probes is significant, then an optional sequence number may be provided:
begin(N)
end(N)
The number N may be positive or negative. The probe handlers are run in increasing order, and the order between handlers with the same sequence number is unspecified. When "begin" or "end" are given without a sequence, they are effectively sequence zero.
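As a sketch of sequence numbering, the three "begin" handlers below are declared out of order but should run in increasing sequence order: -1, then 0 (the implicit default), then 10. The last one calls exit(), which leads to a normal shutdown and therefore runs the "end" probe.

```stap
# Sketch: handlers run in increasing sequence order; plain "begin" is 0.
probe begin(10)
{
  println("begin(10): runs last, then ends the session")
  exit()
}

probe begin
{
  println("begin: sequence 0")
}

probe begin(-1)
{
  println("begin(-1): runs first")
}

probe end
{
  println("end: normal shutdown")
}
```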
The error probe point is similar to the end probe, except that each such probe handler is run when the session ends after errors have occurred. In such cases, "end" probes are skipped, but each "error" probe is still attempted. This kind of probe can be used to clean up or emit a "final gasp". It may also be numerically parametrized to set a sequence.
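A minimal sketch of an error-triggered shutdown: the tapset function error() is used here to raise a deliberate error from a "begin" handler, so the "end" probe should be skipped while the "error" probe still runs.

```stap
# Sketch: a deliberate error makes the session end abnormally.
probe begin
{
  println("starting")
  error("deliberate failure to demonstrate error probes")
}

probe end
{
  println("end probe: skipped after an error")
}

probe error
{
  println("error probe: final gasp")
}
```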
The syscall.* aliases define several hundred probes, too many to summarize here. They are:
syscall.NAME
syscall.NAME.return
Generally, two probes are defined for each normal system call as listed in the syscalls(2) manual page, one for entry and one for return. Those system calls that never return do not have a corresponding .return probe.
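Each probe alias defines a variety of variables; the tapset source code is the most reliable way to find out which. Generally, each variable listed in the standard manual page is made available as a script-level variable, so syscall.open exposes filename, flags, and mode. In addition, a standard suite of variables is available at most aliases:
argstr: a pretty-printed form of the entire argument list, without parentheses.
name: the name of the system call.
retstr: for return probes, a pretty-printed form of the system-call result.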
Not all probe aliases obey all of these general guidelines. Please report any bothersome ones you encounter as a bug.
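The following sketch traces open(2) through the syscall aliases. It assumes the syscall.open alias on the installed tapset exposes filename as well as the standard-suite variables name, argstr, and retstr; as noted above, the tapset source is the authority on what a given alias actually defines.

```stap
# Sketch: variable names assume the syscall.open tapset alias described
# in the text (filename) plus the standard suite (name, argstr, retstr).
probe syscall.open
{
  printf("%s[%d] %s(%s) file=%s\n", execname(), pid(), name, argstr, filename)
}

probe syscall.open.return
{
  printf("%s[%d] %s returned %s\n", execname(), pid(), name, retstr)
}

probe timer.s(5) { exit() }
```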
Intervals defined by the standard kernel "jiffies" timer may be used to trigger probe handlers asynchronously. Two probe point variants are supported by the translator:
timer.jiffies(N)
timer.jiffies(N).randomize(M)
The probe handler is run every N jiffies (a kernel-defined unit of time, typically between 1 and 60 ms). If the "randomize" component is given, a uniformly distributed random value in the range [-M..+M] is added to N every time the handler is run. N is restricted to a reasonable range (1 to around a million), and M is restricted to be smaller than N. There are no target variables provided in either context. It is possible for such probes to be run concurrently on a multi-processor computer.
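A sketch of a randomized jiffies timer: with N=100 and M=10, each interval falls somewhere between 90 and 110 jiffies. The wall-clock length of a jiffy depends on the kernel's configuration, so the exit timer below uses seconds instead. The counter name ticks is illustrative.

```stap
# Sketch: fires every 100 +/- 10 jiffies; a jiffy's length is kernel-defined.
global ticks

probe timer.jiffies(100).randomize(10)
{
  ticks++
  printf("tick %d on cpu %d\n", ticks, cpu())
}

probe timer.s(10) { exit() }
```

Since the handler may run concurrently on several processors, the shared global is updated through SystemTap's own locking rather than any explicit synchronization in the script.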
Alternatively, intervals may be specified in units of time. There are two probe point variants similar to the jiffies timer:
timer.ms(N)
timer.ms(N).randomize(M)
Here, N and M are specified in milliseconds, but the full options for units are seconds (s/sec), milliseconds (ms/msec), microseconds (us/usec), nanoseconds (ns/nsec), and hertz (hz). Randomization is not supported for hertz timers.
The actual resolution of the timers depends on the target kernel. For kernels prior to 2.6.17, timers are limited to jiffies resolution, so intervals are rounded up to the nearest jiffies interval. Starting with 2.6.17, the implementation uses hrtimers for tighter precision, though the actual resolution will be arch-dependent. In either case, if the "randomize" component is given, then the random value will be added to the interval before any rounding occurs.
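The time-unit variants can be sketched the same way. Here a 250 ms period randomized by up to 50 ms fires every 200 to 300 ms, while the hertz timer is left unrandomized because randomization is not supported for it. The timestamps come from the tapset function gettimeofday_ms(); the actual spacing observed will depend on the timer resolution discussed above.

```stap
# Sketch: N and M in milliseconds; hertz timers cannot be randomized.
probe timer.ms(250).randomize(50)
{
  printf("ms timer: %d ms since the epoch\n", gettimeofday_ms())
}

probe timer.hz(2)
{
  printf("hz timer: %d ms since the epoch\n", gettimeofday_ms())
}

probe timer.s(3) { exit() }
```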
Profiling timers are also available to provide probes that execute on all CPUs at the rate of the system tick (CONFIG_HZ). This probe takes no parameters.
timer.profile
Full context information of the interrupted process is available, making this probe suitable for a time-based sampling profiler.
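A minimal sketch of a time-based sampling profiler: each timer.profile hit counts as one sample and is attributed to the name of the interrupted process. The array name samples and the 5-second run time are illustrative choices.

```stap
# Sketch: one sample per profiling tick, on every CPU.
global samples

probe timer.profile
{
  samples[execname()]++
}

probe timer.s(5) { exit() }

probe end
{
  # "samples-" sorts by value, descending; print the top 10.
  foreach (name in samples- limit 10)
    printf("%-20s %d\n", name, samples[name])
}
```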
This family of probe points uses symbolic debugging information for the target kernel/module/program, as may be found in unstripped executables, or the separate debuginfo packages. They allow placement of probes logically into the execution path of the target program, by specifying a set of points in the source or object code. When a matching statement executes on any processor, the probe handler is run in that context.
Points in a kernel are identified by module, source file, line number, function name, or some combination of these.
Here is a list of probe point families currently supported. The .function variant places a probe near the beginning of the named function, so that parameters are available as context variables. The .return variant places a probe at the moment after the return from the named function, so the return value is available as the "$return" context variable. The .inline modifier for .function filters the results to include only instances of inlined functions. The .call modifier selects the opposite subset: only non-inlined instances. Inline functions do not have an identifiable return point, so .return is not supported on .inline probes. The .statement variant places a probe at the exact spot, exposing those local variables that are visible there.
kernel.function(PATTERN)
kernel.function(PATTERN).call
kernel.function(PATTERN).return
kernel.function(PATTERN).inline
kernel.function(PATTERN).label(LPATTERN)
module(MPATTERN).function(PATTERN)
module(MPATTERN).function(PATTERN).call
module(MPATTERN).function(PATTERN).return
module(MPATTERN).function(PATTERN).inline
kernel.statement(PATTERN)
kernel.statement(ADDRESS).absolute
module(MPATTERN).statement(PATTERN)
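In the above list, MPATTERN stands for a string literal that aims to identify the loaded kernel module of interest and LPATTERN stands for a source program label. Both MPATTERN and LPATTERN may include the "*", "[]", and "?" wildcards. PATTERN stands for a string literal that aims to identify a point in the program. It is made up of three parts:
The first part is the name of a function, which may use the "*" and "?" wildcards to match multiple names.
The second part is optional and begins with the "@" character, followed by the path of the source file containing the function; it may also include wildcards, such as mm/slab*.
The third part is optional, may only follow the file name part, and gives a line number in that file, preceded by a ":" character.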
As an alternative, PATTERN may be a numeric constant, indicating an address. Such an address may be found from symbol tables of the appropriate kernel / module object file. It is verified against known statement code boundaries, and will be relocated for use at run time.
In guru mode only, absolute kernel-space addresses may be specified with the ".absolute" suffix. Such an address is considered already relocated, as if it came from /proc/kallsyms, so it cannot be checked against statement/instruction boundaries.
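The following sketch exercises several of the families listed above. It assumes vfs_read is defined in fs/read_write.c on the running kernel; the ext3 module may not be loaded, so that probe is marked optional with "?". The array names calls and bytes are illustrative.

```stap
# Sketch: function, return, and wildcard module probes.
global calls, bytes

# PATTERN with a function name and an "@" source-file part.
probe kernel.function("vfs_read@fs/read_write.c")
{
  calls[execname()]++
}

# .return: $return holds the function's return value.
probe kernel.function("vfs_read").return
{
  if ($return > 0)
    bytes[execname()] += $return
}

# MPATTERN plus a wildcard PATTERN, restricted to non-inlined instances.
probe module("ext3").function("ext3_*").call ?
{
  calls["ext3:" . probefunc()]++
}

probe timer.s(5) { exit() }

probe end
{
  foreach (name in calls- limit 10)
    printf("%-24s calls=%d bytes=%d\n", name, calls[name], bytes[name])
}
```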
Source-level context variables, such as function parameters, locals, and globals visible in the compilation unit, may be visible to probe handlers. Handlers refer to these variables by prefixing their names with "$" within the scripts. In addition, a special syntax allows limited traversal of structures, pointers, and arrays.
For ".return" probes, context variables other than the "$return" value itself are only available for the function call parameters. The expressions evaluate to the entry-time values of those variables, since that is when a snapshot is taken. Other local variables are not generally accessible, since by the time a ".return" probe hits, the probed function will have already returned.
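A sketch of $-variable access in both entry and return probes. It assumes the kernel's vfs_read() has parameters named file and count, and that struct file has an f_path member (true of reasonably recent kernels, but check the target's headers). In the return probe, $count reflects the entry-time snapshot described above.

```stap
# Sketch: parameter names and the f_path member are assumptions about
# the target kernel's vfs_read() and struct file.
probe kernel.function("vfs_read")
{
  # "->" walks through pointers and embedded structs alike.
  printf("%s reading %s\n", execname(),
         kernel_string($file->f_path->dentry->d_name->name))
}

probe kernel.function("vfs_read").return
{
  # $count is the entry-time value; $return is the function's result.
  printf("%s asked for %d bytes, got %d\n", execname(), $count, $return)
}

probe timer.s(2) { exit() }
```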
kprobe.function(FUNCTION)
kprobe.function(FUNCTION).return
kprobe.module(NAME).function(FUNCTION)
kprobe.module(NAME).function(FUNCTION).return
kprobe.statement(ADDRESS).absolute
Probes of type kprobe.function are recommended for kernel functions, whereas probes of type kprobe.module are recommended for probing functions of the specified module. If the absolute address of a kernel or module function is known, statement probes can be used.
Note that FUNCTION and module NAME arguments must not contain wildcards, or the probe will not be registered. Also, statement probes are available in guru mode only.
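Because this family does not rely on debuginfo, a kprobe-based script can only use tapset functions and its own globals, not source-level $-variables. The sketch below counts entries and returns of vfs_read, which is assumed to be an exported symbol on the running kernel; note the function name is given literally, without wildcards.

```stap
# Sketch: no debuginfo needed; no $-context variables are used.
global entries, returns

probe kprobe.function("vfs_read")        { entries++ }
probe kprobe.function("vfs_read").return { returns++ }

probe timer.s(5) { exit() }

probe end
{
  printf("vfs_read: %d entries, %d returns\n", entries, returns)
}
```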
User-space probing requires a kernel configured with the utrace extensions; see http://people.redhat.com/roland/utrace/
There are several forms. First, a non-symbolic probe point:
process(PID).statement(ADDRESS).absolute
Second, non-symbolic user-kernel interface events handled by utrace may be probed:
process(PID).begin
process("PATH").begin
process.begin
process(PID).thread.begin
process("PATH").thread.begin
process.thread.begin
process(PID).end
process("PATH").end
process.end
process(PID).thread.end
process("PATH").thread.end
process.thread.end
process(PID).syscall
process("PATH").syscall
process.syscall
process(PID).syscall.return
process("PATH").syscall.return
process.syscall.return
process(PID).insn
process("PATH").insn
process(PID).insn.block
process("PATH").insn.block
A .begin probe is called when a new process described by PID or PATH is created. A .thread.begin probe is called when a new thread described by PID or PATH is created. A .end probe is called when a process described by PID or PATH dies. A .thread.end probe is called when a thread described by PID or PATH dies. A .syscall probe is called when a thread described by PID or PATH makes a system call; the system call number is available in the $syscall context variable, and the first six arguments of the system call are available in the $argN context variables (e.g., $arg1, $arg2, ...). A .syscall.return probe is called when a thread described by PID or PATH returns from a system call; the system call number is available in the $syscall context variable, and the return value of the system call is available in the $return context variable. A .insn probe is called for every single-stepped instruction of the process described by PID or PATH. A .insn.block probe is called for every block-stepped instruction of the process described by PID or PATH.
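A sketch of the utrace-based events: count the system calls made by a target program by number, and report when it exits. /bin/ls is just a convenient example target; the script could be run as stap -c /bin/ls lssys.stp, where lssys.stp is a hypothetical file name.

```stap
# Sketch: per-syscall-number counts for a hypothetical target, /bin/ls.
global counts

probe process("/bin/ls").syscall
{
  counts[$syscall]++
}

probe process("/bin/ls").end
{
  printf("ls (pid %d) exited\n", pid())
}

probe end
{
  foreach (nr in counts- limit 5)
    printf("syscall %d: %d calls\n", nr, counts[nr])
}
```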
Third, symbolic static instrumentation compiled into programs and shared libraries may be probed:
process("PATH").mark("LABEL")
A .mark probe is triggered by a static probe that the application defines with STAP_PROBE1(handle,LABEL,arg1), a macro provided by sdt.h. The handle is an application handle, LABEL corresponds to the .mark argument, and arg1 is the argument. STAP_PROBE1 is used for probes with one argument, STAP_PROBE2 for probes with two arguments, and so on. The arguments of the probe are available in the context variables $arg1, $arg2, ... As an alternative to the STAP_PROBE macros, the dtrace script can be used to create custom macros.
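On the script side, a static marker is probed by its label. In this sketch, both the executable /usr/local/bin/myapp and the label request_start are hypothetical; the application is assumed to have been built against sdt.h with a one-argument STAP_PROBE1 site using that label.

```stap
# Sketch: myapp and request_start are hypothetical names; the app is
# assumed to contain a STAP_PROBE1 site labeled request_start.
probe process("/usr/local/bin/myapp").mark("request_start")
{
  printf("%s[%d] request_start arg1=%d\n", execname(), pid(), $arg1)
}
```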
Finally, full symbolic source-level probes in user-space programs and shared libraries are supported. These are exactly analogous to the symbolic DWARF-based kernel/module probes described above, and expose similar contextual $-variables.
process("PATH").function("NAME")
process("PATH").statement("*@FILE.c:123")
process("PATH").function("*").return
process("PATH").function("myfun").label("foo")
Note that for all process probes, PATH names refer to executables that are searched the same way shells do: relative to the working directory if they contain a "/" character, otherwise in $PATH. If a process probe is specified without a PID or PATH, all user threads are probed. PATH may sometimes name a shared library, in which case all processes that map that shared library may be probed. However, if systemtap was invoked with the -c or -x options, then process probes are restricted to the process hierarchy associated with the target process.
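A sketch of source-level user-space probing, assuming debuginfo for ls is installed and that its main() has the conventional argc parameter. Because "ls" contains no "/", it is looked up in $PATH as described above. Running it as stap -c 'ls /' lsfuncs.stp (a hypothetical file name) restricts probing to that process hierarchy.

```stap
# Sketch: assumes ls has debuginfo and a main(argc, argv).
global calls

probe process("ls").function("main")
{
  printf("main(argc=%d)\n", $argc)
}

# Count every function entry, keyed by the fully expanded probe point.
probe process("ls").function("*")
{
  calls[pp()]++
}

probe end
{
  foreach (p in calls- limit 10)
    printf("%6d %s\n", calls[p], p)
}
```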
These probe points allow procfs "files" in /proc/systemtap/MODNAME to be created, read, and written (MODNAME is the name of the systemtap module). The proc filesystem is a pseudo-filesystem which is used as an interface to kernel data structures. There are four probe point variants supported by the translator:
procfs("PATH").read
procfs("PATH").write
procfs.read
procfs.write
PATH is the file name (relative to /proc/systemtap/MODNAME) to be created. If no PATH is specified (as in the last two variants above), PATH defaults to "command".
When a user reads /proc/systemtap/MODNAME/PATH, the corresponding procfs read probe is triggered. The string data to be read should be assigned to a variable named $value, like this:
procfs("PATH").read { $value = "100\n" }
When a user writes into /proc/systemtap/MODNAME/PATH, the corresponding procfs write probe is triggered. The data the user wrote is available in the string variable named $value, like this:
procfs("PATH").write { printf("user wrote: %s", $value) }
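Read and write probes can be paired to expose a tunable value. In this sketch, the file name threshold is hypothetical, and strtol() is assumed to be available in the installed tapset library. Running it with stap -m tune tune.stp fixes MODNAME to tune, so the value can be inspected with cat /proc/systemtap/tune/threshold and changed with echo 250 > /proc/systemtap/tune/threshold.

```stap
# Sketch: a hypothetical tunable exposed through procfs.
global threshold = 100

probe procfs("threshold").read
{
  # Whatever is assigned to $value is what the reader sees.
  $value = sprintf("%d\n", threshold)
}

probe procfs("threshold").write
{
  # $value holds the text the user wrote, e.g. "250\n".
  threshold = strtol($value, 10)
}
```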
This family of probe points hooks up to static probing markers inserted into the kernel or modules. These markers are special macro calls inserted by kernel developers to make probing faster and more reliable than with DWARF-based probes. Further, DWARF debugging information is not required to probe markers.
Marker probe points begin with kernel. The next part names the marker itself: mark(name). The marker name string, which may contain the usual wildcard characters, is matched against the names given to the marker macros when the kernel and/or module was compiled. Optionally, you can specify format(format). Specifying the marker format string allows differentiation between two markers with the same name but different marker format strings.
The handler associated with a marker-based probe may read the optional parameters specified at the macro call site. These are named $arg1 through $argNN, where NN is the number of parameters supplied by the macro. Number and string parameters are passed in a type-safe manner.
The marker format string associated with a marker is available in $format, and the marker name string is available in $name.
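A sketch that lists markers as they fire. The "*" wildcard matches every marker compiled into the kernel, and "?" keeps the script valid on kernels that have none. Because argument lists differ between markers, only $name and $format are used.

```stap
# Sketch: report every kernel marker hit for five seconds.
probe kernel.mark("*") ?
{
  printf("%s: \"%s\"\n", $name, $format)
}

probe timer.s(5) { exit() }
```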
This family of probe points hooks up to static probing tracepoints inserted into the kernel or modules. As with markers, these tracepoints are special macro calls inserted by kernel developers to make probing faster and more reliable than with DWARF-based probes, and DWARF debugging information is not required to probe tracepoints. Tracepoints have an extra advantage of more strongly-typed parameters than markers.
Tracepoint probes begin with kernel. The next part names the tracepoint itself: trace(name). The tracepoint name string, which may contain the usual wildcard characters, is matched against the names defined by the kernel developers in the tracepoint header files.
The handler associated with a tracepoint-based probe may read the optional parameters specified at the macro call site. These are named according to the declaration by the tracepoint author. For example, the tracepoint probe kernel.trace(sched_switch) provides the parameters $rq, $prev, and $next. If a parameter is a complex type, such as a struct pointer, then a script can access its fields with the same syntax as DWARF $target variables. Tracepoint parameters themselves cannot be modified, but in guru mode a script may modify fields of parameters.
The name of the tracepoint is available in $$name, and a string of name=value pairs for all parameters of the tracepoint is available in $$vars or $$parms.
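A sketch using the sched_switch tracepoint and its $next parameter. It assumes $next points to a task structure with a pid field, as the parameter list above suggests. The tracepoint name from $$name is printed once to illustrate that variable.

```stap
# Sketch: count how often each pid is switched in.
global switches, shown

probe kernel.trace("sched_switch")
{
  if (!shown) {
    println($$name)
    shown = 1
  }
  switches[$next->pid]++
}

probe timer.s(5) { exit() }

probe end
{
  foreach (p in switches- limit 10)
    printf("pid %d switched in %d times\n", p, switches[p])
}
```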
The perfmon family of probe points is used to access the performance monitoring hardware available in modern processors. This family of probe points needs perfmon2 support in the kernel to access the performance monitoring hardware.
Performance monitor hardware probe points begin with perfmon. The next part names the event being counted: counter(event). The event names are processor-implementation specific, with the exception of the generic cycles and instructions events, which are available on all processors. This sets up a counter on the processor to count the number of events occurring on that processor. For details on the performance monitoring events available on a specific processor, use the pfmon command:
pfmon -l
Here are some example probe points, defining the associated events.