DTrace
Tuesday, February 12th, 2008
I attended a Sun presentation on DTrace today. Here are links to some good resources:
- OpenSolaris DTrace Community
- Solaris Internals DTrace Topics
- DTrace Toolkit
- Solaris Internals DTrace with Java
- OpenSolaris DTrace Discussion Forum
Some anecdotes that were mentioned:
Buffer sizes
When DTrace was fired up on some monster-sized boxes (like a 144-core E25K), it took a while for DTrace to kick in. The issue turned out to be that, by default, DTrace allocates 4MB of buffer memory for each core. On a 144-core machine, that means roughly 576MB (144 x 4MB) has to be allocated before tracing starts. The presenter said he lowered the buffer size to 256k or so per CPU, which made DTrace start up much faster.
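The arithmetic behind this anecdote can be sketched in a few lines of shell. This is only an illustration using the figures quoted in the talk (4MB default, 256k tuned, 144 cores); the variable names are mine. The dtrace command in the trailing comment uses the standard -b option to set the per-CPU principal buffer size, and is shown as a comment rather than run.

```shell
#!/bin/sh
# Rough per-CPU buffer arithmetic, using the figures quoted in the talk.
cpus=144            # cores on the E25K from the anecdote
default_kb=4096     # 4MB per CPU, the default cited by the presenter
tuned_kb=256        # the lowered size the presenter settled on

# Total buffer memory DTrace has to allocate across all CPUs, in MB.
default_total_mb=$(( cpus * default_kb / 1024 ))
tuned_total_mb=$(( cpus * tuned_kb / 1024 ))

echo "default: ${default_total_mb} MB total"
echo "tuned:   ${tuned_total_mb} MB total"

# On a real system the smaller buffer would be requested with -b, e.g.:
#   dtrace -b 256k -n 'syscall:::entry { @[execname] = count(); }'
```

With these numbers the default works out to 576 MB in total, against 36 MB with the smaller buffers.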
DTrace aborts on Opteron machines
If you are using dual-core Sun X4100 M2 or X4200 M2 servers with Solaris 10u3 or earlier, DTrace will almost certainly abort almost instantly when you try to run it. You are hitting:
bug id 6507659 tsc differences between CPU’s give dtrace_gethrtime() serious problems
The workaround (-w option) is mentioned in the ticket, but that also disables DTrace safety features, so use caution. If you are on Solaris 10u4 (aka 8/07) or patched to 120012-14, you should be fine and won’t need the workaround.
Which probes are safe for production?
Probe effect is almost entirely a function of how often the enabled probes fire. The presenter mentioned that using the syscall, io, and profile providers is almost always fine in production. He did mention that he would not recommend using the pid provider on a busy process in production in most cases, although the only effect will be to slow down the specific process being traced.
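To make the distinction concrete, here is a rough sketch of what one-liners for each of these providers might look like. The shell function names are hypothetical wrappers of my own, not something from the talk; the D one-liners inside them are common idioms. Each one runs until you hit Ctrl-C and then prints its aggregation, and each needs root (or the appropriate DTrace privileges).

```shell
#!/bin/sh
# Hypothetical wrapper functions around common DTrace one-liners.
# Defining them is harmless; nothing is traced until one is called.

# syscall provider: count system calls by process name.
syscalls_by_process() {
    dtrace -n 'syscall:::entry { @[execname] = count(); }'
}

# io provider: distribution of I/O request sizes by process name.
io_sizes_by_process() {
    dtrace -n 'io:::start { @[execname] = quantize(args[0]->b_bcount); }'
}

# profile provider: sample at 997 Hz and count which process was on CPU.
oncpu_by_process() {
    dtrace -n 'profile-997 { @[execname] = count(); }'
}

# pid provider: instruments every function entry in one process, so the
# probes fire far more often. This is the kind of tracing the presenter
# advised against on a busy production process. Takes a PID as $1.
function_calls_in_pid() {
    dtrace -n 'pid$target:::entry { @[probefunc] = count(); }' -p "$1"
}
```

The first three fire at a rate bounded by system call volume, I/O volume, or a fixed sampling frequency. The last one fires on every function call in the target, which is why its probe effect can be so much larger.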
DTrace Toolkit
I really want to emphasize how good I think this collection of scripts and examples is. I consider it to be almost like the SE Toolkit of the DTrace world. Even if you have no desire to wade through the DTrace docs, or don’t consider yourself much of a scripter, take a look at the DTrace Toolkit and try running some of the examples.