A few DTrace gotchas and workarounds on OS X

A few DTrace gotchas and workarounds on OS X

Colin Jones
Colin Jones

February 02, 2017

I've run into lots of folks who have heard about the cool stuff DTrace can do, but for one reason or another have run into issues and given up.

In this post, we'll take a look at some of the top gotchas I've seen people run into, and what I've done to mitigate them in my workflow.

For our purposes here, I'll assume you're already motivated to check out DTrace, but if you're looking for motivation, check out how you can use DTrace to figure out why things are slow.

I can't even... practically nothing works

When you first try running DTrace, you might run into a situation where a bunch of probes appear to be missing (vminfo, fbt, pid, etc.):

colin:/tmp/ $ sudo dtrace -ln 'vminfo:::'
 ID PROVIDER MODULE FUNCTION NAME
dtrace: failed to match vminfo:::: No probe matches description

El Capitan introduced a security mechanism called System Integrity Protection to help ensure that no malicious parties can fiddle with the operating system. Since DTrace does a bunch of fiddling, this can be pretty limiting. Check out this post on DTrace vs. SIP for the workaround I use. It requires booting into Recovery Mode, which is admittedly a huge pain and opens back up the security hole that SIP closed for us, but this will get most basic DTrace functionality working.

colin:/tmp/ $ csrutil status
System Integrity Protection status: disabled.
colin:/tmp/ $ sudo dtrace -ln 'vminfo:::' | wc -l
 50

So many virtual memory probes!

My code doesn't execute!

If you love the command line like me (and I expect you do since you're reading a blog about freakin' DTrace), you might figure it'd be cool to try something like this:

colin:/tmp/ $ sudo dtrace -n 'syscall:::entry /pid==$target/ {}' -c '/Users/colin/.rbenv/versions/2.3.1/bin/ruby -e "puts 41+1"'
 [...lots and lots of output...]

The -c flag in DTrace allows you to trace short-lived processes, and also to get ahold of the PID of that process by using the $target variable. This is useful for both dynamic tracing (via the PID provider) and static tracing where the provider is based on the PID, as with Ruby's probes.

So maybe you search through the syscalls hoping to see some console-writing stuff, and maybe you even end up seeing things that look reasonable. But at some point (maybe after using a script that side effects to a file like I did), you'll realize that while the Ruby runtime is definitely starting up, the script you pass on the command line is never executing, due to an error.

Check it out:

colin:/tmp/ $ sudo dtrace -n 'syscall:::entry /pid==$target/ {}' -c '/Users/colin/.rbenv/versions/2.3.1/bin/ruby -e "puts 41+1"' | head -n5
dtrace: description 'syscall:::entry' matched 500 probes
-e:1: unterminated string meets end of file
dtrace: pid 68623 has exited
CPU ID FUNCTION:NAME
 0 198 getuid:entry
 0 200 geteuid:entry
 0 198 getuid:entry
 0 554 sysctl:entry

Ouch, a Ruby error prevented our perfect puts 41+1 script from running! If you didn't see the error, take another look—see that -e:1: bit? And also notice that the text "41" never shows up in our output (same deal if we remove the pipe to head). How could this happen to our simple script? One way to get that same error message is as follows:

colin:/tmp/ $ /Users/colin/.rbenv/versions/2.3.1/bin/ruby -e \"puts 41+1\"
-e:1: unterminated string meets end of file

I tried about a dozen different quoting and escaping configurations to the DTrace command above, and my best guess based on the symptoms is that DTrace is escaping any quotes in the -c argument. As far as I can tell there isn't a way to avoid this, but my Bash isn't that advanced and I'd be happy to learn of one. The way things stand, we'd need to escape the space character in order to pass what would ordinarily be a perfectly nice quoted string inside the quoted -c argument:

colin:/tmp/ $ sudo dtrace -n 'syscall:::entry' -c "/Users/colin/.rbenv/versions/2.3.1/bin/ruby -e puts\ 41+1" | less

I suspect this is a common shell-scripting thing that I haven't yet fully understood, but this is where I am right now. The much easier workaround here is to dump the script to a file and call it:

colin:/tmp/ $ echo 'puts 41+1' > math.rb
colin:/tmp/ $ sudo dtrace -n 'syscall:::entry' -c "/Users/colin/.rbenv/versions/2.3.1/bin/ruby ./math.rb" | head -n5
dtrace: description 'syscall:::entry' matched 500 probes
42
dtrace: pid 68879 has exited
CPU ID FUNCTION:NAME
 0 258 ioctl:entry
 0 258 ioctl:entry
 0 554 sysctl:entry

No quotes, no escaping, just plain old files: this is much easier. Quotes and spaces are kryptonite for shell superheroes.

SIP strikes back

You might have noticed that in the previous section, I called Ruby in a bit of a strange way, using the full path to the ruby executable. That wasn't my first attempt.

Try DTrace'ing in the "normal" way, and you get:

colin:/tmp/ $ sudo dtrace -n 'syscall:::entry /pid==$target/ {}' -c 'ruby -v'
dtrace: failed to execute echo: dtrace cannot control executables signed with restricted entitlements

Google the error message (or recall it from the blog post I linked in the first section), and you'll see that SIP is back, even though we disabled it!

Didn't we?

colin:/tmp/ $ csrutil status
System Integrity Protection status: disabled.

Yes, we totally did. How can this be?!

Welllllll...

colin:/tmp/ $ which ruby
/Users/colin/.rbenv/shims/ruby
colin:/tmp/ $ head -n1 $(which ruby)
#!/usr/bin/env bash

Even though the ruby command itself isn't in a directory restricted by SIP (which apparently is not totally disabled, despite csrutil status telling us that it is), it ends up pretty quickly invoking a program (/usr/bin/env) that does live in a restricted directory. You can see for yourself by passing the -O (that's the capital letter o, not the number zero) flag to ls:

colin:/tmp/ $ ls -lO /usr
total 16
lrwxr-xr-x 1 root wheel - 8 Dec 23 2015 X11 -> /opt/X11
lrwxr-xr-x 1 root wheel restricted 8 Dec 23 2015 X11R6 -> /opt/X11
drwxr-xr-x 3 root wheel restricted 102 Nov 13 2015 adic
drwxr-xr-x 1057 root wheel restricted 35938 Oct 25 12:24 bin
drwxr-xr-x 258 root wheel restricted 8772 May 12 2016 include
drwxr-xr-x 289 root wheel restricted 9826 Oct 25 12:26 lib
drwxr-xr-x 186 root wheel restricted 6324 Oct 25 12:27 libexec
drwxr-xr-x 25 colin admin - 850 Oct 16 09:53 local
drwxr-xr-x 243 root wheel restricted 8262 Oct 25 12:24 sbin
drwxr-xr-x 46 root wheel restricted 1564 Jan 11 2016 share
drwxr-xr-x 4 root wheel restricted 136 Dec 3 2015 standalone

This affects lots and lots of programs—and even /usr/bin/env alone would mean trouble.

As one step towards a solution to the rbenv-using case above, you might manually copy the file into a non-restricted location (cp /usr/bin/env /usr/local/bin/env) and edit ~/.rbenv/shims/ruby to point at /usr/local/bin/env instead of /usr/bin/env. This turns out to get a step further, but doesn't solve the issue, because at the end of ~/.rbenv/shims/ruby, it does this:

export RBENV_ROOT="/Users/colin/.rbenv"
exec "/usr/local/Cellar/rbenv/0.4.0/libexec/rbenv" exec "$program" "$@"

And this rbenv script has the same /usr/bin/env dependency in its shebang line:

colin:/tmp/ (master✗) $ head /usr/local/Cellar/rbenv/0.4.0/libexec/rbenv
#!/usr/bin/env bash
set -e
[ -n "$RBENV_DEBUG" ] && set -x

We could follow the chain around our system, depending on this /usr/local/bin/env copy we made, but this doesn't seem like a tenable solution. Changing shebang lines across our systems to not depend on /usr/bin/env is going to mean getting out of sync with upstream versions of tons of commands, and the best answer I've got for now is to bypass these shell scripts completely. This works great for the rbenv / Ruby case, as you can see above when we traced /Users/colin/.rbenv/versions/2.3.1/bin/ruby. But it's definitely limiting to have such a widely used scripting idiom block us from tracing arbitrary code. I hope the folks at Apple one day provide a way to un-restrict these directories for us observability-loving devs. But until then, copying files around or otherwise bypassing restricted directories like /usr/bin seems to be the best path forward.

This last part might have sounded a bit defeatist, but the situation probably isn't as bad as it seems. If you fire up /usr/bin/irb in one terminal (or tmux pane, or whatever), find its PID, and pass a -p to DTrace, you can trace that executable just fine:

colin:/tmp/ $ sudo dtrace -n 'syscall:::entry /pid==$target/ {}' -p $(pgrep irb)
dtrace: description 'syscall:::entry ' matched 500 probes
CPU ID FUNCTION:NAME
 0 944 write_nocancel:entry
 0 246 sigprocmask:entry
 0 256 sigaltstack:entry
 0 156 read:entry

So as far as I know, this limitation is really mostly about starting programs from DTrace. There may be additional related limitations as well, but this is what I've found so far.

Summary

So there you have the top few gotchas I've seen in getting going with DTrace. There's way more to learn, because the DTrace rabbit hole goes deep: we haven't even mentioned compiler flags, static and inline C functions, or symbol resolution for userland stack traces.

But we did see a few key issues and how to work around them:

  • SIP prevents lots of important DTrace functionality, so serious debugging sessions will typically require disabling it.
  • Starting commands from DTrace has quoting & escaping caveats, so dumping nontrivial commands into a file helps out.
  • Executing anything underneath /usr/bin won't be traceable when run via DTrace's -c flag, whether or not you've disabled SIP.

I know DTrace isn't the easiest tool to get started with, and that these Mac-specific issues make the ramp-up process trickier than we'd all hope. But hopefully this helps some budding DTracers to get over these initial hurdles and gain a deep understanding of your systems!