A few DTrace gotchas and workarounds on OS X

I've run into lots of folks who have heard about the cool stuff DTrace can do, but for one reason or another have run into issues and given up.

In this post, we'll take a look at some of the top gotchas I've seen people run into, and what I've done to mitigate them in my workflow.

For our purposes here, I'll assume you're already motivated to check out DTrace, but if you're looking for motivation, check out how you can use DTrace to figure out why things are slow.

I can't even... practically nothing works

When you first try running DTrace, you might run into a situation where a bunch of probes appear to be missing (vminfo, fbt, pid, etc.):

colin:/tmp/ $ sudo dtrace -ln <span class="s1">'vminfo:::'</span>
 ID PROVIDER MODULE FUNCTION NAME
dtrace: failed to match vminfo:::: No probe matches description

El Capitan introduced a security mechanism called System Integrity Protection to help ensure that no malicious parties can fiddle with the operating system. Since DTrace does a bunch of fiddling, this can be pretty limiting. Check out this post on DTrace vs. SIP for the workaround I use. It requires booting into Recovery Mode, which is admittedly a huge pain and opens back up the security hole that SIP closed for us, but this will get most basic DTrace functionality working.

colin:/tmp/ $ csrutil status
System Integrity Protection status: disabled.
colin:/tmp/ $ sudo dtrace -ln <span class="s1">'vminfo:::'</span> <span class="p">|</span> wc -l
 <span class="m">50</span>

So many virtual memory probes!

My code doesn't execute!

If you love the command line like me (and I expect you do since you're reading a blog about freakin' DTrace), you might figure it'd be cool to try something like this:

colin:/tmp/ $ sudo dtrace -n <span class="s1">'syscall:::entry /pid==$target/ {}'</span> -c <span class="s1">'/Users/colin/.rbenv/versions/2.3.1/bin/ruby -e "puts 41+1"'</span>
 <span class="o">[</span>...lots and lots of output...<span class="o">]</span>

The -c flag in DTrace allows you to trace short-lived processes, and also to get ahold of the PID of that process by using the $target variable. This is useful for both dynamic tracing (via the PID provider) and static tracing where the provider is based on the PID, as with Ruby's probes.

So maybe you search through the syscalls hoping to see some console-writing stuff, and maybe you even end up seeing things that look reasonable. But at some point (maybe after using a script that side effects to a file like I did), you'll realize that while the Ruby runtime is definitely starting up, the script you pass on the command line is never executing, due to an error.

Check it out:

colin:/tmp/ $ sudo dtrace -n <span class="s1">'syscall:::entry /pid==$target/ {}'</span> -c <span class="s1">'/Users/colin/.rbenv/versions/2.3.1/bin/ruby -e "puts 41+1"'</span> <span class="p">|</span> head -n5
dtrace: description <span class="s1">'syscall:::entry'</span> matched <span class="m">500</span> probes
-e:1: unterminated string meets end of file
dtrace: pid <span class="m">68623</span> has exited
CPU ID FUNCTION:NAME
 <span class="m">0</span> <span class="m">198</span> getuid:entry
 <span class="m">0</span> <span class="m">200</span> geteuid:entry
 <span class="m">0</span> <span class="m">198</span> getuid:entry
 <span class="m">0</span> <span class="m">554</span> sysctl:entry

Ouch, a Ruby error prevented our perfect puts 41+1 script from running! If you didn't see the error, take another look—see that -e:1: bit? And also notice that the text "41" never shows up in our output (same deal if we remove the pipe to head). How could this happen to our simple script? One way to get that same error message is as follows:

colin:/tmp/ $ /Users/colin/.rbenv/versions/2.3.1/bin/ruby -e <span class="se">\"</span>puts <span class="m">41</span>+1<span class="se">\"</span>
-e:1: unterminated string meets end of file

I tried about a dozen different quoting and escaping configurations to the DTrace command above, and my best guess based on the symptoms is that DTrace is escaping any quotes in the -c argument. As far as I can tell there isn't a way to avoid this, but my Bash isn't that advanced and I'd be happy to learn of one. The way things stand, we'd need to escape the space character in order to pass what would ordinarily be a perfectly nice quoted string inside the quoted -c argument:

colin:/tmp/ $ sudo dtrace -n <span class="s1">'syscall:::entry'</span> -c <span class="s2">"/Users/colin/.rbenv/versions/2.3.1/bin/ruby -e puts\ 41+1"</span> <span class="p">|</span> less

I suspect this is a common shell-scripting thing that I haven't yet fully understood, but this is where I am right now. The much easier workaround here is to dump the script to a file and call it:

colin:/tmp/ $ <span class="nb">echo</span> <span class="s1">'puts 41+1'</span> > math.rb
colin:/tmp/ $ sudo dtrace -n <span class="s1">'syscall:::entry'</span> -c <span class="s2">"/Users/colin/.rbenv/versions/2.3.1/bin/ruby ./math.rb"</span> <span class="p">|</span> head -n5
dtrace: description <span class="s1">'syscall:::entry'</span> matched <span class="m">500</span> probes
<span class="m">42</span>
dtrace: pid <span class="m">68879</span> has exited
CPU ID FUNCTION:NAME
 <span class="m">0</span> <span class="m">258</span> ioctl:entry
 <span class="m">0</span> <span class="m">258</span> ioctl:entry
 <span class="m">0</span> <span class="m">554</span> sysctl:entry

No quotes, no escaping, just plain old files: this is much easier. Quotes and spaces are kryptonite for shell superheroes.

SIP strikes back

You might have noticed that in the previous section, I called Ruby in a bit of a strange way, using the full path to the ruby executable. That wasn't my first attempt.

Try DTrace'ing in the "normal" way, and you get:

colin:/tmp/ $ sudo dtrace -n <span class="s1">'syscall:::entry /pid==$target/ {}'</span> -c <span class="s1">'ruby -v'</span>
dtrace: failed to execute echo: dtrace cannot control executables signed with restricted entitlements

Google the error message (or recall it from the blog post I linked in the first section), and you'll see that SIP is back, even though we disabled it!

Didn't we?

colin:/tmp/ $ csrutil status
System Integrity Protection status: disabled.

Yes, we totally did. How can this be?!

Welllllll...

colin:/tmp/ $ which ruby
/Users/colin/.rbenv/shims/ruby
colin:/tmp/ $ head -n1 <span class="k">$(</span>which ruby<span class="k">)</span>
<span class="c1">#!/usr/bin/env bash</span>

Even though the ruby command itself isn't in a directory restricted by SIP (which apparently is not totally disabled, despite csrutil status telling us that it is), it ends up pretty quickly invoking a program (/usr/bin/env) that does live in a restricted directory. You can see for yourself by passing the -O (that's the capital letter o, not the number zero) flag to ls:

colin:/tmp/ $ ls -lO /usr
total <span class="m">16</span>
lrwxr-xr-x <span class="m">1</span> root wheel - <span class="m">8</span> Dec <span class="m">23</span> <span class="m">2015</span> X11 -> /opt/X11
lrwxr-xr-x <span class="m">1</span> root wheel restricted <span class="m">8</span> Dec <span class="m">23</span> <span class="m">2015</span> X11R6 -> /opt/X11
drwxr-xr-x <span class="m">3</span> root wheel restricted <span class="m">102</span> Nov <span class="m">13</span> <span class="m">2015</span> adic
drwxr-xr-x <span class="m">1057</span> root wheel restricted <span class="m">35938</span> Oct <span class="m">25</span> <span class="m">12</span>:24 bin
drwxr-xr-x <span class="m">258</span> root wheel restricted <span class="m">8772</span> May <span class="m">12</span> <span class="m">2016</span> include
drwxr-xr-x <span class="m">289</span> root wheel restricted <span class="m">9826</span> Oct <span class="m">25</span> <span class="m">12</span>:26 lib
drwxr-xr-x <span class="m">186</span> root wheel restricted <span class="m">6324</span> Oct <span class="m">25</span> <span class="m">12</span>:27 libexec
drwxr-xr-x <span class="m">25</span> colin admin - <span class="m">850</span> Oct <span class="m">16</span> <span class="m">09</span>:53 <span class="nb">local</span>
drwxr-xr-x <span class="m">243</span> root wheel restricted <span class="m">8262</span> Oct <span class="m">25</span> <span class="m">12</span>:24 sbin
drwxr-xr-x <span class="m">46</span> root wheel restricted <span class="m">1564</span> Jan <span class="m">11</span> <span class="m">2016</span> share
drwxr-xr-x <span class="m">4</span> root wheel restricted <span class="m">136</span> Dec <span class="m">3</span> <span class="m">2015</span> standalone

This affects lots and lots of programs—and even /usr/bin/env alone would mean trouble.

As one step towards a solution to the rbenv-using case above, you might manually copy the file into a non-restricted location (cp /usr/bin/env /usr/local/bin/env) and edit ~/.rbenv/shims/ruby to point at /usr/local/bin/env instead of /usr/bin/env. This turns out to get a step further, but doesn't solve the issue, because at the end of ~/.rbenv/shims/ruby, it does this:

<span class="n">export</span> <span class="no">RBENV_ROOT</span><span class="o">=</span><span class="s2">"/Users/colin/.rbenv"</span>
<span class="nb">exec</span> <span class="s2">"/usr/local/Cellar/rbenv/0.4.0/libexec/rbenv"</span> <span class="nb">exec</span> <span class="s2">"$program"</span> <span class="s2">"$@"</span>

And this rbenv script has the same /usr/bin/env dependency in its shebang line:

colin:/tmp/ <span class="o">(</span>master✗<span class="o">)</span> $ head /usr/local/Cellar/rbenv/0.4.0/libexec/rbenv
<span class="c1">#!/usr/bin/env bash</span>
<span class="nb">set</span> -e
<span class="o">[</span> -n <span class="s2">"</span><span class="nv">$RBENV_DEBUG</span><span class="s2">"</span> <span class="o">]</span> <span class="o">&&</span> <span class="nb">set</span> -x

We could follow the chain around our system, depending on this /usr/local/bin/env copy we made, but this doesn't seem like a tenable solution. Changing shebang lines across our systems to not depend on /usr/bin/env is going to mean getting out of sync with upstream versions of tons of commands, and the best answer I've got for now is to bypass these shell scripts completely. This works great for the rbenv / Ruby case, as you can see above when we traced /Users/colin/.rbenv/versions/2.3.1/bin/ruby. But it's definitely limiting to have such a widely used scripting idiom block us from tracing arbitrary code. I hope the folks at Apple one day provide a way to un-restrict these directories for us observability-loving devs. But until then, copying files around or otherwise bypassing restricted directories like /usr/bin seems to be the best path forward.

This last part might have sounded a bit defeatist, but the situation probably isn't as bad as it seems. If you fire up /usr/bin/irb in one terminal (or tmux pane, or whatever), find its PID, and pass a -p to DTrace, you can trace that executable just fine:

colin:/tmp/ $ sudo dtrace -n <span class="s1">'syscall:::entry /pid==$target/ {}'</span> -p <span class="k">$(</span>pgrep irb<span class="k">)</span>
dtrace: description <span class="s1">'syscall:::entry '</span> matched <span class="m">500</span> probes
CPU ID FUNCTION:NAME
 <span class="m">0</span> <span class="m">944</span> write_nocancel:entry
 <span class="m">0</span> <span class="m">246</span> sigprocmask:entry
 <span class="m">0</span> <span class="m">256</span> sigaltstack:entry
 <span class="m">0</span> <span class="m">156</span> read:entry

So as far as I know, this limitation is really mostly about starting programs from DTrace. There may be additional related limitations as well, but this is what I've found so far.

Summary

So there you have the top few gotchas I've seen in getting going with DTrace. There's way more to learn, because the DTrace rabbit hole goes deep: we haven't even mentioned compiler flags, static and inline C functions, or symbol resolution for userland stack traces.

But we did see a few key issues and how to work around them:

SIP prevents lots of important DTrace functionality, so serious debugging sessions will typically require disabling it.
Starting commands from DTrace has quoting & escaping caveats, so dumping nontrivial commands into a file helps out.
Executing anything underneath /usr/bin won't be traceable when run via DTrace's -c flag, whether or not you've disabled SIP.

I know DTrace isn't the easiest tool to get started with, and that these Mac-specific issues make the ramp-up process trickier than we'd all hope. But hopefully this helps some budding DTracers to get over these initial hurdles and gain a deep understanding of your systems!

A Few DTrace Gotchas and Workarounds on OS X

I can't even... practically nothing works

My code doesn't execute!

SIP strikes back

Summary

Colin Jones

Empowering Game Developers with Faster Builds and Better Tools

Essential Strategies for Upgrading Microservices Without Downtime

Getting Started with TDD: A Practical Guide to Beginning a Lasting Practice