Dereferencing NULL Pointer, without a Seg Fault

Dariusz Pasciak

July 03, 2012

We've all heard that Chuck Norris can count to infinity, twice. We've all heard that Chuck Norris can drown a fish. Chuck Norris can divide by 0. And we all remember the time when Chuck Norris hit a 475-foot home run into the upper deck of Yankee Stadium, while bunting.

But have you ever heard anything about Chuck Norris dereferencing a NULL pointer? Well, not really. But that's because it's really not that hard to do.

In C and C++, pointers are very common. For those that are not familiar with pointers, or those that need a memory refresh, a pointer is simply the address of a specific location in memory, at which something interesting lies (say, an object). I like to think of it as a "leash" that you follow to get to the "thing" it's attached to. A quick example:

Cow* cowPointer = new Cow();

cowPointer is just a variable of type Cow pointer, meaning it "points" to a cow. So it's a leash to a Cow object located somewhere in memory, and if you follow that leash, you'll get to the Cow. When you get to the Cow, you can then do whatever interesting things that cow was programmed to do, such as:

cowPointer->milk();

The -> literally means follow the leash called cowPointer, and whatever Cow is at the end of that leash, milk it. Pretty simple.

Since the pointer is just a leash, and a leash is not physically part of a cow, we can always untie that leash from one cow and tie it to a different cow.

Cow* cowPointer = new Cow();
cowPointer = new Cow();

On line 1, we create one cow and tie the leash cowPointer to that cow. On line 2, we create another cow, untie the leash from the first cow, and tie that same leash to the new cow. At this point if we follow the leash:

cowPointer->milk();

we will milk the second cow. Simple.

The technical term for "following a leash" is dereferencing a pointer.

Pointer manipulation

Now remember, the pointer (cowPointer) is just a variable that stores an address. It stores the memory address of a Cow object. If the Cow object happens to be stored in memory at address 0x887788, then the "value" of cowPointer will be 0x887788. If we print it, we will see that address:

std::cout << cowPointer;

prints out: 0x887788.

Since cowPointer is nothing more than a value, we can change it if we'd like. In the example above where we create two cows, we change the value of the pointer when we untie it from the first cow and tie it to the second. So if cow #1 was stored at 0x887788 and cow #2 was stored at 0x887700, the "value" of cowPointer will change from 0x887788 to 0x887700.
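
To make the "value changes" idea concrete, here is a minimal sketch. The empty Cow struct is a stand-in and leashMovesToSecondCow is a hypothetical helper; neither comes from the original program. It remembers where the first cow lives, re-ties the leash, and compares the two addresses. Real addresses vary from run to run, so the sketch only checks that they differ.

```cpp
#include <cassert>

struct Cow {};  // empty stand-in for the article's Cow class (assumption)

// Hypothetical helper: returns true if re-tying the leash changed the
// address stored in cowPointer.
bool leashMovesToSecondCow() {
    Cow* cowPointer = new Cow();   // tie the leash to cow #1
    Cow* firstCow = cowPointer;    // remember cow #1's address

    cowPointer = new Cow();        // untie, then tie the leash to cow #2

    // Both cows are alive right now, so they must sit at different addresses.
    const bool moved = (cowPointer != firstCow);
    assert(moved);

    delete firstCow;               // clean up both cows
    delete cowPointer;
    return moved;
}
```

Calling leashMovesToSecondCow() from main should return true: the leash is the same variable, but the address it holds is not.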

But what if we do this:

cowPointer = reinterpret_cast<Cow*>(0x445544);

Is that legal? With the explicit cast, yes, the compiler will accept it. (Without the cast, C++ refuses to assign a plain integer to a pointer.) cowPointer stores a value, and 0x445544 is a value, so let's change the value of cowPointer to 0x445544. And what do we have now? We have a leash that, when followed, will take us to a Cow object stored at memory address 0x445544. But is that really true? Can we be sure that there really is a Cow object stored at that address? No, we cannot. When we do this:

cowPointer = new Cow();

we know 100% that cowPointer will be tied to a "real" cow located in a valid memory address because the language guarantees that (unless we get exceptions, in which case we have bigger problems - so let's not go there today).

But when we assign an arbitrary address such as 0x445544 to cowPointer, we're literally throwing the leash out into the woods, hoping it ties itself onto something, and that whatever it ties itself to is a Cow object. We're basically gambling. So at that point, if we call

cowPointer->milk();

we will most likely get a core dump and program crash, or some hard-to-debug, undefined behavior later in a completely unrelated part of our program. The reason for that should be obvious, but if it isn't, consider this: if I throw the leash into the woods, and it latches onto a gorilla, would you want to milk that gorilla?

Intriguing Example

So now, consider this:


#include <iostream>

class Cow {
public:
    void milk() {
        std::cout << "I'm a cow being milked." << std::endl;
    }
};

int main() {
    Cow* cowPointer = 0;   // the leash is tied to nothing
    cowPointer->milk();
    return 0;
}

What will happen when you run this program? I was working on an example similar to this with a guest of 8th Light (Meesu) on Friday, and tried to show her a seg fault by dereferencing 0 (except instead of milkable cows we had runnable cars). I compiled, ran (expecting to see a seg fault), but instead I saw this:

"I'm a cow being milked."

Yes, it actually did what it was not supposed to do. I don't create any cows, I dereference and operate on address 0 (which is a big no-no), and yet the program runs to completion, milks some cow somewhere, and everybody's happy. How does that not crash?

Once the awe and surprise had passed, Eric Smith came onto the battlefield and quickly pointed out that the milk() function doesn't use any member variables. If I create a million instances of the class Cow, they will all be identical. Why is that? Because they have no data.

Static functions vs. Member functions

Really briefly now: classes define functions. There can be static functions or member functions.

class MyClass {
public:
    static void staticFunction() {
        // do something
    }

    void memberFunction() {
        // do something else
    }
};

You can call them like this:

MyClass::staticFunction(); // calls the static function

MyClass* myInstance = new MyClass();
myInstance->memberFunction(); // call member function

The main difference here is in the way that they are used. You can call static functions by simply referencing the class itself. But to call member functions, you need to create an instance of that class (e.g. using new) and then call the member function "on" that instance.

So on the surface, they're different. But under the hood, in memory and compiled code, there is very little difference. The class MyClass is compiled into groups of instructions that we call "functions". Functions are nothing more than logical operations on some data (which may vary each time the function is called). But think about it: if memberFunction performs some logic related to a specific instance of a class, does that "logic" vary from instance to instance? No, it doesn't. The data varies, yes, but not the logic. And that's where instances come into play. An instance of a class really only stores data specific to that instance. The logical operations (a.k.a. functions) are the same for all instances (they are literally one set of instructions, in one location in memory, shared by all instances).
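
One way to see that instances store only data is to compare object sizes. The sketch below is an illustration under assumptions: CowDataOnly, CowWithFunctions, and functionsAddNoSize are hypothetical names, and the equal sizes are what mainstream compilers produce for non-virtual member functions rather than something the language standard spells out.

```cpp
#include <cassert>

// Two hypothetical cows with identical data; only one has member functions.
struct CowDataOnly {
    int litersOfMilk;
};

struct CowWithFunctions {
    int litersOfMilk;
    void milk() { --litersOfMilk; }   // code lives once, outside any instance
    void feed() { ++litersOfMilk; }
};

// Returns true if adding (non-virtual) member functions left the object
// size unchanged.
bool functionsAddNoSize() {
    const bool same = sizeof(CowDataOnly) == sizeof(CowWithFunctions);
    assert(same);  // expected on mainstream compilers (assumption)
    return same;
}
```

If the functions were copied into every instance, CowWithFunctions would have to be bigger. It isn't, because each instance carries only its litersOfMilk.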

Now if you don't already know where I'm going with this, you may be asking yourself: if all instances of an object use the same instructions located in the same physical address space in memory, how come when I call the same instruction on different objects, different things happen? One slight difference between member and static functions, is that member functions automatically get passed one extra, hidden parameter - the "this" pointer. You don't see it in the function's signature, but the compiler allows you to use "this" inside each member function. And what is "this"? "this" is a pointer to the instance on which you called the member function.
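
A small sketch can make the hidden parameter visible. Here whoAmI and thisMatchesCaller are hypothetical names added for illustration; the member function simply hands back its own "this" so it can be compared with the address of the object it was called on.

```cpp
#include <cassert>

class Cow {
public:
    // Hypothetical member (not in the article): returns the hidden 'this'.
    const Cow* whoAmI() const { return this; }
};

// Returns true if each cow's 'this' matches the address of the cow the
// function was called on, and the two cows report different addresses.
bool thisMatchesCaller() {
    Cow betty;
    Cow sue;
    const bool ok = betty.whoAmI() == &betty
                 && sue.whoAmI() == &sue
                 && betty.whoAmI() != sue.whoAmI();
    assert(ok);
    return ok;
}
```

The same compiled whoAmI runs for both cows; only the pointer it receives differs.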

We're almost done so bear with me. Consider this snippet:

#include <iostream>
#include <string>

class Cow {
public:
    std::string name;

    void printName() {
        std::cout << "My name is " << name << std::endl;
    }
};

int main() {
    Cow* cow1 = new Cow();
    cow1->name = "Betty";

    Cow* cow2 = new Cow();
    cow2->name = "Sue";

    cow1->printName();
    cow2->printName();

    return 0;
}

There will only be one definition of printName() in memory, and both cow1 and cow2 will use that same definition. The only difference is that the function will be passed an invisible 'this' pointer, whose value will be cow1 when you do cow1->printName() and cow2 when you do cow2->printName(). When the function tries to print the name of that cow, it will automatically follow the 'this' pointer and access the data stored in "name" (which will be different for each cow).
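
The invisible parameter can be written out by hand. The following is a rough sketch of the idea, not what any particular compiler literally emits: Cow_describe is a hypothetical free function whose self argument plays the role of "this", and describe returns a string instead of printing so the two versions can be compared.

```cpp
#include <cassert>
#include <string>

class Cow {
public:
    std::string name;

    // Ordinary member function: 'this' is passed implicitly.
    std::string describe() const { return "My name is " + name; }
};

// Hypothetical hand-written equivalent: 'self' plays the role of 'this'.
std::string Cow_describe(const Cow* self) {
    return "My name is " + self->name;
}

// Returns true if both spellings agree for two differently named cows.
bool bothSpellingsAgree() {
    Cow cow1;
    cow1.name = "Betty";
    Cow cow2;
    cow2.name = "Sue";

    const bool ok = cow1.describe() == Cow_describe(&cow1)
                 && cow2.describe() == Cow_describe(&cow2)
                 && Cow_describe(&cow1) != Cow_describe(&cow2);
    assert(ok);
    return ok;
}
```

One function body, two different results, and the only thing that changed between the calls is the pointer that was passed in.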

Finally, going back to this example:

#include <iostream>

class Cow {
public:
    void milk() {
        std::cout << "I'm a cow being milked." << std::endl;
    }
};

int main() {
    Cow* cowPointer = 0;
    cowPointer->milk();
    return 0;
}

Why doesn't the program crash? We're clearly following a pointer that takes us to memory located at 0 - why doesn't the OS kill our process? Well, I didn't quite tell you the whole truth before. We don't always follow the leash when we do cowPointer->milk(). Remember, behind the scenes what's happening is that we're calling the function milk() of class Cow, and magically passing it the pointer cowPointer. milk is a valid function in memory, because it was compiled and stored alongside the rest of the program's code. The function executes (it prints some text to cout), but it never really uses the this pointer. There are no instance variables being used in that function, so it never reads or writes memory at address 0. And since we're not touching that memory, the OS allows us to continue as if we never did anything wrong. Strictly speaking, the C++ standard still calls this undefined behavior, so what I'm describing is what typical compilers happen to generate, not something the language promises.
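
The same reasoning can be shown in a form that is well-defined C++. In this sketch, Cow_milk is a hypothetical free function standing in for Cow::milk(), with self in place of "this", and it returns its text instead of printing it. Because the pointer is never followed, passing a null pointer is harmless here, which mirrors why the member-function version usually gets away with it.

```cpp
#include <cassert>
#include <string>

struct Cow {};  // no data members, like the article's Cow

// Hypothetical free-function view of Cow::milk(); 'self' stands in for 'this'.
std::string Cow_milk(const Cow* self) {
    (void)self;  // never dereferenced, so nothing at address 0 is touched
    return "I'm a cow being milked.";
}

// Returns true if milking through a null leash still produces the message.
bool nullLeashStillMilks() {
    const Cow* noCow = nullptr;
    const bool ok = Cow_milk(noCow) == "I'm a cow being milked.";
    assert(ok);
    return ok;
}
```

If Cow_milk read a data member through self, the null pointer would actually be followed, and that is the point where a crash would be expected.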

And this is how we can pretend to be tough like Chuck Norris, dereference NULL (0) pointers, and still get away with it.

Now, by no means am I advocating that you write programs like this. I just wanted to shed some light on how C++ compiles classes behind the scenes. If you're explicitly setting the values of your pointers to some predefined address space, you're playing Russian Roulette with your program. Experiment and try it yourself!