About this pointers and vtables

We have all heard of the infamous C++ this pointer, and we all know that it’s basically some magical, hidden pointer that is passed along whenever an instance of a C++ class is accessed. Somewhere along the way we may also have heard of the virtual function table (vtable). Still, it is interesting to explore what exactly these concepts are, in code, and to peer under the hood to see just how magical they really are.

Let’s start, in C, with a simple program that has a struct in it (we’ll use typedefs to make things syntactically simpler):

#include <stdlib.h>

typedef struct {
  int count;
  char buf[100];
} MyStruct;

int main(void) {
  MyStruct *m = malloc(sizeof(MyStruct));
  if (m == NULL) return 1;
  // Initialize (mimic constructor)
  m->count = 1;
  m->buf[0] = '\0';
  free(m);                // Clean up (mimic destructor)
  return 0;
}

The one fundamental thing that differentiates a class from a plain C struct (in most conventions) is the ability to have member functions (sometimes called class functions). These functions are pieces of code that belong to the class and are called through an instance of the class (we’re not talking about public or private here; this refers to how those functions are called).

Let’s mimic a class’s member functions using our struct:

#include <stdlib.h>

typedef int pMyFuncA (char *);
typedef int pMyFuncB (char *);

typedef struct {
  pMyFuncA *MyFuncA;      // Pointer to our functions
  pMyFuncB *MyFuncB;      // (mimic class member functions)
  int count;
  char buf[100];
} MyStruct;

int MyFuncA(char *s) {    // Function definition (code of function)
  return 0;
}

int MyFuncB(char *s) {    // Function definition (code of function)
  return 1;
}

int main(void) {
  MyStruct *m = malloc(sizeof(MyStruct));
  if (m == NULL) return 1;
  // Initialize (mimic constructor)
  m->count = 1;
  m->buf[0] = '\0';
  m->MyFuncA = MyFuncA;
  m->MyFuncB = MyFuncB;
  free(m);                // Clean up (mimic destructor)
  return 0;
}

Essentially, we added pointers to functions in the struct, defined those functions, and, in the initialization code, made each pointer point to its function. With that, if we want to call one of the functions, the code would simply look like the following:

int retval;
retval = m->MyFuncA("This string goes to MyFuncA");

This approach is sufficient to mimic the concept of member functions without a class construct. However, we can take it one step further: instead of having as many pointers in the struct as there are “member” functions, we can gather all of those pointers into a table and simply store a single pointer to that table in the struct. Such a table is commonly referred to as a virtual function table, vtable, or vftable.

Here’s how it looks in code:

#include <stdlib.h>

typedef int pMyFuncA (char *);
typedef int pMyFuncB (char *);

typedef struct {            // The vtable itself (declared before its first use)
  pMyFuncA *MyFuncA;
  pMyFuncB *MyFuncB;
} MyStructVTable;

typedef struct {
  const MyStructVTable *pVTable;  // Changed to single pointer to vtable
  int count;
  char buf[100];
} MyStruct;

int MyFuncA(char *s) {    // Function definition (code of function)
  return 0;
}

int MyFuncB(char *s) {    // Function definition (code of function)
  return 1;
}

int main(void) {
  // Initialize vtable (one read-only table, shared by all objects)
  static const MyStructVTable vtable = { MyFuncA, MyFuncB };
  MyStruct *m = malloc(sizeof(MyStruct));
  if (m == NULL) return 1;
  // Initialize (mimic constructor)
  m->count = 1;
  m->buf[0] = '\0';
  m->pVTable = &vtable;
  free(m);                // Clean up (mimic destructor)
  return 0;
}
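One practical reason for preferring the table is object size: with per-object function pointers, every instance carries one pointer per “member” function, whereas with a vtable every instance carries exactly one pointer, no matter how many functions there are. Here is a rough sketch that compares the two layouts. The names FatStruct, SlimStruct, SlimVTable, MyFuncC and BytesSavedPerObject are made up for this illustration, and the exact sizes depend on the platform.

```c
#include <assert.h>
#include <stddef.h>

typedef int Func(char *);

// Layout 1: one function pointer per "member function" in every object.
typedef struct {
  Func *MyFuncA;
  Func *MyFuncB;
  Func *MyFuncC;              // Hypothetical third function
  int count;
} FatStruct;

// Layout 2: the pointers live once in a shared table...
typedef struct {
  Func *MyFuncA;
  Func *MyFuncB;
  Func *MyFuncC;
} SlimVTable;

// ...and every object carries a single pointer to that table.
typedef struct {
  const SlimVTable *pVTable;
  int count;
} SlimStruct;

// How many bytes each object saves by using the vtable layout.
size_t BytesSavedPerObject(void) {
  // Assumption: holds on mainstream platforms, where all pointers
  // have the same size.
  assert(sizeof(FatStruct) >= sizeof(SlimStruct));
  return sizeof(FatStruct) - sizeof(SlimStruct);
}
```

On a typical 64-bit platform the saving would be two pointers (16 bytes) per object for these three functions, and it grows with every function added. The price is the extra level of indirection on every call.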

Calling our functions is just mildly different now, with the additional level of indirection. The code would simply look like the following:

int retval;
retval = m->pVTable->MyFuncA("This string goes to MyFuncA");

This is essentially how C++ classes work in relation to virtual member functions. Virtual member functions are basically functions that can be overridden by inheriting classes (see Wikipedia for more information). Each virtual function’s implementation (pure virtual functions need not have one) is compiled into a function, as per normal, and each object of a class that has virtual functions carries a vtable pointer that points to the vtable of its class. The vtable pointer is usually stored, in memory, as the first DWORD (for 32-bit) of the object. The vtable itself is built at compile time and exists statically in a data section of the binary (typically a read-only one, such as .rdata or .rodata).
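To make the remark about overriding a little more concrete, here is a rough sketch, still in C, of how a “derived class” could override one function simply by having its own vtable. The names BaseFuncA, BaseFuncB, DerivedFuncA, baseVTable, derivedVTable, CallFuncA and CallFuncB are invented for this illustration, and a real compiler’s layout will differ in the details.

```c
#include <assert.h>
#include <stddef.h>

typedef int pMyFuncA(char *);
typedef int pMyFuncB(char *);

typedef struct {
  pMyFuncA *MyFuncA;
  pMyFuncB *MyFuncB;
} MyStructVTable;

typedef struct {
  const MyStructVTable *pVTable;
  int count;
} MyStruct;

// "Base class" implementations.
static int BaseFuncA(char *s) { (void)s; return 0; }
static int BaseFuncB(char *s) { (void)s; return 1; }

// The "derived class" overrides MyFuncA only.
static int DerivedFuncA(char *s) { (void)s; return 42; }

// One vtable per "class". The derived table reuses BaseFuncB,
// which mimics inheriting a function without overriding it.
static const MyStructVTable baseVTable    = { BaseFuncA, BaseFuncB };
static const MyStructVTable derivedVTable = { DerivedFuncA, BaseFuncB };

// The caller does not know which kind of object it was handed;
// the vtable pointer selects the implementation at run time.
int CallFuncA(const MyStruct *m, char *s) {
  assert(m != NULL && m->pVTable != NULL);
  return m->pVTable->MyFuncA(s);
}

int CallFuncB(const MyStruct *m, char *s) {
  assert(m != NULL && m->pVTable != NULL);
  return m->pVTable->MyFuncB(s);
}
```

An object whose pVTable points at baseVTable gets 0 back from CallFuncA, while one pointing at derivedVTable gets 42. Both get 1 from CallFuncB, because the second slot of both tables holds the same function.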

To complete the picture, we need to create the “this” pointer. As we know, it is a hidden pointer passed to every (non-static) member function of a class. The main reason is that functions, as explained above, are simply pieces of code, and those pieces of code, by themselves, do not know which object they are acting on. Hence, at the risk of digressing from our code above, consider the following C++ code (lots of stuff removed intentionally):

class MyClass {
  int a;
  void DoSomething() {
    a = 16;
  }
};

When DoSomething() is compiled, the code on its own does not know which “a” it is accessing. It knows that it is supposed to access the “a” that belongs to a specific class instance (object), but exactly where is that object? That is the reason for the this pointer: it points to the object that the code is supposed to be accessing.
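As a rough mental model, a compiler can be pictured as turning DoSomething() into an ordinary function that receives the object’s address as an extra first argument. The sketch below writes that out by hand in C. The function name MyClass_DoSomething is invented for this illustration (real compilers generate mangled names), and the parameter is called self rather than this only so that the snippet also compiles as C++, where this is a reserved word.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  int a;
} MyClass;

// Roughly what MyClass::DoSomething() becomes: a plain function
// with the object passed in explicitly.
void MyClass_DoSomething(MyClass *self) {
  assert(self != NULL);
  self->a = 16;   // "a = 16" in the C++ source really means "this->a = 16"
}
```

With this picture, a C++ call such as obj.DoSomething() corresponds to MyClass_DoSomething(&obj): the same code runs for every object, and only the pointer decides whose “a” gets written.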

Hence, how do we implement the concept of “this” in our struct? Simply by literally passing the pointer. Here’s the code:

#include <stdlib.h>

// Note: "this" is an ordinary identifier in C but a keyword in C++,
// so this code must be compiled as C.
typedef struct MyStruct MyStruct;   // Forward declaration for the typedefs below

typedef int pMyFuncA (MyStruct *this, char *);
typedef int pMyFuncB (MyStruct *this, char *);

typedef struct {            // The vtable itself
  pMyFuncA *MyFuncA;
  pMyFuncB *MyFuncB;
} MyStructVTable;

struct MyStruct {
  const MyStructVTable *pVTable;  // Single pointer to vtable
  int count;
  char buf[100];
};

int MyFuncA(MyStruct *this, char *s) {      // Added *this pointer
  return 0;
}

int MyFuncB(MyStruct *this, char *s) {      // Added *this pointer
  return 1;
}

int main(void) {
  // Initialize vtable (one read-only table, shared by all objects)
  static const MyStructVTable vtable = { MyFuncA, MyFuncB };
  MyStruct *m = malloc(sizeof(MyStruct));
  if (m == NULL) return 1;

  // Initialize (mimic constructor)
  m->count = 1;
  m->buf[0] = '\0';
  m->pVTable = &vtable;
  free(m);                // Clean up (mimic destructor)
  return 0;
}

And of course, when calling our functions, we need to respect the additional this pointer as well:

int retval;
retval = m->pVTable->MyFuncA(m, "This string goes to MyFuncA");
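The functions above accept the pointer but never use it, so here is a small sketch of a “member function” that does read and write its own object through it. Everything in it (Counter, CounterVTable, CounterAdd, CounterInit) is a hypothetical example rather than part of the code above, and the pointer is named self instead of this only so that the snippet also compiles as C++.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Counter Counter;            // Forward declaration
typedef int pAddFn(Counter *self, int amount);

typedef struct {                           // The vtable
  pAddFn *Add;
} CounterVTable;

struct Counter {
  const CounterVTable *pVTable;            // Shared by all Counter objects
  int count;                               // Per-object state
};

// "Member function": adds to the count of whichever object it is given.
static int CounterAdd(Counter *self, int amount) {
  assert(self != NULL);
  self->count += amount;
  return self->count;
}

static const CounterVTable counterVTable = { CounterAdd };

// Mimics a constructor: hooks up the vtable and sets the initial state.
void CounterInit(Counter *self, int start) {
  assert(self != NULL);
  self->pVTable = &counterVTable;
  self->count = start;
}
```

After CounterInit(&a, 1) and CounterInit(&b, 10), the calls a.pVTable->Add(&a, 5) and b.pVTable->Add(&b, 5) run exactly the same code through the same table, yet they return 6 and 15 respectively, because the pointer passed in is what selects the object.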

Hope this gives a taste of what this pointers and vtables are all about. However, this is certainly not an in-depth treatment of how compilers handle classes, and no low-level code was exposed here (for example, under Microsoft’s thiscall convention on 32-bit x86 Windows, the this pointer is passed in the ecx register). For instance, it may be interesting to take a look at how exactly inherited objects are laid out in memory, and how their vtable pointers are structured (by general convention). Also, note that vtables are not defined in the C++ specification, and compilers are generally free to implement virtual functions however they like (or even to use alternatives to vtables). However, most compilers use vtables.