Understanding C++ object construction and destruction

So many programs these days are implemented in C++, be it because they are using some framework, some library, because somebody “back then” thought C++ was the greatest thing since sliced bread, or simply because.

It doesn’t actually matter. The fact is that when we reverse stuff, we very often end up in C++ territory. Hence, I felt that a systematic treatment would be useful to clarify some thoughts and intuitions I had, especially about how objects are constructed and destroyed, and about the role of the compiler: how it looks at C++ source code and what it adds when translating it to the final assembly/binary output.

Yes, I’m a big fan of systematic treatment.

Construction: Object Construction

In general there are two ways a C++ object can be created – on the stack, and on the heap.

Stack:

class A {
public:
  int a, b;
};

int main() {
  A a;
}

.text:00401090 push    ebp             ; Function prologue.
.text:00401091 mov     ebp, esp
.text:00401093 sub     esp, 50h        ; Grow the stack (includes all
                                       ; the other local variables).

.text:00401096 lea     ecx, [ebp+a]    ; "a" is on the stack.
.text:00401099 call    A::A(void)      ; Call the constructor (passing
                                       ; "this", the object's location,
                                       ; which is on the stack).

When objects are created on the stack, the stack is grown (as per normal in the function prologue) together with the other local variables in the function. Subsequently, code is generated such that the object’s constructor is called.

Heap:

class A {
public:
  int a, b;
};

int main() {
  A *a = new A();
}

.text:004010AE push    0Ch                 
.text:004010B0 call    operator new(uint)  ; Allocate memory.
.text:004010B5 add     esp, 4
.text:004010B8 mov     [ebp+this], eax     ; Point "this" to memory.

.text:004010BB cmp     [ebp+this], 0
.text:004010BF jz      short allocFailed   ; Check if allocation failed.

.text:004010C1 mov     ecx, [ebp+this]     
.text:004010C4 call    A::A(void)          ; Call the constructor (passing
                                           ; "this", the object's location,
                                           ; which is on the heap).

.text:004010C9 mov     [ebp+var_4C], eax   ; Just some "temp" variable.
.text:004010CC jmp     short follow

.text:004010CE
.text:004010CE allocFailed:
.text:004010CE mov     [ebp+var_4C], 0

.text:004010D5 follow:
.text:004010D5 mov     eax, [ebp+var_4C]
.text:004010D8 mov     [ebp+a], eax        ; Store the pointer "a".

When objects are created on the heap, “operator new” is called to allocate the memory, and then the object’s constructor is called.

In general, the constructor takes a hidden parameter, this, which points to the memory for the object instance, either on the stack or on the heap (see above). In both listings it is passed in ecx.

What is important to know is that the first item in this is reserved for a virtual function table pointer (one DWORD, for 32-bit architectures), which will be set to point to a global class specific table of virtual function pointers. The rest of this memory is for parent/member data objects.

Also, that special virtual function table pointer will only exist if that class has virtual methods. Else it doesn’t exist and everything is shifted up by one DWORD (for 32-bit architectures).

Interlude: Virtual Functions and Virtual Tables

In the very simple example above, the class A did not have virtual methods.

Here is a class with both a non-virtual method (A__func1) and a virtual method (A__func2), and a class C that derives from class A.

#include <stdio.h>

class A {
public:
  int a, b;
  void A__func1() { printf("A__func1 in A\n"); }
  virtual void A__func2() = 0;
};

class C : public A {
public:
  int a, b;
  void A__func1() { printf("A__func1 in C\n"); }
  virtual void A__func2() { printf("A__func2 in C\n"); }
};

int main() {
  A *a = new C();

  a->A__func1();  // prints "A__func1 in A"
  a->A__func2();  // prints "A__func2 in C"
}

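Virtual methods are methods such that when a class derives from class A and overrides the virtual method, the derived class’s method is always called, even through a pointer or reference to A. In contrast, when a derived class redefines a non-virtual method, the call is resolved at compile time from the declared type of the pointer or reference, not from the actual type of the object.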

Here’s what MSDN has to say about virtual functions:

A virtual function is a member function that you expect to be redefined in derived classes. When you refer to a derived class object using a pointer or a reference to the base class, you can call a virtual function for that object and execute the derived class’s version of the function.

Hence, the output above occurs. The declared type of a is A *, and A__func1 is non-virtual, so A::A__func1() is called. A__func2, however, is virtual, and the object that a points to is really a C, so C::A__func2() is called.

How then does the compiler generate code that knows what the “most derived class” is, and hence what method it should call? In this example, how does class C (referenced as *a) know to call C::A__func2() and not A::A__func2()?

Various techniques exist, but by far the most common one is through the use of virtual function tables (vftables, or vtables). We will not go into a treatment of vftables here, as this is a topic well discussed and documented.

Construction: Order of Construction

What exactly happens inside the constructor? When writing C++ code, we just write the code that we want to execute when an object is created. However, the compiler translates that to much more.

Here’s what happens:

  • If the object has a base class, the base class constructor is called. If the base class has a base class, the process is repeated.

  • If the object has virtual functions, the vftable pointer (first DWORD) is set.

  • If the object has class member objects, it calls their constructors passing in their respective memory locations.

  • The programmer’s written constructor code is executed.

I use a mnemonic to help remember this. Setting up the vftable is crucial, so let’s take that out. Taking the key words of the remaining steps (Base, Member, Written), the mnemonic is BMW (you know, the German car brand). vftable pointer setup goes in slot 2.

Destruction: Object Destruction

In general there are two ways a C++ object can be destroyed: by going out of scope (for objects on the stack), or by being explicitly deleted (for objects on the heap).

Stack:

class A {
public:
  int a, b;
};

int main() {
  A a;
}

Same example from above. When main() terminates, a goes out of scope. This means it’s time to destroy it. The compiler has to generate code to do that explicitly. It’s not automatic.

.text:0040112B lea     ecx, [ebp+a]    ; this
.text:0040112E call    A::~A(void)
.text:00401133 mov     esp, ebp
.text:00401135 pop     ebp
.text:00401138 retn

Here we see an explicit call to the destructor at the end of the function. Subsequently the stack is trashed (as usual in the function epilogue), which removes the memory for the object.

Heap:

class A {
public:
  int a, b;
};

int main() {
  A *a = new A();
  delete a;
}

.text:00418097 mov     ecx, [ebp+a] 
.text:0041809A call    X::~X(void)        ; Call the destructor, passing
                                          ; in this.
               ...
.text:004180A7 mov     ecx, [ebp+this]
.text:004180AA push    ecx                  
.text:004180AB call    operator delete(void *)  ; Free the memory

Hence, when an object is deleted and it does not have a virtual destructor, the known destructor function is called, followed by the release of heap memory through “operator delete”.

Destruction: Virtual Destructors

When an object is deleted and it has a virtual destructor, the virtual destructor function in the vftable is called instead.

Now, the virtual destructor in the vftable is not the destructor function written by the programmer. Instead, it’s a compiler generated “deleting destructor”. It is so named because that function has two intents: (1) run the destructor, and (2) delete the memory, all in one function.

Since each compiler is different, we’ll talk about MSVC here. There are two forms of “deleting destructors” that the compiler will generate. MSVC will either generate a “scalar deleting destructor”, or a “vector deleting destructor”.

Here’s how it goes:

  • If no delete[] is used, MSVC generates the “scalar deleting destructor”. The “scalar deleting destructor” calls the actual destructor function followed by “operator delete”.

  • If delete[] is used at all, MSVC generates the “vector deleting destructor”. The only difference is that it calls the destructor on every object in the array, and then calls “operator delete[]”.

Note: Documents say that MSVC9 will not generate a “scalar deleting destructor” if there is a “vector deleting destructor”. Hence, the “vector deleting destructor” must fulfill both roles. It does this through its argument: 1 means delete, and 3 means delete[].

However, I notice that MSVC9 and MSVC10 do not strictly follow this. I have seen both scalar and vector deleting destructors in a binary.

Here’s the actual code inside a scalar deleting destructor:

.text:00418090 public: void * __thiscall X::`scalar deleting destructor'(unsigned int) proc near
.text:00418090
.text:00418090 this= dword ptr -4
.text:00418090 arg_0= dword ptr  8
.text:00418090
.text:00418090 push    ebp                      ; Prologue stuff
.text:00418091 mov     ebp, esp
.text:00418093 push    ecx                      ; This pointer, "thiscall"
.text:00418094 mov     [ebp+this], ecx          ; Store in "this" variable
.text:00418097 mov     ecx, [ebp+this]
.text:0041809A call    X::~X(void)              ; Call destructor, pass "this"
.text:0041809F mov     eax, [ebp+arg_0]
.text:004180A2 and     eax, 1
.text:004180A5 jz      short follow             ; Do stuff based on the argument passed in

.text:004180A7 mov     ecx, [ebp+this]
.text:004180AA push    ecx            
.text:004180AB call    operator delete(void *)  ; Free memory
.text:004180B0 add     esp, 4

.text:004180B3
.text:004180B3 follow:
.text:004180B3 mov     eax, [ebp+this]          ; Return "this", by convention
.text:004180B6 mov     esp, ebp
.text:004180B8 pop     ebp
.text:004180B9 retn    4
.text:004180B9 public: void * __thiscall X::`scalar deleting destructor'(unsigned int) endp
.text:004180B9

I will not show the code for a “vector deleting destructor” since it is a bit complex for a blog listing. Feel free to toss it up in IDA.

Destruction: Order of Destruction

What exactly happens inside the destructor? Again, when writing C++ code, we just write the code that we want to execute when an object is destroyed. However, the compiler translates that to much more.

Here’s what happens:

  • If the object has virtual functions, the vftable pointer (first DWORD) is set.

  • Execute the programmer’s written destructor code.

  • If the object has class member objects, it calls their destructors passing in their respective memory locations.

  • If the object has a base class, the base class destructor is called. If the base class has a base class, the process is repeated.

The mnemonic is just flipped to be WMB. vftable pointer setup goes in slot 1.

Appendix, Kind Of

Here’s a source listing that you can compile and open in IDA to play with:

#include <stdio.h>

class X {
  int X_a, X_b;

public:
  X() { printf("X constructor.\n"); }
  ~X() { printf("X destructor.\n"); }
  virtual void function_X(int a, int b) {}
};

class A {
  int A_a, A_b;

public:
  A() { printf("A constructor.\n"); }
  ~A() { printf("A destructor.\n"); }
  virtual void function_A(int a, int b) {}
};

class B {
  int B_a, B_b;

public:
  B() { printf("B constructor.\n"); }
  ~B() { printf("B destructor.\n"); }
  virtual void function_B(int a, int b) {}
};

class C : public A, public B {
  int C_a, C_b;

public:
  C() { printf("C constructor.\n"); }
  ~C() { printf("C destructor.\n"); }

  virtual void function_A(int a, int b) {
    printf("function_A. %i, %i\n", a, b);
  }

  virtual void function_B(int a, int b) {
    printf("function_B. %i, %i\n", a, b);
  }
};

int main() {
  A a;
  B b;
  C c;

  X *x = new X();
  X *xv = new X[5];

  c.function_A(1, 2);
  c.function_B(3, 4);

  delete x;
  delete[] xv;
}