Peering into the magic of stack unwinding

“WARNING: Stack unwind information not available. Following frames may be wrong.”

That’s a very familiar message to many reverse engineers and developers doing assembly-level debugging (for instance, using Windbg). Even when the message doesn’t appear, the stack trace can still be a guess; a common case that I observe is the usermode stack trace (UST) from !heap with page guard enabled.

However, often we don’t fully understand what all this means. Much as we would like to trust the debugger to give us beautiful and accurate stack traces (or stack unwinds), it often can’t, because it doesn’t have enough information to do so, and hence relies on simple techniques or algorithms to “best guess” the stack trace.

Hopefully, this article sheds some light on what’s going on behind the scenes.

Note: This applies predominantly to 32-bit Windows. 64-bit is different and should be looked at with a new set of eyes. Also, I took the stack trace here from an article off the Internet, for convenience.

Let us assume that we are at a function called MSVCRT!_output+0x18. Here’s the stack trace we’ll be playing with. This is a dds ebp in windbg:

Stack:

0012fef4  0012ff38                          <-- EBP (current)
0012fef8  77c3e68d MSVCRT!printf+0x35
0012fefc  77c5aca0 MSVCRT!_iob+0x20
0012ff00  00000000
0012ff04  0012ff44
0012ff08  77c5aca0 MSVCRT!_iob+0x20
0012ff0c  00000000
0012ff10  000007e8
0012ff14  7ffdf000
0012ff18  0012ffb0
0012ff1c  00000001
0012ff20  0012ff0c
0012ff24  0012f8c8
0012ff28  0012ffb0
0012ff2c  77c33eb0 MSVCRT!_except_handler3
0012ff30  77c146e0 MSVCRT!`string'+0x16c
0012ff34  00000000

0012ff38  0012ffc0                          <-- EBP (one level down)
0012ff3c  00401044 temp!main+0x44
0012ff40  00000000
0012ff44  77f944a8 ntdll!RtlpAllocateFromHeapLookaside+0x42
0012ff48  00000007
0012ff4c  00000000
0012ff50  00401147 temp!mainCRTStartup+0xe3 <-- Return address?
0012ff54  00000001
0012ff58  00323d70
0012ff5c  00322ca8
0012ff60  00403000 temp!__xc_a
0012ff64  00403004 temp!__xc_z
0012ff68  0012ffa4
0012ff6c  0012ff94
0012ff70  0012ffa0
0012ff74  00000000
0012ff78  0012ff98
0012ff7c  00403008 temp!__xi_a
0012ff80  0040300c temp!__xi_z
0012ff84  77f944a8 ntdll!RtlpAllocateFromHeapLookaside+0x42
0012ff88  00000007
0012ff8c  7ffdf000
0012ff90  c0000005
0012ff94  00323d70
0012ff98  00000000
0012ff9c  8053476f
0012ffa0  00322ca8
0012ffa4  00000001
0012ffa8  0012ff84
0012ffac  0012f8c8
0012ffb0  0012ffe0
0012ffb4  00401210 temp!except_handler3
0012ffb8  004020d0 temp!⌂MSVCRT_NULL_THUNK_DATA+0x80
0012ffbc  00000000

0012ffc0  0012fff0
0012ffc4  77e814c7 kernel32!BaseProcessStart+0x23
0012ffc8  77f944a8 ntdll!RtlpAllocateFromHeapLookaside+0x42
0012ffcc  00000007
0012ffd0  7ffdf000
0012ffd4  c0000005
0012ffd8  0012ffc8
0012ffdc  0012f8c8
0012ffe0  ffffffff
0012ffe4  77e94809 kernel32!_except_handler3
0012ffe8  77e91210 kernel32!`string'+0x98
0012ffec  00000000

0012fff0  00000000
0012fff4  00000000
0012fff8  00401064 temp!mainCRTStartup

Notice that I conveniently broke the stack trace into segments by adding a line between those “segments”. The segments are simply where the “Previous EBP” is stored in the stack. Since for EBP-based stack frames, current EBP points to previous EBP, it is possible to walk through stack frames by following where “Previous EBP” goes to, and continue till we hit bottom or are satisfied. That’s what the above is doing.
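The chain walk described above can be sketched in a few lines of Python. This is only an illustration over a hand-copied snapshot of the saved-EBP slots from the dump, not how any particular debugger is implemented; the names STACK and walk_ebp_chain are made up for this example.

```python
# Saved-EBP slots copied from the dump above (address -> dword stored there).
STACK = {
    0x0012FEF4: 0x0012FF38,
    0x0012FF38: 0x0012FFC0,
    0x0012FFC0: 0x0012FFF0,
    0x0012FFF0: 0x00000000,
}

def walk_ebp_chain(memory, ebp, max_frames=64):
    """Follow "previous EBP" links until the chain ends or stops making sense."""
    frames = []
    while ebp in memory and len(frames) < max_frames:
        frames.append(ebp)
        prev = memory[ebp]
        # The stack grows downward, so a caller's frame must sit at a
        # higher address. A null or lower value ends the walk.
        if prev <= ebp:
            break
        ebp = prev
    return frames

chain = walk_ebp_chain(STACK, 0x0012FEF4)
print([f"{addr:08x}" for addr in chain])
```

The "prev <= ebp" check is a design choice for this sketch: it stops on the terminating zero and also guards against a corrupted link that would otherwise loop forever.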

The first (and current) EBP is as follows. The value stored there tells us that the previous EBP is at 0012ff38h.

0012fef4  0012ff38                          <-- EBP (current)
0012fef8  77c3e68d MSVCRT!printf+0x35
0012fefc  77c5aca0 MSVCRT!_iob+0x20
0012ff00  00000000
0012ff04  0012ff44

We continue:

0012ff38  0012ffc0                          <-- EBP (one level down)
0012ff3c  00401044 temp!main+0x44
0012ff40  00000000
0012ff44  77f944a8 ntdll!RtlpAllocateFromHeapLookaside+0x42
0012ff48  00000007

We continue (again):

0012ffc0  0012fff0                          <-- EBP (two levels down)
0012ffc4  77e814c7 kernel32!BaseProcessStart+0x23
0012ffc8  77f944a8 ntdll!RtlpAllocateFromHeapLookaside+0x42
0012ffcc  00000007
0012ffd0  7ffdf000

We continue (again, again):

0012fff0  00000000                          <-- EBP (three levels down)
0012fff4  00000000
0012fff8  00401064 temp!mainCRTStartup
0012fffc  00000000
00130000  78746341

And we stop (we are satisfied).

By convention, immediately below the saved EBP in the dump (at EBP+4) is the return address, and we use that to work out which function the previous stack frame belonged to. Right below the return address (starting at EBP+8), again by convention, are the arguments passed to the called function. Hence, we construct:

MSVCRT!_output+0x18(77c5aca0, 0, 0012ff44)
MSVCRT!printf+0x35(0, 77f944a8, 7)
temp!main+0x44(77f944a8, 7, 7ffdf000)
kernel32!BaseProcessStart+0x23(00401064, 0, 78746341)
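As a rough sketch of that construction: for each EBP in the chain, read the dword at EBP+4 as the return address and the next three dwords as "arguments". The snapshot below is copied by hand from the dump; STACK and read_frame are names invented for this example, and three is simply the number of values shown per frame here, not the real parameter count of any function.

```python
# Dwords copied from the dds dump above (address -> value).
STACK = {
    0x0012FEF4: 0x0012FF38, 0x0012FEF8: 0x77C3E68D, 0x0012FEFC: 0x77C5ACA0,
    0x0012FF00: 0x00000000, 0x0012FF04: 0x0012FF44,
    0x0012FF38: 0x0012FFC0, 0x0012FF3C: 0x00401044, 0x0012FF40: 0x00000000,
    0x0012FF44: 0x77F944A8, 0x0012FF48: 0x00000007,
    0x0012FFC0: 0x0012FFF0, 0x0012FFC4: 0x77E814C7, 0x0012FFC8: 0x77F944A8,
    0x0012FFCC: 0x00000007, 0x0012FFD0: 0x7FFDF000,
    0x0012FFF0: 0x00000000, 0x0012FFF4: 0x00000000, 0x0012FFF8: 0x00401064,
    0x0012FFFC: 0x00000000, 0x00130000: 0x78746341,
}

def read_frame(memory, ebp, nargs=3):
    """Return (ebp, return address at ebp+4, nargs dwords starting at ebp+8)."""
    ret = memory[ebp + 4]
    args = [memory[ebp + 8 + 4 * i] for i in range(nargs)]
    return ebp, ret, args

rows = []
ebp = 0x0012FEF4
while ebp:
    rows.append(read_frame(STACK, ebp))
    ebp = STACK[ebp]  # saved EBP of the caller; 0 ends the chain

for frame, ret, args in rows:
    print(f"{frame:08x} {ret:08x} " + " ".join(f"{a:08x}" for a in args))
```

Each printed row has the same shape as a "ChildEBP RetAddr Args to Child" line: the frame's EBP, the return address, then three dwords.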

Here’s what windbg found (without symbols):

ChildEBP RetAddr  Args to Child
0012fef4 77c3e68d 77c5aca0 00000000 0012ff44 MSVCRT!_output+0x18
0012ff38 00401044 00000000 77f944a8 00000007 MSVCRT!printf+0x35
WARNING: Stack unwind information not available. Following frames may be wrong.
0012ffc0 77e814c7 77f944a8 00000007 7ffdf000 temp+0x1044
0012fff0 00000000 00401064 00000000 78746341 kernel32!BaseProcessStart+0x23

And here’s what it found (with symbols):

ChildEBP RetAddr  Args to Child
0012fef4 77c3e68d 77c5aca0 00000000 0012ff44 MSVCRT!_output+0x18
0012ff38 00401044 00000000 77f944a8 00000007 MSVCRT!printf+0x35
0012ff4c 00401147 00000001 00323d70 00322ca8 temp!main+0x44
0012ffc0 77e814c7 77f944a8 00000007 7ffdf000 temp!mainCRTStartup+0xe3
0012fff0 00000000 00401064 00000000 78746341 kernel32!BaseProcessStart+0x23

We note that without symbols, what we found manually and what the debugger found are the same, because we are using the same technique the debugger uses when it has no symbolic information (as shown by the WARNING). However, when it does have symbols, it gets the picture right, and shows that we were wrong.

Now, what happened? Let’s take a closer look. This is the stack dump near temp!main+0x44, which is the last frame we got right.

0012ff38  0012ffc0                                  <-- EBP (one level down)
0012ff3c  00401044 temp!main+0x44
0012ff40  00000000
0012ff44  77f944a8 ntdll!RtlpAllocateFromHeapLookaside+0x42
0012ff48  00000007
0012ff4c  00000000
0012ff50  00401147 temp!mainCRTStartup+0xe3         <-- Return address?
0012ff54  00000001
0012ff58  00323d70
0012ff5c  00322ca8

Glancing down the stack dump, we spot an address which looks like a valid return address (it points to code). It’s temp!mainCRTStartup+0xe3. It looks like we completely skipped that call (if we’re guessing right), and went from temp!main+0x44 straight to kernel32!BaseProcessStart+0x23.
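That "glance" can be approximated by scanning the dwords between two EBP links for values that fall inside a code range. The sketch below makes a loud assumption: TEXT_RANGES is a hypothetical address range for temp's code, chosen only so that the example runs. It is not taken from the real module headers. It also looks only at temp, so pointers into other modules are ignored, and only a subset of the dump is copied in.

```python
# Subset of the dump between EBP 0012ff38 and EBP 0012ffc0 (address -> value).
STACK = {
    0x0012FF40: 0x00000000, 0x0012FF44: 0x77F944A8, 0x0012FF48: 0x00000007,
    0x0012FF4C: 0x00000000, 0x0012FF50: 0x00401147, 0x0012FF54: 0x00000001,
    0x0012FF58: 0x00323D70, 0x0012FF5C: 0x00322CA8, 0x0012FF60: 0x00403000,
    0x0012FF64: 0x00403004,
    0x0012FFB0: 0x0012FFE0, 0x0012FFB4: 0x00401210, 0x0012FFB8: 0x004020D0,
    0x0012FFBC: 0x00000000,
}

# ASSUMPTION: hypothetical bounds of temp's code section, for illustration only.
TEXT_RANGES = [(0x00401000, 0x00402000)]

def looks_like_code(value, ranges=TEXT_RANGES):
    """True if the value falls inside one of the assumed code ranges."""
    return any(lo <= value < hi for lo, hi in ranges)

def candidate_return_addresses(memory, start, end):
    """List (stack slot, value) pairs in [start, end) whose value points to code."""
    return [(addr, memory[addr])
            for addr in range(start, end, 4)
            if addr in memory and looks_like_code(memory[addr])]

candidates = candidate_return_addresses(STACK, 0x0012FF40, 0x0012FFC0)
for slot, value in candidates:
    print(f"{slot:08x}  {value:08x}")
```

Under these assumptions the scan flags two slots: 0012ff50 (temp!mainCRTStartup+0xe3) and 0012ffb4, which the dump labels temp!except_handler3 with no offset, i.e. the start of a function rather than a return address. A code pointer on the stack is therefore only a candidate; it still has to be judged by eye or checked some other way, such as confirming that the instruction just before it is a call.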

Looking at the stack dump, the first thing we notice is that the spot where EBP is supposed to be (above the return address) is 0. Why did that happen? Maybe the callee from temp!mainCRTStartup+0xe3 (i.e. temp!main+0x44) didn’t follow the normal EBP-based stack frame convention and put stuff on the stack BEFORE saving EBP, thus messing with our assumption that EBP is above the return address and that EBPs are chained from function to function perfectly. Or perhaps (which is the case for this trace) the callee from temp!mainCRTStartup+0xe3 (temp!main+0x44) simply doesn’t use EBP at all.

Because the function containing temp!main+0x44 didn’t set up EBP, the call to it generated an EBP-less stack frame. Hence, when we simplistically walked the stack following EBP to EBP, we completely missed the caller of temp!main+0x44, which is temp!mainCRTStartup+0xe3.

We are now in a position to alter our stack analysis. Let’s first insert the fact that we know there is a call to temp!mainCRTStartup+0xe3.

MSVCRT!_output+0x18(77c5aca0, 0, 0012ff44)
MSVCRT!printf+0x35(0, 77f944a8, 7)
temp!main+0x44(???)
temp!mainCRTStartup+0xe3(???)
kernel32!BaseProcessStart+0x23(00401064, 0, 78746341)

We know that the parameter list we had for temp!main+0x44 must be wrong, because we missed a function. If the missed return address is temp!mainCRTStartup+0xe3, then we know where to find the parameter list (below the slot holding temp!mainCRTStartup+0xe3 on the stack). We alter our stack analysis again:

MSVCRT!_output+0x18(77c5aca0, 0, 0012ff44)
MSVCRT!printf+0x35(0, 77f944a8, 7)
temp!main+0x44(1, 323d70, 322ca8)
temp!mainCRTStartup+0xe3(77f944a8, 7, 7ffdf000)
kernel32!BaseProcessStart+0x23(00401064, 0, 78746341)

We do the same for temp!mainCRTStartup+0xe3, which gets the list that we (mistakenly) assigned to temp!main+0x44, since we only missed one function.
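The corrected parameter lists can be read straight off the dump once we know which slot holds each return address. A minimal sketch, using a hand-copied subset of the dump; args_after is a helper name invented here, and three values per frame is just the number shown in the article, not the real parameter count.

```python
# Subset of the dump around the two return addresses of interest.
STACK = {
    0x0012FF50: 0x00401147,  # temp!mainCRTStartup+0xe3 (return address from main)
    0x0012FF54: 0x00000001,
    0x0012FF58: 0x00323D70,
    0x0012FF5C: 0x00322CA8,
    0x0012FFC0: 0x0012FFF0,  # saved EBP in mainCRTStartup's frame
    0x0012FFC4: 0x77E814C7,  # kernel32!BaseProcessStart+0x23
    0x0012FFC8: 0x77F944A8,
    0x0012FFCC: 0x00000007,
    0x0012FFD0: 0x7FFDF000,
}

def args_after(memory, ret_slot, nargs=3):
    """Read nargs dwords that follow the stack slot holding a return address."""
    return [memory[ret_slot + 4 * (i + 1)] for i in range(nargs)]

# main has no EBP frame, so we start from the return-address slot we spotted.
main_args = args_after(STACK, 0x0012FF50)
# mainCRTStartup does have an EBP frame; its return address sits at EBP+4.
crt_args = args_after(STACK, 0x0012FFC0 + 4)

print("temp!main+0x44(" + ", ".join(f"{a:x}" for a in main_args) + ")")
print("temp!mainCRTStartup+0xe3(" + ", ".join(f"{a:x}" for a in crt_args) + ")")
```

Note that 0012ff50 minus 4 is 0012ff4c, which is the ChildEBP value windbg reports for temp!main+0x44 in the trace with symbols.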

Now, our stack trace looks like the one with symbols.

Hopefully, this gives an insight into why the lack of symbols poses a problem to walking the stack:

  • Some functions may not use EBP, and the debugger doesn’t go and analyze the function to see if it should be careful. It blindly walks the EBP chain.

  • Some functions may break convention and put stuff on the stack before saving EBP

  • Both of the above.

In summary, absolutely anything that causes EBP to not be where it’s supposed to be, or not even be there, will break the stack unwinding process.