Unwind Data Can't Sleep - Introducing InsomniacUnwinding

Hi all, in this blog we will discuss sleep masking in detail, the default assumptions that come with it, and how we are going to break those assumptions with a novel approach called InsomniacUnwinding.

Lorenzo Meacci @kapla

3/29/2026 · 11 min read

Prerequisites

This is a continuation of my previous blog Bypassing EDR in a Crystal Clear Way, so I recommend reading that first to fully understand what we are building on today.

The source code for today's POC is here -> InsomniacUnwinding

Quick Recap

In the previous blog, we created a reflective loader using Crystal Palace that worked with Cobalt Strike beacon. If you remember, sleep masking was implemented in a separate PICO (Position Independent Code Object) that lived in unbacked RX memory. Because of its nature, code originating from unbacked memory will not have a legitimate looking call stack. The why was discussed in detail in that blog, and for this reason call stack spoofing techniques like Draugr needed to be implemented.

You might also remember that beacon was overloaded into a legitimate DLL and its .pdata section was registered, allowing Windows to properly unwind beacon's API calls and functions. But during the sleep cycle, the Sleep() API was hooked via the IAT and redirected to the PICO, which then encrypted the beacon sections before calling the real Sleep API via spoofing. The flow looked something like this:

The fact that spoofing was needed for the entire duration of the beacon's life frustrated me, and I felt this was a "waste" of potential for that fantastic overloading technique. So that's when I started playing around with memory masking and exploring what we could do to preserve the unwind info at sleep time. But that's not the whole story, as we will see in a bit. Where the sleepmask code lives is just as important as the unwind preservation technique we will discuss.

Unwind deep dive

To understand what we will do next, we first need to know what Windows does when unwinding and resolving the stack of a running application.

x64 Unwinding Basics

On x86, stack walking was straightforward. Functions typically set up a frame pointer using push ebp; mov ebp, esp, creating a linked list of frame pointers that debuggers and exception handlers could follow. You could walk the stack by just chasing EBP values.

x64 changed everything. For performance reasons, most functions no longer use frame pointers, and RBP is often used as a general purpose register instead. So instead of relying on frame pointers, Windows uses metadata that lives in the PE itself.

The Chain of References

When Windows needs to unwind the stack (for debugging, exception handling etc), it follows a chain of references baked into the executable.

  • The PE headers contain a DataDirectory array. Entry index 3 (IMAGE_DIRECTORY_ENTRY_EXCEPTION) holds the RVA and size of the exception data. This points to the .pdata section.

  • The .pdata section is an array of RUNTIME_FUNCTION structures, one per non-leaf function in the PE

  • The UNWIND_INFO structures live in .rdata and describe exactly how to reverse each function's prolog (asm instructions added by the compiler for setting up the stack and registers before a function starts):
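To make the chain concrete, here is a minimal sketch of the first two hops. It uses Python on a plain byte buffer purely for illustration. The function names are hypothetical, and it assumes a PE32+ image laid out as it is mapped in memory, so an RVA can be used directly as a buffer offset.

```python
import struct

IMAGE_DIRECTORY_ENTRY_EXCEPTION = 3   # index into DataDirectory
RUNTIME_FUNCTION_SIZE = 12            # three 32-bit RVAs


def exception_directory(image: bytes):
    """Return (rva, size) of the exception directory of a PE32+ image."""
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE image")
    opt = e_lfanew + 4 + 20            # skip signature and IMAGE_FILE_HEADER
    if struct.unpack_from("<H", image, opt)[0] != 0x20B:
        raise ValueError("not a PE32+ image")
    # DataDirectory starts 112 bytes into IMAGE_OPTIONAL_HEADER64,
    # and each entry is 8 bytes (RVA, Size).
    entry = opt + 112 + IMAGE_DIRECTORY_ENTRY_EXCEPTION * 8
    return struct.unpack_from("<II", image, entry)


def runtime_functions(image: bytes):
    """Return a list of (BeginAddress, EndAddress, UnwindInfoAddress) tuples.

    Assumption: `image` is in mapped layout, so RVA == offset.
    """
    rva, size = exception_directory(image)
    count = size // RUNTIME_FUNCTION_SIZE
    return [
        struct.unpack_from("<III", image, rva + i * RUNTIME_FUNCTION_SIZE)
        for i in range(count)
    ]
```

Each tuple's third field is the RVA of the UNWIND_INFO structure for that function, which is the last hop in the chain.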

How the Unwinder Works

When RtlVirtualUnwind needs to find the caller of a function, it:

  • Takes the current instruction pointer (RIP)

  • Binary searches .pdata to find which RUNTIME_FUNCTION contains that address

  • Follows UnwindInfoAddress to locate the UNWIND_INFO structure in .rdata

  • Reads the UNWIND_CODE array and "undoes" the prolog in reverse: if a register was pushed, pop it. If stack space was allocated, add it back to RSP.

  • After reversing the prolog, RSP points to the return address

  • Pops the return address into RIP and repeats

This continues until RIP is NULL or lands in kernel space, at which point the stack has been fully walked.
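The lookup and the "undo the prolog" step can be sketched as follows. This is a simplified model, not what ntdll actually does: it assumes RIP is already past the prolog, ignores functions that use a frame register (UWOP_SET_FPREG) and machine frames, and only adds up the operations that move RSP. The function names are hypothetical.

```python
import bisect
import struct

UWOP_PUSH_NONVOL = 0
UWOP_ALLOC_LARGE = 1
UWOP_ALLOC_SMALL = 2

# Slot counts for the ops we skip (SET_FPREG, SAVE_NONVOL[_FAR],
# SAVE_XMM128[_FAR], PUSH_MACHFRAME).
OTHER_SLOTS = {3: 1, 4: 2, 5: 3, 8: 2, 9: 3, 10: 1}


def lookup_function_entry(entries, rip_rva):
    """Binary search a list of (begin, end, unwind_rva) sorted by begin."""
    begins = [e[0] for e in entries]
    i = bisect.bisect_right(begins, rip_rva) - 1
    if i >= 0 and entries[i][0] <= rip_rva < entries[i][1]:
        return entries[i]
    return None  # leaf function, or an address outside the table


def fixed_stack_allocation(codes: bytes) -> int:
    """Bytes the prolog subtracted from RSP, given a raw UNWIND_CODE array.

    Each slot is 2 bytes: CodeOffset, then UnwindOp (low 4 bits) and
    OpInfo (high 4 bits). The return address sits at RSP + this value.
    """
    total, i, n = 0, 0, len(codes) // 2
    while i < n:
        op_byte = codes[2 * i + 1]
        op, info = op_byte & 0x0F, op_byte >> 4
        if op == UWOP_PUSH_NONVOL:
            total += 8
            i += 1
        elif op == UWOP_ALLOC_SMALL:
            total += info * 8 + 8
            i += 1
        elif op == UWOP_ALLOC_LARGE:
            if info == 0:   # size / 8 stored in the next slot
                total += struct.unpack_from("<H", codes, 2 * (i + 1))[0] * 8
                i += 2
            else:           # unscaled 32-bit size in the next two slots
                total += struct.unpack_from("<I", codes, 2 * (i + 1))[0]
                i += 3
        else:
            i += OTHER_SLOTS.get(op, 1)
    return total
```

For a prolog of `push rbx; sub rsp, 0x20`, the two unwind codes add up to 40 bytes, so the return address is 40 bytes above RSP.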

Why do we care?

When we perform sleep masking, we encrypt all of beacon's memory starting from the image base address. This way we encrypt everything, including:

  • The PE headers (Windows cannot find the exception directory)

  • The .pdata section (Windows cannot find RUNTIME_FUNCTION entries)

  • The .rdata section (Windows cannot read UNWIND_INFO structures)

If we encrypt any of these, RtlVirtualUnwind will fail: the stack walker cannot determine frame sizes, cannot figure out where return addresses are, and cannot walk backwards through the call chain. Your call stack turns into garbage.

This is why naive encryption breaks stack walking entirely, and why we need to be surgical about what we preserve.

The limitation with current sleepmasks and the Ekko case study

Sleepmasks come in various flavors depending on where they are positioned in memory and how they perform the encryption. For instance, Cobalt Strike's sleepmask is a BOF and is allocated in memory depending on how the settings are configured in the process-inject block in the malleable C2 configuration, so the beacon and the sleepmask live in separate memory regions. On the other hand, Havoc C2 implements the sleepmask directly into the demon (Havoc payload) code.

How the encryption is triggered depends exactly on where the sleepmask is implemented. Ekko creates a chain of timers to perform the encryption. Timers basically tell the OS to execute something at a specific time in a separate thread context. Because all of this is handled by the OS, the entire demon image can be encrypted in memory and the OS will decrypt it later. In KaplaStrike we had no need to use timers and we could encrypt memory directly because the PICO was never encrypted at any point and so it could call APIs directly.

The problem with the timer approach is that timer callbacks execute in the Windows Thread Pool, not in your thread's context. When NtContinue switches execution to your ROP chain, the original thread initialization frames are gone. The frames you would normally see at the bottom of any legitimate call stack (BaseThreadInitThunk, RtlUserThreadStart) simply do not exist in this context.

This is the Ekko code:

Here is what an Ekko style sleep looks like when inspected:

Notice how InsomniaOverloading.exe!EkkoObf resolves correctly, but everything below is garbage. The unwinder walks through the function, hits the context boundary, and has nothing left to unwind through. The thread pool worker was initialized differently, and there is no RUNTIME_FUNCTION chain leading back to a clean thread start.

This is not a bug in Ekko. This is how timers work by design. They execute in a different context, and that context does not have the initialization frames your main thread has.

This is exactly why every Ekko style implementation requires call stack spoofing. The frames do not exist, so you have to fake them.

What happens if we patch back anyway?

For the sake of testing, I decided to patch .rdata back to see whether the stack actually unwinds properly, because at the moment .rdata gets encrypted and the unwinder can't find the UNWIND_INFO structures.

I modified Ekko to do exactly that. After encrypting the image, a timer fires RtlCopyMemory to restore the plaintext .rdata section:

The frames resolve now. No more garbage addresses:

You might notice we are only preserving .rdata here, not the PE headers or .pdata. So how does EkkoObf resolve in both cases?

For normally loaded executables, Windows parses and indexes the .pdata section at load time. When RtlLookupFunctionEntry needs to find which function contains a given RIP, it queries the already registered function table that the OS loader set up when it mapped the PE. It does not read .pdata directly from your image memory during unwinding. This is why EkkoObf resolves correctly regardless of whether we preserve .rdata or not.

The difference is what happens to the frames below. Without .rdata preserved, RtlVirtualUnwind cannot read the UNWIND_INFO structures to process the frame, so the debugger just dumps the raw stack value as a garbage address (0x1cc4f271470). With .rdata preserved, the unwinder successfully reads the unwind codes and calculates a "return address", but that address is still garbage from the thread pool context. It just happens to fall near NtDuplicateToken, so the debugger shows that symbol instead of a raw address.

This distinction matters for our later work with stomped modules. When you overload a beacon into a sacrificial DLL, the original DLL's function table is registered with the wrong functions. You must call RtlAddFunctionTable to register your beacon's .pdata, and that registration points directly at the in-memory .pdata. For stomped modules, we need to preserve headers, .pdata, and UNWIND_INFO because there is no OS loader caching our function table.

But the call stack is still missing the BaseThreadInitThunk and RtlUserThreadStart frames. This is the fundamental problem with timers. Even if we preserve all the unwind data perfectly, the execution context is still wrong. The thread pool worker thread was not initialized through the normal CreateThread path, so those frames simply do not exist to unwind through.

Preserving .rdata let the unwinder process our function, but it cannot fix the missing initialization frames. For that, we need to rethink where and how the sleepmask executes entirely.

The architecture - My POC

Again the POC is here - InsomniacUnwinding

Let me be upfront: this POC is for demonstration purposes only, and the actual architecture for C2 frameworks is discussed later in the blog. I built it as a simple implementation of the technique, without having to modify other projects, which would overcomplicate the explanation. Sorry, but actually implementing it is your job :P

The Goal

We need the beacon's thread to stay intact during sleep. The thread that called sleep must be the same thread that wakes up, with all its original initialization frames still on the stack.

The Solution: Cross Process Sleep Masking

The sleepmask lives in a completely separate process. The beacon does not encrypt itself. Instead, it asks another process to encrypt it from the outside.

The beacon connects to a named pipe, sends a request with its PID, image base, and sleep duration, then blocks on ReadFile waiting for a response. The sleepmask process receives the request, opens a handle to the beacon process, encrypts its memory, sleeps, decrypts, and sends a response back. The beacon wakes up and continues.

Why Spoofing Is Not Needed

In this POC both processes are regular executables, but imagine both were stomped into legitimate DLLs. The beacon lives in backed memory with registered .pdata. The sleepmask also lives in backed memory. When the sleepmask calls Sleep, ReadProcessMemory, WriteProcessMemory, those calls originate from backed addresses. No spoofing needed on either side.

The beacon's thread is just blocked on a pipe read. It never leaves its original context. If we keep the unwind data readable, the call stack will resolve cleanly through the stomped module, down to BaseThreadInitThunk and RtlUserThreadStart.

Preserving Unwind Data

From the sleepmask's perspective, the process is straightforward:

  1. Receive the sleep request containing the beacon's PID, image base, image size, and sleep duration

  2. Open a handle to the beacon process

  3. Read the beacon's entire image into a local buffer

  4. Save copies of the regions we need to preserve

  5. Encrypt the local buffer

  6. Patch back the preserved regions over the encrypted bytes

  7. Write the modified buffer back to the beacon process

  8. Sleep for the requested duration

  9. Repeat the process in reverse to decrypt
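Steps 3 to 7 and 9 can be sketched on a local buffer like this. The cross-process reads and writes are left out, and a repeating-key XOR stands in for whatever cipher the sleepmask uses, so treat the function names and the cipher as assumptions for illustration only.

```python
def xor_bytes(buf: bytearray, key: bytes) -> None:
    """In-place repeating-key XOR (stand-in cipher, an assumption)."""
    for i in range(len(buf)):
        buf[i] ^= key[i % len(key)]


def mask_image(image: bytes, preserved, key: bytes) -> bytes:
    """Encrypt `image` but keep the `preserved` regions in plaintext.

    preserved: list of (offset, size) pairs inside the image.
    """
    buf = bytearray(image)                       # step 3: local copy
    saved = [(off, bytes(buf[off:off + size]))   # step 4: save plaintext
             for off, size in preserved]
    xor_bytes(buf, key)                          # step 5: encrypt everything
    for off, plain in saved:                     # step 6: patch back
        buf[off:off + len(plain)] = plain
    return bytes(buf)                            # step 7: ready to write back


def unmask_image(masked: bytes, preserved, key: bytes) -> bytes:
    """Step 9. The preserved regions are already plaintext, so they must be
    saved and restored around the XOR again. With a symmetric cipher this is
    the same operation as masking."""
    return mask_image(masked, preserved, key)
```

The detail that is easy to get wrong is decryption: running the cipher over the whole buffer without saving the preserved regions first would scramble the bytes that were never encrypted.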

The IPC protocol is simple. We use the following structures to communicate and pass data between the two processes.
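As a rough idea of what such a protocol could look like on the wire, here is a hypothetical layout carrying the fields listed in step 1. The field order, widths, and names are assumptions and are not taken from the POC.

```python
import struct

# Hypothetical request: DWORD pid, DWORD sleep_ms, QWORD image_base,
# QWORD image_size (little-endian, no padding).
SLEEP_REQUEST = struct.Struct("<IIQQ")
# Hypothetical response: DWORD status, 0 meaning success.
SLEEP_RESPONSE = struct.Struct("<I")


def pack_request(pid: int, image_base: int, image_size: int, sleep_ms: int) -> bytes:
    """Serialize a sleep request as the beacon would send it over the pipe."""
    return SLEEP_REQUEST.pack(pid, sleep_ms, image_base, image_size)


def unpack_request(data: bytes) -> dict:
    """Parse a sleep request on the sleepmask side."""
    pid, sleep_ms, image_base, image_size = SLEEP_REQUEST.unpack(data)
    return {
        "pid": pid,
        "sleep_ms": sleep_ms,
        "image_base": image_base,
        "image_size": image_size,
    }
```

A fixed-size message like this lets both sides use a single blocking write and read per sleep cycle.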

First Implementation: Preserving Full Sections

Based on what we know about unwinding, we need to preserve:

  • PE headers (contains DataDirectory[IMAGE_DIRECTORY_ENTRY_EXCEPTION])

  • .pdata section (the RUNTIME_FUNCTION array)

  • .rdata section (contains UNWIND_INFO structures)

Now I know what you are thinking: ".rdata can contain signatures! And we are patching it back?!" At the moment yes, but don't worry, a surgical approach is showcased later (and is actually what is inside the POC on GitHub).

The preservation logic finds each section and saves a plaintext copy before encryption:
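A minimal sketch of that lookup, assuming a PE32+ image in mapped layout (RVA equals buffer offset). The function names are hypothetical, and the region list is simply headers, .pdata, and .rdata in full.

```python
import struct

SECTION_HEADER_SIZE = 40


def find_section(image: bytes, name: bytes):
    """Return (VirtualAddress, VirtualSize) of the named section, or None."""
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    num_sections = struct.unpack_from("<H", image, e_lfanew + 4 + 2)[0]
    opt_size = struct.unpack_from("<H", image, e_lfanew + 4 + 16)[0]
    table = e_lfanew + 4 + 20 + opt_size      # section table follows the headers
    for i in range(num_sections):
        off = table + i * SECTION_HEADER_SIZE
        sec_name = image[off:off + 8].rstrip(b"\0")
        vsize, vaddr = struct.unpack_from("<II", image, off + 8)
        if sec_name == name:
            return vaddr, vsize
    return None


def full_section_regions(image: bytes):
    """Regions to keep in plaintext: PE headers, .pdata, .rdata."""
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    opt = e_lfanew + 4 + 20
    size_of_headers = struct.unpack_from("<I", image, opt + 60)[0]
    regions = [(0, size_of_headers)]
    for name in (b".pdata", b".rdata"):
        found = find_section(image, name)
        if found is not None:
            regions.append(found)
    return regions
```

Each (offset, size) pair is then copied out of the plaintext buffer before the encryption pass runs.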

After encryption, we patch back the preserved regions:

The call stack now resolves correctly during sleep:

Every frame resolves through the beacon module down to BaseThreadInitThunk and RtlUserThreadStart. No spoofing required.

The Problem: YARA Still Hits

The call stack works, but we have a different problem. The .rdata section contains more than just UNWIND_INFO structures. It contains string literals, const arrays, import names, vtables, and other signaturable artifacts.

I added some signature bytes to the beacon to test this:

And created this YARA rule:

Running YARA during sleep:

The .data signature is encrypted and gone. But the .rdata signature is still there because we preserved the entire section. Any signatures living in .rdata remain visible to memory scanners during sleep.

Preserving full sections fixes stack walking but defeats the purpose of sleep masking. We need to preserve less.

Surgical UNWIND_INFO Extraction

We do not need the entire .rdata section. The stack unwinder only reads the specific UNWIND_INFO structures that .pdata references. Everything else in .rdata (strings, const data, vtables, import names) is irrelevant to stack walking.

The approach is simple: parse .pdata, follow each UnwindInfoAddress, calculate the exact size of each UNWIND_INFO structure, and preserve only those bytes.

Finding the UNWIND_INFO Structures

As we saw earlier, each RUNTIME_FUNCTION in .pdata contains an UnwindInfoAddress field pointing to an UNWIND_INFO structure in .rdata:

We iterate through every RUNTIME_FUNCTION, extract the UnwindInfoAddress, and record that location. Multiple functions can share the same UNWIND_INFO if they have identical prologs, so we track unique addresses to avoid duplicates.
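The deduplication step is small enough to sketch directly. Here the RUNTIME_FUNCTION entries are assumed to be already parsed into (BeginAddress, EndAddress, UnwindInfoAddress) tuples, and the function name is hypothetical.

```python
def unique_unwind_info_rvas(entries):
    """Collect each distinct UnwindInfoAddress once, in ascending order.

    entries: iterable of (begin_rva, end_rva, unwind_info_rva) tuples.
    """
    seen = set()
    for _begin, _end, unwind_rva in entries:
        seen.add(unwind_rva)
    return sorted(seen)
```

Sorting is not required for correctness, but it makes the later region list easier to read and to merge.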

Calculating UNWIND_INFO Size

UNWIND_INFO is a variable length structure. The size depends on the number of unwind codes and optional trailing data:

The size calculation:
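A minimal sketch of one way to compute it, based on the documented UNWIND_INFO layout: a 4-byte header, then CountOfCodes 2-byte slots padded to an even count, then either a chained RUNTIME_FUNCTION or a handler RVA. Any language-specific handler data that follows the handler RVA has no length field in this structure, so this sketch does not include it. The function name is hypothetical.

```python
UNW_FLAG_EHANDLER = 0x1
UNW_FLAG_UHANDLER = 0x2
UNW_FLAG_CHAININFO = 0x4


def unwind_info_size(image: bytes, rva: int) -> int:
    """Size in bytes of the UNWIND_INFO at `rva` (mapped layout assumed)."""
    flags = image[rva] >> 3            # byte 0: Version (3 bits), Flags (5 bits)
    count_of_codes = image[rva + 2]    # byte 2: CountOfCodes
    # 4-byte header plus the code array, padded to an even number of slots.
    size = 4 + ((count_of_codes + 1) & ~1) * 2
    if flags & UNW_FLAG_CHAININFO:
        size += 12                     # chained RUNTIME_FUNCTION entry
    elif flags & (UNW_FLAG_EHANDLER | UNW_FLAG_UHANDLER):
        size += 4                      # exception handler RVA
    return size
```

For example, a structure with three unwind codes and no flags occupies 4 + 4 * 2 = 12 bytes, because the odd code count is padded to four slots.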

The Extraction Logic

We build a list of regions to preserve: PE headers, .pdata, and each individual UNWIND_INFO:
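Since many UNWIND_INFO structures sit back to back, the list can optionally be coalesced before saving. This merge step is an assumption added for illustration and may not be how the POC handles it. Regions are (offset, size) pairs.

```python
def merge_regions(regions):
    """Sort (offset, size) regions and merge any that overlap or touch."""
    merged = []
    for start, size in sorted(regions):
        end = start + size
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent: extend the previous region.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end - start) for start, end in merged]
```

With headers at offset 0, .pdata at 0x3000, and three neighbouring UNWIND_INFO structures starting at 0x2000, the result is three regions instead of five, which means fewer copy operations per sleep cycle.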

Result

Sleepmask Output:

Yara scan:

For comparison, the size of .rdata is 6108 bytes:

And we only patched back 252 bytes!!! That's about 4% of the section. The other 96% (strings, const data, signatures) gets encrypted.

Real World Architecture

The POC uses two separate processes for simplicity. In production, this would be a bit more complex to implement.

Same process, two stomped modules

The beacon and sleepmask would both live in the same process, each stomped into a different sacrificial DLL. For example:

  • Beacon stomped into msxml6.dll

  • Sleepmask stomped into WsmSvc.dll

When beacon calls the sleepmask to encrypt and sleep, the call stack would show frames from both modules. This is fine because both are backed by legitimate DLLs on disk. The entire chain resolves to backed addresses.

It would probably look something like this:

The beacon frames point to encrypted memory, but the addresses are still valid and the UNWIND_INFO is preserved. The stack walks correctly.

Cobalt Strike 4.10+ Integration

Cobalt Strike 4.10 introduced the ALLOCATED_MEMORY structure for User Defined Reflective Loaders. This allows operators to control how and where BOFs are allocated.

Before 4.10, the sleepmask BOF was allocated wherever Cobalt Strike decided, using one of the methods exposed via the Malleable C2 options. This usually results in unbacked RX memory. With ALLOCATED_MEMORY, you can specify a custom allocation strategy. This means the sleepmask BOF could be loaded into a second stomped module, keeping the entire execution chain in backed memory.

The Critical Point

The InsomniacUnwinding technique works regardless of architecture. The important constraint is where the sleepmask executes from.

If the sleepmask runs from unbacked memory, the call stack will show unbacked return addresses during the Sleep call. You are back to needing spoofing.

If the sleepmask runs from backed memory (stomped module), the call stack is clean. No spoofing needed.

The unwind preservation handles the beacon side. The sleepmask location handles the sleep call side. Both must be in backed memory for a fully clean call stack without spoofing.

Conclusions

This research was done out of pure fun and curiosity. I also like to challenge myself when some techniques are considered mandatory to achieve the desired output.

I know this screenshot demonstrates nothing. I just find it funny xD

I hope to see this technique implemented by someone less lazy than myself. For now this is all. Happy hacking!