Archive for the ‘Reverse Engineering’ Category


Continuing the reverse engineering and software development series on debugging, below I review the role of bits in debugging.

Bits 0-7 are the on/off switches of breakpoints.
Bits 8-15 are used for non-debugging purposes in DR7
Bits 16-31 determine the type and length of the breakpoint that is being set

Bits’ L and G fields are defined as followed:
L — local space
G — global space

 To Do: Add to this post or merge it with prior debugging posts

Breakpoints and other Debug Events

There are 3 major types of events you encounter when debugging:

  1. Breakpoints
  2. Memory violations (!!)
  3. Exceptions
buffer overflow, for example,  is probably the most well-known for of memory violation and is both a notorious headache for end-users of buggy software and a breath of fresh air for reverse engineers.
     There are 3 types of breakpoints of interest:
  1. Software Breakpoints
  2. Hardware Breakpoints
  3. Memory Breakpoints
Soft breakpoints are used specifically to halt the CPU and are the most common type of breakpoint. Soft breakpoints are sing-byte instructions that stop execution of the debugger process. In RStudio you can set a soft breakpoint simply by clicking on the left margin of a particular line or function (or by choosing “Rerun with Debugger” in the error traceback dialog). Soft breakpoints can be persistent if they remain after the CPU has executed its tasks or one-shot if they are cleared out. Soft breakpoints are associated with the interrupt 3 (INT3) event.

Using assembly language, to process a soft breakpoint the single byte instruction must be converted into an operation code (a.k.a. an opcode).

In x86 Assembly language an instruction looks like:
which tells the CPU to move the value stored in the EBX register to the EAX register. In native opcode, this would just be:
A common analogy is that x86 Assembly language is to individual opcodes what DNS is to IP addresses. Memorizing thousands of opcodes is impractical, whereas assembly language is much more manageable.

Hardware breakpoints, associated with the INT1 event, are useful when a small number of breakpoints are needed and the software can’t be modified. When registers are used in this way they’re known as debug registers.

Most CPU’s have 8 debug registers denoted as DR0 through DR7. Up to 4 of the 8 registers can be used for breakpoints and each can only set up to 4 bytes at a time. This means you can only use up to 4 hardware breakpoints at a time. DR0 to DR3 is usually reserved for the breakpoint addresses. DR4 and DR5 are reserved and DR6 is the status register (which determines the type of debugging event triggered by a breakpoint).
Unlike soft and hard breakpoints, memory breakpoints are not true breakpoints. Rather, a memory breakpoint refers to the permissions on a region or page of memory set by a debugger.
memory page is the smallest portion of memory that an OS deals with.
Examples of memory page permissions:
  1. Page execution
  2. Page read
  3. Page write
  4. Guard page
The guard page permission allows us to separate the heap and the stack and can ensure that a chunk of memory does not grow past a given boundary. It does so by  throw a one-time exception whenever accessed and then returning to its original status.

Introduction to the Stack

A fundamental concept in computer science is the stack. Ironically, as coding and scripting becomes more and more removed from the underlying basic software and hardware operations, fewer and fewer code authors have an understanding of concepts such as this… but if nothing else you must’ve wondered where the name StackOverflow comes from, right?

The stack stores information about how a function is called, the parameters it takes, and how it should return after it is finished executing.

Some important points to keep in mind:

  • The stack is a First In, Last Out (FILO) structure
  • Arguments are pushed onto the stack for a function call and popped off the stack when the function is finished
  • The stack grows from high memory addresses to low memory addresses


CPU Registers: an Overview

Register: a small amount of storage on the CPU; the fastest method for a CPU to access data

In the x86 instruction set, a CPU uses 8 general-purpose registers:
  1. EAX — a.k.a. the accumulator register, used for performing calculations as well as storing return values
  2. EDX — a.k.a. the data register, basically an extension of EAX
  3. ECX — a.k.a. the count register, used for looping, counts DOWNWARD not upward
  4. ESI  — a.k.a. the source index, used for reading, holds the location of the input data stream
  5. EDI  — a.k.a. the destination index, used for writing, points to the location where the result is stored
  6. EBP  — a.k.a. the base pointer, used for managing function calls and stack operations, points to the bottom of the stack unless freed up from this function by the compiler, in which case it would be an extra general purpose register
  7. ESP  — a.k.a. the stack pointer, used for managing function calls and stack operations, points to the very top of the stack
  8. EBX  — an extra register, not designed for anything specific

    Another register worth mentioning is:
  9. EIP — a.k.a. the instruction pointer, points to the current instruction that is being executed; as binary code is being executed by the CPU the EIP is updated to reflect the location where the execution is occurring