Mastering Malware Analysis: Reverse Engineering and Dynamic Binary Instrumentation for Real-World Samples
Learn how to analyze real-world malware using static analysis, symbolic execution, and dynamic binary instrumentation. This tutorial covers reverse engineering with Ghidra, triggering hidden behaviors with angr, and using DynamoRIO to manipulate execution paths.
Introduction to Malware Analysis in 2026
In today's cybersecurity landscape, malware continues to evolve, often hiding its true intentions until triggered by specific commands. As a malware analyst, your mission is to uncover these hidden behaviors so that defenders can build robust protections. This tutorial, modeled on a course project like those in CS6264, guides you through static analysis, symbolic execution, and dynamic binary instrumentation using Ghidra, angr, and DynamoRIO. Whether you're reversing a MyDoom variant or a novel C2-based threat, these techniques are essential for understanding the full capabilities of a sample.
Setting Up Your Malware Analysis Environment
Before diving into analysis, you need a safe, isolated environment. Use VirtualBox to import a pre-configured VM (e.g., Ubuntu 18.04 with a Windows 7 guest). This setup greatly reduces the risk that malware escapes and affects your host system. Shared folders (reachable from a Windows guest under \\VBOXSVR) allow easy transfer of samples and tools. Remember: never run malware on your personal machine.
Prerequisites
- VirtualBox (latest version)
- Windows 7 VM for dynamic analysis
- Ubuntu 18.04 VM for static analysis and tool execution
- Ghidra (NSA's reverse engineering framework)
- angr (symbolic execution engine)
- DynamoRIO (dynamic binary instrumentation framework)
Static Analysis with Ghidra: Finding the CMD Dispatching Logic
Static analysis is your first step. Using Ghidra, you can disassemble the binary and identify key functions. In many malware samples, the CMD dispatching logic is a function that parses incoming commands and decides which malicious action to execute. For example, a MyDoom variant might check for strings like "exec" or "download".
Reverse Engineering Candidate Functions
Your company's static analysis tool may flag several candidate functions. You must manually reverse each one. Look for conditional branches, string comparisons, and function pointers. In Ghidra, use the decompiler to view high-level C-like code. Often, the dispatcher will have a switch-case structure or a series of if-else statements comparing input against hardcoded values.
Tip: Rename variables and functions in Ghidra to make the logic clearer. This is a common practice in reverse engineering.
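Once the decompiler output is cleaned up, restating the dispatcher in a high-level language is a quick way to check your understanding. The following is a hypothetical sketch: the command strings, handler names, and prefix-matching scheme are invented for illustration and stand in for whatever your sample actually compares against.

```python
# Hypothetical reconstruction of a command dispatcher, written the way an
# analyst might restate cleaned-up Ghidra decompiler output. The command
# strings and handler names are invented for illustration.

def handle_exec(arg: bytes) -> str:
    return f"would execute {arg!r}"


def handle_download(arg: bytes) -> str:
    return f"would download from {arg!r}"


# Many samples compare the input against a table of hardcoded prefixes
# (e.g. a loop over strncmp), which decompiles to an if-else chain.
COMMAND_TABLE = [
    (b"exec ", handle_exec),
    (b"download ", handle_download),
]


def dispatch(buf: bytes) -> str:
    for prefix, handler in COMMAND_TABLE:
        if buf.startswith(prefix):
            return handler(buf[len(prefix):])
    # Fall-through path: the sample ignores unknown commands.
    return "unrecognized command: default branch"


if __name__ == "__main__":
    print(dispatch(b"exec calc.exe"))
    print(dispatch(b"ping"))
```

Each `(prefix, handler)` pair corresponds to one branch you would mark in Ghidra; the default branch is worth noting too, since it tells you what "no command" looks like in a trace.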
Symbolic Execution with angr: Extracting Trigger Commands
Once you've identified the CMD dispatching logic, you can use symbolic execution to automatically discover the commands that trigger each behavior. angr allows you to treat input as symbolic and explore paths that lead to specific code regions.
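To make the idea concrete without angr, here is a toy model of symbolic execution. It assumes a hypothetical dispatcher whose branches only test single input bytes for equality, so each path's constraints are simple "byte i == c" or "byte i != c" facts that can be solved without an SMT solver. It is a sketch of the concept, not how angr is implemented.

```python
# Toy symbolic execution over a hypothetical dispatcher. Every branch tests
# one input byte for equality with a constant, so a path constraint is a set
# of "byte[i] == c" / "byte[i] != c" facts that can be solved directly.
# This illustrates the idea behind angr; it is not how angr works internally.

# Node forms: ("cmp", offset, value, if_equal, if_not_equal) or ("action", name)
PROGRAM = (
    "cmp", 0, ord("O"),
    ("cmp", 1, ord("P"), ("action", "open_file"), ("action", "idle")),
    ("cmp", 0, ord("D"),
        ("cmp", 1, ord("E"), ("action", "delete_file"), ("action", "idle")),
        ("action", "idle")),
)


def concretize(eq, ne):
    """Pick one concrete byte per constrained offset (a model of the path constraints)."""
    length = max([*eq, *ne], default=-1) + 1
    out = bytearray()
    for i in range(length):
        if i in eq:
            out.append(eq[i])
        else:
            # Any non-excluded byte satisfies "!=" constraints; prefer printable ASCII.
            out.append(next(b for b in range(0x20, 0x7F) if b not in ne.get(i, set())))
    return bytes(out)


def explore(node, target, eq=None, ne=None):
    """Depth-first path exploration; returns an input reaching `target`, or None."""
    eq = eq or {}
    ne = ne or {}
    if node[0] == "action":
        return concretize(eq, ne) if node[1] == target else None
    _, off, val, if_eq, if_ne = node
    # Equal edge is feasible unless byte[off] is pinned elsewhere or excluded.
    if eq.get(off, val) == val and val not in ne.get(off, set()):
        found = explore(if_eq, target, {**eq, off: val}, ne)
        if found is not None:
            return found
    # Not-equal edge is feasible unless byte[off] is already pinned to val.
    if eq.get(off) != val:
        return explore(if_ne, target, eq, {**ne, off: ne.get(off, set()) | {val}})
    return None


if __name__ == "__main__":
    for action in ("open_file", "delete_file"):
        print(action, explore(PROGRAM, action))
```

Real binaries produce far richer constraints (arithmetic, string lengths, memory), which is why angr hands them to an SMT solver instead of solving them by hand.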
Writing an angr Script
For example, to find the command that triggers a malicious behavior at address 0x401234, you can write a script that:
- Loads the binary into angr's project.
- Sets up a symbolic stdin or argument.
- Explores execution until it reaches the target address.
- Concretizes the symbolic input to get the exact command.
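```python
import angr

# Load the sample without pulling in system DLLs; unresolved imports get stubs.
p = angr.Project('malware.exe', auto_load_libs=False)

# Start at the program entry point; stdin is symbolic by default.
state = p.factory.entry_state()
simgr = p.factory.simulation_manager(state)

# Address of the malicious behavior identified in Ghidra.
target = 0x401234
simgr.explore(find=target)

if simgr.found:
    found_state = simgr.found[0]
    # Concretize whatever was read from stdin (fd 0) on the path to the target.
    command = found_state.posix.dumps(0)
    print('Command found:', command)
else:
    print('Target not reached')
```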
This script reveals the input that leads to the hidden behavior. In a real scenario, you might need to refine the exploration strategy to avoid infinite loops or unsupported instructions.
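The effect of refining the exploration strategy can be shown on a toy control-flow graph. This sketch assumes a hypothetical CFG (block address mapped to successor addresses) and mirrors two ideas: angr's `explore` also accepts an `avoid` argument for pruning paths, and bounding how often a loop is unrolled keeps exploration from running forever. The loop bound and addresses are illustrative.

```python
# Toy model of bounded path exploration with find/avoid. The CFG below
# (basic-block address -> successor addresses) is hypothetical.
from collections import deque

CFG = {
    0x401000: [0x401010],
    0x401010: [0x401010, 0x401020],  # self-loop, e.g. a decoder or busy-wait loop
    0x401020: [0x401100, 0x401200],
    0x401100: [],                    # anti-analysis exit we want to avoid
    0x401200: [0x401234],
    0x401234: [],                    # target behavior
}


def explore(cfg, start, find, avoid=(), loop_bound=3):
    """Breadth-first search over paths; a block may appear at most loop_bound times per path."""
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == find:
            return path
        for succ in cfg.get(path[-1], []):
            if succ in avoid:
                continue  # prune paths entering known-bad regions
            if path.count(succ) >= loop_bound:
                continue  # stop unrolling this loop on this path
            queue.append(path + [succ])
    return None


if __name__ == "__main__":
    path = explore(CFG, 0x401000, 0x401234, avoid={0x401100})
    print([hex(a) for a in path])
```

Without the loop bound, the self-loop at `0x401010` would keep generating new paths; with it, the search always terminates.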
Dynamic Binary Instrumentation with DynamoRIO: Triggering and Tracing
With the command in hand, you can now execute the malware in a controlled environment and observe its effects. DynamoRIO lets you instrument the binary at runtime, modify execution paths, and collect detailed traces.
Setting Up DynamoRIO
On the Windows 7 VM, navigate to C:\code\dynamorio. Build the DynamoRIO client (e.g., a basic block tracer) and run the malware with the discovered command. Depending on what it instruments, the client can log API calls, file operations, or registry modifications.
Example command to run the malware under the tracer:

```
C:\code\concrete_executor\run.py malware.exe --command "EXEC"
```
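If you launch DynamoRIO directly instead of through run.py, `drrun` takes the client DLL after `-c` and the target program after `--`. This sketch only builds that command line. The install path, the `bin32` launcher choice, the client DLL name, and passing the command as a program argument are all assumptions; the course wrapper may deliver the command differently.

```python
# Sketch of launching a DynamoRIO client with drrun. All paths and the
# client DLL name are hypothetical; adjust them to your build.
import subprocess
import sys

DRRUN = r"C:\code\dynamorio\bin32\drrun.exe"     # 32-bit launcher for a 32-bit sample (assumed)
CLIENT = r"C:\code\dynamorio\build\bbtrace.dll"  # hypothetical basic-block tracer client


def build_drrun_cmd(app, app_args=(), client=CLIENT, client_args=(), drrun=DRRUN):
    """Return the argv list: drrun -c <client> [client args] -- <app> [app args]."""
    return [drrun, "-c", client, *client_args, "--", app, *app_args]


if __name__ == "__main__":
    # Assumption: the sample reads its command from argv.
    cmd = build_drrun_cmd("malware.exe", ["EXEC"])
    print(" ".join(cmd))
    if "--run" in sys.argv:  # only launch when explicitly asked, inside the analysis VM
        subprocess.run(cmd, check=False)
```

Keeping the launch behind an explicit `--run` flag makes it harder to start the sample by accident.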
Collecting Dynamic Traces
The tracer will generate a trace file showing the sequence of basic blocks executed and API calls made. Analyze this trace to understand what the malware does: does it create a file? Connect to a network? Modify registry keys? This evidence is crucial for your report.
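A small script can turn a long trace into the questions above (file, network, registry). The trace format here is an assumption: `BB <addr>` for each basic block and `API <module>!<function>` for each call. Adapt the parser to whatever your tracer actually writes. The API names are standard Windows functions grouped by the OS effect they suggest.

```python
# Sketch of summarizing a trace into behavior categories. The line format
# ("BB <addr>" / "API <module>!<function>") is an assumption.
from collections import Counter

CATEGORIES = {
    "file": {"CreateFileA", "CreateFileW", "WriteFile", "DeleteFileA", "DeleteFileW",
             "CopyFileA", "CopyFileW"},
    "registry": {"RegCreateKeyExA", "RegCreateKeyExW", "RegSetValueExA", "RegSetValueExW",
                 "RegDeleteKeyA", "RegDeleteKeyW"},
    "network": {"WSAStartup", "socket", "connect", "send", "recv", "gethostbyname"},
}


def summarize(lines):
    """Return (basic-block count, Counter of API calls per category)."""
    counts = Counter()
    blocks = 0
    for line in lines:
        kind, _, rest = line.strip().partition(" ")
        if kind == "BB":
            blocks += 1
        elif kind == "API":
            func = rest.rpartition("!")[2]  # drop the module prefix if present
            for category, names in CATEGORIES.items():
                if func in names:
                    counts[category] += 1
    return blocks, counts


if __name__ == "__main__":
    sample = ["BB 0x401000", "API kernel32.dll!CreateFileW", "API ws2_32.dll!connect"]
    print(summarize(sample))
```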
Case Study: Analyzing a MyDoom-like Sample (malware1.exe)
Assume malware1.exe is similar to a previously analyzed sample. Your colleague's report may have identified the CMD dispatching logic but failed to demonstrate OS effects. Using the techniques above, you can:
- Reverse the binary to confirm the dispatcher function.
- Use angr to extract commands like "OPEN", "DELETE", or "EMAIL".
- Run the malware with each command under DynamoRIO and capture file system changes.
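Alongside the trace, a before/after snapshot is a simple way to capture file system changes for each command. This sketch hashes every file under a directory you choose, then reports created, deleted, and modified paths. Which directories to watch is up to you; it only sees files under the given root.

```python
# Sketch of capturing file system changes by snapshotting a directory before
# and after running a sample, then diffing the snapshots.
import hashlib
import os


def snapshot(root):
    """Map each file path under root to the SHA-256 of its contents."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    state[path] = hashlib.sha256(f.read()).hexdigest()
            except OSError:
                pass  # locked or vanished files are skipped
    return state


def diff(before, after):
    """Return sorted lists of (created, deleted, modified) paths."""
    created = sorted(after.keys() - before.keys())
    deleted = sorted(before.keys() - after.keys())
    modified = sorted(p for p in before.keys() & after.keys() if before[p] != after[p])
    return created, deleted, modified
```

Take one snapshot, run the sample with a single command, take a second snapshot, and attribute the diff to that command.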
For instance, the "EMAIL" command might cause the malware to enumerate email addresses from the local machine and send spam. The trace would show calls to MAPI functions and network sockets.
Advanced Analysis: Unknown Malware and Bonus Challenges
For a newly discovered complex malware (unknown.exe), you may need to perform deeper static analysis on multiple functions. The goal is to reconstruct the CMD dispatching logic from scratch. Use Ghidra's graphing features to visualize control flow and identify the main command handler.
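When there are many candidates, a rough ranking helps decide what to reverse first. The heuristic below is an assumption, not a Ghidra feature: a command handler often calls many distinct functions and references several string constants. The call graph and string counts are hypothetical stand-ins for data you might export from Ghidra.

```python
# Heuristic sketch for shortlisting dispatcher candidates. The call graph and
# string-reference counts are hypothetical; the scoring rule is an assumption.

CALL_GRAPH = {
    "FUN_00401000": ["FUN_00401500", "FUN_00403000"],
    "FUN_00401500": ["FUN_00402000", "FUN_00402100", "FUN_00402200", "FUN_00402300"],
    "FUN_00402000": [],
    "FUN_00402100": ["FUN_00403000"],
    "FUN_00402200": [],
    "FUN_00402300": [],
    "FUN_00403000": [],
}
STRING_REFS = {"FUN_00401500": 4, "FUN_00402000": 1}


def rank_candidates(call_graph, string_refs, top=3):
    """Score = distinct callees + referenced strings; ties broken by name."""
    scores = {
        fn: len(set(callees)) + string_refs.get(fn, 0)
        for fn, callees in call_graph.items()
    }
    return sorted(scores, key=lambda fn: (-scores[fn], fn))[:top]


if __name__ == "__main__":
    print(rank_candidates(CALL_GRAPH, STRING_REFS))
```

Treat the output as a reading order, not a verdict; the control-flow graph of each top candidate still has to be checked by hand.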
Bonus: Whole or Partial Input Detection
If you can determine the entire input structure (e.g., a packet format), you can simulate C2 communication. Symbolic execution can help uncover constraints on input fields. For example, the malware might expect a header with a magic number followed by a command byte.
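Once the constraints point to a layout, you can encode it and craft messages for a simulated C2 server. This sketch assumes a hypothetical format: a 4-byte little-endian magic number, a 1-byte command, a 2-byte payload length, then the payload. The magic value and field sizes are illustrative only.

```python
# Sketch of building and parsing a C2 message under a hypothetical layout:
# magic (uint32), command (uint8), payload length (uint16), payload.
import struct

MAGIC = 0xDEADBEEF                # hypothetical magic number
HEADER = struct.Struct("<IBH")    # little-endian, no padding: 7 bytes


def build_message(command, payload=b""):
    return HEADER.pack(MAGIC, command, len(payload)) + payload


def parse_message(data):
    """Mirror the checks a sample might perform; raise ValueError on malformed input."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    magic, command, length = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("bad magic")
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return command, payload


if __name__ == "__main__":
    msg = build_message(0x02, b"hello")
    print(msg.hex(), parse_message(msg))
```

Writing the parser to reject the same inputs the sample rejects is a useful cross-check on the constraints you recovered.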
Triggering Hidden Behaviors via DBI
Even without knowing the exact command, you can use DynamoRIO to force execution down specific paths. By modifying register values or memory at key decision points, you can bypass checks and activate malicious routines. This is useful when symbolic execution is too complex.
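The principle can be shown with a toy interpreter in place of real DBI. Hooks run before the instruction at a given program counter and rewrite registers or flags, much as a DynamoRIO client might do in a clean call that reads and writes the machine context (for example via `dr_get_mcontext`/`dr_set_mcontext`). The program, register names, and hook functions are hypothetical, and this is not the DynamoRIO API.

```python
# Toy model of branch forcing with instrumentation hooks. Not the DynamoRIO API.

# Hypothetical program: run the payload only if the command id in eax is 7.
PROGRAM = [
    ("cmp", "eax", 7),               # 0: zf = (eax == 7)
    ("jne", 3),                      # 1: jump to 3 if zf == 0
    ("call", "malicious_routine"),   # 2: hidden behavior
    ("halt",),                       # 3
]


def run(program, regs, hooks=None):
    """Interpret the program; hooks[pc](state) runs before the instruction at pc."""
    hooks = hooks or {}
    state = {"pc": 0, "zf": 0, **regs}
    called = []
    while True:
        pc = state["pc"]
        if pc in hooks:
            hooks[pc](state)         # instrumentation may rewrite registers/flags
        op = program[pc]
        state["pc"] = pc + 1
        if op[0] == "cmp":
            state["zf"] = int(state[op[1]] == op[2])
        elif op[0] == "jne":
            if not state["zf"]:
                state["pc"] = op[1]
        elif op[0] == "call":
            called.append(op[1])
        elif op[0] == "halt":
            return called


def set_eax_to_7(state):
    state["eax"] = 7                 # modify a register before the comparison


def force_zf(state):
    state["zf"] = 1                  # pretend the comparison succeeded


if __name__ == "__main__":
    print(run(PROGRAM, {"eax": 0}, hooks={1: force_zf}))
```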
Connecting to Current Trends: AI and Malware
In 2026, AI-generated malware is increasingly common. Attackers use large language models to craft polymorphic code that evades signature detection. However, the fundamentals of malware analysis remain the same: static analysis to understand structure, symbolic execution to explore logic, and dynamic instrumentation to observe behavior. By mastering these techniques, you stay ahead of threats, whether they are classic worms or AI-driven trojans.
Conclusion
Malware analysis is a blend of art and science. By combining reverse engineering, symbolic execution, and dynamic binary instrumentation, you can uncover the full extent of a sample's capabilities. This tutorial has provided a practical workflow for analyzing real-world samples like those in CS6264. Remember to always work in a VM, document your findings, and share your insights with the security community.
For further reading, explore angr's documentation and DynamoRIO's sample clients. Practice on public malware datasets (e.g., VirusShare) to hone your skills.