Ghidra
A practical guide to Ghidra for SOC analysts and malware reverse engineers — installation, the decompiler, disassembly analysis, scripting with Python/Java, collaborative analysis, and using Ghidra for malware triage.
What Ghidra Is and Why Analysts Use It
- Ghidra is a software reverse engineering (SRE) framework developed by the NSA’s Research Directorate and released as open source in 2019. It includes a disassembler, a decompiler, a debugger (added in Ghidra 10.0, with GDB, LLDB, and WinDbg back ends), and a scripting environment.
- Reverse engineering supports the analysis of MITRE ATT&CK techniques such as T1204 (User Execution): understanding what a malware sample does requires seeing the actual code, not just observing its behavior.
- Where IDA Pro is the commercial standard (costing thousands per license), Ghidra is completely free, and its decompiler is widely considered comparable to IDA’s Hex-Rays decompiler. The key difference is that Ghidra’s decompiler is included, not a paid add-on.
- Ghidra’s collaborative server lets a team of analysts work on the same binary through a shared, versioned repository: each analyst checks out the program, and annotations, function names, and comments reach the rest of the team at check-in.
Installation and Setup
System Requirements
| Requirement | Minimum | Recommended |
|---|---|---|
| RAM | 4 GB | 16 GB (larger binaries need more) |
| JDK | JDK 21 for Ghidra 11.2 and later (11.0 and 11.1 ran on JDK 17) | JDK 21 |
| Disk | 2 GB | 10 GB (script outputs, analysis caches) |
| OS | Windows, macOS, Linux | Linux for best performance |
Installation
```shell
# Linux: download and extract (check the releases page for the current build)
wget https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_11.2.1_build/ghidra_11.2.1_PUBLIC_20241105.zip
unzip ghidra_11.2.1_PUBLIC_20241105.zip
cd ghidra_11.2.1_PUBLIC/
# Run Ghidra
./ghidraRun

# macOS
brew install --cask ghidra

# Windows: download the zip, extract it, and run ghidraRun.bat
```
First Launch
- Create a new project (Non-Shared for single-user, Shared for collaborative)
- Import a binary (PE, ELF, Mach-O, or raw binary)
- Ghidra runs auto-analysis — this takes 30 seconds to several minutes depending on binary size
Key Features — Disassembler and Decompiler
Ghidra’s two most important views are the Listing (disassembly) and the Decompiler (pseudocode).
Listing (Disassembly)
The listing view shows the raw assembly instructions. Ghidra annotates addresses, opcodes, and operands, and uses color coding to distinguish code, data, and undefined bytes.
```text
Address   Bytes           Instruction      Comment
00401000  55              PUSH EBP         ; Save base pointer
00401001  8B EC           MOV EBP, ESP     ; Set up stack frame
00401003  83 EC 0C        SUB ESP, 0Ch     ; Allocate local variables
00401006  68 00 40 42 00  PUSH 0x424000    ; Push address of string
0040100B  E8 10 00 00 00  CALL printf      ; Call printf
00401010  33 C0           XOR EAX, EAX     ; Return 0
00401012  8B E5           MOV ESP, EBP     ; Restore stack
00401014  5D              POP EBP          ; Restore base pointer
00401015  C3              RET              ; Return
```
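How a disassembler turns the bytes `E8 10 00 00 00` into a call target can be checked by hand. The following is a minimal sketch, assuming a 32-bit near `CALL` (opcode `E8` followed by a signed little-endian displacement); the helper name `call_target` is illustrative and not part of Ghidra.

```python
import struct

def call_target(instr_addr, instr_bytes):
    """Return the destination of an x86 near CALL (opcode E8, rel32)."""
    if len(instr_bytes) != 5 or instr_bytes[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 rel32 CALL")
    # The 32-bit displacement is stored little-endian and is signed.
    (rel32,) = struct.unpack("<i", instr_bytes[1:5])
    # The displacement is relative to the address of the NEXT instruction.
    next_instr = instr_addr + len(instr_bytes)
    return (next_instr + rel32) & 0xFFFFFFFF

# The CALL at 0040100B in the listing above
target = call_target(0x0040100B, bytes.fromhex("E810000000"))
print(hex(target))
```

For the listing above this gives 0x401010 + 0x10, so the symbol Ghidra labels `printf` would sit at 0x00401020 in this example.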
Decompiler
The decompiler converts assembly into a C-like pseudocode. This is Ghidra’s killer feature — reading decompiled code is far faster than reading assembly.
```c
// Decompiled Windows API call pattern
void entry(void) {
    int iVar1;
    HANDLE hProcess;
    LPVOID lpBaseAddress;
    HANDLE hThread;

    hProcess = OpenProcess(PROCESS_ALL_ACCESS, 0, 0x1234);
    if (hProcess != (HANDLE)0x0) {
        lpBaseAddress = VirtualAllocEx(hProcess, (LPVOID)0x0, 0x100, 0x3000, 0x40);
        WriteProcessMemory(hProcess, lpBaseAddress, &shellcode, 0x100, (SIZE_T *)0x0);
        hThread = CreateRemoteThread(hProcess, (LPSECURITY_ATTRIBUTES)0x0, 0,
                                     (LPTHREAD_START_ROUTINE)lpBaseAddress, (LPVOID)0x0, 0, (LPDWORD)0x0);
        if (hThread != (HANDLE)0x0) {
            WaitForSingleObject(hThread, 0xFFFFFFFF);
        }
    }
    return;
}
```
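This decompiled output shows the sample opening another process, allocating memory in it, writing shellcode, and starting a remote thread: classic Process Injection (T1055). Because it writes shellcode rather than a DLL path, it falls under the parent technique rather than DLL injection (T1055.001).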
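The raw constants in the `VirtualAllocEx` call (`0x3000` and `0x40`) carry most of the meaning in this pattern. The sketch below decodes them with a small lookup table; the values are the standard Windows SDK definitions, while the function name `decode_flags` is only an illustration. Inside Ghidra, the Set Equate action can label such constants directly.

```python
# Values as defined in the Windows SDK headers (winnt.h).
ALLOCATION_TYPES = {
    0x1000: "MEM_COMMIT",
    0x2000: "MEM_RESERVE",
    0x80000: "MEM_RESET",
    0x100000: "MEM_TOP_DOWN",
}
PAGE_PROTECTIONS = {
    0x01: "PAGE_NOACCESS",
    0x02: "PAGE_READONLY",
    0x04: "PAGE_READWRITE",
    0x10: "PAGE_EXECUTE",
    0x20: "PAGE_EXECUTE_READ",
    0x40: "PAGE_EXECUTE_READWRITE",
}

def decode_flags(value, table):
    """Render an integer as the OR of the known constants it contains."""
    names = [name for bit, name in sorted(table.items()) if value & bit]
    leftover = value & ~sum(table)  # bits not covered by the table
    if leftover:
        names.append(hex(leftover))
    return " | ".join(names) if names else "0"

print(decode_flags(0x3000, ALLOCATION_TYPES))
print(decode_flags(0x40, PAGE_PROTECTIONS))
```

Memory that is committed, reserved, and marked readable, writable, and executable in a remote process is a strong hint that code is about to be written and run there.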
Scripting Ghidra with Python (Jython) and Java
Python Scripting (Jython)
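Ghidra’s built-in Python scripting runs on Jython, an implementation of Python 2.7 on the JVM, so Python 3 syntax such as f-strings is not available (Ghidra 11.3 and later also ship PyGhidra for CPython 3). Scripts can automate analysis, extract data, or modify the program database.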
```python
# GetCurrentFunction.py: list all functions in the current binary
# Runs under Ghidra's Jython (Python 2.7), so it uses % formatting, not f-strings.
fm = currentProgram.getFunctionManager()
functions = fm.getFunctions(True)  # True = iterate in ascending address order

print("Functions in %s:" % currentProgram.getName())
for func in functions:
    print("  %s @ 0x%s" % (func.getName(), func.getEntryPoint()))
```
```python
# ExportStrings.py: print every defined string with its address
listing = currentProgram.getListing()

for data in listing.getDefinedData(True):  # True = ascending address order
    # hasStringValue() is true for data whose value is a string
    if data.hasStringValue():
        print("0x%s: %s" % (data.getAddress(), data.getDefaultValueRepresentation()))
```
```python
# FindMutexAPI.py: locate CreateMutex / CreateMutexEx symbols and what references them
from ghidra.program.model.symbol import SymbolType

MUTEX_APIS = ["CreateMutexA", "CreateMutexW", "CreateMutexExA", "CreateMutexExW"]

symbol_table = currentProgram.getSymbolTable()
found = 0
for name in MUTEX_APIS:
    for symbol in symbol_table.getSymbols(name):
        if symbol.getSymbolType() != SymbolType.FUNCTION:
            continue
        found += 1
        print("%s @ %s" % (symbol.getName(True), symbol.getAddress()))
        # For imports these references usually come from the import table
        # entry or a thunk; follow them in the Listing to reach the call sites.
        for ref in symbol.getReferences():
            print("    referenced from %s" % ref.getFromAddress())

print("Mutex-related functions found: %d" % found)
```
Java Scripting
Java scripts use the same Ghidra API as Python scripts and generally run faster than their Jython equivalents:
```java
// FindStrings.java: list all defined strings in the binary
import ghidra.app.script.GhidraScript;
import ghidra.program.model.listing.Data;
import ghidra.program.model.listing.DataIterator;
import ghidra.program.model.listing.Listing;

public class FindStrings extends GhidraScript {
    @Override
    public void run() throws Exception {
        Listing listing = currentProgram.getListing();
        DataIterator dataIter = listing.getDefinedData(true);
        println("Strings found in " + currentProgram.getName() + ":");
        // Stop early if the analyst cancels the script from the UI
        while (dataIter.hasNext() && !monitor.isCancelled()) {
            Data data = dataIter.next();
            if (data.hasStringValue()) {
                println(data.getAddress() + ": " + data.getDefaultValueRepresentation());
            }
        }
    }
}
```
Malware Analysis Workflow with Ghidra
Step 1 — Initial Import and Auto-Analysis
- Create new project
- Import the binary (Ghidra detects PE, ELF, Mach-O)
- Run auto-analysis — selects appropriate analyzers
- Review auto-analysis results (function discovery, stack analysis, data reference creation)
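For batch triage, the import and auto-analysis steps above can also be run without the GUI through the `support/analyzeHeadless` launcher that ships with Ghidra. The sketch below only builds the command line; the install path, project directory, project name, and sample name are hypothetical placeholders, and `ExportStrings.py` stands in for whichever post-analysis script is wanted.

```python
from pathlib import Path

def headless_command(ghidra_home, project_dir, project_name, sample, post_script=None):
    """Build an analyzeHeadless invocation that imports and auto-analyzes one file."""
    cmd = [
        str(Path(ghidra_home) / "support" / "analyzeHeadless"),
        str(project_dir),   # directory that holds the Ghidra project
        project_name,       # project is created if it does not exist
        "-import", str(sample),
        "-overwrite",       # replace an earlier import of the same file
    ]
    if post_script:
        # Script to run after auto-analysis finishes
        cmd += ["-postScript", post_script]
    return cmd

# Hypothetical paths, for illustration only
cmd = headless_command("/opt/ghidra", "/tmp/ghidra-projects", "triage",
                       "sample.exe", post_script="ExportStrings.py")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where Ghidra is installed; on Windows the launcher is `analyzeHeadless.bat`.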
Step 2 — Identify Key Functions
Look for functions that call APIs commonly abused by malware. No single API is malicious on its own; the combination and the arguments are what matter:
| API Call | Suspicious Use | Technique |
|---|---|---|
| VirtualAllocEx + WriteProcessMemory + CreateRemoteThread | Process injection | T1055 |
| CreateFileA + WriteFile + DeleteFileA | Dropping a payload, then deleting itself | Persistence / defense evasion |
| CryptEncrypt, CryptDecrypt | Encrypting or decrypting payloads | Defense evasion |
| URLDownloadToFileA | Downloading secondary payload | T1105 |
| RegSetValueExA to Run key | Persistence via registry | T1547.001 |
| WNetAddConnection2A | Lateral movement | T1021 |
| CreateProcessWithLogonW | Running commands as another user | Credential abuse |
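A lookup like this table is easy to apply to an exported import list. The following is a rough sketch, assuming the import names have already been collected (for example by a Ghidra script); the rule set covers only a few rows, the A/W suffix stripping is a heuristic, and the names `RULES`, `base_name`, and `flag_imports` are illustrative.

```python
# (required APIs, suspicious use, ATT&CK technique); a subset of the table above
RULES = [
    ({"VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"},
     "Process injection", "T1055"),
    ({"URLDownloadToFile"}, "Downloading secondary payload", "T1105"),
    ({"RegSetValueEx"}, "Registry write (check for a Run key path)", "T1547.001"),
    ({"WNetAddConnection2"}, "Lateral movement", "T1021"),
]

def base_name(api):
    """Strip a trailing ANSI/Unicode suffix, e.g. URLDownloadToFileW -> URLDownloadToFile."""
    if len(api) > 2 and api[-1] in "AW" and (api[-2].islower() or api[-2].isdigit()):
        return api[:-1]
    return api

def flag_imports(imports):
    """Return (use, technique) for every rule whose APIs are all imported."""
    present = {base_name(name) for name in imports}
    return [(use, technique) for required, use, technique in RULES
            if required <= present]

sample_imports = ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory",
                  "CreateRemoteThread", "URLDownloadToFileW"]
findings = flag_imports(sample_imports)
for use, technique in findings:
    print("%s (%s)" % (use, technique))
```

A match is only a lead: the call sites still need to be read in the decompiler to confirm how the APIs are actually used.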
Step 3 — Trace the Execution Flow
Use Ghidra’s Function Call Trees and Cross References (complement with Volatility for memory-level validation):
- Find `entry()` or `WinMain`
- Right-click → “References → Show References to Function”
- Trace the call tree — which functions call which
- Look for encryption/decryption loops (XOR, AES, RC4)
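Once a simple XOR loop has been spotted, the encoded bytes can be copied out of the Listing and decoded offline. This is a minimal sketch that assumes a single-byte key and a known plaintext marker; the URL used here is a made-up placeholder, and real samples may use multi-byte or rolling keys that this brute force will not recover.

```python
def xor_bytes(data, key):
    """XOR data with a repeating key (key is a bytes object)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def brute_force_single_byte(ciphertext, marker=b"http"):
    """Try every single-byte key and keep results that contain the marker."""
    hits = []
    for k in range(1, 256):
        plain = xor_bytes(ciphertext, bytes([k]))
        if marker in plain:
            hits.append((k, plain))
    return hits

# Simulate an encoded C2 string as it might appear in a data section
plaintext = b"http://example.test/gate.php"
encoded = xor_bytes(plaintext, b"\x5a")

for key, decoded in brute_force_single_byte(encoded):
    print("key=0x%02x -> %s" % (key, decoded.decode("ascii", "replace")))
```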
Step 4 — Extract IOCs
| IOC Type | Where to Find in Ghidra | How to Check |
|---|---|---|
| C2 URLs | Data section, string table | Search for http:// in defined strings (cross-ref with YARA signatures) |
| IP addresses | Data section or stack manipulation | Check integer constants pushed before connect() or send() |
| Mutex names | String table, then trace cross references | Strings called with CreateMutex or OpenMutex |
| Registry keys | String table | Strings passed to registry API calls |
| Encryption keys | Stack variables, data section | Constants pushed before CryptEncrypt or custom XOR loops (decode with CyberChef) |
| File paths | String table | Strings referenced near file I/O API calls |
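The string and constant checks in this table can be partly automated once strings have been exported from Ghidra. The sketch below pulls URLs and IPv4 addresses out of a list of strings and converts a 32-bit immediate, as it would be stored into `sin_addr` on little-endian x86, into dotted notation. The sample strings use documentation-range addresses and are invented for illustration; the function names are not part of any tool.

```python
import ipaddress
import re
import struct

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4_RE = re.compile(r"\b(?:%s\.){3}%s\b" % (OCTET, OCTET))

def extract_iocs(strings):
    """Collect unique URLs and IPv4 addresses from an iterable of strings."""
    urls, ips = set(), set()
    for s in strings:
        urls.update(URL_RE.findall(s))
        ips.update(IPV4_RE.findall(s))
    return sorted(urls), sorted(ips)

def immediate_to_ip(value):
    """Interpret a 32-bit immediate written to sin_addr on little-endian x86.

    The address sits in memory in network byte order, so the immediate in the
    instruction appears byte-reversed relative to the dotted form.
    """
    return str(ipaddress.IPv4Address(struct.pack("<I", value)))

exported = [
    "http://203.0.113.10/update.bin",
    "Global\\ExampleMutex",
    "connect to 198.51.100.7 failed",
]
urls, ips = extract_iocs(exported)
print(urls, ips)
print(immediate_to_ip(0x0A7100CB))
```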
Collaborative Analysis — Ghidra Server
Ghidra includes a server for multi-analyst collaboration:
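```shell
# On the server machine, from the Ghidra install directory
./server/svrInstall               # install and start the server as a service (needs admin rights)
./server/ghidraSvr console        # alternative: run in the foreground for testing
./server/svrAdmin -add analyst1   # add a user; "analyst1" is a placeholder name
```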
| Feature | What It Does |
|---|---|
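| Shared repository | Analysts check out a program, work on a private copy, and check changes back in (function names, comments, types); changes are not pushed to other analysts live |
| Check-in/out | Concurrent changes are merged at check-in, with prompts on conflicts; an exclusive checkout locks the whole file for operations that need it |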
| Version history | Full revision history — see who changed what and when |
| User authentication | Username/password, PKI, or LDAP |
Ghidra vs IDA Pro — When to Use Which
| Feature | Ghidra | IDA Pro |
|---|---|---|
| Price | Free | $1,700+ (Pro); $5,000+ (Enterprise) |
| Decompiler | Included (widely considered comparable to Hex-Rays) | $1,200 add-on (Hex-Rays) |
| GUI | Java-based (can be slow on large databases) | Native, more responsive |
| Scripting | Python (Jython) + Java | Python (IDAPython) + C++ |
| Collaboration | Built-in server | IDA Teams (paid add-on) |
| Debugger | Built-in since 10.0; connects to GDB, LLDB, and WinDbg | Built-in debugger (WinDbg, GDB, Bochs) |
| Mobile/embedded | Growing support | Mature support |
| Community | Active but smaller | Large, mature community |
| Best for | Budget-conscious teams, collaborative analysis, decompiler-first workflow | Single-analyst deep RE, performance-sensitive analysis, embedded/mobile |
Related
- REMnux: detection and response for T1204 techniques
- EDR Basics: detection and response for T1059, T1003, T1055, T1204, T1562 techniques
- Indicators: IoC, IoA, and TTP
- Kill Chain
- Log Sources Overview
