Skip to main content

AI-Powered Malware Debugger That Explains Every Function It Sees

Cover image

Article Metadata

Ecosystem Fit

This page mirrors the original Medium article into the 1200km.com Docusaurus ecosystem. The original article flow, images, screenshots, infographics, and technical blocks are preserved from the export.

How I combined Claude AI, Frida, Capstone, and a suite of static analysis engines into a reverse engineering tool that talks back

Article image

Introdaction

Malware reverse engineering is one of the most skill-intensive jobs in security. You sit with IDA Pro or Ghidra, stare at hundreds of unnamed functions full of obfuscated assembly, and try to build a mental model of what a threat actor’s code is actually doing. It takes years to get fast at it.

I wanted to change that. So I builtAIDebug— an open-source malware debugger that uses Claude AI to analyze every function it encounters, explain what it does in plain English, assign a risk level, and map it to a MITRE ATT&CK technique. In real time. And it now ships with FLIRT signature matching, automatic malware pattern detection, per-function control flow graphs, and live network traffic capture.

This article walks you through what the tool does, how it works architecturally, and how to run it yourself on a real malware sample.

GitHub - anpa1200/AIDebug: AI-Assisted Malware Reverse Engineering Debugger - step-by-step debugger… *AI-Assisted Malware Reverse Engineering Debugger - step-by-step debugger with Claude AI analysis of function…*github.com

Table of Contents

  • The Problem With Traditional Malware Analysis

  • What AIDebug Does

  • Architecture Deep Dive

  • Running It On a Real Sample

  • Installation

  • The TUI: Four Panels That Tell the Full Story

  • Ask the AI Follow-Up Questions

  • Dynamic Mode: What Happens at Runtime

  • Reporting and Export

  • Architecture Summary

  • Why Claude?

  • Conclusion

The Problem With Traditional Malware Analysis

When you open a stripped Windows PE in a disassembler, you’re greeted with hundreds of functions namedsub_401234. Your job is to:

  • Read the assembly

  • Understand what each function does

  • Name it

  • Repeat — for hours

The bottleneck is not intelligence, it’s throughput. An experienced analyst can only read assembly so fast. And when malware is packed, obfuscated, or uses indirect calls, even experienced analysts slow down dramatically.

There are also a dozen sub-tasks that eat time before you even get to the interesting code: separating compiler-generated CRT functions from hand-written malware code, identifying which functions are trivial wrappers, spotting XOR decryption loops before you waste 20 minutes trying to reverse them as normal code.

AI doesn’t replace the analyst — but it can act as an extremely fast co-pilot that reads assembly, pre-classifies behavioral patterns, draws the control flow, and gives you its interpretation in seconds, while you decide where to look next.

What AIDebug Does

AIDebug is a Python tool that runs a full pipeline on any PE or ELF binary:

  • Static analysis— PE/ELF parsing, section entropy, imports, strings

  • Recursive-descent function discovery— from entry point, following CALL targets

  • FLIRT signature matching— library functions identified and excluded from AI analysis

  • Malware pattern scanning— 8 behavioral patterns detected before AI runs

  • CFG construction— basic block decomposition per function

  • Claude AI analysis— disassembly + patterns + context sent to Claude for structured explanation

  • Frida dynamic instrumentation— optional runtime hooks, memory diffs, unpacking detection, network capture

The UI is a Textual TUI with three panels. The right panel has four tabs:

Article image

Architecture Deep Dive

Layer 1: Static Analysis

TheStaticAnalyzerclass handles PE and ELF files:

from
analysis
import
StaticAnalyzer
info = StaticAnalyzer().analyze(
'malware1.exe'
)
print
(info.arch)
# 'x86'
print
(info.entry_point)
# 0x401780
print
(info.imports)
# [ImportInfo(dll='KERNEL32.dll', functions=[...])]

For PE files it usespefileto extract architecture, image base, section names and entropy (> 7.0 flags packing), import table, export table, and both ASCII and UTF-16LE strings with virtual address mapping.

For ELF files it usespyelftoolswith full symbol table support — including RISC-V, useful for IoT malware like Mirai variants.

Layer 2: Disassembler + Enrichment Pipeline

TheDisassembleruses Capstone for recursive-descent function discovery. After finding all reachable functions, it runs an enrichment pass:

def

_run_enrichment
(
self, addresses
):
detector = PatternDetector()
flirt = FlirtMatcher(self.info)
for
addr
in
addresses:
func = self.functions[addr]
func.patterns = detector.detect(func)
# pre-AI pattern scan

match
= flirt.identify(func)

if

match
:
func.flirt_match =
match
func.is_library =
match
.skip_ai

Every function gets pattern detection and FLIRT matching before a single AI call is made. This front-loading means the AI prompt arrives pre-enriched — Claude sees what patterns were already found and can focus on deeper interpretation.

Layer 3: FLIRT Signature Matching

One of the biggest noise sources in PE analysis is compiler-inserted CRT code. Functions like_memset,_strlen,_malloc, and the whole C runtime startup chain appear in virtually every MSVC-compiled binary. Sending them all to Claude wastes tokens and clutters results.

AIDebug solves this with a lightweight FLIRT-style matching system:

**Strategy 1 — Import wrapper detection:**A function that’s justjmp [IAT_entry]is named after the imported API it wraps. This covers the vast majority of API call stubs in PE files.

**Strategy 2 — CRC16 prologue match:**The first 32 instruction bytes are hashed (with call target addresses zeroed out to make the hash position-independent) and looked up indata/flirt_sigs.json.

**Strategy 3 — Single-import call inference:**A function that calls exactly one imported API and immediately returns is named after that API.

**Strategy 4 — Trivial stub detection:**Functions with 3 or fewer instructions are marked as library stubs.

Result: the function list distinguishes clearly between library noise and actual malware logic:

[
LIB
]
0x40541c
_memset (
3
insns) ← skipped
by
AI
[
LIB
]
0x405674
_strlen (
8
insns) ← skipped
by
AI
[
CRIT
]
0x40bcb8
allocate_rwx_region (
31
insns) ← analyzed
by
AI
[
HIGH
]
0x4079b6
check_os_registry (
40
insns) ← analyzed
by
AI

Layer 4: Malware Pattern Detection

Article image

PatternDetectorscans every function's instruction list for 8 behavioral patterns before AI analysis runs. Detected patterns are:

  • **xor_decryption_loop**(HIGH): backward jump + XOR on a memory operand — the classic string/config decryption pattern

  • **stack_string**(MEDIUM): 4+ consecutivemov byte ptr [esp+N]— anti-string-scan technique

  • **api_hash_resolution**(HIGH): ROR/ROL + XOR loop — shellcode loader technique for resolving API names by hash

  • **rdtsc_timing_check**(MEDIUM/HIGH): RDTSC instruction — sandbox/VM timing evasion

  • **direct_syscall**(HIGH): SYSCALL / SYSENTER / INT 2E — EDR bypass via direct kernel entry

  • **nop_sled**(INFO): 5+ consecutive NOPs — shellcode alignment

  • **null_preserving_xor**(HIGH): test/jz/xor sequence — common in XOR-encoded shellcode to avoid null bytes

  • **base64_alphabet_reference**(MEDIUM): reference to a known Base64 alphabet string

These patterns are injected into the AI prompt, so Claude gets pre-flagged behavioral context rather than having to infer everything from raw assembly. The patterns also appear independently in the Patterns tab — no AI call required to see them.

Layer 5: Control Flow Graph

Article image

CFGBuilder.build(func)splits the function into basic blocks at branch/jump/ret boundaries and links blocks via successor/predecessor edges. The result is aCFGobject with a dict ofBasicBlockentries.

Two renderers:

  • **CFGTextRenderer**— renders to multi-line ASCII art with box-drawing characters for the TUI

  • **CFGSVGRenderer**— renders to a self-contained inline SVG for HTML reports, using a BFS layout algorithm

A real function frommalware1.exe(0x4015c2, 26 instructions,stack_stringpattern):

CFG:

6

basic

blocks
┌──



Block

0x004015c2

(12

insns)

──


0x004015c2:

push

ebp


0x004015c3:

mov

ebp,

esp


0x004015c5:

sub

esp,

0x4c




(9

more)
└──



0x004015e9
,

0x004015e1
┌──



Block

0x004015e1

(2

insns)

──


0x004015e1:

xor

eax,

eax


0x004015e3:

ret
└──
[
RET
]

The CFG shows immediately what the branching structure looks like without reading every instruction.

Layer 6: AI Analysis

Article image

This is the core of the tool. For each non-library function, we build a structured prompt that includes binary metadata, the full import table, the disassembly, referenced strings, cross-references — and now the pre-detected patterns:

BINARY INFO:

File :

malware1.exe

Arch :

x86

32
-bit

OS Target :

Windows
KNOWN IMPORTED APIs:

KERNEL32.dll:

WaitForSingleObject,

LoadLibraryA,

...

ADVAPI32.dll:

CryptImportKey,

CryptDecrypt,

...
FUNCTION ADDRESS:

0x40bcb8
DISASSEMBLY

(31

instructions):
0x0040bcb8:

push

ebp
...
0x0040bce4:

call

NtAllocateVirtualMemory
REFERENCED STRINGS:

"NtAllocateVirtualMemory"
PRE-DETECTED PATTERNS:
[
HIGH
]
xor_decryption_loop:

xor

byte

ptr
[
eax
]
,

cl

at

0x40bcd1

Claude returns structured JSON:

{

"suggested_name"
:

"allocate_rwx_region"
,

"summary"
:

"Resolves NtAllocateVirtualMemory dynamically and allocates RWX memory. NT API used to bypass EDR hooks on VirtualAlloc."
,

"parameters"
:

[

{
"name"
:

"size"
,

"type"
:

"ULONG_PTR"
,

"description"
:

"Size of region"
}
,

{
"name"
:

"protect"
,

"type"
:

"ULONG"
,

"description"
:

"0x40 = PAGE_EXECUTE_READWRITE"
}

]
,

"return_value"
:

"Pointer to allocated RWX region, or NULL"
,

"behaviors"
:

[

"Direct NT syscall — bypasses EDR hooks on VirtualAlloc"
,

"RWX memory allocation — shellcode staging indicator"

]
,

"mitre_technique"
:

"T1055.001 - Process Injection: DLL Injection"
,

"risk_level"
:

"CRITICAL"
,

"notes"
:

"Check callers for WriteProcessMemory or CreateRemoteThread after this call."
}

That note —“Check callers for WriteProcessMemory”— is the kind of contextual intelligence that saves an analyst 20 minutes of cross-referencing.

Layer 7: Dynamic Instrumentation (Frida)

In dynamic mode the tool spawns the binary and loads three Frida scripts simultaneously:

**tracer.js**— hooks 80+ Win32 APIs and logs every call with auto-dereferenced string arguments.

**unpack_detector.js**— hooksVirtualAlloc,VirtualProtect, andNtProtectVirtualMemory. When a region transitions from RWX to R-X (the unpacking stub has finished writing and is handing control to the unpacked code), the script scans the region forpush ebp; mov ebp, espprologues to hint at the OEP:

// Scan for PE-like prologue in newly-executable region
for
(
var
i =
0
; i < size -
2
; i++) {

var
b0 = mem[i], b1 = mem[i+
1
], b2 = mem[i+
2
];

if
(b0 ===
0x55
&& b1 ===
0x8B
&& b2 ===
0xEC
) {
oepHint = ptr(baseAddr).
add
(i).toString();

break
;
}
}

**network_tracer.js**— hooks Winsock (connect,send,recv,sendto,recvfrom) and WinInet (InternetOpenUrl,HttpSendRequest,InternetReadFile). Captures actual buffer bytes up to 512 bytes as hex strings, and parsessockaddrstructs to extract IP and port:

// Parse sockaddr for IP:port
var
family = sockaddr.readU16();
if
(family ===
2
) {
// AF_INET

var
port = ((sockaddr.
add
(
2
).readU8() <<
8
) | sockaddr.
add
(
3
).readU8());

var
ip = [
0
,
1
,
2
,
3
].map(function(i) {

return
sockaddr.
add
(
4
+i).readU8();
}).
join
(
'.'
);
send({type:
'network'
,
event
:
'connect'
, ip:ip, port:port, ...});
}

The result is C2 protocol reconstruction without needing a separate network capture tool — all captured data ends up in the Network tab of the TUI and in thenetwork_eventsdatabase table.

Theengine.pyalso captures 64-byte memory snapshots at pointer-valued registers on function entry, then reads the same regions again on exit to produce a per-function memory diff.

Layer 8: Persistence (SQLite)

Five tables intraces.db:

  • sessions— binary metadata per analysis run

  • function_traces— disassembly + AI analysis JSON per function

  • api_calls— Win32 API call log from dynamic mode

  • detected_patterns— malware pattern results per function

  • network_events— network events from dynamic mode

Re-running the tool on the same binary is instant — no repeat API calls for already-analyzed functions.

Running It On a Real Sample

Here’s what happens when I run this onmalware1.exe.

Step 1: Static fingerprint

[*]

Format
:
PE

x86

32
-bit
(Windows)
[*]

EntryPoint
:
0
x401780
[*]

Sections
:
[
'.text'
,
'.rdata'
,
'.data'
,
'.reloc'
,
'dhqj'
]
[*]

Imports
:
89

functions

from

8

DLLs
[!]

Possible

packing
:
[
'dhqj'
]
(entropy >
7.0
)

Immediate red flags:

  • Section**dhqj**— non-standard section name, custom packer

  • Entropy > 7.0— packed or encrypted content

  • Imports:Secur32.dll(SSP manipulation),ADVAPI32.dllwithCryptDecrypt/CryptImportKey,ntdll.dllNT-native calls

Step 2: Function discovery + enrichment

25 functions found. After FLIRT matching: 3 are library functions (_memset,_strlen, a CRT stub) and are skipped. Pattern detection fires on 2 functions before AI runs:

[FLIRT]
sub_00405412 → _memset (msvcrt) — skipped

[FLIRT]
sub_00405674 → _strlen (msvcrt) — skipped

[PAT ]

0
x004015c2 → stack_string (MEDIUM)

[PAT ]

0
x00401000 → xor_decryption_loop (HIGH)

I already know where to look before Claude runs a single analysis.

Step 3: Key findings from AI analysis

**0x40bcb8****allocate_shellcode_region**[CRITICAL]

> Calls NtAllocateVirtualMemory directly (bypassing EDR hooks on VirtualAlloc) to allocate RWX memory. The 0x40 protection constant is PAGE_EXECUTE_READWRITE. MITRE: T1055.001

**0x4079b6****check_os_and_open_registry**[HIGH]

> Queries Windows version (“Windows 11” check) and opens a registry key via NtOpenKey. The FallbackGUID string suggests persistence — writing a GUID-keyed run key. MITRE: T1547.001 — Registry Run Keys

**0x4015c2****build_stack_string**[MEDIUM]

> Constructs a string on the stack byte-by-byte ( stack_string pattern confirmed by pre-analysis). This technique evades static string scanners. The constructed value is likely a registry path, URL, or filename. MITRE: T1140 — Deobfuscate/Decode Files or Information

**0x401000****xor_decode_config**[HIGH]

> XOR decryption loop with a hardcoded key. Likely decoding an embedded C2 address or configuration blob.

Within minutes — without writing a single IDA script — we have a threat profile: RWX allocation via NT syscalls, registry persistence, stack-string obfuscation, and an XOR-encoded C2 config.

Installation

git
clone
https://github.com/anpa1200/AIDebug
cd
AIDebug
pip install -r requirements.txt
export
ANTHROPIC_API_KEY=sk-ant-...

Run in TUI mode

python
main
.py

--binary
/path/
to
/sample
.exe

Run in CLI mode (headless, good for scripting)

python
main
.py

--binary
sample
.exe

--no-tui

Article image

Generate HTML report in one shot

python
main
.py

--binary
sample
.exe

--no-tui

--report

Article image

Dynamic mode (requires Frida)

# Linux with Wine
python
main
.py

--binary
sample
.exe

--mode
dynamic
# Attach
to
running process
python
main
.py

--binary
sample
.exe

--mode
dynamic
--pid

4521

The TUI: Four Panels That Tell the Full Story

Article image

The right panel has four tabs that work together to give you a complete picture of any function without leaving the terminal.

AI Analysis tab— Claude’s structured output: suggested name, 2–3 sentence summary, parameters, return value, behaviors, MITRE technique, analyst notes.

CFG tab— The function’s control flow graph as ASCII art. You see immediately whether you’re looking at a simple linear function or a complex loop-with-branches before reading a single instruction.

CFG: 6 basic blocks
┌── ◆ Block 0x004015c2 (12 insns) ──
│ 0x004015c2: push ebp
│ …
└── → 0x004015e9, 0x004015e1
┌── ◆ Block 0x004015e9 (6 insns) ── ← loop body
│ 0x004015e9: mov al, [ebp-0x3c+ecx]
│ 0x004015ef: xor al, 0x41
│ …
└── → 0x004015e9, 0x004015fb ← loops back

Patterns tab— Pre-detected behavioral patterns. Available immediately, no AI needed:

[HIGH] xor_decryption_loop @
0
x004015ed

XOR

loop

on
memory
with
backward branch
Evidence:
xor

byte
ptr [esi+ecx], al; jne
0
x4015e9
[MED ] stack_string @ 0x004015c2
4+ consecutive byte-by-byte stack writes
Evidence: mov byte ptr [ebp-0x3c], 0x68

Network tab— Live network events in dynamic mode:

connect

connect

192.168
.
1.105
:
4444

0
bytes
send

send

192.168
.
1.105
:
4444

128
bytes
recv

recv

192.168
.
1.105
:
4444

64
bytes

Ask the AI Follow-Up Questions

With a function selected, type questions at the bottom bar:

  • “What protection constant should I look for to confirm it’s RWX?”

  • “Why use NtAllocateVirtualMemory instead of VirtualAlloc?”

  • “What should I look at next to confirm process injection?”

  • “Write a YARA rule for this function’s behavior”

  • “Is the XOR key hardcoded or derived at runtime?”

The AI has the full function context and conversation history. This is closer to having a senior analyst sitting next to you than using a static analysis tool.

Dynamic Mode: What Happens at Runtime

When you run with--mode dynamic, three things happen in parallel as the process executes:

**1. Per-function register snapshots.**Each hooked function firesonEnterandonLeavecallbacks. The JS hook reads all general-purpose registers plus 128 bytes of stack at entry, and re-reads pointer-valued registers at exit to compute a memory diff. The snapshot is fed to Claude as runtime context.

**2. Unpacking detection.**The detector watches VirtualProtect calls. When a region that was RWX becomes R-X, the engine knows unpacking just finished:

[
Unpack
] RWX allocation detected @
0x00870000
size=
65536
[
Unpack
] *** UNPACKING COMPLETE ***
[
Unpack
] Region :
0x00870000
[
Unpack
] OEP hint:
0x00870010
new_protect=
0x20

This tells you exactly where to re-disassemble after the stub finishes.

**3. Network capture.**Every connect/send/recv call is captured with the actual bytes. The Network tab fills up as the malware tries to reach its C2:

connect

connect

192.168
.
1.105
:
4444

0
bytes
send

send

192.168
.
1.105
:
4444

128
bytes ← beacon
recv

recv

192.168
.
1.105
:
4444

64
bytes ← response

All of this is saved to the database so you can review it after the session ends.

Reporting and Export

After analysis, generate reports directly:

#
HTML
report (self-contained, dark theme, CFG SVGs embedded)
python
main
.py

--binary
sample
.exe

--no-tui

--report
# YARA rules for HIGH/CRITICAL functions
python
main
.py

--binary
sample
.exe

--no-tui

--yara
# JSON export for SIEM/SOAR
python
main
.py

--binary
sample
.exe

--no-tui

--json-export
#
All
three, custom output directory
python
main
.py

--binary
sample
.exe

--no-tui

--report

--yara

--json-export

--out-dir
./reports/

The HTML report includes an interactive sidebar with all functions sorted by risk, and each function’s detail page shows:

  • AI summary, MITRE tag, behaviors, parameters

  • Detected patterns sectionwith severity-coded entries

  • Inline CFG SVG— the full control flow graph embedded directly in the page

  • Color-coded disassembly

Article image

Architecture Summary

malware
.exe


┌────────────────────────────────────────────────────────────┐

StaticAnalyzer
(pefile / pyelftools) │
│ →
BinaryInfo
:
arch
,
sections
,
imports
,
strings
,
entropy

└────────────────────┬───────────────────────────────────────┘


┌────────────────────────────────────────────────────────────┐

Disassembler
(Capstone) │
│ →
Function

objects
:
instructions
,
calls_to
,
strings_ref

│ │

Enrichment

pipeline
(runs on all functions): │
│ ├──
FlirtMatcher

is_library
,
flirt_match

│ └──
PatternDetector

patterns

[]

└────────────────────┬───────────────────────────────────────┘

┌──────────┴──────────┐
│ (static) │ (dynamic, optional)
▼ ▼
│ ┌─────────────────────────────┐
│ │
DebugEngine
(Frida) │
│ │
tracer
.js

API

calls

│ │
unpack_detector
.js

OEP

│ │
network_tracer
.js

C2

I
/
O

│ │
hook_function

snapshots

│ └─────────────┬───────────────┘
│ │
▼ ▼
┌────────────────────────────────────────────────────────────┐

CFGBuilder

CFG
(BasicBlock dict) │
└────────────────────┬───────────────────────────────────────┘


┌────────────────────────────────────────────────────────────┐

AIAnalyzer
(Claude claude-opus-
4
-
6
) │

Input
:
disassembly
+
imports
+
strings
+
patterns

│ +
xrefs
+
snapshot
(if dynamic) │

Output
:
name
,
summary
,
risk
,
MITRE
,
params
,
notes


Library

functions
(FLIRT match) →
skipped

└────────────────────┬───────────────────────────────────────┘


┌────────────────────────────────────────────────────────────┐

TraceStore
(SQLite) │

sessions

function_traces

api_calls


network_events

detected_patterns

└────────────────────┬───────────────────────────────────────┘


┌────────────────────────────────────────────────────────────┐

Textual

TUI

│ ┌──────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │
Function
│ │
Disassembly
│ │
[AI Analysis]
[CFG ]
│ │
│ │
List
│ │ +
Regs
│ │
[Patterns ]
[Network]
│ │
│ └──────────┘ └──────────────┘ └──────────────────────┘ │
│ +
Chat

bar

└────────────────────┬───────────────────────────────────────┘


┌────────────────────────────────────────────────────────────┐

Reporting


html_report
.py

.html
(CFG SVG + patterns embedded) │

yara_generator
.py

.yar
(AI-generated rules) │

json_export
.py

.json
(SIEM/SOAR schema v1) │
└────────────────────────────────────────────────────────────┘

Why Claude?

I tried several models for the structured JSON output. Claude was the only one that consistently:

  • Returns valid JSON without markdown fences leaking through

  • Correctly identifies subtle evasion techniques (NT API usage, timing checks)

  • Writes accurate MITRE ATT&CK technique mappings

  • Provides genuinely useful analyst notes, not just restating the disassembly

  • Handles follow-up questions with full context awareness

  • Integrates pre-detected pattern context into its analysis rather than ignoring it

Theclaude-opus-4-6model has strong assembly comprehension. It correctly identifies x86 calling conventions, recognizes common compiler idioms, and understands the difference between a compiler-generated prologue and a hand-written shellcode stub. When you inject pattern context — "this function has a XOR decryption loop at 0x4015ed" — it builds on that rather than re-deriving it from scratch.

Conclusion

AIDebug is not a replacement for IDA Pro or a seasoned reverse engineer. It’s a force multiplier. FLIRT matching removes the library noise. Pattern detection front-loads the behavioral classification. The CFG makes the structure visible at a glance. And Claude’s contextual analysis fills in the meaning — what the function does, why it matters, and where to look next.

The combination gets you from “unknown packed PE” to a prioritized threat profile in minutes rather than hours.

The full source is athttps://github.com/anpa1200/AIDebug.

If you’re working in threat intelligence, incident response, or malware research — try it on your next sample and let me know what you find.

All analysis in this article was performed in an isolated VM environment on samples used for security research. Always analyze malware in a properly isolated sandbox.