🧠 AI Forensics & Model Reverse Engineering for CTFs
“Every model leaves fingerprints — if you know where to look.”
This guide focuses on forensic, analytical, and reverse-engineering tasks involving AI models and machine learning artifacts, as seen in AI/ML or cybersecurity CTFs. You’ll learn how to identify architecture, recover metadata, inspect tensors, and perform controlled static and dynamic analysis.
I. 🧩 Common AI Forensics Challenge Types
| Challenge Type | Goal | Typical Artifacts |
| --- | --- | --- |
| Model Metadata Leak | Extract info from model configs | `config.json`, `metadata.yaml` |
| Weight Inspection | Find embedded flags or strings | `.pt`, `.pth`, `.ckpt` |
| Tokenizer Clues | Discover hidden vocabulary entries | `vocab.json`, `merges.txt` |
| Model Comparison | Detect fine-tuned or modified layers | Two `.bin` or `.safetensors` files |
| Embedding Analysis | Decode a flag or phrase from a vector | `.npy`, `.faiss`, `.pkl` |
| ONNX / TF Lite Inspection | Reverse the compute graph | `.onnx`, `.pb`, `.tflite` |
| Inference Output Forensics | Detect watermark / dataset hints | API outputs, logits dumps |
II. ⚙️ Essential Toolbelt
| Purpose | Tools |
| --- | --- |
| Frameworks | PyTorch, TensorFlow, Hugging Face Transformers |
| Weight inspection | `torch.load()`, `safetensors`, `numpy` |
| Model conversion | `transformers-cli`, `onnxruntime`, `tf2onnx` |
| Visualization | Netron, TensorBoard, Graphviz |
| Vector analysis | `numpy`, `scipy`, `faiss`, `pandas` |
| Metadata parsing | `jq`, `jsonlint`, `grep`, `strings` |
| Model fingerprinting | `diffusers`, `hashlib`, `hf_transfer` |
| Forensics sandbox | Jupyter + isolated venv |
III. 🧱 Understanding Model Artifacts
Common File Types
| File | Purpose |
| --- | --- |
| `pytorch_model.bin` | Serialized PyTorch weights |
| `model.safetensors` | Safer binary weight format |
| `config.json` | Model architecture + parameters |
| `tokenizer.json` | Vocabulary and token mapping |
| `vocab.txt` | Plain-text tokens |
| `merges.txt` | BPE merge rules |
| `special_tokens_map.json` | Start/end/pad token IDs |
| `preprocessor_config.json` | Normalization / feature extraction |
| `training_args.bin` | Fine-tuning arguments |
| `.onnx` | Cross-framework model representation |
IV. 🧠 Weight File Analysis (PyTorch / SafeTensors)
Basic Inspection
```python
import torch

# Load the checkpoint on CPU and list every tensor with its shape
model = torch.load("model.pth", map_location="cpu")
for k, v in model.items():
    print(k, v.shape)
```

SafeTensors
```python
from safetensors.torch import load_file

# SafeTensors checkpoints load as a plain dict of named tensors
tensors = load_file("model.safetensors")
for name, tensor in tensors.items():
    print(name, tensor.shape)
```

CTF Trick:
Hidden strings are sometimes stored in tensors as ASCII values.
```python
import numpy as np

# Reinterpret the tensor values as small integers and map them to ASCII
data = model['linear.weight'].numpy().astype(np.int8)
print(''.join(chr(int(abs(x)) % 128) for x in data[:300]))
```

🧩 If you see gibberish resolving into flag{}, you’ve found a hidden payload.
V. 🔍 Config & Metadata Exploration
Inspect configuration:
```bash
cat config.json | jq
```

Look for:

`"architectures"` → model type (e.g., `GPTNeoForCausalLM`)
`"hidden_size"`, `"num_layers"`
`"finetuning_task"`
`"dataset_name"` (flag sources)
`"special_tokens"` → custom flag token
`"model_revision"` or `"commit_hash"`
🧠 Sometimes the flag hides as a “custom token” in tokenizer files.
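To check for planted custom tokens directly, a minimal sketch, assuming a Hugging Face-style `tokenizer.json` (the exact layout varies by tokenizer):

```python
import json

with open("tokenizer.json") as f:
    tok = json.load(f)

# "added_tokens" holds tokens registered on top of the base vocabulary;
# a flag token injected during fine-tuning often shows up here.
for entry in tok.get("added_tokens", []):
    print(entry.get("id"), entry.get("content"))
```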
VI. 📦 Tokenizer Forensics
Check Token Files
```bash
cat vocab.txt | grep flag
grep -A2 -B2 "FLAG" tokenizer.json
```

Merges File
Flags or hints might appear as:
```
f l
l a
a g
```

or as encoded Unicode sequences:

```
\u0066\u006c\u0061\u0067
```

💡 Decode JSON escape sequences with Python’s unicode_escape codec.
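For example, a minimal decode of such a sequence:

```python
# Decode JSON-style \uXXXX escapes back into readable text
encoded = r"\u0066\u006c\u0061\u0067"
print(encoded.encode().decode("unicode_escape"))  # prints: flag
```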
VII. ⚙️ ONNX & TensorFlow Model Inspection
Convert to Graph View
```bash
netron model.onnx
```

Visually check for:

Extra layers (`FlagLayer`, `HiddenDecoder`)
Custom op nodes (`CustomOp_1337`)
Embedded constant tensors (flags in graph; see the scan sketch below)
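To check graph constants without Netron, a minimal scan sketch using the `onnx` Python package (the ASCII-decoding heuristic is an assumption, adjust as needed):

```python
import onnx
import numpy as np
from onnx import numpy_helper

model = onnx.load("model.onnx")

# Initializers are the graph's constant tensors; small integer-valued
# constants are a common hiding spot for ASCII payloads.
for init in model.graph.initializer:
    arr = numpy_helper.to_array(init).ravel()
    arr = arr[np.isfinite(arr)]  # skip NaN/inf entries
    text = ''.join(chr(int(abs(v)) % 128) for v in arr[:200])
    if 'flag' in text.lower():
        print(init.name, text)
```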
Command-line metadata: `onnxruntime.tools.convert_onnx_models_to_ort`

TensorFlow SavedModel
```bash
saved_model_cli show --dir ./model --all
```

Look inside `variables/variables.data-00000-of-00001` with `strings`.
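If `strings` is unavailable, the same check can be done from Python; a minimal sketch assuming the standard SavedModel directory layout:

```python
# Scan the raw variables shard for an embedded flag marker
with open("./model/variables/variables.data-00000-of-00001", "rb") as f:
    blob = f.read()

start = blob.find(b"flag{")
if start != -1:
    print(blob[start:blob.find(b"}", start) + 1].decode(errors="replace"))
```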
VIII. 🔬 Embedding & Feature Vector Analysis
| Step | How |
| --- | --- |
| Load vector file | `numpy.load('embeddings.npy')` |
| Search for outliers | `np.where(np.abs(x) > 100)` |
| Decode as ASCII | Interpret vector values as bytes |
| Compare embeddings | Cosine similarity / Euclidean distance (see the sketch below) |
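A minimal sketch for the comparison step, computing row-wise cosine similarity between two equally shaped embedding matrices (file names are hypothetical):

```python
import numpy as np

a = np.load("embeddings_a.npy")
b = np.load("embeddings_b.npy")

# Row-wise cosine similarity; rows with unusually low similarity
# are the ones worth inspecting.
cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
print("min/mean/max similarity:", cos.min(), cos.mean(), cos.max())
```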
CTF Example:

```python
import numpy as np

# If the vector literally stores character codes, this prints the message
emb = np.load("vec.npy")
print(''.join(chr(int(i)) for i in emb[:50]))
```

IX. 🧩 Model Watermark & Fingerprinting
| Watermark Type | Signal |
| --- | --- |
| Statistical watermarks | Bias added to token probabilities |
| Lexical watermarks | Preferred vocabulary patterns |
| Structural watermarks | Modified attention weights |
CTF Objective: Detect watermark pattern → decode → flag.
Detection via Frequency:

```python
from collections import Counter

# Count the most frequent output tokens; a watermark skews the distribution
tokens = open("output.txt").read().split()
print(Counter(tokens).most_common(10))
```

If output token frequencies spell a pattern → follow it.
X. 🧠 Comparative Diffing (Two Models)
| Check | Method |
| --- | --- |
| Compare weights | `torch.allclose(a, b)` or `np.allclose()` |
| Compare config hashes | `md5sum config.json` |
| Structural diff | `diff --side-by-side file1 file2` |
| Output similarity | Run both models → compute cosine similarity |
CTFs often require spotting one altered layer or token.
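A minimal diff sketch for locating that altered layer, assuming both files hold plain PyTorch state dicts (file names are hypothetical):

```python
import torch

# Load both checkpoints on CPU and report every tensor that differs
base = torch.load("base_model.pth", map_location="cpu")
suspect = torch.load("suspect_model.pth", map_location="cpu")

for key in base:
    if key not in suspect:
        print("missing in suspect:", key)
    elif base[key].shape != suspect[key].shape:
        print("shape changed:", key, base[key].shape, "->", suspect[key].shape)
    elif not torch.allclose(base[key], suspect[key]):
        print("values changed:", key)
```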
XI. ⚔️ Model Dataset Clues
| Artifact | Clue |
| --- | --- |
| `training_args.bin` | Dataset paths, run names |
| `config.json` | `"dataset_name"`, `"task"` |
| `tokenizer.json` | Custom words / dataset leaks |
| `.cache` dirs | URLs of original datasets |
🧠 Flags sometimes appear as dataset IDs or pipeline parameters.
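A minimal sketch for dumping `training_args.bin`, assuming it was written by the Hugging Face Trainer (recent PyTorch versions need `weights_only=False` to unpickle arbitrary objects):

```python
import torch

# training_args.bin is a pickled TrainingArguments object; its attributes
# expose dataset names, run names, and output paths.
args = torch.load("training_args.bin", map_location="cpu", weights_only=False)
for name, value in sorted(vars(args).items()):
    print(f"{name} = {value}")
```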
XII. 🧰 Forensic Automation Scripts
String Extractor
```bash
strings model.pth | grep -i flag
```

Python Hex Dump
with open("model.pth","rb") as f:
data=f.read()
print(data[data.find(b"flag{"):data.find(b"}")+1])Tensor Inspector
```python
import torch

# Scan 1-D tensors for values that decode to readable ASCII
state = torch.load("model.pth", map_location="cpu")
for k, v in state.items():
    if v.ndim == 1:
        text = ''.join(chr(int(x) % 128) for x in v[:100])
        if 'flag' in text:
            print(k, text)
```

XIII. 🧱 CTF Workflow Summary
1️⃣ Inspect model metadata and config
2️⃣ Search weights for ASCII-encoded data
3️⃣ Parse tokenizer / vocab files
4️⃣ Check ONNX / graph constants
5️⃣ Analyze embeddings or vectors
6️⃣ Compare model versions for subtle deltas
7️⃣ Extract and validate flag{...}

XIV. 🧠 Common Pitfalls
Forgetting to check tokenizer merges (flags split into tokens).
Ignoring hidden `.bin.index.json` files (index → flag); see the sketch after this list.
Missing Unicode escapes.
Misinterpreting float weights (need to cast to int8).
Overlooking special_tokens_map.json.
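A minimal sketch for reviewing a sharded checkpoint index, assuming the Hugging Face `pytorch_model.bin.index.json` layout:

```python
import json

with open("pytorch_model.bin.index.json") as f:
    index = json.load(f)

# "weight_map" maps each tensor name to the shard file that stores it;
# unexpected tensor names or shard files are worth a closer look.
for tensor_name, shard in index.get("weight_map", {}).items():
    if "flag" in tensor_name.lower() or "flag" in shard.lower():
        print(tensor_name, "->", shard)
```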
XV. ⚡ Pro Tips
Open models in Netron first — visual diff is faster than text grep.
Always check config + tokenizer together.
In PyTorch models, look for abnormal tensor shapes (1×N).
If the flag is not ASCII → try Base64 / Hex decoding (see the sketch after this list).
Use Jupyter to interactively test weight-to-text hypotheses.
Keep CTF artifacts under Git versioning — easier diffing later.
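A minimal sketch for trying both decodings on a suspicious string (the candidate value is hypothetical):

```python
import base64
import binascii

candidate = "ZmxhZ3tleGFtcGxlfQ=="  # hypothetical suspicious string

# Try Base64 first, then hex; whichever yields readable bytes wins
for label, decode in (("base64", base64.b64decode), ("hex", bytes.fromhex)):
    try:
        print(label, "->", decode(candidate))
    except (binascii.Error, ValueError):
        print(label, "-> not valid")
```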
XVI. 🧩 Advanced CTF Scenarios
| Scenario | Approach |
| --- | --- |
| Fine-tuned model | Compare to the base model → recover training data |
| Cloned model | Identify via watermark or bias fingerprint |
| RAG index | Extract the flag from vector DB entries |
| Steganographic model | Flag hidden in unused parameters |
| Poisoned model | Detect anomalous weights or layer order |
XVII. 🧠 Educational Resources