Case Study: Curing a Multi-Revision PDF False-Positive Class with HADES v1.4.5

Published May 2026 • DarkHorse InfoSec • HADES v1.4.5

TL;DR

A customer scanning the DarkHorse HADES Whitepaper PDF on Pro tier saw it flagged Score 80 HIGH. The file was a legitimate marketing document: Adobe-signed, structurally clean, no active content. The detection engine fired three structural-anomaly findings (incremental update + object streams + shadow page replacement) at HIGH severity, unconditionally. We tried a static dampening predicate (no dangerous tokens, no JavaScript, no embedded executables = dampen severity) and watched it move 5 known-malicious PDFs, drawn from the MalwareBazaar Gootloader and Lazarus clusters, from HIGH to SAFE. We then refactored to a threat-intel-gated dampening decision. Along the way we caught a separate regression: 3 ZIP-wrapped Gootloader LNK droppers had silently dropped from CRIT to SAFE after the v1.4.5 OneNote scope guard landed, which we cured with full ZIP per-member recursion (Track F).

Honest scope: v1.4.5 cures the structural-anomaly PDF FP class for documents that are threat-intel-clean. Dampening fires only when the engine's threat-intelligence block returns known_malware=False; known-malicious PDFs (Gootloader, Lazarus, Pdfka, generic CVE-2010-0188 / CVE-2009-0927) preserve their full severity because intel hits are dispositive. A subset of zero-day malicious PDFs with structurally clean shapes will still be dampened (zero-days by definition aren't in intel feeds). That residual is a known trade-off: the partial cure is bounded by the threat-intel surface, not by our detection-engine logic. v1.6+ work on a portal-side prevalence oracle is the architectural follow-up.

What happened

A customer-facing instance of HADES v1.4.4 was scanning the published DarkHorse HADES Whitepaper PDF (a marketing document, freely available on darkhorseinfosec.com). The Pro-tier scan reported Score 80 HIGH. The file was:

About 300 KB on disk.
Generated by Adobe Acrobat with an incremental save (one revision + one update revision).
Using PDF 1.5 compressed object streams (/ObjStm + /XRefStm) for size.
Containing no /JS, /JavaScript, /Launch, /EmbeddedFile, /AA, /OpenAction, or any other active-content token in its raw, undecompressed bytes.

The findings panel showed three structural-anomaly findings, each at HIGH severity: incremental_update (severity 5.0), object_streams (severity 4.0), and shadow_page_replacement (severity 8.0). Stacked, they pushed the aggregate score to 80. The customer's reasonable reading: "if the rules fire on a legitimate Adobe-signed marketing PDF, the rules are tuned wrong."

What we investigated (and what was easy to miss)

The first attempt at a cure was the obvious one: dampen the severity of those three findings when no dangerous token is present alongside them. The reasoning was clean: if a PDF has incremental updates AND object streams AND shadow page replacement BUT contains no /JS, /Launch, etc., it's a benign multi-revision Adobe document. We called this Track A v1. It cured the Whitepaper FP perfectly. It also passed every dedicated unit test we wrote.
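A minimal sketch of the shape of a file-only predicate like Track A v1. The Finding record, the dampen_v1 function, and the 0.25 factor are hypothetical names and values chosen for illustration; the token tuple reuses only the tokens named earlier in this case study and is not the shipped detection set.

```python
from dataclasses import dataclass, replace

# Illustrative only: tokens named earlier in this case study,
# not the shipped dangerous-token set.
ILLUSTRATIVE_TOKENS = (
    b"/JS", b"/JavaScript", b"/Launch", b"/EmbeddedFile", b"/AA", b"/OpenAction",
)
STRUCTURAL_RULES = {"incremental_update", "object_streams", "shadow_page_replacement"}


@dataclass(frozen=True)
class Finding:  # hypothetical record type
    rule: str
    severity: float


def dampen_v1(raw: bytes, findings: list[Finding], factor: float = 0.25) -> list[Finding]:
    """File-only decision: scale down structural findings when no token is present.

    `factor` is an arbitrary placeholder, not a shipped threshold.
    """
    if any(token in raw for token in ILLUSTRATIVE_TOKENS):
        return list(findings)  # a co-present token keeps full severity
    return [
        replace(f, severity=f.severity * factor) if f.rule in STRUCTURAL_RULES else f
        for f in findings
    ]
```

The only inputs are the file's bytes and its own findings, so any file with the same shape gets the same answer, whatever its provenance.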

G5 corpus replay against the 3,271-file MalwareBazaar corpus then revealed Track A v1 had moved 5 real-world malicious PDFs from HIGH (73) to SAFE: two Gootloader droppers, one Lazarus loader, and two generic obfuscated PDFs. We pulled the files apart with pikepdf and discovered the bad news: these 5 samples are structurally indistinguishable from the customer Whitepaper. Same %%EOF count (2, no coincidental EOFs). Same object stream + xref stream + FlateDecode shape. Same /Page and /Pages markers. Same absence of textbook active-content tokens.

The exploit in those 5 lives somewhere our heuristic predicate cannot see: encoded shellcode inside compressed object streams, hash-triggered payload loaders, indirect-reference action chains that don't lexically match /JS or /Launch. Decompressed-stream content analysis didn't help either; the Whitepaper actually had MORE suspicious-looking tokens in its decompressed streams (legitimate /AA for action-on-document-open, /Names for navigation) than 4 of the 5 MWB samples. Static analysis cannot separate these shapes.
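To make "structurally indistinguishable" concrete, here is a rough sketch of a raw-byte shape summary. The Shape tuple, shape_of, and synthetic_blob are hypothetical helpers, the token tuple is illustrative, and the two inputs are PDF-like byte blobs built for this example rather than valid PDFs or real samples.

```python
import zlib
from typing import NamedTuple

# Illustrative tokens only, taken from the list earlier in this case study.
ACTIVE_TOKENS = (
    b"/JS", b"/JavaScript", b"/Launch", b"/EmbeddedFile", b"/AA", b"/OpenAction",
)


class Shape(NamedTuple):
    eof_count: int
    has_object_stream: bool
    has_xref_stream: bool
    has_flate: bool
    has_active_token: bool


def shape_of(raw: bytes) -> Shape:
    """Summarize the raw-byte structural markers discussed above."""
    return Shape(
        eof_count=raw.count(b"%%EOF"),
        has_object_stream=b"/ObjStm" in raw,
        has_xref_stream=b"/XRefStm" in raw,
        has_flate=b"/FlateDecode" in raw,
        has_active_token=any(token in raw for token in ACTIVE_TOKENS),
    )


def synthetic_blob(stream_payload: bytes) -> bytes:
    """Build a PDF-like byte blob (not a valid PDF) with two %%EOF markers."""
    return b"".join([
        b"%PDF-1.5\n1 0 obj\n<< /Type /ObjStm /Filter /FlateDecode >>\nstream\n",
        zlib.compress(stream_payload),  # opaque to a raw-byte scan
        b"\nendstream\nendobj\n/XRefStm 0\n%%EOF\n",
        b"2 0 obj\n<< /Type /Page >>\nendobj\n%%EOF\n",
    ])


benign = synthetic_blob(b"page layout data")
hostile = synthetic_blob(b"opaque encoded blob")
same_shape = shape_of(benign) == shape_of(hostile)
```

The two blobs differ only inside the compressed stream, so their shapes should compare equal even though the bytes do not. Any predicate built purely on this kind of summary inherits that blind spot.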

What we shipped

Track A v1: the right idea, the wrong layer

Track A v1 was the original analyzer-tier dampening predicate (committed as a milestone, then refactored). The decision lived in PDFAnalyzer._detect_structural_anomalies, and the predicate inspected only the file itself. When G5 caught the regression, we did not roll back the predicate; we kept it as evidence and moved its decision point.

Track A v2: threat-intel-gated dampening (the actual cure)

The analyzer now emits structural-anomaly findings at FULL severity unconditionally, tagged with evidence["dampening_eligible"] + a candidate dampened severity + a dampened description. The engine, AFTER its threat-intel lookup block (MalwareBazaar + VirusTotal + Hybrid Analysis), walks the findings and applies dampening only when threat_intelligence.known_malware is False. The decision point now has the full context the analyzer lacked.

This means: a customer file unknown to intel feeds and structurally matching the no-dangerous-token predicate gets dampened (Whitepaper, customer marketing documents, signed legitimate PDFs in general). A known-malicious file matching the same predicate stays at full severity because the intel hit is dispositive evidence. A zero-day malicious file matching the predicate will be dampened until it reaches the intel feed; that's the bounded residual we document as the trade-off.
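The split described above (analyzer tags, engine decides) might look roughly like the sketch below. Only evidence["dampening_eligible"] and known_malware come from this case study; the Finding and ThreatIntel types, the dampened_severity and dampened_description keys, and the treatment of a missing intel result as "do not dampen" are assumptions made for illustration.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass(frozen=True)
class Finding:  # hypothetical record type
    rule: str
    severity: float
    description: str
    evidence: dict = field(default_factory=dict)


@dataclass(frozen=True)
class ThreatIntel:  # stand-in for the engine's threat-intelligence block
    known_malware: Optional[bool]  # None models an incomplete lookup (assumption)


def apply_gated_dampening(findings: list[Finding], intel: ThreatIntel) -> list[Finding]:
    """Apply candidate dampening only when intel explicitly reports known_malware=False."""
    if intel.known_malware is not False:
        return list(findings)  # intel hit or unknown state: keep full severity
    result = []
    for finding in findings:
        ev = finding.evidence
        if ev.get("dampening_eligible"):
            finding = replace(
                finding,
                severity=ev["dampened_severity"],        # assumed key name
                description=ev["dampened_description"],  # assumed key name
            )
        result.append(finding)
    return result
```

Because the analyzer always emits full severity, the findings remain available for audit and downstream consumers regardless of what the engine decides.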

Track F: ZIP per-member recursion (an emergent G5 finding)

While validating Track A v2, G5 caught a separate regression: 3 MalwareBazaar Gootloader ZIP archives (each containing an LNK dropper) had silently moved from CRIT to SAFE between v1.4.4 and v1.4.5. The root cause was an unrelated v1.4.5 change: the OneNote scope guard tightened a regex that had accidentally been giving the engine ZIP-content coverage as a side effect. We added explicit two-layer ZIP per-member detection. Layer 1 enumerates the central directory and emits archive_contains_suspicious_member findings based on filename heuristics (works on encrypted ZIPs). Layer 2 extracts each unencrypted member, routes it through the LNK/PE/Script analyzers, and emits archive_member_<inner> findings. The 3 Gootloader ZIPs moved from 46/SAFE to 89/90/90 HIGH. This is the long-deferred "Recursive decomposition not implemented" backlog item from the v1.3.0 known-pitfalls list; it shipped as part of v1.4.5 because the G5 finding forced the issue.
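The two layers can be sketched with the Python standard library's zipfile module. This is an illustration under stated assumptions, not the shipped code: the extension list, the size cap, the looks_like_lnk toy analyzer, and the lnk_header inner name are all hypothetical; only the two finding-name patterns come from the text above.

```python
import io
import zipfile
from pathlib import PurePosixPath

SUSPICIOUS_EXTENSIONS = {".lnk", ".exe", ".js", ".vbs", ".ps1"}  # illustrative list
ENCRYPTED_FLAG = 0x1  # bit 0 of the ZIP general-purpose bit flag
MAX_MEMBER_BYTES = 16 * 1024 * 1024  # arbitrary cap; file_size is declared metadata


def layer1_central_directory(zf: zipfile.ZipFile) -> list[dict]:
    """Filename heuristics over the central directory; nothing is extracted."""
    findings = []
    for info in zf.infolist():
        if PurePosixPath(info.filename).suffix.lower() in SUSPICIOUS_EXTENSIONS:
            findings.append({
                "rule": "archive_contains_suspicious_member",
                "member": info.filename,
                "encrypted": bool(info.flag_bits & ENCRYPTED_FLAG),
            })
    return findings


def layer2_members(zf: zipfile.ZipFile, analyzers: dict) -> list[dict]:
    """Extract unencrypted members and route each to an analyzer by extension."""
    findings = []
    for info in zf.infolist():
        if info.is_dir() or info.flag_bits & ENCRYPTED_FLAG:
            continue
        if info.file_size > MAX_MEMBER_BYTES:
            continue
        analyze = analyzers.get(PurePosixPath(info.filename).suffix.lower())
        if analyze is None:
            continue
        for inner in analyze(zf.read(info)):
            findings.append({"rule": f"archive_member_{inner}", "member": info.filename})
    return findings


def looks_like_lnk(data: bytes) -> list[str]:
    """Toy analyzer: a Shell Link file begins with HeaderSize 0x0000004C."""
    return ["lnk_header"] if data[:4] == b"\x4c\x00\x00\x00" else []


# Build a small in-memory archive and run both layers over it.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("invoice.lnk", b"\x4c\x00\x00\x00" + b"\x00" * 16)
    zf.writestr("readme.txt", b"hello")
with zipfile.ZipFile(buffer) as zf:
    layer1 = layer1_central_directory(zf)
    layer2 = layer2_members(zf, {".lnk": looks_like_lnk})
```

Layer 1 reads only central-directory metadata, which is why it can still report on archives whose members are encrypted; Layer 2 skips those members because their contents cannot be analyzed without a password.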

Tracks B, C, D, E: surface fixes that landed in the same release

Track B: The scan command's default file-types list now includes script-dropper extensions; the --file-types argument accepts comma-separated values (the old space-separated form still works with a deprecation warning); explicit-file scans always bypass the type filter.
Track C: The license enforcer now hard-fails on signature validation failure (CLI exit 1, API 503, banner reports license-invalid). The previous "configured-but-invalid" state was silently degrading to free tier, which masked the v1.4.3 RSA pubkey divergence for weeks. The diagnostic accessor hades doctor reports the pubkey modulus SHA-256, never the key itself.
Track D: The startup banner reports rule count consistently ("97 YARA rules across 12 files") instead of two slightly different counts in two surfaces.
Track E: The OneNote analyzer scope guard cures a misfire class on multi-format containers (with the unintended side effect that surfaced Track F).
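As an illustration of the Track B behavior, the sketch below shows one way a --file-types option could accept both forms. Only the flag name and the described behavior come from this case study; build_parser, normalize_file_types, should_scan, and the warning text are hypothetical.

```python
import argparse
import warnings
from pathlib import PurePath


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="hades scan")
    # nargs="+" lets the old space-separated form keep parsing.
    parser.add_argument("--file-types", nargs="+", default=None)
    return parser


def normalize_file_types(values: list[str]) -> list[str]:
    """Accept 'pdf,zip' (preferred) or ['pdf', 'zip'] (deprecated form)."""
    if len(values) > 1:
        warnings.warn(
            "space-separated --file-types is deprecated; use commas",
            DeprecationWarning,
            stacklevel=2,
        )
    types: list[str] = []
    for value in values:
        for part in value.split(","):
            part = part.strip().lower().lstrip(".")
            if part and part not in types:
                types.append(part)
    return types


def should_scan(path: str, types: list[str], explicit: bool) -> bool:
    """Explicitly named files bypass the type filter; discovered files do not."""
    if explicit:
        return True
    return PurePath(path).suffix.lower().lstrip(".") in types
```

Normalizing after parsing keeps the deprecation decision in one place: more than one raw value means the caller used the old form.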

What we explicitly did NOT do

The same WWCD architectural discipline that shaped v1.4.4 sized v1.4.5: ship the right cure at the right layer, defer the rest with an explicit boundary.

v1.5 VerdictEngine: the customer-facing verdict layer that separates "detection signal" from "customer answer" is still an interface stub. The right re-enable of the v1.4.4 Tier 1 doc fingerprint fast-path lives in v1.5 via result.verdict = "safe", not as a detection short-circuit.
Item C engine-split refactor: the document_engine / executable_engine / script_engine / archive_engine architectural separation is still right, still a multi-week PR, still gated on its own G5 replay before AND after.
Item D allowlist-as-signed-JSON policy bundles: still waiting for a tenant to ask for it.
Item F two-path benign verdict: needs the verdict-engine interface stubs to have real implementations. v1.5.
Tier 2 ML f33-f35 and v1.17 retrain: pairs naturally with the next ML retrain cycle.
Tier 4 portal extension for document-class FP submissions with PII-attestation: v1.5 portal work.

What we learned

When structural shapes cannot separate clean from malicious, fold threat intelligence into the predicate. Token-absence is "we lack evidence of active content," not "we have evidence of safety." Threat-intel hits are dispositive for known malware; intel absence is soft evidence of safety, which is what a dampening predicate needs. In v1.4.5, core/deep_format_analyzer.py and core/enhanced_detection_engine.py are the reference implementation; the pattern generalizes to DOCX, ZIP, and image-EXIF dampening.
Synthetic fixtures don't validate real corpora. Every dedicated unit test for Track A v1 passed; G5 caught what they couldn't because real malicious PDFs from APT loaders don't follow textbook shapes. The lesson, formalized after v1.3.0 and reapplied here: any change that can move malicious files into the safe band must run a full-corpus G5 replay before merge, not after.
Dampening beats short-circuit for structural-anomaly FP cures. The structural findings should still fire (audit trail, ML feature signal, future verdict-layer input); their severity is the right axis to tune, not their existence. A malicious PDF with incremental_update + /JS post-EOF keeps full severity because the dangerous-token co-presence disqualifies dampening.
Side-effect coverage is fragile coverage. Track F was needed only because a OneNote scope guard regex had been accidentally giving ZIP-content coverage as a side effect for months. When we tightened the regex (correctly), the coverage disappeared (silently). Explicit two-layer ZIP per-member recursion now provides that coverage on purpose, with regression tests guarding it.
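The replay-before-merge lesson can be expressed as a small gate. This is a sketch, assuming each replay produces a mapping from sample identifier to verdict band; the function names, the mapping format, and the choice of SAFE as the only safe band are assumptions, not the G5 tooling itself.

```python
def safe_band_regressions(
    before: dict[str, str],
    after: dict[str, str],
    safe_bands: frozenset = frozenset({"SAFE"}),
) -> list[str]:
    """Known-malicious samples that were outside the safe band and are now inside it."""
    return sorted(
        sample
        for sample, band in after.items()
        if band in safe_bands
        and sample in before
        and before[sample] not in safe_bands
    )


def replay_gate(before: dict[str, str], after: dict[str, str]) -> int:
    """Return a process-style exit code: 1 blocks the merge, 0 allows it."""
    regressions = safe_band_regressions(before, after)
    for sample in regressions:
        print(f"regressed to safe band: {sample} ({before[sample]} -> {after[sample]})")
    return 1 if regressions else 0
```

Run against a corpus of known-malicious samples, a gate like this would flag both regressions described in this case study, since in each one samples moved from HIGH or CRIT into SAFE.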

Numbers

Metric	Pre-v1.4.5	Post-v1.4.5
HADES Whitepaper PDF, Pro tier	80 HIGH	22 LOW
5 MWB Gootloader / Lazarus PDFs (Track A v2 regression check)	73 HIGH	73 HIGH (preserved)
3 MWB Gootloader ZIPs (Track F)	46 SAFE (regression)	89-90 HIGH (cured)
Clean corpus actionable FPs	222 actionable	0 actionable (cured)
MWB 3,271-file detection rate	98.7%	98.2%
Contagio 11,890-file detection rate	99.9%	92.2%
Targeted Gootloader 9-sample CRITICAL	9 / 9	9 / 10 (one ZIP-LNK at HIGH 89)

The Contagio drop reflects a measurement change, not a regression. Per-member ZIP recursion (Track F) now exposes archive findings as individual file scores, where the prior side-effect coverage rolled them up into single archive-level CRIT scores. Customer detection of malicious archives is strictly improved (the 3 Gootloader ZIPs are an example); aggregate rates are comparable across releases only when the file-set decomposition is held constant.

Trade-secret hygiene

The exact contents of the _PDF_DANGEROUS_TOKENS set, the engine's threat-intel decision flow, the YARA rule names in the dampened-eligible list, and the per-tier scoring weights are proprietary detection logic. This case study describes the architectural pattern (analyzer emits at full severity, engine applies threat-intel-gated dampening; two-layer ZIP per-member recursion with central-dir-only Layer 1 + extracted-member Layer 2) but does not publish predicate contents, rule names, or threshold values. Customers and acquirer due-diligence teams can validate the cure via the public regression results and the public detection metrics; the detection-engine internals remain in the proprietary source tree.

What's next

v1.4.6: Investigate 241 Clean-corpus UNKNOWN entries (Azure-PowerShell modules falling through to threat_level=unknown); cosmetic, no customer impact.
v1.5: Real VerdictEngine behind the v1.4.4 interface stub; LocalAllowlistOracle implementation; document-engine class refactor as standalone PR; Item F two-path benign verdict; ML schema-v9 retrain; portal Tier 4 document-class FP submission flow.
v1.6: Proactive threat-intel feed pilot (CISA KEV, ExploitDB, ZDI, rogue-researcher disclosures); PortalPrevalenceOracle once the customer portal has accumulated enough cross-tenant signal.
v2.0: Tenant-curated allowlist manager with per-tenant signed policy bundles; full document-engine separation.

The structural-anomaly dampening pattern is now generalizable: any future heuristic that wants to dampen on a "no smoking gun" predicate will run the threat-intel gate by default. Customer FP submissions trigger the cure at the right layer, not the most convenient one.

$ hades scan HADES_Whitepaper.pdf --tier pro --format json


Tested May 2026 • HADES v1.4.5 • Customer-reported Whitepaper PDF plus MalwareBazaar Gootloader / Lazarus regression samples plus 3,271-file MWB corpus G5 replay