Case Study: Curing a Multi-Revision PDF False-Positive Class with HADES v1.4.5
TL;DR
A customer scanning the DarkHorse HADES Whitepaper PDF on Pro tier saw it flagged Score 80 HIGH. The file was a legitimate marketing document, Adobe-signed, structurally clean, no active content. The detection engine fired three structural-anomaly findings (incremental update + object streams + shadow page replacement) at HIGH severity, unconditionally. We tried a static dampening predicate (no dangerous tokens, no JavaScript, no embedded executables = dampen severity), watched it regress 5 known-malicious PDFs from the MalwareBazaar Gootloader and Lazarus clusters from HIGH to SAFE, then refactored to a threat-intel-gated dampening decision. We also caught a separate regression along the way: 3 ZIP-wrapped Gootloader LNK droppers were silently dropping from CRIT to SAFE under the v1.4.5 OneNote scope guard, which we cured with full ZIP per-member recursion (Track F).
Dampening applies only when threat_intelligence.known_malware=False; known-malicious PDFs (Gootloader, Lazarus, Pdfka, generic CVE-2010-0188 / CVE-2009-0927) keep their full severity because intel hits are dispositive. A subset of zero-day malicious PDFs with structurally clean shapes will still be dampened (zero-days by definition aren't in intel feeds). That residual is a known trade-off: the partial cure is bounded by the threat-intel surface, not by our detection-engine logic. v1.6+ work on a portal-side prevalence oracle is the architectural follow-up.
What happened
A customer-facing instance of HADES v1.4.4 was scanning the published DarkHorse HADES Whitepaper PDF (a marketing document, freely available on darkhorseinfosec.com). The Pro-tier scan reported Score 80 HIGH. The file was:
- About 300 KB on disk.
- Generated by Adobe Acrobat with an incremental save (one revision + one update revision).
- Using PDF 1.5 compressed object streams (/ObjStm + /XRefStm) for size.
- Containing no /JS, /JavaScript, /Launch, /EmbeddedFile, /AA, /OpenAction, or any other active-content token.
The findings panel showed three structural-anomaly findings, each at HIGH severity: incremental_update (severity 5.0), object_streams (severity 4.0), and shadow_page_replacement (severity 8.0). Stacked, they pushed the aggregate score to 80. The customer's reasonable reading: "if the rules fire on a legitimate Adobe-signed marketing PDF, the rules are tuned wrong."
What we investigated (and what was easy to miss)
The first attempt at a cure was the obvious one: dampen the severity of those three findings when no dangerous-token co-presence exists. The reasoning was clean: if a PDF has incremental updates AND object streams AND shadow page replacement BUT contains no /JS, /Launch, etc., it's a benign multi-revision Adobe document. We called this Track A v1. It cured the Whitepaper FP perfectly. It also passed every dedicated unit test we wrote.
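A minimal sketch of the Track A v1 shape, to make the failure mode concrete. This is not the shipped predicate: the token tuple reuses only the tokens named earlier in this write-up, and `DAMPENED_SEVERITY`, `has_dangerous_token`, `dampen_v1`, and the finding dictionaries are hypothetical names chosen for illustration.

```python
# Illustrative only: the token tuple, DAMPENED_SEVERITY, and the finding dicts
# are assumptions for this sketch, not the shipped predicate or its contents.
ILLUSTRATIVE_TOKENS = (b"/JS", b"/JavaScript", b"/Launch",
                       b"/EmbeddedFile", b"/AA", b"/OpenAction")
DAMPENED_SEVERITY = 1.0  # hypothetical value


def has_dangerous_token(raw: bytes) -> bool:
    """Naive lexical check over the raw file bytes."""
    return any(token in raw for token in ILLUSTRATIVE_TOKENS)


def dampen_v1(findings: list[dict], raw: bytes) -> list[dict]:
    """Static dampening: the decision sees the file and nothing else."""
    if has_dangerous_token(raw):
        return findings  # dangerous-token co-presence disqualifies dampening
    # Return copies so the original full-severity findings are not mutated.
    return [{**f, "severity": min(f["severity"], DAMPENED_SEVERITY)}
            for f in findings]
```

The weakness is visible in the signature: the only input is the file, so any malicious sample that avoids the lexical tokens is treated exactly like a benign one.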
G5 corpus replay against the 3,271-file MalwareBazaar corpus then revealed Track A v1 had moved 5 real-world malicious PDFs from HIGH (73) to SAFE: two Gootloader droppers, one Lazarus loader, and two generic obfuscated PDFs. We pulled the files apart with pikepdf and discovered the bad news: these 5 samples are structurally indistinguishable from the customer Whitepaper. Same %%EOF count (2, no coincidental EOFs). Same object stream + xref stream + FlateDecode shape. Same /Page and /Pages markers. Same absence of textbook active-content tokens.
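The comparison above can be approximated with a coarse structural fingerprint. The sketch below is an assumption-laden stand-in that works on raw bytes with the standard library rather than pikepdf; `structural_fingerprint` and its keys are hypothetical names, not part of HADES.

```python
import re


def structural_fingerprint(raw: bytes) -> dict:
    """Coarse shape summary from raw bytes (no parsing, no decompression)."""
    return {
        "eof_count": raw.count(b"%%EOF"),
        "object_streams": b"/ObjStm" in raw,
        "xref_stream": b"/XRef" in raw,  # matches both /XRef and /XRefStm
        "flate": b"/FlateDecode" in raw,
        "page_tree": re.search(rb"/Type\s*/Pages\b", raw) is not None,
    }
```

Two files with very different stream contents can produce identical fingerprints, which is the situation described here: every structural feature matched, so no rule built on these features alone could tell the samples apart.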
The exploit in those 5 lives somewhere our heuristic predicate cannot see: encoded shellcode inside compressed object streams, hash-triggered payload loaders, indirect-reference action chains that don't lexically match /JS or /Launch. Decompressed-stream content analysis didn't help either; the Whitepaper actually had MORE suspicious-looking tokens in its decompressed streams (legitimate /AA for action-on-document-open, /Names for navigation) than 4 of the 5 MWB samples. Static analysis cannot separate these shapes.
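For readers who want to repeat the decompressed-stream comparison on their own files, here is a rough sketch. It is a simplification under stated assumptions: it finds stream bodies lexically instead of honoring /Length and /Filter, it handles only Flate data, and `inflate_streams` and `count_tokens` are hypothetical helper names.

```python
import re
import zlib

# Simplification: locates stream bodies lexically instead of honoring /Length
# or /Filter, so it is only suitable for triage-style experiments.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\nendstream", re.DOTALL)


def inflate_streams(raw: bytes):
    """Yield the inflated body of every stream that zlib can decode."""
    for match in STREAM_RE.finditer(raw):
        try:
            # decompressobj tolerates trailing bytes such as a stray \r
            body = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-encoded, or corrupt
        yield body


def count_tokens(raw: bytes, tokens: tuple[bytes, ...]) -> dict[bytes, int]:
    """Count each token across all successfully inflated stream bodies."""
    counts = {token: 0 for token in tokens}
    for body in inflate_streams(raw):
        for token in tokens:
            counts[token] += body.count(token)
    return counts
```

A higher count is not evidence of malice on its own. As noted above, the legitimate document carried more of these tokens than most of the malicious samples.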
What we shipped
Track A v1: the right idea, the wrong layer
Track A v1 was the original analyzer-tier dampening predicate, committed as a milestone and then refactored. The decision lived in PDFAnalyzer._detect_structural_anomalies, and the predicate inspected only the file itself. When G5 caught the regression, we did not roll back the predicate; we kept it as evidence and moved its decision point.
Track A v2: threat-intel-gated dampening (the actual cure)
The analyzer now emits structural-anomaly findings at FULL severity unconditionally, tagged with evidence["dampening_eligible"] + a candidate dampened severity + a dampened description. The engine, AFTER its threat-intel lookup block (MalwareBazaar + VirusTotal + Hybrid Analysis), walks the findings and applies dampening only when threat_intelligence.known_malware is False. The decision point now has the full context the analyzer lacked.
This means: a customer file unknown to intel feeds and structurally matching the no-dangerous-token predicate gets dampened (Whitepaper, customer marketing documents, signed legitimate PDFs in general). A known-malicious file matching the same predicate stays at full severity because the intel hit is dispositive evidence. A zero-day malicious file matching the predicate will be dampened until it reaches the intel feed; that's the bounded residual we document as the trade-off.
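The split between analyzer and engine can be sketched as follows. Only the `dampening_eligible` evidence key and the `known_malware` flag come from the description above; the `Finding` dataclass, the other evidence keys, the function names, and the dampened values are assumptions made for illustration and do not reflect the proprietary implementation.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Finding:
    name: str
    severity: float
    description: str
    evidence: dict = field(default_factory=dict)


def emit_structural_finding(name: str, severity: float,
                            dampened_severity: float) -> Finding:
    """Analyzer side: always full severity, plus a dampening candidate."""
    return Finding(name, severity, f"{name} detected", evidence={
        "dampening_eligible": True,
        "dampened_severity": dampened_severity,          # hypothetical key
        "dampened_description": f"{name} (benign multi-revision shape)",
    })


def apply_intel_gated_dampening(findings: list[Finding],
                                known_malware: bool) -> list[Finding]:
    """Engine side: runs after the threat-intel lookup has completed."""
    if known_malware:
        return list(findings)  # an intel hit is dispositive: keep everything
    return [
        replace(f, severity=f.evidence["dampened_severity"],
                description=f.evidence["dampened_description"])
        if f.evidence.get("dampening_eligible") else f
        for f in findings
    ]
```

Because the analyzer never lowers severity itself, the full-severity finding remains available for audit and for any later decision layer; only the engine, which holds the intel result, chooses which severity to report.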
Track F: ZIP per-member recursion (an emergent G5 finding)
While validating Track A v2, G5 caught a separate regression: 3 MalwareBazaar Gootloader ZIP archives (each containing an LNK dropper inside) had silently moved from CRIT to SAFE between v1.4.4 and v1.4.5. The root cause was an unrelated v1.4.5 change: the OneNote scope guard tightened a regex that had accidentally been giving the engine ZIP-content coverage as a side effect. We added explicit two-layer ZIP per-member detection: Layer 1 enumerates the central directory and emits archive_contains_suspicious_member findings based on filename heuristics (works on encrypted ZIPs); Layer 2 extracts and routes each unencrypted member through the LNK/PE/Script analyzers and emits archive_member_<inner> findings. The 3 Gootloader ZIPs moved from 46/SAFE to 89/90/90 HIGH. This is the long-deferred "Recursive decomposition not implemented" backlog item from the v1.3.0 known-pitfalls list; it shipped as part of v1.4.5 because the G5 finding forced the issue.
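A simplified sketch of the two-layer idea using Python's standard zipfile module. The finding names follow the ones quoted above; the extension set, the toy analyzer, and the function names are hypothetical, and a production version would also need size and nesting limits that this sketch omits.

```python
import io
import zipfile
from pathlib import PurePosixPath

SUSPICIOUS_EXTS = {".lnk", ".js", ".vbs", ".exe"}  # hypothetical heuristic


def layer1(zf: zipfile.ZipFile) -> list[dict]:
    """Central-directory pass: filenames only, so it works on encrypted ZIPs."""
    return [
        {"finding": "archive_contains_suspicious_member", "member": info.filename}
        for info in zf.infolist()
        if PurePosixPath(info.filename).suffix.lower() in SUSPICIOUS_EXTS
    ]


def layer2(zf: zipfile.ZipFile, analyze) -> list[dict]:
    """Extraction pass: route each unencrypted member through an analyzer."""
    findings = []
    for info in zf.infolist():
        if info.is_dir() or info.flag_bits & 0x1:  # bit 0 set = encrypted
            continue
        for inner in analyze(info.filename, zf.read(info)):
            findings.append({"finding": f"archive_member_{inner}",
                             "member": info.filename})
    return findings


def toy_analyzer(name: str, data: bytes) -> list[str]:
    """Stand-in for the LNK/PE/Script analyzers: checks the LNK header size."""
    return ["lnk_header"] if data.startswith(b"L\x00\x00\x00") else []


def scan_zip(data: bytes, analyze=toy_analyzer) -> list[dict]:
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return layer1(zf) + layer2(zf, analyze)
```

Layer 1 never reads member contents, which is why it still produces a signal when extraction is impossible; Layer 2 adds content-based findings wherever the bytes are reachable.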
Tracks B, C, D, E: surface fixes that landed in the same release
- Track B: The scan command's default file-types list now includes script-dropper extensions; the --file-types argument accepts comma-separated values (the old space-separated form still works with a deprecation warning); explicit-file scans always bypass the type filter.
- Track C: The license enforcer now hard-fails on signature validation failure (CLI exit 1, API 503, banner reports license-invalid). The previous "configured-but-invalid" state was silently degrading to free tier, which masked the v1.4.3 RSA pubkey divergence for weeks. The diagnostic accessor hades doctor reports the pubkey modulus SHA-256, never the key itself.
- Track D: The startup banner reports rule count consistently ("97 YARA rules across 12 files") instead of two slightly different counts in two surfaces.
- Track E: The OneNote analyzer scope guard cures a misfire class on multi-format containers (with the unrelated side effect that surfaced Track F).
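As an illustration of the Track B argument handling, the sketch below shows one way to accept both forms with argparse. The parser layout, the `parse_file_types` helper, and the warning text are assumptions; the actual HADES CLI may be structured differently.

```python
import argparse
import warnings


def parse_file_types(tokens: list[str]) -> list[str]:
    """Normalize '--file-types pdf,lnk' and the legacy '--file-types pdf lnk'."""
    if len(tokens) > 1:
        warnings.warn(
            "space-separated --file-types is deprecated; use commas",
            DeprecationWarning, stacklevel=2,
        )
    return [ext.strip().lower()
            for token in tokens
            for ext in token.split(",")
            if ext.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="scan")  # hypothetical prog name
    parser.add_argument("paths", nargs="*")
    # nargs="+" keeps the legacy space-separated form parseable.
    parser.add_argument("--file-types", nargs="+", default=None)
    return parser
```

One reason to prefer the comma form: with `nargs="+"`, a positional path placed after `--file-types` is swallowed as another file type, an ambiguity the comma-separated form avoids.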
What we explicitly did NOT do
The same WWCD architectural discipline that shaped v1.4.4 sized v1.4.5: ship the right cure at the right layer, defer the rest with an explicit boundary.
- v1.5 VerdictEngine: the customer-facing verdict layer that separates "detection signal" from "customer answer" is still an interface stub. The right re-enable of the v1.4.4 Tier 1 doc fingerprint fast-path lives in v1.5 via result.verdict = "safe", not as a detection short-circuit.
- Item C engine-split refactor: the document_engine / executable_engine / script_engine / archive_engine architectural separation is still right, still a multi-week PR, still gated on its own G5 replay before AND after.
- Item D allowlist-as-signed-JSON policy bundles: still waiting for a tenant to ask for it.
- Item F two-path benign verdict: needs the verdict-engine interface stubs to have real implementations. v1.5.
- Tier 2 ML f33-f35 and v1.17 retrain: pairs naturally with the next ML retrain cycle.
- Tier 4 portal extension for document-class FP submissions with PII-attestation: v1.5 portal work.
What we learned
- When structural shapes cannot separate clean from malicious, fold threat intelligence into the predicate. Token-absence is "we lack evidence of active content," not "we have evidence of safety." Threat-intel hits are dispositive for known malware; intel absence is soft evidence of safety, which is what a dampening predicate needs. v1.4.5 core/deep_format_analyzer.py + core/enhanced_detection_engine.py is the reference implementation; the pattern generalizes to DOCX, ZIP, image-EXIF dampening.
- Synth fixtures don't validate real corpora. Every dedicated unit test for Track A v1 passed; G5 caught what they couldn't because real malicious PDFs from APT loaders don't follow textbook shapes. The lesson, formalized after v1.3.0 and reapplied here: any change that can move malicious files into the safe band must run a full-corpus G5 replay before merge, not after.
- Dampening beats short-circuit for structural-anomaly FP cures. The structural findings should still fire (audit trail, ML feature signal, future verdict-layer input); their severity is the right axis to tune, not their existence. A malicious PDF with incremental_update + /JS post-EOF keeps full severity because the dangerous-token co-presence disqualifies dampening.
- Side-effect coverage is fragile coverage. Track F existed only because a OneNote scope guard regex had been accidentally giving ZIP-content coverage as a side effect for months. When we tightened the regex (correctly), the coverage disappeared (silently). Explicit two-layer ZIP per-member recursion now provides that coverage on purpose, with regression tests guarding it.
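The replay-before-merge rule can be expressed as a small gate over two sets of corpus results. This is a sketch under assumptions, not the G5 tooling: the band names and ordering, the sample-ID-to-band mappings, and the function names are all hypothetical.

```python
# Band names and their ordering are assumptions for this sketch.
BANDS = ("SAFE", "LOW", "MEDIUM", "HIGH", "CRIT")
RANK = {band: index for index, band in enumerate(BANDS)}


def band_regressions(before: dict, after: dict, malicious: set) -> list:
    """Known-malicious sample IDs whose band dropped between two replays."""
    return sorted(
        sample for sample in malicious
        if sample in before and sample in after
        and RANK[after[sample]] < RANK[before[sample]]
    )


def merge_gate(before: dict, after: dict, malicious: set) -> None:
    """Abort the merge if any known-malicious sample lost severity."""
    dropped = band_regressions(before, after, malicious)
    if dropped:
        raise SystemExit(
            f"corpus replay: {len(dropped)} malicious sample(s) "
            f"lost severity: {dropped}"
        )
```

A gate of this shape flags any band drop on a known-malicious sample, whether it comes from an intended dampening change or from coverage that disappeared as a side effect, because it compares outcomes rather than code paths.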
Numbers
| Metric | Pre-v1.4.5 | Post-v1.4.5 |
|---|---|---|
| HADES Whitepaper PDF, Pro tier | 80 HIGH | 22 LOW |
| 5 MWB Gootloader / Lazarus PDFs (Track A v2 regression check) | 73 HIGH | 73 HIGH (preserved) |
| 3 MWB Gootloader ZIPs (Track F) | 46 SAFE (regression) | 89-90 HIGH (cured) |
| Clean corpus actionable FPs | 222 actionable | 0 actionable (cured) |
| MWB 3,271-file detection rate | 98.7% | 98.2% |
| Contagio 11,890-file detection rate | 99.9% | 92.2% |
| Targeted Gootloader 9-sample CRITICAL | 9 / 9 | 9 / 10 (one ZIP-LNK at HIGH 89) |
The Contagio drop reflects an honest measurement change, not a regression: v1.4.5 reports lower because the per-member ZIP recursion (Track F) now exposes archive findings as individual file scores that the prior side-effect coverage was rolling up into single archive-level CRIT scores. Customer detection of malicious archives is strictly improved (the 3 Gootloader ZIPs are an example); the aggregate rate is the right metric to compare across releases only when the file-set decomposition is held constant.
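To see why the unit of counting matters, consider the same scan results tallied two ways. The rows below are toy data invented for this illustration, not corpus measurements, and both function names are hypothetical.

```python
from collections import defaultdict


def member_level_rate(rows) -> float:
    """Each extracted member counts as its own file."""
    return sum(detected for _, _, detected in rows) / len(rows)


def archive_level_rate(rows) -> float:
    """An archive counts as detected if any of its members is detected."""
    by_archive = defaultdict(bool)
    for archive, _, detected in rows:
        by_archive[archive] |= detected
    return sum(by_archive.values()) / len(by_archive)


# Toy data: (archive, member, detected). Each archive holds one dropper
# and one harmless decoy file.
TOY_ROWS = [
    ("a.zip", "invoice.lnk", True),
    ("a.zip", "readme.txt", False),
    ("b.zip", "order.lnk", True),
    ("b.zip", "notes.txt", False),
]
```

With these toy rows, every archive is detected, yet only half of the individual members are. Comparing rates across releases is meaningful only when both releases are counted at the same level.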
Trade-secret hygiene
The exact contents of the _PDF_DANGEROUS_TOKENS set, the engine's threat-intel decision flow, the YARA rule names in the dampened-eligible list, and the per-tier scoring weights are proprietary detection logic. This case study describes the architectural pattern (analyzer emits at full severity, engine applies threat-intel-gated dampening; two-layer ZIP per-member recursion with central-dir-only Layer 1 + extracted-member Layer 2) but does not publish predicate contents, rule names, or threshold values. Customers and acquirer due-diligence teams can validate the cure via the public regression results and the public detection metrics; the detection-engine internals remain in the proprietary source tree.
What's next
- v1.4.6: Investigate 241 Clean-corpus UNKNOWN entries (Azure-PowerShell modules falling through to threat_level=unknown); cosmetic, no customer impact.
- v1.5: Real VerdictEngine behind the v1.4.4 interface stub; LocalAllowlistOracle implementation; document-engine class refactor as standalone PR; Item F two-path benign verdict; ML schema-v9 retrain; portal Tier 4 document-class FP submission flow.
- v1.6: Proactive threat-intel feed pilot (CISA KEV, ExploitDB, ZDI, rogue-researcher disclosures); PortalPrevalenceOracle once the customer portal has accumulated enough cross-tenant signal.
- v2.0: Tenant-curated allowlist manager with per-tenant signed policy bundles; full document-engine separation.
The structural-anomaly dampening pattern is now generalizable: any future heuristic that wants to dampen on a "no smoking gun" predicate will run the threat-intel gate by default. Customer FP submissions trigger the cure at the right layer, not the most convenient one.
Tested May 2026 • HADES v1.4.5 • Customer-reported Whitepaper PDF plus MalwareBazaar Gootloader / Lazarus regression samples plus 3,271-file MWB corpus G5 replay