DarkHorse InfoSec

Case Study: Curing a Multi-Revision PDF False-Positive Class with HADES v1.4.5

TL;DR

A customer scanning the DarkHorse HADES Whitepaper PDF on the Pro tier saw it flagged Score 80 HIGH. The file was a legitimate marketing document: Adobe-signed, structurally clean, with no active content. The detection engine fired three structural-anomaly findings (incremental update, object streams, and shadow page replacement) at HIGH severity, unconditionally. We tried a static dampening predicate (dampen severity when there are no dangerous tokens, no JavaScript, and no embedded executables), watched it regress 5 known-malicious MalwareBazaar PDFs (Gootloader, Lazarus, and generic obfuscated samples) from HIGH to SAFE, then refactored to a threat-intel-gated dampening decision. We also caught a separate regression along the way: 3 ZIP-wrapped Gootloader LNK droppers had silently dropped from CRIT to SAFE under the v1.4.5 OneNote scope guard, which we cured with full ZIP per-member recursion (Track F).

Honest scope: v1.4.5 cures the structural-anomaly PDF FP class for documents that are threat-intel-clean. Dampening fires only when the engine's threat-intelligence block returns known_malware=False; known-malicious PDFs (Gootloader, Lazarus, Pdfka, generic CVE-2010-0188 / CVE-2009-0927) preserve their full severity because intel hits are dispositive. A subset of zero-day malicious PDFs with structurally clean shapes will still be dampened (zero-days by definition aren't in intel feeds). That residual is a known trade-off: the partial cure is bounded by the threat-intel surface, not by our detection-engine logic. v1.6+ work on a portal-side prevalence oracle is the architectural follow-up.

What happened

A customer-facing instance of HADES v1.4.4 was scanning the published DarkHorse HADES Whitepaper PDF (a marketing document, freely available on darkhorseinfosec.com). The Pro-tier scan reported Score 80 HIGH, even though the file was a legitimate, Adobe-signed, structurally clean marketing document with no active content.

The findings panel showed three structural-anomaly findings, each at HIGH severity: incremental_update (severity 5.0), object_streams (severity 4.0), and shadow_page_replacement (severity 8.0). Stacked, they pushed the aggregate score to 80. The customer's reasonable reading: "if the rules fire on a legitimate Adobe-signed marketing PDF, the rules are tuned wrong."

What we investigated (and what was easy to miss)

The first attempt at a cure was the obvious one: dampen the severity of those three findings when no dangerous-token co-presence exists. The reasoning was clean: if a PDF has incremental updates AND object streams AND shadow page replacement BUT contains no /JS, /Launch, etc., it's a benign multi-revision Adobe document. We called this Track A v1. It cured the Whitepaper FP perfectly. It also passed every dedicated unit test we wrote.
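For illustration, here is a minimal Python sketch of a Track A v1-style predicate. The token tuple is a hypothetical stand-in (the production _PDF_DANGEROUS_TOKENS contents are proprietary and unpublished), and the function names are invented for this example. Note that it decides from the file bytes alone.

```python
import re

# Hypothetical stand-in for the proprietary _PDF_DANGEROUS_TOKENS set.
# /JS and /Launch are named in the case study; the others are common PDF
# active-content names chosen for illustration only.
HYPOTHETICAL_DANGEROUS_TOKENS = (b"/JS", b"/JavaScript", b"/Launch", b"/EmbeddedFile")

STRUCTURAL_FINDINGS = {"incremental_update", "object_streams", "shadow_page_replacement"}


def has_dangerous_token(raw: bytes) -> bool:
    """Lexical scan of the raw (still-compressed) file bytes for any watched name."""
    for token in HYPOTHETICAL_DANGEROUS_TOKENS:
        # Reject matches that continue with a regular character (e.g. /JSFoo).
        if re.search(re.escape(token) + rb"(?![A-Za-z0-9])", raw):
            return True
    return False


def track_a_v1_should_dampen(raw: bytes, fired: set[str]) -> bool:
    """Track A v1 shape: dampen if a structural finding fired and no token is present.

    Token absence only means "no evidence of active content". Names hidden
    inside compressed object streams, hex-escaped names such as /J#53, and
    indirect action chains all pass this check.
    """
    return bool(fired & STRUCTURAL_FINDINGS) and not has_dangerous_token(raw)
```

The hex-escape case is a general PDF property (name objects may encode characters as #xx), and it shows why a lexical predicate is easy to satisfy for a file that was built to satisfy it.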

G5 corpus replay against the 3,271-file MalwareBazaar corpus then revealed Track A v1 had moved 5 real-world malicious PDFs from HIGH (73) to SAFE: two Gootloader droppers, one Lazarus loader, and two generic obfuscated PDFs. We pulled the files apart with pikepdf and discovered the bad news: these 5 samples are structurally indistinguishable from the customer Whitepaper. Same %%EOF count (2, no coincidental EOFs). Same object stream + xref stream + FlateDecode shape. Same /Page and /Pages markers. Same absence of textbook active-content tokens.
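The shape comparison above can be approximated without a PDF parser. The case study used pikepdf; the sketch below swaps in a plain byte scan (our simplification, not the team's tooling) over the markers listed: %%EOF count, object streams, cross-reference streams, FlateDecode, and the page tree. It is coarser than a parsed view, because anything inside a compressed object stream is invisible to it. PdfShape and pdf_shape are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PdfShape:
    eof_markers: int      # count of %%EOF; more than 1 usually means incremental updates
    object_streams: bool  # /ObjStm present
    xref_stream: bool     # /XRef present (cross-reference stream)
    flate: bool           # /FlateDecode filter present
    page_tree: bool       # both /Page and /Pages present in the uncompressed bytes


def _has_name(raw: bytes, name: bytes) -> bool:
    # A PDF name ends at a delimiter or whitespace, so /Page must not match /Pages.
    return re.search(re.escape(name) + rb"(?![A-Za-z0-9#])", raw) is not None


def pdf_shape(raw: bytes) -> PdfShape:
    """Coarse structural fingerprint from raw bytes; no parsing, no decompression."""
    return PdfShape(
        eof_markers=raw.count(b"%%EOF"),
        object_streams=_has_name(raw, b"/ObjStm"),
        xref_stream=_has_name(raw, b"/XRef"),
        flate=_has_name(raw, b"/FlateDecode"),
        page_tree=_has_name(raw, b"/Page") and _has_name(raw, b"/Pages"),
    )
```

Two files that return equal PdfShape values are indistinguishable at this granularity, which is the position the five MWB samples and the Whitepaper were in.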

The exploit in those 5 lives somewhere our heuristic predicate cannot see: encoded shellcode inside compressed object streams, hash-triggered payload loaders, indirect-reference action chains that don't lexically match /JS or /Launch. Decompressed-stream content analysis didn't help either; the Whitepaper actually had MORE suspicious-looking tokens in its decompressed streams (legitimate /AA for action-on-document-open, /Names for navigation) than 4 of the 5 MWB samples. Static analysis cannot separate these shapes.
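A decompressed-stream check of the kind described above might look like the following: inflate each stream that decodes as zlib and count a few PDF names. The watch-list is illustrative only (it mixes the legitimate /AA and /Names the Whitepaper used with /JS and /Launch), and extracting streams with a regex is a simplification that ignores /Length and non-Flate filters. As noted above, the Whitepaper scored higher than four of the five malicious samples on this kind of count, so it cannot serve as a separator.

```python
import re
import zlib
from collections import Counter

# Body between the "stream" keyword's EOL and "endstream" (non-greedy, spans lines).
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)

# Illustrative watch-list; not the production token set.
WATCHED = (b"/AA", b"/OpenAction", b"/Names", b"/JS", b"/Launch")


def decompressed_token_counts(raw: bytes) -> Counter:
    """Inflate each zlib-decodable stream and count watched names inside it."""
    counts: Counter = Counter()
    for match in STREAM_RE.finditer(raw):
        try:
            # decompressobj returns what it can from truncated data (common in
            # real PDFs) instead of raising like zlib.decompress would.
            body = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not zlib/Flate data: skipped in this sketch
        for token in WATCHED:
            pattern = re.escape(token) + rb"(?![A-Za-z0-9])"
            counts[token.decode()] += len(re.findall(pattern, body))
    return counts
```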

What we shipped

Track A v1: the right idea, the wrong layer

Track A v1 was the original analyzer-tier dampening predicate (committed as a milestone, then refactored). The decision lived in PDFAnalyzer._detect_structural_anomalies, and the predicate inspected only the file itself. When G5 caught the regression, we did not roll back the predicate; we kept it as evidence and moved its decision point.

Track A v2: threat-intel-gated dampening (the actual cure)

The analyzer now emits structural-anomaly findings at FULL severity unconditionally, tagging each with evidence["dampening_eligible"], a candidate dampened severity, and a dampened description. The engine, AFTER its threat-intel lookup block (MalwareBazaar, VirusTotal, and Hybrid Analysis), walks the findings and applies dampening only when threat_intelligence.known_malware is False. The decision point now has the full context the analyzer lacked.

This means: a customer file unknown to intel feeds and structurally matching the no-dangerous-token predicate gets dampened (Whitepaper, customer marketing documents, signed legitimate PDFs in general). A known-malicious file matching the same predicate stays at full severity because the intel hit is dispositive evidence. A zero-day malicious file matching the predicate will be dampened until it reaches the intel feed; that's the bounded residual we document as the trade-off.
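The analyzer/engine split can be sketched in a few lines of Python. This is not the code in core/deep_format_analyzer.py or core/enhanced_detection_engine.py: the Finding dataclass, both function names, the evidence keys other than dampening_eligible, and the use of a plain dict for threat_intelligence are assumptions made for the example. What it does show is the ordering: the analyzer never lowers a severity itself, and the engine swaps in the dampened values only after intel comes back clean.

```python
from dataclasses import dataclass, field, replace
from typing import Any


@dataclass(frozen=True)
class Finding:
    name: str
    severity: float
    description: str
    evidence: dict[str, Any] = field(default_factory=dict)


def analyzer_emit(name: str, severity: float, description: str, eligible: bool,
                  dampened_severity: float, dampened_description: str) -> Finding:
    """Analyzer side: always emit FULL severity and only tag the dampening candidate."""
    return Finding(name, severity, description, {
        "dampening_eligible": eligible,                # key named in the case study
        "dampened_severity": dampened_severity,        # hypothetical key name
        "dampened_description": dampened_description,  # hypothetical key name
    })


def apply_intel_gated_dampening(findings: list[Finding],
                                threat_intelligence: dict[str, Any]) -> list[Finding]:
    """Engine side, run AFTER the threat-intel lookups.

    Dampen only when known_malware is exactly False. An intel hit (True) keeps
    full severity; in this sketch a missing or None value (lookup failed) does
    too, so an intel outage fails closed rather than silently dampening.
    """
    if threat_intelligence.get("known_malware") is not False:
        return list(findings)
    return [
        replace(f, severity=f.evidence["dampened_severity"],
                description=f.evidence["dampened_description"])
        if f.evidence.get("dampening_eligible") else f
        for f in findings
    ]
```

The three branches line up with the three cases above: a clean unknown file (known_malware False) is dampened, a known-malicious file (True) is not, and a zero-day is dampened because intel also reports False for it. Treating None as "not clean" is our choice for the sketch; the case study only states that dampening requires known_malware to be False. Any dampened severity values passed in are placeholders, not production weights.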

Track F: ZIP per-member recursion (an emergent G5 finding)

While validating Track A v2, G5 caught a separate regression: 3 MalwareBazaar Gootloader ZIP archives (each containing an LNK dropper) had silently moved from CRIT to SAFE between v1.4.4 and v1.4.5. The root cause was an unrelated v1.4.5 change: the OneNote scope guard tightened a regex that had accidentally been giving the engine ZIP-content coverage as a side effect. We added explicit two-layer ZIP per-member detection. Layer 1 enumerates the central directory and emits archive_contains_suspicious_member findings based on filename heuristics, so it works on encrypted ZIPs. Layer 2 extracts each unencrypted member, routes it through the LNK/PE/Script analyzers, and emits archive_member_<inner> findings. The 3 Gootloader ZIPs moved from 46 SAFE to 89, 90, and 90 HIGH. This closes the long-deferred "Recursive decomposition not implemented" backlog item from the v1.3.0 known-pitfalls list; it shipped in v1.4.5 because the G5 finding forced the issue.
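The two-layer structure can be sketched with Python's standard zipfile module. The extension lists, the magic-byte routing, the size cap, the analyzer callables, and the ":filename" suffix on finding names are all assumptions for illustration; the production heuristics are not published. The point is the split: Layer 1 reads only central-directory metadata, so it runs even on encrypted archives, while Layer 2 extracts only unencrypted members and passes their bytes to per-type analyzers.

```python
import io
import zipfile
from typing import Callable, Optional

# Hypothetical Layer 1 filename heuristics (not the production list).
SUSPICIOUS_EXTENSIONS = (".lnk", ".exe", ".dll", ".js", ".vbs", ".ps1", ".hta", ".bat")
SCRIPT_EXTENSIONS = (".js", ".vbs", ".ps1", ".hta", ".bat")
MAX_MEMBER_BYTES = 50 * 1024 * 1024  # hypothetical guard; uses the declared size only

Analyzer = Callable[[bytes], list[str]]


def sniff_type(data: bytes, filename: str) -> Optional[str]:
    """Pick an analyzer by magic bytes first, then by extension."""
    if data[:4] == b"L\x00\x00\x00":  # Shell Link header: HeaderSize = 0x4C
        return "lnk"
    if data[:2] == b"MZ":  # DOS stub that precedes a PE header
        return "pe"
    if filename.lower().endswith(SCRIPT_EXTENSIONS):
        return "script"
    return None


def scan_zip(raw: bytes, analyzers: dict[str, Analyzer]) -> list[str]:
    findings: list[str] = []
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        infos = zf.infolist()
        # Layer 1: central directory only. Nothing is decompressed, so this
        # still works when members are encrypted.
        for info in infos:
            if info.filename.lower().endswith(SUSPICIOUS_EXTENSIONS):
                findings.append(f"archive_contains_suspicious_member:{info.filename}")
        # Layer 2: extract unencrypted members and route each to an analyzer.
        for info in infos:
            encrypted = bool(info.flag_bits & 0x1)  # general-purpose flag bit 0
            if info.is_dir() or encrypted or info.file_size > MAX_MEMBER_BYTES:
                continue
            data = zf.read(info)
            kind = sniff_type(data, info.filename)
            if kind in analyzers:
                for inner in analyzers[kind](data):
                    findings.append(f"archive_member_{inner}:{info.filename}")
    return findings
```

Nested archives are not recursed into here; a fuller version might route a member whose bytes start with PK\x03\x04 back into scan_zip with a depth limit.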

Tracks B, C, D, E: surface fixes that landed in the same release

What we explicitly did NOT do

The same WWCD architectural discipline that shaped v1.4.4 sized v1.4.5: ship the right cure at the right layer, defer the rest with an explicit boundary.

What we learned

  1. When structural shapes cannot separate clean from malicious, fold threat intelligence into the predicate. Token absence means "we lack evidence of active content," not "we have evidence of safety." Threat-intel hits are dispositive for known malware; intel absence is soft evidence of safety, which is what a dampening predicate needs. In v1.4.5, core/deep_format_analyzer.py and core/enhanced_detection_engine.py are the reference implementation; the pattern generalizes to DOCX, ZIP, and image-EXIF dampening.
  2. Synthetic fixtures don't validate real corpora. Every dedicated unit test for Track A v1 passed; G5 caught what they couldn't because real malicious PDFs from APT loaders don't follow textbook shapes. The lesson, formalized after v1.3.0 and reapplied here: any change that can move malicious files into the safe band must run a full-corpus G5 replay before merge, not after.
  3. Dampening beats short-circuit for structural-anomaly FP cures. The structural findings should still fire (audit trail, ML feature signal, future verdict-layer input); their severity is the right axis to tune, not their existence. A malicious PDF with incremental_update + /JS post-EOF keeps full severity because the dangerous-token co-presence disqualifies dampening.
  4. Side-effect coverage is fragile coverage. Track F was needed only because a OneNote scope guard regex had been accidentally providing ZIP-content coverage as a side effect for months. When we (correctly) tightened the regex, the coverage silently disappeared. Explicit two-layer ZIP per-member recursion now provides that coverage on purpose, with regression tests guarding it.
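Lesson 2 can be enforced mechanically rather than by convention. A minimal sketch, assuming each build's G5 replay yields a path-to-verdict map and that the corpus comes with a set of known-malicious paths (both hypothetical shapes; the G5 harness itself is not described here). It fails the merge if any known-malicious file moves into SAFE, the same kind of movement G5 flagged for Track A v1 and for the ZIP regression in lesson 4.

```python
from typing import Mapping


def safe_band_regressions(before: Mapping[str, str], after: Mapping[str, str],
                          malicious: set[str]) -> list[str]:
    """Known-malicious files whose verdict moved INTO SAFE between two builds."""
    return sorted(
        path for path in malicious
        if before.get(path) not in (None, "SAFE") and after.get(path) == "SAFE"
    )


def g5_gate(before: Mapping[str, str], after: Mapping[str, str],
            malicious: set[str]) -> int:
    """Return a CI exit code: non-zero blocks the merge."""
    regressions = safe_band_regressions(before, after, malicious)
    for path in regressions:
        print(f"REGRESSION {path}: {before[path]} -> SAFE")
    return 1 if regressions else 0
```

Comparing verdict labels rather than raw scores keeps the sketch independent of the per-tier scoring weights, which are not published.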

Numbers

| Metric | Pre-v1.4.5 | Post-v1.4.5 |
|---|---|---|
| HADES Whitepaper PDF, Pro tier | 80 HIGH | 22 LOW |
| 5 MWB Gootloader / Lazarus PDFs (Track A v2 regression check) | 73 HIGH | 73 HIGH (preserved) |
| 3 MWB Gootloader ZIPs (Track F) | 46 SAFE (regression) | 89-90 HIGH (cured) |
| Clean corpus actionable FPs | 222 actionable | 0 actionable (cured) |
| MWB 3,271-file detection rate | 98.7% | 98.2% |
| Contagio 11,890-file detection rate | 99.9% | 92.2% |
| Targeted Gootloader 9-sample CRITICAL | 9 / 9 | 9 / 10 (one ZIP-LNK at HIGH 89) |

The Contagio drop reflects an honest measurement change, not a regression: v1.4.5 reports a lower rate because the per-member ZIP recursion (Track F) now exposes archive findings as individual file scores that the prior side-effect coverage had been rolling up into single archive-level CRIT scores. Customer detection of malicious archives is strictly improved (the 3 Gootloader ZIPs are an example); the aggregate rate is the right metric to compare across releases only when the file-set decomposition is held constant.

Trade-secret hygiene

The exact contents of the _PDF_DANGEROUS_TOKENS set, the engine's threat-intel decision flow, the YARA rule names in the dampening-eligible list, and the per-tier scoring weights are proprietary detection logic. This case study describes the architectural pattern (the analyzer emits at full severity and the engine applies threat-intel-gated dampening; two-layer ZIP per-member recursion with a central-directory-only Layer 1 and an extracted-member Layer 2) but does not publish predicate contents, rule names, or threshold values. Customers and acquirer due-diligence teams can validate the cure via the public regression results and the public detection metrics; the detection-engine internals remain in the proprietary source tree.

What's next

The structural-anomaly dampening pattern is now generalizable: any future heuristic that wants to dampen on a "no smoking gun" predicate will run the threat-intel gate by default. When customers submit FPs, the cure lands at the right layer, not the most convenient one.

$ hades scan HADES_Whitepaper.pdf --tier pro --format json

Tested May 2026 • HADES v1.4.5 • Customer-reported Whitepaper PDF plus MalwareBazaar Gootloader / Lazarus regression samples plus 3,271-file MWB corpus G5 replay