DarkHorse InfoSec

Case Study: Curing a Document False-Positive Class with HADES v1.4.4

TL;DR

A customer scanning a 20 KB Microsoft Excel spreadsheet on HADES Community tier saw it flagged Score 84 HIGH. The file was a school class roster, structurally clean, no macros, no embedded content. We took the report seriously, traced the FP across the engine, found four structural defects plus one bonus YARA-filter gap, shipped a layered cure, and added a 100-document business-document regression corpus so this class of FP can't recur silently. We also caught and closed a separate PII-redaction gap discovered during the investigation: the FP report itself would have exposed personal email addresses in customer SIEM events.

Honest scope: v1.4.4 cures the Excel/Office FP class entirely. A separate customer-reported FP on a multi-revision PDF whitepaper (Pro tier, Score 80) is partially diagnosed and deferred to v1.4.5. The v1.4.4 deterministic fast-path appeared to cure it, but G5 corpus replay found it also mistakenly cleared 75 real-world malicious PDFs from the MalwareBazaar 3,271-file corpus. We shipped the fast-path module with the short-circuit DISABLED, kept the predicate as the v1.5 verdict-engine foundation, and tracked the PDF-side fix as a dedicated v1.4.5 follow-up in the deep-format analyzer. Shipping with the short-circuit on would have moved malicious files into the safe band, a regression worse than the original FP.

What happened

A customer-facing instance of HADES v1.4.3 was scanning a Microsoft Excel file (a school class roster .xlsx). The Community-tier scan reported Score 84 HIGH. The file was structurally clean: no macros and no embedded content.

A second customer report on a different file (HADES Whitepaper v1.1.0 PDF, scanned on Pro tier) showed Score 80 HIGH with the YARA noise filter masking which rules fired (YARA: filtered 5/5 noisy matches for PDF format). Two FPs, two file formats, two license tiers. The shape of the bug was bigger than either sample on its own.

What we investigated (and what was easy to miss)

The natural triage was "find the YARA rule firing on the xlsx and silence it." That turned out to be wrong on three counts.

First, the source CLI scored the xlsx at 16 LOW, not 84 HIGH. The customer was running the deployed Nuitka binary or the demo container, while the source HEAD at v1.4.3 already carried a partial fix that had not propagated to the customer's build. We documented this gap explicitly so the fix would address both paths.

Second, the FP wasn't a single bug. It was four structural defects stacked, plus a fifth that surfaced only when we built a synthetic 100-doc test corpus:

  1. The polyglot detector iterated every file format for OOXML zips, hitting PK\x03\x04 from the zip wrapper at offset 0 and flagging the file as a ZIP-format polyglot of itself.
  2. The MIME validator had no entries for DOCX, XLSX, or PPTX. When libmagic returned application/zip for an xlsx (structurally accurate), validation reported a 5.0 threat score.
  3. The base64 pattern scanner fired on legitimate Office metadata values (revision GUIDs, Google Sheets roundtripDataChecksum, hash digests) without any context check on the field name.
  4. The format detection iterated DOCX first in the file-signatures table, so any xlsx tied on signature got classified as DOCX, cascading into wrong MIME comparisons.
  5. (Bonus) The pre-existing YARA noise filter for compressed formats only fired when the file was larger than 100 KB. Real business documents are typically smaller; 35 KB generated contracts fell below the size gate, so the filter never ran and multiple noisy rules lit up on every clean DOCX and PPTX at Pro tier.
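
Defects 1, 2, and 4 share a root cause: the engine reasoned about the zip wrapper instead of the OOXML package inside it. A minimal sketch of one way to avoid that, assuming the format is decided from the package's main part rather than from signature-table order. The function names, the main-part table, and the accepted-MIME table are illustrative assumptions, not HADES internals.

```python
import io
import zipfile
from typing import List, Optional, Tuple

# Main part of each OOXML family (assumed to be a sufficient discriminator here).
OOXML_MAIN_PARTS = {
    "word/document.xml": "docx",
    "xl/workbook.xml": "xlsx",
    "ppt/presentation.xml": "pptx",
}

_OOXML = "application/vnd.openxmlformats-officedocument."
# libmagic may report either the specific OOXML type or plain application/zip.
ACCEPTED_MIMES = {
    "docx": {_OOXML + "wordprocessingml.document", "application/zip"},
    "xlsx": {_OOXML + "spreadsheetml.sheet", "application/zip"},
    "pptx": {_OOXML + "presentationml.presentation", "application/zip"},
}


def classify_ooxml(data: bytes) -> Optional[str]:
    """Return 'docx', 'xlsx', 'pptx', or None if data is not an OOXML package."""
    if not data.startswith(b"PK\x03\x04"):
        return None
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = set(zf.namelist())
    except zipfile.BadZipFile:
        return None
    if "[Content_Types].xml" not in names:
        return None
    for part, family in OOXML_MAIN_PARTS.items():
        if part in names:
            return family
    return None


def mime_is_consistent(family: str, reported_mime: str) -> bool:
    """A zip MIME for an OOXML file is structurally accurate, not a mismatch."""
    return reported_mime in ACCEPTED_MIMES.get(family, set())


def polyglot_candidates(family: Optional[str],
                        hits: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Drop the zip signature at offset 0 when the file is an OOXML package."""
    return [(fmt, off) for fmt, off in hits
            if not (family and fmt == "zip" and off == 0)]
```

Deciding the family from package contents removes the dependence on which format happens to come first in a signature table, and a non-zip signature at a later offset still survives as a polyglot candidate.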

Third, the FP was leaking PII. The xlsx carried 18 personal email addresses. HADES findings would include those addresses in the description and evidence fields, which then flowed to customer SOC dashboards, portal submissions, and Splunk events. A customer scanning a HIPAA-regulated file (medical roster, insurance roll, financial export) would have HADES leak the PII into their incident-response pipeline. We promoted PII redaction from "nice to have" to "hard prerequisite" for the v1.4.4 ship.

What we shipped

Three surgical patches in the detection path

One emergent bonus patch in the YARA noise filter

Drop the 100 KB size gate for the OOXML and JAR families (always zip-wrapped at all sizes) so noisy rules are suppressed on real business documents regardless of file size. Add the hidden-archive-in-metadata rule to the OOXML-only suppression list because the rule was designed for multi-archive families but mis-fires on pure OOXML wrappers. Non-OOXML formats keep the 100 KB threshold unchanged.
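
The gate change can be sketched as follows. This is an illustration under stated assumptions: the rule names are placeholders (the real suppression sets are not published), 100 KB is taken as 100 * 1024 bytes, and `fmt` is assumed to already be one of the compressed formats the filter covers.

```python
SIZE_GATE_BYTES = 100 * 1024                 # assumed meaning of "100 KB"
OOXML = {"docx", "xlsx", "pptx"}
ALWAYS_ZIP_WRAPPED = OOXML | {"jar"}         # families exempt from the size gate

# Placeholder names only; not the real rule identifiers.
NOISY_RULES = {"example_noisy_rule_a", "example_noisy_rule_b"}
OOXML_ONLY_NOISY_RULES = {"example_hidden_archive_in_metadata"}


def filter_applies(fmt: str, size: int) -> bool:
    """OOXML and JAR are filtered at any size; other formats keep the gate."""
    if fmt in ALWAYS_ZIP_WRAPPED:
        return True
    return size > SIZE_GATE_BYTES


def should_suppress(rule: str, fmt: str, size: int) -> bool:
    """Decide whether a YARA match is treated as noise for this file."""
    if not filter_applies(fmt, size):
        return False
    if rule in NOISY_RULES:
        return True
    # The extra suppression is scoped to OOXML so it cannot hide hits elsewhere.
    return fmt in OOXML and rule in OOXML_ONLY_NOISY_RULES
```

Under these assumptions a 35 KB DOCX is filtered, while a 35 KB generic zip still reports every match, which matches the "threshold unchanged" behavior described for non-OOXML formats.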

One architectural addition (the WWCD piece)

A deterministic structural-fingerprint fast-path designed to short-circuit the engine BEFORE the IOC and heuristic stages run, when a document matches an "obviously clean" predicate (whitelisted member list, no macros, no embedded executables, no http/s external Targets outside known schema hosts, well-formed XML, valid PDF xref with single %%EOF). As noted under Honest scope, v1.4.4 ships this module with the short-circuit disabled. This mirrors our existing v1.16.1 SHA-256 hash allowlist and v1.9.0 magic-byte-format ML feature pattern: the deterministic allowlist is the trust boundary, ML is the second opinion, raw heuristics are the third. Acquirer-grade by design: detection-engine vendors plug their cloud reputation services into this layer cleanly.
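
A minimal sketch of the OOXML half of such a predicate, assuming a flag-gated short-circuit. The prefix allowlist, the forbidden suffixes, and both function names are hypothetical; the real predicate set is proprietary. The sketch is deliberately stricter than the description above (it rejects every external relationship target rather than allowing known schema hosts) and it omits the PDF checks.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# v1.4.4 ships the predicate with the short-circuit off.
SHORT_CIRCUIT_ENABLED = False

# Hypothetical allowlist; the real member list is not published.
ALLOWED_PREFIXES = ("[Content_Types].xml", "_rels/", "docProps/",
                    "xl/", "word/", "ppt/")
# Conservative assumption: any binary or script member disqualifies the file.
FORBIDDEN_SUFFIXES = (".bin", ".exe", ".dll", ".vbs", ".js")
REL_TAG = ("{http://schemas.openxmlformats.org/package/2006/relationships}"
           "Relationship")


def looks_obviously_clean(data: bytes) -> bool:
    """True only if every structural check passes; any doubt returns False."""
    try:
        zf = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        return False
    with zf:
        for name in zf.namelist():
            if not name.startswith(ALLOWED_PREFIXES):
                return False
            if name.lower().endswith(FORBIDDEN_SUFFIXES):
                return False
            if name.endswith((".xml", ".rels")):
                try:
                    # A production scanner would want a hardened XML parser.
                    root = ET.fromstring(zf.read(name))
                except ET.ParseError:
                    return False
                for rel in root.iter(REL_TAG):
                    # Stricter than the text: reject every external target.
                    if rel.get("TargetMode") == "External":
                        return False
    return True


def fast_path_verdict(data: bytes):
    """Return "SAFE" to skip later stages, or None to run the full pipeline."""
    if SHORT_CIRCUIT_ENABLED and looks_obviously_clean(data):
        return "SAFE"
    return None
```

Keeping the predicate separate from the flag lets the predicate be replayed against a malicious corpus on its own before the short-circuit is ever switched on.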

Two architectural interface stubs

One mandatory cross-cutting hardening

A PII redactor at the result-serialization boundary that masks emails, phones, SSNs, Luhn-valid credit cards, and user-home filesystem paths, while preserving SHA-256 hashes, MAC addresses, UUIDs, ISO timestamps, system paths, and Luhn-invalid CC-shaped tracking numbers. A new --ir-mode CLI flag opts out for SOC incident-investigation workflows.
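
A rough sketch of the redaction idea, covering a subset of the categories above (emails, SSNs, Luhn-valid card numbers, user-home paths; phones are omitted). The regexes, the mask tokens, and the `ir_mode` parameter are assumptions for illustration and are far simpler than a production redactor would need.

```python
import re

EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits with optional space or hyphen separators. The lookarounds
# keep the pattern from matching inside hex hashes and UUIDs.
CARD_RE = re.compile(r"(?<![\w-])\d(?:[ -]?\d){12,18}(?![\w-])")
# Mask only the user name component; system paths such as /usr/lib are untouched.
HOME_RE = re.compile(r"(/home/|/Users/|[A-Za-z]:\\Users\\)[^/\\\s]+")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum over a string of decimal digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def _mask_card(match):
    digits = re.sub(r"\D", "", match.group(0))
    # Luhn-invalid numbers (for example tracking numbers) are preserved.
    return "[REDACTED-CC]" if luhn_valid(digits) else match.group(0)


def redact(text: str, ir_mode: bool = False) -> str:
    """Mask PII in a finding string; ir_mode returns the text unchanged."""
    if ir_mode:
        return text
    text = EMAIL_RE.sub("[REDACTED-EMAIL]", text)
    text = SSN_RE.sub("[REDACTED-SSN]", text)
    text = CARD_RE.sub(_mask_card, text)
    text = HOME_RE.sub(lambda m: m.group(1) + "[REDACTED-USER]", text)
    return text
```

Applying the function once at the serialization boundary, rather than inside each detector, means a new detector cannot forget to call it.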

One regression gate

A 100-document business-document clean corpus (25 each XLSX, DOCX, PDF, PPTX) generated deterministically by a committed script. The regression test asserts zero HIGH or CRITICAL on both Community and Pro tiers across all 100 documents plus both real-world customer-reported FP samples. Auto-generated on demand so the test "just works" in local dev and CI.
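
The shape of such a gate can be sketched as below. Everything here is an assumption for illustration: `scan` is a hypothetical callable returning a severity string, and the generator writes placeholder zip packages with OOXML-style member names, not spreadsheets a real office suite would open. The point shown is the pinned timestamp and sorted member order, which keep the generated bytes stable from run to run.

```python
import io
import zipfile

FIXED_DATE = (2020, 1, 1, 0, 0, 0)   # pinned so archive bytes do not drift
BLOCKING = {"HIGH", "CRITICAL"}


def build_placeholder_xlsx(index: int) -> bytes:
    """Same index in, same bytes out (placeholder content, not a real sheet)."""
    parts = {
        "[Content_Types].xml": "<Types/>",
        "xl/workbook.xml": f"<workbook><sheet name='Roster {index}'/></workbook>",
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(parts):           # fixed member order
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, parts[name])
    return buf.getvalue()


def build_corpus(count: int = 25) -> dict:
    return {f"roster_{i:03d}.xlsx": build_placeholder_xlsx(i)
            for i in range(count)}


def assert_no_high(scan, corpus, tiers=("community", "pro")) -> None:
    """Fail if any clean document scores HIGH or CRITICAL on any tier."""
    failures = []
    for name, data in sorted(corpus.items()):
        for tier in tiers:
            severity = scan(data, tier)
            if severity in BLOCKING:
                failures.append((name, tier, severity))
    assert not failures, f"clean documents flagged: {failures}"
```

Collecting every failure before asserting makes a regression report list all affected documents and tiers at once instead of stopping at the first.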

What we explicitly did NOT do

WWCD architectural discipline means knowing when to STOP. We deferred several items to v1.5+ rather than entangle them with the FP cure:

The discipline of stopping at the right boundary is itself the WWCD lesson: every fix is an architectural investment, not a tactical unblock, but every investment is sized to what the signal supports.

What we learned

  1. Real-world clean documents must be in the regression gate. Until we built the 100-doc business corpus, we had no test that asserted "100 clean DOCX, XLSX, PPTX, and PDF score zero HIGH or CRITICAL." That's exactly the gate the v1.3.0 retrospective said we needed for malicious detection; we extended it symmetrically to benign detection.
  2. YARA noise filters need size-class awareness, not just size thresholds. A 100 KB size gate that excludes 35 KB business documents is a gate that doesn't fire when it's most needed. For inherently-compressed formats (OOXML, JAR, ZIP, archives in general) the gate should be format-aware, not byte-counted.
  3. Customer-visible findings must never carry raw PII. This is HIPAA and GDPR baseline regardless of detection accuracy. The audit log can carry originals (encrypted at rest, role-gated); the SIEM event, the portal submission, and the customer report cannot.

Numbers

Metric                                        | Pre-v1.4.4                          | Post-v1.4.4
Customer xlsx, Community tier                 | 84 HIGH (binary) / 16 LOW (source)  | 0 SAFE
Customer xlsx, Pro tier                       | not reproduced                      | 0 SAFE
100-doc business corpus HIGH/CRIT (Community) | 0 / 100                             | 0 / 100
100-doc business corpus HIGH/CRIT (Pro)       | 50 / 100                            | 0 / 100
Customer-visible PII leak in findings         | yes (18 emails in xlsx)             | redacted
MWB 3,271-file detection rate                 | 98.66%                              | 98.7%
Contagio 11,890-file detection rate           | 99.89%                              | 99.9%
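
For scale, the 75 malicious PDFs that the fast-path cleared during corpus replay can be set against the MWB figure. The arithmetic below is an upper-bound illustration only: it assumes all 75 files were detected before the short-circuit, which the replay summary does not state.

```python
corpus_size = 3271          # MWB corpus size
wrongly_cleared = 75        # malicious PDFs cleared by the fast-path in replay
baseline_rate = 98.66       # pre-v1.4.4 MWB detection rate, in percent

# Share of the corpus that would move into the safe band.
drop_points = 100 * wrongly_cleared / corpus_size

# Assumption: every cleared file was previously detected (worst case).
worst_case_rate = baseline_rate - drop_points

print(f"drop: {drop_points:.2f} points")         # about 2.29
print(f"worst case: {worst_case_rate:.2f}%")     # about 96.37
```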

Trade-secret hygiene

The exact predicate set inside the deterministic structural fingerprint and the YARA rule names in the suppression sets are proprietary detection logic. This case study describes the architectural pattern (deterministic allowlist + ML + heuristics in priority order; OOXML-family size-class awareness; context-gated PII patterns) but does not publish the specific allowlist members, rule names, or threshold values. Customers and acquirer due-diligence teams can validate the cure via the public regression corpus and the public detection metrics; the detection-engine internals remain in the proprietary source tree.

What's next

We will not wait until v2.0 to listen to customer FP reports. Every FP submission is sized at the time it lands; v1.4.4 is the proof that we ship the fix at the right altitude for the report.


Tested May 2026 • HADES v1.4.4 • Customer-reported samples plus synthetic 100-document business corpus