Chapter 2: Design Methods
Executable design principles, failure analysis, decision logic, and key dimensions
2.1 Executable Design Principles
These twelve design principles are not abstract guidelines — each one is accompanied by a specific condition under which it applies, a technical basis, and an acceptance criterion. Organizations should map each principle to their applicable compliance framework and verify implementation through the specified acceptance evidence. Principles are ordered from most fundamental (evidence preservation) to most operational (lifecycle management).
Principle 1: Evidence-First Storage
Write raw logs to immutable storage before or alongside indexing. The index is a convenience layer; the raw vault is the evidence layer. Never allow indexing failure to cause evidence loss.
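The write ordering can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: a plain append-only file stands in for the immutable vault, a dict stands in for the search index, and the names `ingest` and `json_indexer` are hypothetical.

```python
import json
import os


def ingest(raw_line, vault_path, index, indexer):
    """Persist the raw line first, then attempt to index it.

    Returns True if indexing succeeded, False if only the raw copy was kept.
    """
    # Evidence layer: append and force to disk before anything else happens.
    # (Assumption: a local file stands in for a WORM-backed raw vault.)
    with open(vault_path, "a", encoding="utf-8") as vault:
        vault.write(raw_line + "\n")
        vault.flush()
        os.fsync(vault.fileno())

    # Convenience layer: a failure here must never discard the raw record.
    try:
        key, doc = indexer(raw_line)
        index[key] = doc
        return True
    except Exception:
        # The raw copy is already safe; the index entry can be rebuilt later.
        return False


def json_indexer(raw_line):
    """Hypothetical parser: expects a JSON object with an 'id' field."""
    doc = json.loads(raw_line)
    return doc["id"], doc
```

A malformed line passed to `ingest` still lands in the vault file; only the index entry is missing, and the `False` return value gives the caller something to count and alarm on.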
Principle 2: Separation of Index and Evidence
The search index is rebuildable from raw evidence; evidence is immutable and cannot be rebuilt from the index. These two stores must reside on separate logical or physical storage with separate access controls.
Principle 3: End-to-End Identity Binding
Every sender must have a verifiable identity (a certificate or TPM-backed key) and be allowlisted before it is permitted to inject logs. Anonymous or unauthenticated senders must be rejected at the collection boundary.
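As an illustration of the allowlist check at the collection boundary, the sketch below inspects a peer certificate in the dictionary form that Python's `ssl.SSLSocket.getpeercert()` returns after a successful mTLS handshake. The function names and the allowlist contents are hypothetical, and the sketch assumes the TLS layer has already validated the certificate chain.

```python
def sender_identities(peer_cert):
    """Collect DNS SAN entries and commonName values from a peer certificate dict."""
    names = set()
    for kind, value in peer_cert.get("subjectAltName", ()):
        if kind == "DNS":
            names.add(value.lower())
    # 'subject' is a tuple of RDNs, each a tuple of (key, value) pairs.
    for rdn in peer_cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.add(value.lower())
    return names


def is_allowed(peer_cert, allowlist):
    """Reject missing or unvalidated certificates and any identity not on the allowlist."""
    if not peer_cert:  # None (no certificate) or {} (not validated)
        return False
    return bool(sender_identities(peer_cert) & allowlist)
```

Each rejection should itself be logged, since rejection logs are the acceptance evidence that the boundary is enforced.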
Principle 4: Reliable Delivery with Explicit Loss Accounting
Buffers, retries, backpressure, and drop counters must all exist. Any log loss must be explicitly counted, alarmed, and accounted for — "acceptable loss" without measurement is not acceptable in a compliance context.
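One way to make loss explicit is a bounded buffer whose counters always reconcile. The sketch below is a simplified in-memory model; the class name `AccountedBuffer` and its fields are hypothetical, and a production collector would typically use a disk-backed queue instead.

```python
from collections import deque


class AccountedBuffer:
    """Bounded FIFO buffer that counts every accepted, dropped, and delivered event."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.accepted = 0
        self.dropped = 0
        self.delivered = 0

    def offer(self, event):
        """Queue an event, or count it as dropped when the buffer is full."""
        if len(self.queue) >= self.capacity:
            self.dropped += 1  # this counter should feed an alarm
            return False
        self.queue.append(event)
        self.accepted += 1
        return True

    def drain(self, send):
        """Deliver queued events in order; stop at the first failure and keep the rest."""
        sent = 0
        while self.queue:
            if not send(self.queue[0]):
                break  # event stays at the head of the queue for the next retry
            self.queue.popleft()
            sent += 1
        self.delivered += sent
        return sent

    def reconciles(self):
        """Every accepted event is either delivered or still queued."""
        return self.accepted == self.delivered + len(self.queue)
```

The `dropped` counter is what turns an unmeasured gap into an accountable one: offered events always equal accepted plus dropped.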
Principle 5: Time as a Security Control
Detect and alarm on time drift. Record both event_time (from source) and receive_time (at collector). Use authenticated NTP. Treat time manipulation as a security incident, not an operational issue.
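The dual-timestamp idea can be sketched as follows. The 5-second threshold and the function name `stamp` are assumptions chosen for illustration; the measured difference also includes network and queueing delay, so real thresholds need tuning per source.

```python
from datetime import timedelta


def stamp(event_time, receive_time, max_drift=timedelta(seconds=5)):
    """Keep both timestamps and flag the record when they disagree too much."""
    drift = receive_time - event_time
    return {
        "event_time": event_time.isoformat(),
        "receive_time": receive_time.isoformat(),
        "drift_seconds": drift.total_seconds(),
        # abs() also catches sources whose clock runs ahead of the collector.
        "drift_alarm": abs(drift) > max_drift,
    }
```

Keeping `receive_time` alongside `event_time` means an analyst can still order events by collector arrival when a source clock is suspected of manipulation.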
Principle 6: Cryptographic Integrity with Periodic Verification
Hash chains and digital signatures are not sufficient alone — they must be verified on a schedule. Unverified integrity controls provide only the illusion of protection; tampering may go undetected for months without active verification.
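A hash chain and its scheduled verification can be sketched as below. This is an illustrative model that assumes SHA-256 and an all-zero genesis value; the function names are hypothetical, and signing of the chain head is omitted.

```python
import hashlib

GENESIS = "0" * 64  # assumed starting value for the chain


def link(prev_hash, record):
    """Hash of the previous link concatenated with the raw record bytes."""
    return hashlib.sha256(bytes.fromhex(prev_hash) + record).hexdigest()


def build_chain(records):
    """Return one chained hash per record, in order."""
    hashes = []
    prev = GENESIS
    for record in records:
        prev = link(prev, record)
        hashes.append(prev)
    return hashes


def verify_chain(records, hashes):
    """Return the index of the first inconsistent entry, or None if the chain is intact."""
    prev = GENESIS
    for i, (record, stored) in enumerate(zip(records, hashes)):
        expected = link(prev, record)
        if expected != stored:
            return i
        prev = expected
    if len(records) != len(hashes):
        return min(len(records), len(hashes))  # a record or a hash is missing
    return None
```

Running `verify_chain` on a schedule, and alarming on any non-`None` result, is what turns the chain into an active control. Note that truncating records and hashes together is not detectable from the chain alone; catching that requires an externally anchored or signed chain head.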
Principle 7: Immutability Enforced by Storage Controls, Not Only Software
WORM, Object Lock, or SnapLock must be enabled at the storage platform level, because software-only immutability can be bypassed by a privileged administrator. The storage-level lock should be configured so that no single administrator, including the storage administrator, can shorten or remove it outside a separate governance process.
Principle 8: Least Privilege + Separation of Duties
Operations staff must not be able to delete evidence. Auditors can read but not modify. Security analysts can search but cannot export raw evidence without approval. No single role should be able to both delete evidence and delete the audit trail of that deletion.
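A separation-of-duties rule of this kind can be checked mechanically against the RBAC matrix. The sketch below uses a hypothetical role-to-permission mapping; the role and permission names are illustrative only.

```python
# Hypothetical RBAC matrix; names are illustrative, not taken from any product.
ROLES = {
    "operations": {"manage_collectors", "search"},
    "auditor": {"read_evidence", "read_audit_trail"},
    "security_analyst": {"search", "request_export"},
    "retention_admin": {"delete_evidence"},
}

# Permission sets that no single role may hold in full.
TOXIC_COMBINATIONS = [
    {"delete_evidence", "delete_audit_trail"},
]


def sod_violations(roles):
    """Return (role, permissions) pairs where a role holds a forbidden combination."""
    violations = []
    for role, perms in sorted(roles.items()):
        for combo in TOXIC_COMBINATIONS:
            if combo <= perms:  # the role holds every permission in the combination
                violations.append((role, tuple(sorted(combo))))
    return violations
```

Running a check like this whenever the matrix changes gives a repeatable artifact for access reviews.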
Principle 9: Dual Control for Destructive Actions
Require two-person approval plus alerting for purge, retention change, time change, and key policy changes. The approval workflow itself must be logged immutably. Break-glass procedures must be documented and audited post-use.
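The approval workflow can be modeled as a small state object. This sketch assumes that "two-person" means the requester plus at least one distinct approver; the class name is hypothetical, and the in-memory `audit_log` list stands in for an immutable audit store.

```python
from datetime import datetime, timezone


class DualControlRequest:
    """A destructive action that needs a second person before it can run."""

    def __init__(self, action, requester):
        self.action = action
        self.requester = requester
        self.approvers = set()
        self.audit_log = []  # assumption: would be written to immutable storage
        self._record("requested", requester)

    def _record(self, event, actor):
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "action": self.action,
            "event": event,
            "actor": actor,
        })

    def approve(self, approver):
        """Accept approval only from someone other than the requester."""
        if approver == self.requester:
            self._record("self_approval_rejected", approver)
            return False
        self.approvers.add(approver)
        self._record("approved", approver)
        return True

    def execute(self, operation):
        """Run the operation only when a distinct approver has signed off."""
        if not self.approvers:
            self._record("execution_denied", self.requester)
            raise PermissionError(f"{self.action}: second approver required")
        self._record("executed", self.requester)
        return operation()
```

Denied and self-approved attempts are recorded as well, since failed attempts at destructive actions are often the more interesting evidence.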
Principle 10: Observable System (Meta-Logging)
Every component emits its own operational logs and metrics. The log system must log about itself — collector health, ingest rates, verification results, and admin actions are all first-class evidence. A system that cannot be observed cannot be trusted.
Principle 11: Lifecycle Policy Clarity
Define hot/warm/archive retention, legal hold, and deletion workflow with audit trail before deployment. Ambiguous retention policies lead to either premature deletion (compliance risk) or unbounded cost growth. Legal hold must override all automated lifecycle policies.
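The precedence of legal hold over automated lifecycle rules can be expressed as a single decision function. The tier boundaries below (30, 180, and 365 days) are placeholder values for illustration, not recommendations, and the function name is hypothetical.

```python
def lifecycle_action(created, today, legal_hold,
                     hot_days=30, warm_days=180, retention_days=365):
    """Decide what the lifecycle job may do with an object of a given age."""
    # Legal hold is checked first so that it overrides every automated rule.
    if legal_hold:
        return "hold"
    age = (today - created).days
    if age >= retention_days:
        return "queue_for_deletion"  # goes to the audited deletion workflow
    if age >= warm_days:
        return "archive"
    if age >= hot_days:
        return "warm"
    return "hot"
```

Returning `queue_for_deletion` rather than deleting directly keeps the destructive step inside a separate, audited workflow.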
Principle 12: Key Management as Part of the Evidence Chain
Keys, rotations, and KMS access logs are themselves part of the evidence chain. A compromised key lets an attacker decrypt or forge evidence. KMS audit logs must be stored with the same immutability guarantees as the evidence they protect.
2.2 Failure Reasons and Recommendations
The table below documents the most common failure mechanisms observed in log security deployments, along with root causes, avoidance strategies, and the specific acceptance evidence required to verify that each failure mode has been addressed. Each failure mechanism represents a real-world incident pattern where evidence integrity was compromised or evidence was lost entirely.
| Failure Mechanism | What Happens | Root Cause | Avoidance / Recommendation | Acceptance Evidence |
|---|---|---|---|---|
| Silent loss during outage | Gaps in evidence timeline | No buffering/retry | Collector disk queue + replay + drop counters | Outage test report |
| Source spoofing | Fake logs injected | No auth/allowlist | mTLS + CN/SAN allowlist + network ACL | Rejection logs |
| Timestamp manipulation | Misordered events | Untrusted NTP, no drift alarm | Authenticated NTP + drift thresholds + dual timestamps | Drift alarm test |
| Index-only storage | Evidence destroyed by reindex | Raw not preserved | Raw vault immutable + index rebuild | Rebuild drill |
| Insider purge | Logs deleted to cover tracks | Over-privilege | SoD + dual control + immutable audit store | RBAC matrix + audit proof |
| Parsing drift | Wrong fields, missed alerts | Schema changes without versioning | Versioned parsers + tests + fallback raw view | Parser CI results |
| Weak retention governance | Non-compliance | No policy/automation | Tiered lifecycle + legal hold + scheduled reports | Retention report |
| Key compromise | Decrypt/forge evidence | No KMS controls | KMS/HSM, rotation, access logging, break-glass | Key audit logs |
| Storage misconfig | "Immutable" not truly immutable | Lock not enforced at storage level | Storage-level retention lock + periodic immutability verification | Lock status proof |
2.3 Core Design / Selection Logic
The decision tree below provides a structured path from requirements to solution design. Starting with the most fundamental question — whether court-admissible or strong compliance evidence is required — the tree branches through log source type, scale, latency requirements, and geographic distribution to arrive at a specific combination of collection, storage, integrity, and DR design choices. Each branch is annotated with the acceptance KPI that verifies the correct choice was made.
The decision process follows eight sequential steps that must be completed in order. Skipping steps or making assumptions without evidence leads to the failure modes documented in Section 2.2. The steps are designed to be repeatable — they should be re-executed whenever the threat model, scale, or compliance requirements change significantly.
1. Classify logs by criticality: critical/security/legal vs. operational/diagnostic
2. Determine threat model: external attacker, insider admin, or compromised source
3. Choose collection method per source type: agent, syslog, or API pull
4. Size for peak EPS plus burst; define buffer duration to survive expected outages
5. Choose storage tiers and immutability mechanism (WORM/Object Lock/SnapLock)
6. Define integrity method: hash chain cadence and signature frequency
7. Define RBAC/SoD matrix and dual control workflows for destructive actions
8. Define acceptance tests and periodic audit schedule
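The sizing step (peak EPS plus burst, buffered across an expected outage) reduces to simple arithmetic. The sketch below is a rough estimate under assumed inputs: the burst factor, average event size, and outage duration are illustrative placeholders, and the function name is hypothetical.

```python
import math


def buffer_bytes(peak_eps, avg_event_bytes, outage_seconds, burst_factor=1.5):
    """Disk queue size needed to hold peak-plus-burst traffic for one outage."""
    return math.ceil(peak_eps * burst_factor * avg_event_bytes * outage_seconds)


# Example: 20,000 EPS peak, 500-byte events, 4-hour outage, 1.5x burst headroom.
required = buffer_bytes(20_000, 500, 4 * 3600)
print(f"{required / 1e9:.0f} GB")  # prints "216 GB"
```

The estimate ignores compression and queue metadata overhead, so it is best treated as a starting figure to confirm with an outage test.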
2.4 Key Design Dimensions
Log security system design must be evaluated across seven dimensions that reflect both technical and organizational requirements. These dimensions are not independent — trade-offs between them (e.g., compression vs. query performance, or retention cost vs. compliance duration) must be explicitly documented and approved by the relevant stakeholders.
Performance & Experience
Ingest latency, query latency, analyst usability, and evidence export time. Security logs typically need lower ingest-to-search latency than operational logs, because detection and response depend on them.
Stability & Reliability
RPO/RTO targets, collector failover behavior, storage durability guarantees, and replay correctness after outages.
Maintainability & Replaceability
Agent upgrade strategy, schema versioning for parsers, rolling upgrade procedures, and backward compatibility windows.
Compatibility & Extensibility
Support for open formats (CEF/LEEF/JSON), open APIs for new source onboarding, and parallel-run migration strategy.
Life-Cycle Cost (LCC)
Storage growth modeling, compute cost for indexing, staff time for operations, and cost of compliance audits.
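A first-order storage growth model might look like the sketch below. It assumes steady-state retention (the store holds `retention_days` of data at each year's ingest rate), a constant compression ratio, and compounding annual growth; all parameter values are illustrative and the function name is hypothetical.

```python
def projected_storage_tb(daily_gb, retention_days, compression_ratio,
                         annual_growth, years):
    """Steady-state stored volume in TB for each year, using decimal units."""
    projection = []
    for year in range(years):
        grown_daily_gb = daily_gb * (1 + annual_growth) ** year
        stored_gb = grown_daily_gb * retention_days / compression_ratio
        projection.append(round(stored_gb / 1000, 2))
    return projection


# Example: 500 GB/day raw, 365-day retention, 5:1 compression, 20% yearly growth.
print(projected_storage_tb(500, 365, 5, 0.20, 3))  # prints [36.5, 43.8, 52.56]
```

Multiplying each year's figure by a per-TB price for the relevant tier gives the storage component of LCC; indexing compute, staff time, and audit costs need to be modeled separately.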
Energy & Sustainability
Compression ratios, tiering to cold archive, right-sizing to avoid over-provisioning, and power efficiency of storage platforms.
Compliance & Certification
Auditable controls with evidence, immutable retention with lock status proof, periodic access reviews, and evidence export procedures.