Chapter 2: Design Methods

Executable design principles, failure analysis, decision logic, and key dimensions

2.1 Executable Design Principles

These twelve design principles are not abstract guidelines — each one is accompanied by a specific condition under which it applies, a technical basis, and an acceptance criterion. Organizations should map each principle to their applicable compliance framework and verify implementation through the specified acceptance evidence. Principles are ordered from most fundamental (evidence preservation) to most operational (lifecycle management).

Principle 1: Evidence-First Storage

Write raw logs to immutable storage before or alongside indexing. The index is a convenience layer; the raw vault is the evidence layer. Never allow indexing failure to cause evidence loss.

Condition: Any environment requiring audit/forensics  |  Basis: Evidence preservation practice
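
The write ordering can be sketched as follows. This is a minimal illustration, not a reference implementation: `RawVault`, `FlakyIndex`, `ingest`, and the `pending` retry list are hypothetical names, and in-memory lists stand in for real immutable storage and a real search index.

```python
class RawVault:
    """Hypothetical append-only evidence store (stands in for WORM storage)."""

    def __init__(self):
        self._records = []

    def append(self, raw: bytes) -> int:
        self._records.append(raw)
        return len(self._records) - 1  # position of the record in the vault

    def __len__(self):
        return len(self._records)


class FlakyIndex:
    """Hypothetical search index that can be told to fail."""

    def __init__(self, fail=False):
        self.fail = fail
        self.docs = {}

    def add(self, record_id: int, raw: bytes) -> None:
        if self.fail:
            raise RuntimeError("index unavailable")
        self.docs[record_id] = raw.decode("utf-8", errors="replace")


def ingest(raw: bytes, vault: RawVault, index: FlakyIndex, pending: list) -> int:
    # 1. Evidence first: the raw record is persisted before indexing is attempted.
    record_id = vault.append(raw)
    # 2. Indexing is best-effort: a failure is queued for retry, never dropped.
    try:
        index.add(record_id, raw)
    except RuntimeError:
        pending.append(record_id)
    return record_id


vault, index, pending = RawVault(), FlakyIndex(fail=True), []
ingest(b"user=alice action=login", vault, index, pending)
```

With the index forced to fail, the record still lands in the vault and its position is queued in `pending` for later indexing.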

Principle 2: Separation of Index and Evidence

The search index is rebuildable from raw evidence; evidence is immutable and cannot be rebuilt from the index. These two stores must reside on separate logical or physical storage with separate access controls.

Condition: SIEM/search used  |  Basis: Fault-tolerant evidence model
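
As a rough illustration of the one-way dependency, the sketch below rebuilds a toy token index purely from raw records. The function name `rebuild_index` and the whitespace tokenization are assumptions made for the example; a real search index would be far richer.

```python
def rebuild_index(raw_records):
    """Rebuild a token -> record positions index purely from raw evidence."""
    index = {}
    for position, raw in enumerate(raw_records):
        for token in raw.decode("utf-8", errors="replace").split():
            index.setdefault(token, []).append(position)
    return index


# A tuple is used so the example cannot modify the evidence by accident.
raw_vault = (b"user=alice action=login", b"user=bob action=login")

index = rebuild_index(raw_vault)
index.clear()                      # simulate index loss or corruption
index = rebuild_index(raw_vault)   # evidence is untouched, index is recovered
```

The reverse is not possible: the index holds tokens and positions, not the original bytes, so it cannot regenerate the evidence.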

Principle 3: End-to-End Identity Binding

Every sender has a verifiable identity (certificate/TPM) and is allowlisted before being permitted to inject logs. Anonymous or unauthenticated senders must be rejected at the collection boundary.

Condition: Multi-tenant or multi-zone  |  Basis: Zero trust architecture
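
A minimal sketch of the allowlist check at the collection boundary follows. It assumes the TLS layer has already validated the sender's certificate and passes in its CN/SAN values. The host names and the `accept_sender` function are hypothetical.

```python
ALLOWED_SENDERS = {"fw01.corp.example", "dc01.corp.example"}  # hypothetical allowlist


def accept_sender(verified_names):
    """Return True only if a name from the sender's verified certificate is allowlisted.

    `verified_names` is assumed to hold the CN/SAN values of a certificate that
    has already passed TLS validation. None or empty means unauthenticated.
    """
    if not verified_names:
        return False  # anonymous or unauthenticated: reject at the boundary
    return any(name.lower() in ALLOWED_SENDERS for name in verified_names)
```

Rejections would normally be logged as well, since the table in Section 2.2 lists rejection logs as acceptance evidence for source spoofing.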

Principle 4: Reliable Delivery with Explicit Loss Accounting

Buffers, retries, backpressure, and drop counters must all exist. Any log loss must be explicitly counted, alarmed, and accounted for — "acceptable loss" without measurement is not acceptable in a compliance context.

Condition: Any outage-prone network  |  Basis: Distributed systems reliability
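
One way to make loss explicit is a bounded buffer that counts every event it cannot accept. The sketch below is illustrative only: `AccountedBuffer` is a hypothetical class, an in-memory deque stands in for a disk queue, and the `send` callback stands in for delivery to the next hop.

```python
from collections import deque


class AccountedBuffer:
    """Bounded collector buffer that counts every event it has to drop."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queue = deque()
        self.accepted = 0
        self.dropped = 0

    def offer(self, event) -> bool:
        if len(self.queue) >= self.capacity:
            self.dropped += 1      # loss is counted, never silent
            return False           # caller can apply backpressure or raise an alarm
        self.queue.append(event)
        self.accepted += 1
        return True

    def drain(self, send) -> int:
        """Replay buffered events in order; keep anything that fails to deliver."""
        delivered = 0
        while self.queue:
            if not send(self.queue[0]):
                break              # downstream still unavailable, retry later
            self.queue.popleft()
            delivered += 1
        return delivered


buf = AccountedBuffer(capacity=3)
results = [buf.offer(f"event-{i}") for i in range(5)]
delivered = buf.drain(lambda event: True)
```

Here two of five events are refused, and `buf.dropped` records exactly that number, which is the figure an alarm and an outage test report would be built on.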

Principle 5: Time as a Security Control

Detect and alarm on time drift. Record both event_time (from source) and receive_time (at collector). Use authenticated NTP. Treat time manipulation as a security incident, not an operational issue.

Condition: Incident correlation  |  Basis: Forensic best practice
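
The dual-timestamp record and drift check might look like the sketch below. The five-second `DRIFT_THRESHOLD` and the `stamp` function are assumptions for illustration. A real threshold would depend on the environment, and a large positive drift can also indicate delivery delay rather than a clock fault.

```python
from datetime import datetime, timedelta, timezone

DRIFT_THRESHOLD = timedelta(seconds=5)  # hypothetical threshold


def stamp(event_time: datetime, receive_time: datetime) -> dict:
    """Keep both timestamps and flag drift beyond the threshold."""
    drift = receive_time - event_time
    return {
        "event_time": event_time.isoformat(),
        "receive_time": receive_time.isoformat(),
        "drift_seconds": drift.total_seconds(),
        # abs() catches sources running ahead of the collector as well as behind
        "drift_alarm": abs(drift) > DRIFT_THRESHOLD,
    }


received = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
ok = stamp(received - timedelta(seconds=1), received)
skewed = stamp(received + timedelta(minutes=10), received)
```

The second record claims an event time ten minutes in the future, so its drift is negative and the alarm flag is set.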

Principle 6: Cryptographic Integrity with Periodic Verification

Hash chains and digital signatures are not sufficient alone — they must be verified on a schedule. Unverified integrity controls provide only the illusion of protection; tampering may go undetected for months without active verification.

Condition: Anti-tamper requirement  |  Basis: Non-repudiation
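
A hash chain and the verification pass that would run on a schedule can be sketched as follows. This is a simplified illustration using SHA-256 from the standard library. The `GENESIS` value and function names are assumptions, and digital signatures over the chain are omitted.

```python
import hashlib

GENESIS = "0" * 64  # hypothetical starting value for the chain


def chain_hash(prev_hash: str, record: bytes) -> str:
    """Hash of a record bound to the hash of its predecessor."""
    return hashlib.sha256(prev_hash.encode("ascii") + record).hexdigest()


def build_chain(records):
    hashes, prev = [], GENESIS
    for record in records:
        prev = chain_hash(prev, record)
        hashes.append(prev)
    return hashes


def verify_chain(records, hashes):
    """Return the index of the first record that fails verification, or None."""
    if len(records) != len(hashes):
        return min(len(records), len(hashes))  # first missing or extra position
    prev = GENESIS
    for i, (record, expected) in enumerate(zip(records, hashes)):
        prev = chain_hash(prev, record)
        if prev != expected:
            return i
    return None


records = [b"login alice", b"sudo alice", b"logout alice"]
hashes = build_chain(records)
tampered = [records[0], b"sudo mallory", records[2]]
```

Calling `verify_chain(tampered, hashes)` reports position 1, the altered record. The chain itself detects nothing until a verification run like this is actually executed.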

Principle 7: Immutability Enforced by Storage Controls, Not Only Software

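WORM/Object Lock/SnapLock must be enabled at the storage platform level. Software-only immutability can be bypassed by a privileged administrator. A storage-level lock should be configured so that even the storage administrator cannot remove it without a separate governance process. How strictly this holds depends on the lock mode: with S3 Object Lock, for example, compliance mode cannot be shortened or removed by any user, while governance mode can be bypassed by principals holding a specific permission.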

Condition: Insider threat model  |  Basis: Defense-in-depth

Principle 8: Least Privilege + Separation of Duties

Operations staff cannot delete evidence. Auditors can read but not modify. Security analysts can search but not export raw evidence without approval. No single role should have the combined ability to delete evidence and delete the audit trail of that deletion.

Condition: Regulated environments  |  Basis: Governance controls
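
The last rule in this principle lends itself to an automated check over the RBAC matrix. The sketch below is an assumption-laden example: the role names, permission names, and `sod_violations` function are hypothetical and would need to be mapped to the real access model.

```python
ROLE_PERMISSIONS = {  # hypothetical RBAC matrix
    "operations": {"search", "manage_collectors"},
    "auditor": {"search", "read_evidence"},
    "analyst": {"search"},
    "retention_admin": {"delete_evidence"},
    "platform_admin": {"delete_audit_trail"},
}

# Permission sets that no single role may hold in full.
TOXIC_COMBINATIONS = [{"delete_evidence", "delete_audit_trail"}]


def sod_violations(role_permissions):
    """List roles that hold every permission in any toxic combination."""
    return sorted(
        role
        for role, perms in role_permissions.items()
        if any(combo <= perms for combo in TOXIC_COMBINATIONS)
    )


bad_matrix = dict(
    ROLE_PERMISSIONS,
    superuser={"search", "delete_evidence", "delete_audit_trail"},
)
```

Running the check in CI against the exported role definitions would give repeatable evidence for the RBAC matrix. Note that it only inspects single roles, so a user assigned two roles at once would need a separate per-user check.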

Principle 9: Dual Control for Destructive Actions

Require two-person approval plus alerting for purge, retention change, time change, and key policy changes. The approval workflow itself must be logged immutably. Break-glass procedures must be documented and audited post-use.

Condition: High impact operations  |  Basis: Safety engineering
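
A dual-control gate can be sketched as below. It interprets "two-person approval" as the requester plus at least one different approver, which is an assumption. Some organizations require two approvers in addition to the requester. `ApprovalLog` is a hypothetical stand-in for an immutable log, and alerting is omitted.

```python
DESTRUCTIVE_ACTIONS = {"purge", "retention_change", "time_change", "key_policy_change"}


class ApprovalLog:
    """Append-only record of approval decisions (stand-in for an immutable log)."""

    def __init__(self):
        self._entries = []

    def record(self, entry: dict) -> None:
        self._entries.append(dict(entry))

    def entries(self):
        return tuple(self._entries)


def authorize(action: str, requester: str, approvers: set, log: ApprovalLog) -> bool:
    # The requester cannot count as their own approver.
    independent = set(approvers) - {requester}
    needs_dual_control = action in DESTRUCTIVE_ACTIONS
    allowed = (not needs_dual_control) or len(independent) >= 1
    # Every decision is logged, including refusals.
    log.record({
        "action": action,
        "requester": requester,
        "approvers": sorted(approvers),
        "allowed": allowed,
    })
    return allowed


log = ApprovalLog()
self_approved = authorize("purge", "alice", {"alice"}, log)
dual_approved = authorize("purge", "alice", {"bob"}, log)
```

The self-approved purge is refused and the refusal itself is logged, so the approval trail shows attempts as well as successes.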

Principle 10: Observable System (Meta-Logging)

Every component emits its own operational logs and metrics. The log system must log about itself — collector health, ingest rates, verification results, and admin actions are all first-class evidence. A system that cannot be observed cannot be trusted.

Condition: Production operation  |  Basis: SRE principles

Principle 11: Lifecycle Policy Clarity

Define hot/warm/archive retention, legal hold, and deletion workflow with audit trail before deployment. Ambiguous retention policies lead to either premature deletion (compliance risk) or unbounded cost growth. Legal hold must override all automated lifecycle policies.

Condition: Long retention  |  Basis: Compliance requirements
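
The override rule can be made concrete with a small decision function. The tier boundaries below (30, 180, and 2,555 days) are placeholder values chosen for the example, not recommendations, and `lifecycle_action` is a hypothetical name.

```python
HOT_DAYS, WARM_DAYS, ARCHIVE_DAYS = 30, 180, 2555  # hypothetical tier boundaries


def lifecycle_action(age_days: int, legal_hold: bool) -> str:
    """Return the tier for a record, or 'delete' once retention has expired."""
    if age_days <= HOT_DAYS:
        return "hot"
    if age_days <= WARM_DAYS:
        return "warm"
    if age_days <= ARCHIVE_DAYS or legal_hold:
        return "archive"   # legal hold keeps expired records in the archive
    return "delete"        # would be routed through the audited deletion workflow
```

A record 3,000 days old is deleted when no hold applies and stays in the archive when one does. In this sketch legal hold only blocks deletion and does not change tiering for younger records.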

Principle 12: Key Management as Part of the Evidence Chain

Keys, rotations, and KMS access logs are themselves part of the evidence chain. A compromised key can decrypt or forge evidence. KMS audit logs must be stored with the same immutability guarantees as the evidence they protect.

Condition: Encryption at rest/in transit  |  Basis: Cryptographic assurance

2.2 Failure Reasons and Recommendations

The table below documents the most common failure mechanisms observed in log security deployments, along with root causes, avoidance strategies, and the specific acceptance evidence required to verify that each failure mode has been addressed. Each failure mechanism represents a real-world incident pattern where evidence integrity was compromised or evidence was lost entirely.

Failure Mechanism  |  What Happens  |  Root Cause  |  Avoidance / Recommendation  |  Acceptance Evidence
Silent loss during outage  |  Gaps in evidence timeline  |  No buffering/retry  |  Collector disk queue + replay + drop counters  |  Outage test report
Source spoofing  |  Fake logs injected  |  No auth/allowlist  |  mTLS + CN/SAN allowlist + network ACL  |  Rejection logs
Timestamp manipulation  |  Misordered events  |  Untrusted NTP, no drift alarm  |  Authenticated NTP + drift thresholds + dual timestamps  |  Drift alarm test
Index-only storage  |  Evidence destroyed by reindex  |  Raw not preserved  |  Raw vault immutable + index rebuild  |  Rebuild drill
Insider purge  |  Logs deleted to cover tracks  |  Over-privilege  |  SoD + dual control + immutable audit store  |  RBAC matrix + audit proof
Parsing drift  |  Wrong fields, missed alerts  |  Schema changes without versioning  |  Versioned parsers + tests + fallback raw view  |  Parser CI results
Weak retention governance  |  Non-compliance  |  No policy/automation  |  Tiered lifecycle + legal hold + scheduled reports  |  Retention report
Key compromise  |  Decrypt/forge evidence  |  No KMS controls  |  KMS/HSM, rotation, access logging, break-glass  |  Key audit logs
Storage misconfig  |  "Immutable" not truly immutable  |  Lock not enforced at storage level  |  Storage-level retention lock + periodic immutability verification  |  Lock status proof

2.3 Core Design / Selection Logic

The decision tree below provides a structured path from requirements to solution design. Starting with the most fundamental question — whether court-admissible or strong compliance evidence is required — the tree branches through log source type, scale, latency requirements, and geographic distribution to arrive at a specific combination of collection, storage, integrity, and DR design choices. Each branch is annotated with the acceptance KPI that verifies the correct choice was made.

Figure 2.1: Requirements-to-Solution Decision Tree. Structured path from compliance requirements through source type, scale, latency, and geography to specific design choices.

The decision process follows eight sequential steps that must be completed in order. Skipping steps or making assumptions without evidence leads to the failure modes documented in Section 2.2. The steps are designed to be repeatable — they should be re-executed whenever the threat model, scale, or compliance requirements change significantly.

  1. Classify logs by criticality: critical/security/legal vs. operational/diagnostic
  2. Determine threat model: external attacker, insider admin, or compromised source
  3. Choose collection method per source type: agent, syslog, or API pull
  4. Size for peak EPS plus burst; define buffer duration to survive expected outages
  5. Choose storage tiers and immutability mechanism (WORM/Object Lock/SnapLock)
  6. Define integrity method: hash chain cadence and signature frequency
  7. Define RBAC/SoD matrix and dual control workflows for destructive actions
  8. Define acceptance tests and periodic audit schedule
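
Step 4 reduces to simple arithmetic once the inputs are known. The worked example below uses made-up figures (20,000 EPS peak, a 1.5x burst factor, 500-byte average events, and a 4-hour outage) purely to show the calculation. The function `buffer_bytes` and all parameter names are assumptions.

```python
def buffer_bytes(peak_eps: float, burst_factor: float,
                 avg_event_bytes: int, outage_seconds: int) -> int:
    """Disk buffer needed to hold events at peak-plus-burst rate for one outage."""
    return int(peak_eps * burst_factor * avg_event_bytes * outage_seconds)


# Hypothetical inputs: 20,000 EPS peak, 1.5x burst, 500-byte events, 4-hour outage.
required = buffer_bytes(
    peak_eps=20_000,
    burst_factor=1.5,
    avg_event_bytes=500,
    outage_seconds=4 * 3600,
)
required_gb = required / 1e9  # decimal gigabytes
```

With these inputs the collector needs roughly 216 GB of buffer per outage window, before any headroom for filesystem overhead or compression is considered.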

2.4 Key Design Dimensions

Log security system design must be evaluated across seven dimensions that reflect both technical and organizational requirements. These dimensions are not independent — trade-offs between them (e.g., compression vs. query performance, or retention cost vs. compliance duration) must be explicitly documented and approved by the relevant stakeholders.

Performance & Experience

Ingest latency, query latency, analyst usability, and evidence export time. Security logs typically require lower latency than operational logs.

Stability & Reliability

RPO/RTO targets, collector failover behavior, storage durability guarantees, and replay correctness after outages.

Maintainability & Replaceability

Agent upgrade strategy, schema versioning for parsers, rolling upgrade procedures, and backward compatibility windows.

Compatibility & Extensibility

Support for open formats (CEF/LEEF/JSON), open APIs for new source onboarding, and parallel-run migration strategy.

Life-Cycle Cost (LCC)

Storage growth modeling, compute cost for indexing, staff time for operations, and cost of compliance audits.
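
As a starting point for storage growth modeling, steady-state capacity can be estimated from daily volume, compression ratio, and retention period. The figures below (500 GB/day raw, 5:1 compression, 365 days) are hypothetical, and the model ignores index overhead, replication, and volume growth over time.

```python
def steady_state_tb(daily_raw_gb: float, compression_ratio: float,
                    retention_days: int) -> float:
    """Stored volume in decimal terabytes once the retention window is full."""
    stored_gb_per_day = daily_raw_gb / compression_ratio
    return stored_gb_per_day * retention_days / 1000


# Hypothetical inputs: 500 GB/day raw, 5:1 compression, one year of retention.
one_year_tb = steady_state_tb(daily_raw_gb=500, compression_ratio=5, retention_days=365)
```

Running the same function per tier with different retention periods and unit costs gives a first-order cost model that can be refined with measured data.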

Energy & Sustainability

Compression ratios, tiering to cold archive, right-sizing to avoid over-provisioning, and power efficiency of storage platforms.

Compliance & Certification

Auditable controls with evidence, immutable retention with lock status proof, periodic access reviews, and evidence export procedures.
