ADR-0004 — Write-Ahead Log Power-Failure Durability

Status: ACCEPTED
Date: 2026-02-28
Authors: Matthew Lynch (matthew.lynch@helix-os.ai)
FR-Refs: FR-AUD-007, FR-AUD-008, FR-AUD-009
Supersedes: None
Superseded by: None


Context

HELIX OS maintains an append-only cryptographic audit ledger (see ADR-0002). For this ledger to serve as a GMP-compliant tamper-evident record, every committed entry must survive a sudden power failure without producing a partially-written, undetectable entry.

The standard POSIX write() path goes through the kernel page cache. A process may receive a successful write() return while the data exists only in volatile DRAM. If power is lost before the kernel flushes the page cache to stable storage, the entry is silently lost, violating 21 CFR Part 11 §11.10(e), which requires complete and accurate audit trails.

A Write-Ahead Log (WAL) with explicit durability semantics is required to close this gap. The WAL must guarantee:

  1. Atomicity — An entry is either fully written or not present at all. No partial entries are silently accepted.
  2. Durability — Once append() returns, the entry survives any subsequent power failure.
  3. Detectability — If power is lost during a write, the incomplete entry is detectable on recovery and can be safely discarded without compromising the integrity of preceding entries.

Decision

The HELIX OS audit ledger shall use a Write-Ahead Log (helix-ledger/src/wal.rs) with the following durability guarantees:

1. O_DIRECT — Bypass the Page Cache

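All WAL writes use O_DIRECT. This flag instructs the kernel to bypass the page cache, so data is transferred directly from the application's aligned buffer to the storage device. This removes the window in which committed data exists only in the kernel's page cache in volatile DRAM. O_DIRECT on its own does not guarantee that the device has persisted the data; that guarantee comes from O_SYNC (see item 2).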

Constraint: O_DIRECT requires the I/O buffer address, the transfer length, and the file offset to be aligned to the logical block size of the underlying storage, taken here to be 512 bytes. The WAL pads all writes to this boundary.
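The padding rule can be sketched as follows. This is a minimal illustration, not the code in wal.rs: the names `BLOCK_SIZE`, `padded_len`, and `pad_frame` are hypothetical, and zero is assumed as the fill byte.

```rust
/// Logical block size assumed by this sketch (the ADR's 512-byte figure).
const BLOCK_SIZE: usize = 512;

/// Round `len` up to the next multiple of `block`.
/// `block` must be a power of two, which holds for 512 and 4096.
fn padded_len(len: usize, block: usize) -> usize {
    assert!(block.is_power_of_two());
    (len + block - 1) & !(block - 1)
}

/// Copy `frame` into a zero-filled buffer whose length is block-aligned.
/// Note: an O_DIRECT writer also needs the buffer *address* aligned, which a
/// plain Vec<u8> does not guarantee; aligned allocation is out of scope here.
fn pad_frame(frame: &[u8]) -> Vec<u8> {
    let mut buf = vec![0u8; padded_len(frame.len(), BLOCK_SIZE)];
    buf[..frame.len()].copy_from_slice(frame);
    buf
}
```

For example, a 513-byte frame occupies two blocks (1024 bytes), while a frame of exactly 512 bytes needs no padding.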

2. O_SYNC — Synchronous Durability

All WAL writes use O_SYNC. This flag guarantees that write() does not return until the data and metadata have been transferred to the storage device and the device has acknowledged persistence.

Combined with O_DIRECT, this closes all kernel-level durability gaps:

  • No page cache buffering (O_DIRECT)
  • No deferred writeback (O_SYNC)
  • No data-only sync (O_SYNC covers both data and metadata, unlike O_DSYNC)
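A sketch of how the flag combination might be passed to the standard library is shown below. The function names `wal_open_flags` and `open_wal` are hypothetical. The numeric flag values are hard-coded Linux constants so the example needs no external crate; a real build would more likely take them from the `libc` crate, and the values are not valid on other operating systems.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;

// Assumption: Linux flag values, hard-coded for illustration only.
const O_SYNC: i32 = 0o4010000; // Linux generic value (includes the O_DSYNC bit)
#[cfg(any(target_arch = "aarch64", target_arch = "arm"))]
const O_DIRECT: i32 = 0o200000; // ARM value
#[cfg(not(any(target_arch = "aarch64", target_arch = "arm")))]
const O_DIRECT: i32 = 0o40000; // x86 / x86_64 value; other architectures differ

/// Select the extra open(2) flags. `use_direct_io = false` models the
/// O_SYNC-only fallback for filesystems without O_DIRECT support.
fn wal_open_flags(use_direct_io: bool) -> i32 {
    if use_direct_io {
        O_DIRECT | O_SYNC
    } else {
        O_SYNC
    }
}

/// Open (or create) a WAL segment for writing with the selected flags.
/// Every later write on this descriptor must still satisfy the O_DIRECT
/// alignment rules when `use_direct_io` is true.
fn open_wal(path: &Path, use_direct_io: bool) -> io::Result<File> {
    OpenOptions::new()
        .write(true)
        .create(true)
        .custom_flags(wal_open_flags(use_direct_io))
        .open(path)
}
```

Opening can fail with EINVAL on filesystems that reject O_DIRECT, so a caller would normally surface that error rather than silently dropping the flag.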

3. CRC-32C Entry Framing

Each WAL entry is written with a frame header containing:

  • entry_len (u32): Byte length of the canonical entry payload
  • sequence_number (u64): Monotonic sequence from the ledger entry
  • entry_crc32c (u32): CRC-32C (Castagnoli polynomial) of the payload

This framing allows precise recovery after a power failure:

  • If the frame header is incomplete (torn write), the entry is discarded.
  • If the frame header is complete but the CRC does not match the payload, the entry is discarded.
  • If the frame header is complete and the CRC matches the payload, the entry is accepted.
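The framing and the three recovery outcomes above might look like the following sketch. The field order, the little-endian encoding, and the names `encode_frame`, `check_frame`, and `FrameCheck` are assumptions made for illustration; the ADR does not specify them. The bitwise CRC-32C is used only to keep the example self-contained, and a production build would normally use a hardware-accelerated implementation.

```rust
use std::convert::TryInto;

/// Assumed header size: 4 (entry_len) + 8 (sequence_number) + 4 (entry_crc32c).
const HEADER_LEN: usize = 16;

/// Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Build [header || payload] using the assumed little-endian layout.
fn encode_frame(sequence_number: u64, payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(HEADER_LEN + payload.len());
    frame.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    frame.extend_from_slice(&sequence_number.to_le_bytes());
    frame.extend_from_slice(&crc32c(payload).to_le_bytes());
    frame.extend_from_slice(payload);
    frame
}

#[derive(Debug, PartialEq)]
enum FrameCheck {
    Accepted { sequence_number: u64, payload: Vec<u8> },
    TornHeader,
    CrcMismatch,
}

/// Classify the bytes found at a frame boundary during recovery.
fn check_frame(buf: &[u8]) -> FrameCheck {
    if buf.len() < HEADER_LEN {
        return FrameCheck::TornHeader; // incomplete header: discard
    }
    let entry_len = u32::from_le_bytes(buf[0..4].try_into().unwrap()) as usize;
    let sequence_number = u64::from_le_bytes(buf[4..12].try_into().unwrap());
    let stored_crc = u32::from_le_bytes(buf[12..16].try_into().unwrap());
    // A payload shorter than entry_len cannot match, so it is treated as a mismatch.
    let payload = match buf.get(HEADER_LEN..HEADER_LEN + entry_len) {
        Some(p) => p,
        None => return FrameCheck::CrcMismatch,
    };
    if crc32c(payload) != stored_crc {
        return FrameCheck::CrcMismatch; // complete header, bad payload: discard
    }
    FrameCheck::Accepted { sequence_number, payload: payload.to_vec() }
}
```

Storing the length ahead of the CRC lets recovery know how many payload bytes to checksum before deciding whether the frame is intact.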

4. CRC-32C Segment Headers

Each WAL segment file begins with a fixed header containing the segment ID, a BLAKE3 hash linking to the previous segment, a TAI64N creation timestamp, and a CRC-32C integrity check. On recovery, the runtime verifies the header before replaying any entries.
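One way such a header could be laid out and verified is sketched below. The field widths and order are assumptions: a 64-bit segment ID, a 32-byte hash (the default BLAKE3 output size), a 12-byte TAI64N label, and a CRC-32C over the preceding 52 bytes. The type and method names are hypothetical, and the hash is treated as opaque bytes rather than computed here.

```rust
use std::convert::TryInto;

const SEGMENT_HEADER_LEN: usize = 56; // 8 + 32 + 12 + 4 (assumed layout)
const CRC_OFFSET: usize = 52;

/// Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

#[derive(Debug, Clone, PartialEq)]
struct SegmentHeader {
    segment_id: u64,
    prev_segment_hash: [u8; 32], // hash of the previous segment, opaque here
    created_tai64n: [u8; 12],    // 8-byte seconds label + 4-byte nanoseconds
}

impl SegmentHeader {
    /// Serialize the fields and append a CRC-32C over them.
    fn encode(&self) -> [u8; SEGMENT_HEADER_LEN] {
        let mut out = [0u8; SEGMENT_HEADER_LEN];
        out[0..8].copy_from_slice(&self.segment_id.to_le_bytes());
        out[8..40].copy_from_slice(&self.prev_segment_hash);
        out[40..52].copy_from_slice(&self.created_tai64n);
        let crc = crc32c(&out[..CRC_OFFSET]);
        out[CRC_OFFSET..].copy_from_slice(&crc.to_le_bytes());
        out
    }

    /// Return the header only if it is complete and its CRC verifies.
    fn decode(bytes: &[u8]) -> Option<SegmentHeader> {
        if bytes.len() < SEGMENT_HEADER_LEN {
            return None;
        }
        let stored = u32::from_le_bytes(
            bytes[CRC_OFFSET..SEGMENT_HEADER_LEN].try_into().unwrap(),
        );
        if crc32c(&bytes[..CRC_OFFSET]) != stored {
            return None;
        }
        Some(SegmentHeader {
            segment_id: u64::from_le_bytes(bytes[0..8].try_into().unwrap()),
            prev_segment_hash: bytes[8..40].try_into().unwrap(),
            created_tai64n: bytes[40..52].try_into().unwrap(),
        })
    }
}
```

Recovery would call `decode` first and refuse to replay a segment for which it returns `None`.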

5. Formal Durability Proof

The following invariant holds at all times:

Invariant: Let $E_k$ be the $k$-th entry appended to the WAL. If append(E_k) returns Ok, then $E_k$ resides on stable storage and will be recovered by replay_all() after any subsequent power failure.

Proof sketch:

  1. append() serializes $E_k$ to canonical bytes, computes CRC-32C(payload), constructs the frame [header || payload], and pads to O_DIRECT alignment.
  2. write() is issued on the file descriptor opened with O_DIRECT | O_SYNC.
  3. By the POSIX O_SYNC specification, write() does not return until the data has been transferred to stable storage.
  4. Therefore, when append() returns Ok, the frame bytes are on stable storage.
  5. On recovery, replay_all() reads each frame, verifies CRC-32C(payload) == frame.entry_crc32c, and accepts only matching frames.
  6. Since the complete frame was durably written before append() returned, the CRC check will pass. $\square$

Crash during write: If power is lost after write() begins but before it completes:

  • The O_DIRECT + O_SYNC contract guarantees that write() has not returned Ok to the caller.
  • On recovery, the partial frame will have either an incomplete header (detected as a torn write by byte count) or a complete header with mismatched CRC (detected by CRC verification). In both cases, the entry is discarded.
  • All entries before the torn frame remain intact because each was individually durably written before its append() returned.
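The recovery argument can be exercised with a small in-memory simulation. This is a sketch under stated assumptions, not the wal.rs implementation: a `Vec<u8>` stands in for the segment file, the function bodies of `append` and `replay_all` are hypothetical, the header layout is an assumed little-endian encoding, and a zero `entry_len` is assumed to mark unwritten space rather than a valid entry.

```rust
use std::convert::TryInto;

const BLOCK: usize = 512; // alignment boundary used by the ADR
const HEADER_LEN: usize = 16; // assumed: u32 len + u64 seq + u32 crc

/// Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

fn round_up(n: usize) -> usize {
    (n + BLOCK - 1) / BLOCK * BLOCK
}

/// Append one padded frame to the simulated log.
fn append(log: &mut Vec<u8>, seq: u64, payload: &[u8]) {
    log.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    log.extend_from_slice(&seq.to_le_bytes());
    log.extend_from_slice(&crc32c(payload).to_le_bytes());
    log.extend_from_slice(payload);
    let padded = round_up(log.len());
    log.resize(padded, 0);
}

/// Replay frames in order and stop at the first torn or corrupt one.
fn replay_all(log: &[u8]) -> Vec<(u64, Vec<u8>)> {
    let mut entries = Vec::new();
    let mut pos = 0;
    while pos + HEADER_LEN <= log.len() {
        let len = u32::from_le_bytes(log[pos..pos + 4].try_into().unwrap()) as usize;
        let seq = u64::from_le_bytes(log[pos + 4..pos + 12].try_into().unwrap());
        let crc = u32::from_le_bytes(log[pos + 12..pos + 16].try_into().unwrap());
        if len == 0 {
            break; // assumed end-of-log marker (zero-filled space)
        }
        let start = pos + HEADER_LEN;
        let payload = match log.get(start..start + len) {
            Some(p) => p,
            None => break, // payload cut short by the crash
        };
        if crc32c(payload) != crc {
            break; // complete header, mismatched CRC
        }
        entries.push((seq, payload.to_vec()));
        pos = round_up(start + len); // next frame starts on a block boundary
    }
    entries
}
```

Truncating the simulated log in the middle of the last frame and calling `replay_all` returns only the earlier entries, which mirrors the claim that a torn frame never affects the frames written before it.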

6. Platform Fallback

On platforms where O_DIRECT is not available (macOS, some container environments):

  • The writer falls back to O_SYNC with explicit fsync() after each write.
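  • This provides durability but not page-cache bypass. On macOS, fsync() does not force the drive to flush its own write cache (that requires fcntl() with F_FULLFSYNC), so the fallback is weaker there. Acceptable for development and testing environments, but production GMP deployments must use Linux with O_DIRECT support.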

Rationale

  • O_DIRECT + O_SYNC is the strongest durability guarantee available on Linux. It eliminates both page-cache buffering and deferred writeback. No userspace application can achieve stronger durability without specialized hardware (battery-backed NVRAM).
  • CRC-32C (Castagnoli) is the industry standard for storage integrity. It is used by ext4, btrfs, iSCSI, and NVMe. The Castagnoli polynomial has superior error detection properties compared to CRC-32 (IEEE 802.3) for burst errors common in storage write failures.
  • Per-entry framing allows recovery to precisely identify the boundary between valid and invalid data. Without framing, a torn write could corrupt the interpretation of subsequent entries.
  • Alignment padding is required by the O_DIRECT contract. While it adds space overhead (up to 511 bytes per entry), the typical audit event rate in GMP environments (< 500 events/min) makes this negligible.
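As a rough upper bound on the padding overhead, assume every entry wastes the maximum 511 bytes and events arrive at the full 500 per minute. Both are worst-case assumptions, not measurements:

$$
500\ \tfrac{\text{entries}}{\text{min}} \times 511\ \tfrac{\text{bytes}}{\text{entry}} \times 1440\ \tfrac{\text{min}}{\text{day}} = 367{,}920{,}000\ \text{bytes/day} \approx 351\ \text{MiB/day}
$$

Actual overhead is lower whenever entries do not leave a nearly full block of padding or the event rate is below the stated ceiling.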

Alternatives Considered

| Alternative | Reason for Rejection |
| --- | --- |
| fsync() after each write (without O_DIRECT) | Data still passes through page cache. A kernel crash between write() and fsync() can lose the entry. O_DIRECT eliminates this window. |
| mmap() + msync() | Complex, error-prone, and does not provide atomic-write guarantees. Torn pages are possible. |
| Database WAL (SQLite, RocksDB) | Adds a large dependency, introduces complexity, and obscures the durability contract. For a GMP audit trail, the durability mechanism must be auditable and minimal. |
| Journaling filesystem reliance | Filesystem journals protect metadata consistency, not application-level data integrity. An application-level WAL is still required. |
| Battery-backed write cache | Hardware-dependent. Not all deployment environments have battery-backed controllers. The software WAL must be correct independent of hardware. |

Consequences

Positive

  • Every committed entry survives any power failure — no silent data loss.
  • Torn writes are always detected and safely discarded on recovery.
  • The durability mechanism is transparent, auditable, and minimal (~600 lines of Rust).
  • CRC-32C provides hardware-accelerated integrity checking on x86 (SSE 4.2) and ARM (CRC extension).
  • The WAL is the foundation for WAL rotation (v2.3.0), which depends on reliable per-entry durability.

Negative

  • O_DIRECT alignment padding increases disk usage by up to 511 bytes per entry. At typical GMP event rates, this is negligible.
  • O_DIRECT bypasses the page cache, so sequential reads during recovery do not benefit from readahead. Mitigation: recovery is an infrequent operation; sequential I/O is inherently fast.
  • O_DIRECT is Linux-specific. macOS and other platforms use a degraded fallback. Mitigation: production GMP deployments are exclusively Linux.

Risks

  • Filesystem compatibility: Some filesystems (tmpfs, certain network filesystems) do not support O_DIRECT. Mitigation: the use_direct_io config flag allows fallback; production validation must confirm O_DIRECT support on the target filesystem.
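  • NVMe write atomicity: The NVMe specification guarantees atomicity across a power failure only for writes up to the device's AWUPF (Atomic Write Unit Power Fail), which can be smaller than AWUN (Atomic Write Unit Normal). If a single aligned write exceeds that limit, the device may tear it at the block level; the per-entry CRC-32C still detects such a frame on recovery. Mitigation: entry sizes in GMP workloads are expected to be well within typical atomic write unit values (4 KiB to 16 KiB), but the value reported by each target device should be confirmed during production validation. A future version may enforce a per-entry size limit matching the device's atomic write unit.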
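The size limit mentioned above could be checked along these lines. This is a sketch: the function names are hypothetical, and how the field value is read from the device is out of scope. NVMe reports its atomic write unit fields (AWUN, and AWUPF for the power-fail case) as 0's based counts of logical blocks, which is the only device fact the code relies on.

```rust
/// Convert an NVMe atomic-write field (a 0's based count of logical blocks)
/// into bytes. A field value of 0 therefore means one logical block.
fn atomic_write_bytes(field_value: u16, lba_size: usize) -> usize {
    (field_value as usize + 1) * lba_size
}

/// True if a padded frame fits within a single atomic write unit.
fn fits_atomic_unit(padded_frame_len: usize, atomic_bytes: usize) -> bool {
    padded_frame_len <= atomic_bytes
}
```

For instance, a field value of 7 with 512-byte logical blocks corresponds to a 4 KiB atomic unit, so a 1024-byte padded frame fits while an 8192-byte one does not.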

Compliance Notes

  • FR-AUD-007: WAL with O_DIRECT/O_SYNC guarantees power-failure durability for every audit entry. This ADR defines the mechanism.
  • FR-AUD-008: CRC-32C segment headers provide integrity verification for WAL segment files.
  • FR-AUD-009: Per-entry CRC-32C framing detects torn writes and prevents partial entries from being accepted as valid.
  • 21 CFR Part 11 §11.10(e): Requires that audit trails are complete and accurate. The WAL guarantees that committed entries are never silently lost.
  • EU GMP Annex 11 §7.1: Requires that data is stored in a manner that ensures integrity. O_DIRECT + O_SYNC + CRC-32C provides a formally provable integrity guarantee.
  • This ADR is permanent per HELIX-VAL-STR-001. Superseding it requires a full re-validation cycle.

Revision History

| Date | Author | Change |
| --- | --- | --- |
| 2026-02-28 | Matthew Lynch | Initial accepted decision |