# Concurrency And Reliability

Reliability discipline for shared state, races, and uptime targets.

## Rules

1. **Shared mutable state needs a strategy.** Locks, atomic operations, immutability, or message passing — pick one explicitly. Hope is not a concurrency model.

2. **Race conditions reproduce intermittently.** "I can't reproduce it" doesn't mean it's not real. Look for read-modify-write without synchronization, check-then-act patterns, and shared caches without coordination.

3. **Retries with backoff, always capped.** Exponential backoff with jitter. A max attempt count. Uncapped retries turn a downstream outage into your outage.

4. **Define SLOs before you need them.** What's acceptable uptime? What's acceptable latency? An error budget (the amount of unreliability the SLO allows) tells you when to stop shipping features and fix reliability.

5. **Graceful degradation over total failure.** If the recommendation engine is down, show the catalog without recommendations. Partial service beats an error page.

6. **Health checks reflect reality.** A `/health` endpoint that returns 200 while the database is unreachable is worse than no health check: the load balancer keeps sending traffic to a broken instance.
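As an illustration of rules 1 and 2, here is a minimal sketch of one explicit strategy (a lock) guarding a read-modify-write. The `Counter` class and `run` helper are hypothetical names chosen for this example, not part of any existing codebase.

```python
import threading


class Counter:
    """Hypothetical shared counter; the lock is the explicit strategy."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # The read-modify-write happens entirely under the lock,
        # so concurrent increments cannot overwrite each other.
        with self._lock:
            self._value += 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def run(workers: int = 8, per_worker: int = 10_000) -> int:
    """Increment one shared counter from several threads and return the total."""
    counter = Counter()

    def work() -> None:
        for _ in range(per_worker):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place, `run(workers, per_worker)` should always return `workers * per_worker`. Removing the lock reintroduces the unsynchronized read-modify-write that rule 2 warns about, and the loss may only show up intermittently.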
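Rule 3 could be sketched roughly as follows. The function names, the default `base` and `cap` values, and the "full jitter" variant (a uniform delay between zero and the capped exponential) are assumptions made for illustration; real code would usually catch only errors known to be retryable.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0) -> float:
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def retry(
    op: Callable[[], T],
    max_attempts: int = 5,
    base: float = 0.1,
    cap: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op() up to max_attempts times, sleeping with jittered backoff between tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception:
            # Assumption for the sketch: every exception is retryable.
            if attempt == max_attempts - 1:
                raise  # attempts exhausted: surface the last error to the caller
            sleep(backoff_delay(attempt, base, cap))
    raise AssertionError("unreachable")
```

Both caps matter here: `cap` bounds how long any single wait can grow, and `max_attempts` bounds how long the caller keeps adding load to a struggling dependency. The `sleep` parameter is injectable only so the behavior can be tested without real delays.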
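To make the error budget in rule 4 concrete, here is a small arithmetic sketch. It assumes an availability SLO expressed as a fraction (for example `0.999`) measured over a rolling window in days; the function names are hypothetical.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of downtime the SLO permits over the window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, window_days: int, downtime_minutes: float) -> float:
    """Budget left after observed downtime; a negative value means the budget is spent."""
    return error_budget_minutes(slo, window_days) - downtime_minutes
```

Under these assumptions, a 99.9% SLO over 30 days allows roughly 43 minutes of downtime. When `budget_remaining` goes negative, that is the signal the rule describes: stop shipping features and fix reliability.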
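For rule 6, one possible shape for an honest health check is sketched below, independent of any web framework. The `health_status` function and the idea of passing dependency probes as callables are assumptions for illustration; a real `/health` handler would wrap something like this and return the status code to the caller.

```python
from typing import Callable, Dict, Tuple


def health_status(checks: Dict[str, Callable[[], None]]) -> Tuple[int, Dict[str, str]]:
    """Run every dependency probe; report 200 only if all of them succeed."""
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            check()  # a probe signals failure by raising
            results[name] = "ok"
        except Exception as exc:
            results[name] = f"failed: {exc}"
    healthy = all(status == "ok" for status in results.values())
    # 503 tells a load balancer to stop routing traffic to this instance.
    return (200 if healthy else 503), results
```

A hypothetical usage would be `health_status({"database": ping_database})`, where `ping_database` raises if the database cannot be reached. The point is that the status code is derived from the dependencies the instance actually needs, not hard-coded.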

## What This Replaces

Unsynchronized shared state, infinite retry loops, undefined reliability targets, and health checks that lie about system readiness.