diff --git a/PLAN.md b/PLAN.md
index a342b88..3fdd40b 100644
--- a/PLAN.md
+++ b/PLAN.md
@@ -494,16 +494,31 @@ On completion, each program prints a summary line and writes its stats JSON (wit
 
 ---
 
-## Phase 7: 3M Scale Test
+## Phase 7: 3M Scale Test [COMPLETED]
 
-Validates disk-based icon storage at scale and gets real timing estimates.
+Validated disk-based icon storage, performance tuning, and the full pipeline at 3M scale.
 
-- Tear down current infra, bring up fresh (1TB EBS)
-- Run full pipeline with `--limit 3000000`
-- Icon downloader writes to local disk (`--icons-dir icons`) instead of S3
-- Downloads ALL icons (no size filter) — full archive for posterity
-- Bundle gen reads from local disk (`--icons-dir icons`)
-- **Watch for:** disk I/O, DuckDB LIMIT behavior, any OOM issues
+**Final 3M pipeline results:**
+
+| Stage | Duration | Result |
+|-------|----------|--------|
+| CC-Index query | ~13min | 3M hosts |
+| WARC parsing | ~3hrs (concurrency 50) | 2.8M titles, 6M icons |
+| Icon download | 4h21m (408/s) | 4.5M completed, 53GB, 70% success |
+| Best icon selection | instant | 2M hosts with icons |
+| Bundle generation | 1h23m (540 hosts/sec) | 22,429 bundles, 4.7GB |
+| Frontend deploy | seconds | Live at everytab.site |
+| **Total** | **~9 hours** | |
+
+**Key changes during this phase:**
+- Icons stored on local disk (sharded `ab/cd/ef/hash`), not S3 — saves ~$175 in PUT costs
+- Removed icon size filter — downloads ALL icons for archival, filters at bundle gen time
+- Dropped `ORDER BY md5(id::text)` from the icon claim query; it was causing 30-second burst/stall cycles at 3M scale
+- Icon download batch size raised from 200 to 5000, with the channel buffer set equal to the batch size
+- Bundle gen rewritten to stream: paginated DB reads, incremental bundle writes (fixed OOM at 3M)
+- Added a `random_order` column to the hosts table so hosts are shuffled across bundles
+- EBS volume sized at 1TB for full icon archive
+- Added COSTS.md with monthly cost breakdown (~$42/month ongoing)
 
 ### Icon selection strategy (TODO: decide before Phase 8)
 
@@ -518,66 +533,98 @@ This works but questions remain:
 - Should we have different strategies for different icon sources? (e.g., always use link_rel PNG 32x32 if available, fall back to favicon.ico)
 - At 30M scale, how much do large icons bloat total bundle size? Need data from the 3M run to decide.
 
-## Phase 7.2: Performance Fixes + 3M Re-test
+## Phase 7.2: Code Review + Performance Fixes [COMPLETED]
 
-Code review and performance improvements before the full 30M run. Make changes, review, then re-run the full 3M pipeline to validate.
+Adversarial code review followed by performance improvements, validated with a 300K run.
 
-### Bundle gen redesign: streaming pipeline
+### Code review findings and fixes
+- **float32 pagination bug** — `random_order REAL` on the hosts table would cause ~1.5% data loss at 30M scale: float32 collisions produce duplicate keys, and keyset pagination skips rows whose key ties with the last key of a page. Fixed: `DOUBLE PRECISION` + `float64` in Go.
+- **Protocol missing from bundles** — bundle JSON had `host` but no protocol, so HTTP-only sites (23%) were loaded over `https://` and broke. Fixed: replaced the `host` field with `url` (full URL built on the Go side).
+- **Non-atomic bundle deployment** — bundle gen deleted all S3 bundles before writing new ones, so a crash mid-write would leave the live site broken. Fixed: bundles are overwritten in place, and deploy.sh deletes stale bundles after cache invalidation.
+- **ARCHITECTURE.md stale** — still described S3 icon storage, old claim query, old bundle format. Updated throughout to match current code.
+- **Dead go.mod dependencies** — progressbar and transitive deps removed. Direct vs indirect annotations fixed via `go mod tidy`.
+- **Shadowed builtins** — custom `min()`/`max()` functions shadowed the Go 1.21+ builtins; removed.
+- **BMP decoder missing** — standalone BMP favicons passed download but failed in bundle gen. Added `golang.org/x/image/bmp` import.
+- **Frontend memory leak** — `loadedIcons` array grew unboundedly. Capped at 100 entries.
+- **Iframe stats inflated** — hosts that errored were counted as "iframe blocked" because their bool was left at its zero value (`false`). Fixed to count only successful parses.
+- **CSP check incomplete** — only checked first `Content-Security-Policy` header. Fixed to check all headers via `headers.Values()`.
+- **DNS error classification** — direct type assertion `err.(*net.DNSError)` never matched wrapped errors. Fixed with `errors.As()`.
+- **Icon download host hammering** — adjacent icons from the same host within a batch produced simultaneous requests to that host. Fixed: Fisher-Yates shuffle of each batch before feeding it to workers.
+- **Max icons per host** — the HTML parser now caps link_rel icons at 50 per host, so adversarial pages cannot bloat the DB.
+- **`downloaded_at` column** — added to icons table for data freshness tracking.
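+
+A small simulation of the float32 pagination bug, as a sketch under assumptions: pagination is plain single-column keyset pagination (`WHERE random_order > $last ORDER BY random_order LIMIT n`), keys are uniform in [0, 1), and `pagedCount` / `randomKeys` are hypothetical names, not functions from the pipeline. Rounding keys to float32 creates ties, and any tie that straddles a page boundary is silently skipped.
+
+```go
+package main
+
+import (
+	"fmt"
+	"math/rand"
+	"sort"
+)
+
+// pagedCount simulates keyset pagination over keys, i.e. repeatedly running
+//   SELECT ... WHERE key > $last ORDER BY key LIMIT pageSize
+// and returns the total number of rows the pages returned.
+func pagedCount(keys []float64, pageSize int) int {
+	sorted := append([]float64(nil), keys...)
+	sort.Float64s(sorted)
+	seen := 0
+	last := -1.0 // below every key in [0, 1)
+	for {
+		// First row strictly greater than the previous page's last key;
+		// other rows equal to that key are never returned.
+		i := sort.Search(len(sorted), func(j int) bool { return sorted[j] > last })
+		if i == len(sorted) {
+			return seen
+		}
+		end := min(i+pageSize, len(sorted)) // Go 1.21+ builtin
+		seen += end - i
+		last = sorted[end-1]
+	}
+}
+
+// randomKeys returns n uniform keys as DOUBLE PRECISION would store them,
+// or rounded to float32 (asFloat32) as a REAL column would store them.
+func randomKeys(n int, asFloat32 bool, seed int64) []float64 {
+	r := rand.New(rand.NewSource(seed))
+	keys := make([]float64, n)
+	for i := range keys {
+		v := r.Float64()
+		if asFloat32 {
+			v = float64(float32(v))
+		}
+		keys[i] = v
+	}
+	return keys
+}
+
+func main() {
+	const n, pageSize = 1_000_000, 100
+	fmt.Println("float64 keys, rows seen:", pagedCount(randomKeys(n, false, 1), pageSize), "of", n)
+	fmt.Println("float32 keys, rows seen:", pagedCount(randomKeys(n, true, 1), pageSize), "of", n)
+}
+```
+
+The float32 run loses rows while the float64 run, in practice, does not; the lost fraction grows with row count because float32 has only a 24-bit significand. Paginating on a composite key such as `(random_order, id)` would remove ties entirely rather than just making them rare.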
 
-Current architecture (batch-convert-then-write):
+### Pipeline performance redesign
+
+**WARC parser** — three-stage pipeline:
 ```
-[fetch 6000 hosts] → [convert all 6000 icons] → [write bundles] → [fetch next 6000]
-                      ^^^^^^^^^^^^^^^^^^^^^^^^    ^^^^^^^^^^^^^^
-                      all cores busy              cores idle
+[DB fetcher] → hostCh → [500 workers] → resultCh → [DB writer with pgx.Batch]
 ```
+- Channel-based worker pool (500 goroutines, replacing a 100-slot semaphore)
+- S3 retry with 6 attempts (AWS SDK `retry.AddWithMaxAttempts`)
+- Batched DB writes via `pgx.Batch` (each batch of 100 results sends ~400 queries in one round-trip)
+- Result: 566 hosts/sec (1.6x improvement over 352/sec)
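+
+The three-stage shape above can be sketched with the standard library alone. This is an illustration under assumptions: the DB fetcher and the `pgx.Batch` writer are replaced by a slice and an in-memory list of flushed batches, parsing is faked, and `Result` / `runPipeline` are hypothetical names.
+
+```go
+package main
+
+import (
+	"fmt"
+	"strings"
+	"sync"
+)
+
+// Result is a hypothetical stand-in for one parsed WARC record.
+type Result struct {
+	Host  string
+	Title string
+}
+
+// runPipeline wires fetcher -> worker pool -> single batched writer and
+// returns the batches the writer flushed (stand-ins for DB round-trips).
+func runPipeline(hosts []string, workers, batchSize int) [][]Result {
+	hostCh := make(chan string, workers)
+	resultCh := make(chan Result, workers)
+
+	// Stage 1: fetcher; a real one would page through the hosts table.
+	go func() {
+		defer close(hostCh)
+		for _, h := range hosts {
+			hostCh <- h
+		}
+	}()
+
+	// Stage 2: worker pool; "parsing" is faked by upper-casing the host.
+	var wg sync.WaitGroup
+	for i := 0; i < workers; i++ {
+		wg.Add(1)
+		go func() {
+			defer wg.Done()
+			for h := range hostCh {
+				resultCh <- Result{Host: h, Title: strings.ToUpper(h)}
+			}
+		}()
+	}
+	// Close resultCh only after every worker exits, so the writer loop ends.
+	go func() {
+		wg.Wait()
+		close(resultCh)
+	}()
+
+	// Stage 3: single writer groups results into fixed-size batches.
+	var flushed [][]Result
+	batch := make([]Result, 0, batchSize)
+	for r := range resultCh {
+		batch = append(batch, r)
+		if len(batch) == batchSize {
+			flushed = append(flushed, batch)
+			batch = make([]Result, 0, batchSize)
+		}
+	}
+	if len(batch) > 0 {
+		flushed = append(flushed, batch) // final partial batch
+	}
+	return flushed
+}
+
+func main() {
+	hosts := make([]string, 250)
+	for i := range hosts {
+		hosts[i] = fmt.Sprintf("host%d.example", i)
+	}
+	batches := runPipeline(hosts, 500, 100)
+	fmt.Println("batches:", len(batches), "last batch size:", len(batches[len(batches)-1]))
+}
+```
+
+Closing `resultCh` only after `wg.Wait()` is what lets the writer flush the final partial batch and exit cleanly; keeping a single writer means batching needs no locks.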
 
-Target architecture (fully pipelined):
+**Bundle gen** — four-stage pipeline:
 ```
-[DB fetcher] → channel → [N converter workers] → channel → [bundle writer]
-   always                   always busy              writes as soon as
-   prefetching                                       120 entries ready
+[DB fetcher] → hostCh → [20 converters] → entryCh → [assembler] → uploadCh → [10 uploaders]
 ```
+- Converter count defaults to 20 (CPU-bound; ~5x the 4 vCPUs of a c5.xlarge)
+- Separate upload workers for S3 PUT parallelism
+- Result: 2,377 hosts/sec (4.4x improvement over 540/sec)
 
-Changes:
-- **DB fetcher goroutine** — continuously fetches pages, feeds host channel. Same pattern as icon downloader.
-- **Converter workers** (200 goroutines) — read from host channel, read icon from disk, decode, re-encode PNG, base64, send BundleEntry to output channel.
-- **Bundle writer goroutine** — collects entries from output channel into a buffer. Every 120 entries, serialize JSON and upload to S3. Runs concurrently with conversion.
+**Icon download** — in-memory batch shuffle added, concurrency bumped to 1000.
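+
+A sketch of the per-batch Fisher-Yates shuffle mentioned in the code review fixes, assuming a claimed batch is a slice of rows carrying a host; `Icon`, `shuffleBatch`, `adjacentSameHost` and `makeBatch` are hypothetical names. The standard library's `rand.Shuffle` performs the same algorithm.
+
+```go
+package main
+
+import (
+	"fmt"
+	"math/rand"
+)
+
+// Icon is a hypothetical claimed row; only Host matters for the shuffle.
+type Icon struct {
+	ID   int64
+	Host string
+}
+
+// shuffleBatch permutes batch in place with Fisher-Yates, spreading icons
+// that were inserted consecutively for one host across the whole batch.
+func shuffleBatch(batch []Icon, r *rand.Rand) {
+	for i := len(batch) - 1; i > 0; i-- {
+		j := r.Intn(i + 1) // uniform in [0, i]
+		batch[i], batch[j] = batch[j], batch[i]
+	}
+}
+
+// adjacentSameHost counts neighboring pairs that share a host, a rough
+// proxy for back-to-back requests to one server.
+func adjacentSameHost(batch []Icon) int {
+	n := 0
+	for i := 1; i < len(batch); i++ {
+		if batch[i].Host == batch[i-1].Host {
+			n++
+		}
+	}
+	return n
+}
+
+// makeBatch builds a batch in insertion order: perHost adjacent icons per host.
+func makeBatch(hosts, perHost int) []Icon {
+	batch := make([]Icon, 0, hosts*perHost)
+	for h := 0; h < hosts; h++ {
+		for k := 0; k < perHost; k++ {
+			batch = append(batch, Icon{ID: int64(len(batch)), Host: fmt.Sprintf("host%d", h)})
+		}
+	}
+	return batch
+}
+
+func main() {
+	batch := makeBatch(1000, 5) // 5000 icons, matching the download batch size
+	fmt.Println("adjacent same-host pairs before:", adjacentSameHost(batch))
+	shuffleBatch(batch, rand.New(rand.NewSource(42)))
+	fmt.Println("adjacent same-host pairs after: ", adjacentSameHost(batch))
+}
+```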
 
-This eliminates all synchronization points — DB, disk, CPU, and S3 are all utilized simultaneously. No fits and starts.
+**CC-Index query** — downloads parquet files locally first (`aws s3 sync`), then queries with DuckDB. Eliminates S3 503 rate-limit failures.
 
-### Other fixes to include
-- **WARC parser: add S3 retry with backoff** — currently concurrency 50 to avoid 503s. With retry, can go back to 100+ and handle transient 503s gracefully.
-- **Icon download: reduce timeout to 3-5s** — most legitimate servers respond in <1s. Dead hosts block a worker for 10s currently.
-- **Icon download: confirm all icons downloaded** — size filter already removed from claim query, but verify at 3M that all link_rel icons (including large declared sizes) are being downloaded.
-- **Icon selection strategy** — decide on final criteria (see Phase 7 notes above) and validate with 3M data.
+**Best icon selection** — new priority: target 32x32 for Retina display. Pick smallest icon ≥32px, fall back to largest <32px. No more "standard sizes" tiers.
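+
+The selection rule fits in a single pass over a host's candidates. A sketch, assuming each candidate has a known square pixel size (how sizes are derived for multi-image `.ico` files is not covered) and noting that the real step may run in SQL rather than Go; `Candidate` and `bestIcon` are hypothetical names.
+
+```go
+package main
+
+import "fmt"
+
+// Candidate is a hypothetical icon option for one host.
+type Candidate struct {
+	URL  string
+	Size int // pixel width, assumed square
+}
+
+const targetPx = 32 // from the plan: 32x32 for Retina displays
+
+// bestIcon returns the smallest candidate >= targetPx, else the largest
+// candidate below targetPx; ok is false when there are no candidates.
+func bestIcon(cands []Candidate) (Candidate, bool) {
+	var above, below *Candidate // smallest >= target, largest < target
+	for i := range cands {
+		c := &cands[i]
+		if c.Size >= targetPx {
+			if above == nil || c.Size < above.Size {
+				above = c
+			}
+		} else if below == nil || c.Size > below.Size {
+			below = c
+		}
+	}
+	if above != nil {
+		return *above, true
+	}
+	if below != nil {
+		return *below, true
+	}
+	return Candidate{}, false
+}
+
+func main() {
+	cands := []Candidate{{"favicon.ico", 16}, {"icon-180.png", 180}, {"icon-48.png", 48}}
+	if best, ok := bestIcon(cands); ok {
+		fmt.Println("picked", best.URL)
+	}
+}
+```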
 
-### Code review
-Before running the 3M re-test, do a full read-through of all Go code:
-- Check all error handling paths — are errors logged, counted, and surfaced?
-- Check concurrency patterns — any race conditions, deadlocks, goroutine leaks?
-- Check resource cleanup — are DB connections, file handles, HTTP responses closed?
-- Check the new streaming bundle gen — does it handle edge cases (empty pages, partial final bundle, S3 upload failures)?
-- Review all CLI flag defaults — are they tuned for the 30M run?
+### Stats improvements
+- WARC parser: added `no_title` counter, fixed `icons_found` to include favicon.ico
+- Best icon selection: now writes `stats/04_best_icon.json`
+- Bundle gen: added `bundled_with_icon` / `bundled_no_icon` counters (distinguishes "never had icon" from "convert error")
 
-### Validation
-- Run full 3M pipeline from scratch with all fixes
-- Compare timings against Phase 7 run
-- Confirm bundle gen saturates CPU and doesn't stall
-- Confirm icon download rate improves with shorter timeout
-- Confirm WARC parsing can run at higher concurrency with retry
-- Confirm all icon types (including large) are downloaded
-- Review live site at everytab.site with 3M data
+### 300K validation run results
+
+| Stage | Duration | Rate |
+|-------|----------|------|
+| CC-Index query | 83s | — |
+| WARC parsing | 8m50s | 566 hosts/sec |
+| Icon download | 34m45s | 439 icons/sec |
+| Best icon selection | instant | — |
+| Bundle generation | 1m59s | 2,377 hosts/sec |
+| Frontend deploy | seconds | — |
+| **Total** | **~47 min** | |
+
+**Loss funnel:**
+```
+300,000 hosts from CC-Index
+ → 282,854 with titles (94.3%)
+ → 213,656 bundled with icon (75.5% of titled)
+ →  69,198 bundled without icon (24.5% of titled)
+     → 68,793 never had an icon
+     →    405 icon convert errors
+ → 2,358 bundles, 603MB total
+```
 
 ## Phase 8: 30M Full Run (Single Machine)
 
 Full internet scan on one c5.xlarge.
 
 - `--limit 0` for CC-Index query
-- **Expected:** ~3-4 days total (WARC parsing ~25hrs, icon download ~50hrs)
+- **Expected (extrapolated from 300K run):**
+  - CC-Index: ~10min (download) + ~15min (query) — possibly much longer at 30M due to swap thrashing
+  - WARC parsing: ~14-15hrs (566 hosts/sec)
+  - Icon download: ~50-60hrs (439 icons/sec at 1000 concurrency, the long pole — 2500 concurrency may improve this)
+  - Bundle gen: ~3.5hrs (2,377 hosts/sec)
+  - **Total: ~3 days**
 - Run in tmux, monitor with `psql` queries from another session
-- **Expected disk:** ~500GB-1TB for all icons (full archive)
-- **Cost:** ~$50 (EC2 + RDS + 1TB EBS for 4 days)
+- **Expected disk:** ~650GB for all icons (6.5GB per 300K × 100)
+- **Cost:** ~$50 (EC2 + RDS + 1TB EBS for 3-4 days)
 - After completion: deploy frontend, verify live site, backup icons + DB to homelab via rsync
+- **Stuck icon recovery** (if icon download crashes): `UPDATE icons SET scan_state = 'unscanned' WHERE scan_state = 'in_progress';`
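+
+The stage estimates above come from scaling the 300K run by 100x. A sketch of that arithmetic, assuming every stage scales linearly with host count (the CC-Index note already warns this may not hold for the DuckDB query); `extrapolate` is a hypothetical helper.
+
+```go
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+// extrapolate scales a duration measured on sampleHosts linearly to fullHosts.
+func extrapolate(measured time.Duration, sampleHosts, fullHosts int) time.Duration {
+	return time.Duration(float64(measured) * float64(fullHosts) / float64(sampleHosts))
+}
+
+func main() {
+	const sample, full = 300_000, 30_000_000
+	stages := []struct {
+		name     string
+		measured time.Duration // from the 300K validation run
+	}{
+		{"WARC parsing", 8*time.Minute + 50*time.Second},
+		{"Icon download", 34*time.Minute + 45*time.Second},
+		{"Bundle generation", 1*time.Minute + 59*time.Second},
+	}
+	var total time.Duration
+	for _, s := range stages {
+		d := extrapolate(s.measured, sample, full)
+		total += d
+		fmt.Printf("%-24s %5.1f h\n", s.name, d.Hours())
+	}
+	fmt.Printf("%-24s %5.1f h\n", "Total (excl. CC-Index)", total.Hours())
+}
+```
+
+Scaling the measured bundle-gen time gives roughly 3.3 hours rather than 3.5, because the 2,377 hosts/sec rate was measured over the 282,854 titled hosts; either way, icon download dominates the total.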
+
+### Consider c5.2xlarge for future runs
+The CC-Index DuckDB query is memory-bound: at 30M the GROUP BY hash table exceeds 8GB, and swap thrashing dominates query time. c5.2xlarge (16GB, 8 vCPUs) would eliminate swap pressure entirely and double the CPU cores for bundle gen. The hourly rate doubles ($0.17/hr → $0.34/hr), so EC2 cost breaks even only if total EC2 hours roughly halve. Faster CC-Index queries and bundle gen (CPU-bound) help, but icon download is the long pole, so break-even also depends on the extra memory speeding up icon download (more headroom for 5000 concurrent connections). WARC parsing also benefits (more headroom for 500+ goroutines). Worth testing on a future run.
 
 ## Phase 9: Frontend Polish
 
@@ -621,17 +668,18 @@ Monthly pipeline triggered by new Common Crawl release.
 ## Future Improvements (Non-Blocking)
 
 ### Pipeline
-- **WARC parser: retry on fetch errors** — add 1 retry with backoff for transient S3 errors
-- **WARC parser: batch DB inserts** — pgx batch or CopyFrom for better write throughput
+- **CC-Index query: streaming dedup** — current GROUP BY builds a ~30M-row hash table in memory, causing severe swap thrashing on c5.xlarge (8GB). Options: (1) use `INSERT ... ON CONFLICT (hostname) DO UPDATE` to stream rows into Postgres and let the UNIQUE constraint dedup, eliminating the hash table entirely; (2) process parquet files in smaller batches, dedup per-batch in DuckDB, final dedup in Postgres; (3) just use c5.2xlarge (16GB) to fit the hash table in RAM. Current workaround: `SET temp_directory` to let DuckDB spill to EBS instead of OS swap.
 - **Encoding: remaining garbled titles** — more aggressive charset detection heuristics
 - **Icon download: retry transient failures** — single retry for DNS/timeout errors
-- **Bundle gen: SVG rasterization** — recover ~1,077 hosts with SVG-only favicons
+- **Bundle gen: SVG rasterization** — shell out to `rsvg-convert` for SVG-only hosts (~3.5% of icons)
 - **Bundle gen: bilinear downscaling** — better quality than nearest-neighbor for >128px icons
 
 ### Frontend
-- **Cross-browser tab styling** — match real browser tabs more closely
 - **Mobile layout** — responsive tab sizing, touch-friendly interaction
-- **Stats page** — pipeline stats rendered on the site
+- **Stats page / Sankey diagram** — pipeline loss funnel rendered on the site
 
-### Icon Download Ordering
-- **Verify icon row ordering at 30M scale** — WARC parser inserts ~2-5 icons per host in sequence (favicon_ico + link_rel entries). Without ORDER BY, Postgres returns rows roughly in insertion order, so icons from the same host are adjacent. At 100K-3M this didn't cause problems (batch size 5000 means each batch has icons from ~2,000+ different hosts). At 30M, confirm with `iftop` that we're not hammering individual hosts. If needed, add `random_order REAL DEFAULT random()` column to icons table and use it in the claim query — but don't index it (60M+ writes).
+### Additional Metadata
+- **`http_status INT` on icons** — structured HTTP status code (currently only stored as error text). Enables analysis like "404 (site moved) vs 403 (bot blocked) vs 500 (server error)".
+- **`response_time_ms INT` on icons** — server response latency. Useful for tuning timeouts, identifying slow hosts, health signal.
+- **`parsed_at TIMESTAMPTZ` on hosts** — when the WARC was parsed. Currently only a `parsed` boolean.
+- **`created_at TIMESTAMPTZ DEFAULT now()` on hosts** — when the host entered the pipeline.