added infra setup with terraform

2026-05-17 16:07:50 -04:00 · fcf203e1d8
commit fcf203e1d8
parent 64ae58494b
8 changed files with 556 additions and 74 deletions
--- a/PLAN.md
+++ b/PLAN.md
@@ -6,95 +6,55 @@ Each step has a clear deliverable and validation criteria. Steps are sequential

 ---

-## Phase 0: Project Setup & AWS Infrastructure
+## Phase 0: Project Setup & AWS Infrastructure [COMPLETED]

-### Step 0.1: Repository Structure
-
-Create the project layout:
+### Step 0.1: Repository Structure [COMPLETED]

 ```
 everytab/
 ├── design.md
 ├── ARCHITECTURE.md
 ├── PLAN.md
-├── infra/               # AWS CLI scripts for setup/teardown
-│   ├── setup.sh         # Create RDS, S3 buckets, security groups
-│   ├── teardown.sh      # Delete non-permanent resources
-│   └── ec2-userdata.sh  # EC2 bootstrap (install Go, DuckDB, Unbound)
+├── infra/
+│   ├── main.tf              # Terraform: all AWS resources
+│   ├── terraform.tfvars.example
+│   ├── ec2-userdata.sh      # EC2 bootstrap (Go, DuckDB, Unbound)
+│   └── README.md            # Setup steps
 ├── pipeline/
-│   ├── 01_cc_index/     # DuckDB query scripts
-│   ├── 02_warc_parse/   # Go program
-│   ├── 03_icon_download/# Go program
-│   ├── 04_best_icon/    # SQL script
-│   ├── 05_bundle_gen/   # Go program
-│   └── 06_frontend/     # Build script, templates
+│   ├── 01_cc_index/
+│   │   └── schema.sql      # Postgres table definitions
+│   ├── 02_warc_parse/
+│   ├── 03_icon_download/
+│   ├── 04_best_icon/
+│   ├── 05_bundle_gen/
+│   └── 06_frontend/
 ├── frontend/
-│   ├── index.html
-│   └── site.js
-├── stats/               # Stats output from each stage (gitignored)
-└── go.mod               # Shared Go module for pipeline programs
+├── stats/                   # gitignored
+└── go.mod
 ```

-**Done when:** Repo structure exists, `go.mod` initialized, `.gitignore` covers stats/ and any local config.
+### Step 0.2: AWS Infrastructure (Terraform) [COMPLETED]

-### Step 0.2: AWS Infrastructure (Manual CLI)
+Infrastructure managed via `infra/main.tf`. Single file, uses `var.scanning` bool to switch phases:
+- `terraform apply` — creates all scanning resources (EC2, RDS, S3 icons, S3 site, IAM, security groups)
+- `terraform apply -var="scanning=false"` — destroys scanning resources, keeps site bucket
+- `terraform destroy` — removes everything

-Create resources using AWS CLI commands in `infra/setup.sh`:
+Resources created:
+- S3 `everytab-icons` (private), S3 `everytab-site` (for CloudFront later)
+- RDS Postgres 16, db.t3.medium, 20GB gp3
+- EC2 c5.xlarge, Amazon Linux 2023, 50GB gp3
+- Security groups (SSH from home IP, RDS from EC2 only)
+- IAM role + instance profile (S3 access only)
+- SSH key (Terraform-managed ed25519)

-1. **S3 buckets:**
-   - `everytab-icons` (private, no public access)
-   - `everytab-site` (private, accessed via CloudFront OAC)
+### Step 0.3: EC2 Environment Setup [COMPLETED]

-2. **RDS Postgres:**
-   - `db.t3.medium`, 20GB storage (expandable), Postgres 16
-   - In a VPC, security group allows inbound 5432 from EC2 security group
-   - No public access (EC2 connects within VPC)
-   - No multi-AZ (dev, not production)
-   - Set a strong password, store in a local `.env` (gitignored)
-
-3. **EC2 instance:**
-   - `c5.xlarge` (4 vCPU, 8GB RAM) — enough for Go concurrency + Unbound cache
-   - Amazon Linux 2023 or Ubuntu 24.04
-   - Security group: allow SSH (from your IP), allow outbound all
-   - Same VPC/subnet as RDS
-   - Key pair for SSH access
-
-4. **CloudFront distribution:**
-   - Origin: `everytab-site` S3 bucket (OAC)
-   - Default cache behavior: cache everything, Brotli+Gzip compression
-   - Can set up now or defer to Phase 2
-
-5. **IAM role for EC2:**
-   - S3 read/write to both buckets
-   - Attach as instance profile
-
-**Validation:** SSH into EC2, confirm `psql` can connect to RDS, confirm `aws s3 ls` shows both buckets.
-
-**Done when:** All resources exist, EC2 can reach RDS and S3.
-
-### Step 0.3: EC2 Environment Setup
-
-Bootstrap script (`infra/ec2-userdata.sh` or run manually):
-
-1. Install Go (latest stable, 1.22+)
-2. Install DuckDB CLI
-3. Install Unbound, configure as recursive resolver:
-   - `/etc/unbound/unbound.conf`: recursive mode, no forwarding, listen on 127.0.0.1
-   - High cache: `msg-cache-size: 512m`, `rrset-cache-size: 1g`
-   - `cache-min-ttl: 3600`
-   - `prefetch: yes`
-   - `num-threads: 4`
-4. Set `/etc/resolv.conf` → `nameserver 127.0.0.1`
-5. Install `psql` client, `pg_dump`
-6. Confirm DuckDB httpfs extension works: `INSTALL httpfs; LOAD httpfs;`
-
-**Validation:**
- `go version` works
- `duckdb -c "INSTALL httpfs; LOAD httpfs; SELECT 1;"` works
- `dig example.com @127.0.0.1` resolves (Unbound working)
- `psql $DATABASE_URL -c "SELECT 1;"` connects to RDS
-
-**Done when:** EC2 is a working development environment for all pipeline stages.
+Bootstrap via `infra/ec2-userdata.sh`:
+- Go 1.22+, DuckDB (httpfs + postgres extensions), Unbound (recursive resolver), psql, tmux
+- Unbound configured as system resolver (systemd-resolved disabled)
+- DATABASE_URL in .bashrc
+- Schema applied: hosts + icons tables with indexes

 ---
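
The Unbound part of the bootstrap can be sketched as below. This is a minimal sketch, assuming the cache settings from the earlier plan text (removed in this commit) still apply; the helper name `write_unbound_conf` and the scratch-file target are hypothetical, and the real `infra/ec2-userdata.sh` may differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a recursive-resolver Unbound config to the given path.
# No forward-zone is declared, so Unbound resolves recursively from the root servers.
write_unbound_conf() {
  local dest="$1"
  cat > "$dest" <<'EOF'
server:
    interface: 127.0.0.1
    num-threads: 4
    msg-cache-size: 512m
    rrset-cache-size: 1g
    cache-min-ttl: 3600
    prefetch: yes
EOF
}

# Written to a scratch file here; on the instance the target would be
# /etc/unbound/unbound.conf (requires root).
conf="$(mktemp)"
write_unbound_conf "$conf"
cat "$conf"
```

Generating the file from a function keeps the settings in one place, so the same script can be dry-run locally before it is baked into user data.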

@@ -715,3 +675,22 @@ On completion, each program prints a summary line and writes its stats JSON (wit
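 - **Postgres connection limits:** RDS derives the default `max_connections` from instance memory, so confirm the actual limit on db.t3.medium with `SHOW max_connections;` (this plan assumed ≈ 80). With 1000 goroutines, we need connection pooling (pgx pool handles this). Set pool max to ~40 connections.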
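 - **S3 consistency:** S3 has provided strong read-after-write consistency since December 2020, so a HEAD request issued after a successful upload will find the icon. Two workers can still race to upload the same icon; for dedup checks, treat "not found" as a cue to upload (idempotent, since the key is the content hash).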
 - **CloudFront caching:** After deploying new bundles, invalidate `/*` or set short TTL during development. For production, use long TTLs (bundles are immutable between crawls).
+
+---
+
+## Progress Log
+
+### Phase 0 — Completed 2026-05-17
+
+**Changes from original plan:**
+- Replaced shell scripts (`setup.sh`, `teardown.sh`) with Terraform (`infra/main.tf`). Single file, `var.scanning` bool switches between scanning and serving phases.
+- SSH key is Terraform-managed (no passphrase, stored in state) rather than manually generated.
+- CloudFront distribution deferred — not created in Phase 0, will add to Terraform when frontend is ready.
+- Added `infra/README.md` with terse setup steps for future replication.
+
+**Lessons learned:**
+- Shell scripts with `2>/dev/null || echo "already exists"` swallow real errors. Terraform avoids this pattern: a failed API call fails the `apply` and the error is printed.
+- RDS requires a DB subnet group (2+ subnets in different AZs). The original shell script didn't create one, causing a silent failure. In Terraform the subnet group is declared as its own resource, and the dependency graph ensures it is created before the RDS instance.
+- Amazon Linux 2023 uses `systemd-resolved`, which manages `/etc/resolv.conf`. It must be disabled before pointing `resolv.conf` at Unbound. `chattr +i` doesn't work on the symlink.
+- AWS EC2 key pairs created via API don't support passphrases. Use `tls_private_key` in Terraform or generate locally with `ssh-keygen` + import.
+- When an AWS key pair name already exists from a previous run, Terraform may not regenerate it. Use `-replace` to force recreation of the key + instance together.
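
The first lesson above (error swallowing in shell) can be illustrated with a small sketch. The `create_bucket` function is a hypothetical stand-in for an AWS CLI call that fails for a reason other than "already exists", and the `AlreadyExists` pattern is an assumed placeholder for whatever error code the real command returns.

```shell
#!/usr/bin/env bash
set -uo pipefail

# Hypothetical stand-in for an AWS CLI call that fails for a reason
# other than "resource already exists" (here: missing permissions).
create_bucket() {
  echo "AccessDenied: not authorized" >&2
  return 1
}

# Anti-pattern: stderr is discarded and every failure reads as "already exists".
swallowed="$(create_bucket 2>/dev/null || echo "already exists")"
echo "swallowing version says: $swallowed"

# Safer: capture stderr and tolerate only the one expected error.
if err="$(create_bucket 2>&1)"; then
  checked="created"
elif [[ "$err" == *AlreadyExists* ]]; then
  checked="already exists"
else
  checked="failed: $err"
fi
echo "checking version says: $checked"
```

The second form still lets a rerun pass when the resource exists, but a permissions or quota error is reported instead of being masked.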