diff --git a/infra/README.md b/infra/README.md index ca759b0..cd4e711 100644 --- a/infra/README.md +++ b/infra/README.md @@ -6,79 +6,65 @@ Two EC2 instances during scanning: - **c5.2xlarge** (`everytab`) — compute: runs pipeline, stores icons on 1TB EBS - **i3.large** (`everytab-db`) — database: runs Postgres on 475GB local NVMe (100K+ IOPS) -Both provisioned by Terraform with `user_data` scripts that run on first boot: -- Compute: `ec2-userdata.sh` (Go, DuckDB, Unbound, swap) -- Database: `db-setup.sh` (NVMe format, Postgres install + config) +Both provisioned by Terraform with `user_data` scripts that auto-run on first boot: +- Compute: `ec2-userdata.sh` — installs Go, DuckDB, Unbound, swap; clones repo; builds binaries; applies DB schema +- Database: `db-setup.sh` — formats NVMe, installs Postgres, creates database + schema -## 1. Terraform +## Quick Start + +Everything runs from your local machine unless noted. ```bash +# 1. Create infrastructure cd infra -cp terraform.tfvars.example terraform.tfvars # fill in your values +cp terraform.tfvars.example terraform.tfvars # fill in your values (including repo_url) terraform init terraform apply -``` -This creates both instances. They auto-provision via user_data (~3 minutes). - -## 2. SSH Key - -```bash +# 2. Save SSH key terraform output -raw ssh_private_key > everytab-key && chmod 600 everytab-key -terraform output ssh_command # SSH to compute instance -terraform output ssh_command_db # SSH to database instance + +# 3. Wait ~3-5 minutes for both instances to auto-provision, then verify +ssh -i everytab-key ec2-user@$(terraform output -raw ec2_public_ip) \ + 'pg_isready -h $(grep DATABASE_URL ~/.bashrc | cut -d@ -f2 | cut -d: -f1)' ``` -## 3. Verify Database is Ready +If `repo_url` is set in tfvars, the compute instance automatically: +- Clones the repo +- Builds all Go binaries +- Waits for the DB to be ready +- Applies the schema -```bash -# From your local machine or the compute instance -pg_isready -h $(terraform output -raw db_private_ip) -``` +## Running the Pipeline -If not ready yet, SSH to the DB instance and check `cloud-init` logs: -```bash -tail -f /var/log/cloud-init-output.log -``` - -## 4. Clone Repo + Build on Compute Instance +SSH to the compute instance — everything is ready: ```bash ssh -i everytab-key ec2-user@$(terraform output -raw ec2_public_ip) -git clone ~/everytab -cd ~/everytab -go build -o ~/warc_parse ./pipeline/02_warc_parse/ -go build -o ~/icon_download ./pipeline/03_icon_download/ -go build -o ~/bundle_gen ./pipeline/05_bundle_gen/ +# DATABASE_URL is already in .bashrc, binaries already built +# Start the pipeline (see pipeline/README.md for full guide) +./pipeline/01_cc_index/query.sh --db-url "$DATABASE_URL" --limit 0 ``` -## 5. Connect to Database + Apply Schema +## Debugging (if auto-provision fails) +Check cloud-init logs on either instance: ```bash -# Get the connection string -export DATABASE_URL=$(terraform output -raw database_url) -echo "export DATABASE_URL='$DATABASE_URL'" >> ~/.bashrc +# Compute instance +ssh -i everytab-key ec2-user@$(terraform output -raw ec2_public_ip) \ + 'tail -30 /var/log/cloud-init-output.log' -# Test connectivity -psql $DATABASE_URL -c 'SELECT 1;' - -# Apply schema -psql $DATABASE_URL -f ~/everytab/pipeline/01_cc_index/schema.sql +# DB instance +ssh -i everytab-key ec2-user@$(terraform output -raw db_public_ip) \ + 'tail -30 /var/log/cloud-init-output.log' ``` -## 6. Run Pipeline - -See `pipeline/README.md` for the full stage-by-stage guide. - ## Pinning the EC2 AMI -The `data.aws_ami` lookup fetches the latest Amazon Linux 2023 AMI. If Amazon publishes a new one between applies, Terraform will want to replace your instances. - -To prevent this, pin the AMI after initial creation: +The `data.aws_ami` lookup fetches the latest Amazon Linux 2023 AMI. Pin it to prevent instance replacement on unrelated changes: ```bash -# Get the current AMI aws ec2 describe-instances --filters "Name=tag:Name,Values=everytab" \ --query "Reservations[0].Instances[0].ImageId" --output text @@ -86,27 +72,27 @@ aws ec2 describe-instances --filters "Name=tag:Name,Values=everytab" \ echo 'ec2_ami = "ami-XXXXXXXXXXXX"' >> terraform.tfvars ``` -Remove the `ec2_ami` line from tfvars when you want fresh instances with the latest AMI. +Remove the line when you want fresh instances with the latest AMI. -## Teardown (after backup) +## Teardown + +From the compute instance, back up before tearing down: ```bash -# Back up the database (run from compute instance) +# Back up database pg_dump $DATABASE_URL -Fc > ~/everytab_dump.pgfc # Back up icons to homelab rsync -avP ~/icons/ homelab:/backups/everytab/icons/ ``` -Switch to serving-only mode (destroys both EC2 instances): +From your local machine: ```bash +# Destroy scanning infrastructure (keeps CloudFront + site bucket) terraform apply -var="scanning=false" -``` -Full destroy (including the live site): - -```bash +# Or full destroy (including the live site) terraform destroy ```