
Joe Lothan 7c5573c24d add filter to squash warning		2026-05-25 21:40:01 -04:00
db-setup.sh	automated ec2 setup and build	2026-05-25 18:29:37 -04:00
ec2-userdata.sh	automated ec2 setup and build	2026-05-25 18:29:37 -04:00
main.tf	add filter to squash warning	2026-05-25 21:40:01 -04:00
README.md	automated ec2 setup and build	2026-05-25 18:29:37 -04:00
terraform.tfvars.example	automated ec2 setup and build	2026-05-25 18:29:37 -04:00

README.md

Infrastructure Setup

Architecture

Two EC2 instances run during scanning:

c5.2xlarge (everytab), compute: runs the pipeline and stores icons on a 1 TB EBS volume
i3.large (everytab-db), database: runs Postgres on 475 GB of local NVMe instance storage (100K+ IOPS)

Both are provisioned by Terraform, with user_data scripts that run on first boot:

Compute: ec2-userdata.sh (Go, DuckDB, Unbound, swap)
Database: db-setup.sh (NVMe format, Postgres install + config)

1. Terraform

cd infra
cp terraform.tfvars.example terraform.tfvars  # fill in your values
terraform init
terraform apply

This creates both instances. They auto-provision via user_data (~3 minutes).

2. SSH Key

terraform output -raw ssh_private_key > everytab-key && chmod 600 everytab-key
terraform output ssh_command     # SSH to compute instance
terraform output ssh_command_db  # SSH to database instance
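The redirect above creates everytab-key with your default umask (typically 644, world-readable) until chmod runs. Here is a minimal sketch for bash that creates the file owner-only from the start. write_private_file is a hypothetical helper, not something provided by the repo:

```shell
# Hypothetical helper: write stdin to a file readable/writable by the owner only.
# The umask change happens inside a subshell, so it doesn't leak into your session.
write_private_file() {
  (
    umask 077                    # files created here get mode 600 (rw-------)
    : > "$1" &&                  # create (or truncate) the file first
      chmod 600 "$1" &&          # tighten mode in case the file already existed
      cat > "$1"                 # then write the secret
  )
}
```

Usage: `terraform output -raw ssh_private_key | write_private_file everytab-key`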

3. Verify Database is Ready

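# On your local machine: print the DB instance's private IP
terraform output -raw db_private_ip

# On the compute instance (a private IP is normally only reachable from inside the VPC)
pg_isready -h <db_private_ip>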

If not ready yet, SSH to the DB instance and check cloud-init logs:

tail -f /var/log/cloud-init-output.log
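You can also poll pg_isready until the DB instance finishes provisioning instead of re-running it by hand. This is a sketch for bash. retry_until is a hypothetical helper, and the attempt count and delay in the usage line are guesses sized to the roughly 3-minute provisioning time:

```shell
# Hypothetical helper: run a command until it succeeds, trying at most
# ATTEMPTS times and sleeping DELAY seconds between tries.
# Returns 0 on success, otherwise the command's last exit status.
retry_until() {
  local attempts=$1 delay=$2 i=1 status
  shift 2
  while :; do
    "$@" && return 0
    status=$?
    if [ "$i" -ge "$attempts" ]; then
      echo "gave up after $attempts attempts: $*" >&2
      return "$status"
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}
```

Usage (on the compute instance): `retry_until 20 10 pg_isready -h <db_private_ip>`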

4. Clone Repo + Build on Compute Instance

ssh -i everytab-key ec2-user@$(terraform output -raw ec2_public_ip)

git clone <your-repo-url> ~/everytab
cd ~/everytab
go build -o ~/warc_parse ./pipeline/02_warc_parse/
go build -o ~/icon_download ./pipeline/03_icon_download/
go build -o ~/bundle_gen ./pipeline/05_bundle_gen/
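The three builds follow one pattern: each binary is named after its stage directory with the numeric prefix removed. Here is a sketch of that pattern as a loop, assuming bash and that you run it from the repo root. binary_name and build_stages are hypothetical helpers:

```shell
# Hypothetical helper: map a stage directory such as "02_warc_parse"
# to its binary name by stripping everything up to the first "_".
binary_name() {
  printf '%s\n' "${1#*_}"
}

# Hypothetical helper: build each stage given as an argument into ~/<binary name>.
# Stops at the first failing build.
build_stages() {
  local stage
  for stage in "$@"; do
    go build -o ~/"$(binary_name "$stage")" ./pipeline/"$stage"/ || return 1
  done
}
```

Usage: `build_stages 02_warc_parse 03_icon_download 05_bundle_gen`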

5. Connect to Database + Apply Schema

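terraform output reads the Terraform state, so (with the default local backend) it only works where you ran terraform apply. Print the value there and paste it on the compute instance.

# On your local machine: print the connection string
terraform output -raw database_url

# On the compute instance: save it (paste the value printed above)
export DATABASE_URL='<database_url>'
echo "export DATABASE_URL='$DATABASE_URL'" >> ~/.bashrc

# Test connectivity
psql "$DATABASE_URL" -c 'SELECT 1;'

# Apply schema (ON_ERROR_STOP makes psql stop and exit non-zero on the first error)
psql "$DATABASE_URL" -v ON_ERROR_STOP=1 -f ~/everytab/pipeline/01_cc_index/schema.sql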
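Each time the echo into ~/.bashrc runs, it appends another export line. The following bash sketch replaces an earlier export of the same variable instead of adding a duplicate. set_rc_export is a hypothetical helper, and it assumes the value contains no single quotes:

```shell
# Hypothetical helper: set "export NAME='VALUE'" in a shell rc file,
# dropping any earlier export of NAME so re-runs don't pile up duplicates.
# Writes back with cat so the rc file keeps its inode and permissions.
set_rc_export() {
  local rc=$1 name=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  if [ -f "$rc" ]; then
    # Keep every line except a previous export of this variable.
    grep -v "^export $name=" "$rc" > "$tmp" || true
  fi
  printf "export %s='%s'\n" "$name" "$value" >> "$tmp"
  cat "$tmp" > "$rc" && rm -f "$tmp"
}
```

Usage: `set_rc_export ~/.bashrc DATABASE_URL "$DATABASE_URL"`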

6. Run Pipeline

See pipeline/README.md for the full stage-by-stage guide.

Pinning the EC2 AMI

The data.aws_ami lookup fetches the latest Amazon Linux 2023 AMI. If Amazon publishes a new one between applies, Terraform will want to replace your instances.

To prevent this, pin the AMI after initial creation:

# Get the current AMI (the state filter skips recently terminated instances)
aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=everytab" "Name=instance-state-name,Values=running" \
  --query "Reservations[0].Instances[0].ImageId" --output text

# Add to terraform.tfvars
echo 'ec2_ami = "ami-XXXXXXXXXXXX"' >> terraform.tfvars

Remove the ec2_ami line from tfvars when you want fresh instances with the latest AMI.
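If you run the echo >> step twice, terraform.tfvars ends up with two ec2_ami lines, and Terraform rejects that as a duplicate definition. Here is a sketch of pin and unpin helpers for bash that rewrite the line rather than append it. pin_ami and unpin_ami are hypothetical, and the ID check assumes the usual ami- plus 8 or 17 lowercase hex characters:

```shell
# Print FILE without any ec2_ami assignment lines.
_without_ami() {
  grep -v '^[[:space:]]*ec2_ami[[:space:]]*=' "$1" || true
}

# Hypothetical helper: pin (or re-pin) the AMI in a tfvars file.
pin_ami() {
  local file=$1 ami=$2 tmp
  if ! printf '%s\n' "$ami" | grep -Eq '^ami-([0-9a-f]{8}|[0-9a-f]{17})$'; then
    echo "not an AMI ID: $ami" >&2
    return 1
  fi
  tmp=$(mktemp) || return 1
  _without_ami "$file" > "$tmp"
  printf 'ec2_ami = "%s"\n' "$ami" >> "$tmp"
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Hypothetical helper: remove the pin so the next apply uses the latest AMI.
unpin_ami() {
  local file=$1 tmp
  tmp=$(mktemp) || return 1
  _without_ami "$file" > "$tmp"
  cat "$tmp" > "$file" && rm -f "$tmp"
}
```

Usage: `pin_ami terraform.tfvars ami-XXXXXXXXXXXX` with the real ID from describe-instances, and `unpin_ami terraform.tfvars` when you want fresh instances.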

Teardown (after backup)

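# Back up the database (run from compute instance)
pg_dump "$DATABASE_URL" -Fc > ~/everytab_dump.pgfc

# Copy the dump and the icons to homelab (the compute instance is destroyed below)
rsync -avP ~/everytab_dump.pgfc homelab:/backups/everytab/
rsync -avP ~/icons/ homelab:/backups/everytab/icons/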
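Before destroying anything, it is worth a quick check that the dump is readable. Here is a sketch for bash that uses pg_restore --list, which reads the archive's table of contents without restoring anything. This confirms the file is a valid custom-format archive, not that every row is intact. verify_dump is a hypothetical helper:

```shell
# Hypothetical helper: sanity-check a custom-format (-Fc) dump.
# Fails if the file is missing or empty, or if pg_restore can't read its TOC.
verify_dump() {
  local dump=$1
  if [ ! -s "$dump" ]; then
    echo "missing or empty dump: $dump" >&2
    return 1
  fi
  pg_restore --list "$dump" > /dev/null || {
    echo "pg_restore cannot read $dump" >&2
    return 1
  }
}
```

Usage (on the compute instance, before teardown): `verify_dump ~/everytab_dump.pgfc`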

Switch to serving-only mode (destroys both EC2 instances):

terraform apply -var="scanning=false"

Full destroy (including the live site):

terraform destroy

IMPORTANT: The i3's local NVMe is ephemeral; all data on it is lost when the instance is stopped or terminated (a reboot keeps it). Always pg_dump, and copy the dump off the compute instance, before teardown.