# Infrastructure Setup

## Architecture

Two EC2 instances run during scanning:

- **c5.2xlarge** (`everytab`), compute: runs the pipeline and stores icons on a 1 TB EBS volume
- **i3.large** (`everytab-db`), database: runs Postgres on its 475 GB local NVMe instance store (100K+ IOPS)

Both are provisioned by Terraform, with `user_data` scripts that run on first boot:

- Compute: `ec2-userdata.sh` installs Go, DuckDB, and Unbound, and sets up swap
- Database: `db-setup.sh` formats the NVMe drive, then installs and configures Postgres
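
NVMe instance-store drives on EC2 typically report the model string `Amazon EC2 NVMe Instance Storage`, which distinguishes them from EBS volumes. A minimal sketch of how a setup script like `db-setup.sh` *could* pick the ephemeral drive before formatting it; `find_instance_store` is a hypothetical helper, and the real script may locate the device differently:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print /dev paths of NVMe instance-store drives.
# Reads `lsblk -dno NAME,MODEL` output on stdin so it can be tested offline.
find_instance_store() {
  # MODEL can contain spaces, so match the whole line and print the NAME column
  awk '/Amazon EC2 NVMe Instance Storage/ { print "/dev/" $1 }'
}

# Usage on the DB instance (lsblk ships with util-linux):
#   lsblk -dno NAME,MODEL | find_instance_store
```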

## 1. Terraform

```bash
cd infra
cp terraform.tfvars.example terraform.tfvars   # fill in your values
terraform init
terraform apply
```

This creates both instances. Each provisions itself on first boot via `user_data`, which takes about 3 minutes.

## 2. SSH Key

```bash
terraform output -raw ssh_private_key > everytab-key && chmod 600 everytab-key
terraform output ssh_command      # prints the SSH command for the compute instance
terraform output ssh_command_db   # prints the SSH command for the database instance
```
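
Instead of copying SSH commands around, you could write `~/.ssh/config` entries. A minimal sketch, assuming the default Amazon Linux user `ec2-user` on both hosts and that the DB instance's security group accepts SSH from the compute instance, so its private IP can be reached with `ProxyJump`; the helper `ssh_config_entries` and the host aliases are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print ~/.ssh/config entries for both instances.
# Args: compute public IP, DB private IP, absolute path to the private key.
ssh_config_entries() {
  local compute_ip=$1 db_ip=$2 key=$3
  cat <<EOF
Host everytab
  HostName $compute_ip
  User ec2-user
  IdentityFile $key

# Reach the DB's private IP by hopping through the compute instance
Host everytab-db
  HostName $db_ip
  User ec2-user
  IdentityFile $key
  ProxyJump everytab
EOF
}

# Usage, from infra/ on your local machine:
#   ssh_config_entries "$(terraform output -raw ec2_public_ip)" \
#     "$(terraform output -raw db_private_ip)" "$PWD/everytab-key" >> ~/.ssh/config
#   ssh everytab-db
```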

## 3. Verify Database is Ready

```bash
# On your local machine (in infra/): look up the database's private IP
terraform output -raw db_private_ip

# On the compute instance, since a private IP is normally only reachable inside the VPC
pg_isready -h <db_private_ip>
```
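
Provisioning takes a few minutes, so the first `pg_isready` may fail. `pg_isready` exits 0 once the server accepts connections (1 if it is rejecting them, e.g. while starting up; 2 if there is no response). A small retry loop to run on the compute instance; `wait_for_db` and its defaults of 30 attempts 10 seconds apart are illustrative choices, not part of the repo:

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll until Postgres at HOST accepts connections.
# Args: host, max attempts (default 30), seconds between attempts (default 10).
wait_for_db() {
  local host=$1 tries=${2:-30} delay=${3:-10} i
  for ((i = 1; i <= tries; i++)); do
    # pg_isready exits 0 only when the server is accepting connections
    if pg_isready -q -h "$host"; then
      echo "database at $host is ready"
      return 0
    fi
    echo "attempt $i/$tries: not ready, retrying in ${delay}s" >&2
    sleep "$delay"
  done
  echo "database at $host not ready after $tries attempts" >&2
  return 1
}

# Usage: wait_for_db <db_private_ip>
```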

If it is not ready yet, SSH to the DB instance and check the `cloud-init` log:

```bash
tail -f /var/log/cloud-init-output.log
```

## 4. Clone Repo + Build on Compute Instance

```bash
# From your local machine (in infra/)
ssh -i everytab-key ec2-user@$(terraform output -raw ec2_public_ip)

# Then, on the compute instance
git clone <your-repo-url> ~/everytab
cd ~/everytab
go build -o ~/warc_parse ./pipeline/02_warc_parse/
go build -o ~/icon_download ./pipeline/03_icon_download/
go build -o ~/bundle_gen ./pipeline/05_bundle_gen/
```
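
The three `go build` lines follow one pattern: each binary is named after its stage directory minus the numeric prefix. A loop sketch of the same builds; `binary_name` and `build_stages` are hypothetical helpers, and the stage list simply mirrors the commands above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: "02_warc_parse" -> "warc_parse" (drop the "NN_" prefix)
binary_name() {
  echo "${1#*_}"
}

# Build each listed pipeline stage into ~/, stopping at the first failure.
build_stages() {
  local stage
  for stage in 02_warc_parse 03_icon_download 05_bundle_gen; do
    go build -o ~/"$(binary_name "$stage")" "./pipeline/$stage/" || return 1
  done
}

# Usage, from ~/everytab on the compute instance: build_stages
```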

## 5. Connect to Database + Apply Schema

```bash
# On your local machine (in infra/): print the connection string
terraform output -raw database_url

# On the compute instance: export it (paste the value printed above) and persist it
export DATABASE_URL='<database_url>'
echo "export DATABASE_URL='$DATABASE_URL'" >> ~/.bashrc

# Test connectivity
psql "$DATABASE_URL" -c 'SELECT 1;'

# Apply schema
psql "$DATABASE_URL" -f ~/everytab/pipeline/01_cc_index/schema.sql
```
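
By default `psql -f` keeps going after a failed statement, so a partly applied schema can look like success. A stricter sketch using `ON_ERROR_STOP` and `--single-transaction`, so the first error aborts and the whole file rolls back. `apply_schema` is a hypothetical helper, and the single-transaction wrapper assumes `schema.sql` has no explicit `BEGIN`/`COMMIT` and no commands that cannot run inside a transaction (such as `CREATE INDEX CONCURRENTLY`):

```bash
#!/usr/bin/env bash
# Hypothetical helper: apply a SQL file, failing fast on the first error.
apply_schema() {
  local file=$1
  # Fail with a clear message if DATABASE_URL was never set
  if [[ -z ${DATABASE_URL:-} ]]; then
    echo "DATABASE_URL is not set" >&2
    return 1
  fi
  # ON_ERROR_STOP=1 makes psql exit non-zero on the first failed statement;
  # --single-transaction wraps the file in one transaction so failures roll back.
  psql "$DATABASE_URL" -v ON_ERROR_STOP=1 --single-transaction -f "$file"
}

# Usage: apply_schema ~/everytab/pipeline/01_cc_index/schema.sql
```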

## 6. Run Pipeline

See `pipeline/README.md` for the full stage-by-stage guide.

## Pinning the EC2 AMI

The `data.aws_ami` lookup fetches the latest Amazon Linux 2023 AMI. If Amazon publishes a newer one between applies, Terraform will plan to replace your instances, and replacing the i3 destroys the database on its local NVMe.

To prevent this, pin the AMI after the initial creation:

```bash
# Get the AMI of the running compute instance
aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=everytab" "Name=instance-state-name,Values=running" \
  --query "Reservations[0].Instances[0].ImageId" --output text

# Add it to terraform.tfvars (replace the placeholder with the ID printed above)
echo 'ec2_ami = "ami-XXXXXXXXXXXX"' >> terraform.tfvars
```

Remove the `ec2_ami` line from `terraform.tfvars` when you want fresh instances on the latest AMI.
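
Appending with `>>` leaves two `ec2_ami` lines if you pin twice, and Terraform rejects a `.tfvars` file that sets the same variable twice. A sketch that rewrites the line in place instead; `pin_ami` is a hypothetical helper that assumes `ec2_ami` sits on a single line of `terraform.tfvars`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: set `ec2_ami = "<id>"` in a tfvars file, replacing any
# existing ec2_ami line rather than appending a duplicate.
pin_ami() {
  local ami=$1 file=${2:-terraform.tfvars} tmp
  # Guard against empty or "None" output from the AWS CLI
  if [[ $ami != ami-* ]]; then
    echo "not an AMI ID: $ami" >&2
    return 1
  fi
  tmp=$(mktemp) || return 1
  # Keep every line except an existing ec2_ami assignment (grep exits 1 when
  # nothing is left to print, which is fine here)
  grep -v '^[[:space:]]*ec2_ami[[:space:]]*=' "$file" > "$tmp" || true
  printf 'ec2_ami = "%s"\n' "$ami" >> "$tmp"
  # Copy back rather than mv, so the file keeps its original permissions
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage, from infra/:
#   pin_ami "$(aws ec2 describe-instances \
#     --filters "Name=tag:Name,Values=everytab" "Name=instance-state-name,Values=running" \
#     --query "Reservations[0].Instances[0].ImageId" --output text)"
```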

## Teardown (after backup)

```bash
# Back up the database (run on the compute instance)
pg_dump "$DATABASE_URL" -Fc -f ~/everytab_dump.pgfc

# Copy the dump to homelab too; it lives on the compute instance's disk
rsync -avP ~/everytab_dump.pgfc homelab:/backups/everytab/

# Back up icons to homelab
rsync -avP ~/icons/ homelab:/backups/everytab/icons/
```
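
Before destroying anything, it is worth confirming the dump is readable. `pg_restore --list` prints a custom-format (`-Fc`) archive's table of contents without connecting to a database, so it catches a missing, empty, or non-archive file, though not every kind of corruption. A sketch; `verify_dump` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a pg_dump custom-format archive.
verify_dump() {
  local dump=$1
  # -s: file exists and is non-empty
  if [[ ! -s $dump ]]; then
    echo "missing or empty dump: $dump" >&2
    return 1
  fi
  # --list reads the archive's table of contents; it exits non-zero if the
  # file is not a readable custom-format archive
  if ! pg_restore --list "$dump" > /dev/null; then
    echo "pg_restore cannot read $dump" >&2
    return 1
  fi
  echo "dump looks readable: $dump"
}

# Usage, on the compute instance before teardown: verify_dump ~/everytab_dump.pgfc
```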

Switch to serving-only mode (destroys both EC2 instances):

```bash
terraform apply -var="scanning=false"
```

Full destroy (including the live site):

```bash
terraform destroy
```

**IMPORTANT:** The i3's local NVMe is ephemeral: all data is lost when the instance is stopped or terminated. Always run `pg_dump` before teardown.
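
Because the i3's data disappears with the instance, a small guard can refuse to switch to serving-only mode unless a recent dump exists. A sketch, assuming the dump has already been copied to the machine where you run Terraform and using an arbitrary 60-minute freshness window; `recent_dump` and `teardown_scanning` are hypothetical helpers:

```bash
#!/usr/bin/env bash
# Hypothetical check: succeed only if DUMP is non-empty and was modified
# within the last MAX_MIN minutes (default 60).
recent_dump() {
  local dump=$1 max_min=${2:-60}
  [[ -s $dump ]] || return 1
  # find prints the path only if its mtime is less than max_min minutes ago
  [[ -n $(find "$dump" -maxdepth 0 -mmin "-$max_min") ]]
}

# Hypothetical wrapper: run the serving-only apply only after the check passes.
teardown_scanning() {
  local dump=$1
  if ! recent_dump "$dump"; then
    echo "no recent dump at $dump; run pg_dump first" >&2
    return 1
  fi
  terraform apply -var="scanning=false"
}

# Usage, from infra/: teardown_scanning /path/to/everytab_dump.pgfc
```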