switched from rds to i5 ec2 for nvme disk read/write speeds

Joe Lothan 2026-05-25 18:17:07 -04:00
parent c93d1736fe
commit bf8b932cdc
4 changed files with 233 additions and 48 deletions

@@ -7,11 +7,8 @@ Between stages, run the sanity checks to confirm data looks right before proceed
 ## Prerequisites
 ```bash
-# Database URL in environment
-export DATABASE_URL='postgres://everytab:PASS@RDS_ENDPOINT:5432/everytab'
-# Schema created
-psql $DATABASE_URL -f pipeline/01_cc_index/schema.sql
+# Postgres on i3 instance (run infra/db-setup.sh on the i3 first)
+export DATABASE_URL='postgres://everytab@<i3-private-ip>:5432/everytab'
 # Go binaries built on EC2
 cd ~/everytab
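The new `DATABASE_URL` carries a literal `<i3-private-ip>` placeholder. If you forget to substitute the address, the mistake only surfaces at connection time. What follows is a minimal pre-flight sketch, assuming bash and the PostgreSQL client tools are installed on the EC2 host. `check_database_url` is a hypothetical helper, not something in the repo. It rejects an empty value, a leftover `<placeholder>`, or a value that is not a `postgres://` URI, and it runs before `pg_isready` probes the server.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the everytab repo):
# reject DATABASE_URL values that are empty, still contain a <placeholder>,
# or are not libpq-style postgres:// / postgresql:// URIs.
check_database_url() {
  local url="$1"
  case "$url" in
    "")
      echo "DATABASE_URL is empty" >&2
      return 1 ;;
    *'<'*|*'>'*)
      echo "DATABASE_URL still contains a <placeholder>: $url" >&2
      return 1 ;;
    postgres://*|postgresql://*)
      return 0 ;;
    *)
      echo "DATABASE_URL is not a postgres:// URI: $url" >&2
      return 1 ;;
  esac
}

# Example usage on the EC2 host (pg_isready exits 0 when the server accepts connections):
#   check_database_url "$DATABASE_URL" && pg_isready -d "$DATABASE_URL"
```

The URL check is deliberately shallow; the actual reachability test is left to `pg_isready`, which accepts a connection string through `-d`.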
@@ -39,10 +36,10 @@ Fetches WARC records from CC's S3, extracts titles, icons, and iframe headers.
 ## Stage 3: Icon Download
-Downloads favicons from the live web, validates, downloads to disk.
+Downloads favicons from the live web, validates, writes to local disk.
 ```bash
-./icon_download --db "$DATABASE_URL" --log-file icon_download.log --icons-dir icons/ --log-errors-only
+GOMEMLIMIT=12GiB ./icon_download --db "$DATABASE_URL" --log-file icon_download.log --icons-dir ~/icons --log-errors-only
 ```
 ## Stage 4: Best Icon Selection
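The Stage 3 command now sets `GOMEMLIMIT=12GiB`, which gives the Go runtime a soft memory limit. The garbage collector works harder as the heap approaches it, but it is not a hard cap. A hard-coded `12GiB` may not fit an instance with a different amount of RAM. The sketch below uses a hypothetical helper, `gomemlimit_for`, which is not part of the repo. It derives a value as a percentage of `MemTotal` from `/proc/meminfo`, assuming a Linux host. The 75% figure in the example is an assumption, not the author's sizing rule.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the everytab repo): derive a GOMEMLIMIT
# value as a percentage of MemTotal. Assumes Linux, where /proc/meminfo
# reports MemTotal in kB (kibibytes).
gomemlimit_for() {
  local percent="$1"                  # e.g. 75 for 75% of MemTotal
  local meminfo="${2:-/proc/meminfo}" # overridable for testing
  local total_kib
  total_kib=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
  if [ -z "$total_kib" ]; then
    echo "MemTotal not found in $meminfo" >&2
    return 1
  fi
  # KiB -> MiB first, then apply the percentage; integer math rounds down.
  echo "$(( total_kib / 1024 * percent / 100 ))MiB"
}

# Example usage before Stage 3 (assumed 75% sizing):
#   GOMEMLIMIT="$(gomemlimit_for 75)" ./icon_download --db "$DATABASE_URL" --log-file icon_download.log --icons-dir ~/icons --log-errors-only
```

Take a host whose `MemTotal` is exactly 16 GiB (16777216 kB). There, `gomemlimit_for 75` prints `12288MiB`, the same as `12GiB`. The output uses the `MiB` suffix, which `GOMEMLIMIT` accepts alongside `B`, `KiB`, `GiB`, and `TiB`.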