switched from rds to i3 ec2 for nvme disk read/write speeds
parent c93d1736fe
commit bf8b932cdc
4 changed files with 233 additions and 48 deletions

@@ -7,11 +7,8 @@ Between stages, run the sanity checks to confirm data looks right before proceed
 ## Prerequisites
 
 ```bash
-# Database URL in environment
-export DATABASE_URL='postgres://everytab:PASS@RDS_ENDPOINT:5432/everytab'
-
-# Schema created
-psql $DATABASE_URL -f pipeline/01_cc_index/schema.sql
+# Postgres on i3 instance (run infra/db-setup.sh on the i3 first)
+export DATABASE_URL='postgres://everytab@<i3-private-ip>:5432/everytab'
 
 # Go binaries built on EC2
 cd ~/everytab
@@ -39,10 +36,10 @@ Fetches WARC records from CC's S3, extracts titles, icons, and iframe headers.
 
 ## Stage 3: Icon Download
 
-Downloads favicons from the live web, validates, downloads to disk.
+Downloads favicons from the live web, validates, writes to local disk.
 
 ```bash
-./icon_download --db "$DATABASE_URL" --log-file icon_download.log --icons-dir icons/ --log-errors-only
+GOMEMLIMIT=12GiB ./icon_download --db "$DATABASE_URL" --log-file icon_download.log --icons-dir ~/icons --log-errors-only
 ```
 
 ## Stage 4: Best Icon Selection
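
Note on the Stage 3 command: the `GOMEMLIMIT=12GiB` prefix sets the variable for that single `icon_download` invocation only. Assuming `icon_download` is one of the Go binaries mentioned under Prerequisites and was built with Go 1.19 or later, the Go runtime reads this variable as a soft memory limit. The sketch below only illustrates the scoping of the prefix form; it uses `sh -c` as a stand-in for the real binary, and the variable names `in_child` and `in_parent` are made up for the illustration.

```shell
# Start from a clean state so the comparison is meaningful.
unset GOMEMLIMIT

# Prefix form: the variable is exported to this one child process only.
# `sh -c` stands in for ./icon_download here (an assumption for illustration).
in_child=$(GOMEMLIMIT=12GiB sh -c 'printf %s "${GOMEMLIMIT:-unset}"')

# The calling shell itself is left untouched.
in_parent=${GOMEMLIMIT:-unset}

printf 'child=%s parent=%s\n' "$in_child" "$in_parent"
```

If several stages should share the same limit, `export GOMEMLIMIT=12GiB` at the top of the session would do that instead; the diff itself only applies it to Stage 3.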