deploy frontend from the ec2 at the end of the pipeline
parent 8d62832c1d
commit 8e3907505f
4 changed files with 60 additions and 10 deletions

@@ -11,10 +11,9 @@ Between stages, run the sanity checks to confirm data looks right before proceed
 export DATABASE_URL='postgres://everytab@<i3-private-ip>:5432/everytab'
 
 # Go binaries built on EC2
-cd ~/everytab
-go build -o ~/warc_parse ./pipeline/02_warc_parse/
-go build -o ~/icon_download ./pipeline/03_icon_download/
-go build -o ~/bundle_gen ./pipeline/05_bundle_gen/
+go build -o ~/warc_parse ./everytab/pipeline/02_warc_parse/
+go build -o ~/icon_download ./everytab/pipeline/03_icon_download/
+go build -o ~/bundle_gen ./everytab/pipeline/05_bundle_gen/
 ```
 
 ## Stage 1: CC-Index Query
 
@@ -22,7 +21,7 @@ go build -o ~/bundle_gen ./pipeline/05_bundle_gen/
 Populates the `hosts` table from Common Crawl's columnar index.
 
 ```bash
-./pipeline/01_cc_index/query.sh --db-url "$DATABASE_URL" --limit 100000
+./everytab/pipeline/01_cc_index/query.sh --db-url "$DATABASE_URL" --limit 100000
 # Full run: --limit 0
 ```
 
@@ -47,7 +46,7 @@ GOMEMLIMIT=12GiB ./icon_download --db "$DATABASE_URL" --log-file icon_download.l
 Picks the best icon per host for display.
 
 ```bash
-psql $DATABASE_URL -f pipeline/04_best_icon/select.sql
+psql $DATABASE_URL -f ./everytab/pipeline/04_best_icon/select.sql
 ```
 
 ## Stage 5: Bundle Generation
 
@@ -62,12 +61,20 @@ Note the `TOTAL_BUNDLES` number from the summary — this gets baked into the fr
 
 ## Stage 6: Frontend Deploy
 
-From your local machine:
+From EC2, after bundle gen completes:
 
 ```bash
-./pipeline/06_frontend/deploy.sh --total-bundles <NUMBER>
+TOTAL_BUNDLES=$(jq -r '.bundles_created' stats/05_bundle_gen.json)
+./everytab/pipeline/06_frontend/deploy.sh --total-bundles "$TOTAL_BUNDLES"
 ```
 
+The deploy script:
+1. Injects TOTAL_BUNDLES into index.html
+2. Minifies site.js (via esbuild, strips comments + whitespace)
+3. Uploads frontend files to S3
+4. Deletes stale bundles from previous runs (numbers ≥ TOTAL_BUNDLES)
+5. Invalidates CloudFront cache
+
 ## Stage 7: Backup to Homelab
 
 After the site is deployed and verified, backup data before tearing down scanning infra.
@@ -76,7 +83,7 @@ After the site is deployed and verified, backup data before tearing down scannin
 
 | Data | Location on EC2 | Size estimate (30M) | Purpose |
 |------|----------------|---------------------|---------|
-| Database | RDS (pg_dump) | ~5-10GB compressed | Full hosts + icons metadata, titles, WARC coordinates |
+| Database | pg_dump from i3 instance | ~5-10GB compressed | Full hosts + icons metadata, titles, WARC coordinates |
 | Icons | `~/icons/` directory | ~500GB-1TB | Complete favicon archive, content-addressed by SHA-256 |
 | Stats | `~/stats/*.json` | <1MB | Pipeline timing and counts per stage |
 | Logs | `~/*.log` | varies | Error logs for debugging |
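
Step 4 of the deploy script described in the Stage 6 hunk treats any bundle numbered at or above `TOTAL_BUNDLES` as stale. The diff does not show how `deploy.sh` implements this, so the following is only a minimal sketch of that selection rule. The function name `stale_bundles` and its argument layout are hypothetical, and it assumes bundle numbers are plain non-negative integers starting at 0.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT taken from deploy.sh: prints the bundle numbers
# that count as stale under the ">= TOTAL_BUNDLES" rule.
# Usage: stale_bundles TOTAL_BUNDLES NUMBER...
stale_bundles() {
  local total="$1"
  shift
  # Refuse a missing or non-numeric count (for example the string "null"
  # that jq -r prints for an absent key) instead of flagging everything.
  if ! [[ "$total" =~ ^[0-9]+$ ]]; then
    echo "stale_bundles: total must be a non-negative integer" >&2
    return 1
  fi
  local n
  for n in "$@"; do
    # Skip anything that is not a plain bundle number.
    [[ "$n" =~ ^[0-9]+$ ]] || continue
    # 10# forces base 10 so leading zeros are not read as octal.
    if (( 10#$n >= 10#$total )); then
      echo "$n"
    fi
  done
}
```

With `TOTAL_BUNDLES=3`, calling `stale_bundles "$TOTAL_BUNDLES" 0 1 2 3 4` would print `3` and `4`, one per line. Rejecting a non-numeric total up front is a deliberate choice here: an empty or `null` value would otherwise risk marking every existing bundle as stale.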