Commit graph

104 commits

SHA1 Message Date
8e3907505f deploy frontend from the ec2 at the end of the pipeline 2026-05-25 23:21:50 -04:00
8d62832c1d tmux and htop on the db for performance monitoring 2026-05-25 23:09:40 -04:00
3ea88790b5 frontend now looks like firefox and chrome tabs on linux 2026-05-25 22:53:11 -04:00
7c4572aafb EVEN MORE CONCURRENCYYYY 2026-05-25 22:00:53 -04:00
6c64ffcf94 upped concurrent s3 requests to speed up cc-index download 2026-05-25 21:53:57 -04:00
4bfe165fac update infra README for cloud init 2026-05-25 21:43:31 -04:00
a92c838d23 fix ec2 provisioning for cloud init 2026-05-25 21:43:12 -04:00
7c5573c24d add filter to squash warning 2026-05-25 21:40:01 -04:00
33bd0a221e updated s3_key name to icon_hash 2026-05-25 21:05:26 -04:00
e308718eb2 remove icon s3 bucket, add log retention policy, make logs private explicitly 2026-05-25 20:57:11 -04:00
8c005c4f6c two phase best icon selection with a temporary table 2026-05-25 20:55:29 -04:00
a819dabb57 updated number of async writers to warc_parse to accommodate faster db nvme write speeds 2026-05-25 19:41:25 -04:00
bfb7d8f883 updated gitignore 2026-05-25 19:40:42 -04:00
1afbc41599 automated ec2 setup and build 2026-05-25 18:29:37 -04:00
bf8b932cdc switched from rds to i5 ec2 for nvme disk read/write speeds 2026-05-25 18:17:07 -04:00
c93d1736fe tune unbound to take up less memory for our use case 2026-05-25 17:30:29 -04:00
cb8d23842c better gated firefox specific code 2026-05-25 17:18:28 -04:00
8ceb31bcbb increase bundlegen producer host amount to ensure workers aren't starved 2026-05-25 16:21:49 -04:00
4c7a0f54f7 disable keepalives so connections stop after data transfer complete 2026-05-25 16:20:31 -04:00
ca90b7071e optimize db for bulk insert by turning off indexes and vacuum 2026-05-25 14:16:40 -04:00
eec486880a about everytab tab bolded title 2026-05-21 01:02:08 -04:00
0f0acb642f fixed firefox marquee rollover flicker 2026-05-21 00:56:50 -04:00
fe3d5f7039 speed is fixed, no longer dependent on viewport width 2026-05-21 00:44:00 -04:00
b53fd7844b smoother marquee and fixed icon jitter/stutter in firefox 2026-05-21 00:38:33 -04:00
4fa40c7b47 improved write efficiency, though we are still bottlenecking on RDS - will switch to local postgres for future runs 2026-05-20 22:38:23 -04:00
baf657a8ed updated PLAN.md and ARCHITECTURE.md with new instance type and performance concerns 2026-05-20 13:17:03 -04:00
b419b5bf6c updated plan.md after 3M test 2026-05-20 13:14:06 -04:00
8dce702e8d upped buffer sizes and switched to 2xlarge to increase speed 2026-05-20 12:59:12 -04:00
1df9a234cf updated pipeline README to use compression and new flow 2026-05-20 11:54:48 -04:00
6352b9253f upped swap to 8G 2026-05-20 11:54:17 -04:00
024e0513ba upped icon downloading concurrency 2026-05-20 11:00:17 -04:00
91f48f249a 1T for ec2 hd 2026-05-20 10:19:02 -04:00
ead6366ed0 up ulimit for more connection 2026-05-20 10:18:48 -04:00
6d8ba61102 update warc parsing with new 3 stage producer, worker, consumer model, increasing speed and saturating cores 2026-05-20 10:18:15 -04:00
0efec72e45 print every 100 bundles 2026-05-20 10:17:35 -04:00
426abe1c90 upped concurrency of icon downloading 2026-05-20 09:47:18 -04:00
3bc355e503 improved bundle cli output with progress 2026-05-20 09:46:59 -04:00
86cff37533 download cc-index to home not tmp (which is tmpfs) 2026-05-20 09:35:06 -04:00
9308b5e039 download cc-index first with aws cli instead of streaming it 2026-05-20 08:14:22 -04:00
564919c5cc added downloaded_at timestamp to icon table 2026-05-20 01:35:13 -04:00
ec33b2e857 bump up s3 warc retries to 6 to avoid 503 errors 2026-05-20 01:30:46 -04:00
081866f62e update bundle gen to use channels and goroutines to saturate disk and not block on db access + bundle coalescing and uploading 2026-05-20 01:28:52 -04:00
902928235c updated best icon selection logic 2026-05-20 01:15:08 -04:00
03e343a136 cap number of favicons to 50 per host 2026-05-20 00:53:24 -04:00
cd896427eb shuffle icon link batches before putting them in the channel 2026-05-20 00:50:40 -04:00
27203ff085 updated bot rate 2026-05-20 00:50:17 -04:00
963d9209ca cleaner dns error handling 2026-05-20 00:35:55 -04:00
c9ea462e97 check all CSP headers for iframe disallowing 2026-05-20 00:32:56 -04:00
a8177a1583 improve stats generation 2026-05-20 00:31:38 -04:00
0c9ad5bfd6 count iframes only if there isn't an error 2026-05-20 00:29:28 -04:00