| a819dabb57 | updated number of async writers to warc_parse to accommodate faster db nvme write speeds | 2026-05-25 19:41:25 -04:00 |
| ca90b7071e | optimize db for bulk insert by turning off indexes and vacuum | 2026-05-25 14:16:40 -04:00 |
| 4fa40c7b47 | improved write efficiency, though we are still bottlenecking on RDS - will switch to local postgres for future runs | 2026-05-20 22:38:23 -04:00 |
| 8dce702e8d | upped buffer sizes and switched to 2xlarge to increase speed | 2026-05-20 12:59:12 -04:00 |
| 6d8ba61102 | update warc parsing with new 3 stage producer, worker, consumer model, increasing speed and saturating cores | 2026-05-20 10:18:15 -04:00 |
| ec33b2e857 | bump up s3 warc retries to 6 to avoid 503 errors | 2026-05-20 01:30:46 -04:00 |
| 03e343a136 | cap number of favicons to 50 per host | 2026-05-20 00:53:24 -04:00 |
| c9ea462e97 | check all CSP headers for iframe disallowing | 2026-05-20 00:32:56 -04:00 |
| a8177a1583 | improve stats generation | 2026-05-20 00:31:38 -04:00 |
| 0c9ad5bfd6 | count iframes only if there isn't an error | 2026-05-20 00:29:28 -04:00 |
| f45e4a6034 | added warc parser | 2026-05-17 20:25:59 -04:00 |