|
|
86cff37533
|
download cc-index to home not tmp (which is tmpfs)
|
2026-05-20 09:35:06 -04:00 |
|
|
|
9308b5e039
|
download cc-index first with aws cli instead of streaming it
|
2026-05-20 08:14:22 -04:00 |
|
|
|
564919c5cc
|
added downloaded_at timestamp to icon table
|
2026-05-20 01:35:13 -04:00 |
|
|
|
ec33b2e857
|
bump up s3 warc retries to 6 to avoid 503 errors
|
2026-05-20 01:30:46 -04:00 |
|
|
|
081866f62e
|
update bundle gen to use channels and goroutines to saturate disk and not block on db access + bundle coalescing and uploading
|
2026-05-20 01:28:52 -04:00 |
|
|
|
902928235c
|
updated best icon selection logic
|
2026-05-20 01:15:08 -04:00 |
|
|
|
03e343a136
|
cap number of favicons to 50 per host
|
2026-05-20 00:53:24 -04:00 |
|
|
|
cd896427eb
|
shuffle icon link batches before putting them in the channel
|
2026-05-20 00:50:40 -04:00 |
|
|
|
963d9209ca
|
cleaner dns error handling
|
2026-05-20 00:35:55 -04:00 |
|
|
|
c9ea462e97
|
check all CSP headers for iframe disallowing
|
2026-05-20 00:32:56 -04:00 |
|
|
|
a8177a1583
|
improve stats generation
|
2026-05-20 00:31:38 -04:00 |
|
|
|
0c9ad5bfd6
|
count iframes only if there isn't an error
|
2026-05-20 00:29:28 -04:00 |
|
|
|
56ae26cbef
|
added bmp decoder to bundler
|
2026-05-20 00:11:53 -04:00 |
|
|
|
7d24b406aa
|
redundant min
|
2026-05-20 00:10:04 -04:00 |
|
|
|
eb40995c60
|
just overwrite bundles, don't delete then re-add
|
2026-05-20 00:09:53 -04:00 |
|
|
|
2f1547a912
|
switched bundle host field to url to retain http
|
2026-05-19 23:38:14 -04:00 |
|
|
|
7f36e99443
|
updated random value to double precision float
|
2026-05-19 23:37:50 -04:00 |
|
|
|
a28cd2b056
|
updated pipeline README
|
2026-05-19 13:06:48 -04:00 |
|
|
|
3534f84b27
|
added about.html
|
2026-05-19 11:42:09 -04:00 |
|
|
|
1d5b7bd374
|
added random_order to host table schema
|
2026-05-19 10:47:05 -04:00 |
|
|
|
e6d5d5175c
|
fixed oom in bundle_gen and added randomOrder, still need a full redesign
|
2026-05-19 10:46:40 -04:00 |
|
|
|
cf17fc42b1
|
fixed icon downloading performance issues
|
2026-05-19 10:32:34 -04:00 |
|
|
|
5b3f6a6870
|
switched from s3 to disk for saving icons
|
2026-05-18 12:43:50 -04:00 |
|
|
|
ddeb8bc504
|
fix TOTAL_BUNDLES sed command in deploy script
|
2026-05-18 01:00:09 -04:00 |
|
|
|
21f2a75ed3
|
delete old tab bundles before making new ones
|
2026-05-18 00:49:50 -04:00 |
|
|
|
a977a8c0b3
|
added initial pipeline README
|
2026-05-18 00:40:57 -04:00 |
|
|
|
921f72d2aa
|
added deploy script
|
2026-05-18 00:40:27 -04:00 |
|
|
|
4963866427
|
updated scanning useragent
|
2026-05-18 00:26:13 -04:00 |
|
|
|
f89883e745
|
added bundle generation
|
2026-05-17 23:02:34 -04:00 |
|
|
|
ca06a91dc6
|
don't allow 1 pixel favicons
|
2026-05-17 23:01:53 -04:00 |
|
|
|
b94427f200
|
don't use svg icons, they aren't supported in conversion
|
2026-05-17 22:34:27 -04:00 |
|
|
|
664197e287
|
added select.sql query
|
2026-05-17 22:22:44 -04:00 |
|
|
|
5a2e37ae06
|
added icon downloader
|
2026-05-17 22:09:03 -04:00 |
|
|
|
f45e4a6034
|
added warc parser
|
2026-05-17 20:25:59 -04:00 |
|
|
|
db81015e0b
|
added query.sh to read the cc-index from s3 parquet files and dump it into our psql db
|
2026-05-17 19:12:25 -04:00 |
|
|
|
fcf203e1d8
|
added infra setup with terraform
|
2026-05-17 16:07:50 -04:00 |
|