diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
index 52076b2..cdb0038 100644
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -273,19 +273,22 @@ WHERE url_path = '/'
 
 **Selection priority (decision flow):**
 
-1. Standard square sizes (32x32, 64x64, 48x48, 16x16) — ideal for tab display. Prefer larger.
-2. Other square sizes ≤64px — close enough. Prefer larger.
-3. Non-square but both dimensions ≤64px — acceptable. Prefer larger.
-4. Everything else (180x180, 192x192, SVG with no dimensions, etc.) — last resort, will be downscaled in bundle generation.
+Target: 32x32 source icon. The frontend displays favicons at 16x16 CSS pixels, which is 32x32 physical pixels on 2x Retina screens. So 32x32 is the ideal source resolution — crisp on Retina without wasting bundle space.
 
-Within the same tier: prefer PNG/GIF/ICO over WebP over SVG, then smaller file size as tiebreaker.
+1. **Icons ≥32px** (preferred): smallest first, so closest to 32 wins. A 32x32 beats a 48x48 beats a 180x180.
+2. **Icons <32px** (fallback): largest first. A 16x16 beats an 8x8.
+3. **Unknown dimensions** (NULL width/height): last resort.
+
+Within the same size tier:
+- Prefer PNG > ICO > GIF/JPEG/BMP > WebP
+- Tiebreaker: smaller file size
+
+SVGs excluded (can't rasterize without external deps). Icons ≤2x2 excluded (tracking pixels).
 
 Does not distinguish between `favicon_ico` and `link_rel` sources — purely based on what was actually downloaded and its dimensions/format.
 
 Uses `DISTINCT ON (host_id)` for efficient single-pass selection. See `pipeline/04_best_icon/select.sql`.
 
-**Note on SVG/WebP:** Lower priority because rasterizing SVG adds complexity and WebP-to-PNG re-encoding may increase size. Only selected when no raster alternatives exist.
-
 **Stats emitted:** Hosts with icons selected, hosts without any icon.
 
 ### Stage 5: Bundle Generation
diff --git a/pipeline/04_best_icon/select.sql b/pipeline/04_best_icon/select.sql
index 27ae6b1..296c319 100644
--- a/pipeline/04_best_icon/select.sql
+++ b/pipeline/04_best_icon/select.sql
@@ -1,13 +1,14 @@
 -- Best Icon Selection
 -- Picks the best completed icon for each host and stores its s3_key in hosts.best_icon_s3_key.
 --
+-- Target: 32x32 source icon (displayed at 16x16 CSS, crisp on 2x Retina).
+--
 -- Priority:
--- 1. Standard square sizes (64 > 48 > 32 > 16) — ideal for tab display
--- 2. Other square sizes ≤64
--- 3. Non-square sizes ≤64 on both axes
--- 4. Anything larger (downloaded because rel_sizes was undeclared)
--- 5. Among equal priority: prefer PNG/GIF/ICO over WebP (SVGs excluded — not supported in bundle generation)
--- 6. Tiebreaker: smaller file size (less bandwidth in bundles)
+-- 1. Icons ≥32px: prefer smallest first (closest to 32 — a 32x32 beats a 48x48 beats a 180x180)
+-- 2. Icons <32px: prefer largest first (16x16 beats 8x8)
+-- 3. Within same size: prefer PNG > ICO > GIF/JPEG/BMP > WebP
+-- 4. Tiebreaker: smaller file size
+-- SVGs excluded (not supported in bundle generation). Icons ≤2x2 excluded (tracking pixels).
 --
 -- Usage: psql $DATABASE_URL -f pipeline/04_best_icon/select.sql
 
@@ -17,22 +18,30 @@ FROM (
     FROM icons i
     WHERE i.scan_state = 'completed'
       AND i.s3_key IS NOT NULL
-      AND i.content_type NOT IN ('image/svg+xml')
+      AND i.content_type != 'image/svg+xml'
       AND (i.width IS NULL OR i.width > 2)
       AND (i.height IS NULL OR i.height > 2)
     ORDER BY i.host_id,
+        -- Tier: ≥32 preferred over <32. NULL dimensions go last.
         CASE
-            WHEN i.width = i.height AND i.width IN (64, 48, 32, 16) THEN 0
-            WHEN i.width = i.height AND i.width <= 64 THEN 1
-            WHEN i.width IS NOT NULL AND i.width <= 64 AND i.height IS NOT NULL AND i.height <= 64 THEN 2
-            ELSE 3
-        END,
-        COALESCE(i.width, 0) DESC,
-        CASE
-            WHEN i.content_type IN ('image/png', 'image/gif', 'image/x-icon', 'image/vnd.microsoft.icon') THEN 0
-            WHEN i.content_type = 'image/webp' THEN 1
+            WHEN LEAST(COALESCE(i.width, 0), COALESCE(i.height, 0)) >= 32 THEN 0
+            WHEN COALESCE(i.width, 0) > 0 THEN 1
             ELSE 2
         END,
+        -- Within ≥32: smallest first (closest to 32). Within <32: largest first.
+        CASE
+            WHEN LEAST(COALESCE(i.width, 0), COALESCE(i.height, 0)) >= 32
+                THEN GREATEST(COALESCE(i.width, 0), COALESCE(i.height, 0))
+            ELSE -GREATEST(COALESCE(i.width, 0), COALESCE(i.height, 0))
+        END,
+        -- Format preference
+        CASE
+            WHEN i.content_type = 'image/png' THEN 0
+            WHEN i.content_type IN ('image/x-icon', 'image/vnd.microsoft.icon') THEN 1
+            WHEN i.content_type IN ('image/gif', 'image/jpeg', 'image/bmp') THEN 2
+            WHEN i.content_type = 'image/webp' THEN 3
+            ELSE 4
+        END,
         i.file_size ASC
 ) sub
 WHERE h.id = sub.host_id;
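
To sanity-check the new ordering outside the database, the `ORDER BY` in `select.sql` can be mirrored as a Python sort key. This is only a sketch for reasoning about the tiers, not part of the pipeline: the `Icon` tuple and the `sort_key`, `eligible`, and `best_icon` helpers are hypothetical names, and it assumes `file_size` is always a non-NULL integer.

```python
from typing import Iterable, NamedTuple, Optional


class Icon(NamedTuple):
    # Hypothetical stand-in for the columns of the icons table used by select.sql.
    width: Optional[int]
    height: Optional[int]
    content_type: str
    file_size: int


# Format preference from the diff: PNG > ICO > GIF/JPEG/BMP > WebP > anything else.
FORMAT_RANK = {
    "image/png": 0,
    "image/x-icon": 1,
    "image/vnd.microsoft.icon": 1,
    "image/gif": 2,
    "image/jpeg": 2,
    "image/bmp": 2,
    "image/webp": 3,
}


def eligible(icon: Icon) -> bool:
    """Mirror of the WHERE filters: no SVGs, no icons of 2x2 or smaller."""
    return (
        icon.content_type != "image/svg+xml"
        and (icon.width is None or icon.width > 2)
        and (icon.height is None or icon.height > 2)
    )


def sort_key(icon: Icon) -> tuple:
    """Mirror of the ORDER BY expressions; a lower tuple sorts first."""
    w = icon.width or 0   # COALESCE(i.width, 0)
    h = icon.height or 0  # COALESCE(i.height, 0)
    if min(w, h) >= 32:
        tier, size = 0, max(w, h)    # >=32: smallest first
    elif w > 0:
        tier, size = 1, -max(w, h)   # <32: largest first
    else:
        tier, size = 2, -max(w, h)   # unknown width: last resort
    return (tier, size, FORMAT_RANK.get(icon.content_type, 4), icon.file_size)


def best_icon(icons: Iterable[Icon]) -> Optional[Icon]:
    """Pick the icon that DISTINCT ON (host_id) would keep for a single host."""
    candidates = [icon for icon in icons if eligible(icon)]
    return min(candidates, key=sort_key) if candidates else None
```

One consequence worth noting when reviewing: size is compared before format, so a 32x32 WebP is chosen over a 48x48 PNG. Format only breaks ties between icons whose larger dimension is equal.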