updated best icon selection logic

Joe Lothan 2026-05-20 01:15:08 -04:00
parent 03e343a136
commit 902928235c
2 changed files with 35 additions and 23 deletions

@@ -273,19 +273,22 @@ WHERE url_path = '/'
 **Selection priority (decision flow):**
-1. Standard square sizes (32x32, 64x64, 48x48, 16x16) — ideal for tab display. Prefer larger.
-2. Other square sizes ≤64px — close enough. Prefer larger.
-3. Non-square but both dimensions ≤64px — acceptable. Prefer larger.
-4. Everything else (180x180, 192x192, SVG with no dimensions, etc.) — last resort, will be downscaled in bundle generation.
-Within the same tier: prefer PNG/GIF/ICO over WebP over SVG, then smaller file size as tiebreaker.
+Target: 32x32 source icon. The frontend displays favicons at 16x16 CSS pixels, which is 32x32 physical pixels on 2x Retina screens. So 32x32 is the ideal source resolution — crisp on Retina without wasting bundle space.
+1. **Icons ≥32px** (preferred): smallest first, so closest to 32 wins. A 32x32 beats a 48x48 beats a 180x180.
+2. **Icons <32px** (fallback): largest first. A 16x16 beats an 8x8.
+3. **Unknown dimensions** (NULL width/height): last resort.
+Within the same size tier:
+- Prefer PNG > ICO > GIF/JPEG/BMP > WebP
+- Tiebreaker: smaller file size
+SVGs excluded (can't rasterize without external deps). Icons ≤2x2 excluded (tracking pixels).
 Does not distinguish between `favicon_ico` and `link_rel` sources — purely based on what was actually downloaded and its dimensions/format.
 Uses `DISTINCT ON (host_id)` for efficient single-pass selection. See `pipeline/04_best_icon/select.sql`.
-**Note on SVG/WebP:** Lower priority because rasterizing SVG adds complexity and WebP-to-PNG re-encoding may increase size. Only selected when no raster alternatives exist.
 **Stats emitted:** Hosts with icons selected, hosts without any icon.
 ### Stage 5: Bundle Generation
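The new selection priority described in this hunk can be read as a single sort key. The following is a minimal Python sketch of that reading, not the pipeline's actual code: the names `Icon`, `FORMAT_RANK`, `is_eligible`, `sort_key`, and `pick_best` are hypothetical. It assumes that an icon missing either dimension counts as "unknown dimensions", and that an icon is dropped as a tracking pixel when either known dimension is 2px or less.

```python
from typing import NamedTuple, Optional

class Icon(NamedTuple):
    width: Optional[int]   # None models a NULL width
    height: Optional[int]  # None models a NULL height
    content_type: str
    file_size: int

# Lower rank means a more preferred format; unlisted types sort last.
FORMAT_RANK = {
    "image/png": 0,
    "image/x-icon": 1,
    "image/vnd.microsoft.icon": 1,
    "image/gif": 2,
    "image/jpeg": 2,
    "image/bmp": 2,
    "image/webp": 3,
}
TARGET = 32  # ideal source size in pixels

def is_eligible(icon: Icon) -> bool:
    """Drop SVGs and tiny images; icons with unknown dimensions stay in."""
    if icon.content_type == "image/svg+xml":
        return False
    if icon.width is not None and icon.width <= 2:
        return False
    if icon.height is not None and icon.height <= 2:
        return False
    return True

def sort_key(icon: Icon) -> tuple:
    """Smaller tuples are better: (tier, size order, format rank, file size)."""
    if icon.width is None or icon.height is None:
        tier, size = 2, 0                                # unknown: last resort
    elif min(icon.width, icon.height) >= TARGET:
        tier, size = 0, max(icon.width, icon.height)     # smallest first
    else:
        tier, size = 1, -max(icon.width, icon.height)    # largest first
    return (tier, size, FORMAT_RANK.get(icon.content_type, 4), icon.file_size)

def pick_best(icons: list) -> Optional[Icon]:
    """Return the preferred icon, or None when nothing is eligible."""
    candidates = [icon for icon in icons if is_eligible(icon)]
    return min(candidates, key=sort_key) if candidates else None
```

Because size is compared before format in this key, a 32x32 ICO is chosen over a 48x48 PNG; format only breaks ties between icons of the same size.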

@@ -1,13 +1,14 @@
 -- Best Icon Selection
 -- Picks the best completed icon for each host and stores its s3_key in hosts.best_icon_s3_key.
 --
+-- Target: 32x32 source icon (displayed at 16x16 CSS, crisp on 2x Retina).
+--
 -- Priority:
--- 1. Standard square sizes (64 > 48 > 32 > 16) — ideal for tab display
--- 2. Other square sizes ≤64
--- 3. Non-square sizes ≤64 on both axes
--- 4. Anything larger (downloaded because rel_sizes was undeclared)
--- 5. Among equal priority: prefer PNG/GIF/ICO over WebP (SVGs excluded — not supported in bundle generation)
--- 6. Tiebreaker: smaller file size (less bandwidth in bundles)
+-- 1. Icons ≥32px: prefer smallest first (closest to 32 — a 32x32 beats a 48x48 beats a 180x180)
+-- 2. Icons <32px: prefer largest first (16x16 beats 8x8)
+-- 3. Within same size: prefer PNG > ICO > GIF/JPEG/BMP > WebP
+-- 4. Tiebreaker: smaller file size
+-- SVGs excluded (not supported in bundle generation). Icons ≤2x2 excluded (tracking pixels).
 --
 -- Usage: psql $DATABASE_URL -f pipeline/04_best_icon/select.sql
@@ -17,22 +18,30 @@ FROM (
     FROM icons i
     WHERE i.scan_state = 'completed'
       AND i.s3_key IS NOT NULL
-      AND i.content_type NOT IN ('image/svg+xml')
+      AND i.content_type != 'image/svg+xml'
       AND (i.width IS NULL OR i.width > 2)
       AND (i.height IS NULL OR i.height > 2)
     ORDER BY i.host_id,
+        -- Tier: ≥32 preferred over <32. NULL dimensions go last.
         CASE
-            WHEN i.width = i.height AND i.width IN (64, 48, 32, 16) THEN 0
-            WHEN i.width = i.height AND i.width <= 64 THEN 1
-            WHEN i.width IS NOT NULL AND i.width <= 64 AND i.height IS NOT NULL AND i.height <= 64 THEN 2
-            ELSE 3
-        END,
-        COALESCE(i.width, 0) DESC,
-        CASE
-            WHEN i.content_type IN ('image/png', 'image/gif', 'image/x-icon', 'image/vnd.microsoft.icon') THEN 0
-            WHEN i.content_type = 'image/webp' THEN 1
+            WHEN LEAST(COALESCE(i.width, 0), COALESCE(i.height, 0)) >= 32 THEN 0
+            WHEN COALESCE(i.width, 0) > 0 THEN 1
             ELSE 2
         END,
+        -- Within ≥32: smallest first (closest to 32). Within <32: largest first.
+        CASE
+            WHEN LEAST(COALESCE(i.width, 0), COALESCE(i.height, 0)) >= 32
+                THEN GREATEST(COALESCE(i.width, 0), COALESCE(i.height, 0))
+            ELSE -GREATEST(COALESCE(i.width, 0), COALESCE(i.height, 0))
+        END,
+        -- Format preference
+        CASE
+            WHEN i.content_type = 'image/png' THEN 0
+            WHEN i.content_type IN ('image/x-icon', 'image/vnd.microsoft.icon') THEN 1
+            WHEN i.content_type IN ('image/gif', 'image/jpeg', 'image/bmp') THEN 2
+            WHEN i.content_type = 'image/webp' THEN 3
+            ELSE 4
+        END,
         i.file_size ASC
 ) sub
 WHERE h.id = sub.host_id;
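For readers less familiar with `DISTINCT ON (host_id)`: PostgreSQL sorts the rows by the `ORDER BY` list and keeps only the first row of each `host_id` group. A rough Python equivalent is sketched below, assuming each row has already been reduced to a hypothetical `(host_id, rank, s3_key)` tuple, where `rank` stands in for the remaining `ORDER BY` expressions and lower is better. The function name `best_key_per_host` is illustrative only.

```python
from itertools import groupby
from operator import itemgetter

def best_key_per_host(rows):
    """Emulate DISTINCT ON (host_id) ... ORDER BY host_id, rank.

    rows: iterable of (host_id, rank, s3_key) tuples; lower rank wins.
    Returns a dict mapping each host_id to the s3_key of its best row.
    """
    # Sort by host first, then by rank, as the ORDER BY clause does.
    ordered = sorted(rows, key=itemgetter(0, 1))
    # Keep the first row of every host group, dropping the rest.
    return {
        host_id: next(group)[2]
        for host_id, group in groupby(ordered, key=itemgetter(0))
    }
```

Hosts with no eligible rows never appear in the result, which corresponds to the "hosts without any icon" count in the emitted stats.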