rewrote icon selection in english rather than sql
parent 5a2e37ae06
commit 6cf6049698
1 changed file with 15 additions and 30 deletions
@@ -276,39 +276,24 @@ WHERE url_path = '/'
 **Tool:** SQL script
 
-**Process:** For each host, select the best icon from its completed downloads:
+**Process:** For each host, select the best icon from all its completed downloads.
 
-```sql
-UPDATE hosts h SET best_icon_s3_key = (
-    SELECT i.s3_key FROM icons i
-    WHERE i.host_id = h.id
-      AND i.scan_state = 'completed'
-    ORDER BY
-        -- Prefer standard square sizes
-        CASE
-            WHEN i.width = i.height AND i.width IN (64, 48, 32, 16) THEN 0
-            WHEN i.width = i.height AND i.width <= 64 THEN 1
-            WHEN i.width <= 64 AND i.height <= 64 THEN 2
-            ELSE 3
-        END,
-        -- Among valid options, prefer larger
-        i.width DESC,
-        -- Prefer PNG/GIF/ICO over SVG/WebP for simpler processing
-        CASE
-            WHEN i.content_type IN ('image/png', 'image/gif', 'image/x-icon', 'image/vnd.microsoft.icon') THEN 0
-            WHEN i.content_type IN ('image/webp') THEN 1
-            WHEN i.content_type IN ('image/svg+xml') THEN 2
-            ELSE 3
-        END,
-        -- Smaller file size as tiebreaker
-        i.file_size ASC
-    LIMIT 1
-);
-```
+**Selection priority (decision flow):**
 
-**Note on SVG/WebP:** These are downloaded and stored during scanning but are lower priority for bundle selection. Rasterizing SVG to PNG adds complexity; WebP re-encoding to PNG may increase size. If a host ONLY has SVG/WebP icons, we still use them (convert in bundle generation). But if PNG/GIF/ICO alternatives exist, prefer those.
+1. Standard square sizes (32x32, 64x64, 48x48, 16x16) — ideal for tab display. Prefer larger.
+2. Other square sizes ≤64px — close enough. Prefer larger.
+3. Non-square but both dimensions ≤64px — acceptable. Prefer larger.
+4. Everything else (180x180, 192x192, SVG with no dimensions, etc.) — last resort, will be downscaled in bundle generation.
 
-**Stats emitted:** Hosts with icons selected, hosts without any icon, icon size distribution, format distribution of selected icons.
+Within the same tier: prefer PNG/GIF/ICO over WebP over SVG, then smaller file size as tiebreaker.
+
+Does not distinguish between `favicon_ico` and `link_rel` sources — purely based on what was actually downloaded and its dimensions/format.
+
+Uses `DISTINCT ON (host_id)` for efficient single-pass selection. See `pipeline/04_best_icon/select.sql`.
+
+**Note on SVG/WebP:** Lower priority because rasterizing SVG adds complexity and WebP-to-PNG re-encoding may increase size. Only selected when no raster alternatives exist.
+
+**Stats emitted:** Hosts with icons selected, hosts without any icon.
 
 ### Stage 5: Bundle Generation
 
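The selection priority described in this hunk can be sketched as a sort key. This is a minimal illustration only, not the contents of `pipeline/04_best_icon/select.sql`: the `Icon` record and the function names are hypothetical, the field names mirror the columns in the removed SQL, and the ordering within a tier (larger width first, then format, then file size) is assumed to follow the removed `ORDER BY` clause. Treating a missing width as 0 is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Standard square favicon sizes named in the decision flow.
STANDARD_SIZES = {16, 32, 48, 64}

# Format preference: PNG/GIF/ICO, then WebP, then SVG, then anything else.
FORMAT_RANK = {
    "image/png": 0,
    "image/gif": 0,
    "image/x-icon": 0,
    "image/vnd.microsoft.icon": 0,
    "image/webp": 1,
    "image/svg+xml": 2,
}


@dataclass(frozen=True)
class Icon:
    """Hypothetical record for one completed icon download."""
    s3_key: str
    width: Optional[int]   # None when dimensions are unknown (e.g. some SVGs)
    height: Optional[int]
    content_type: str
    file_size: int


def size_tier(icon: Icon) -> int:
    """Return 0-3 for the four tiers of the decision flow (lower is better)."""
    w, h = icon.width, icon.height
    if w is None or h is None:
        return 3  # no dimensions: last resort
    if w == h and w in STANDARD_SIZES:
        return 0  # standard square size
    if w == h and w <= 64:
        return 1  # other square size up to 64px
    if w <= 64 and h <= 64:
        return 2  # non-square, both dimensions up to 64px
    return 3      # everything else, downscaled later


def sort_key(icon: Icon) -> tuple:
    """Tier, then larger width, then format rank, then smaller file size."""
    return (
        size_tier(icon),
        -(icon.width or 0),  # assumption: unknown width sorts as 0
        FORMAT_RANK.get(icon.content_type, 3),
        icon.file_size,
    )


def select_best_icon(icons: Iterable[Icon]) -> Optional[Icon]:
    """Pick the best icon for one host, or None if it has no downloads."""
    return min(icons, key=sort_key, default=None)
```

Calling `select_best_icon` once per host on that host's completed downloads corresponds to what the single `DISTINCT ON (host_id)` query is described as doing in one pass. If the real query ranks format ahead of size within a tier, only the order of the tuple elements in `sort_key` would change.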