CLIP T2I ↑: Text-to-Image similarity. Higher = better semantic alignment.
CLIP I2I ↑: Image-to-Image similarity (0-1). Higher = closer to ground truth.
HPSv2 ↑: Human Preference Score v2. Higher = better predicted human preference.
MSE ↓: Mean Squared Error. Lower = more pixel-accurate.
Code Length ↓: SVG code size. Lower = more compact.
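
A minimal sketch of how three of the metrics above could be computed, under stated assumptions. The function names are hypothetical. The similarity function assumes the CLIP embeddings were already produced by a separate encoder, and it covers only the cosine step that CLIP T2I and CLIP I2I share. MSE assumes both images are flattened to equal-length pixel sequences on the same scale. Code length is counted in characters here, which is an assumption: the legend does not say whether size is measured in characters, bytes, or tokens. HPSv2 needs its own trained model and is not sketched.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two precomputed embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean of squared per-pixel differences between two flattened images."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("images must be non-empty and the same size")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def svg_code_length(svg_source: str) -> int:
    """SVG size in characters (assumed unit; the legend does not specify one)."""
    return len(svg_source)


if __name__ == "__main__":
    # Toy values for illustration only, not results from any experiment.
    print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
    print(mean_squared_error([0.0, 0.5, 1.0], [0.0, 0.5, 0.0]))
    print(svg_code_length("<svg/>"))
```

The arrows in the legend map directly onto these return values: a larger cosine similarity is better, while smaller MSE and smaller code length are better.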