Hacker Remix

Gemini 2.5: The First LLM That Understands PDF Layouts

16 points by serjester 2 weeks ago | 1 comment

simonw 2 weeks ago

This example uses bounding boxes, but it turns out Gemini 2.5 (both Pro and Flash) takes that a step further and can also return complex-shaped segmentation masks identifying objects: https://simonwillison.net/2025/Apr/18/gemini-image-segmentat...
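
For anyone wanting to draw the boxes, here is a minimal sketch of the coordinate conversion. It assumes each box comes back as [y_min, x_min, y_max, x_max] on a 0-1000 grid, which is the convention Google's Gemini docs describe. The function name `box_to_pixels` and the sample numbers are made up for illustration, so check the format against your own responses.

```python
def box_to_pixels(box_2d, image_width, image_height):
    """Scale a [y_min, x_min, y_max, x_max] box from a 0-1000 grid to pixels.

    Assumption: the model reports coordinates normalized to 0-1000,
    with the y values first. Returns (left, top, right, bottom).
    """
    y_min, x_min, y_max, x_max = box_2d
    left = round(x_min * image_width / 1000)
    top = round(y_min * image_height / 1000)
    right = round(x_max * image_width / 1000)
    bottom = round(y_max * image_height / 1000)
    return left, top, right, bottom


# Hypothetical box on a hypothetical 2000x1000 px page image.
print(box_to_pixels([100, 250, 500, 750], image_width=2000, image_height=1000))
# prints (500, 100, 1500, 500)
```

The (left, top, right, bottom) ordering is chosen because it matches what most image libraries expect for crop and rectangle calls.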