4 points by martinald 16 hours ago | 1 comment
Now that llama.cpp has vision support, I tried running PDFs through it locally (via LM Studio), but the results weren't as good as I'd hoped. One time it insisted it couldn't do "OCR", but then gave me an example of what the data _could_ look like, which turned out to be the actual data.
The other major problem is that some PDFs are actually made up of images, and it got super confused by those as well.
Given this is so new, I'm struggling to find any tools that make this easier.
raymond_goo 16 hours ago
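# Notebook cell (Colab/Jupyter): install the Python packages and poppler, which pdf2image needs.
!pip install pytesseract pdf2image pillow
!apt install poppler-utils
# Uncomment if the tesseract binary is not already installed:
#!apt install tesseract-ocr
from pdf2image import convert_from_path
import pytesseract

# Render every PDF page to an image, then OCR each image.
pages = convert_from_path('k.pdf', dpi=300)
all_text = ""
for page_num, img in enumerate(pages, start=1):
    text = pytesseract.image_to_string(img)
    all_text += f"\n--- Page {page_num} ---\n{text}"
print(all_text)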
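For PDFs that mix real text with scanned pages, a rough sketch of one possible approach is to use the embedded text layer where a page has one and fall back to OCR only where it doesn't. This is not from the thread: pypdf is an extra dependency I'm assuming here, the names choose_page_text and extract_pdf_text are made up for illustration, and the min_chars cutoff is an arbitrary guess you would want to tune.

```python
from typing import Callable, Optional


def choose_page_text(embedded: Optional[str], ocr: Callable[[], str],
                     min_chars: int = 20) -> str:
    """Return the embedded text if it looks like a real text layer, else OCR.

    min_chars is an arbitrary threshold (an assumption, not a standard value).
    The ocr callable is only invoked when the fallback is needed.
    """
    if embedded and len(embedded.strip()) >= min_chars:
        return embedded
    return ocr()


def extract_pdf_text(path: str, dpi: int = 300, min_chars: int = 20) -> str:
    # Third-party imports are local so choose_page_text works without them.
    from pypdf import PdfReader  # assumption: pip install pypdf
    from pdf2image import convert_from_path
    import pytesseract

    reader = PdfReader(path)
    parts = []
    for page_num, page in enumerate(reader.pages, start=1):

        def ocr_page(n: int = page_num) -> str:
            # Rasterise just this page; first_page/last_page are 1-based.
            img = convert_from_path(path, dpi=dpi, first_page=n, last_page=n)[0]
            return pytesseract.image_to_string(img)

        text = choose_page_text(page.extract_text(), ocr_page, min_chars)
        parts.append(f"\n--- Page {page_num} ---\n{text}")
    return "".join(parts)
```

Calling extract_pdf_text('k.pdf') would then give the same page-delimited output as the loop above, but pages that already carry text skip the slower OCR step.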