GitHub - VikParuchuri/surya: OCR, layout analysis, reading order, table recognition in 90+ languages
Surya is a document OCR toolkit that does OCR in 90+ languages, benchmarking favorably against cloud services. It works on a range of documents (see #usage and #benchmarks for more details). [...] A hosted API for all Surya models is available at https://www.datalab.to/. It works with PDFs, images, Word docs, and PowerPoints. [...] I benchmarked OCR against Google Cloud Vision since it has similar language coverage to Surya. [...] This will evaluate Surya and optionally Tesseract on multilingual PDFs from Common Crawl (with synthetic data for missing languages).