Search: [VRAM] - Bob Wang's Shaarli

2 results tagged VRAM

GitHub - VikParuchuri/surya: OCR, layout analysis, reading order, table recognition in 90+ languages

Surya is a document OCR toolkit that does: OCR in 90+ languages that benchmarks favorably vs cloud services. It works on a range of documents (see #usage and #benchmarks for more details). [...] There is a hosted API for all surya models available at https://www.datalab.to/. It works with PDF, images, Word docs, and PowerPoints. [...] I benchmarked OCR against Google Cloud Vision since it has similar language coverage to Surya. [...] This will evaluate surya and optionally tesseract on multilingual PDFs from Common Crawl (with synthetic data for missing languages).

Surya · VikParuchuri · VRAM · Greatbatch, Inc.

October 15, 2024 at 11:32:15 PM GMT+8

https://github.com/VikParuchuri/surya

GitHub - VikParuchuri/marker: Convert PDF to markdown quickly with high accuracy

Marker converts PDF to markdown quickly and accurately. It supports a wide range of documents (optimized for books and scientific papers). [...] Here are some known limitations that are on the roadmap to address: Marker will not convert 100% of equations to LaTeX. [...] marker /path/to/input/folder /path/to/output/folder --workers 10 --max 10 --metadata_file /path/to/metadata.json --min_length 10000 (here --workers is the number of PDFs to convert at once). [...] Then run benchmark.py like this: python benchmark.py data/pdfs data/references report.json --nougat (this will benchmark marker against other text extraction methods).
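The batch command quoted in the excerpt can be assembled programmatically, which helps when the worker count has to be tuned to the available VRAM. The sketch below only builds the argument list. The flag names are copied from the excerpt and may not match the current marker release, and the helper name `build_marker_command` and its defaults are hypothetical, chosen for illustration.

```python
import shlex
from typing import List, Optional


def build_marker_command(
    input_dir: str,
    output_dir: str,
    workers: int = 10,
    max_files: int = 10,
    metadata_file: Optional[str] = None,
    min_length: int = 10000,
) -> List[str]:
    """Build the argv list for the batch command quoted in the excerpt.

    Hypothetical helper: the flags (--workers, --max, --metadata_file,
    --min_length) are taken from the excerpt and are assumed, not verified
    against any particular marker version.
    """
    cmd = [
        "marker",
        input_dir,
        output_dir,
        "--workers", str(workers),  # number of PDFs converted at once
        "--max", str(max_files),
    ]
    if metadata_file is not None:
        cmd += ["--metadata_file", metadata_file]
    cmd += ["--min_length", str(min_length)]
    return cmd


if __name__ == "__main__":
    argv = build_marker_command(
        "/path/to/input/folder",
        "/path/to/output/folder",
        metadata_file="/path/to/metadata.json",
    )
    # Print a shell-safe form of the command for inspection before running it.
    print(shlex.join(argv))
```

Passing the resulting list to `subprocess.run(argv, check=True)` would launch the conversion, assuming the `marker` executable is installed and accepts these flags. Keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.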

blob/master/data/examples/marker/switch_transformers.md · AGPL · Fundación Internacional de Lengua Española · blob/master/data/examples/marker/multicolcnn.md · Google, Inc. · OCRing · Meta · blob/master/data/examples/marker/thinkos.md · Greatbatch, Inc. · benchmark.py · blob/master/data/examples/marker/thinkpython.md · Comisión de Libertades e Informática · google.com/file/d/1ZSeWDo2g1y0BRLT7KnbmytV2bjWARWba/view?usp=sharing · blob/master/data/examples/nougat/thinkpython.md · master/data/examples/nougat/switch_transformers.md · marker/settings.py · RAM · MIN · github.io/tessdoc/Data-Files#data-files-for-version-400-november-29-2016 · blob/master/data/examples/nougat/multicolcnn.md · NUM · PDFs · worker count · IBM · blob/master/data/examples/nougat/thinkos.md · Surya · Ram · RAM Energy Resources, Inc. · PyTorch · VRAM · VikParuchuri · Lang · Ada · Ghostscript · Scotland Yard

June 3, 2024 at 7:36:18 AM GMT+8

https://github.com/VikParuchuri/marker