Extract PDF text in your browser with LiteParse for the web
Browser-based PDF text extraction using LiteParse, a spatial text parsing tool built on PDF.js and Tesseract.js.
A developer builds a browser-based version of LiteParse, an open-source PDF text extraction tool, using PDF.js and Tesseract.js.
A guide to using PDF.js for reading and parsing PDFs and pdf-lib for creating and modifying PDFs in Node.js, with code examples.
Part two of automating fuzz testing for a PDF parser using Nix, focusing on building a corpus of edge-case PDFs.
A developer asks when to use ML for parsing PDF fields that contain typos and receives advice on using Levenshtein distance and human-in-the-loop solutions.