Eugene Yan • 9/4/2020

Mailbag: Parsing Fields from PDFs—When to Use Machine Learning?

A developer asks whether to use machine learning to parse quote numbers from PDFs, where occasional typos cause errors. The article advises that a 99% success rate is already good and suggests matching text with techniques like Levenshtein distance and flagging ambiguous cases for human review, rather than immediately jumping to a full ML solution.
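
To make the suggestion concrete, here is a minimal sketch of matching a parsed quote number against a list of known quote numbers using Levenshtein distance, and flagging anything ambiguous for a human. The function name `match_quote_number`, the `max_distance` threshold, and the sample quote numbers are all hypothetical, chosen only for illustration; the original question does not specify the data format or how known quote numbers are stored.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def match_quote_number(parsed: str, known: list[str], max_distance: int = 1):
    """Return (matched_quote_number, needs_review).

    Assumption: a unique near match within max_distance is safe to accept;
    zero or several near matches are sent to a human instead.
    """
    if parsed in known:
        return parsed, False
    candidates = [k for k in known if levenshtein(parsed, k) <= max_distance]
    if len(candidates) == 1:
        return candidates[0], False
    return None, True  # no match or ambiguous: flag for human review


# Hypothetical known quote numbers, for illustration only.
known_quotes = ["Q-10482", "Q-10492", "Q-20771"]

exact = match_quote_number("Q-20771", known_quotes)      # exact match
typo = match_quote_number("Q-2O771", known_quotes)       # letter O instead of zero
ambiguous = match_quote_number("Q-10472", known_quotes)  # one edit from two quotes
unknown = match_quote_number("X-99999", known_quotes)    # nothing close
```

With these sample values, the typo resolves to a single known quote number, while the ambiguous and unknown cases come back flagged for review. Whether a unique near match should be accepted automatically or also reviewed depends on how costly a wrong match is, so the threshold is best treated as a tunable assumption.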

#Machine Learning #data extraction #ocr