Sunflower
Sunflower is a tool designed to automate the extraction of the main textual content from multiple HTML documents that come from the same source. The user identifies a few key strings that belong to a document's essential content; Sunflower then takes the smallest HTML subtree containing all of them and uses it to extract content from every document in the collection. The article details the tool's GUI, its use in building the National Corpus of Polish, and the author's shift to a Swing widget-based architecture for managing application state.
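The core idea can be sketched in a few dozen lines: find the smallest subtree that contains every user-supplied string, record where it sits, and look up the same position in the other documents. This is a minimal Python illustration, not Sunflower's own code. The `Node`/`TreeBuilder` tree, the helper names `smallest_subtree`, `path_to` and `follow`, and the choice to describe a subtree's location as a path of (tag, same-tag sibling index) pairs are all hypothetical, chosen only for the example.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # mix of Node objects and text strings

    def text(self):
        """Concatenated text content of this subtree."""
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)


class TreeBuilder(HTMLParser):
    """Builds a simple Node tree from HTML, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def smallest_subtree(node, keys):
    """Deepest element whose text contains every key string, or None."""
    if not all(k in node.text() for k in keys):
        return None
    for child in node.children:
        if isinstance(child, Node):
            found = smallest_subtree(child, keys)
            if found is not None:
                return found
    return node


def path_to(node):
    """Hypothetical location format: (tag, index among same-tag siblings) pairs."""
    path = []
    while node.parent is not None:
        siblings = [c for c in node.parent.children
                    if isinstance(c, Node) and c.tag == node.tag]
        path.append((node.tag, siblings.index(node)))
        node = node.parent
    return list(reversed(path))


def follow(root, path):
    """Apply a recorded path to another document's tree; None if it does not fit."""
    node = root
    for tag, index in path:
        matches = [c for c in node.children
                   if isinstance(c, Node) and c.tag == tag]
        if index >= len(matches):
            return None
        node = matches[index]
    return node


sample = """<html><body><div id="nav">Home | About</div>
<div class="post"><h1>First title</h1><p>Body of the first post.</p></div>
<div id="footer">(c) Example</div></body></html>"""

other = """<html><body><div id="nav">Home | About</div>
<div class="post"><h1>Second title</h1><p>Another body.</p></div>
<div id="footer">(c) Example</div></body></html>"""

keys = ["First title", "first post"]        # strings the user picked from the sample
path = path_to(smallest_subtree(parse(sample), keys))
print(path)                                 # [('html', 0), ('body', 0), ('div', 1)]
print(follow(parse(other), path).text())    # Second titleAnother body.
```

A purely positional path like this only works when the documents share a layout, which is the premise of processing documents from a single source; a production tool would probably need to tolerate small structural differences between pages.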