Sunflower
Sunflower is a tool designed to automate the extraction of the main textual content from many HTML documents that come from the same source. The user marks a few key strings in the main content of one document; Sunflower then finds the smallest HTML subtree containing all of them and uses that subtree to extract the content from every document in the collection. The article describes its GUI, its use in building the National Corpus of Polish, and the author's shift to an architecture based on Swing widgets for managing application state.
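The core idea can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not Sunflower's actual code. It uses the standard library's `html.parser` with a deliberately simplified tree builder that does not handle implicitly closed tags such as an unclosed `<p>`. It also records the "location" of the learned subtree as a path of (tag, position among same-tag siblings) steps from the root, and it assumes every document in the collection shares the same page template. All names (`learn_path`, `extract`, `smallest_containing`, and so on) are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (simplified list).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link", "area", "base",
             "col", "embed", "source", "track", "wbr"}

class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # mix of Node objects and text strings

    def text(self):
        # Concatenate all text in this subtree, in document order.
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)

class TreeBuilder(HTMLParser):
    """Hypothetical minimal tree builder; assumes reasonably well-formed HTML."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_startendtag(self, tag, attrs):
        # Self-closing tag such as <br/>: add it without descending into it.
        self.current.children.append(Node(tag, self.current))

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)

def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root

def elements(node):
    return [c for c in node.children if isinstance(c, Node)]

def smallest_containing(node, keys):
    """Return the deepest element whose text contains every key string, or None."""
    if not all(k in node.text() for k in keys):
        return None
    for child in elements(node):
        found = smallest_containing(child, keys)
        if found is not None:
            return found
    return node

def path_to(node):
    """Describe node's position as (tag, n) steps: the n-th child element with that tag."""
    steps = []
    while node.parent is not None:
        same_tag = [c for c in elements(node.parent) if c.tag == node.tag]
        steps.append((node.tag, same_tag.index(node)))
        node = node.parent
    return list(reversed(steps))

def follow(root, path):
    """Apply a learned path to another document's tree; None if it does not fit."""
    node = root
    for tag, n in path:
        matches = [c for c in elements(node) if c.tag == tag]
        if n >= len(matches):
            return None
        node = matches[n]
    return node

def learn_path(sample_html, keys):
    target = smallest_containing(parse(sample_html), keys)
    if target is None:
        raise ValueError("sample document does not contain all key strings")
    return path_to(target)

def extract(html, path):
    node = follow(parse(html), path)
    # Collapse runs of whitespace so the extracted text reads as one paragraph.
    return None if node is None else " ".join(node.text().split())

# Usage: learn the location from one page, then apply it to another page
# generated from the same (hypothetical) template.
TEMPLATE = """<html><body>
<div id="nav"><a href="/">Home</a></div>
<div id="main"><h1>{title}</h1>
<p>{body}</p></div>
<div id="footer">Copyright</div>
</body></html>"""

sample = TEMPLATE.format(title="First story", body="It begins here.")
path = learn_path(sample, ["First story", "begins here"])
other = TEMPLATE.format(title="Second story", body="Different text.")
print(extract(other, path))  # prints "Second story Different text."
```

Learning a structural path once and replaying it keeps extraction cheap for a large collection, but it relies on the pages sharing a layout; a page with a different structure yields `None` (or the wrong subtree) rather than an error.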