Jeroen Reijn • 4/5/2010

Metadata extraction with Apache Tika

This technical article explains how to use Apache Tika, a toolkit from the Apache Software Foundation, to extract metadata and content from a wide range of file formats (PDF, Office documents, images, and others) within a content management system. It covers Tika's purpose and supported formats, and includes a practical Maven dependency example for Java developers working with content repositories such as Apache Jackrabbit.

#Java #File Formats #Apache Tika