Apache Tika
Universal content analysis powered by Apache Tika — detect file types, extract text and metadata from hundreds of formats, and identify languages
Category: Documents
Version: 0.3.0
Author: Elite AI
Isolation Mode: ClassLoaderIsolated
Min Hefty Version: 0.26.0
Platforms: All
Dependencies: None
Available For Plans: premium
Instruments
| Name | Category | Risk Level | Keywords | Description |
|---|---|---|---|---|
| tika_detect | Filesystem | ReadOnly | tika, detect, mime, type, content-type, format, identify, file | Detect the MIME type of a file using magic bytes, filename patterns, and container analysis. |
| tika_parse | Filesystem | ReadOnly | tika, parse, extract, text, content, read, document, convert | Extract text content from any supported file format (PDF, Office, HTML, email, images, audio, video, and hundreds more). |
| tika_metadata | Filesystem | ReadOnly | tika, metadata, properties, exif, xmp, author, title, info | Extract all available metadata from a file — author, title, dates, dimensions, codec, GPS coordinates, and format-specific properties. |
| tika_language | Filesystem | ReadOnly | tika, language, detect, locale, identify, nlp | Detect the natural language of text content from a file or direct text input. |
| tika_batch_detect | Filesystem | ReadOnly | tika, batch, detect, mime, directory, scan, bulk, classify | Detect MIME types for multiple files or an entire directory tree in a single operation. |
| tika_supported_types | Application | ReadOnly | tika, types, formats, supported, mime, list, parsers, capabilities | List all file formats and MIME types supported by the Tika content analysis toolkit, with optional filtering. |
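The `tika_detect` row names magic bytes and filename patterns as detection signals. The sketch below shows that idea in plain Java under stated assumptions. `MagicSniffer` is a hypothetical class, not Tika's detector, and its four-entry signature table and extension map are illustrative only. Content signatures are checked first, and the filename is consulted only as a fallback.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical illustration of magic-byte detection with a filename fallback.
 * NOT Apache Tika's detector; the tiny tables below are assumptions for illustration.
 */
final class MagicSniffer {
    // Leading byte signatures ("magic numbers") mapped to MIME types.
    private static final List<Map.Entry<byte[], String>> MAGIC = List.of(
        Map.entry("%PDF-".getBytes(StandardCharsets.US_ASCII), "application/pdf"),
        Map.entry(new byte[] {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'}, "image/png"),
        Map.entry(new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF}, "image/jpeg"),
        Map.entry(new byte[] {'P', 'K', 0x03, 0x04}, "application/zip"));

    // Filename-extension fallback, used only when no signature matches.
    private static final Map<String, String> BY_EXTENSION = Map.of(
        "txt", "text/plain",
        "html", "text/html",
        "csv", "text/csv",
        "json", "application/json");

    static String detect(byte[] header, String fileName) {
        // 1. Magic bytes win: the content is more trustworthy than the name.
        for (Map.Entry<byte[], String> sig : MAGIC) {
            byte[] magic = sig.getKey();
            if (header.length >= magic.length
                    && Arrays.equals(Arrays.copyOf(header, magic.length), magic)) {
                return sig.getValue();
            }
        }
        // 2. Fall back to the (case-insensitive) filename extension.
        int dot = fileName.lastIndexOf('.');
        if (dot >= 0) {
            String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
            String byName = BY_EXTENSION.get(ext);
            if (byName != null) {
                return byName;
            }
        }
        // 3. Nothing matched: report the generic binary type.
        return "application/octet-stream";
    }
}
```

Because DOCX, XLSX and other OOXML files are ZIP archives, this sketch reports a `letter.docx` that starts with `PK\x03\x04` as `application/zip`. Telling those formats apart requires looking at the entries inside the archive, which is presumably the kind of container analysis the row refers to.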
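`tika_batch_detect` applies detection across many files or a whole directory tree. A minimal sketch of that pattern, assuming a caller-supplied detector function (`BatchScan` and `countByType` are hypothetical names, not the plugin's API): walk the tree with `Files.walk`, keep regular files, and count them per detected MIME type.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical batch scan: group every regular file under a root directory by the
 * MIME type a caller-supplied detector assigns to it. Not the plugin's implementation.
 */
final class BatchScan {
    static Map<String, Long> countByType(Path root, Function<Path, String> detector)
            throws IOException {
        // try-with-resources closes the stream, releasing open directory handles.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                // TreeMap keeps the result sorted by MIME type for stable output.
                .collect(Collectors.groupingBy(detector, TreeMap::new, Collectors.counting()));
        }
    }
}
```

In a real scan the detector would read each file's leading bytes rather than its name. Note that `Files.walk` wraps I/O errors hit during traversal (for example an unreadable subdirectory) in an `UncheckedIOException`; this sketch simply lets that propagate.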
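`tika_language` returns a language guess for a piece of text. Purely to show the shape of that task, here is a deliberately naive stand-in that scores text by how many common function words ("stopwords") of each language it contains. It is not Tika's language model: `StopwordLanguageGuesser`, its four tiny word lists, and the `"unknown"` fallback are all assumptions for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Naive language guesser that counts stopword hits per language.
 * A simple stand-in, NOT Tika's detector; the word lists are illustrative assumptions.
 */
final class StopwordLanguageGuesser {
    private static final Map<String, Set<String>> STOPWORDS = Map.of(
        "en", Set.of("the", "and", "is", "of", "to", "in", "it", "not"),
        "de", Set.of("der", "die", "das", "und", "ist", "nicht", "im", "zu"),
        "fr", Set.of("le", "la", "les", "et", "est", "pas", "dans", "un"),
        "es", Set.of("el", "la", "los", "y", "es", "no", "en", "un"));

    static String guess(String text) {
        // Lowercase, then split on any run of non-letter characters.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        String best = "unknown"; // hypothetical placeholder when no stopword matches
        int bestScore = 0;
        for (Map.Entry<String, Set<String>> lang : STOPWORDS.entrySet()) {
            int score = 0;
            for (String token : tokens) {
                if (lang.getValue().contains(token)) {
                    score++;
                }
            }
            // Strictly greater: ties keep whichever language was seen first,
            // and Map.of iteration order is unspecified.
            if (score > bestScore) {
                best = lang.getKey();
                bestScore = score;
            }
        }
        return best;
    }
}
```

Stopword counting breaks down on short inputs and between closely related languages, which is why statistical models over character n-grams are the usual choice for this task.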