Apache Tika

Universal content analysis powered by Apache Tika: detect file types, extract text and metadata from hundreds of formats, and identify languages.

Category

Documents

Version

0.3.0

Author

Elite AI

Isolation Mode

ClassLoaderIsolated

Min Hefty Version

0.26.0

Platforms

All

Dependencies

None

Available For Plans

premium

Instruments

Name	Category	Risk Level	Keywords	Description
tika_detect	Filesystem	`ReadOnly`	tika, detect, mime, type, content-type, format, identify, file	Detect the MIME type of a file using magic bytes, filename patterns, and container analysis.
tika_parse	Filesystem	`ReadOnly`	tika, parse, extract, text, content, read, document, convert	Extract text content from any supported file format (PDF, Office, HTML, email, images, audio, video, and hundreds more).
tika_metadata	Filesystem	`ReadOnly`	tika, metadata, properties, exif, xmp, author, title, info	Extract all available metadata from a file, including author, title, dates, dimensions, codec, GPS coordinates, and format-specific properties.
tika_language	Filesystem	`ReadOnly`	tika, language, detect, locale, identify, nlp	Detect the natural language of text content from a file or direct text input.
tika_batch_detect	Filesystem	`ReadOnly`	tika, batch, detect, mime, directory, scan, bulk, classify	Detect MIME types for multiple files or an entire directory tree in a single operation.
tika_supported_types	Application	`ReadOnly`	tika, types, formats, supported, mime, list, parsers, capabilities	List all file formats and MIME types supported by the Tika content analysis toolkit, with optional filtering.
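
The detection strategy described for tika_detect and tika_batch_detect (magic bytes first, filename patterns as a fallback, applied to one file or a whole directory tree) can be illustrated with a minimal Python sketch. This is not the plugin's code or Apache Tika's implementation: the helper names `detect_mime` and `batch_detect` and the short signature table are assumptions made for illustration, the sketch relies only on the Python standard library, and container analysis is left out.

```python
import mimetypes
from pathlib import Path

# Illustrative subset of magic-byte signatures (assumption: a real detector
# such as Tika consults a far larger registry and also inspects containers).
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
]


def detect_mime(path):
    """Hypothetical helper: check magic bytes, then fall back to the filename."""
    with open(path, "rb") as fh:
        header = fh.read(16)  # enough bytes for every signature above
    for signature, mime in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return mime
    # No signature matched: guess from the filename extension instead.
    guessed, _encoding = mimetypes.guess_type(str(path))
    return guessed or "application/octet-stream"


def batch_detect(root):
    """Hypothetical helper: map each file under root to its detected MIME type."""
    return {
        str(p): detect_mime(p)
        for p in sorted(Path(root).rglob("*"))
        if p.is_file()
    }
```

Checking content before the filename means a PDF saved with a misleading extension is still reported as a PDF, which is the main reason to prefer magic bytes over extension lookup alone.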