Apache Tika

Universal content analysis powered by Apache Tika: detect file types, extract text and metadata from hundreds of formats, and identify languages.

Category: Documents
Version: 0.3.0
Author: Elite AI
Isolation Mode: ClassLoaderIsolated
Min Hefty Version: 0.26.0
Platforms: All
Dependencies: None
Available For Plans: premium

Instruments

| Name | Category | Risk Level | Keywords | Description |
| --- | --- | --- | --- | --- |
| tika_detect | Filesystem | ReadOnly | tika, detect, mime, type, content-type, format, identify, file | Detect the MIME type of a file using magic bytes, filename patterns, and container analysis. |
| tika_parse | Filesystem | ReadOnly | tika, parse, extract, text, content, read, document, convert | Extract text content from any supported file format (PDF, Office, HTML, email, images, audio, video, and hundreds more). |
| tika_metadata | Filesystem | ReadOnly | tika, metadata, properties, exif, xmp, author, title, info | Extract all available metadata from a file: author, title, dates, dimensions, codec, GPS coordinates, and format-specific properties. |
| tika_language | Filesystem | ReadOnly | tika, language, detect, locale, identify, nlp | Detect the natural language of text content from a file or direct text input. |
| tika_batch_detect | Filesystem | ReadOnly | tika, batch, detect, mime, directory, scan, bulk, classify | Detect MIME types for multiple files or an entire directory tree in a single operation. |
| tika_supported_types | Application | ReadOnly | tika, types, formats, supported, mime, list, parsers, capabilities | List all file formats and MIME types supported by the Tika content analysis toolkit, with optional filtering. |
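
The tika_detect description mentions magic bytes and filename patterns. The sketch below illustrates that general idea only; it is not the plugin's code or Apache Tika's implementation. The function name `sniff_mime`, the tiny signature table, and the extension table are all hypothetical, and container analysis is left out.

```python
import os
from typing import Optional

# Hypothetical, deliberately tiny signature table (assumption for illustration).
# A real detector such as Tika ships a far larger registry.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
]

# Hypothetical filename-pattern fallback, used only when no signature matches.
EXTENSION_HINTS = {
    ".txt": "text/plain",
    ".csv": "text/csv",
    ".json": "application/json",
}


def sniff_mime(data: bytes, filename: Optional[str] = None) -> str:
    """Guess a MIME type from leading bytes, then from the file extension."""
    # Step 1: magic bytes take priority over whatever the name claims.
    for signature, mime in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return mime
    # Step 2: fall back to the filename extension, if one was given.
    if filename:
        extension = os.path.splitext(filename)[1].lower()
        if extension in EXTENSION_HINTS:
            return EXTENSION_HINTS[extension]
    # Step 3: nothing matched, so report generic binary data.
    return "application/octet-stream"


if __name__ == "__main__":
    print(sniff_mime(b"%PDF-1.7\n", "report.pdf"))
    print(sniff_mime(b"plain words", "notes.txt"))
```

Checking content before the filename means a PNG saved as `wrong.txt` is still reported as `image/png`, which is the usual reason detectors prefer signatures over extensions.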
