CRITICAL ALERT: Apache Tika XXE Flaw (CVSS 10.0) Allows File Read via PDF Files
On 4 December 2025, the Apache Software Foundation disclosed a critical vulnerability — CVE-2025-66516, rated the maximum CVSS 10.0 — in the Apache Tika library. Because Tika underpins search engines, ECM platforms, DLP systems, and large-scale document-processing pipelines, the issue affects not only developers but a vast array of business services in which Tika operates as a hidden infrastructural component.
The vulnerability stems from the way Tika processes XFA markup embedded in PDF files. When parsing such documents, Tika fails to restrict the use of external XML entities, thereby enabling a classic XXE attack. A specially crafted PDF can compel Tika to read arbitrary files from the host or trigger SSRF requests to internal network resources that would ordinarily be inaccessible from the outside.
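To make the mechanism concrete, the following Python sketch shows what an external entity declaration looks like inside an XML payload and how it can be spotted before any parser resolves it. The sample document, the regular expression, and the function name find_external_entities are illustrative assumptions, not code taken from Tika or from the advisory, and a regex scan is only a rough triage aid rather than a substitute for a hardened XML parser.

```python
import re

# Matches declarations such as <!ENTITY name SYSTEM "uri"> or
# <!ENTITY name PUBLIC "id" "uri">, including parameter entities (% name).
# Internal entities (<!ENTITY name "text">) are deliberately not matched.
EXTERNAL_ENTITY = re.compile(
    r"""<!ENTITY\s+(?:%\s+)?(?P<name>[\w.-]+)\s+(?:SYSTEM|PUBLIC\s+["'][^"']*["'])\s+["'](?P<uri>[^"']*)["']"""
)


def find_external_entities(xml_text):
    """Return (entity name, system identifier) pairs declared in xml_text."""
    return [(m.group("name"), m.group("uri")) for m in EXTERNAL_ENTITY.finditer(xml_text)]


# Hypothetical XFA-style payload: the entity points at a local file, and a
# parser that resolves external entities would substitute its contents
# wherever &probe; is referenced.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE xdp [
  <!ENTITY probe SYSTEM "file:///etc/passwd">
]>
<xdp>&probe;</xdp>"""
```

A parser configured to refuse DOCTYPE declarations or external entity resolution never dereferences the URI in the declaration, which is the property the vulnerable code path lacks.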
A fix is available in Apache Tika 3.2.2. Both tika-core and the tika-parser-pdf-module have been updated, meaning that patching only the PDF parser is insufficient — the entire dependency chain must be upgraded to ensure protection.
According to SKIPA, roughly 200 hosts in the Runet currently use Apache Tika, and approximately 95 percent appear to be vulnerable. PentOps clients were notified in advance and received detailed guidance on patching and risk reduction. The true number of Tika installations is likely far higher, however, as the library is often embedded transitively and not explicitly exposed in public-facing infrastructure.
Installations are vulnerable if they use tika-core versions 1.13 through 3.2.1 or tika-parser-pdf-module versions 2.0.0 through 3.2.1. In the 1.x line, where the PDF parser ships inside tika-parsers, the flaw is present in tika-parsers up to and including version 1.28.5. Crucially, the issue resides not only in the PDF parser but also in the XML parser within tika-core itself, making partial updates ineffective.
Exploitation is feasible in common automated document-processing workflows. It is enough for Tika to accept PDF files containing XFA markup, parse them automatically, and run in an environment with network or filesystem access. Sandboxing tools such as Firejail, Docker, or AppArmor, along with strict ACLs, reduce the blast radius but cannot eliminate it: a successful XXE attack can still read any file and reach any network endpoint that the container’s security context permits.
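The version ranges above can be turned into a simple check for dependency audits. The sketch below is a minimal illustration in Python; the table AFFECTED and the helpers parse_version and is_affected are hypothetical names, and because the text gives only an upper bound for tika-parsers, the sketch assumes no lower bound there, which errs on the side of flagging more versions.

```python
# Inclusive (low, high) bounds taken from the ranges stated in the text.
# ASSUMPTION: tika-parsers has no stated lower bound, so (0, 0, 0) is used.
AFFECTED = {
    "tika-core": ((1, 13, 0), (3, 2, 1)),
    "tika-parser-pdf-module": ((2, 0, 0), (3, 2, 1)),
    "tika-parsers": ((0, 0, 0), (1, 28, 5)),
}


def parse_version(text):
    """Convert '3.2.1' or '2.0.0-BETA' into a comparable 3-tuple of ints."""
    core = text.strip().split("-", 1)[0]   # drop qualifiers such as -BETA
    parts = [int(p) for p in core.split(".")]
    parts += [0] * (3 - len(parts))        # pad '1.13' to (1, 13, 0)
    return tuple(parts[:3])


def is_affected(artifact, version):
    """Return True if the artifact/version pair falls in a listed range."""
    bounds = AFFECTED.get(artifact)
    if bounds is None:
        return False                       # artifact not covered by the table
    low, high = bounds
    return low <= parse_version(version) <= high
```

Feeding every resolved artifact and version from a build's dependency tree through is_affected also catches Tika copies that arrive transitively; pre-release qualifiers are treated like the release they precede, which is a simplification.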
Indicators of compromise may include errors or anomalies when handling XFA-enabled PDFs, unexpected outbound requests from Tika services to unfamiliar domains, attempts to access local paths such as file:///etc/passwd or user directories, sudden load spikes, and atypical exceptions in ingest or parsing workflows. Log entries from applications leveraging Tika may reveal XML-parsing failures or frequent fallback-mode activations.
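Several of these indicators can be searched for mechanically. The following Python sketch scans log lines for local file URIs, XML entity declarations, and two standard Java XML exception names. The pattern set, the function name scan_log_lines, and the sample log lines are assumptions for illustration; what actually appears in the logs depends on how the surrounding application records Tika errors.

```python
import re

# Illustrative patterns only; tune them to the log format actually in use.
IOC_PATTERNS = {
    "local_file_uri": re.compile(r"file:///?[\w./~-]+", re.IGNORECASE),
    "xml_entity_decl": re.compile(r"<!(?:DOCTYPE|ENTITY)\b", re.IGNORECASE),
    "xml_parse_failure": re.compile(r"SAXParseException|XMLStreamException"),
}


def scan_log_lines(lines):
    """Return (line number, indicator name) for every pattern hit."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for name, pattern in IOC_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits


# Hypothetical log excerpt used to exercise the scanner.
SAMPLE_LOG = [
    "INFO parsed report.pdf in 12 ms",
    "WARN org.xml.sax.SAXParseException; lineNumber: 3",
    "DEBUG resolving file:///etc/passwd",
]
```

A hit is a prompt for manual review rather than proof of compromise, since parse failures and file URIs also occur in benign workloads.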
Given the CVSS 10.0 severity, the default recommendation is unequivocal: upgrade Apache Tika to version 3.2.2, ensuring that both tika-core and the PDF module are updated in tandem. If immediate patching is impossible, temporarily disable or heavily restrict processing of XFA PDFs, introduce stringent validation and filtering of incoming files, and isolate Tika processes in tightly confined containers with minimal filesystem privileges and outbound connections blocked. It is also advisable to audit dependency chains, as Tika is often pulled in transitively by search modules and ECM platforms.
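Where upgrading has to wait, one way to restrict XFA PDFs is a pre-filter that quarantines suspect files before they reach Tika. The Python sketch below is a heuristic under stated assumptions: it looks for the /XFA key in the raw bytes after undoing #xx name escapes, and the function names are hypothetical. It will miss forms whose dictionaries sit inside compressed object streams, so it complements patching and isolation rather than replacing them.

```python
import re

_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name_escapes(data: bytes) -> bytes:
    """Undo #xx escapes so that a name written as /X#46A reads as /XFA."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


def has_xfa_marker(data: bytes) -> bool:
    """Heuristically report whether a PDF byte string references XFA.

    Returns False for input that lacks a PDF header in its first 1024
    bytes; such files should be handled by a separate file-type check.
    """
    if b"%PDF-" not in data[:1024]:
        return False
    return b"/XFA" in decode_name_escapes(data)
```

A gateway could read each upload as bytes, call has_xfa_marker, and route positives to quarantine instead of the parsing queue. False positives are possible because the byte pattern may also occur inside unrelated binary streams.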
The final step is a thorough log audit. Review Tika logs and the logs of all integrated services that perform automated PDF analysis, paying close attention to unusual parsing errors, anomalous network activity, and attempts to access internal resources. Now that the vulnerability is fully documented and carries the highest possible severity rating, delaying updates or investigation is exceedingly risky.