The AI Privacy Shield: How AnonyMask Automates Data Redaction for LLM and RAG Workflows

AnonyMask: Automated Masking and Unmasking of Explicit and Implicit Privacy Data

AnonyMask is a privacy-preserving tool that automatically detects, masks, and unmasks privacy data across various file formats. It lets enterprises use Large Language Models (LLMs) or Retrieval-Augmented Generation (RAG) systems while keeping private or confidential information secure and compliant. With a single click, users can anonymize both explicit and implicit privacy data before sending it to an LLM or RAG system for analysis, then restore the original content afterward using smart unmasking. AnonyMask offers a secure, customizable, and offline-capable document privacy workflow compatible with common file types such as .pdf, .docx, .xlsx, .csv, and .txt.

Motivation Behind AnonyMask

The rapid adoption of AI in enterprise environments, especially for customer insights, HR analytics, financial processing, and legal document summarization, has introduced new privacy challenges. Real-world cases show how users, from medical staff entering patient data into an LLM to employees sharing proprietary code, can unintentionally expose sensitive information when interacting with LLM or RAG systems, creating risks of data retention, leakage, and privacy violations.

Despite regulations such as the GDPR and UU PDP (the Personal Data Protection Law in Indonesia), many users are unaware of what personal data gets extracted, how it is processed, and where it ends up. AnonyMask was created to close this gap by automatically and securely masking documents before they reach any LLM or RAG system for analysis. It detects both explicit and implicit privacy data and supports unmasking afterward, supporting compliance, data protection, and peace of mind.

Main Features

No. | Main Feature | Description
1. | Automatic Privacy Data Masking | Detects and masks both explicit and implicit (33 labels) privacy data using transformer-based AI models.
2. | Multi-File Format Support | Supports input and output in .txt, .csv, .pdf, .docx, .xlsx, and .xls formats.
3. | Secure LLM/RAG Integration | Prepares privacy-safe documents, ensuring no raw PII is exposed to an external LLM or RAG system.
4. | Smart Unmasking | Restores original content after LLM or RAG processing by using an internal token mapping to reverse the masked values.
5. | Customizable Masking Rules | Allows users to define which entities to mask or exclude, offering full control over the masking process.
6. | Privacy By Design | All processing is performed offline and locally; no data is sent or stored externally, ensuring full confidentiality.
7. | Transparent Logging | Maintains logs of all masking and unmasking operations for traceability and auditability.
8. | Multilingual Model Support | Automatically detects privacy data in multiple languages, such as English and Indonesian, using models like XLM-RoBERTa.
9. | Portable Desktop Application | Runs as a standalone .exe without requiring external dependencies on the user’s machine.
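
The masking and smart unmasking round trip described in features 3 and 4 can be illustrated with a minimal Python sketch. This is not AnonyMask's implementation: the regex detector is only a stand-in for the transformer-based models, and the names mask_text and unmask_text, as well as the [EMAIL_1] placeholder format, are assumptions made for illustration.

```python
import re

# Hypothetical stand-in detector. AnonyMask uses transformer-based models;
# simple regexes are enough here to illustrate the token-mapping idea.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d -]{7,}\d"),
}


def mask_text(text):
    """Replace detected values with placeholders; return masked text and mapping."""
    mapping = {}  # placeholder -> original value, kept locally for unmasking
    seen = {}     # (label, value) -> placeholder, so repeated values reuse a token

    for label, pattern in PATTERNS.items():
        def substitute(match, label=label):
            value = match.group(0)
            key = (label, value)
            if key not in seen:
                count = sum(1 for k in seen if k[0] == label) + 1
                token = f"[{label}_{count}]"
                seen[key] = token
                mapping[token] = value
            return seen[key]

        text = pattern.sub(substitute, text)
    return text, mapping


def unmask_text(text, mapping):
    """Restore the original values by reversing the placeholder mapping."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


original = "Contact jane@example.com or +62 812 3456 7890. Email jane@example.com again."
masked, mapping = mask_text(original)   # only `masked` would be sent to the LLM
restored = unmask_text(masked, mapping)  # applied to the LLM output afterward
```

In this sketch only the masked text would leave the machine, while the mapping stays local. That is the property that makes unmasking possible without exposing raw values to the external system.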

Your Privacy, Your Rules

No. | Interface | Type | Description
1. | Masking | Redacted Masking | Replaces all privacy values with ****, fully hiding the original content.
1. | Masking | Partial Masking | Partially hides values, showing only fragments (e.g., J*** ***e0*******1) to retain readability.
1. | Masking | Full Masking – Category | Replaces each value with its category label (e.g., [Name], [Email], [DOB]).
1. | Masking | Full Masking – Value | Replaces each value with custom user input. If not specified, it defaults to the category label.
1. | Masking | Full Masking – All Random | Randomizes every privacy value independently, even if repeated data exists (e.g., John → Axel, next John → Rey).
1. | Masking | Full Masking – Same Random | Randomizes data consistently, so identical inputs get the same output every time (e.g., John → Axel, all John remain Axel).
2. | Unmasking | Automatic Unmasking | Restores original content in the processed file using the token mapping log generated during masking.
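
To make the difference between the masking types concrete, here is a rough Python sketch of how each style could behave on a single value. The function names, the FAKE_NAMES pool, the keep-first-character rule for partial masking, and the hash-based approach to Same Random are all assumptions for illustration, not AnonyMask's actual rules.

```python
import hashlib
import random

# Hypothetical replacement pool; a real tool would likely use per-category pools.
FAKE_NAMES = ["Axel", "Rey", "Mira", "Noah", "Lina"]


def redact(value):
    """Redacted masking: hide the value entirely."""
    return "****"


def partial(value, keep=1):
    """Partial masking: keep the first `keep` characters and star out the rest."""
    return value[:keep] + "*" * max(len(value) - keep, 0)


def by_category(value, category):
    """Full masking by category: replace the value with its category label."""
    return f"[{category}]"


def all_random(value, rng):
    """All Random: every occurrence draws independently, so repeats may differ."""
    return rng.choice(FAKE_NAMES)


def same_random(value, secret="demo-secret"):
    """Same Random: derive the pick from a hash, so equal inputs map equally."""
    digest = hashlib.sha256((secret + value).encode("utf-8")).digest()
    return FAKE_NAMES[digest[0] % len(FAKE_NAMES)]


rng = random.Random()
names = ["John", "Maria", "John"]
consistent = [same_random(n) for n in names]        # both "John" entries match
independent = [all_random(n, rng) for n in names]   # each entry drawn separately
```

With such a small pool, different inputs can collide on the same replacement in Same Random mode; a larger pool or a stored lookup table would reduce that. The trade-off between the two random modes is that Same Random preserves relationships across a document (useful for analysis), while All Random reveals less about repeated values.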

Install & Use
