eclipse: AI Powered Sensitive Information Detection

Eclipse was designed as part of Nebula Pro, the first AI-Powered Penetration Testing Application, to address growing concerns about sensitive data management. Unlike traditional methods, Eclipse is not limited to explicitly defined sensitive information; it also flags sentences that may hint at or contain such information.

Sensitive Information Detection: Eclipse can process documents to identify not only explicit sensitive information but also sentences that suggest the presence of such data elsewhere. This makes it a useful tool for preliminary reviews when you need to quickly identify potentially sensitive content in your documents.

Privacy Preservation: With concerns about data privacy in the context of Large Language Models (LLMs), Eclipse offers a potential solution. Before you send your data to APIs hosting LLMs, Eclipse can screen your documents to help ensure that no sensitive information is inadvertently exposed.
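
The screening step described above can be pictured as a gate placed in front of an LLM call. The following is a minimal sketch, not Eclipse's actual interface: `screen_before_send`, the `classify` callable, and the `send` callable are hypothetical names introduced for illustration. The labels are the categories Eclipse reports, and treating NETWORK_INFORMATION as sensitive is an assumption.

```python
from typing import Callable, Iterable, List, Tuple

# Labels treated as sensitive in this sketch. The names come from the
# categories Eclipse reports; which ones to block on is an assumption.
SENSITIVE_LABELS = {"NETWORK_INFORMATION", "SECURITY_CREDENTIALS", "PERSONAL_DATA"}

Finding = Tuple[str, str]  # (text snippet, label)


def screen_before_send(
    text: str,
    classify: Callable[[str], Iterable[Finding]],  # hypothetical classifier hook
    send: Callable[[str], object],                 # hypothetical LLM API call
) -> Tuple[bool, List[Finding]]:
    """Send `text` only if the classifier reports no sensitive labels.

    Returns (sent, findings): `sent` is True when the text was forwarded,
    and `findings` lists the snippets that blocked it otherwise.
    """
    findings = [
        (snippet, label)
        for snippet, label in classify(text)
        if label in SENSITIVE_LABELS
    ]
    if findings:
        # Do not forward anything; let the caller review the findings.
        return False, findings
    send(text)
    return True, []
```

Because the classifier and the sender are passed in as callables, the same gate can wrap whatever detection step and LLM client are actually in use.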

Appropriate Use Cases for Eclipse:

Preliminary Data Screening: Eclipse is ideal for initial screenings where speed is essential. It helps users quickly identify potentially sensitive information in large volumes of text.

Data Privacy Checks: Before sharing documents or data with external parties or services, Eclipse can serve as a first line of defense, alerting you to the presence of sensitive information.

Limitations:

Eclipse is designed for rapid assessments and may not catch every instance of sensitive information. Therefore:

  • Eclipse should not be used as the sole tool for tasks requiring exhaustive checks, such as legal document review, where missing sensitive information could have significant consequences.

  • Consider using Eclipse alongside thorough manual reviews and other security measures, especially in situations where the complete removal of sensitive information is crucial.

Compatibility

Eclipse has been extensively tested and optimized for Linux platforms. As of now, its functionality on Windows or macOS is not guaranteed, and it may not operate as expected.

Understanding the Output

The script identifies entities in the text and classifies them into the following categories:

  • O: No entity.
  • NETWORK_INFORMATION: Information related to network addresses, protocols, etc.
  • BENIGN: Text that is considered safe or irrelevant to security contexts.
  • SECURITY_CREDENTIALS: Sensitive information like passwords, tokens, etc.
  • PERSONAL_DATA: Personally identifiable information (PII) like names, addresses, etc.
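
As an illustration of how these categories might be consumed, the sketch below tallies results per label and collects the items that need review. It assumes the results are available as (text, label) pairs. That input shape and the `summarize` helper are hypothetical, not Eclipse's documented output format, and counting NETWORK_INFORMATION as sensitive is likewise an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# "O" and "BENIGN" are treated as non-sensitive; the rest are flagged.
# Flagging NETWORK_INFORMATION is an assumption made for this sketch.
SENSITIVE_LABELS = {"NETWORK_INFORMATION", "SECURITY_CREDENTIALS", "PERSONAL_DATA"}


def summarize(results: Iterable[Tuple[str, str]]) -> Tuple[Dict[str, int], List[str]]:
    """Return per-label counts and the list of texts carrying a sensitive label."""
    results = list(results)  # allow generators to be iterated twice
    counts = Counter(label for _, label in results)
    flagged = [text for text, label in results if label in SENSITIVE_LABELS]
    return dict(counts), flagged


# Example with made-up data:
example = [
    ("The meeting is at noon.", "BENIGN"),
    ("Server listens on 10.0.0.5:8080.", "NETWORK_INFORMATION"),
    ("The admin password is hunter2.", "SECURITY_CREDENTIALS"),
]
counts, flagged = summarize(example)
```

Here `counts` gives a quick overview of a document, while `flagged` holds the items to review manually before sharing.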

Install & Use

Copyright (c) 2023, berylliumsec