OSINT-Collector: The Comprehensive Framework for Targeted Intelligence Gathering
OSINT-Collector
OSINT-Collector is an advanced framework that facilitates the collection, analysis, and management of OSINT information useful for conducting investigations in specific domains of interest.
Design and Architecture
The framework takes an ontology-based approach:
- The OSINT Ontology describes how data extracted from OSINT sources should be inserted into the database, including their respective properties and relationships.
- Domain Ontologies describe various domains of interest. These ontologies are utilized to link the extracted data to entities within these domains, enabling deeper inferences.
Using the graphical interface, the user selects an OSINT tool, enters the required parameters, and starts a search. The frontend sends the execution request over HTTP to the Launcher, which runs the requested tool with the supplied inputs. The resulting data are aggregated, filtered, and sent over HTTP to the backend, which communicates with the database and performs the following operations:
- Insertion and linking of data based on the schema described by the OSINT Ontology.
- Analysis of textual documents using NLP techniques provided by cloud services to extract suspicious entities and moderate the text to identify dangerous categories.
- Linking of entities and categories extracted in the previous phase with the domain ontologies.
The user can view the search results in the graphical interface, which highlights the content identified during the analysis and emphasizes suspicious entities and categories. From there, users can run further, more in-depth searches.
Because data insertion is driven by the OSINT Ontology, new OSINT sources can be added easily.
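To make the linking step more concrete, here is a minimal sketch of how extracted entities could be matched against a domain ontology. It is not the framework's implementation: the `DomainConcept` class, the `link_entities` function, the keyword-matching rule, and the sample concept are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DomainConcept:
    """Hypothetical stand-in for one concept in a domain ontology."""
    name: str
    keywords: frozenset[str]


def link_entities(entities: list[str], concepts: list[DomainConcept]) -> list[tuple[str, str]]:
    """Return (entity, concept name) pairs for entities that match a concept keyword.

    Matching here is a simple case-insensitive lookup; a real ontology
    would likely use richer relationships than a flat keyword set.
    """
    links = []
    for entity in entities:
        normalized = entity.strip().lower()
        for concept in concepts:
            if normalized in concept.keywords:
                links.append((entity, concept.name))
    return links


# Assumed sample data: one concept from a hypothetical cybercrime domain ontology.
malware = DomainConcept("Malware", frozenset({"ransomware", "trojan"}))
links = link_entities(["Ransomware", "Rome"], [malware])
```

In this sketch only "Ransomware" is linked, because "Rome" matches no keyword of the assumed concept.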
Interaction Flow
1. Tool Selection:
- The user selects the desired tool through the graphical interface based on the available capabilities. For instance, the user might choose “download telegram messages”.
- The frontend then sends a POST request to the Spring server to determine the required inputs for executing the chosen tool.
2. Input Provisioning:
- The server queries the database and returns the result to the frontend, which displays a form where the user must enter the required inputs. For example, the user might be prompted to enter the “username” of the Telegram group to analyze.
3. Tool Execution:
- The user enters the necessary inputs, such as the Telegram group username (e.g., “@osintcollector_group”).
- Upon clicking the “launch” button, the frontend sends a POST request to the launcher containing the tool and its respective inputs.
- The launcher then creates a Docker container from the tool's image (e.g., “telegram-tracker”) and runs it with the corresponding command.
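The launch step can be sketched as follows. The JSON payload shape (`tool` and `inputs` keys) and the convention of turning each input into a `--name value` argument are assumptions; the source does not specify the launcher's request format or the flags accepted by the tool image. The command is only built here, not executed.

```python
import json


def build_launch_request(tool: str, inputs: dict) -> str:
    """Serialize the body of the POST request sent to the launcher (assumed shape)."""
    return json.dumps({"tool": tool, "inputs": inputs}, sort_keys=True)


def build_docker_command(image: str, inputs: dict) -> list[str]:
    """Build a `docker run` argument list for the tool image.

    Each input becomes a hypothetical `--name value` pair; the real tool
    may expect different flags.
    """
    command = ["docker", "run", "--rm", image]
    for name, value in inputs.items():
        command += [f"--{name}", str(value)]
    return command


inputs = {"username": "@osintcollector_group"}
body = build_launch_request("telegram-tracker", inputs)
command = build_docker_command("telegram-tracker", inputs)
```

Building the command as a list rather than a single string avoids shell quoting problems if it is later passed to a process runner.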
4. Output Processing:
- The container generates an output file containing all the messages downloaded from the Telegram group and sends it to the launcher.
- The launcher starts the Logstash container, which processes the output file obtained in the previous step.
- Logstash filters and refines the data, producing a new file containing the results of the tool, and sends it to the Spring server.
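As a rough illustration of what "filters and refines" can mean, the sketch below is a plain Python stand-in for the filtering stage, not a Logstash configuration. The field names (`id`, `date`, `sender`, `text`) and the rule of dropping messages without text are assumptions.

```python
def refine_messages(raw_messages, keep_fields=("id", "date", "sender", "text")):
    """Keep only selected fields and drop messages that have no usable text."""
    refined = []
    for message in raw_messages:
        # Treat missing or None text as empty, then trim whitespace.
        text = (message.get("text") or "").strip()
        if not text:
            continue
        record = {key: message[key] for key in keep_fields if key in message}
        record["text"] = text
        refined.append(record)
    return refined


# Assumed sample of raw tool output.
raw = [
    {"id": 1, "text": " hello ", "views": 10},
    {"id": 2, "text": ""},
    {"id": 3, "text": None},
]
refined = refine_messages(raw)
```

Only the first message survives in this example, with its text trimmed and the extra `views` field removed.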
5. Data Storage and Analysis:
- The Spring server saves all the messages to the database along with their respective relationships and properties through queries.
- The database asynchronously analyzes the textual part of these messages by making requests to the Google Cloud APIs. These APIs return moderation categories and entities, which are then linked in the graph to their respective messages.
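The analysis and linking step can be sketched with an in-memory edge list. The cloud call is replaced by a stub (`stub_analyze`) that returns canned results, so no real Google Cloud API is invoked. The relationship names `MENTIONS` and `HAS_CATEGORY` and the result dictionary shape are assumptions, not the framework's schema.

```python
import asyncio


async def annotate_messages(messages, analyze):
    """Analyze all messages concurrently and return (message id, relation, value) edges."""
    results = await asyncio.gather(*(analyze(m["text"]) for m in messages))
    edges = []
    for message, result in zip(messages, results):
        for entity in result.get("entities", []):
            edges.append((message["id"], "MENTIONS", entity))
        for category in result.get("categories", []):
            edges.append((message["id"], "HAS_CATEGORY", category))
    return edges


async def stub_analyze(text):
    """Stand-in for the NLP service: handles become entities, 'hate' triggers a category."""
    return {
        "entities": [word for word in text.split() if word.startswith("@")],
        "categories": ["Toxic"] if "hate" in text.lower() else [],
    }


messages = [
    {"id": 1, "text": "contact @alice now"},
    {"id": 2, "text": "i hate this"},
]
edges = asyncio.run(annotate_messages(messages, stub_analyze))
```

Passing the analyzer in as a parameter keeps the linking logic testable without network access; a real analyzer would be swapped in at the same point.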
6. Accessing Results:
- The user can access the results by clicking on the “Telegram results” section.
- They can select the recently analyzed username from the list, and the frontend sends a request to the Spring server to retrieve all messages from the database.
- The frontend then displays the messages in a structured manner, highlighting all the information obtained and extracted in the previous steps.
This interaction pattern repeats every time a new OSINT search is initiated by the user, utilizing the available tools within the framework.