AI CAPTCHA Solver: New Tool Uses GPT-4o and Gemini to Beat Various Web Security Challenges

AI-Powered CAPTCHA Solver

This project is a Python-based command-line tool that uses large multimodal models (LMMs) such as OpenAI’s GPT-4o and Google’s Gemini to automatically solve various types of CAPTCHAs. It drives a web browser through Selenium to interact with the page and solve each CAPTCHA in real time.

Each successful solve is recorded as a GIF in the successful_solves directory.

Key Features

  • Multiple AI Providers: Supports both OpenAI (e.g., GPT-4o) and Google Gemini (e.g., Gemini 2.5 Pro) models.
  • Multiple CAPTCHA Types: Handles five kinds of challenge: simple and distorted text, reCAPTCHA v2, slider puzzles, and audio.
  • Browser Automation: Uses Selenium to simulate human interaction with web pages.
  • Extensible: The modular design makes it easy to add support for new CAPTCHA types or AI models.
  • Benchmarking: Includes a script to test the performance and success rate of the solvers.
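
To give a sense of what measuring "performance and success rate" could involve, here is a minimal sketch of a benchmark loop. It is not the project's actual script: the function name `benchmark` and the `solve_once` callable are hypothetical, and the sketch simply assumes that one solve attempt returns a truthy value on success.

```python
import time
from statistics import mean
from typing import Callable, Dict


def benchmark(solve_once: Callable[[], bool], runs: int = 10) -> Dict[str, float]:
    """Run a solver several times and report its success rate and mean duration.

    `solve_once` is a hypothetical callable that performs one full solve
    attempt and returns True on success. An exception counts as a failure.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")

    durations = []
    successes = 0
    for _ in range(runs):
        start = time.perf_counter()
        try:
            ok = bool(solve_once())
        except Exception:
            ok = False  # a crashed attempt is recorded as a failed solve
        durations.append(time.perf_counter() - start)
        successes += ok

    return {
        "runs": runs,
        "success_rate": successes / runs,
        "mean_seconds": mean(durations),
    }
```

Counting exceptions as failures keeps one broken page load from aborting the whole run, which matters when each attempt depends on a live browser and a remote model.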

Supported CAPTCHA Types

The tool can solve the following CAPTCHA types found on the 2captcha.com/demo/ pages:

  1. Text Captcha: Simple text recognition.
  2. Complicated Text Captcha: Text with more distortion and noise.
  3. reCAPTCHA v2: Google’s “I’m not a robot” checkbox with image selection challenges.
  4. Puzzle Captcha: Slider puzzles where a piece must be moved to the correct location.
  5. Audio Captcha: Transcribing spoken letters or numbers from an audio file.
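
Each of these types calls for a different kind of answer from the model: plain text, a set of image selections, or a slider position. The sketch below shows one way the raw reply could be turned into a usable value. The type names (`"text"`, `"recaptcha_v2"`, and so on) and the reply formats are assumptions made for illustration, not the project's own conventions.

```python
import re
from typing import List, Union


def parse_reply(captcha_type: str, reply: str) -> Union[str, int, List[int]]:
    """Convert a model's raw text reply into an answer for one CAPTCHA type.

    The type names and reply formats here are hypothetical.
    """
    reply = reply.strip()

    if captcha_type in ("text", "complicated_text", "audio"):
        # Assumes the prompt asked for the characters only; strip stray
        # quotes, spaces, and punctuation but keep letter case.
        return re.sub(r"[^A-Za-z0-9]", "", reply)

    if captcha_type == "recaptcha_v2":
        # Assumed reply format: tile numbers such as "1, 4, 7".
        return [int(n) for n in re.findall(r"\d+", reply)]

    if captcha_type == "puzzle":
        # Assumed reply format: a horizontal offset in pixels, e.g. "x=142".
        match = re.search(r"-?\d+", reply)
        if match is None:
            raise ValueError(f"no offset found in reply: {reply!r}")
        return int(match.group())

    raise ValueError(f"unsupported CAPTCHA type: {captcha_type!r}")
```

For example, `parse_reply("recaptcha_v2", "1, 4, 7")` yields a list of tile numbers that a click routine could iterate over.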

How It Works

  1. Launch Browser: The script starts a Firefox browser instance using Selenium.
  2. Navigate: It goes to the demo page for the specified CAPTCHA type.
  3. Capture: It takes screenshots of the CAPTCHA challenge (image, instructions, or puzzle).
  4. AI Analysis: The captured images or audio files are sent to the selected AI provider (OpenAI or Gemini) with a specific prompt tailored to the CAPTCHA type.
  5. Get Action: The AI returns the solution (text, coordinates, or image selections).
  6. Perform Action: The script uses Selenium to enter the text, move the slider, or click the correct images.
  7. Verify: The script checks for a success message to confirm the CAPTCHA was solved.
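
The capture, analyze, act, and verify steps above can be sketched as a small loop. This is an illustration under stated assumptions, not the project's code: `solve_text_captcha`, `SolveResult`, and the three callables are hypothetical names, the prompt wording is invented, and the retry limit is an addition the article does not mention. The browser and the AI provider are passed in as plain functions so the flow can be read (and tested) without Selenium or an API key.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SolveResult:
    solved: bool
    attempts: int
    answer: Optional[str]


def solve_text_captcha(
    capture: Callable[[], bytes],
    ask_model: Callable[[bytes, str], str],
    submit: Callable[[str], bool],
    max_attempts: int = 3,
) -> SolveResult:
    """Run the capture -> AI analysis -> action -> verify cycle for a text CAPTCHA.

    All three callables are hypothetical stand-ins:
      capture()          returns a screenshot of the challenge as PNG bytes
      ask_model(img, p)  sends the image and prompt to the provider, returns its reply
      submit(answer)     types the answer into the page, returns True on success
    """
    # Invented prompt wording, tailored to the text CAPTCHA type.
    prompt = "Read the characters in this CAPTCHA image. Reply with the characters only."
    answer: Optional[str] = None

    for attempt in range(1, max_attempts + 1):
        image = capture()                          # step 3: capture
        answer = ask_model(image, prompt).strip()  # steps 4-5: analysis and action
        if answer and submit(answer):              # steps 6-7: perform and verify
            return SolveResult(True, attempt, answer)

    return SolveResult(False, max_attempts, answer)
```

In a real run, `capture` might return a Selenium `WebElement`'s `screenshot_as_png` property, and `submit` might call `send_keys` on the answer field, `click` the submit button, and then look for the page's success message.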
