Beyond the Lab: The Troubling Reality of Facial Recognition on Our Streets

Stories of mistaken arrests caused by facial recognition technology are no longer rare. In 2020, Detroit resident Robert Williams was taken into custody after a faulty match generated from a low-quality surveillance image. Four years later, a similar incident unfolded in London: activist Shaun Thompson was wrongly flagged as a suspect by the Metropolitan Police’s Live Facial Recognition system, resulting in an aggressive police stop.

An independent audit of the London Metropolitan Police’s trials revealed that of 42 supposed matches, only eight proved accurate. Despite such failures, the technology continues to be deployed in airports, shopping centers, and city streets, its adoption justified by laboratory statistics boasting accuracy of up to 99.95%. Yet those numbers are deeply misleading: accuracy measured under controlled conditions says little about how often an alert on the street is actually correct.
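A back-of-the-envelope calculation shows how both figures can be true at once. The numbers below are illustrative assumptions, not data from the audit: even a per-comparison error rate of 0.05%, consistent with a 99.95% accuracy claim, produces far more false alerts than true ones once thousands of passers-by are screened against a small watchlist.

```python
# Illustrative base-rate arithmetic with assumed numbers (not audit data):
# a matcher with a tiny per-comparison error rate still produces mostly
# false alerts when almost everyone scanned is not on the watchlist.

false_match_rate = 0.0005   # 0.05% chance an innocent passer-by is flagged (assumed)
true_match_rate = 0.90      # 90% chance a wanted person is flagged (assumed)
people_scanned = 50_000     # faces screened during a deployment (assumed)
wanted_present = 10         # of whom this many really are on the watchlist (assumed)

innocent = people_scanned - wanted_present
false_alerts = innocent * false_match_rate
true_alerts = wanted_present * true_match_rate
precision = true_alerts / (true_alerts + false_alerts)

print(f"false alerts: {false_alerts:.0f}, true alerts: {true_alerts:.0f}")
print(f"share of alerts that are correct: {precision:.0%}")
# -> about a quarter of alerts are correct, the same order of magnitude as
#    the audit's 8-in-42 result, despite "99.95% accuracy" on paper.
```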

To understand the gulf between laboratory results and practical outcomes, one must examine the nature of the benchmark tests conducted by the U.S. National Institute of Standards and Technology (NIST). Its Face Recognition Technology Evaluation (FRTE), formerly known as the Face Recognition Vendor Test, has become the gold standard underpinning adoption worldwide, including by British police. But these tests are ill-suited to measuring performance in uncontrolled, real-world environments. They may demonstrate how a system works at an airport terminal, but they reveal little about its reliability on a crowded street or in poor lighting. The reports, in turn, create an illusion of near-infallibility, while in reality the systems falter amid everyday noise and interference.

To build test sets, researchers compile databases of photographs against which algorithms are tasked with finding matches. Yet these collections come with significant limitations. First, the images are too “perfect”: static, evenly lit, and free of distortions. In the real world, faces are obscured by masks, glasses, shadows, motion blur, or crowds. Even NIST’s attempts to include webcam shots failed to bridge the gap, as those images remain far cleaner than typical footage from street cameras.
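One way to probe this gap is to degrade benchmark photos toward street-camera conditions (blur, resolution loss, heavy compression) and re-run the evaluation on the degraded copies. The sketch below uses the Pillow imaging library and is illustrative only; the file name probe.jpg and the degradation parameters are placeholders, and the transformations merely approximate real CCTV artifacts.

```python
# A minimal sketch of degrading a benchmark photo toward "street camera"
# conditions before evaluation. Requires Pillow; "probe.jpg" and the
# parameter values are placeholders, not settings from any real benchmark.
import io
from PIL import Image, ImageFilter

def degrade(img: Image.Image, blur_radius=3, scale=4, jpeg_quality=20) -> Image.Image:
    """Apply blur, resolution loss, and compression artifacts to one image."""
    w, h = img.size
    img = img.filter(ImageFilter.GaussianBlur(blur_radius))      # blur
    img = img.resize((w // scale, h // scale)).resize((w, h))    # resolution loss
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=jpeg_quality)           # compression artifacts
    buf.seek(0)
    return Image.open(buf)

clean = Image.open("probe.jpg").convert("RGB")
degraded = degrade(clean)
degraded.save("probe_degraded.jpg")
# Re-running the same recognition benchmark on degraded probes and comparing
# error rates against the clean baseline makes the lab-vs-street gap measurable.
```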

Second, the sheer scale of operational databases far exceeds that of laboratory sets. While test collections may contain millions of images, real police systems often search hundreds of millions of profiles. The larger the pool, the more comparisons every query entails and the higher the risk of false positives. Yet current standards do not adequately account for how error rates scale with database size.
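The arithmetic behind that risk is simple. Under the simplifying assumption that each enrolled profile has a small, independent chance of being falsely matched to a probe, the expected number of false candidates per search grows roughly in proportion to the gallery size; the false match rate used below is an assumed illustrative value, not a figure from any deployed system.

```python
# Expected false candidates per search under a simple independence assumption:
# each of N enrolled profiles is falsely matched with probability fmr.
fmr = 1e-5  # per-comparison false match rate (assumed, near lab-reported levels)

for gallery_size in (1_000_000, 10_000_000, 300_000_000):
    expected_false = gallery_size * fmr
    print(f"gallery of {gallery_size:>11,} profiles -> "
          f"~{expected_false:,.0f} false candidates per search")
# gallery of   1,000,000 profiles -> ~10 false candidates per search
# gallery of  10,000,000 profiles -> ~100 false candidates per search
# gallery of 300,000,000 profiles -> ~3,000 false candidates per search
```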

Third, demographic representation is uneven. Algorithms trained primarily on light-skinned subjects perform markedly worse on darker-skinned individuals, producing systemic biases. Reports by the UK’s National Physical Laboratory, the very documents cited to support London’s use of facial recognition, barely account for adolescents and exclude children under twelve altogether, even though minors are routinely caught in street-level scans. This gap leaves the official conclusions incomplete and casts doubt on the legitimacy of deploying the technology on young people.
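A fuller evaluation would report error rates separately for each demographic group rather than quoting a single aggregate. The sketch below shows the idea on a hypothetical evaluation log; the column names and toy values are invented for illustration, and the snippet assumes pandas is available.

```python
# Sketch: disaggregating false match rates by demographic group from a
# hypothetical evaluation log. Column names ("group", "is_impostor",
# "flagged") and the toy rows are invented for illustration.
import pandas as pd

log = pd.DataFrame({
    "group":       ["A", "A", "A", "B", "B", "B", "B", "B"],
    "is_impostor": [True, True, False, True, True, True, False, True],
    "flagged":     [False, True, True, True, False, True, True, True],
})

impostors = log[log["is_impostor"]]                    # trials that should NOT match
fmr_by_group = (impostors.groupby("group")["flagged"]
                         .mean()                       # share wrongly flagged
                         .rename("false_match_rate"))
print(fmr_by_group)
# A single aggregate rate would hide the difference between groups A and B;
# per-group reporting (including for adolescents) makes such gaps visible.
```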

There is now an urgent need to move beyond laboratory trials to independent, large-scale evaluations in real-world conditions. New assessment methods must measure accuracy in crowded spaces, across broad populations, and within diverse demographic groups. Equally critical is the establishment of legally binding minimum accuracy thresholds for applications in sensitive domains such as criminal investigations. Without real-world data and transparent oversight, decisions remain rooted in statistics divorced from reality, perpetuating cases like those of Williams and Thompson.

A study published in May 2025 by criminologists and computer scientists at the University of Pennsylvania added weight to these concerns. The authors demonstrated that as image quality declines, algorithmic accuracy plummets, particularly with blurred frames, unusual angles, or low resolution. Moreover, these errors fall disproportionately on racial and gender minorities, for whom false matches and misidentifications are significantly more likely.

While the researchers note that, on average, facial recognition may outperform some traditional forensic methods, including fingerprint and ballistics analysis, their emphasis lies elsewhere: in practice, image quality becomes the decisive factor, one capable of turning an advanced tool into an instrument of discrimination.

The problems are not purely technical. A 2023 report by the U.S. Government Accountability Office revealed that many American law enforcement agencies deploy facial recognition without adequate staff training or civil-rights policies. The consequences are starkly illustrated in the Algorithmic Justice League’s Comply to Fly? study, which found that the Transportation Security Administration uses facial recognition systems without properly informing passengers. Travelers often remain unaware that they may opt out of scans, and two-thirds of those who attempt to do so face hostility from TSA staff.

Against this backdrop, NIST has issued new recommendations on detecting “morphed” faces: digital composites that blend the features of several individuals in order to evade authentication systems.

A February 2024 report prepared for the Innocence Project by researcher Alexandria Sanford highlighted that confirmed wrongful identifications are already on record: of seven known cases, six involved Black citizens. In 2025, the Electronic Frontier Foundation added two more names to the list of Americans wrongly arrested. Civil rights advocates insist that, regardless of claimed accuracy rates, the very use of facial recognition in policing is too dangerous to be tolerated and must be outlawed.