Scoring Insecurity: Study Reveals Chaos in Vulnerability Ranking Systems
Against the backdrop of the rapidly growing number of vulnerabilities confronting companies worldwide, researchers from the Rochester Institute of Technology, the University of Hawaiʻi, and Leidos have conducted the most extensive comparative study to date of four of the most widely used public vulnerability scoring systems: CVSS, EPSS, SSVC, and Microsoft's Exploitability Index.
The authors analyzed 600 real-world vulnerabilities from Microsoft’s Patch Tuesday disclosures to determine how consistent these systems are with one another, how effectively they handle prioritization, and how accurately they predict the likelihood of exploitation.
The findings proved deeply concerning: all four systems displayed stark divergences in their assessments of the same CVEs. Correlation between them was extremely low—so much so that the perceived severity of a threat often depended entirely on which framework was used. This created a paradoxical situation in which the same vulnerability could be deemed critical by one system and disregarded entirely by another.
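To see what low correlation means in practice, the sketch below shows one way to quantify agreement between the four systems over a shared set of CVEs. The scores are hypothetical placeholders rather than the study's data, and the integer encoding of the ordinal outputs (SSVC decisions, Exploitability Index ratings) is an assumption made for illustration:

import pandas as pd
from scipy.stats import spearmanr

# Hypothetical placeholder scores -- not data from the study. Ordinal outputs
# are mapped to integers, with higher values meaning greater urgency, so that
# a rank correlation can be computed across systems.
scores = pd.DataFrame({
    "cve":  ["CVE-2024-0001", "CVE-2024-0002", "CVE-2024-0003", "CVE-2024-0004"],
    "cvss": [9.8, 7.5, 5.3, 8.1],      # CVSS base score, 0-10
    "epss": [0.02, 0.71, 0.01, 0.12],  # EPSS probability, 0-1
    "ssvc": [1, 3, 0, 2],              # track=0 ... act=3 (assumed encoding)
    "msei": [3, 2, 0, 1],              # Exploitability Index, inverted so higher = exploitation more likely
})

systems = ["cvss", "epss", "ssvc", "msei"]
for i, a in enumerate(systems):
    for b in systems[i + 1:]:
        rho, _ = spearmanr(scores[a], scores[b])
        print(f"{a} vs {b}: Spearman rho = {rho:.2f}")

Low or even negative coefficients across such pairs are exactly the kind of disagreement the study reports.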
In practice, this leads to chaos in decision-making. The systems frequently cluster hundreds of CVEs into the same “top-tier” categories, offering little meaningful differentiation. For example, CVSS and the Exploitability Index classify more than half of the vulnerabilities as high-priority, while EPSS flags only four of them, swinging to the opposite extreme: excessive selectivity and a heightened risk of overlooking dangerous flaws.
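The bucketing problem is easy to reproduce. In the sketch below, the cutoffs (CVSS at or above 7.0 counted as high-priority, EPSS at or above 0.5 counted as likely exploitation) and the scores themselves are illustrative assumptions, not the study's definitions:

from collections import Counter

def cvss_bucket(score: float) -> str:
    # Assumed cutoff: treat anything 7.0 or above as high-priority.
    return "high-priority" if score >= 7.0 else "lower"

def epss_bucket(prob: float) -> str:
    # Assumed cutoff: treat a 50% predicted exploitation probability as high-priority.
    return "high-priority" if prob >= 0.5 else "lower"

# Hypothetical scores for a handful of CVEs (placeholder values, not study data).
cvss = {"CVE-2024-0001": 9.8, "CVE-2024-0002": 8.8, "CVE-2024-0003": 7.5, "CVE-2024-0004": 5.3}
epss = {"CVE-2024-0001": 0.62, "CVE-2024-0002": 0.04, "CVE-2024-0003": 0.01, "CVE-2024-0004": 0.02}

print(Counter(cvss_bucket(s) for s in cvss.values()))  # most CVEs collapse into one big bucket
print(Counter(epss_bucket(p) for p in epss.values()))  # almost nothing clears the bar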
Special attention was given to EPSS as a predictive tool. Despite its stated goal of forecasting the probability of exploitation within 30 days, fewer than 20% of CVEs later known to have been exploited received high EPSS scores before appearing in CISA’s KEV catalog. Moreover, 22% of vulnerabilities had no EPSS score at all until after exploitation was confirmed. This severely undermines its reliability as a preventive instrument.
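Teams that want to sanity-check this against their own backlog can query the public EPSS API for CVEs listed in CISA's KEV catalog, as in the sketch below. Note that it retrieves current EPSS scores, whereas the study compared scores as they stood before KEV listing; the URLs, JSON field names, and the 0.5 "high score" threshold are assumptions that should be verified against the live services:

import requests

KEV_URL = "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"
EPSS_URL = "https://api.first.org/data/v1/epss"

def epss_score(cve_id: str) -> float | None:
    """Return the current EPSS probability for a CVE, or None if it is unscored."""
    data = requests.get(EPSS_URL, params={"cve": cve_id}, timeout=30).json().get("data", [])
    return float(data[0]["epss"]) if data else None

# Pull the KEV catalog and spot-check a few confirmed-exploited CVEs.
kev_entries = requests.get(KEV_URL, timeout=30).json()["vulnerabilities"]
for entry in kev_entries[:10]:
    cve_id = entry["cveID"]
    score = epss_score(cve_id)
    flagged = score is not None and score >= 0.5  # assumed "high score" threshold
    print(f"{cve_id}: EPSS={score}, currently flagged as high: {flagged}")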
SSVC, meanwhile, provides qualitative decision categories such as “track” or “act.” Yet it too presents challenges, as its decisions rely on the difficult-to-compare metric of “impact on mission and well-being,” which hampers cross-organizational consistency.
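The qualitative character of SSVC is easiest to see in code. The sketch below is a heavily simplified illustration of an SSVC-style decision, not the official decision tree; the real methodology walks decision points such as exploitation status, automatability, and mission and well-being impact through a published tree, and the subjective mission-impact input is exactly the hard-to-compare element the study calls out:

from enum import Enum

class Decision(Enum):
    TRACK = "track"
    ATTEND = "attend"
    ACT = "act"

def ssvc_like_decision(active_exploitation: bool, automatable: bool, mission_impact: str) -> Decision:
    """Toy decision logic; mission_impact ('low'|'medium'|'high') is an organizational judgment."""
    if active_exploitation and mission_impact == "high":
        return Decision.ACT
    if active_exploitation or (automatable and mission_impact != "low"):
        return Decision.ATTEND
    return Decision.TRACK

print(ssvc_like_decision(active_exploitation=True, automatable=False, mission_impact="high"))    # Decision.ACT
print(ssvc_like_decision(active_exploitation=False, automatable=True, mission_impact="medium"))  # Decision.ATTEND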
The researchers also examined whether vulnerability types (as categorized by CWE) influenced scoring discrepancies. They found no systematic connection: even within the same CWE, assessments varied widely, underscoring that each system follows its own internal logic and that no universal framework ties them together.
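The shape of that analysis is straightforward to replicate: group CVEs by CWE and look at the within-group spread of each system's scores. The values below are placeholders, not the study's measurements:

import statistics
from collections import defaultdict

# (cve, cwe, cvss, epss) -- hypothetical values for illustration only
rows = [
    ("CVE-2024-0001", "CWE-787", 9.8, 0.62),
    ("CVE-2024-0002", "CWE-787", 7.5, 0.01),
    ("CVE-2024-0003", "CWE-79",  6.1, 0.03),
    ("CVE-2024-0004", "CWE-79",  8.8, 0.44),
]

by_cwe = defaultdict(list)
for _, cwe, cvss, epss in rows:
    by_cwe[cwe].append((cvss, epss))

# A large spread within a single CWE means the weakness type alone does not
# explain how any of the systems score a vulnerability.
for cwe, vals in by_cwe.items():
    cvss_spread = statistics.pstdev(v[0] for v in vals)
    epss_spread = statistics.pstdev(v[1] for v in vals)
    print(f"{cwe}: CVSS stdev={cvss_spread:.2f}, EPSS stdev={epss_spread:.2f}")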
The study makes clear that using any of these systems in isolation, without contextual adaptation to the specific needs of an organization, risks producing misleading priorities. The authors recommend against treating a single framework as an absolute source of truth. Instead, they urge combining multiple indicators, enriched with internal data and organizational policies. In particular, they emphasize the need to distinguish between severity and likelihood of exploitation—two fundamentally different axes requiring different evaluative tools.
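One way to act on that recommendation is to keep the two axes separate and combine them with internal context only at the final step. In the sketch below, the tiers, thresholds, and the asset-criticality flag are illustrative assumptions, not a prescription from the study:

def priority(cvss: float, epss: float, asset_critical: bool) -> str:
    """Combine severity (CVSS), exploitation likelihood (EPSS), and internal context."""
    likely_exploited = epss >= 0.3   # likelihood axis; threshold assumed
    severe = cvss >= 7.0             # severity axis; threshold assumed
    if likely_exploited and (severe or asset_critical):
        return "patch now"
    if severe and asset_critical:
        return "patch this cycle"
    if likely_exploited or severe:
        return "schedule"
    return "track"

print(priority(cvss=9.8, epss=0.02, asset_critical=False))  # severe but unlikely to be exploited -> schedule
print(priority(cvss=6.5, epss=0.70, asset_critical=True))   # likely exploited on a critical asset -> patch now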
Ultimately, the research underscores the necessity of rethinking the entire approach to vulnerability scoring. Modern organizations require not a solitary numerical value, but transparent, interpretable, and contextually adaptable tools that account for real-world exploitation conditions, the criticality of assets, and business logic. Only then can a truly effective and trustworthy vulnerability management process be achieved.