A study by scholars at North Carolina State University (NCSU) found that some GitHub repositories leak API tokens and cryptographic keys. The researchers analyzed more than one billion files spread across millions of GitHub repositories. Using the GitHub Search API, they captured and analyzed 4,394,476 files from 681,784 repositories, and they also examined 23,237,633,353 files from 3,374,973 repositories recorded in Google’s BigQuery database.
The researchers searched these files for strings matching the formats of specific API tokens and cryptographic keys. They found 575,456 API and cryptographic keys, 201,642 of them unique, spread across more than 100,000 GitHub projects. There was little overlap between the keys found through the GitHub Search API and those found in the Google BigQuery dataset. The researchers also tracked how long the leaked keys stayed public: 6% were deleted within an hour of the leak, more than 12% within a day, and after 16 days, 19% had been removed.
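The format-matching step can be illustrated with a regular-expression scan. The Python below is a minimal sketch, not the researchers' code: the three patterns (AWS access key IDs, Google API keys, and RSA private key headers) are assumed examples of well-known key formats rather than the study's actual expressions, and the function `scan_files` and the toy file contents are hypothetical. It also shows why a total count of keys and a count of unique keys can differ: a key copied into several files is matched once per file.

```python
import re

# Illustrative patterns for a few publicly documented key formats.
# These are assumptions for the sketch, not the study's exact regexes.
KEY_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "google_api_key": re.compile(
        r"(?<![0-9A-Za-z_\-])AIza[0-9A-Za-z_\-]{35}(?![0-9A-Za-z_\-])"
    ),
    "rsa_private_key": re.compile(r"-----BEGIN RSA PRIVATE KEY-----"),
}


def scan_files(files):
    """Scan a {path: text} mapping for strings that match KEY_PATTERNS.

    Returns (hits, unique): hits lists every (path, kind, value) match,
    so a key copied into several files is counted each time; unique keeps
    each distinct (kind, value) pair once.
    """
    hits = []
    for path, text in files.items():
        for kind, pattern in KEY_PATTERNS.items():
            for match in pattern.finditer(text):
                hits.append((path, kind, match.group(0)))
    unique = {(kind, value) for _, kind, value in hits}
    return hits, unique


# Toy repository contents (hypothetical). The AWS value is the example key ID
# used in AWS documentation; the Google value is made up to fit the format.
demo_google_key = "AIza" + "A1b2C3d4E5" * 3 + "F6g7H"
files = {
    "config.py": 'AWS_KEY = "AKIAIOSFODNN7EXAMPLE"\n',
    "deploy.sh": "export AWS_KEY=AKIAIOSFODNN7EXAMPLE\n",
    "app.js": f'const apiKey = "{demo_google_key}";\n',
    "README.md": "No credentials here.\n",
}

hits, unique = scan_files(files)
print(len(hits), "matches,", len(unique), "unique")  # 3 matches, 2 unique
```

Pattern matching alone also flags placeholder and test values like the ones above, so a real scanner would need extra filtering before counting a match as a genuine leak.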
The researchers said: “this also means 81% of the secrets we discover were not removed. It is likely that the developers for this 81% either do not know the secrets are being committed or are underestimating the risk of compromise.”