Google announces open-source robots.txt parser
Google notes that, although the Robots Exclusion Protocol (REP) has served as a de facto standard for the past 25 years, it never became an official one, which has left webmasters and crawler tool developers with a lot of uncertainty. Google has now announced that it will lead the effort to make the REP an official internet standard. As part of this effort, it has open-sourced the robots.txt parser it uses in its own systems. The source code is hosted on GitHub under the Apache License 2.0.
“robots.txt” by JeremyGrosser is licensed under CC BY-NC-SA 2.0
Google wrote:
“Today, we announced that we’re spearheading the effort to make the REP an internet standard. While this is an important step, it means extra work for developers who parse robots.txt files.”
“We open sourced the C++ library that our production systems use for parsing and matching rules in robots.txt files. This library has been around for 20 years and it contains pieces of code that were written in the 90’s. Since then, the library evolved; we learned a lot about how webmasters write robots.txt files and corner cases that we had to cover for, and added what we learned over the years also to the internet draft when it made sense.”
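To make "parsing and matching rules in robots.txt files" more concrete, here is a minimal Python sketch of the core matching logic described in the REP internet draft. Rules are grouped by user agent, `*` acts as a wildcard, a trailing `$` anchors the end of the path, and the longest (most specific) matching rule wins, with `Allow` preferred when an `Allow` and a `Disallow` rule are equally long. This is only an illustration. It is not Google's C++ library or its API, and the names `Group`, `parse` and `is_allowed` are hypothetical. It also glosses over many of the corner cases the production parser handles. For example, it compares user-agent names by exact case-insensitive equality and measures rule length in characters rather than octets.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Group:
    """One user-agent group: the agent names it lists and its (is_allow, pattern) rules."""
    agents: list[str] = field(default_factory=list)
    rules: list[tuple[bool, str]] = field(default_factory=list)


def parse(robots_txt: str) -> list[Group]:
    """Split a robots.txt body into groups: consecutive User-agent lines followed by rules."""
    groups: list[Group] = []
    current = None
    last_was_agent = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # A User-agent line after a rule line starts a new group.
            if current is None or not last_was_agent:
                current = Group()
                groups.append(current)
            current.agents.append(value.lower())
            last_was_agent = True
        elif key in ("allow", "disallow") and current is not None:
            last_was_agent = False
            if value:  # an empty rule path matches nothing, so it is skipped
                current.rules.append((key == "allow", value))
        # Other records (e.g. Sitemap) are ignored in this sketch.
    return groups


def _pattern_matches(pattern: str, path: str) -> bool:
    """Match a rule against a URL path: '*' is a wildcard, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    # Unanchored rules are prefix matches; anchored rules must match the whole path.
    return (re.fullmatch if anchored else re.match)(regex, path) is not None


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if `user_agent` may fetch `path` (path plus query) under `robots_txt`."""
    groups = parse(robots_txt)
    agent = user_agent.lower()
    # Combine every group naming this agent; fall back to the '*' groups otherwise.
    matching = [g for g in groups if agent in g.agents] or [g for g in groups if "*" in g.agents]
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for g in matching:
        for is_allow, pattern in g.rules:
            if _pattern_matches(pattern, path):
                length = len(pattern)
                # The longest match wins; on a tie, Allow beats Disallow.
                if length > best_len or (length == best_len and is_allow):
                    best_len, allowed = length, is_allow
    return allowed


if __name__ == "__main__":
    ROBOTS = "User-agent: *\nDisallow: /private/\nAllow: /private/public.html\n"
    print(is_allowed(ROBOTS, "SomeBot", "/private/secret.html"))  # False
    print(is_allowed(ROBOTS, "SomeBot", "/private/public.html"))  # True
```

The longest-match rule is what lets `Allow: /private/public.html` carve an exception out of `Disallow: /private/`, regardless of the order in which the two lines appear in the file.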