Due to security flaws, Google’s TensorFlow project completely abandons support for YAML

TensorFlow, the open-source machine learning and artificial intelligence project maintained by Google, has dropped support for YAML because continuing to support it poses security risks. In the latest version, Google removes YAML support to fix an untrusted deserialization vulnerability that allows code execution. The vulnerability is tracked as CVE-2021-37678, is rated high severity with a CVSS score of 9.3, and was reported to Google by researcher Arjun Shibu.

YAML is a human-readable data serialization format. Researchers found that TensorFlow code called the yaml.unsafe_load() function. By exploiting this flaw, an attacker can execute arbitrary code when an application deserializes a Keras model supplied in YAML format. Deserialization vulnerabilities typically arise when applications read malformed or malicious data from untrusted sources. The deserialization vulnerability in TensorFlow may lead to denial of service (DoS). Worse, it can even be used to execute arbitrary code, which is why its CVSS score reaches 9.3 out of 10.

The unsafe_load function deserializes YAML data with very few restrictions and resolves all tags, including those known to be unsafe. Ideally, unsafe_load should only be called on input from trusted sources that contains no malicious content. If the input is not trusted, however, an attacker can inject a malicious payload into the YAML data before it is deserialized and use the deserialization mechanism to run code of their choosing.
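The difference between the two loaders can be illustrated with a minimal sketch. It assumes the PyYAML package is installed; the payload string and the helper name load_untrusted are hypothetical and chosen only for illustration, not taken from the TensorFlow code base. The sketch only calls yaml.safe_load, which builds plain data types and rejects Python-specific tags rather than acting on them.

```python
import yaml  # PyYAML, assumed to be installed

# Hypothetical malicious payload: the python/object/apply tag asks the loader
# to call an arbitrary Python callable while parsing.
MALICIOUS_YAML = '!!python/object/apply:os.system ["echo pwned"]'


def load_untrusted(text):
    """Parse YAML with the safe loader, which only builds plain data types."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        # safe_load refuses Python-specific tags instead of executing them.
        raise ValueError(f"rejected untrusted YAML: {exc}") from exc


if __name__ == "__main__":
    # Ordinary data loads normally.
    print(load_untrusted("layers: [dense, dropout]"))
    # The tagged payload is rejected with an error, and nothing is executed.
    try:
        load_untrusted(MALICIOUS_YAML)
    except ValueError as err:
        print(err)
```

Passing the same payload to yaml.unsafe_load would instead invoke the named callable, which is the behavior the vulnerability relies on.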

After the researcher notified Google of the vulnerability, the TensorFlow maintainers decided to abandon YAML entirely and switch to JSON deserialization. The maintainers also cited examples of other vulnerabilities caused by YAML and their fixes. They recommend that developers use JSON deserialization instead of YAML or, as a better alternative, H5 serialization.
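As a rough sketch of why JSON is the safer interchange format here, the example below round-trips a model description using Python's standard json module. The config dictionary is a hypothetical, simplified stand-in and not the real Keras schema; save_config and load_config are illustrative names. The point is that the JSON parser only produces plain dictionaries, lists, strings, numbers, booleans and None, and has no tag mechanism for naming code to run.

```python
import json

# Hypothetical, simplified model description; not the real Keras schema.
config = {
    "class_name": "Sequential",
    "layers": [{"class_name": "Dense", "units": 8}],
}


def save_config(cfg):
    """Serialize the description to a JSON string."""
    return json.dumps(cfg, indent=2)


def load_config(text):
    """Parse JSON and check that the top level is a plain dictionary."""
    cfg = json.loads(text)
    if not isinstance(cfg, dict):
        raise ValueError("model description must be a JSON object")
    return cfg


if __name__ == "__main__":
    restored = load_config(save_config(config))
    print(restored == config)
```

Any code that later turns such a description into live objects still needs to validate the names it contains; the sketch covers only the parsing step.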

It is worth noting that TensorFlow is not the only project that uses the YAML unsafe_load function. A search on GitHub shows that a large number of Python projects call this unsafe function. Given the potential security issues, these projects should fix the problem promptly, and developers who depend on them should also pay attention to security.
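Developers who want to check their own code for such calls could start from a small scanner like the one below. It is a minimal sketch using only the standard library ast module; the function name find_unsafe_yaml_calls and the set of flagged names are assumptions made for illustration. It matches on the called name alone, so it may report unrelated functions that happen to share the name.

```python
import ast

# Names flagged by this sketch; both are PyYAML loaders that resolve all tags.
UNSAFE_NAMES = {"unsafe_load", "unsafe_load_all"}


def find_unsafe_yaml_calls(source, filename="<string>"):
    """Return sorted (line, name) pairs for calls such as yaml.unsafe_load(...)."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Attribute):
            name = func.attr  # e.g. yaml.unsafe_load
        elif isinstance(func, ast.Name):
            name = func.id  # e.g. from yaml import unsafe_load
        else:
            continue
        if name in UNSAFE_NAMES:
            hits.append((node.lineno, name))
    return sorted(hits)


if __name__ == "__main__":
    sample = (
        "import yaml\n"
        "cfg = yaml.unsafe_load(data)\n"
        "ok = yaml.safe_load(data)\n"
    )
    for line, name in find_unsafe_yaml_calls(sample):
        print(f"line {line}: call to {name}")
```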

TensorFlow is expected to resolve the vulnerability in version 2.6.0 by removing YAML support. The fix will also be backported to the earlier release lines as versions 2.5.1, 2.4.3, and 2.3.4. Developers who use this project should upgrade to a fixed version promptly.