Claude Finds Vulnerabilities in Firefox
In a post, the AI startup detailed its partnership with Mozilla and how Claude Opus 4.6 found the vulnerabilities. The company said the frontier model was able to solve nearly all the tasks in the CyberGym benchmark, so the researchers decided to test it in real-world scenarios. Since browsers tend to contain a high concentration of technically complex vulnerabilities, Anthropic decided to partner with Mozilla.
“We chose Firefox because it’s both a complex codebase and one of the most well-tested and secure open-source projects in the world. This makes it a harder test of AI’s ability to find novel security vulnerabilities than the open-source software we previously used to test our models,” the researchers said.
To prepare Claude Opus 4.6, the team built a dataset of older Firefox common vulnerabilities and exposures (CVEs) to see if the model could reproduce them. The large language model (LLM) reproduced a large percentage of the historical CVEs. Then, the researchers tasked the model with finding new vulnerabilities in the then-latest version of the browser.
At first, the exercise focused only on Firefox’s JavaScript engine, but other areas of the software were included later. By the time the experiment came to an end, Claude had analysed nearly 6,000 C++ files and had submitted a total of 112 unique reports. Each of these reports was validated by the team and submitted to Mozilla. Claude was able to find 22 vulnerabilities, 14 of which were classified as high-severity.
Anthropic revealed that most of the reported issues were fixed by Mozilla with the Firefox 148 update, and the remainder are expected to be fixed in upcoming releases. The browser company has also started using Claude internally for security purposes. The researchers said that they spent $4,000 (roughly Rs. 3,69,200) in application programming interface (API) credits for this experiment.