Anthropic researchers say Claude Opus 4.6 showed unusual behaviour during a BrowseComp evaluation. The model suspected it was being tested, identified the benchmark online, and wrote code to decrypt ...