The world of cybersecurity is undergoing a quiet revolution, driven by steady advances in artificial intelligence and machine learning. Researchers at the UK AI Security Institute (AISI) have been tracking the progress of large language models (LLMs) on cybersecurity tasks, and the results are striking. These models are getting better at work that normally requires human professionals, and they are improving faster than AISI had expected.
One of the key metrics AISI uses is the "time window benchmark for cybersecurity." This benchmark estimates how long a task an AI model can complete at a given success rate, where task length is measured by the time a human expert would need. The findings are eye-opening. For instance, Claude Sonnet 4.5 can complete tasks that would take a human cybersecurity expert 16 minutes, succeeding about 80% of the time with a token limit of 2.5 million. This is a significant improvement on previous estimates, and it's happening fast.
The human-comparable task time, as measured by AISI, is growing rapidly. If token limits were removed, AI models might do even better. This rapid progress has led AISI to adjust its expectations. In February 2026, it cut its estimate of how long that task time takes to double from 8 months to 4.7 months, based on the progress made since late 2024. With the release of Anthropic's Mythos Preview and OpenAI's GPT-5.5, AISI had to recalculate again, and the news is even more impressive.
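To see what the change in doubling period means in practice, here is a minimal sketch that projects a task time forward under steady exponential growth. It assumes the 16-minute figure quoted earlier as a starting point and a 12-month window, both chosen only for illustration; the function name is hypothetical, and AISI's own estimation method is not described here.

```python
def projected_task_minutes(start_minutes: float, months: float, doubling_months: float) -> float:
    """Task time after `months`, assuming it doubles every `doubling_months`."""
    return start_minutes * 2 ** (months / doubling_months)


START_MINUTES = 16.0  # illustrative starting point taken from the Sonnet 4.5 figure
HORIZON_MONTHS = 12   # assumed projection window, not an AISI forecast

for doubling in (8.0, 4.7):
    result = projected_task_minutes(START_MINUTES, HORIZON_MONTHS, doubling)
    print(f"doubling every {doubling} months: about {result:.0f} minutes after {HORIZON_MONTHS} months")
```

Under these assumptions, an 8-month doubling period gives roughly 45 minutes after a year, while a 4.7-month doubling period gives roughly 94 minutes. That gap is why a seemingly small revision to the doubling period matters.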
The new doubling time estimate is shorter than 4.7 months, and AISI points to similar findings from other AI evaluation groups, like METR, which tracks the same kind of trend on software engineering tasks. These models are not just getting faster; they're also becoming more versatile. For example, the latest Mythos Preview checkpoint solved a complex 32-step simulated corporate network attack. It also completed a previously unsolved challenge, a seven-step industrial control system attack, in three out of ten attempts.
To put this into perspective, an earlier model, Opus 4.6, evaluated in February 2026, could complete only 22 of the 32 steps in the same simulated corporate network attack. This demonstrates the rapid evolution of AI capabilities in cybersecurity.
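The three-in-ten result on the industrial control system challenge comes from a small number of attempts, so the underlying success rate is not pinned down tightly. The sketch below computes a 95% Wilson score interval for 3 successes in 10 trials. Choosing the Wilson interval is an assumption made here for illustration, not a method attributed to AISI.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half_width, centre + half_width


low, high = wilson_interval(3, 10)
print(f"3 of 10 attempts: plausible success rate from {low:.2f} to {high:.2f}")
```

With only ten attempts, the interval runs from about 0.11 to about 0.60, so the same data is compatible with a model that rarely succeeds and with one that succeeds more often than not.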
However, it's important to note that these benchmarks are narrow assessments, focusing on specific tasks rather than a broad range of cybersecurity capabilities. The real-world implications are still uncertain. The curl project, for instance, reported only one confirmed vulnerability found in its codebase by the Mythos model, which is a significant achievement but also a reminder that AI is not infallible.
In conclusion, the rapid progress in AI-powered cybersecurity is both exciting and concerning. While these models are becoming increasingly capable, the pace of their development and their real-world impact remain uncertain. As AISI points out, we need to continue monitoring and understanding these advancements to ensure that they enhance, rather than replace, human expertise in cybersecurity.