The Pentagon is eager to incorporate AI “agents”—software that can autonomously execute complex tasks like customer service, scheduling, or code writing—into more of what soldiers and defense civilians do. But a growing body of research shows that agents built from well-known large language models exhibit unpredictable and dangerous behaviors even in benign settings.
Edgerunner AI, a veteran-founded startup, built a different kind of agent tool for the military, one trained by former operators and experts on actual military tasks and in real combat settings. Wednesday's release of the new tool, WarClaw, is part of a trend away from large, big-name frontier models toward smaller, more customized ones that offer users more control.
Public interest in what's called "agentic AI" rose 6,100 percent between October 2024 and October 2025. Demand for software that can autonomously achieve complex tasks by designing and implementing processes and then fine-tuning the results without continuous prompting is forecast to rise from $4 billion last year to more than $100 billion by 2030.
The Defense Department is moving on the trend early. In January, as part of its AI strategy rollout, it announced development of an “Agent Network” to build “AI-enabled battle management and decision support, from campaign planning to kill chain execution” and to build a “playbook for rapid and secure AI agent development and deployment” for business processes.
Edgerunner's AI agent, WarClaw, "searches and analyzes databases, interprets intelligence reports, pulls relevant information from the web, drafts documents and briefings, and automates routine processes. Integrations include Microsoft PowerPoint, Word, Excel, Teams, Outlook, and more," according to a company statement. But it differs markedly from comparable tools made by well-known model makers like Anthropic, xAI, or OpenAI.
Tyler Xuan Saltsman, the founder of Edgerunner AI, told Defense One in an exclusive interview that agents built from such companies' models pose a particular risk to the military, a claim backed by recent scholarship.
In March, researchers from Harvard, MIT, and other institutions found that agents built from Anthropic's Claude or Kimi and run in OpenClaw (agent software that works with large language models) exhibited "unauthorized compliance with non-owners, disclosure of sensitive information, execution of destructive system-level actions, denial-of-service conditions, uncontrolled resource consumption, identity spoofing vulnerabilities, cross-agent propagation of unsafe practices, and partial system takeover."
AI agents also offer an "illusion of control," which is particularly dangerous in military contexts, according to a March paper from Cornell University. The researchers found that agentic systems can misinterpret instructions, absorb corrections, or resist assessments in ways that military planners and monitors can't see, because the processes to expose those failures don't exist. "A waypoint-following drone cannot misinterpret an instruction; a pre-programmed targeting system cannot absorb a correction; a conventional sensor network cannot resist an operator's assessment. Agentic systems can do all of these things, and current governance frameworks have no mechanisms for detecting, measuring, or responding to these failures," the authors write.
AI agents derived from the most well-known large language models also balk at following orders from military commanders, rejecting commands some 98 percent of the time, according to a paper Saltsman co-wrote. He told Defense One that his experience working with those models compelled him to adopt an entirely different approach to model development.
WarClaw
Products from OpenAI, Anthropic, and xAI represent only one way to develop high-functioning models: a method built on harvesting huge amounts of data from the internet, then training and running the models in large, energy-intensive data centers. Because these models are mostly consumer-facing, they're designed to keep users asking questions, contributing prompts and data, and looking at advertisements. Those incentives help explain why the models respond to users with flattery and reassurance, even when users are wrong.
Saltsman said the chronic inclination toward sycophancy in popular large language models is a national security problem, broadly. But for the military, it’s an even bigger risk.
So Edgerunner AI took a different tack. The company uses large enterprise cloud resources to train its models, but the models can run on premises with no internet connection, which is essential for military operations. That gives users far more control over how much (or how little) energy goes into them. And unlike better-known large language models, Edgerunner's are trained on a highly curated, military-specific data set by trainers who include military subject-matter experts and former operators.
Most importantly, he said, the agents are designed to run autonomously, saving the operator time and attention. But autonomous does not mean unsupervised: the models can't simply pick whatever strategy they like to complete a task without the operator's permission, and their processes are designed to be auditable and transparent, as opposed to the opaque functioning of other models.
That approach has caught the attention of military users. The company has contracts and cooperative research and development agreements with the Kennedy Special Warfare Center and School, which trains special forces groups, and with Special Operations Command. It is working with the Navy to integrate its software onto submarines and warships via the Interagency Intelligence and Cyber Operations Network, and with Lockheed Martin and the Army on the Next Generation Command and Control system.
The attributes that distinguish its approach to model building (curated data, communication independence, and control over process) are also what civilian users increasingly want from AI, according to surveys. That suggests future dual-use potential for the company in an environment where both consumers and warfighters are looking for alternative futures for AI.