I keep coming back to the same thing with agents: the model is not the product. The runtime is.
Give a model tools and it will do something interesting. Getting a system you can trust is different work. You have to decide what state exists before the next call, what the model is allowed to see, what actions it can take, and what evidence has to be observed before the workflow moves forward.
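Here is a rough sketch of those four decisions, assuming a made-up refund workflow. Every name in it (`WorkflowState`, `ALLOWED_ACTIONS`, `REQUIRED_EVIDENCE`, `model_view`, `can_advance`) is invented for illustration and not taken from any particular framework.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class WorkflowState:
    """State that exists before the next model call (hypothetical fields)."""
    step: str = "collect_order"
    order_id: Optional[str] = None
    customer_email: str = ""  # held by the runtime, never shown to the model
    evidence: Set[str] = field(default_factory=set)


# Which actions the model may take at each step (illustrative values).
ALLOWED_ACTIONS = {
    "collect_order": {"lookup_order"},
    "review_refund": {"draft_refund", "ask_human"},
}

# Which evidence must have been observed before a step can be left.
REQUIRED_EVIDENCE = {
    "collect_order": {"order_found"},
    "review_refund": {"order_found", "amount_verified"},
}


def model_view(state: WorkflowState) -> dict:
    """Build the only slice of state the model is allowed to see."""
    return {
        "step": state.step,
        "order_id": state.order_id,
        "allowed_actions": sorted(ALLOWED_ACTIONS[state.step]),
    }


def can_advance(state: WorkflowState) -> bool:
    """The workflow moves forward only when the required evidence is present."""
    return REQUIRED_EVIDENCE[state.step] <= state.evidence
```

In this sketch the model never decides whether the workflow advances. It only sees `model_view(state)`, and the runtime checks `can_advance(state)` against evidence it recorded itself.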
The parts I care about are boring in the right way: state machines, typed actions, validators, trace logs, escalation paths. The model can propose, classify, draft, or choose between bounded actions. The system around it owns the things that have to stay true.
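To make that concrete, here is a minimal sketch of a runtime step with typed actions, a validator, a trace log, and an escalation path. The refund domain, the `MAX_AUTO_REFUND_CENTS` limit, and all the type and function names are assumptions for the example, not a reference design.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Union


class Status(Enum):
    PENDING = "pending"
    REFUNDED = "refunded"
    ESCALATED = "escalated"


@dataclass(frozen=True)
class IssueRefund:
    amount_cents: int


@dataclass(frozen=True)
class Escalate:
    reason: str


# The model can only propose one of these bounded, typed actions.
Action = Union[IssueRefund, Escalate]

MAX_AUTO_REFUND_CENTS = 5_000  # hypothetical policy limit


def validate(action: Action, paid_cents: int) -> List[str]:
    """Return the list of invariants the proposed action would violate."""
    errors: List[str] = []
    if isinstance(action, IssueRefund):
        if action.amount_cents <= 0:
            errors.append("refund must be positive")
        if action.amount_cents > paid_cents:
            errors.append("refund exceeds amount paid")
        if action.amount_cents > MAX_AUTO_REFUND_CENTS:
            errors.append("refund exceeds auto-approval limit")
    return errors


def step(status: Status, proposed: Action, paid_cents: int, trace: list) -> Status:
    """Apply one model proposal; the runtime, not the model, picks the next state."""
    if status is not Status.PENDING:
        raise ValueError("workflow already finished")
    errors = validate(proposed, paid_cents)
    # Every proposal is logged, whether or not it is accepted.
    trace.append({"proposed": repr(proposed), "errors": errors})
    if errors or isinstance(proposed, Escalate):
        return Status.ESCALATED  # invalid or uncertain proposals go to a human
    return Status.REFUNDED
```

With this shape, a proposal like `IssueRefund(9_000)` against an order that paid 3,000 cents fails validation, lands in the trace, and moves the workflow to `Status.ESCALATED` instead of issuing a refund.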
Models are going to be wrong sometimes. What matters is whether the platform around the model can still enforce the properties you care about when that happens.