Why LLMs Still Fail at Real-World Mathematical Optimization

The short answer

LLMs still fail at real-world mathematical optimization because most business optimization problems are not fully described in the prompt. The missing pieces are usually the most important ones: implicit constraints, messy data, unspoken business rules, service commitments, cost tradeoffs, and the managerial judgment needed to decide what should be optimized in the first place.

A model can generate a Gurobi, Pyomo, or OR-Tools script in seconds. That is useful. But in enterprise operations, the script is rarely the bottleneck.

Optimization is not a code generation problem. It is a disciplined translation of business reality into mathematical structure, executable data, and accountable decisions.

That distinction matters for supply chain leaders, CFOs, COOs, and analytics teams. A beautiful model that solves the wrong problem is not an innovation. It is a faster way to produce operational risk.

The demo is easy. The enterprise problem is not.

The classic AI optimization demo is seductive. A user writes a short description:

"We have warehouses, customers, shipping costs, and demand. Minimize total cost."

The model returns Python code with decision variables, an objective function, and demand constraints. The solver runs. A neat allocation appears.
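
For reference, the formulation behind such a demo is usually close to the textbook transportation problem. The symbols below are illustrative assumptions and do not come from any particular tool: x_ij is the quantity shipped from warehouse i to customer j, c_ij is the unit shipping cost, s_i is the supply available at warehouse i, and d_j is the demand of customer j.

```latex
\begin{aligned}
\min_{x} \quad & \sum_{i \in W} \sum_{j \in C} c_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_{i \in W} x_{ij} \ge d_j && \forall j \in C \\
& \sum_{j \in C} x_{ij} \le s_i && \forall i \in W \\
& x_{ij} \ge 0 && \forall i \in W,\ j \in C
\end{aligned}
```

Nothing in this formulation knows about calendars, prohibited lanes, or service commitments unless someone adds them explicitly.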

The issue is that real businesses do not operate inside textbook examples. A distribution network may include hundreds of facilities, thousands of SKUs, multi-period demand, capacity calendars, temperature zones, carrier constraints, minimum order quantities, labor rules, customs limitations, penalty clauses, and exceptions that everyone in operations knows but nobody documented formally.

The model may ask for shipping costs. The company may only have GPS coordinates, historical invoices, and carrier rate cards. The model may need monthly demand by SKU and region. The company may have transaction-level sales lines, returns, substitutions, promotions, and stockout distortions. The model may assume all routes are available. The logistics team knows that three of them are commercially impossible during peak season.
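
A minimal sketch of that data gap, assuming a hypothetical transaction layout (the field order, the "kind" flag, and all values are invented for illustration): the solver wants monthly demand by SKU and region, while the source system only offers individual sales and return lines.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction lines: (order_date, sku, region, quantity, kind).
# "kind" is an assumed field that distinguishes sales from returns.
transactions = [
    (date(2024, 1, 5), "SKU-A", "North", 120, "sale"),
    (date(2024, 1, 19), "SKU-A", "North", 30, "sale"),
    (date(2024, 1, 22), "SKU-A", "North", 10, "return"),
    (date(2024, 2, 3), "SKU-A", "North", 80, "sale"),
    (date(2024, 1, 11), "SKU-B", "South", 45, "sale"),
]


def monthly_net_demand(lines):
    """Aggregate transaction lines into net demand per (month, sku, region)."""
    demand = defaultdict(int)
    for order_date, sku, region, quantity, kind in lines:
        month = order_date.strftime("%Y-%m")
        sign = -1 if kind == "return" else 1  # returns reduce net demand
        demand[(month, sku, region)] += sign * quantity
    return dict(demand)


demand = monthly_net_demand(transactions)
for key in sorted(demand):
    print(key, demand[key])
```

Even this simple aggregation embeds a business assumption, namely that returns should be netted against demand in the month they occur. It does not correct for promotions, substitutions, or stockouts, which is exactly the kind of judgment a prompt alone cannot supply.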

This is why prompt-only optimization breaks down. The prompt does not contain the problem. The organization does.

Why one prompt is not enough

The common failure pattern is simple: the LLM assumes that the text it receives contains the full optimization problem. In professional environments, it almost never does.

Several gaps appear immediately:

The objective function is vague or politically contested.
Key constraints are known by people, not systems.
Operational data exists, but not in solver-ready form.
The mathematical formulation requires assumptions that were never approved.
The generated code may run but produce decisions that violate business reality.
The output is hard to audit, explain, or defend in a management meeting.

A warehouse allocation model, for example, may minimize transport cost while ignoring service levels. A production planning model may maximize throughput while ignoring changeover time. A route optimization model may reduce distance while ignoring delivery windows, driver rules, and customer priority.

These are not minor bugs. They are formulation failures.
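
A toy illustration of a formulation failure, under invented data: the lane names, costs, transit times, and the three-day service commitment are all assumptions. The same selection logic gives a different answer once the service rule is part of the problem.

```python
# Hypothetical lanes serving one customer; costs and transit times are made up.
lanes = {
    "Plant-East": {"unit_cost": 4.0, "transit_days": 5},
    "Plant-West": {"unit_cost": 6.5, "transit_days": 2},
}
MAX_TRANSIT_DAYS = 3  # assumed service commitment to the customer


def cheapest_source(lanes, max_transit_days=None):
    """Pick the lowest-cost lane, optionally enforcing a transit-time limit."""
    candidates = [
        name for name, lane in lanes.items()
        if max_transit_days is None or lane["transit_days"] <= max_transit_days
    ]
    if not candidates:
        raise ValueError("no lane satisfies the service commitment")
    return min(candidates, key=lambda name: lanes[name]["unit_cost"])


cost_only_choice = cheapest_source(lanes)
service_aware_choice = cheapest_source(lanes, MAX_TRANSIT_DAYS)
print(cost_only_choice, service_aware_choice)
```

Both calls run without error. Only one of them respects the commitment, and the code cannot tell which one the business intended.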

The real work: turning operations into mathematics

Mathematical optimization has always required translation. AI does not remove that requirement. It can accelerate the translation when the process is designed correctly.

A serious optimization workflow must answer several questions before code appears:

What decision is actually being made?
Who owns the decision and who is affected by it?
What is the primary objective: cost, revenue, service, resilience, speed, or risk reduction?
Which constraints are hard constraints and which are negotiable penalties?
What data is required, and where does it live?
Which parameters must be derived from raw operational data?
How will the solution be validated against reality?
Which human approvals are needed before execution?
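
The distinction between hard constraints and negotiable penalties changes the formulation itself. As a sketch with assumed symbols (x_ij shipment quantity, c_ij unit cost, s_i facility capacity, d_j customer demand, u_j unmet demand, p_j the penalty per unit of unmet demand), capacity stays hard while demand satisfaction becomes soft:

```latex
\begin{aligned}
\min_{x,\,u} \quad & \sum_{i} \sum_{j} c_{ij}\, x_{ij} + \sum_{j} p_j\, u_j \\
\text{s.t.} \quad & \sum_{i} x_{ij} + u_j = d_j && \forall j \quad \text{(soft: shortfall } u_j \text{ is penalized)} \\
& \sum_{j} x_{ij} \le s_i && \forall i \quad \text{(hard: capacity)} \\
& x_{ij} \ge 0,\quad u_j \ge 0 && \forall i, j
\end{aligned}
```

Choosing the penalty p_j is a business decision, not a modeling detail: it states how much cost the company will accept to avoid one unit of unmet demand.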

This is where deep AI knowledge and deep business knowledge meet. AI in operations is not a purely technical field. It combines applied mathematics, software engineering, process design, domain expertise, management experience, and organizational change.

The best practitioners are rarely people who only know how to prompt a model. They understand how decisions flow through a business.

ORPilot points to the right pattern

A more mature direction is emerging through agentic optimization systems. ORPilot, an open-source project developed by Guangrui Xie, is a useful example because it changes the sequence of work.

Instead of jumping straight from prompt to solver code, it behaves more like a structured optimization consultant. It interviews the user, clarifies the objective, identifies decision variables, defines constraints, determines required data schemas, generates code, executes the model, repairs failures, and explains results in business language.

That sounds less magical than a one-shot answer. It is also far more useful.

The important idea is not a specific tool. The important idea is the operating model:

Clarify the business decision.
Define the mathematical formulation.
Specify the data structure.
Transform raw data into parameters.
Generate solver code.
Run in a controlled environment.
Validate feasibility and business logic.
Explain the recommendation.
Route exceptions to a qualified human.

This is the difference between a chatbot and a professional system.
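
The validation step can be sketched as a plain check of the solver output against rules the business already knows. The data layout, facility and customer names, and the prohibited lane below are hypothetical.

```python
# Hypothetical solver output: shipped quantity per (facility, customer) lane.
solution = {("F1", "C1"): 60, ("F2", "C1"): 40, ("F1", "C2"): 70}
demand = {"C1": 100, "C2": 80}
capacity = {"F1": 150, "F2": 50}
prohibited_lanes = {("F2", "C1")}  # assumed rule known to the logistics team


def validate(solution, demand, capacity, prohibited_lanes):
    """Return a list of human-readable rule violations (empty if none)."""
    issues = []
    for customer, required in demand.items():
        shipped = sum(q for (_, c), q in solution.items() if c == customer)
        if shipped < required:
            issues.append(f"demand shortfall at {customer}: {shipped} < {required}")
    for facility, limit in capacity.items():
        used = sum(q for (f, _), q in solution.items() if f == facility)
        if used > limit:
            issues.append(f"capacity exceeded at {facility}: {used} > {limit}")
    for lane, quantity in solution.items():
        if quantity > 0 and lane in prohibited_lanes:
            issues.append(f"prohibited lane used: {lane[0]} -> {lane[1]}")
    return issues


issues = validate(solution, demand, capacity, prohibited_lanes)
for issue in issues:
    print(issue)
```

A non-empty list is what gets routed to a qualified human. The checks are deliberately independent of the solver code, so a formulation error does not silently validate itself.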

A better agent does not guess. It interrogates.

For optimization use cases, an AI agent should be slightly uncomfortable to work with at the beginning. It should ask questions. It should refuse to solve an underdefined problem. It should surface missing assumptions rather than quietly invent them.

A good interaction may look like this:

Goal: minimize total fulfillment cost
Decisions: shipment quantity from each facility to each customer
Required data: demand, facility capacity, shipping cost, service level rules
Hard constraints: demand satisfaction, capacity, prohibited lanes
Soft constraints: late delivery penalties, preferred facility usage
Human review: approve objective and constraints before solver execution
Validation: compare solution against historical operating rules

This structure is not bureaucracy. It is quality control.
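
One way to make that structure enforceable is to treat it as an intake record that blocks solver execution until it is complete. This is a sketch only: the class, its fields, and the question wording are hypothetical and are not taken from ORPilot or any other tool.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemSpec:
    """Hypothetical intake record an agent fills in while interviewing the user."""
    goal: str = ""
    decisions: list = field(default_factory=list)
    required_data: list = field(default_factory=list)
    hard_constraints: list = field(default_factory=list)
    soft_constraints: list = field(default_factory=list)
    approved_by: str = ""  # person who approved the objective and constraints

    def open_questions(self):
        """Return the questions that must be answered before any solver runs."""
        questions = []
        if not self.goal:
            questions.append("What is the objective?")
        if not self.decisions:
            questions.append("Which decisions does the model control?")
        if not self.required_data:
            questions.append("Which data is required, and where does it live?")
        if not self.hard_constraints:
            questions.append("Which constraints can never be violated?")
        if not self.approved_by:
            questions.append("Who approved the objective and constraints?")
        return questions


def ready_to_solve(spec):
    """A spec is ready only when no open questions remain."""
    return not spec.open_questions()


draft = ProblemSpec(
    goal="minimize total fulfillment cost",
    decisions=["shipment quantity from each facility to each customer"],
)
print(ready_to_solve(draft))
for question in draft.open_questions():
    print("-", question)
```

Soft constraints are allowed to be empty here, which is itself an assumption. A stricter intake could require an explicit statement that none exist.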

In non-deterministic AI workflows, human-in-the-loop is essential. But the goal is not to place a person in front of every micro-decision. That would destroy the productivity gain. The goal is to redesign supervision so that one experienced operator who previously managed a single process can oversee many AI-supported processes at once.

Human judgment should be applied at leverage points:

Approving objectives and tradeoffs.
Validating new constraint categories.
Reviewing low-confidence or high-impact recommendations.
Handling exceptions with financial or customer risk.
Auditing outcomes and improving the process.

That is how AI creates operational efficiency without turning governance into a bottleneck.
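
A minimal sketch of supervision at leverage points: only recommendations that cross a threshold reach a person. The thresholds, field names, and the idea of an upstream confidence score are assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed thresholds; in practice the business owner would set these.
CONFIDENCE_FLOOR = 0.8
IMPACT_CEILING = 50_000.0  # currency units


@dataclass
class Recommendation:
    process: str
    confidence: float        # assumed score in [0, 1] produced upstream
    financial_impact: float  # estimated impact in currency units
    new_constraint_type: bool = False


def needs_human_review(rec):
    """Escalate low-confidence, high-impact, or structurally new cases."""
    return (
        rec.confidence < CONFIDENCE_FLOOR
        or abs(rec.financial_impact) > IMPACT_CEILING
        or rec.new_constraint_type
    )


recommendations = [
    Recommendation("shipment consolidation", 0.95, 4_000.0),
    Recommendation("inventory rebalancing", 0.62, 12_000.0),
    Recommendation("carrier reallocation", 0.91, 180_000.0),
    Recommendation("shift scheduling", 0.93, 2_500.0, new_constraint_type=True),
]
review_queue = [r.process for r in recommendations if needs_human_review(r)]
print(review_queue)
```

Everything outside the review queue proceeds without interruption, which is where the productivity gain comes from. The thresholds themselves belong on the list of things a human approves.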

The CFO should care about formulation risk

Optimization errors are not abstract technical issues. They become financial outcomes.

A flawed model can recommend inventory moves that increase working capital. It can cut transport cost while triggering service penalties. It can reduce labor hours on paper while creating overtime in another facility. It can maximize utilization while increasing fragility in the supply chain.

For finance leaders, the central question is not whether the AI can produce an answer. The question is whether the answer reflects the economic reality of the business.

This requires auditability. Every AI-generated optimization recommendation should be traceable:

Which objective was optimized?
Which constraints were included?
Which constraints were excluded?
Which data sources were used?
Which assumptions were introduced?
What changed from the previous run?
Who approved the model for operational use?

Without this discipline, AI optimization becomes a black box with a confident interface.
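
The traceability questions above map naturally onto a stored record per run. The sketch below assumes a hypothetical record layout and example values; the point is that "what changed from the previous run" becomes a mechanical comparison rather than a recollection.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical audit record stored alongside every optimization run."""
    run_id: str
    objective: str
    constraints_included: tuple
    constraints_excluded: tuple
    data_sources: tuple
    assumptions: tuple
    approved_by: str


def changed_fields(previous, current):
    """Return {field: (before, after)} for every field that differs, except run_id."""
    before, after = asdict(previous), asdict(current)
    return {
        name: (before[name], after[name])
        for name in before
        if name != "run_id" and before[name] != after[name]
    }


run_1 = RunRecord(
    run_id="2024-03-01",
    objective="minimize transport cost",
    constraints_included=("demand", "capacity"),
    constraints_excluded=("prohibited lanes",),
    data_sources=("carrier rate cards", "monthly demand table"),
    assumptions=("all lanes available",),
    approved_by="supply chain lead",
)
run_2 = RunRecord(
    run_id="2024-03-08",
    objective="minimize transport cost",
    constraints_included=("demand", "capacity", "prohibited lanes"),
    constraints_excluded=(),
    data_sources=("carrier rate cards", "monthly demand table"),
    assumptions=("all lanes available",),
    approved_by="supply chain lead",
)
diff = changed_fields(run_1, run_2)
print(json.dumps(diff, indent=2))
```

Listing excluded constraints explicitly is a deliberate design choice in this sketch: an omission that is written down can be challenged, while one that is merely absent cannot.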

Why internal capability matters

Organizations should not treat AI optimization as a collection of clever external demos. They need internal capability to build, manage, evaluate, and govern AI agents.

This includes two parallel tracks.

The first track is AI literacy. Employees must learn how to communicate effectively with models, how to challenge outputs, how to define problems, and how to recognize when a model is overconfident. This is now a core business skill.

The second track is agent development. Companies need platforms and operating procedures for building AI agents quickly, connecting them to enterprise systems, controlling permissions, monitoring behavior, and improving them over time.

In many organizations, information systems departments will gradually become a kind of human resources function for AI agents. They will onboard agents, assign access, monitor performance, retire weak agents, and ensure each one has a clear role, owner, and escalation path.

Tooling will vary. Microsoft-heavy environments may use Copilot Studio for ecosystem-native agents. Technical teams may adopt workflow platforms such as n8n where governance allows it. Claude Code and similar coding-oriented systems are already highly practical for implementation work, while enterprise deployment still requires careful security review. The tool decision matters, but it matters less than the organizational capability around it.

The danger of shallow AI expertise

There is a growing market of self-declared AI experts who can produce impressive demos but have limited experience with real business processes, data quality problems, management constraints, or operational accountability.

Large enterprises often have enough internal maturity to filter weak advice. Small and mid-sized companies are more exposed. They may invest in fragile automation, accept unrealistic promises, or deploy AI workflows without proper governance.

AI is a multidisciplinary professional field. Academic depth matters. Applied experience matters. Management understanding matters. In optimization specifically, the gap between a working demo and a reliable operational system is large enough to affect margins, service levels, and trust.

Companies should look for people who can discuss not only models and prompts, but also process design, solver behavior, data engineering, risk, incentives, and change management.

What enterprises should do now

For organizations considering AI-assisted optimization, the right move is not to wait. The right move is to start with discipline.

Begin with bounded use cases where the decision is valuable but controllable. Examples include internal resource allocation, shipment consolidation, workforce scheduling assistance, or procurement scenario analysis. Avoid handing autonomous control to AI before the organization can validate formulation quality and data reliability.

A practical adoption sequence looks like this:

Select one operational decision with measurable financial impact.
Document the current human decision process.
Define the objective and constraints with business owners.
Map the raw data to solver-ready parameters.
Build an agentic workflow with interview, validation, execution, and explanation stages.
Keep humans responsible for approvals at the right leverage points.
Measure decisions against historical performance and business exceptions.
Expand only after the process is stable.

This approach is slower than a demo and faster than a failed transformation.

The strategic lesson

LLMs are not useless for mathematical optimization. They are extremely useful when placed inside a disciplined workflow. They can draft formulations, generate code, inspect errors, produce explanations, and help analysts move faster.

But they should not be treated as independent optimization experts. Not yet.

The future belongs to agentic systems that combine structured questioning, data preparation, solver integration, automated testing, business explanation, and human oversight. The winners will not be the companies with the flashiest chatbot. They will be the companies that know how to turn AI into a reliable operating capability.

Real optimization is not about asking a model for an answer. It is about building a system that knows when an answer is not good enough.