Physics-Based Models vs. Machine Learning for Process Prediction

Lars Bergstrom · 8 min read
[Figure: Comparison of physics-based model vs. machine learning predictions for process plants]

This debate runs through nearly every industrial data science project we've encountered: should you build a physics-based process model, or train a machine learning model on historian data? The answer matters operationally — not because one approach is categorically superior, but because they fail in fundamentally different ways, and in process plants, how a model fails is as important as how well it performs under normal conditions.

What ML Does Well in Process Contexts

Machine learning models — gradient boosted trees, LSTM networks, neural process models — are genuinely good at some things that physics-based models struggle with. They excel at capturing empirical correlations in high-dimensional sensor data where the underlying physical mechanisms are difficult to model from first principles. Fouling behavior on heat exchanger surfaces, catalyst deactivation kinetics, fermentation titer prediction from spectroscopic features — these are domains where the physics is messy, multiscale, or simply not well characterized, and ML can extract predictive signal from the historical data without requiring you to parameterize a complex mechanistic model.

In terms of pure interpolation accuracy — predicting outcomes for operating conditions that fall within the range of historical training data — a well-trained ML model on a large, clean historian dataset can match or exceed a physics-based model. If your process has been running stably for five years at consistent operating conditions, an LSTM trained on that data can produce tight yield predictions.

The Silent Failure Problem

The problem with ML models in process plant contexts is the mode of failure when operating conditions shift outside the training distribution. ML models fail silently — they continue to produce confident-looking predictions even when operating in regions where their training data provides no reliable guidance. There's no native mechanism in a gradient boosted tree or a standard neural network to say "I've never seen this combination of inlet conditions before, and my prediction here is essentially extrapolation."

Consider a scenario that plays out in specialty chemical plants regularly: a new feedstock supplier is qualified, and the new lot has slightly different trace impurity composition. The ML model has never been trained on this exact impurity profile. Its prediction output looks exactly the same as it would for the familiar feedstock — no confidence degradation, no uncertainty signal — and the operator has no way to know that the model is extrapolating into a region where its predictions may be unreliable.
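The silent-failure behavior can be reproduced with a toy example. The sketch below is not any particular production model: it uses a small nearest-neighbor averager as a stand-in for tree-like learners, trained on hypothetical yield data covering 80 to 100°C only. The function names and the data are assumptions made for illustration.

```python
from statistics import mean

def knn_predict(train, x, k=3):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return mean(y for _, y in nearest)

def in_training_range(train, x):
    """Envelope check that the model itself never performs."""
    xs = [xi for xi, _ in train]
    return min(xs) <= x <= max(xs)

# Hypothetical training data: inlet temperature (°C) -> yield (%), 80..100 °C only.
train = [(float(t), 60.0 + 0.5 * (t - 80)) for t in range(80, 101, 2)]

at_edge = knn_predict(train, 100.0)   # last temperature the model has seen
far_out = knn_predict(train, 110.0)   # 10 °C beyond anything in training

# Both calls return 69.0: the output carries no hint that 110 °C is unfamiliar.
print(at_edge, far_out, in_training_range(train, 110.0))
```

The prediction at 110°C is indistinguishable from the one at 100°C. Only the separate `in_training_range` check, which has to be added around the model, reveals that the second query is an extrapolation.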

For a non-critical monitoring application — a secondary dashboard metric, a suggested maintenance flag — this may be acceptable. For a model that's being used to drive real-time setpoint recommendations on a live reactor, silent failure is an operational risk that process engineers take seriously.

How Physics-Based Models Degrade Gracefully

A physics-based model encodes the governing equations of the process — heat balance, mass balance, reaction kinetics (Arrhenius or more complex network models), phase equilibrium (vapor-liquid equilibrium via modified Raoult's law or equations of state for non-ideal systems). When operating conditions shift, the model doesn't extrapolate in a vacuum — it applies the same equations that govern the physical system.

More importantly, a physics-based model's parameters come with a known calibration envelope, so its uncertainty boundaries can be defined explicitly. When a parameter is used outside that envelope (say, an Arrhenius pre-exponential factor that was fit to data in an 80–100°C range is being applied at 110°C), the model can flag this explicitly. The prediction doesn't disappear, but its confidence interval widens with the distance from the calibrated range, and the system can signal to the operator that the current prediction is outside the normal operating range.
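One way to make the calibration envelope concrete is to store it next to the fitted parameters and check it on every prediction. The sketch below is a minimal illustration, not a validated kinetic model: the values of `A` and `Ea`, the 5% base uncertainty, and the linear widening rule of 2% per degree outside the envelope are all assumptions chosen for the example.

```python
import math
from dataclasses import dataclass

R = 8.314  # universal gas constant, J/(mol*K)

@dataclass
class ArrheniusRate:
    A: float                     # pre-exponential factor, 1/s (hypothetical value)
    Ea: float                    # activation energy, J/mol (hypothetical value)
    t_min_c: float               # lower edge of the calibration range, °C
    t_max_c: float               # upper edge of the calibration range, °C
    base_rel_unc: float = 0.05   # assumed relative uncertainty inside the envelope
    widen_per_c: float = 0.02    # assumed extra relative uncertainty per °C outside

    def predict(self, temp_c):
        k = self.A * math.exp(-self.Ea / (R * (temp_c + 273.15)))
        # Degrees by which temp_c lies outside the calibrated range (0 if inside).
        excess = max(self.t_min_c - temp_c, temp_c - self.t_max_c, 0.0)
        rel_unc = self.base_rel_unc + self.widen_per_c * excess
        return {
            "k": k,
            "interval": (k * (1 - rel_unc), k * (1 + rel_unc)),
            "outside_calibration": excess > 0,
            "degrees_outside": excess,
        }

rate = ArrheniusRate(A=1.0e7, Ea=60_000.0, t_min_c=80.0, t_max_c=100.0)
calibrated = rate.predict(90.0)
extrapolated = rate.predict(110.0)
print(calibrated["outside_calibration"], extrapolated["outside_calibration"])
```

At 110°C the model still returns a rate constant from the same equation, but the result carries an explicit flag and a wider interval that an operator interface can surface.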

This is what we mean by "graceful degradation." The physics-based model at 110°C gives you a noisier, wider-confidence prediction that's still anchored to physically plausible outcomes. The ML model at 110°C may give you a narrow, confident prediction that happens to be extrapolating far outside its training manifold.

The Building and Maintenance Cost Argument

The legitimate objection to physics-based models is the build cost. A rigorous first-principles model of a distillation column — with Murphree tray efficiencies, VLE calculations via a cubic equation of state, and a tuned dynamic response model — requires significant chemical engineering effort to develop and validate. An LSTM trained on historian data requires a good ML engineer and clean training data, but the domain-specific knowledge burden is lower.

This is a real tradeoff, and we're not dismissing it. The question is: what is the maintenance cost over time? A physics-based model needs to be recalibrated when the process changes — new catalyst, modified feedstock spec, heat exchanger replacement. But those updates are interpretable: you're changing specific parameters with physical meaning. An ML model needs to be retrained when the data distribution shifts, and the retraining process can be opaque — you may improve overall RMSE while inadvertently degrading performance on the specific failure modes you care about.
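The retraining risk is easy to miss if only an aggregate metric is tracked. The sketch below uses made-up prediction errors, split into a hypothetical "steady" regime and a rarer "startup" regime, to show how overall RMSE can improve while the regime that matters gets worse.

```python
import math

def rmse(errors):
    """Root-mean-square of a list of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def overall_rmse(errors_by_regime):
    """RMSE over all regimes pooled together."""
    return rmse([e for errs in errors_by_regime.values() for e in errs])

# Hypothetical errors (predicted - actual) before and after retraining.
before = {"steady": [1.0] * 90, "startup": [2.0] * 10}
after = {"steady": [0.5] * 90, "startup": [3.0] * 10}

report = {
    name: {"overall": overall_rmse(errs), **{r: rmse(e) for r, e in errs.items()}}
    for name, errs in (("before", before), ("after", after))
}

for name, metrics in report.items():
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

In this example the pooled RMSE drops after retraining because the steady-state points dominate the average, while the startup error grows from 2.0 to 3.0. Reporting per-regime metrics alongside the aggregate makes that kind of regression visible.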

For a continuous-process plant running the same process for years, the physics-based model's ongoing maintenance is modest once the initial build is done. The ML model needs continuous monitoring of its prediction quality and periodic retraining — which, in our experience, plants often under-invest in, leading to models that are quietly degrading while still generating confident output.

The Hybrid Approach: Physics + Data-Driven Residuals

The practically interesting middle ground is a hybrid architecture: a physics-based model as the primary predictor, with a data-driven residual model that learns the systematic errors in the physics model and corrects them. This design, sometimes called "grey-box" modeling in the process systems engineering literature, combines the strengths of both.

The physics model provides the structural constraint: predictions stay physically plausible even when extrapolating. The residual model, trained on the difference between physics-model predictions and actual measurements, accounts for effects the physics model doesn't capture — catalyst aging trends, seasonal utility variations, subtle heat exchanger fouling that wasn't explicitly modeled.

Critically, when operating conditions shift outside the training data, the residual model's contribution can be down-weighted automatically as its confidence degrades (that confidence can be estimated via ensemble methods or Gaussian process uncertainty). The physics model carries the load in novel regimes. This is the architecture we use in Twynvex's twin engine: physics-first, with data-driven correction on top, and an explicit confidence signal that propagates through to the operator interface.
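A minimal sketch of this weighting idea follows. It is not Twynvex's implementation: the physics model, the synthetic residual data, the ensemble of straight-line fits on random subsets, and the weighting formula with its `sigma_ref` tolerance are all assumptions chosen to keep the example short. Ensemble disagreement stands in for the residual model's confidence.

```python
import random
from statistics import mean, pstdev

def physics_yield(temp_c):
    """Hypothetical physics model: yield (%) as a function of temperature (°C)."""
    return 60.0 + 0.5 * (temp_c - 80.0)

def fit_line(points):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    xs, ys = zip(*points)
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in points) / sxx
    return ybar - b * xbar, b

rng = random.Random(0)
# Synthetic residuals (measured - physics) observed only between 80 and 100 °C.
residuals = [(float(t), 1.0 + 0.1 * (t - 90) + rng.gauss(0.0, 0.3))
             for t in range(80, 101)]
# Ensemble of residual models, each fit on a different random subset of the data.
ensemble = [fit_line(rng.sample(residuals, 15)) for _ in range(20)]

def hybrid_predict(temp_c, sigma_ref=0.2):
    corrections = [a + b * temp_c for a, b in ensemble]
    spread = pstdev(corrections)                       # ensemble disagreement
    weight = 1.0 / (1.0 + (spread / sigma_ref) ** 2)   # shrinks as spread grows
    return {
        "prediction": physics_yield(temp_c) + weight * mean(corrections),
        "residual_weight": weight,
        "residual_spread": spread,
    }

familiar = hybrid_predict(90.0)    # inside the residual model's training range
novel = hybrid_predict(130.0)      # well outside it
print(familiar["residual_weight"], novel["residual_weight"])
```

Inside the training range the ensemble members agree, so the correction is applied almost in full. Far outside it their extrapolations diverge, the weight drops, and the prediction falls back toward the physics model. The `residual_weight` value is the kind of explicit confidence signal that can be passed through to an operator display.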

What This Means for Operator Trust

Operator trust in a prediction model is earned through transparent failure modes. If an operator asks "why is the twin predicting yield will drop?" and the answer comes back as a causal chain (inlet temperature is 2°C below design, which reduces the rate constant by 3.8%, which lowers cumulative conversion by 2.1 percentage points, which will push distillate purity to 96.8% at the current reflux ratio), that's a reasoning chain an experienced process engineer can evaluate and challenge.
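The first link in such a chain is something an engineer can check by hand. As a rough sketch, the ratio of two Arrhenius rate constants depends only on the activation energy and the two temperatures, because the pre-exponential factor cancels. The activation energy and design temperature below are hypothetical, so the result will not match the 3.8% quoted above; the point is that the calculation is inspectable.

```python
import math

R_GAS = 8.314  # universal gas constant, J/(mol*K)

def rate_constant_change(ea_j_mol, design_temp_c, actual_temp_c):
    """Fractional change in an Arrhenius rate constant between two temperatures."""
    t_design = design_temp_c + 273.15
    t_actual = actual_temp_c + 273.15
    # k_actual / k_design = exp(-(Ea/R) * (1/T_actual - 1/T_design)); A cancels.
    ratio = math.exp(-(ea_j_mol / R_GAS) * (1.0 / t_actual - 1.0 / t_design))
    return ratio - 1.0

# Hypothetical values: Ea = 60 kJ/mol, design inlet at 90 °C, actual inlet 2 °C lower.
change = rate_constant_change(ea_j_mol=60_000.0, design_temp_c=90.0, actual_temp_c=88.0)
print(f"Inlet 2 °C below design: rate constant changes by {change:+.1%}")
```

Each later link (conversion, then distillate purity) would come from the corresponding balance or column model, and each can be challenged in the same way.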

If the answer is "the neural network's hidden layer activations indicate a yield decline," the operator cannot evaluate that. They either trust it blindly or dismiss it. Neither is operationally sound.

This isn't an argument that ML models are bad — it's an argument that black-box models are inappropriate as the primary driver for real-time process decisions in high-consequence manufacturing environments. The transparency of the physics-based causal chain is not a nice-to-have feature. It's the basis on which a control room operator can make an informed decision to override a recommended setpoint change. That interpretability is foundational, not optional.