ML Models for PdM
Survival analysis, anomaly detection, RUL estimation, and deep learning
Types of PdM Models
Different ML approaches address different PdM questions:
| Approach | Question Answered | Methods |
|---|---|---|
| Anomaly Detection | Is the tool behaving abnormally right now? | Isolation Forest, Autoencoders, PCA, One-Class SVM |
| Classification | Will this component fail within N hours? | Random Forest, XGBoost, Neural Networks |
| RUL Estimation | How many hours until failure? | LSTM, CNN on time series, survival models |
| Survival Analysis | What's the probability of survival past time T? | Cox PH, Weibull, Random Survival Forests |
Key Concept: The Rare Failure Problem
In a well-maintained fab, actual failures are rare: 99.9% or more of observations are normal. This severe class imbalance makes supervised learning hard, because a model can score near-perfect accuracy by always predicting "normal" while catching no failures. Common mitigations are unsupervised anomaly detection, synthetic oversampling (SMOTE), cost-sensitive learning, and semi-supervised methods that learn "normal" and flag deviations.
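To make the imbalance concrete, here is a minimal sketch with made-up labels (10 failures in 10,000 runs, a hypothetical dataset). It shows why accuracy is a misleading metric here, and computes the "balanced" class-weight heuristic that is one common starting point for cost-sensitive learning.

```python
import numpy as np

# Hypothetical labels: 10,000 runs, 10 of them failures (0.1% positive rate).
y_true = np.zeros(10_000, dtype=int)
y_true[:10] = 1

# A "model" that always predicts normal.
y_pred = np.zeros_like(y_true)

accuracy = float(np.mean(y_pred == y_true))
recall = float(np.sum((y_pred == 1) & (y_true == 1)) / np.sum(y_true == 1))

# "Balanced" class weights: n_samples / (n_classes * class_count).
counts = np.bincount(y_true, minlength=2)
class_weights = len(y_true) / (2 * counts)

print(f"accuracy={accuracy:.4f}, recall={recall:.2f}")
print(f"class weights: normal={class_weights[0]:.4f}, failure={class_weights[1]:.1f}")
```

The always-normal model looks excellent on accuracy and useless on recall, which is why PdM models are usually judged on recall, precision, or cost-weighted metrics instead.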
Deep Learning for PdM
Deep learning has shown promise for PdM, particularly for directly modeling raw sensor time series:
- 1D-CNNs: Convolutional networks applied to sensor time series can automatically learn relevant temporal patterns without manual feature engineering.
- LSTMs/GRUs: Recurrent networks capture long-range dependencies across multiple process runs (e.g., slow drift over hundreds of runs).
- Transformer-based models: Attention mechanisms can identify which time steps and which sensors are most predictive of impending failure.
- Autoencoders: Learn a compressed representation of "normal" equipment behavior. Large reconstruction error = abnormal behavior.
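The reconstruction-error idea in the last bullet can be sketched without a deep learning framework. The example below swaps in PCA as a linear stand-in for an autoencoder (a neural autoencoder would replace the projection with learned nonlinear layers). The sensor data, the number of components, and the 99th-percentile threshold are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "normal" data: 500 runs x 6 sensors driven by 2 latent factors.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
X_normal = latent @ mixing + 0.05 * rng.normal(size=(500, 6))

# Fit: center the data and keep the top-2 principal directions.
mean = X_normal.mean(axis=0)
_, _, Vt = np.linalg.svd(X_normal - mean, full_matrices=False)
components = Vt[:2]  # shape (2, n_sensors)


def reconstruction_error(X):
    """Squared error after projecting onto the learned subspace and back."""
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return np.sum((centered - reconstructed) ** 2, axis=1)


# Threshold taken from the normal data itself (99th percentile, an assumption).
threshold = np.percentile(reconstruction_error(X_normal), 99)

# A run where one sensor departs from the learned correlations.
x_abnormal = X_normal[0] + np.array([0.0, 0.0, 3.0, 0.0, 0.0, 0.0])
err_abnormal = reconstruction_error(x_abnormal[None, :])[0]
print(f"threshold={threshold:.3f}, abnormal error={err_abnormal:.3f}")
```

Only normal data is needed to fit the model and set the threshold, which is what makes this family of methods attractive when failures are rare.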
In practice, gradient-boosted trees (XGBoost, LightGBM) on engineered features often outperform deep learning in this domain due to limited training data and the effectiveness of domain-informed features.
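As a rough illustration of what "engineered features" can mean here, the sketch below reduces a sensor trace to a few per-run statistics and measures slow drift across recent runs. The function names, the chosen statistics, and the 20-run window are hypothetical choices, not a prescribed feature set; the resulting values would be the inputs to a tree model.

```python
import numpy as np


def run_level_features(trace):
    """Summarize one run's sensor trace as a few scalar features."""
    trace = np.asarray(trace, dtype=float)
    t = np.arange(len(trace))
    slope = np.polyfit(t, trace, 1)[0]  # linear trend within the run
    return {
        "mean": float(trace.mean()),
        "std": float(trace.std()),
        "max": float(trace.max()),
        "slope": float(slope),
    }


def drift_over_runs(per_run_means, window=20):
    """Slope of the per-run mean over the most recent `window` runs."""
    recent = np.asarray(per_run_means[-window:], dtype=float)
    return float(np.polyfit(np.arange(len(recent)), recent, 1)[0])


# Made-up data: one trace that ramps within a run, and a per-run mean
# that creeps upward by 0.02 units per run.
features = run_level_features(np.linspace(1.0, 2.0, 11))
per_run_means = 100.0 + 0.02 * np.arange(200)
print(features)
print(f"drift per run: {drift_over_runs(per_run_means):.4f}")
```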
Survival Analysis and RUL Estimation
Survival analysis is the statistical backbone of PdM. The central object is the survival function:
S(t) = P(T > t)
i.e., the probability that a component is still working at time t. The complement is the cumulative failure probability F(t) = 1 − S(t), and the instantaneous failure rate (hazard) is h(t) = f(t) / S(t), where f(t) = dF/dt is the failure density.
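These three quantities determine one another. As a short worked step, assuming S is differentiable and S(0) = 1, and writing f(t) = dF/dt for the failure density, the hazard can be integrated back into the survival function:

```latex
\begin{aligned}
h(t) &= \frac{f(t)}{S(t)} = -\frac{S'(t)}{S(t)} = -\frac{d}{dt}\ln S(t) \\
\Rightarrow\quad S(t) &= \exp\!\left(-\int_0^t h(u)\,du\right)
\end{aligned}
```

So specifying a hazard shape (decreasing, constant, increasing) is equivalent to specifying a lifetime distribution.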
1. The Weibull model — the workhorse
Fab equipment lifetimes are routinely fit with the two-parameter Weibull distribution, with scale η (the characteristic life) and shape β:
$$S(t) = \exp\left(-\left(\frac{t}{\eta}\right)^{\beta}\right), \qquad h(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}$$
| Shape parameter β | Meaning | Typical fab example |
|---|---|---|
| < 1 | Decreasing hazard ("infant mortality") | New chamber after install — early-life bugs |
| = 1 | Constant hazard (memoryless / exponential) | Random faults — power supply, sensor failures |
| > 1 | Increasing hazard ("wear-out") | Heater coil, RF generator, focus ring |
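The three regimes in the table can be checked numerically. This is a minimal sketch that evaluates the Weibull hazard at a few times for β below, equal to, and above 1; the characteristic life of 1000 hours is an arbitrary illustrative value.

```python
import numpy as np


def weibull_hazard(t, eta, beta):
    """h(t) = (beta / eta) * (t / eta) ** (beta - 1), for t > 0."""
    t = np.asarray(t, dtype=float)
    return (beta / eta) * (t / eta) ** (beta - 1)


t = np.array([100.0, 500.0, 1000.0, 2000.0])
eta = 1000.0  # hypothetical characteristic life, in hours

for beta in (0.5, 1.0, 2.5):
    print(f"beta={beta}: {np.round(weibull_hazard(t, eta, beta), 5)}")
```

Reading each printed row left to right shows the hazard falling for β = 0.5, staying flat at 1/η for β = 1, and rising for β = 2.5.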
2. Cox proportional hazards — using covariates
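The Cox model lets covariates x scale a baseline hazard multiplicatively:
h(t | x) = h₀(t) · exp(β·x)
Here β is a vector of regression coefficients, not the Weibull shape parameter. Each unit increase in covariate xₖ multiplies the hazard by exp(βₖ), so you can say, e.g., "a 10% higher RF reflected power doubles the instantaneous failure rate," without committing to a specific shape for h₀(t).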
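Because the baseline hazard cancels when two units are compared, the hazard ratio depends only on the coefficients and the covariate difference. A minimal sketch follows; the coefficient value and the covariate scaling (one unit = 10% above nominal reflected power) are hypothetical, chosen to reproduce the "doubles the failure rate" statement.

```python
import numpy as np


def hazard_ratio(coefs, x_a, x_b):
    """Hazard of unit b relative to unit a under Cox PH: exp(coefs . (x_b - x_a))."""
    coefs, x_a, x_b = (np.asarray(v, dtype=float) for v in (coefs, x_a, x_b))
    return float(np.exp(coefs @ (x_b - x_a)))


# Hypothetical model: a single covariate measured in steps of
# "10% above nominal RF reflected power", with coefficient ln(2).
coefs = [np.log(2.0)]

hr_one_step = hazard_ratio(coefs, x_a=[0.0], x_b=[1.0])
hr_two_steps = hazard_ratio(coefs, x_a=[0.0], x_b=[2.0])
print(f"one step: {hr_one_step:.2f}x, two steps: {hr_two_steps:.2f}x")
```

Note that the effect compounds: two steps multiply the hazard by four, not three, because covariates act on the log-hazard scale.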
3. RUL from a Weibull HI model
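Once you have an estimated S(t) and the component has already survived to time t_now, the Remaining Useful Life is the conditional expectation:
$$\mathrm{RUL}(t_{\text{now}}) = E\left[T - t_{\text{now}} \mid T > t_{\text{now}}\right] = \int_{t_{\text{now}}}^{\infty} \frac{S(u)}{S(t_{\text{now}})}\,du$$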
```python
import numpy as np
from scipy.special import gamma


def weibull_rul(t_now: float, eta: float, beta: float) -> float:
    """Remaining useful life under a Weibull lifetime distribution.

    Mean lifetime is eta * Gamma(1 + 1/beta); the conditional mean
    given survival to t_now uses numeric integration of S(u)/S(t_now).
    """
    if t_now < 0:
        raise ValueError("t_now must be non-negative")
    if eta <= 0 or beta <= 0:
        raise ValueError("eta and beta must be positive")
    # Integrate from t_now to ~5x the mean lifetime beyond t_now, so the
    # upper limit always lies above t_now.
    horizon = t_now + 5 * eta * gamma(1 + 1 / beta)
    u = np.linspace(t_now, horizon, 4000)
    S = np.exp(-((u / eta) ** beta))
    S_now = np.exp(-((t_now / eta) ** beta))
    # np.trapezoid requires NumPy >= 2.0 (it was np.trapz in older releases)
    return float(np.trapezoid(S / S_now, u))


# Example: focus ring with eta=1500 RF-hours, beta=2.5 (wear-out)
print(f"RUL at 800 RF-hr: {weibull_rul(800, 1500, 2.5):.0f} hours")
print(f"RUL at 1400 RF-hr: {weibull_rul(1400, 1500, 2.5):.0f} hours")
```
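For the Weibull case the integral also has a closed form, which is useful as a cross-check on the numeric version. Substituting v = (u/η)^β gives ∫ S(u) du from t_now to ∞ equal to (η/β)·Γ(1/β, x) with x = (t_now/η)^β, where Γ(a, x) is the upper incomplete gamma function. The sketch below is one way to evaluate it with SciPy; the function name is a hypothetical choice.

```python
import numpy as np
from scipy.special import gamma, gammaincc


def weibull_rul_closed_form(t_now, eta, beta):
    """Weibull mean residual life via the upper incomplete gamma function."""
    x = (t_now / eta) ** beta
    a = 1.0 / beta
    # gammaincc is the regularized upper incomplete gamma, so multiply by
    # gamma(a) to recover the unregularized Gamma(a, x).
    upper_incomplete = gamma(a) * gammaincc(a, x)
    # Dividing by S(t_now) = exp(-x) is the same as multiplying by exp(x).
    return float(np.exp(x) * (eta / beta) * upper_incomplete)


# Same focus-ring parameters as the example above.
for t_now in (0, 800, 1400):
    rul = weibull_rul_closed_form(t_now, 1500, 2.5)
    print(f"RUL at {t_now} RF-hr: {rul:.0f} hours")
```

At t_now = 0 this reduces to the unconditional mean lifetime η·Γ(1 + 1/β), and for β > 1 the value shrinks as t_now grows, as expected for a wear-out mechanism.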
Key Concept: Censored Data
Most components on the floor right now haven't failed yet — their lifetimes are right-censored. Fitting Weibull/Cox models with maximum likelihood properly accounts for censoring (via the survival contribution S(t) for censored points). Use lifelines or scikit-survival in Python rather than ad-hoc dropping of unfinished runs.
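To show what "the survival contribution for censored points" means in the likelihood, here is a NumPy-only sketch of a censored Weibull fit on simulated data. It is an illustration of the likelihood, not a replacement for lifelines or scikit-survival: the grid search over β, the function name, and the simulated fleet (true η = 1500, β = 2.5, observation stopped at 1400 hours) are all assumptions made for the example.

```python
import numpy as np


def fit_weibull_censored(durations, observed, beta_grid=None):
    """Grid-search Weibull MLE with right-censoring; returns (eta, beta).

    durations: time in service for every unit, failed or not
    observed:  True where the unit failed, False where it is still running
    For a fixed beta the MLE of eta is closed-form, so only beta is searched.
    """
    t = np.asarray(durations, dtype=float)
    d = np.asarray(observed, dtype=bool)
    if beta_grid is None:
        beta_grid = np.linspace(0.5, 5.0, 451)
    n_events = d.sum()
    best_loglik, best_eta, best_beta = -np.inf, None, None
    for beta in beta_grid:
        eta = (np.sum(t ** beta) / n_events) ** (1.0 / beta)
        # Failures contribute log h(t); every unit, failed or censored,
        # contributes log S(t) = -(t / eta) ** beta.
        log_hazard = np.log(beta / eta) + (beta - 1) * np.log(t[d] / eta)
        loglik = np.sum(log_hazard) - np.sum((t / eta) ** beta)
        if loglik > best_loglik:
            best_loglik, best_eta, best_beta = loglik, eta, beta
    return float(best_eta), float(best_beta)


# Simulated fleet: observation stops at 1400 hours, so later failures are censored.
rng = np.random.default_rng(42)
true_life = 1500 * rng.weibull(2.5, size=2000)
cutoff = 1400.0
observed = true_life <= cutoff
durations = np.minimum(true_life, cutoff)

eta_hat, beta_hat = fit_weibull_censored(durations, observed)
# Naive alternative: drop the unfinished units and fit the failures only.
eta_naive, beta_naive = fit_weibull_censored(
    durations[observed], np.ones(int(observed.sum()), dtype=bool)
)
print(f"censoring-aware: eta={eta_hat:.0f}, beta={beta_hat:.2f}")
print(f"dropping censored: eta={eta_naive:.0f}, beta={beta_naive:.2f}")
```

Dropping the unfinished units keeps only the shortest lifetimes, so the naive fit underestimates the characteristic life; the censoring-aware fit uses the fact that those units survived at least to the cutoff.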
Knowledge Check
Why is anomaly detection often preferred over supervised classification for PdM?