Virtual Metrology & Process Control

Deploying VM in Production

Handling model drift, building adaptive models, and estimating confidence for every prediction

Model Drift in Semiconductor Fabs

Unlike many ML domains where data distributions shift gradually, semiconductor processes experience abrupt, predictable drift events:

Sources of Drift

  • Preventive Maintenance (PM): Chamber cleaning, part replacement. Sensor baselines shift overnight. This is the #1 cause of VM model degradation.
  • Consumable wear: Focus rings, showerheads, and electrostatic chuck coatings wear slowly, causing gradual drift between PMs.
  • Recipe changes: Process engineers tweak recipes for yield improvement. Even small changes can invalidate VM models.
  • Upstream variation: Changes in incoming wafer properties (e.g., different film thickness from a preceding step) alter the relationship between sensors and outcomes.

Analogy: The Recalibrated Scale

Imagine you built a model to predict someone's weight from their height. Then everyone starts wearing heavy boots (PM event). Your model's predictions are all off by 5 kg — not because the model is wrong, but because the "input distribution" shifted. You need to recalibrate.

import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error

def detect_vm_drift(predictions, actuals, timestamps,
                    window_size=50, threshold_multiplier=2.0,
                    bias_sigma_threshold=3.0):
    """Detect VM model drift using moving-window residual monitoring.

    The first `window_size` wafers are treated as the qualification
    period that defines the baseline residual statistics.
    """
    residuals = np.asarray(actuals) - np.asarray(predictions)

    # Baseline statistics from qualification period
    baseline_mae = np.mean(np.abs(residuals[:window_size]))
    baseline_bias = np.mean(residuals[:window_size])
    baseline_std = np.std(residuals[:window_size])

    # Moving window monitoring
    drift_alerts = []
    for i in range(window_size, len(residuals)):
        window = residuals[i-window_size:i]
        current_mae = np.mean(np.abs(window))
        current_bias = np.mean(window)

        # Accuracy drift: MAE grows relative to the baseline
        mae_ratio = current_mae / baseline_mae
        # Bias drift: mean residual shifts, measured in baseline sigmas
        bias_shift = abs(current_bias - baseline_bias) / baseline_std

        if mae_ratio > threshold_multiplier or bias_shift > bias_sigma_threshold:
            drift_alerts.append({
                'timestamp': timestamps.iloc[i],
                'mae_ratio': mae_ratio,
                'bias_shift_sigma': bias_shift,
                'type': 'accuracy' if mae_ratio > threshold_multiplier else 'bias'
            })

    return pd.DataFrame(drift_alerts)

Adaptive and Self-Updating Models

Static models degrade. Production VM systems must adapt. Here are the major strategies:

1. Moving Window Retraining

Retrain the model periodically using only the most recent N wafers. Simple and effective, but requires a steady stream of physical metrology data.
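A minimal sketch of moving-window retraining, assuming a scikit-learn-style regressor (the Ridge base model, window size, warm-up threshold, and retrain cadence below are illustrative choices, not fab-qualified values):

```python
from collections import deque

import numpy as np
from sklearn.linear_model import Ridge


class MovingWindowVM:
    """VM model retrained on the most recent N measured wafers."""

    def __init__(self, window_size=200, retrain_every=25):
        self.window = deque(maxlen=window_size)   # (features, actual) pairs
        self.retrain_every = retrain_every        # retrain cadence, in wafers
        self.model = Ridge(alpha=1.0)
        self._since_retrain = 0
        self._fitted = False

    def add_measurement(self, x, y):
        """Call whenever a wafer's physical metrology result arrives."""
        self.window.append((np.asarray(x), float(y)))
        self._since_retrain += 1
        # Retrain on the current window once enough new data has arrived
        if self._since_retrain >= self.retrain_every and len(self.window) >= 30:
            X = np.array([xi for xi, _ in self.window])
            Y = np.array([yi for _, yi in self.window])
            self.model.fit(X, Y)
            self._fitted = True
            self._since_retrain = 0

    def predict(self, x):
        if not self._fitted:
            return None  # still in qualification: no model yet
        return float(self.model.predict(np.asarray(x).reshape(1, -1))[0])
```

Because only the newest wafers are kept, a post-PM shift works its way out of the training window automatically; the cost is that the model is blind until enough post-shift metrology accumulates.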

2. Bias Correction (Global Bias + Local Bias)

Instead of retraining, adjust predictions by the recent average residual. This handles systematic shifts (like post-PM offset) cheaply.

3. Incremental / Online Learning

Update model weights with each new metrology measurement without full retraining. Works well for linear models; harder for tree ensembles.
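For linear models, scikit-learn's `partial_fit` supports exactly this. A minimal sketch, with an assumed warm-up batch used to fit the feature scaler before streaming updates begin (the warm-up size and learning-rate settings are illustrative):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler


class OnlineVM:
    """Linear VM model updated one metrology result at a time."""

    def __init__(self, warmup=30):
        self.scaler = StandardScaler()
        self.model = SGDRegressor(learning_rate="constant", eta0=0.01,
                                  random_state=0)
        self.warmup = warmup
        self._warmup_X, self._warmup_y = [], []
        self._ready = False

    def update(self, x, y):
        """Incorporate one new (FDC features, measured value) pair."""
        if not self._ready:
            # Collect a small warm-up batch so the scaler can be fit first
            self._warmup_X.append(np.asarray(x))
            self._warmup_y.append(float(y))
            if len(self._warmup_X) >= self.warmup:
                X = self.scaler.fit_transform(np.array(self._warmup_X))
                for xi, yi in zip(X, self._warmup_y):
                    self.model.partial_fit(xi.reshape(1, -1), [yi])
                self._ready = True
            return
        xs = self.scaler.transform(np.asarray(x).reshape(1, -1))
        self.model.partial_fit(xs, [y])  # single gradient step, no retrain

    def predict(self, x):
        if not self._ready:
            return None
        xs = self.scaler.transform(np.asarray(x).reshape(1, -1))
        return float(self.model.predict(xs)[0])
```

Each `update` is one gradient step, so the cost per metrology result is constant, but the model also forgets old regimes only as fast as the learning rate allows.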

import numpy as np
from collections import deque

class AdaptiveVMPredictor:
    """VM predictor with bias correction and drift detection."""

    def __init__(self, base_model, bias_window=20, confidence_window=50):
        self.base_model = base_model
        self.bias_window = bias_window
        self.recent_residuals = deque(maxlen=bias_window)
        self.confidence_residuals = deque(maxlen=confidence_window)
        self.global_bias = 0.0

    def predict_with_confidence(self, X):
        """Predict with bias correction and a 95% confidence interval.

        X is a 1-D feature vector for a single wafer.
        """
        raw_pred = self.base_model.predict(np.asarray(X).reshape(1, -1))[0]

        # Apply bias correction
        corrected_pred = raw_pred + self.global_bias

        # Confidence interval from recent residual distribution
        if len(self.confidence_residuals) >= 10:
            residual_std = np.std(list(self.confidence_residuals))
            ci_lower = corrected_pred - 1.96 * residual_std
            ci_upper = corrected_pred + 1.96 * residual_std
        else:
            ci_lower = ci_upper = None  # Not enough data

        return {
            'prediction': corrected_pred,
            'raw_prediction': raw_pred,
            'bias_correction': self.global_bias,
            'ci_lower': ci_lower,
            'ci_upper': ci_upper,
        }

    def update(self, actual_value, raw_predicted_value):
        """Update bias correction when actual metrology arrives.

        Pass the raw (uncorrected) prediction here; using the
        bias-corrected value would compound the correction.
        """
        residual = actual_value - raw_predicted_value
        self.recent_residuals.append(residual)
        self.confidence_residuals.append(residual)

        # Update global bias
        if len(self.recent_residuals) >= 5:
            self.global_bias = np.mean(list(self.recent_residuals))

Key Concept: PM-Aware Adaptation

Smart VM systems detect PM events (from equipment logs or MES) and take special action: widen confidence intervals immediately after PM, increase physical metrology sampling to rebuild the bias estimate, and optionally switch to a "post-PM" model variant trained specifically on post-PM data.
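The CI-widening step can be sketched as a simple inflation factor that decays as post-PM metrology accumulates (the 3x initial inflation and 25-wafer decay constant are illustrative assumptions, not recommended settings):

```python
import numpy as np


def pm_aware_interval(pred, residual_std, wafers_since_pm,
                      pm_inflation=3.0, decay_wafers=25):
    """95% prediction interval, inflated just after a PM event.

    The inflation factor decays exponentially from `pm_inflation`
    back to 1.0 as post-PM metrology rebuilds trust in the model.
    """
    inflation = 1.0 + (pm_inflation - 1.0) * np.exp(-wafers_since_pm / decay_wafers)
    half_width = 1.96 * residual_std * inflation
    return pred - half_width, pred + half_width
```

The same decay schedule can drive the sampling side: while the inflation factor is high, route a larger fraction of wafers to physical metrology.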

Confidence Estimation

A VM prediction without a confidence estimate is dangerous. The fab needs to know when to trust a VM prediction and when to send the wafer for physical measurement.

Approaches to VM Confidence

  • Residual-based: Use the distribution of recent residuals to estimate prediction intervals. Simple and effective.
  • Novelty detection: Measure how "different" the current wafer's FDC data is from training data. High novelty → low confidence.
  • Ensemble disagreement: Train multiple models; large disagreement → low confidence.
  • Conformal prediction: A principled, distribution-free framework that guarantees coverage probability.

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neighbors import LocalOutlierFactor
import numpy as np

class VMConfidenceEstimator:
    """Estimate VM prediction confidence using novelty + ensemble methods."""

    def __init__(self, n_models=5):
        self.models = [
            GradientBoostingRegressor(
                n_estimators=200, max_depth=4,
                subsample=0.8, random_state=i
            ) for i in range(n_models)
        ]
        self.novelty_detector = LocalOutlierFactor(
            n_neighbors=20, novelty=True
        )

    def fit(self, X_train, y_train):
        """Fit ensemble members on bootstrap resamples, then the novelty detector."""
        rng = np.random.default_rng(0)
        for model in self.models:
            # Bootstrap resampling gives each member a different view of the data
            idx = rng.choice(len(X_train), len(X_train), replace=True)
            model.fit(X_train[idx], y_train[idx])
        self.novelty_detector.fit(X_train)

    def predict_with_confidence(self, X):
        predictions = np.array([m.predict(X) for m in self.models])
        mean_pred = predictions.mean(axis=0)
        std_pred = predictions.std(axis=0)  # Ensemble disagreement

        # Novelty score (negative = more outlier-like)
        novelty_scores = self.novelty_detector.decision_function(X)

        # Combined confidence in [0, 1]: low ensemble spread AND an
        # inlier-like (high) novelty score both raise confidence.
        # The LOF-score mapping below is a rough heuristic, not calibrated.
        confidence = 1.0 / (1.0 + std_pred) * np.clip(
            (novelty_scores + 2) / 4, 0, 1
        )

        return mean_pred, confidence, std_pred
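Of the four approaches listed earlier, conformal prediction is worth its own sketch: split conformal wraps any point predictor and uses a held-out calibration set to produce intervals with a finite-sample marginal coverage guarantee (the Ridge model and split sizes here are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge


def split_conformal(model, X_train, y_train, X_cal, y_cal, X_new, alpha=0.1):
    """(1 - alpha) prediction intervals via split conformal prediction."""
    model.fit(X_train, y_train)
    # Nonconformity scores: absolute residuals on held-out calibration data
    scores = np.abs(y_cal - model.predict(X_cal))
    n = len(scores)
    # Conformal quantile with the finite-sample (n + 1) correction
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    preds = model.predict(X_new)
    return preds - q, preds + q
```

Unlike the residual-std interval used earlier, the conformal quantile makes no distributional assumption about the residuals; the price is reserving a calibration set that cannot be used for training.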

Did You Know?

Some advanced fabs use a "smart sampling" strategy driven by VM confidence: if the VM model is confident about a wafer's metrology, skip physical measurement. If confidence is low, route the wafer to a metrology tool. This dynamically adjusts sampling rate — measuring more when it matters and less when the model is sure.
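A minimal sketch of such a sampling router, with an added minimum audit rate so that drift can never go completely unobserved (both thresholds are illustrative assumptions):

```python
def route_wafer(confidence, wafers_since_measurement,
                conf_threshold=0.8, max_skip=25):
    """Decide whether a wafer needs physical metrology.

    Measure when VM confidence is low, and always measure after
    `max_skip` consecutive skips so silent drift is eventually caught.
    """
    if confidence < conf_threshold:
        return "measure"   # low confidence: verify physically
    if wafers_since_measurement >= max_skip:
        return "measure"   # periodic audit, regardless of confidence
    return "skip"          # trust the VM prediction
```

The forced-audit rule matters: without it, a model that drifts while remaining (wrongly) confident would never receive the metrology feedback needed to detect the drift.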

Knowledge Check

What is the #1 cause of VM model degradation in production?