Virtual Metrology & Process Control

VM Model Building

Feature engineering from sensor traces, regression models, and rigorous validation for fab-grade predictions

Feature Engineering from Sensor Traces

The raw data from a process chamber is a collection of time-series traces — one per sensor, sampled at 1–100 Hz over the duration of the process step (often 30–300 seconds). Converting these into tabular features suitable for ML is the most critical step in VM.

Step-Based Segmentation

Semiconductor recipes have discrete steps (e.g., "strike plasma," "main etch," "over-etch," "purge"). Feature extraction should be done per step, not over the entire trace, because the physics and expected sensor behavior differ dramatically between steps.

import pandas as pd
import numpy as np
from scipy import stats

def extract_trace_features(trace: np.ndarray, sensor_name: str,
                           step_name: str) -> dict:
    """Extract summary statistics from a single sensor trace within one step."""
    features = {}
    prefix = f"{step_name}_{sensor_name}"

    # Basic statistics
    features[f"{prefix}_mean"] = np.mean(trace)
    features[f"{prefix}_std"] = np.std(trace)
    features[f"{prefix}_min"] = np.min(trace)
    features[f"{prefix}_max"] = np.max(trace)
    features[f"{prefix}_range"] = np.ptp(trace)
    features[f"{prefix}_median"] = np.median(trace)

    # Shape statistics
    features[f"{prefix}_skew"] = stats.skew(trace)
    features[f"{prefix}_kurtosis"] = stats.kurtosis(trace)

    # Trend features
    x = np.arange(len(trace))
    slope, intercept, r_value, _, _ = stats.linregress(x, trace)
    features[f"{prefix}_slope"] = slope
    features[f"{prefix}_r_squared"] = r_value ** 2

    # Integral (area under curve — proxy for total energy/dose)
    features[f"{prefix}_integral"] = np.trapz(trace)  # use np.trapezoid on NumPy >= 2.0

    # Settling features (important for pressure, temperature)
    n_edge = max(1, len(trace) // 10)  # guard against very short traces
    features[f"{prefix}_first_10pct_mean"] = np.mean(trace[:n_edge])
    features[f"{prefix}_last_10pct_mean"] = np.mean(trace[-n_edge:])
    features[f"{prefix}_settling_delta"] = (
        features[f"{prefix}_last_10pct_mean"] -
        features[f"{prefix}_first_10pct_mean"]
    )

    return features

# Example: 50 sensors × 4 steps × 14 features = 2,800 features per wafer
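The loop that applies per-step extraction across a whole wafer can be sketched as below. The long-format schema (columns `step`, `sensor`, `value`) is a hypothetical stand-in for whatever your trace database provides, and the inline three-feature extractor is a placeholder for the full `extract_trace_features` above:

```python
import numpy as np
import pandas as pd

def per_step_features(trace_df: pd.DataFrame) -> dict:
    """One feature row per wafer from a long-format trace table.

    Assumes (hypothetical schema) columns 'step', 'sensor', 'value',
    with rows in time order within each (step, sensor) group.
    """
    row = {}
    for (step, sensor), grp in trace_df.groupby(["step", "sensor"], sort=False):
        v = grp["value"].to_numpy()
        prefix = f"{step}_{sensor}"
        # Minimal inline extractor; swap in extract_trace_features in practice
        row[f"{prefix}_mean"] = v.mean()
        row[f"{prefix}_std"] = v.std()
        row[f"{prefix}_slope"] = np.polyfit(np.arange(len(v)), v, 1)[0]
    return row

# Example: two steps, one pressure sensor
df = pd.DataFrame({
    "step":   ["main_etch"] * 4 + ["over_etch"] * 4,
    "sensor": ["pressure"] * 8,
    "value":  [5.0, 5.1, 5.0, 5.1, 3.0, 3.0, 3.1, 3.1],
})
row = per_step_features(df)
```

Grouping by (step, sensor) rather than sensor alone is what enforces the per-step segmentation described above.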

Key Concept: Domain-Guided Features

Generic statistics work, but domain knowledge unlocks better features. For example: in a CVD process, the ratio of precursor flow to carrier gas flow matters more than either alone. In plasma etch, the time derivative of optical emission at a specific wavelength signals endpoint. Always collaborate with process engineers to identify physically meaningful derived features.
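The two examples in this paragraph can be sketched as derived features. The sensor names, the 0.1 s sample period, and the "largest OES derivative marks endpoint" heuristic are illustrative assumptions, not a fixed recipe:

```python
import numpy as np

def domain_features(precursor_flow, carrier_flow, oes_trace, dt=0.1):
    """Physically motivated derived features (hypothetical sensor names).

    - Flow ratio: precursor-to-carrier ratio, often more predictive
      than either flow alone in CVD.
    - Endpoint proxy: peak magnitude of the time derivative of an
      optical emission (OES) trace, a common plasma-etch endpoint cue.
    """
    feats = {}
    ratio = np.asarray(precursor_flow) / np.asarray(carrier_flow)
    feats["flow_ratio_mean"] = ratio.mean()
    feats["flow_ratio_std"] = ratio.std()

    d_oes = np.gradient(np.asarray(oes_trace), dt)  # dI/dt of OES intensity
    feats["oes_max_abs_derivative"] = np.max(np.abs(d_oes))
    feats["oes_endpoint_time"] = np.argmax(np.abs(d_oes)) * dt  # seconds into step
    return feats
```

Derived features like these feed into the same tabular feature row as the generic statistics; the process engineer supplies the physics, the pipeline supplies the bookkeeping.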

Choosing the Right Model

VM is fundamentally a supervised regression problem with high dimensionality and limited labels. Here's what works in practice:

Model Landscape for VM

  • PLS (Partial Least Squares): The workhorse of traditional VM. Handles multicollinearity naturally, interpretable, fast. Often the baseline to beat.
  • LASSO / Elastic Net: Automatic feature selection via L1 penalty. Excellent when only a handful of sensors truly matter.
  • Gradient Boosting (XGBoost, LightGBM): Captures nonlinearities and interactions. Often the best accuracy, but requires more tuning and care with small datasets.
  • Neural Networks: Generally overkill for tabular VM. Exception: when using raw time-series traces (1D-CNN or LSTM) instead of hand-crafted features.

from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LassoCV, ElasticNetCV
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score
import numpy as np

def compare_vm_models(X_train, y_train):
    """Compare common VM model architectures."""
    models = {
        'PLS (10 components)': PLSRegression(n_components=10),
        'LASSO (auto-alpha)': LassoCV(cv=5, max_iter=10000),
        'ElasticNet': ElasticNetCV(cv=5, max_iter=10000),
        'GBM': GradientBoostingRegressor(
            n_estimators=200, max_depth=4,
            learning_rate=0.05, subsample=0.8
        ),
    }

    for name, model in models.items():
        # Plain k-fold here for a quick model comparison; use the temporal
        # and grouped CV from the validation section for honest estimates
        scores = cross_val_score(
            model, X_train, y_train,
            cv=5, scoring='neg_mean_absolute_error'
        )
        mae = -scores.mean()
        std = scores.std()
        print(f"{name:25s}  MAE: {mae:.3f} ± {std:.3f} nm")

    # Typical results for etch depth VM:
    # PLS (10 components)         MAE: 0.45 ± 0.08 nm
    # LASSO (auto-alpha)          MAE: 0.42 ± 0.07 nm
    # ElasticNet                  MAE: 0.41 ± 0.07 nm
    # GBM                         MAE: 0.36 ± 0.09 nm

Analogy: The Right Tool for the Job

PLS is like a reliable sedan — it gets you there safely every time. LASSO is a sports car — fast and sleek but needs the right road. GBM is a pickup truck — powerful and versatile but burns more fuel. In VM, start with PLS as your baseline, then see if GBM's extra complexity buys meaningful improvement.

Validation Strategy for VM

Standard k-fold cross-validation can be dangerously optimistic for VM. Semiconductor data has temporal and grouping structure that must be respected.

Why Random Splits Fail

  • Temporal leakage: If you randomly split, future wafers leak into training. In production, you only have past data to train on.
  • Lot correlation: Wafers from the same lot are highly similar. Random splits put correlated wafers in both train and test.
  • Chamber drift: A model validated on data from the same chamber state will look great — until the chamber is cleaned and all sensor baselines shift.

from sklearn.model_selection import TimeSeriesSplit
from sklearn.ensemble import GradientBoostingRegressor
import pandas as pd
import numpy as np

def temporal_cv_for_vm(X, y, timestamps, n_splits=5):
    """Time-series cross-validation that respects temporal ordering."""
    # Sort by time
    sort_idx = timestamps.argsort()
    X_sorted = X.iloc[sort_idx]
    y_sorted = y.iloc[sort_idx]

    tscv = TimeSeriesSplit(n_splits=n_splits)
    results = []

    for fold, (train_idx, test_idx) in enumerate(tscv.split(X_sorted)):
        X_train = X_sorted.iloc[train_idx]
        y_train = y_sorted.iloc[train_idx]
        X_test = X_sorted.iloc[test_idx]
        y_test = y_sorted.iloc[test_idx]

        # Ensure no temporal leakage
        train_end = timestamps.iloc[sort_idx[train_idx[-1]]]
        test_start = timestamps.iloc[sort_idx[test_idx[0]]]
        assert train_end < test_start, "Temporal leakage detected!"

        model = GradientBoostingRegressor(n_estimators=200, max_depth=4)
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        mae = np.mean(np.abs(y_test - preds))
        results.append({'fold': fold, 'mae': mae, 'n_test': len(y_test)})
        print(f"Fold {fold}: MAE = {mae:.3f} nm (n={len(y_test)})")

    return pd.DataFrame(results)

# Additional validation: test on data AFTER a chamber clean event
# This is the hardest but most realistic test
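Temporal splits handle the first failure mode above; the lot-correlation problem calls for group-aware splitting as well. A minimal sketch using scikit-learn's `GroupKFold`, with `lot_ids` as a hypothetical per-wafer lot label and Ridge as a fast stand-in model:

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.linear_model import Ridge

def lot_aware_cv(X, y, lot_ids, n_splits=5):
    """CV that keeps every wafer from a lot on one side of the split.

    GroupKFold guarantees no lot appears in both train and test,
    removing the optimism caused by within-lot correlation.
    """
    gkf = GroupKFold(n_splits=n_splits)
    maes = []
    for train_idx, test_idx in gkf.split(X, y, groups=lot_ids):
        # Sanity check: no lot straddles the boundary
        assert set(lot_ids[train_idx]).isdisjoint(lot_ids[test_idx])
        model = Ridge().fit(X[train_idx], y[train_idx])
        preds = model.predict(X[test_idx])
        maes.append(np.mean(np.abs(y[test_idx] - preds)))
    return np.array(maes)
```

In practice you would combine both protections, e.g. temporal ordering at the fold level with lot boundaries respected within each fold.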

Key Concept: Chamber Clean Boundary Test

The ultimate VM stress test: train on data from one "chamber life" (period between preventive maintenance events) and test on the next chamber life. If your model survives this, it's ready for production. Many models that look great on random splits collapse after a chamber clean because sensor baselines shift.
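The boundary test itself is a single hold-out split. A sketch, assuming a hypothetical per-wafer `chamber_life` label that increments at every preventive-maintenance clean:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def chamber_life_test(X, y, chamber_life, train_life, test_life):
    """Train on one chamber life, test on the next.

    `chamber_life` is a hypothetical integer label per wafer,
    incremented at each chamber clean event.
    """
    train_mask = chamber_life == train_life
    test_mask = chamber_life == test_life
    model = GradientBoostingRegressor(n_estimators=200, max_depth=4)
    model.fit(X[train_mask], y[train_mask])
    preds = model.predict(X[test_mask])
    return np.mean(np.abs(y[test_mask] - preds))
```

A model that holds its MAE across this boundary has demonstrated some robustness to baseline shifts; one that degrades sharply needs drift-robust features or a retraining trigger tied to PM events.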

VM Performance Metrics

Fab engineers don't think in terms of RMSE or R². They care about whether VM predictions are good enough to replace or supplement physical metrology. Industry-standard metrics include:

Key Metrics

  • MAE / RMSE: Basic accuracy. Must be well within the metrology tool's own repeatability (typically 0.5–2% of the measured value).
  • MAPE (Mean Absolute Percentage Error): Normalized accuracy. Target: <1% for reliance VM.
  • Cp/Cpk of residuals: Process capability indices applied to VM errors. Cpk > 1.33 is a common reliance threshold.
  • Prediction interval coverage: Do the 95% confidence intervals actually contain 95% of true values?
import numpy as np
from scipy import stats

def vm_performance_report(y_true, y_pred, spec_limit=2.0):
    """Generate a fab-standard VM performance report."""
    residuals = y_true - y_pred
    n = len(residuals)

    mae = np.mean(np.abs(residuals))
    rmse = np.sqrt(np.mean(residuals ** 2))
    mape = np.mean(np.abs(residuals / y_true)) * 100
    r_squared = 1 - np.sum(residuals**2) / np.sum((y_true - y_true.mean())**2)

    # Cpk of residuals (how well-centered within spec)
    mu = np.mean(residuals)
    sigma = np.std(residuals)
    cpk = min(spec_limit - mu, spec_limit + mu) / (3 * sigma)

    # Coverage check for 95% prediction interval
    # (for an honest check, estimate sigma on a separate calibration set;
    #  reusing the same residuals to set the width is optimistic)
    z_95 = 1.96
    lower = y_pred - z_95 * sigma
    upper = y_pred + z_95 * sigma
    coverage = np.mean((y_true >= lower) & (y_true <= upper)) * 100

    print(f"=== VM Performance Report ===")
    print(f"Samples:         {n}")
    print(f"MAE:             {mae:.4f} nm")
    print(f"RMSE:            {rmse:.4f} nm")
    print(f"MAPE:            {mape:.2f}%")
    print(f"R²:              {r_squared:.4f}")
    print(f"Cpk (residuals): {cpk:.2f}  {'PASS' if cpk > 1.33 else 'FAIL'}")
    print(f"95% PI Coverage: {coverage:.1f}%  {'PASS' if coverage >= 93 else 'FAIL'}")

    return {'mae': mae, 'rmse': rmse, 'mape': mape,
            'r2': r_squared, 'cpk': cpk, 'coverage': coverage}

Did You Know?

Metrology tools themselves have measurement uncertainty. An ellipsometer measuring 50 nm film thickness might have a repeatability of ±0.2 nm. If your VM model achieves MAE of 0.3 nm, you're approaching the noise floor of the measurement itself — going below that is impossible without a better reference.
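This noise-floor argument can be made quantitative. Under the (strong) assumption that model error and tool repeatability are independent, their variances add in quadrature, so the model-only error can be estimated by subtracting the tool contribution; this is a sketch of that decomposition, not a substitute for a proper gauge study:

```python
import numpy as np

def model_error_floor(observed_rmse, tool_repeatability_sigma):
    """Estimate model-only RMSE by removing reference-metrology noise.

    Assumes independent errors, so variances add:
        observed_rmse² = model_rmse² + tool_sigma²
    Clamps at zero when the observed error is already below the tool noise.
    """
    var = observed_rmse**2 - tool_repeatability_sigma**2
    return np.sqrt(max(var, 0.0))

# e.g. 0.36 nm observed RMSE against a tool with 0.2 nm repeatability
# leaves roughly 0.3 nm attributable to the model itself
```

When the clamped result approaches zero, further tuning chases measurement noise, and the only real improvement path is a better reference measurement.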

Knowledge Check

Why should feature extraction from sensor traces be done per recipe step rather than over the entire trace?