Virtual Metrology & Process Control

What is Virtual Metrology?

Predicting metrology from sensor data — why it matters for cost, speed, and 100% coverage

The Metrology Bottleneck

In semiconductor manufacturing, metrology — the act of measuring wafer properties like film thickness, critical dimension (CD), overlay, or etch depth — is essential for maintaining process quality. But physical metrology is expensive and slow:

  • Sampling rate: Typically only 5–10% of wafers are measured. The rest fly blind.
  • Cycle time: A wafer might wait hours for metrology, blocking downstream processing.
  • Cost: Metrology tools (SEM, ellipsometer, OCD) cost $5–20M each and require dedicated operators.

Analogy: The Hospital Blood Test

Imagine a hospital that can only run blood tests on 1 in 10 patients. The other 9 are treated based on symptoms alone. Virtual metrology is like building an ML model that predicts blood test results from vital signs (temperature, blood pressure, heart rate) — the "sensors" you already have. It's not a replacement for the lab, but it covers the 90% gap.

Key Concept: Virtual Metrology (VM)

Virtual Metrology uses ML models to predict post-process metrology values from equipment sensor data (FDC — Fault Detection and Classification data) collected during wafer processing. The equipment already logs thousands of sensor traces per wafer — pressures, temperatures, RF power, gas flows, endpoint signals — all at sub-second resolution.

Why VM Matters

Virtual Metrology delivers three game-changing capabilities:

1. 100% Wafer Coverage

Instead of sampling 5–10% of wafers, VM provides a prediction for every single wafer. No wafer passes through the fab unmeasured, so a defective wafer is caught immediately rather than several lots later.

2. Faster Feedback

Physical metrology introduces 2–8 hour delays. VM predictions are available within seconds of process completion, enabling real-time control loops that would be impossible with physical measurements alone.

3. Cost Reduction

With VM providing reliable predictions, fabs can reduce physical metrology sampling — freeing up expensive tools and operators for other tasks, or deferring metrology tool purchases entirely.

Did You Know?

A single advanced fab may have 50+ metrology tools costing $500M+ in aggregate. VM doesn't eliminate them, but even a 30% reduction in sampling saves tens of millions annually while improving quality coverage.

The VM Data Pipeline

import pandas as pd
import numpy as np

# Typical FDC sensor data: one row per wafer, thousands of features
# Each feature is a summary statistic of a sensor trace during processing
fdc_data = pd.DataFrame({
    'wafer_id': ['W001', 'W002', 'W003'],
    'chamber_pressure_mean': [4.52, 4.51, 4.53],
    'chamber_pressure_std': [0.012, 0.015, 0.011],
    'rf_power_mean': [250.1, 249.8, 250.3],
    'rf_power_std': [1.2, 1.5, 1.1],
    'gas_flow_Ar_mean': [100.2, 100.1, 100.3],
    'etch_time': [45.2, 45.1, 45.3],
    'endpoint_signal_peak': [0.87, 0.85, 0.88],
    # ... typically 500-5000 features per wafer
})

# Metrology target: the value we want to predict
metrology = pd.DataFrame({
    'wafer_id': ['W001', 'W002', 'W003'],
    'etch_depth_nm': [52.1, 51.8, 52.3],  # measured by ellipsometer
})

# In reality, ~90% of wafers have FDC data but NO metrology measurement.
# VM fills this gap.
print("FDC coverage: 100% of wafers")
print("Metrology coverage: ~10% of wafers")
print("VM goal: predict metrology for the other ~90%")
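The label gap can be made concrete with a left join: FDC rows that find no matching metrology row are exactly the wafers VM must cover. A minimal sketch with made-up wafer IDs and values:

```python
import pandas as pd

# Illustrative only: wafer IDs and values are invented.
# Left-join FDC rows to metrology labels; rows with no match are unlabeled.
fdc = pd.DataFrame({'wafer_id': ['W001', 'W002', 'W003', 'W004', 'W005'],
                    'rf_power_mean': [250.1, 249.8, 250.3, 250.0, 249.9]})
metrology = pd.DataFrame({'wafer_id': ['W002'], 'etch_depth_nm': [51.8]})

merged = fdc.merge(metrology, on='wafer_id', how='left')
labeled = merged['etch_depth_nm'].notna()
print(f"Labeled wafers:   {labeled.sum()} of {len(merged)}")
print(f"Unlabeled wafers: {(~labeled).sum()} (VM predicts these)")
```

The labeled subset trains the model; the unlabeled rows are where its predictions add value.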

VM System Architecture

A production VM system has several interacting components:

Data Sources

  • FDC (Fault Detection and Classification): Equipment sensor traces — the primary input features.
  • Context data: Recipe ID, chamber ID, wafer slot position, lot history, consumable age (e.g., hours since last chamber clean).
  • Upstream metrology: Measurements from previous process steps (e.g., incoming film thickness before etch).
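In practice these three sources are joined on wafer ID into a single feature table before modeling. A hypothetical sketch (column names invented for illustration):

```python
import pandas as pd

# Hypothetical sketch: joining FDC, context, and upstream metrology per wafer.
fdc = pd.DataFrame({'wafer_id': ['W001', 'W002'],
                    'rf_power_mean': [250.1, 249.8]})
context = pd.DataFrame({'wafer_id': ['W001', 'W002'],
                        'chamber_id': ['CH-A', 'CH-B'],
                        'hours_since_clean': [12.5, 88.0]})
upstream = pd.DataFrame({'wafer_id': ['W001', 'W002'],
                         'incoming_thickness_nm': [120.3, 119.8]})

features = fdc.merge(context, on='wafer_id').merge(upstream, on='wafer_id')
# Categorical context (chamber ID) becomes one-hot columns for the model
features = pd.get_dummies(features, columns=['chamber_id'])
print(sorted(features.columns))
```

One-hot encoding the chamber ID lets a single global model learn chamber-to-chamber offsets without maintaining separate models.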

The Modeling Pipeline

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingRegressor

# A typical VM pipeline
vm_pipeline = Pipeline([
    ('scaler', StandardScaler()),           # Normalize sensor ranges
    ('pca', PCA(n_components=50)),          # Reduce 2000+ features
    ('model', GradientBoostingRegressor(    # Predict metrology
        n_estimators=200,
        max_depth=5,
        learning_rate=0.05
    ))
])

# Key challenge: feature count >> sample count
# (e.g., 2000 FDC features but only ~500 labeled wafers with metrology).
# This is a classic high-dimensional, low-sample-size problem:
# PCA, LASSO, or domain-guided feature selection is essential.
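The pipeline above compresses features with PCA; LASSO is a common alternative that selects raw features directly, which keeps them physically interpretable (an engineer can see *which* sensor drove the prediction). A minimal sketch on synthetic stand-in data — the sample counts, feature counts, and coefficients are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 300 labeled wafers, 1000 FDC features,
# only the first two actually drive the metrology value
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 1000))
y = 52.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.05 * rng.normal(size=300)

# Cross-validated LASSO picks the regularization strength automatically
lasso = make_pipeline(StandardScaler(), LassoCV(cv=3, n_alphas=30))
lasso.fit(X, y)
n_selected = int(np.sum(lasso.named_steps['lassocv'].coef_ != 0))
print(f"LASSO kept {n_selected} of {X.shape[1]} features")
```

CV-tuned LASSO typically retains the true drivers plus some noise features, but still collapses thousands of candidates down to a manageable, inspectable subset.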

Key Concept: FDC Summary Statistics

Raw sensor traces are time-series (e.g., pressure sampled at 10 Hz over a 60-second step). Before modeling, these are summarized into statistics: mean, std, min, max, slope, integral, peak value, time-to-peak, etc. A single process step with 50 sensors and 10 statistics each yields 500 features — and multi-step recipes can produce 2,000–5,000 features per wafer.
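The summarization step itself is straightforward. A sketch for one sensor, using a synthetic pressure trace (10 Hz over a 60-second step, as in the example above):

```python
import numpy as np

# Synthetic pressure trace: 10 Hz sampling over a 60-second step
rng = np.random.default_rng(1)
t = np.arange(0, 60, 0.1)                                  # 600 samples
trace = 4.5 + 0.01 * np.sin(t) + 0.005 * rng.normal(size=t.size)

features = {
    'pressure_mean': trace.mean(),
    'pressure_std': trace.std(),
    'pressure_min': trace.min(),
    'pressure_max': trace.max(),
    'pressure_slope': np.polyfit(t, trace, 1)[0],          # drift over the step
    'pressure_integral': trace.sum() * 0.1,                # rectangle-rule area
}
# Repeat for every sensor and every statistic -> hundreds of columns per wafer
print({k: round(float(v), 4) for k, v in features.items()})
```

Each wafer thus collapses from megabytes of time-series data to one feature row, at the cost of discarding trace shape — which is why some VM work feeds raw traces to convolutional or recurrent models instead.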

Types of Virtual Metrology

Conjecture VM vs. Reliance VM

The SEMI E133 standard defines two tiers:

  • Conjecture VM: Predictions used for monitoring and trending only. No process decisions are made based on them. Lower accuracy bar — useful for early warning.
  • Reliance VM: Predictions trusted enough to replace physical metrology for process control decisions. Requires rigorous validation, confidence bounds, and continuous monitoring.
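One way to implement the confidence bounds reliance VM requires is a prediction interval: only wafers whose interval is tight enough are trusted for control decisions, and the rest fall back to physical metrology. This is an illustrative sketch, not a SEMI E133 prescription — the quantile-model approach, data, and threshold are all assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data: 400 labeled wafers, 20 features, one true driver
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))
y = 52.0 + X[:, 0] + 0.1 * rng.normal(size=400)

# Pair of quantile models brackets a 90% prediction interval
q_lo = GradientBoostingRegressor(loss='quantile', alpha=0.05).fit(X, y)
q_hi = GradientBoostingRegressor(loss='quantile', alpha=0.95).fit(X, y)

X_new = rng.normal(size=(5, 20))
width = q_hi.predict(X_new) - q_lo.predict(X_new)
trusted = width < 1.0   # nm threshold set by the fab's accuracy requirement
print(f"Interval widths (nm): {np.round(width, 2)}")
print(f"Trusted for control:  {trusted}")
```

Gating on interval width gives a graceful degradation path: an unfamiliar process state widens the interval, and the system automatically reverts to conjecture-level use for that wafer.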

Global vs. Local Models

  • Global VM: One model trained on data from all chambers running the same recipe. Simpler to maintain, but may miss chamber-specific quirks.
  • Local VM: Separate models per chamber. Better accuracy, but requires more labeled data per chamber and more maintenance overhead.

Did You Know?

Leading fabs like TSMC and Samsung have deployed VM on hundreds of process steps. TSMC's "VM Mark II" system reportedly covers 80%+ of critical process steps with conjecture-level VM, and a growing fraction with reliance-level VM for active process control.

# Global vs. Local VM comparison
from sklearn.base import clone
from sklearn.model_selection import GroupKFold
from sklearn.metrics import mean_absolute_error

def evaluate_global_vs_local(X, y, chamber_ids):
    """Compare a global model against per-chamber local models."""
    # Global model: cross-validate with whole chambers held out as groups
    gkf = GroupKFold(n_splits=5)
    global_errors = []
    for train_idx, test_idx in gkf.split(X, y, groups=chamber_ids):
        model = clone(vm_pipeline)  # fresh copy so folds don't share fitted state
        model.fit(X.iloc[train_idx], y.iloc[train_idx])
        preds = model.predict(X.iloc[test_idx])
        global_errors.append(mean_absolute_error(y.iloc[test_idx], preds))

    # Local models: one per chamber, trained only on that chamber's wafers
    local_errors = []
    for chamber in chamber_ids.unique():
        mask = chamber_ids == chamber
        X_ch, y_ch = X[mask], y[mask]
        if len(y_ch) < 100:  # too few samples (the 50-component PCA needs >50 rows)
            continue
        # Simple chronological train/test split within each chamber
        split = int(0.8 * len(y_ch))
        model = clone(vm_pipeline)
        model.fit(X_ch.iloc[:split], y_ch.iloc[:split])
        preds = model.predict(X_ch.iloc[split:])
        local_errors.append(mean_absolute_error(y_ch.iloc[split:], preds))

    print(f"Global MAE: {np.mean(global_errors):.3f} nm")
    print(f"Local MAE:  {np.mean(local_errors):.3f} nm")
    # Local usually wins by 10-30%, but needs more labeled data per chamber

Knowledge Check

What is the primary data source used as input features for Virtual Metrology models?