Yield Prediction & Optimization

Yield Prediction Models

Analytical yield models, regression, ensemble methods, spatial models, and handling limited data

Analytical Yield Models: Poisson, Murphy, and Negative Binomial

Before ML, the industry built a small family of closed-form yield models that still anchor modern yield analysis. They all start from one observation: yield drops as the product of defect density D₀ (defects/cm²) and critical area A (cm²) grows. The Poisson model makes that drop exponential; the others soften it by letting D₀ vary from die to die.

1. Poisson

$$Y_{\text{Poisson}} = e^{-D_0 A}$$

Assumes defects are spatially random and independent. Works for small dies and low defect counts, but underestimates yield for large dies because it ignores defect clustering.

2. Murphy's model

Murphy (1964) argued that D₀ itself varies across the wafer (some dies have more particles than others). Averaging Poisson yield over a triangular distribution of D₀ gives:

$$Y_{\text{Murphy}} = \left(\frac{1 - e^{-D_0 A}}{D_0 A}\right)^{2}$$

This is a widely cited closed-form correction. Because dies in low-D₀ regions yield better than the average D₀ suggests, it predicts noticeably higher yields than pure Poisson, especially for large dies.

3. Seeds' model

Seeds replaced Murphy's triangular distribution with an exponential distribution of D₀, which gives:

$$Y_{\text{Seeds}} = \frac{1}{1 + D_0 A}$$
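Murphy's and Seeds' formulas both come from the same step: averaging the Poisson yield exp(−D·A) over a distribution of D. The sketch below checks this numerically with NumPy. It assumes a symmetric triangular density on [0, 2·D₀] peaking at D₀ for Murphy and an exponential density with mean D₀ for Seeds; the helper names (`trapezoid`, `averaged_poisson`) are illustrative, not from any library.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid depending on a particular NumPy version."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def averaged_poisson(density, D_max, A, n=200_001):
    """Average the Poisson yield exp(-D*A) over a D distribution on [0, D_max]."""
    D = np.linspace(0.0, D_max, n)
    return trapezoid(np.exp(-D * A) * density(D), D)

D0, A = 0.5, 6.0  # mean defect density (defects/cm²) and area (cm²), so D0·A = 3

def triangular(D):
    """Assumed Murphy density: triangle on [0, 2*D0] peaking at D0 (area = 1)."""
    return np.where(D <= D0, D, 2.0 * D0 - D) / D0**2

def exponential(D):
    """Assumed Seeds density: exponential with mean D0."""
    return np.exp(-D / D0) / D0

murphy_closed = ((1.0 - np.exp(-D0 * A)) / (D0 * A)) ** 2
murphy_numeric = averaged_poisson(triangular, 2.0 * D0, A)

seeds_closed = 1.0 / (1.0 + D0 * A)
# Truncate the exponential far out in its tail, where the integrand is negligible.
seeds_numeric = averaged_poisson(exponential, 60.0 * D0, A)

print(f"Murphy: closed form {murphy_closed:.6f}, numeric {murphy_numeric:.6f}")
print(f"Seeds:  closed form {seeds_closed:.6f}, numeric {seeds_numeric:.6f}")
```

In each pair the numeric average should agree closely with the closed form, which is the sense in which both models are "Poisson averaged over a D₀ distribution".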

4. Negative binomial: the modern default

Widely used in production today, this model explicitly parameterises clustering through a cluster parameter α (smaller α means more clustering):

$$Y_{\text{NB}} = \left(1 + \frac{D_0 A}{\alpha}\right)^{-\alpha}$$

It contains two of the earlier models as special cases: α → ∞ recovers Poisson, and α = 1 recovers Seeds exactly. Finite α describes clustered defects, with smaller α meaning stronger clustering, which is what real wafers typically show.
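A quick numerical check of those limits, as a sketch for a hypothetical die with D₀·A = 3: at α = 1 the negative binomial matches Seeds exactly, and as α grows it converges to Poisson.

```python
import numpy as np

def yield_negative_binomial(D0, A, alpha):
    return (1.0 + D0 * A / alpha) ** (-alpha)

D0, A = 0.5, 6.0            # hypothetical die: D0·A = 3
poisson = np.exp(-D0 * A)
seeds = 1.0 / (1.0 + D0 * A)

# Sweep alpha from strong clustering (small) to almost none (huge).
nb = {alpha: yield_negative_binomial(D0, A, alpha) for alpha in (0.5, 1.0, 10.0, 1e3, 1e6)}
for alpha, y in nb.items():
    print(f"alpha = {alpha:>9g}   Y_NB = {y:.5f}")
print(f"Poisson = {poisson:.5f}   Seeds = {seeds:.5f}")
```

Yield rises monotonically as α shrinks, which is why clustered defect distributions give higher predicted yield at the same average D₀.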

Putting the models in code

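```python
import numpy as np

# D0: defect density (defects/cm²); A: critical area (cm²); alpha: cluster parameter.

def yield_poisson(D0, A):
    """Random, independent defects."""
    return np.exp(-D0 * A)

def yield_murphy(D0, A):
    """Poisson averaged over a triangular D0 distribution (requires D0·A > 0)."""
    x = D0 * A
    return ((1 - np.exp(-x)) / x) ** 2

def yield_seeds(D0, A):
    """Poisson averaged over an exponential D0 distribution."""
    return 1.0 / (1.0 + D0 * A)

def yield_negative_binomial(D0, A, alpha):
    """Clustered defects; smaller alpha means stronger clustering."""
    return (1.0 + D0 * A / alpha) ** (-alpha)

# A 600 mm² (= 6 cm²) die at D0 = 0.5 defects/cm² ⇒ D0·A = 3
D0, A = 0.5, 6.0
for name, fn, args in [
    ("Poisson",    yield_poisson,           ()),
    ("Murphy",     yield_murphy,            ()),
    ("Seeds",      yield_seeds,             ()),
    ("NB (α=2)",   yield_negative_binomial, (2.0,)),
    ("NB (α=0.5)", yield_negative_binomial, (0.5,)),
]:
    Y = fn(D0, A, *args)
    print(f"{name:12s}  Y = {Y*100:.1f}%")
# Prints 5.0%, 10.0%, 25.0%, 16.0% and 37.8% respectively.
```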
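| D₀·A | Poisson | Murphy | Seeds | NB (α=0.5) |
|------|---------|--------|-------|------------|
| 1    | 37%     | 40%    | 50%   | 58%        |
| 3    | 5%      | 10%    | 25%   | 38%        |
| 6    | 0.25%   | 2.8%   | 14%   | 28%        |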

Key Concept: Why ML Doesn't Replace These Models

ML models predict yield wafer by wafer from process features; analytical models predict yield from defect density. A common pattern is to use them together: the analytical model decomposes yield loss into a D₀ for each defect type (random, edge, systematic), and the ML model predicts how each D₀ will respond to process changes.
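A minimal sketch of the decomposition idea, assuming each defect mechanism follows its own negative binomial model and that mechanisms are independent, so their yields multiply. The per-mechanism D₀ values, the critical area, and α are made-up illustration numbers, and `nb_defect_density` is a hypothetical helper that inverts the negative binomial formula, D₀ = α(Y^(−1/α) − 1)/A, to back out a defect density from an observed yield.

```python
import numpy as np

def nb_yield(D0, A, alpha):
    """Negative binomial yield for D0 in defects/cm² and A in cm²."""
    return (1.0 + D0 * A / alpha) ** (-alpha)

def nb_defect_density(Y, A, alpha):
    """Hypothetical helper: invert the negative binomial model to get D0 from yield Y."""
    return alpha * (Y ** (-1.0 / alpha) - 1.0) / A

# Hypothetical per-mechanism defect densities in defects/cm² (illustration only).
mechanisms = {"random": 0.30, "edge": 0.10, "systematic": 0.05}
A, alpha = 1.5, 2.0  # hypothetical critical area (cm²) and cluster parameter

# Assumption: mechanisms are independent, so the die yield is the product.
per_mech = {name: nb_yield(d0, A, alpha) for name, d0 in mechanisms.items()}
total = float(np.prod(list(per_mech.values())))

for name, y in per_mech.items():
    d0_back = nb_defect_density(y, A, alpha)  # recover D0 from this mechanism's yield
    print(f"{name:10s}  Y = {y:.3f}  recovered D0 = {d0_back:.3f}")
print(f"total       Y = {total:.3f}")
```

In this framing, an ML model would predict how each mechanism's D₀ shifts with process settings, and the analytical layer turns those D₀ values back into yield.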

Modeling Approaches for Yield Prediction

Different modeling strategies serve different yield prediction needs:

  • Wafer-level regression: Predict overall wafer yield from process parameters using XGBoost, Random Forest, or neural networks. Best for identifying process factors that drive yield variation.
  • Die-level prediction: Predict pass/fail or bin for individual dies. This gives many more data points but requires die-level features (design, spatial position, nearby metrology). Typical models are logistic regression and gradient boosting.
  • Spatial models: Account for across-wafer variation patterns. Gaussian Process models capture spatial correlations. CNN-based models treat the wafer map as an image.
  • Virtual WAT: Predict WAT (wafer acceptance test) electrical parameters from inline metrology and FDC (fault detection and classification) data, which is faster than waiting for the actual WAT measurements.
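To make the die-level idea concrete, here is a minimal NumPy sketch of logistic regression on a single spatial feature. Everything in it is an assumption for illustration: a hypothetical 300 mm wafer with die centres every 10 mm, synthetic pass/fail labels whose pass probability falls off with squared radius, and plain gradient descent standing in for a library solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical die grid: centres every 10 mm, kept if within 145 mm of the wafer centre.
xs, ys = np.meshgrid(np.arange(-145, 146, 10.0), np.arange(-145, 146, 10.0))
inside = np.hypot(xs, ys) <= 145.0
r = np.hypot(xs[inside], ys[inside]) / 150.0   # normalised radius in [0, ~0.97]

# Synthetic labels (assumption): pass probability drops towards the wafer edge.
true_logit = 2.5 - 4.0 * r**2
n_wafers = 20
R = np.tile(r, n_wafers)
p_true = 1.0 / (1.0 + np.exp(-np.tile(true_logit, n_wafers)))
y = (rng.random(R.size) < p_true).astype(float)

# Die-level logistic regression on features [1, r²], fit by gradient descent.
X = np.column_stack([np.ones_like(R), R**2])
w = np.zeros(2)
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 1.0 * X.T @ (p - y) / y.size          # gradient of the mean log-loss

print("fitted [intercept, r² coefficient]:", np.round(w, 2))
```

The fitted r² coefficient should come out strongly negative, close to the −4 used to generate the data; in practice a library logistic regression or gradient-boosting model with many more die-level features would take its place.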

Key Concept: Feature Importance for Yield

The most valuable output of yield models is often not the prediction itself but the feature importance ranking. Knowing which process parameters most strongly influence yield directs engineering attention to the highest-impact improvements. SHAP values provide interpretable per-wafer explanations.
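SHAP needs its own library, so the sketch below swaps in a simpler model-agnostic technique, permutation importance, to show how a ranking is produced: shuffle one feature, re-score the model, and treat the increase in error as that feature's importance. The data are synthetic (400 hypothetical wafers, 6 hypothetical process parameters) and, for brevity, importance is measured on the training data; a real analysis would use held-out wafers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic wafer-level data (assumption): yield in % driven by a few parameters.
n, p = 400, 6
X = rng.normal(size=(n, p))
true_coef = np.array([3.0, 0.0, -1.5, 0.0, 0.5, 0.0])   # parameter 0 matters most
y = 85.0 + X @ true_coef + rng.normal(scale=1.0, size=n)

# Fit ordinary least squares with an intercept column.
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)

def mse(X_eval):
    pred = np.column_stack([np.ones(len(X_eval)), X_eval]) @ coef
    return float(np.mean((y - pred) ** 2))

baseline = mse(X)
importance = []
for j in range(p):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])   # break the link between parameter j and yield
    importance.append(mse(X_perm) - baseline)      # error increase = importance

for j in np.argsort(importance)[::-1]:
    print(f"param_{j}: {importance[j]:.2f}")
```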

Handling Small Datasets

Semiconductor yield data poses several characteristic challenges:

  • Small n, large p: Hundreds of process parameters (features) but often only hundreds or thousands of wafers with yield data, so the number of features can approach or exceed the number of samples. Regularization is essential.
  • Non-stationary: Process conditions change over time (maintenance, recipe updates, material lot changes), so historical data may not represent current conditions.
  • Censored data: Wafers scrapped mid-process never get final yield data, so a model trained only on completed wafers sees a biased sample.
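Why regularization is essential when p approaches or exceeds n can be seen directly. In this sketch (synthetic data; 80 wafers and 300 parameters are hypothetical numbers) the matrix XᵀX has rank at most n, so ordinary least squares has no unique solution, while the closed-form ridge estimate (XᵀX + λI)⁻¹Xᵀy is always solvable for λ > 0. Features are assumed centred, so no intercept is fitted.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic "small n, large p" case (assumption): 80 wafers, 300 process parameters.
n, p = 80, 300
X = rng.normal(size=(n, p))
true_coef = np.zeros(p)
true_coef[:5] = [2.0, -1.5, 1.0, 0.8, -0.5]   # only 5 parameters actually matter
y = X @ true_coef + rng.normal(scale=0.5, size=n)

# X^T X is p×p but has rank at most n, so it cannot be inverted.
print("rank of X^T X:", np.linalg.matrix_rank(X.T @ X), "of", p)

def ridge(X, y, lam):
    """Closed-form ridge: solve (X^T X + lam·I) w = X^T y; lam > 0 makes it invertible."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Larger lam shrinks the coefficient vector more strongly.
for lam in (0.1, 10.0, 1000.0):
    print(f"lam = {lam:>7g}   ||w|| = {np.linalg.norm(ridge(X, y, lam)):.3f}")
```

Lasso and ElasticNet add an L1 penalty that also drives many coefficients to exactly zero; they have no closed form and need an iterative solver.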

Strategies for small datasets:

  • Strong regularization (Lasso, Ridge, ElasticNet) to prevent overfitting
  • Bayesian methods that incorporate prior knowledge
  • Transfer learning from similar products or process nodes
  • Physics-informed features that encode domain knowledge
  • Cross-validation with time-aware splits (no data leakage from future)
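Time-aware cross-validation can be sketched as a walk-forward (expanding-window) split: wafers are sorted by processing time, and each fold trains only on wafers that came before its test block. The function name and parameters below are illustrative; scikit-learn's `TimeSeriesSplit` implements a similar expanding-window scheme.

```python
import numpy as np

def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_idx, test_idx) pairs for time-ordered samples.

    Assumes samples are sorted by processing time. Each fold trains on everything
    before its test block, so no future wafers leak into training. Any samples left
    over after n_folds equal-sized test blocks are not used.
    """
    test_size = (n_samples - min_train) // n_folds
    for k in range(n_folds):
        train_end = min_train + k * test_size
        test_end = train_end + test_size
        yield np.arange(0, train_end), np.arange(train_end, test_end)

for train_idx, test_idx in walk_forward_splits(n_samples=100, n_folds=4, min_train=40):
    print(f"train 0..{train_idx[-1]:3d}   test {test_idx[0]}..{test_idx[-1]}")
```

Unlike a random k-fold split, every test block here is strictly later than its training data, which mimics how the model will be used on future wafers under drifting process conditions.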

Knowledge Check

Why does the negative binomial model usually fit large-die yield better than the pure Poisson model?