Alternative Hypothesis (H1)
The hypothesis that contradicts the null hypothesis in statistical testing. It represents the effect the researcher expects to find and seeks evidence for.
ANOVA (Analysis of Variance)
A statistical method for comparing means of three or more groups by analyzing variance. Tests whether observed differences between group means are statistically significant.
ARIMA
AutoRegressive Integrated Moving Average. A class of time series models that combines autoregression, differencing, and moving average components for forecasting.
Autocorrelation
The correlation of a time series with its own past and future values. Measures the degree to which current values are related to previous values.
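A minimal sketch of the lag-k sample autocorrelation with NumPy (the series and lag below are illustrative assumptions, not from the source):

```python
import numpy as np

def autocorr(x, lag):
    """Lag-k sample autocorrelation of a 1-D series (lag >= 1)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Covariance at the given lag divided by the variance at lag 0
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 200)) + rng.normal(0, 0.1, 200)
print(autocorr(series, lag=1))  # close to 1 for a smooth series
```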
Bagging (Bootstrap Aggregating)
An ensemble method that trains multiple models on random subsets of data (bootstrap samples) and combines their predictions to reduce variance.
Bayes' Theorem
A fundamental theorem relating conditional probabilities. P(A|B) = P(B|A)P(A)/P(B). Forms the basis of Bayesian statistics and inference.
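A small worked example (the disease-screening numbers are purely illustrative), applying P(A|B) = P(B|A)P(A)/P(B) with P(B) expanded by the law of total probability:

```python
# Illustrative numbers: 1% prevalence, 95% sensitivity, 90% specificity
p_disease = 0.01                # P(A)
p_pos_given_disease = 0.95      # P(B|A)
p_pos_given_healthy = 0.10      # false-positive rate = 1 - specificity

# P(B) via the law of total probability
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(A|B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive test) = {p_disease_given_pos:.3f}")  # ~0.088
```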
Bias-Variance Tradeoff
The balance between model simplicity (high bias, low variance) and complexity (low bias, high variance). Optimal models minimize total expected error, which decomposes into bias², variance, and irreducible noise.
Boosting
An ensemble technique that builds models sequentially, with each new model focusing on correcting errors from previous models.
Censored Data
Observations where the exact failure time is unknown. Right-censored: item hasn't failed by observation end. Left-censored: failure occurred before observation.
Central Limit Theorem
The sampling distribution of the mean approaches a normal distribution as sample size increases, for any population distribution with finite variance.
Confidence Interval
A range of values computed from sample data by a procedure that, across repeated samples, captures the true population parameter at a specified rate (e.g., 95%). Width depends on sample size and variability.
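A minimal sketch of a t-based 95% interval for a mean using scipy.stats (the sample values are made up):

```python
import numpy as np
from scipy import stats

data = np.array([9.8, 10.2, 10.1, 9.9, 10.4, 10.0, 9.7, 10.3])  # made-up sample
mean = data.mean()
sem = data.std(ddof=1) / np.sqrt(len(data))      # standard error of the mean
t_crit = stats.t.ppf(0.975, df=len(data) - 1)    # two-sided 95% critical value
print(f"95% CI: ({mean - t_crit * sem:.3f}, {mean + t_crit * sem:.3f})")
```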
Control Chart
A graph used in SPC to monitor process behavior over time. Contains center line, upper control limit (UCL), and lower control limit (LCL).
Cpk (Process Capability Index)
Measures how well a process meets specifications, accounting for centering. Cpk = min[(USL-μ)/3σ, (μ-LSL)/3σ]. Values >1.33 generally indicate capable processes.
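A minimal sketch of the index computed from sample data (the spec limits and measurements are hypothetical, and σ is estimated from the sample):

```python
import numpy as np

def cpk(data, lsl, usl):
    """Cpk = min[(USL - mu)/(3*sigma), (mu - LSL)/(3*sigma)]."""
    mu, sigma = np.mean(data), np.std(data, ddof=1)
    return min((usl - mu) / (3 * sigma), (mu - lsl) / (3 * sigma))

rng = np.random.default_rng(1)
measurements = rng.normal(loc=10.02, scale=0.05, size=100)  # hypothetical data
print(f"Cpk = {cpk(measurements, lsl=9.8, usl=10.2):.2f}")
```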
Cross-Validation
A technique for evaluating model performance by partitioning data into training and validation sets multiple times. K-fold CV uses k partitions.
CUSUM Chart
Cumulative Sum control chart. Detects small, sustained shifts in process mean by accumulating deviations from target.
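A minimal sketch of the tabular (two one-sided) CUSUM recursion; the target, allowance k, and decision limit h below are hypothetical choices:

```python
import numpy as np

def tabular_cusum(x, target, k, h):
    """Return indices where the upper or lower CUSUM statistic exceeds h."""
    c_plus, c_minus = 0.0, 0.0
    signals = []
    for i, xi in enumerate(x):
        c_plus = max(0.0, c_plus + (xi - target - k))    # accumulates upward drift
        c_minus = max(0.0, c_minus + (target - xi - k))  # accumulates downward drift
        if c_plus > h or c_minus > h:
            signals.append(i)
    return signals

rng = np.random.default_rng(2)
data = np.concatenate([rng.normal(0, 1, 50), rng.normal(0.75, 1, 50)])  # shift at i=50
print(tabular_cusum(data, target=0.0, k=0.5, h=5.0))  # signals appear after the shift
```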
Decision Tree
A predictive model that makes decisions based on a series of questions about feature values. Splits data recursively to maximize information gain or reduce impurity.
Degrees of Freedom
The number of independent values that can vary in a calculation. Affects the shape of distributions used in hypothesis testing (t, chi-square, F).
Design of Experiments (DOE)
A systematic approach to planning experiments that efficiently explores factor-response relationships while controlling for nuisance variables.
Ensemble Methods
Techniques that combine multiple models to produce better predictions than any single model. Includes bagging, boosting, and stacking.
EWMA Chart
Exponentially Weighted Moving Average chart. Weights recent observations more heavily, effective for detecting small process shifts.
Exponential Smoothing
Forecasting method that assigns exponentially decreasing weights to older observations. Includes simple, double (Holt), and triple (Holt-Winters) variants.
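A minimal sketch of the simple variant (the series and α are hypothetical); double and triple variants add trend and seasonal recursions on top of this level update:

```python
def simple_exp_smooth(x, alpha):
    """Level recursion: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    s = [x[0]]  # initialize the level at the first observation
    for xt in x[1:]:
        s.append(alpha * xt + (1 - alpha) * s[-1])
    return s

demand = [102, 98, 105, 110, 107, 115, 120]  # hypothetical series
print(simple_exp_smooth(demand, alpha=0.3)[-1])  # final level = one-step forecast
```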
Factorial Design
An experimental design that studies all combinations of factor levels. A 2^k design has k factors at 2 levels each.
Failure Rate (Hazard Rate)
The instantaneous rate of failure at time t given survival to t. h(t) = f(t)/R(t). Can be constant, increasing, or decreasing.
F-Distribution
A probability distribution arising as the ratio of two chi-square distributions. Used in ANOVA and regression significance tests.
Feature Engineering
The process of creating new features from raw data to improve model performance. Includes transformations, interactions, and domain-specific features.
Gradient Descent
An optimization algorithm that iteratively adjusts parameters to minimize a loss function by moving in the direction of steepest descent.
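A minimal sketch minimizing a simple quadratic loss (the learning rate and starting point are arbitrary choices):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Iteratively step against the gradient of the loss."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)  # move in the direction of steepest descent
    return x

# Loss f(x) = (x - 3)^2 has gradient 2*(x - 3); minimum at x = 3
print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))  # ~3.0
```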
Gradient Boosting
A boosting method where each successive model fits the negative gradient of the loss function, effectively fitting residuals.
Hazard Function
See Failure Rate. The conditional probability density of failure at time t given survival to t.
Hypothesis Testing
A statistical procedure for making inferences about population parameters based on sample data. Tests a null hypothesis against an alternative.
Interaction Effect
When the effect of one factor depends on the level of another factor. Detected in factorial designs through analysis of interaction terms.
Kaplan-Meier Estimator
A non-parametric estimator of the survival function that handles censored data. The product-limit estimator.
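A minimal hand-rolled sketch of the product-limit estimate (the follow-up times and censoring flags are made up); libraries such as lifelines provide full implementations:

```python
from collections import Counter

def kaplan_meier(times, events):
    """Product-limit: S(t) = prod over failure times t_i <= t of (1 - d_i/n_i)."""
    deaths = Counter(t for t, e in zip(times, events) if e)  # d_i per failure time
    s, curve = 1.0, []
    at_risk = len(times)
    for t in sorted(set(times)):
        d = deaths.get(t, 0)
        if d:
            s *= 1 - d / at_risk                 # conditional survival at t
            curve.append((t, round(s, 4)))
        at_risk -= sum(1 for ti in times if ti == t)  # drop failures and censorings
    return curve

times  = [2, 3, 3, 5, 6, 7, 9, 9]   # hypothetical follow-up times
events = [1, 1, 0, 1, 0, 1, 1, 0]   # 1 = failure observed, 0 = censored
print(kaplan_meier(times, events))
```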
K-Means Clustering
An unsupervised algorithm that partitions data into k clusters by minimizing within-cluster variance around centroids.
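A minimal sketch with scikit-learn (the synthetic two-blob data and k=2 are arbitrary assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(4, 0.5, (50, 2))])   # two well-separated blobs

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # centroids near (0, 0) and (4, 4)
```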
Linear Regression
A method for modeling the relationship between a dependent variable and independent variables using a linear equation. Minimizes sum of squared errors.
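A minimal least-squares sketch with NumPy (the data are synthetic, with known slope and intercept):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, 50)  # true slope 2, intercept 1, plus noise

# Design matrix with an intercept column; lstsq minimizes sum of squared errors
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"intercept = {coef[0]:.2f}, slope = {coef[1]:.2f}")
```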
Logistic Regression
A classification algorithm that models the probability of binary outcomes using the logistic function. Despite its name, used for classification.
Maximum Likelihood Estimation (MLE)
A method for estimating parameters by finding values that maximize the likelihood of observing the data.
MTBF (Mean Time Between Failures)
For repairable systems, the average time between consecutive failures. Measured from one failure to the next, so by that convention the cycle includes repair time (MTBF = MTTF + MTTR).
MTTF (Mean Time To Failure)
For non-repairable items, the expected time until failure. MTTF = ∫₀^∞ R(t) dt, the integral of the reliability function.
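A quick numeric check of MTTF = ∫₀^∞ R(t) dt for the exponential case, where R(t) = e^(−λt) and the integral equals 1/λ (the λ below is arbitrary):

```python
import numpy as np
from scipy.integrate import quad

lam = 0.02  # arbitrary constant failure rate (per hour)
mttf, _ = quad(lambda t: np.exp(-lam * t), 0, np.inf)  # integral of R(t)
print(mttf, 1 / lam)  # both ~50.0
```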
Normal Distribution
The bell-shaped probability distribution characterized by mean and standard deviation. Many natural phenomena approximate this distribution.
Null Hypothesis (H0)
The default hypothesis in statistical testing, typically representing no effect or no difference. Rejected only when evidence is sufficient.
Overfitting
When a model learns noise in training data rather than true patterns, resulting in poor generalization to new data. Characterized by low training error but high test error.
p-value
The probability of obtaining results at least as extreme as observed, assuming the null hypothesis is true. Small p-values (typically <0.05) suggest rejecting H0.
PCA (Principal Component Analysis)
A dimensionality reduction technique that transforms data into uncorrelated components ordered by variance explained.
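A minimal sketch via SVD of centered data (the dataset is synthetic); scikit-learn's PCA wraps the same computation:

```python
import numpy as np

rng = np.random.default_rng(4)
# Synthetic data with very different variance along each feature
X = rng.normal(size=(100, 3)) @ np.diag([3.0, 1.0, 0.2])

Xc = X - X.mean(axis=0)                   # center each feature
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)           # variance explained per component
scores = Xc @ Vt.T                        # data in component coordinates
print(explained)                          # ordered by variance explained
```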
Process Capability
The ability of a process to produce output within specification limits. Measured by Cp, Cpk, Pp, Ppk indices.
Random Forest
An ensemble of decision trees trained on bootstrap samples with random feature subsets. Reduces overfitting through averaging.
R-squared (Coefficient of Determination)
The proportion of variance in the dependent variable explained by the model. Ranges from 0 to 1 for least-squares fits with an intercept; higher values indicate better fit.
Regularization
Techniques that constrain model complexity to prevent overfitting. L1 (Lasso) and L2 (Ridge) add penalty terms to the loss function.
Reliability Function R(t)
The probability that an item survives beyond time t. R(t) = 1 - F(t) = P(T > t).
Response Surface Methodology (RSM)
A collection of statistical techniques for exploring relationships between variables and optimizing responses using polynomial models.
Shewhart Chart
The original control chart developed by Walter Shewhart. Uses 3-sigma limits to distinguish common cause from special cause variation.
Standard Deviation
A measure of dispersion representing the typical (root-mean-square) deviation of observations from the mean. Square root of variance.
Stationarity
A property of a time series whose statistical properties (mean, variance, autocorrelation) remain constant over time. The ARMA components of ARIMA require stationarity; differencing (the "I") is used to achieve it.
Support Vector Machine (SVM)
A classification algorithm that finds the optimal hyperplane maximizing the margin between classes. Can use kernels for non-linear boundaries.
t-Distribution
A probability distribution similar to normal but with heavier tails. Used when sample size is small and population variance unknown.
t-Test
A hypothesis test for comparing means. Variants include one-sample, two-sample (independent), and paired t-tests.
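A minimal two-sample sketch with scipy.stats (the group measurements are made up):

```python
from scipy import stats

group_a = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8]   # made-up measurements
group_b = [5.5, 5.6, 5.4, 5.8, 5.3, 5.7]

t_stat, p_value = stats.ttest_ind(group_a, group_b)  # independent two-sample test
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```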
Type I Error
Rejecting a true null hypothesis (false positive). Probability equals significance level α.
Type II Error
Failing to reject a false null hypothesis (false negative). Probability denoted β; power = 1 - β.
Underfitting
When a model is too simple to capture underlying patterns. Characterized by high error on both training and test data.
Variance
A measure of dispersion representing the average squared deviation from the mean. For sample: s² = Σ(xi - x̄)²/(n-1).
Weibull Distribution
A flexible lifetime distribution with shape (β) and scale (η) parameters. β<1: decreasing hazard; β=1: constant (exponential); β>1: increasing hazard.
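A quick sketch of how the shape parameter drives the hazard, using h(t) = (β/η)(t/η)^(β−1); the parameter values below are arbitrary:

```python
import numpy as np

def weibull_hazard(t, beta, eta):
    """Weibull hazard: h(t) = (beta/eta) * (t/eta)**(beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)

t = np.array([10.0, 100.0, 1000.0])
for beta in (0.5, 1.0, 2.0):  # decreasing, constant, increasing hazard
    print(beta, weibull_hazard(t, beta, eta=500.0))
```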
Western Electric Rules
A set of decision rules for identifying out-of-control conditions on control charts beyond simple limit violations.
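A minimal sketch of one such rule, eight consecutive points on the same side of the center line (the data values are illustrative; the full rule set also covers zone violations):

```python
def rule_eight_one_side(x, center):
    """Flag indices where 8 consecutive points fall on one side of center."""
    signals = []
    for i in range(7, len(x)):
        window = x[i - 7:i + 1]
        if all(v > center for v in window) or all(v < center for v in window):
            signals.append(i)
    return signals

data = [0.2, 0.5, 0.1, 0.4, 0.3, 0.6, 0.2, 0.5, 0.1, -0.3]  # illustrative values
print(rule_eight_one_side(data, center=0.0))  # [7, 8]: first 9 points sit above 0
```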