

Cross-validation is a model evaluation technique used in machine learning to measure how well a model generalizes to unseen data. Instead of relying on a single training/test split, cross-validation divides the dataset into several subsets and trains and validates the model across multiple runs, producing a more reliable performance estimate and helping to detect overfitting.
This evaluation strategy helps identify whether a model performs consistently or only succeeds because of a favorable data split, giving a more trustworthy estimate of real-world performance.
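In scikit-learn, the whole procedure is available through cross_val_score. The snippet below is a minimal sketch; the synthetic dataset and logistic-regression model are illustrative assumptions, not part of any particular workflow:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data and a simple model, chosen only to demonstrate the API.
X, y = make_classification(n_samples=500, random_state=0)
model = LogisticRegression(max_iter=1000)

# Train and validate on 5 different splits instead of one.
scores = cross_val_score(model, X, y, cv=5)
print(scores)                        # accuracy on each validation fold
print(scores.mean(), scores.std())   # averaged estimate and its spread
```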
K-Fold Cross-Validation: The dataset is divided into k folds. The model is trained on k-1 folds and validated on the remaining fold. The process repeats k times, and the results are averaged.
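The loop below sketches that procedure explicitly with scikit-learn's KFold, reusing the X, y, and model defined in the snippet above:

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for train_idx, val_idx in kf.split(X):
    fold_model = clone(model)                   # fresh copy of the model for each fold
    fold_model.fit(X[train_idx], y[train_idx])  # train on k-1 folds
    fold_scores.append(fold_model.score(X[val_idx], y[val_idx]))  # validate on the held-out fold

print(np.mean(fold_scores))  # averaged result over the k runs
```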
Stratified K-Fold: Preserves the class distribution in every fold. Essential for imbalanced classification problems.
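A small sketch with a deliberately imbalanced synthetic dataset (roughly a 90/10 class split, chosen only for illustration) shows how StratifiedKFold keeps that ratio in every fold:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced dataset: about 90% of samples in one class.
X_imb, y_imb = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in skf.split(X_imb, y_imb):
    # Each validation fold mirrors the overall ~90/10 class ratio.
    print(np.bincount(y_imb[val_idx]))
```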
Leave-One-Out Cross-Validation (LOOCV): A special case where k equals the dataset size, so each data point is used as the validation set exactly once. Gives a nearly unbiased estimate, but is computationally expensive because one model is fit per sample.
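With scikit-learn's LeaveOneOut, the fold count follows automatically from the data size. The sketch below reuses the earlier model but restricts it to 100 samples, since LOOCV fits one model per data point:

```python
from sklearn.model_selection import LeaveOneOut, cross_val_score

loo = LeaveOneOut()
# 100 samples -> 100 model fits, each validated on a single held-out point.
scores = cross_val_score(model, X[:100], y[:100], cv=loo)
print(len(scores), scores.mean())
```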
Time Series Split: Maintains chronological order so the model is always validated on data that comes after its training window, avoiding leakage. Required for forecasting, anomaly detection, and other sequential models.
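scikit-learn's TimeSeriesSplit illustrates the idea: every training window ends before its validation window begins. The data here is the same synthetic X as above, used only to show the index ordering:

```python
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
for train_idx, val_idx in tscv.split(X):
    # Training indices always precede validation indices, so no future data leaks in.
    print(train_idx[-1], "<", val_idx[0])
```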
Repeated K-Fold: Runs K-Fold multiple times using different random splits for improved stability.
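A sketch with RepeatedKFold, again reusing model, X, and y from above: 5 folds repeated 3 times yields 15 scores, so the average is less sensitive to any one split:

```python
from sklearn.model_selection import RepeatedKFold, cross_val_score

rkf = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = cross_val_score(model, X, y, cv=rkf)    # 5 folds x 3 repeats = 15 scores
print(len(scores), scores.mean(), scores.std())
```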
Consider a model trained on a single split that achieves high accuracy but fails when deployed. After switching to stratified k-fold cross-validation, the performance variance across folds reveals that the model was overfitted, prompting feature refinement and tuning before deployment.