Posted: Mar 01, 2009
Cross-validation is a cornerstone technique in machine learning for model evaluation and selection, yet the question of how fold size influences both performance assessment and model selection remains inadequately explored. This research introduces a framework for analyzing fold-size effects through a multi-dimensional evaluation approach that jointly considers predictive accuracy, model selection consistency, computational efficiency, and the bias-variance tradeoff. We conducted extensive empirical investigations across 24 diverse datasets spanning classification, regression, and time-series forecasting tasks, using 12 distinct machine learning algorithms. Our methodology introduces a novel cross-validation stability metric that quantifies how consistently model selection decisions are made across different fold configurations. The results reveal a previously undocumented non-monotonic relationship between fold size and model selection reliability, with optimal performance occurring at intermediate fold sizes rather than at the extremes commonly used in practice. We show that traditional k-fold cross-validation with k=5 or k=10, while computationally efficient, systematically underestimates model variance in high-dimensional settings, leading to overconfident performance estimates. Conversely, leave-one-out cross-validation exhibits excessive variance in model selection on small to medium-sized datasets. Our findings challenge conventional wisdom by showing that the optimal fold size depends strongly on dataset characteristics, model complexity, and the evaluation metric employed. We propose a data-driven fold-size selection procedure that adapts to dataset properties and outperforms fixed fold-size approaches across all experimental conditions. This research offers practitioners actionable guidance for configuring cross-validation procedures and establishes a new paradigm for understanding the interplay between fold size, model evaluation, and selection reliability in machine learning workflows.
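As an illustration of the stability idea described in the abstract, the following sketch measures how often repeated k-fold cross-validation selects the same candidate model for several values of k. This is not code from the paper; the dataset, candidate models, fold sizes, and the agreement-based stability measure are assumptions made for demonstration only.

```python
# Illustrative sketch (assumed setup, not the paper's published method):
# repeat k-fold cross-validation with different random splits and record how
# often the same model is chosen, as a simple proxy for selection stability.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic dataset purely for demonstration.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
}

def selected_model(k, seed):
    """Return the candidate with the best mean CV accuracy for one random split."""
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    scores = {name: cross_val_score(est, X, y, cv=cv).mean()
              for name, est in candidates.items()}
    return max(scores, key=scores.get)

for k in (2, 5, 10, 20):
    picks = [selected_model(k, seed) for seed in range(20)]
    # Stability here = fraction of repeats agreeing with the most common choice.
    most_common = max(set(picks), key=picks.count)
    stability = picks.count(most_common) / len(picks)
    print(f"k={k:>2}: most often selected = {most_common}, stability = {stability:.2f}")
```

Running a comparison like this across fold sizes shows how the selection decision can flip between repeats, which is the kind of effect the abstract's stability metric is meant to quantify.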