Posted: Mar 01, 2009
Cross-validation is a cornerstone technique in machine learning for model evaluation and selection, yet the question of how fold size influences both performance assessment and model selection remains inadequately explored. This research introduces a framework for analyzing fold-size effects through a multi-dimensional evaluation approach that jointly considers predictive accuracy, model selection consistency, computational efficiency, and the bias-variance tradeoff. We conducted extensive empirical investigations across 24 diverse datasets spanning classification, regression, and time-series forecasting tasks, using 12 distinct machine learning algorithms. Our methodology introduces a novel cross-validation stability metric that quantifies how consistently model selection decisions are made across different fold configurations. The results reveal a previously undocumented non-monotonic relationship between fold size and model selection reliability, with optimal performance occurring at intermediate fold sizes rather than at the extremes commonly used in practice. We show that traditional k-fold cross-validation with k=5 or k=10, while computationally efficient, systematically underestimates model variance in high-dimensional settings, leading to overconfident performance estimates. Conversely, leave-one-out cross-validation exhibits excessive variance in model selection on small to medium-sized datasets. Our findings challenge conventional wisdom by showing that the optimal fold size depends strongly on dataset characteristics, model complexity, and the evaluation metric employed. We propose a data-driven fold-size selection procedure that adapts to dataset properties and outperforms fixed fold-size approaches across all experimental conditions. This research offers practitioners actionable guidance for configuring cross-validation procedures and establishes a new paradigm for understanding the interplay between fold size, model evaluation, and selection reliability in machine learning workflows.
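As an illustration of the stability idea described in the abstract, the following sketch measures how often repeated k-fold cross-validation selects the same candidate model for several values of k. This is not code from the paper; the dataset, candidate models, fold sizes, and the agreement-based stability measure are assumptions made for demonstration only.

```python
# Illustrative sketch (assumed setup, not the paper's published method):
# repeat k-fold cross-validation with different random splits and record how
# often the same model is chosen, as a simple proxy for selection stability.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic dataset purely for demonstration.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
}

def selected_model(k, seed):
    """Return the candidate with the best mean CV accuracy for one random split."""
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    scores = {name: cross_val_score(est, X, y, cv=cv).mean()
              for name, est in candidates.items()}
    return max(scores, key=scores.get)

for k in (2, 5, 10, 20):
    picks = [selected_model(k, seed) for seed in range(20)]
    # Stability here = fraction of repeats agreeing with the most common choice.
    most_common = max(set(picks), key=picks.count)
    stability = picks.count(most_common) / len(picks)
    print(f"k={k:>2}: most often selected = {most_common}, stability = {stability:.2f}")
```

Running a comparison like this across fold sizes shows how the selection decision can flip between repeats, which is the kind of effect the abstract's stability metric is meant to quantify.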