Posted: Mar 26, 2025
The relationship between dimensionality and overfitting is one of the most fundamental challenges in statistical learning theory. Traditional understanding, shaped largely by the seminal work on the curse of dimensionality, holds that as the number of features grows, models become increasingly prone to overfitting because the hypothesis space expands exponentially relative to the available training data. This conventional wisdom has guided feature selection practices, regularization strategies, and model architecture decisions for decades.

Our investigation, however, reveals that this relationship is far more nuanced than previously acknowledged. We propose that the dimensionality-overfitting relationship follows a non-monotonic pattern characterized by alternating phases of vulnerability and resilience, a pattern that emerges from the interplay between the ambient dimensionality of the feature space and the intrinsic dimensionality of the underlying data manifold. We introduce the concept of dimensional resonance zones, specific dimensional ranges in which models show heightened sensitivity to overfitting, and establish that these zones are both predictable and manipulable through appropriate model design.

The novelty of our approach lies in integrating geometric topology with statistical learning theory, which allows us to characterize the structural properties of high-dimensional spaces that influence model generalization. By examining the curvature, connectivity, and density of data manifolds across different dimensional regimes, we provide a more refined account of when and why overfitting occurs, challenging the oversimplified narrative that more dimensions invariably lead to worse generalization.
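To make the distinction between ambient and intrinsic dimensionality concrete, here is a minimal sketch (illustrative only, and not the estimator used in our analysis): synthetic data are generated near a low-dimensional manifold embedded in a much higher-dimensional feature space, and the intrinsic dimension is recovered with a crude PCA-based proxy. The specific settings, such as the 99% variance cutoff and the chosen dimensions, are assumptions made for the illustration.

```python
# Minimal sketch: ambient vs. intrinsic dimensionality on synthetic data.
# Points lie near a 3-dimensional manifold embedded in a 50-dimensional
# feature space; intrinsic dimension is estimated as the number of
# principal components needed to explain 99% of the variance.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

intrinsic_dim = 3      # dimension of the underlying manifold (assumed)
ambient_dim = 50       # dimension of the observed feature space
n_samples = 2000

# Sample latent coordinates on the manifold, embed them linearly into the
# ambient space, and add small isotropic noise.
latent = rng.normal(size=(n_samples, intrinsic_dim))
embedding = rng.normal(size=(intrinsic_dim, ambient_dim))
X = latent @ embedding + 0.01 * rng.normal(size=(n_samples, ambient_dim))

# Crude proxy for intrinsic dimension: components needed for 99% variance.
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
estimated_dim = int(np.searchsorted(cumulative, 0.99) + 1)

print(f"ambient dimension:             {ambient_dim}")
print(f"estimated intrinsic dimension: {estimated_dim}")
```

Because the data concentrate near a 3-dimensional manifold, the estimate stays near 3 even as the ambient dimension grows; this gap between ambient and intrinsic dimensionality is what the argument above turns on.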