Posted: May 23, 2012
Missing data is one of the most persistent and challenging problems in statistical analysis and machine learning, arising across domains as diverse as healthcare, the social sciences, and business analytics. The prevalence of incomplete datasets necessitates imputation techniques that can generate plausible values for missing observations. Although numerous imputation methods have been proposed in the literature, their comparative evaluation has traditionally focused on predictive accuracy metrics, often neglecting the equally important question of parameter validity. This research addresses that gap by developing a comprehensive evaluation framework that assesses both predictive accuracy and statistical validity across multiple imputation approaches.

The fundamental challenge in missing data imputation lies in the tension between generating values that preserve the statistical properties of the original data distribution and generating values that enable accurate predictions in downstream modeling tasks. Conventional evaluation paradigms have predominantly emphasized the latter, potentially favoring imputation methods that produce biased parameter estimates or distorted covariance structures. This oversight has profound implications for inferential statistics, where valid parameter estimates are essential for drawing meaningful conclusions from data.

Our research makes several distinctive contributions to the field of missing data analysis. First, we introduce a novel evaluation framework that systematically examines the impact of imputation techniques on both predictive performance and parameter validity across different missing data mechanisms and missingness proportions. Second, we investigate the often-overlooked relationship between imputation
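To make the dual criterion concrete, the sketch below is an illustrative example, not the paper's implementation: it contrasts two common imputers on simulated data, scoring each on the accuracy of the imputed values (RMSE) and on the bias of a downstream regression coefficient. The choice of imputers, the MCAR mechanism, and the 20% missingness rate are all assumptions made for illustration.

```python
# Minimal sketch of a two-criterion imputation evaluation:
# (a) predictive accuracy of imputed values, (b) validity of a downstream
# parameter estimate. Illustrative only; not the authors' framework.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer

rng = np.random.default_rng(0)
n = 2000

# Simulate complete data with a known structure: y = 2*x1 + noise.
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(scale=0.8, size=n)
y = 2.0 * x1 + rng.normal(size=n)
X_full = np.column_stack([x1, x2, y])

# Introduce 20% missingness in x1 completely at random (MCAR).
X_miss = X_full.copy()
mask = rng.random(n) < 0.2
X_miss[mask, 0] = np.nan

true_beta = 2.0  # data-generating coefficient of x1 in the model for y

for name, imputer in [("mean", SimpleImputer(strategy="mean")),
                      ("iterative", IterativeImputer(random_state=0))]:
    X_imp = imputer.fit_transform(X_miss)

    # (a) Predictive accuracy: RMSE of imputed x1 against the true values.
    rmse = np.sqrt(np.mean((X_imp[mask, 0] - X_full[mask, 0]) ** 2))

    # (b) Parameter validity: bias of the OLS slope of y on (x1, x2)
    # relative to the known data-generating coefficient.
    A = np.column_stack([np.ones(n), X_imp[:, 0], X_imp[:, 1]])
    beta = np.linalg.lstsq(A, X_imp[:, 2], rcond=None)[0]
    print(f"{name:10s}  RMSE={rmse:.3f}  slope bias={beta[1] - true_beta:+.3f}")
```

A fuller study in this spirit would repeat the comparison across missing data mechanisms (MCAR, MAR, MNAR), missingness proportions, and a wider set of imputers, tracking both criteria rather than predictive accuracy alone.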