
Submanifold Density Estimation
Arkadas Ozakin and Alexander Gray
Neural Information Processing Systems (NIPS) 2009, appeared 2010
Worst-case theoretical results have led to the common wisdom that kernel density estimation is ineffective in more than a few dimensions.
We show that for realistic high-dimensional data, which due to correlations between variables can often be modeled as lying on a much
lower-dimensional submanifold embedded in the original space, the convergence rate of a corrected kernel density estimator is governed by the
submanifold's dimension rather than the ambient dimension. Thus, kernel estimation can be much more effective in high-dimensional settings than previously thought.
[pdf]
Abstract:
Kernel density estimation is the most widely-used practical method for accurate
nonparametric density estimation. However, long-standing worst-case theoretical
results showing that its performance worsens exponentially with the dimension
of the data have quashed its application to modern high-dimensional datasets for
decades. In practice, it has been recognized that often such data have a much
lower-dimensional intrinsic structure. We propose a small modification to
kernel density estimation for estimating probability density functions on Riemannian
submanifolds of Euclidean space. Using ideas from Riemannian geometry, we
prove the consistency of this modified estimator and show that the convergence
rate is determined by the intrinsic dimension of the submanifold. We conclude
with empirical results demonstrating the behavior predicted by our theory.
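The modification the abstract describes amounts to normalizing the kernel by the submanifold's intrinsic dimension d rather than the ambient dimension, while still measuring distances in the ambient Euclidean space. A minimal sketch of that idea, assuming a Gaussian kernel; the function name and parameters here are illustrative, not taken from the paper:

```python
import numpy as np

def submanifold_kde(query, data, bandwidth, intrinsic_dim):
    """KDE normalized by the intrinsic dimension d, not the ambient one:
    f(p) = 1/(m h^d) * sum_i K(||p - p_i|| / h), distances taken in R^n."""
    dists = np.linalg.norm(data - query, axis=1)  # ambient Euclidean distances
    # Gaussian kernel with a d-dimensional normalization (the key change)
    k = np.exp(-0.5 * (dists / bandwidth) ** 2) / (2 * np.pi) ** (intrinsic_dim / 2)
    return k.sum() / (len(data) * bandwidth ** intrinsic_dim)

# Toy example: uniform samples on the unit circle, a 1-D submanifold of R^2.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 5000)
data = np.column_stack([np.cos(theta), np.sin(theta)])

# The true density with respect to arc length is 1/(2*pi) ~ 0.159 everywhere;
# normalizing with intrinsic_dim=1 (not the ambient 2) recovers it.
est = submanifold_kde(np.array([1.0, 0.0]), data, bandwidth=0.1, intrinsic_dim=1)
```

Had we normalized by the ambient dimension 2, the estimate would shrink with the bandwidth instead of converging, which is the exponential degradation the worst-case results describe.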
@inproceedings{ozakin2010subkde,
  author    = {Arkadas Ozakin and Alexander G. Gray},
  title     = {{Submanifold Density Estimation}},
  booktitle = {Advances in Neural Information Processing Systems (NIPS) 22 (Dec 2009)},
  year      = {2010},
  publisher = {MIT Press}
}

See also
Curses of Dimensionality in Computational Geometry
The expansion constant is a notion of the intrinsic dimension of a dataset; we have extended it to define related data properties for the analysis of all-nearest-neighbors and hierarchical clustering algorithms.
[see webpage here]
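The expansion constant (in the Karger-Ruhl sense) is the smallest c such that doubling the radius of any ball around a data point grows its point count by at most a factor of c; for d-dimensional data it behaves roughly like 2^d. A brute-force sketch of estimating it, with illustrative names and radii:

```python
import numpy as np

def expansion_constant(data, radii):
    """Estimate the smallest c with |B(p, 2r)| <= c * |B(p, r)|
    over all data points p and the given radii r."""
    d = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=-1)
    c = 1.0
    for r in radii:
        inner = (d <= r).sum(axis=1)      # |B(p, r)|, includes p itself
        outer = (d <= 2 * r).sum(axis=1)  # |B(p, 2r)|
        c = max(c, (outer / inner).max())
    return c

# Points on a line segment (intrinsic dim 1): balls grow linearly, c near 2.
line = np.linspace(0, 1, 200)[:, None]
c_line = expansion_constant(line, radii=[0.05, 0.1, 0.2])

# Points on a 2-D grid: balls grow quadratically, c near 2^2 = 4.
g = np.linspace(0, 1, 15)
grid = np.array([[x, y] for x in g for y in g])
c_grid = expansion_constant(grid, radii=[0.1, 0.2])
```

The contrast between the two cases is the point: the constant reflects the intrinsic dimension of the data, independent of the dimension of the space it is embedded in.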
In preparation
Adaptive and Parameterized Analysis of Proximity Data Structures
We are developing theoretical techniques for more realistic runtime analysis of practical algorithms, based in part on more detailed notions of intrinsic dimensionality.
Volumetric Manifold Learning
We have developed a new approach to manifold learning based on ideas from Riemannian geometry.
