|
|
 |
Computational Geometry
Nearest neighbor search and other fundamental proximity problems based on metrics are ubiquitous across science and
engineering, and also serve as illustrative computational prototypes for solving other important problems.
fastlab
Big Ideas
People
Stuff
|
 |
|
 |
Rank-Approximate Nearest Neighbor Search:
Retaining Meaning and Speed in High Dimensions
Parikshit Ram, Dongryeol Lee, Hua Ouyang, and Alexander Gray
Neural Information Processing Systems (NIPS) 2009, appeared 2010
A new formulation of approximate nearest-neighbor search which allows for the first time direct control of the error in terms of ranks, which retain meaning while the traditionally-used numerical distances are known to become meaningless as dimension grows. Empirical results
show favorable accuracy for the same runtime compared to existing methods such as LSH.
[pdf]
Abstract:
The long-standing problem of efficient nearest-neighbor (NN) search has ubiquitous applications ranging from astrophysics to MP3 fingerprinting to bioinformatics to movie recommendations. As the dimensionality of the dataset increases, exact NN search becomes computationally prohibitive; (1 + epsilon) distance-approximate NN search can provide large speedups but risks losing the meaning of NN search
present in the ranks (ordering) of the distances. This paper presents a simple,
practical algorithm allowing the user to, for the first time, directly control the
true accuracy of NN search (in terms of ranks) while still achieving the large
speedups over exact NN. Experiments with high-dimensional datasets show that
it often achieves faster and more accurate results than the best-known distance-approximate method, with much more stable behavior.
@inproceedings{ram2010ranknn,
author = "Parikshit Ram and Dongryeol Lee and Hua Ouyang and Alexander G. Gray",
title = "{Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions}",
booktitle = "Advances in Neural Information Processing Systems (NIPS) 22 (Dec 2009)",
year = "2010",
publisher = {MIT Press}
}
|
An Investigation of Practical Approximate Nearest Neighbor Algorithms
Ting Liu, Andrew W. Moore, Alexander Gray, and Ke Yang
Neural Information Processing Systems (NIPS) 2004, appeared 2005
A way to do approximate nearest-neighbor search using a data structure called a spill tree, which yields demonstrates significant speedup over other methods such as LSH.
[pdf]
Abstract:
This paper concerns approximate nearest neighbor searching algorithms, which have become increasingly important, especially in high dimensional perception areas such as computer vision, with dozens of publications in recent years. Much of this enthusiasm is due to a successful new approximate nearest neighbor approach called Locality Sensitive Hashing (LSH). In this paper we ask the question: can earlier spatial data structure approaches to exact nearest neighbor, such as metric trees, be altered to provide approximate answers to proximity queries and if so, how? We introduce a new kind of metric tree that allows overlap: certain datapoints may appear in both the children of a parent. We also introduce new approximate k-NN search algorithms on this structure. We show why these structures should be able to exploit the same random projection-based approximations that LSH enjoys, but with a simpler algorithm and perhaps with greater efficiency. We then provide a detailed empirical evaluation on five large, high dimensional datasets which show up to 31-fold accelerations over LSH. This result holds true throughout the spectrum of approximation levels.
@inproceedings{liu2005approxnn,
Author = "Ting Liu and Andrew W. Moore and Alexander G. Gray and Ke Yang",
Title = "{An Investigation of Practical Approximate Nearest Neighbor
Algorithms}",
booktitle = "Advances in Neural Information Processing Systems (NIPS) 17 (Dec 2004)",
Year = "2005",
publisher = {MIT Press}
}
|
|
|
 |
|
|
All-Nearest-Neighbors
The all-nearest-neighbors problem (as well as the standard single-query nearest-neighbor problem) is a special case of generalized N-body problem.
[see webpage here]
|
Nearest-Neighbor Classification
Nearest-neighbor classification is a special variant of nearest-neighbor, which we showed can be solved more efficiently than the standard nearest-neighbor search problem can, with a specialized approach.
[see full entry here]
|
Data Structures for Proximity Problems
A large variety of hierarchical data structures can be used for proximity problems.
[see webpage here]
|
Euclidean Minimum Spanning Tree
We demonstrate the most overall efficient algorithm for the classic EMST problem,
in the context of hierarchical clustering. [see webpage here]
|
|
|
|