Nearest Neighbors
We have a dictionary D of vectors to be preprocessed, and a query vector q. We need to find nearest neighbor of q in D. Distance d(u,v) is the Hamming distance but any position i in which u and v differ can not differ by more than k (|u[i]-v[i]| \leq k). We can think of this as L_{\leq k,1} distance where L_{a,b} means you apply "a" operator to each dimension and "b" operator over the ensuing results. Hamming distance is L_{\infty,1}, standard L_1 is L_{1,1}. Any pointers welcome. Btw, for those who care, the string matching problem with this distance can be solved in O(nk polylog (m)) time.
Labels: aggregator
7 Comments:
any position i in which u and v differ can not differ by more than k
Making sure I understand: (u_i,v_i) is called a match if |u_i - v_i| <= k (contributes 0 to distance), and is a mismatch otherwise (contributes 1)?
Or are you saying the dataset itself is promised to satisfy a restriction that |u_i-v_i| <= k for all i, for all u,v in the dataset?
The dataset is completely general and the distance function is what you defined, that is, (u_i,v_i) is called a match if |u_i - v_i| <= k (contributes 0 to distance), and is a mismatch otherwise (contributes 1).
-- Metoo
Wouldn't this version solve the partial-matching problem then?
In particular, we can reduce partial-match by using the following encoding for {0,1,*}:
0 -> 0
1 -> 2
* -> 1
and consider the problem for k=1 (i.e., L_{\le1, 1} using the notation from the post).
A easier version of the problem could be as follows:
the distance on each coordinate is min{|u_i-v_i|,k}. In this case, at least for constant approximation, one can do something non-trivial.
-alex
Sigh, even for k=1. I should have emailed Alex first. :) I should not look for Partial Match in some of the other places in streaming where I wanted to work with this distance.
- Metoo
Can't you do it with Kd-trees?
http://en.wikipedia.org/wiki/Kd-tree
Probably there are details in your problems that I didn't understand....
Hugo: The problem is about worst case bounds and tradeoffs. What will kd trees give?
-- Metoo
Kushilevitz,Ostrovsky,Rabani paper gives 1+\eps approximation for this problem
Post a Comment
<< Home