my slice of pizza: Nearest Neighbors

Friday, December 11, 2009

Nearest Neighbors

We have a dictionary D of vectors to be preprocessed, and a query vector q. We need to find nearest neighbor of q in D. Distance d(u,v) is the Hamming distance but any position i in which u and v differ can not differ by more than k (|u[i]-v[i]| \leq k). We can think of this as L_{\leq k,1} distance where L_{a,b} means you apply "a" operator to each dimension and "b" operator over the ensuing results. Hamming distance is L_{\infty,1}, standard L_1 is L_{1,1}. Any pointers welcome. Btw, for those who care, the string matching problem with this distance can be solved in O(nk polylog (m)) time.

Labels: aggregator

7 Comments:

Anonymous said...: any position i in which u and v differ can not differ by more than k

Making sure I understand: (u_i,v_i) is called a match if |u_i - v_i| <= k (contributes 0 to distance), and is a mismatch otherwise (contributes 1)?

Or are you saying the dataset itself is promised to satisfy a restriction that |u_i-v_i| <= k for all i, for all u,v in the dataset?; 6:35 AM
Anonymous said...: The dataset is completely general and the distance function is what you defined, that is, (u_i,v_i) is called a match if |u_i - v_i| <= k (contributes 0 to distance), and is a mismatch otherwise (contributes 1).

-- Metoo; 1:51 PM
Anonymous said...: Wouldn't this version solve the partial-matching problem then?
In particular, we can reduce partial-match by using the following encoding for {0,1,*}:
0 -> 0
1 -> 2
* -> 1
and consider the problem for k=1 (i.e., L_{\le1, 1} using the notation from the post).

A easier version of the problem could be as follows:
the distance on each coordinate is min{|u_i-v_i|,k}. In this case, at least for constant approximation, one can do something non-trivial.

-alex; 3:58 PM
Anonymous said...: Sigh, even for k=1. I should have emailed Alex first. :) I should not look for Partial Match in some of the other places in streaming where I wanted to work with this distance.

- Metoo; 5:39 AM
Hugo Penedones said...: Can't you do it with Kd-trees?

http://en.wikipedia.org/wiki/Kd-tree

Probably there are details in your problems that I didn't understand....; 6:04 AM
Anonymous said...: Hugo: The problem is about worst case bounds and tradeoffs. What will kd trees give?
-- Metoo; 8:09 AM
Anonymous said...: Kushilevitz,Ostrovsky,Rabani paper gives 1+\eps approximation for this problem; 10:24 PM

my slice of pizza

Friday, December 11, 2009

Nearest Neighbors

7 Comments:

About Me

Previous Posts