## Friday, December 11, 2009

### Nearest Neighbors

We have a dictionary D of vectors to be preprocessed, and a query vector q. We need to find nearest neighbor of q in D. Distance d(u,v) is the Hamming distance but any position i in which u and v differ can not differ by more than k (|u[i]-v[i]| \leq k). We can think of this as L_{\leq k,1} distance where L_{a,b} means you apply "a" operator to each dimension and "b" operator over the ensuing results. Hamming distance is L_{\infty,1}, standard L_1 is L_{1,1}. Any pointers welcome. Btw, for those who care, the string matching problem with this distance can be solved in O(nk polylog (m)) time.

Labels:

#### 7 Comments:

Anonymous said...

any position i in which u and v differ can not differ by more than k

Making sure I understand: (u_i,v_i) is called a match if |u_i - v_i| <= k (contributes 0 to distance), and is a mismatch otherwise (contributes 1)?

Or are you saying the dataset itself is promised to satisfy a restriction that |u_i-v_i| <= k for all i, for all u,v in the dataset?

6:35 AM
Anonymous said...

The dataset is completely general and the distance function is what you defined, that is, (u_i,v_i) is called a match if |u_i - v_i| <= k (contributes 0 to distance), and is a mismatch otherwise (contributes 1).

-- Metoo

1:51 PM
Anonymous said...

Wouldn't this version solve the partial-matching problem then?
In particular, we can reduce partial-match by using the following encoding for {0,1,*}:
0 -> 0
1 -> 2
* -> 1
and consider the problem for k=1 (i.e., L_{\le1, 1} using the notation from the post).

A easier version of the problem could be as follows:
the distance on each coordinate is min{|u_i-v_i|,k}. In this case, at least for constant approximation, one can do something non-trivial.

-alex

3:58 PM
Anonymous said...

Sigh, even for k=1. I should have emailed Alex first. :) I should not look for Partial Match in some of the other places in streaming where I wanted to work with this distance.

- Metoo

5:39 AM
Hugo Penedones said...

Can't you do it with Kd-trees?

http://en.wikipedia.org/wiki/Kd-tree

Probably there are details in your problems that I didn't understand....

6:04 AM
Anonymous said...

Hugo: The problem is about worst case bounds and tradeoffs. What will kd trees give?
-- Metoo

8:09 AM
Anonymous said...

Kushilevitz,Ostrovsky,Rabani paper gives 1+\eps approximation for this problem

10:24 PM