### Nearest Neighbors

We have a dictionary D of vectors to be preprocessed, and a query vector q. We need to find nearest neighbor of q in D. Distance d(u,v) is the Hamming distance but any position i in which u and v differ can not differ by more than k (|u[i]-v[i]| \leq k). We can think of this as L_{\leq k,1} distance where L_{a,b} means you apply "a" operator to each dimension and "b" operator over the ensuing results. Hamming distance is L_{\infty,1}, standard L_1 is L_{1,1}. Any pointers welcome. Btw, for those who care, the string matching problem with this distance can be solved in O(nk polylog (m)) time.

Labels: aggregator

## 7 Comments:

any position i in which u and v differ can not differ by more than kMaking sure I understand: (u_i,v_i) is called a match if |u_i - v_i| <= k (contributes 0 to distance), and is a mismatch otherwise (contributes 1)?

Or are you saying the dataset itself is promised to satisfy a restriction that |u_i-v_i| <= k for all i, for all u,v in the dataset?

The dataset is completely general and the distance function is what you defined, that is, (u_i,v_i) is called a match if |u_i - v_i| <= k (contributes 0 to distance), and is a mismatch otherwise (contributes 1).

-- Metoo

Wouldn't this version solve the partial-matching problem then?

In particular, we can reduce partial-match by using the following encoding for {0,1,*}:

0 -> 0

1 -> 2

* -> 1

and consider the problem for k=1 (i.e., L_{\le1, 1} using the notation from the post).

A easier version of the problem could be as follows:

the distance on each coordinate is min{|u_i-v_i|,k}. In this case, at least for constant approximation, one can do something non-trivial.

-alex

Sigh, even for k=1. I should have emailed Alex first. :) I should not look for Partial Match in some of the other places in streaming where I wanted to work with this distance.

- Metoo

Can't you do it with Kd-trees?

http://en.wikipedia.org/wiki/Kd-tree

Probably there are details in your problems that I didn't understand....

Hugo: The problem is about worst case bounds and tradeoffs. What will kd trees give?

-- Metoo

Kushilevitz,Ostrovsky,Rabani paper gives 1+\eps approximation for this problem

Post a Comment

<< Home