## Wednesday, February 08, 2006

### Duplicates

Here is an algorithmic problem. Consider read-write array A[1,n] of numbers. Find the smallest index i such that A[i]=A[j] for some j, that is the smallest index that has a duplicate in the array, using O(1) extra space only. We work in the compare and move model typically used for sorting, and are not allowed to "stuff bits" into array items. This problem was given to me by Martin Pal and Vitaliy Kulikov of Google. Results we can claim are: O(nlog^2 n) time without destroying the array, and O(nlog n) time if one is allowed to destroy the array. Any thoughts/pointers welcome. The problem has nice connections to in-place mergesort, for example.

Anonymous said...

Seems like an interesting problem. Is there a google motivation for this?

How big is n? How big does expect i to be? With a little extra storage, a bloom-filter type solution could be quite efficient...

1:13 PM
metoo said...

I will be interested in any motivation any one can provide. :)

I am looking for O() bounds, so n-> infty, for worst case i. Unfortunately bloom filter type results will use more than O(1) extra storage. This may be useful in practice, but see above! :)

3:03 PM
metoo said...

I thought about your Bloom filter suggestion. Here is a toy problem that may be interesting. We get to updates (i,c_i), and want to store them so that we can retrieve for any given i, the \min c_i over all updates to i. This is different from \sum c_i that others have studied, eg., count bloom filters.

8:01 AM
loser2007 said...

How do you do this in O(n log^2 n)?

And about the bloom filter problem just keep a min instead of a count?

10:52 PM