## Sunday, April 30, 2006

### Interesting String Problem

A student recently gave me the gift of a book by Dennis Shasha on puzzles. Here is an algorithmic version of a puzzle in that book.

A string S of characters is interesting if, for every pair of characters x and y in the alphabet, the distances between their occurrences in S do not repeat. That is, S does not have S[i]=S[j]=x and S[k]=S[l]=y such that k-i=l-j, for i < j < k < l. AAB is interesting, but ACBADB is not, since A and B appear separated by 2 places twice. How long does it take to check if a given string S is interesting?

D. Eppstein said...

Characters a and b have a repeated distance S[i]=S[j]=a, S[i+d]=S[j+d]=b, iff there is some distance j-i that is shared both by a pair of a's and by a pair of b's. So, an equivalent formulation is that there are no equal a-a and b-b distances.
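As a rough illustration of this equivalent formulation, here is a minimal Python sketch. The function name `is_interesting` is made up for this example, and it assumes that distances are signed offsets (as in the S[i+d] notation above) and that only pairs of two different letters count. It simply records every same-letter distance and reports a failure as soon as two different letters share one; it makes no attempt at the faster bounds discussed below.

```python
from collections import defaultdict


def is_interesting(s):
    """Return True if no two different letters share a same-letter distance."""
    # Collect the (increasing) positions of each letter.
    positions = defaultdict(list)
    for index, ch in enumerate(s):
        positions[ch].append(index)

    # owner maps a distance to the first letter that produced it.
    owner = {}
    for ch, where in positions.items():
        for a in range(len(where)):
            for b in range(a + 1, len(where)):
                gap = where[b] - where[a]
                # setdefault returns the existing owner, or records ch.
                if owner.setdefault(gap, ch) != ch:
                    return False
    return True
```

On the examples from the post, this reports AAB as interesting and both ACBADB and AABB as uninteresting.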

Also, at most one letter can have more than O(sqrt(n)) occurrences, because if two letters a and b did then they'd have too many a-b distances and one would have to be repeated.
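One way to spell out that counting step, as a sketch under the assumption that a-b distances are signed offsets: the symbols p and q below (the numbers of occurrences of a and b) are introduced here for illustration.

$$
\#\{\text{a-b pairs}\} = p\,q, \qquad \#\{\text{possible nonzero offsets}\} = 2(n-1),
$$

$$
p > \sqrt{2n} \ \text{and}\ q > \sqrt{2n} \;\Longrightarrow\; p\,q > 2n > 2(n-1),
$$

so by the pigeonhole principle two different a-b pairs would share an offset, and the string could not be interesting.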

So in time O(n^{3/2}) we can build lists of all a-a distances for all but the most frequent letter, and test for some distance that is shared by two or more letters.

I think it's possible to build a table of a-a distances between the positions of the most frequent letter in O(n polylog n) time by reducing the problem to binary integer multiplication (or polynomial multiplication, or your favorite FFT formulation).
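A minimal sketch of the integer-multiplication reduction, in Python. The function name and the byte-slot encoding are assumptions made for this example, not something specified above. The positions of one letter are packed into a big integer with one fixed-width slot per string index, a reversed copy is packed the same way, and the two are multiplied; slot n-1+d of the product then counts the pairs at distance d. Note that Python's built-in multiplication is, as far as I know, not FFT-based, so this shows the reduction rather than the n polylog n bound itself.

```python
def self_distances_via_multiplication(positions, n):
    """Return the set of positive distances between positions in range(n)."""
    if n == 0:
        return set()
    # Each product slot holds a count of at most len(positions) <= n,
    # so this many bytes per slot rules out carries between slots.
    slot = (n.bit_length() + 7) // 8
    forward = bytearray(n * slot)
    backward = bytearray(n * slot)
    for p in positions:
        forward[p * slot] = 1               # term x^p
        backward[(n - 1 - p) * slot] = 1    # term x^(n-1-p)
    product = int.from_bytes(forward, "little") * int.from_bytes(backward, "little")
    raw = product.to_bytes(2 * n * slot, "little")
    # Slot n-1+d is nonzero exactly when some pair is d apart.
    return {
        d
        for d in range(1, n)
        if any(raw[(n - 1 + d) * slot:(n + d) * slot])
    }
```

For example, positions 0, 3 and 4 in a string of length 6 give the distance set {1, 3, 4}.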

So that would give an O(n^{3/2}) algorithm.

Maybe this can be improved by applying the FFT part to more than one of the most frequent letters?

6:27 PM
D. Eppstein said...

BTW, I'm assuming here that AABB counts as uninteresting, as it has two AB pairs at distance two, although that repetition doesn't satisfy the i<j<k<l part of your formulation.

11:01 AM
metoo said...

I am also assuming AABB is uninteresting; I see that I did not formalize it properly.

2:24 PM
Anonymous said...

I say briefly: Best! Useful information. Good job guys.

9:53 PM