### CPM + A Research Problem

I enjoy being at CPM (Combinatorial Pattern Matching) since I have a sense of being at home with friends; the conference more or less deals with string, array and tree matching, and applications.

CPM conf has been good at connecting with suitable

Input: Given k pairs of strings (fn_i, ln_i) of each length l. May be preprocessed.

Output: Given a query (p,q), find all i's such that fn_i contains p and ln_i contains q.

Target: With linear preprocessing we can answer the query in O(k+l) time by indexing each dimension separately and merging the two output lists. With O(kl^2) preprocessing, query time comes down to O(l + polylog (k)). What we need is a solution with better tradeoff between preprocessing time/space and query (same flavor as the vaunted nearest neighbors problem in high dimensions). Check out the reference (this is written in external memory parlance, but can be easily massaged to the standard RAM).

CPM conf has been good at connecting with suitable

- math (periodicity properties of strings, decomposition theorems),
- data structures (dynamic, compressed, cache oblivious),
- approximation/randomization algorithms (sequence comparison, string embeddings, Karp-Rabin fingerprints), and going out to find
- applications (computational biology, compression, databases, information retrieval).

CPM has also been good at training some top class algorithmers; another niche area with this claim is Scheduling Theory. Still, CPM is less successful than it could become. Connection to more approx/random main stream algorithms, finding applications for the techniques in web retrieval, and perhaps collocation in the future with other more visible confs could bring more attention to CPM.

Instead of focusing on applications, I spoke about fundamental problems in string matching: extending non-standard string matching framework to develop a theory of alignments, 2 dimensional indexing, edit distance embeddings and others. Here is a research problem:Input: Given k pairs of strings (fn_i, ln_i) of each length l. May be preprocessed.

Output: Given a query (p,q), find all i's such that fn_i contains p and ln_i contains q.

Target: With linear preprocessing we can answer the query in O(k+l) time by indexing each dimension separately and merging the two output lists. With O(kl^2) preprocessing, query time comes down to O(l + polylog (k)). What we need is a solution with better tradeoff between preprocessing time/space and query (same flavor as the vaunted nearest neighbors problem in high dimensions). Check out the reference (this is written in external memory parlance, but can be easily massaged to the standard RAM).

## 1 Comments:

Are the slides for your talk available?

Thanks.

Post a Comment

<< Home