Thursday, October 01, 2009

Web Random Bits

We often need random bits. People toss a coin or use noise from transistors or whatever to obtain random bits in reality. Here are two tasks.
  • Devise a method to get random bits from the web. The procedure should not use network phenomenon such as round trip delay time to random hosts etc, but rather should be at application level. Likewise, the procedure should not be to go to a website which tosses coins for you. Rather, it should use some existing web phenomenon or content. Btw, quantifying/discussing potential biases will be nice.
  • Devise a method for two people to coordinate and get public random bits from the web. The method can't assume that two people going to a website simultaneously will see identical content or be able to synchronize perfectly.
Of course, it should be fast and fun.

Labels:

11 Comments:

Anonymous Anonymous said...

interesting problem! you could use a website that is updated regularly, like say a news site/wikipedia, you could use the document size, or the last-modified timestamp mod 2.

successive tries may be highly correlated, so you could XOR results from several pages, such as outgoing links.

9:28 PM  
Blogger NQH said...

This comment has been removed by the author.

9:56 PM  
Blogger NQH said...

I was thinking of the parity of the length of the latest Twitter post, but didn't see Anonymous post.

Anyhow, parity of, say, the sum of stock prices seems fast.

10:16 PM  
Anonymous Anonymous said...

Go to reddit.com.

Look at the top submission of the day. Take the number of votes mod 2.

Of course this only gives you one bit per day...

1:50 AM  
Blogger Vinayak said...

I am sure some video streaming site tells you the number of people watching a video at a particular time. This number should be quite random.

1:53 PM  
Blogger M. said...

The second problem is interesting. If I understand correctly, two people should be able to use the web like a PRNG -- if they both "synchronize", i.e. start with the same seed in the PRNG context, then they should both see the same stream of random bits.

If two people can't synchronize, then you might as well assume that they're trying to retrieve the same "random stream" at completely different times. Assumptions about the rate of change of the web probably shouldn't be touched with a ten foot pole, so not being able to synchronize "perfectly" means that things like near-concurrent accesses of the same webpage (for example) are not a good idea.

And while you're at it, there's no need for the "two people" restriction either.

So here's the idea--set up a common webmail account WITHOUT a spam filter, and have a bot that crawls the web and submits the email address to every spammy newsletter/site it can possibly find. Spam starts to accrue mercilessly.

Now, for the random bits, the n parties that want the same bits exchange two Unix epoch timestamps amongst themselves, both larger than the time the email account was set up at. The random bit stream is then obtained by concatenating the text and headers of all spam received between those two timestamps, taking the MD5 of successive chunks, and then randomly shuffling the resultant bits using a simple O(N) shuffle, a PRNG and the first timestamp as the seed (this last bit might be gratuitous, but still entertaining).

Of course, there are a multitude of ways in which this could be spiced up with more randomness, but the basic idea relies on the ever-changing nature of, say, spam subject lines and habits.

2:32 PM  
Blogger AC said...

For Problem 2, figure out the length of the input instance (you are computing something on some instance, right?) and then use an appropriately long private random string. By Newman's theorem, this suffices. Har har!

7:06 PM  
Blogger codingplayground said...

1. What about using a Randomness extractor (see http://en.wikipedia.org/wiki/Randomness_extractor ) starting and considering Twitter search as an (high-entrophy) source. There are applications like this based on RF-noise, Lavalamp, and similar funny sources. I would adopt either MD5 or SHA-1 hash function. This will give all the benefits of the crypto-hash function and bias-reduction.

2. Synching is a bit more hard, since the query is not enough. Time is another important factor: the results you see at time t are potentially different from the ones seen at time t + \eps. The index can change and the load balance mechanism in search can bring you to a different search cluster. I guess you must relay on some proxy which must capture a snapshot of twitter and then you must syncronize on the query and the time. There are a couple of services like this out of there.

2:29 AM  
Blogger babar ganesh said...

interesting that all these solutions are polling type solutions.

what could you do that was streaming? ie information is coming in from different sites; find a way to turn that into random bits?

5:37 PM  
Anonymous cheap viagra said...

amazing info

10:17 AM  
Anonymous Cheap Viagra Online said...

interesting problem!

10:17 AM  

Post a Comment

<< Home