Saturday, October 27, 2007


Yeap, not Very Large Databases (VLDB) but XLDB. There was an invite-only meeting at the Stanford Linear Accelerator Center, organized by Jacek Becla and others, on such monsters: petabytes and up. It was a conversation involving
  • scientists who generate petabytes in large-scale experiments (sky-survey telescope imagery, etc.) and have to process them and make them available to the community,
  • university professors, chiefly in databases (Stonebraker, DeWitt),
  • companies with experience (eBay, Yahoo, Google, AT&T), and
  • vendors who cater to them (Oracle, IBM, Teradata, Netezza, Objectivity).
It was a quintessential conversation at cross purposes. The scientists told us they had data problems and had to build petabyte-scale infrastructure on small budgets (most of their money goes into building the instruments) using grad students and postdocs. The database professors complained that the scientists' problems did not have many commonalities, and that they needed to abstract further to get useful data models and operations. The companies with experience said they could process such large datasets today with their resources: disks, network connections, and many, many people; they were curious why there was a challenge at all! The scientists complained that they were a small market for vendors, and that when database research students worked with them, they produced papers but no prototypes, and certainly no products. The vendors said they were keen to work with the scientists, to gain the experience needed to develop a new generation of tools that might be useful for a variety of applications (if any).

Hence, many realities collided. What emerged were many opportunities to learn, collaborate, and build for the long-term future. Curiously, many mentioned MapReduce plus an SQL engine as the answer. Decades of database research, and the current hammer is MapReduce? IBM and Google are partnering to bring Apache's Hadoop (an open-source implementation of MapReduce) to university students.
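Since MapReduce keeps coming up as the hammer of choice, here is a minimal single-process sketch of the programming model (the helper names are my own invention, not Hadoop's API): a map phase emits key-value pairs from each input record, a shuffle groups the values by key, and a reduce phase aggregates each group. Real frameworks distribute these phases across many machines; this toy version just shows the data flow.

```python
from collections import defaultdict

def map_phase(records, mapper):
    # Apply the user-supplied mapper to each record, collecting (key, value) pairs.
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def shuffle(pairs):
    # Group all emitted values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    # Apply the user-supplied reducer to each key's grouped values.
    return {key: reducer(key, values) for key, values in groups.items()}

# The classic word-count example.
def word_mapper(line):
    return [(word, 1) for word in line.split()]

def count_reducer(word, counts):
    return sum(counts)

lines = ["big data big telescopes", "big budgets small tools"]
result = reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)
print(result["big"])  # 3
```

The appeal for the scientists' workloads is clear enough: the mapper and reducer are the only application-specific code, and the framework handles partitioning and fault tolerance across thousands of disks.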

