Sunday, March 20, 2011

DIMACS Parallelism 2020: John Gustafson

Nitish Korula is a thinker, able to assimilate and relate vast amounts of information; I have been looking for a sufficiently broad topic to get him to weigh in, and this one proves ideal:

----
Thanks to Muthu for the opportunity to write a guest post, and thanks to the organizers for putting together this workshop. There were several great talks on Monday, but one that really stood out was a "keynote" by John Gustafson, director of research at Intel Labs. It exemplified many of the qualities of a great keynote address; it was provocative and comprehensive, with the speaker demonstrating a mastery of every aspect of his subject.

Gustafson's vision for 2020 was billion-core computing. Not a billion cores on a single chip, but a billion cores in a single, data-center-sized computing unit. Getting there, though, will be more difficult than simply continuing on our current path; our performance trend curves are flattening out along many dimensions. Will Moore's law continue to hold for another 10 years? Intel imagines that we will be at 5 nm gates by then, but 5 nm is only about 50 atoms wide; at this scale electrons travel 'slowly' and "the speed of light isn't what it used to be". Even if we can get performance improvements, they are likely to require significant power increases, so "performance per watt" will go in the wrong direction.
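(A quick back-of-the-envelope aside of my own, not a number from the talk: light travels about $3 \times 10^8$ m/s, so in one cycle of a 3 GHz clock a signal can cover at most roughly 10 cm even in vacuum, and less on real wires or fiber. A round trip across a 100 m data center therefore costs on the order of a microsecond, i.e., thousands of clock cycles, no matter how fast the individual cores become.)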

The talk was far from pessimistic, though; the message was that building these billion-core machines will require radically new ideas in many areas of computer science, and it's an exciting time to be doing research on them. Some of the themes woven through the talk were a need for a better understanding and use of memory, better algorithms for scientific computing simulations, and better performance metrics. I can't possibly do justice to them all, but here are a few highlights:

  1. Shekhar Borkar, the director of Intel's Microprocessor Technology Lab, says "Caches are for morons"! More seriously, reading data from memory takes a long time, and so we have been trying techniques like speculative fetching of large blocks of memory into a cache. According to Intel's best estimates, roughly 79% of data in a cache is never accessed. Gustafson suggests that we "cherish every bit" of memory we access, which leads to a second suggestion:
  2. Throw out old Numerical Analysis textbooks! Algorithm designers have historically "measured" algorithm run times by counting floating point operations: additions and multiplications. This made sense decades ago, when a floating point operation took 100 times as long as reading a word from memory. Now, one multiplication takes about 1.3 nanoseconds (to go through the entire pipeline; this underestimates throughput), compared to 50-100 nanoseconds for a memory access. Why are our analyses measuring the wrong thing? We should be counting memory accesses; it isn't reasonable to ignore a constant factor of 50. (A back-of-the-envelope sketch of this kind of accounting appears after this list.)
  3. Processor utilization is a terrible way to measure performance of a supercomputer! As one of many such analogies, he compared processors to snow plows at O'Hare airport; there are dozens of plows at the airport, almost all of which are idle most of the year. But when you need them, you need all of them; it wouldn't make sense to cut back on the number of snow plows because the average utilization is low. We should design programs that use as many processors as needed; see the next point.
  4. We need new arithmetics! (What?) Today, we typically use 64-bit floating point arithmetic for everything, because "64 bits is enough". Sometimes it isn't, and at other times it's far more than the application needs. When it isn't enough, you run into rounding errors that propagate; the heuristics for dealing with this don't always work, and so you can have 64 bits of "precision" with errors from the third bit onwards. And when you don't need 64 bits, why are you wasting memory? You should be "cherishing every bit". One way of dealing with this problem is interval arithmetic, which explicitly maintains an interval guaranteed to contain the "true" answer. Unfortunately, interval arithmetic has its own issues; it's very easy to end up with the interval $[-\infty, +\infty]$ if one isn't careful. Still, there are applications (such as $n$-body problems) where interval arithmetic gives good results. (And when you have a billion cores, you can split up the interval and have different cores working on different sub-intervals.) For these approaches to catch on, we need new algorithms suited to interval arithmetic and/or entirely new arithmetics. (A toy interval-arithmetic example appears after this list.)
  5. We need new programming languages and development environments to allow programmers to understand / interact with the hardware they're programming for. Should compilers receive a "memory description" in addition to source code? Should programmers use a 3-D environment in which they can specify the spatial relationship of their billion cores and memory?
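To make the point in item 2 concrete, here is a minimal back-of-the-envelope sketch in Python (my own illustration, not code or exact constants from the talk) that charges a simple kernel for its memory accesses as well as its floating point operations, using the rough latencies quoted above and pessimistically assuming every access goes all the way to main memory:

# Rough latencies quoted in the talk (illustrative, not authoritative).
T_FLOP = 1.3e-9   # ~1.3 ns for a multiplication through the full pipeline
T_MEM = 70e-9     # ~50-100 ns for a main-memory access; 70 ns as a midpoint

def dot_product_cost(n):
    """Back-of-the-envelope cost estimates for a length-n dot product."""
    flops = 2 * n         # n multiplications plus n additions
    mem_accesses = 2 * n  # every element of both input vectors is read once
    flop_only = flops * T_FLOP
    memory_aware = flops * T_FLOP + mem_accesses * T_MEM  # assumes no caching
    return flop_only, memory_aware

flop_only, memory_aware = dot_product_cost(10**6)
print(f"flop-only estimate:    {flop_only * 1e3:.1f} ms")     # about 2.6 ms
print(f"memory-aware estimate: {memory_aware * 1e3:.1f} ms")  # about 142.6 ms

The two estimates differ by roughly the factor of 50 mentioned above, which is exactly the constant that the flop-counting habit throws away.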
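And to illustrate item 4, here is a toy interval-arithmetic class (again my own sketch; a real implementation would also round the endpoints outward to stay rigorous). It shows the basic idea and how easily intervals can blow up:

class Interval:
    """A closed interval [lo, hi] guaranteed to contain the true value."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        corners = [self.lo * other.lo, self.lo * other.hi,
                   self.hi * other.lo, self.hi * other.hi]
        return Interval(min(corners), max(corners))

    def __repr__(self):
        return f"[{self.lo:.3g}, {self.hi:.3g}]"

x = Interval(0.9, 1.1)  # "1, known to within 10%"
print(x - x)            # [-0.2, 0.2], not [0, 0]: the classic dependency problem
y = x
for _ in range(20):     # repeated multiplication keeps widening the interval
    y = y * x
print(y)                # roughly [0.109, 7.4]; the width grows with every step

The x - x example shows why naive interval arithmetic degrades: the arithmetic does not know that the two operands are the same quantity, so the result widens instead of collapsing to zero, and after enough operations you are left with an interval too wide to be useful.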
There was much more in this vein; I didn't touch on system software and reliability, networks and communication (Gustafson thinks we should increase communication budgets by a factor of ~8), and so on. As you can probably imagine, the controversial points generated many questions and heated debate. Like many great talks, it energized the audience and suggested new lines of work in areas including architecture, compiler design, human-computer interaction, networks, scientific computing, programming languages, and (of course) algorithms.

1 Comment:

Anonymous tejas said...

great snippets, thanks for sharing! they all sound mildly earth-shattering, so i wonder how many of john's suggestions are widely accepted by the community.

10:53 PM  
