Java, screensavers, and virtual supercomputers: an interview with Nelson Minar, CTO of Popular Power
Nelson Minar likes distributed processing. So far in his career, first at the Santa Fe Institute and then at the MIT Media Lab, Minar has done basic research in loosely-coupled distributed processing, trying to get a handle on the nature of efficient distributed computing. At MIT’s Media Lab, Nelson studied with Pattie Maes in the Software Agents group, where his work culminated in the release of Hive, a software platform for creating distributed applications.
Now, he’s created Popular Power, a Silicon Valley company dedicated to linking cycle-wasting idle computers together to form virtual supercomputers. Popular Power will provide a source of income to the computer owners, and high-test computing power to Popular Power customers.
JavaWorld columnist Mark Johnson recently had the opportunity to speak with Minar about his company’s business model, the technological challenges facing Popular Power, and Java’s role in answering those challenges. You’ll find numerous informative URLs, including a link to Popular Power, in the Resources section.
JavaWorld: What’s the mission of Popular Power?
Nelson Minar: Our goal is to take all the computers out there on the Internet, make use of their idle resources, and resell those resources to people who have large computation or network jobs.
JavaWorld: Tell me more about your first application, the influenza virus study.
Nelson Minar: Part of our business plan is to do a mix of nonprofit and for-pay applications. We’re starting out with a couple of nonprofit applications, the first of which is this influenza research. The goal is to develop computer models and run computer simulations of the effects of flu vaccines on the human immune system, to help people develop more effective flu vaccines in the future.
JavaWorld: And so there’s a human immune system model and a flu vaccine model, and the two models crunch against one another in this big distributed supercomputer?
Nelson Minar: Exactly.
JavaWorld: It seems that most distributed supercomputing projects like this focus on one of three main kinds of areas: mathematical problems like crypto cracking, prime number searches, and obscure number-theoretic problems; distributed rendering; or signal processing, like SETI@Home. Why are there so many of these projects in just these small areas, and what new categories of projects do you envision for the future?
Nelson Minar: One of the early challenges of building this type of system was to find large problems that were easy to break up into lots of little bits of work. And in particular, with the earliest projects, like the key cracking, they were also trying to keep the code size and data size down, to make the networking problem simpler.
These problems we mentioned, like key cracking, were great examples of that. SETI@Home changed the model a little bit in that they started shipping around a fair amount of data. The average data set size in SETI@Home is about 300K. There we started to see more complex data processing and distributed computing. Popular Power brings a new capability to this idea. We’re using mobile code to change the kinds of jobs that your computer runs.
Instead of having to get one carefully-tuned client that works exactly right and ship that out to everyone, we ship a metaclient, something that’s able to do any kind of work. When we tell a client to do more work for us, we don’t just send them some data, we also send them the program to run. And that allows us to have a very flexible distributed system, where we can dispatch different jobs all the time.
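A minimal Java sketch of the metaclient idea described above. The names `WorkUnit`, `SumOfSquares`, and `MetaClient` are hypothetical and are not Popular Power's API. The client contains no job-specific logic; each unit carries its own code and data. In a real system the unit's class would arrive over the network and be loaded by a class loader; here it is constructed locally to keep the example self-contained.

```java
import java.io.Serializable;

/** Hypothetical contract: a unit of work carries its own code (run) and data (fields). */
interface WorkUnit<R> extends Serializable {
    R run();
}

/** Hypothetical job: sums the squares of the integers in [from, to). */
class SumOfSquares implements WorkUnit<Long> {
    private static final long serialVersionUID = 1L;
    private final long from;
    private final long to; // exclusive

    SumOfSquares(long from, long to) {
        this.from = from;
        this.to = to;
    }

    @Override
    public Long run() {
        long total = 0;
        for (long i = from; i < to; i++) {
            total += i * i;
        }
        return total;
    }
}

/** Generic client: runs whatever unit it is handed, with no job-specific logic. */
class MetaClient {
    static <R> R execute(WorkUnit<R> unit) {
        return unit.run();
    }

    public static void main(String[] args) {
        // One large job split into two independent units, as a dispatcher might do.
        long first = execute(new SumOfSquares(0, 500));
        long second = execute(new SumOfSquares(500, 1000));
        System.out.println(first + second);
    }
}
```

Because the two units share no state, they could run on different machines and be combined afterward, which is the property that makes a job easy to break up.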
JavaWorld: So this is actually a coarse-grained, loosely coupled distributed computing infrastructure, not just an app.
Nelson Minar: Exactly. That’s the key factor distinguishing our technology from what’s been done in the past. We have a general purpose infrastructure, rather than individual applications.
JavaWorld: Obviously, problems that require low latency or involve data that are not cleanly partitionable, as you said, aren’t appropriate for this kind of approach. Or are they? Are there any other nonrealtime, computationally intensive problems for which you think this kind of computing is better suited, or worse?
Nelson Minar: You mentioned a couple of problems that are hard for this kind of architecture. One of them: if you need incredibly fast response, like under a second. Obviously, that’s not going to work very well with our sort of distributed system. The other one is when the problem does not break up into individual pieces. That’s not very effective on our system.
However, we’ve found that many times if you look at these problems in a different way, it turns out we could be helpful anyway.
JavaWorld: I read recently where someone said a supercomputer is a device that turns a computing problem into an I/O problem.
Nelson Minar: [laughter]
JavaWorld: That’s interesting because you’re creating a supercomputer out of nonsupercomputers. Can you describe Popular Power in light of that comment?
Nelson Minar: We’re taking thousands and thousands (or up to millions, at some point) of unreliable, small computers out there on the Internet, and we’re bringing them all together with our dispatcher architecture to build one large, reliable supercomputer-type system. And, so, very simply, we get lots and lots of little jobs, and we dispatch them out to the clients. And most of our server-side technology is about how to do that dispatch effectively.
You mentioned I/O-type problems. All computing problems involve some sort of I/O. The Internet distribution actually puts higher requirements on that, right? Internet links are much slower than local area networks, and so part of the requirement in building a job into Popular Power is making sure that you manage that I/O. And, there are some problems for which that’s not possible. If you need to work on a terabyte database, we’re not going to be able to distribute that very effectively today.
But again, often, you could break up the data into smaller chunks and ship each chunk to a different node and that works fine.
JavaWorld: That’s interesting about the I/O. You know, it’s one thing to get organizations to donate processor time they’re not using anyway. But it’s another to ask them to put up with additional network traffic. With that in mind, how do you answer this potential objection in an organizational setting?
Nelson Minar: On an organizational side, we’re providing tools so that an IT manager can control what the Popular Power clients on their network do. So, for instance, if they want to enforce hours of operation, the clients [could be] only allowed to run in the middle of the night. We offer that opportunity for an IT manager as a way to control the nodes on their network.
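An hours-of-operation rule like the one mentioned above could be sketched in Java as follows. The class `OperatingWindow` and its methods are hypothetical names chosen for illustration; the interview does not describe how the real administrative tools are implemented.

```java
import java.time.LocalTime;

/** Hypothetical policy: the client may run only inside a daily time window. */
class OperatingWindow {
    private final LocalTime start;
    private final LocalTime end; // exclusive

    OperatingWindow(LocalTime start, LocalTime end) {
        this.start = start;
        this.end = end;
    }

    /**
     * True if `now` falls in [start, end). Windows that wrap past midnight
     * (for example 22:00 to 06:00) are supported. If start equals end, the
     * window is treated as covering the whole day.
     */
    boolean allows(LocalTime now) {
        if (start.isBefore(end)) {
            return !now.isBefore(start) && now.isBefore(end);
        }
        return !now.isBefore(start) || now.isBefore(end);
    }

    public static void main(String[] args) {
        OperatingWindow overnight = new OperatingWindow(LocalTime.of(22, 0), LocalTime.of(6, 0));
        System.out.println(overnight.allows(LocalTime.now()));
    }
}
```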
JavaWorld: If I’m an IT manager, it’s easy for me to say, “Don’t run those stupid things, because I don’t want to be messing with this stuff.” What’s your story with them?
Nelson Minar: Popular Power actually provides an IT department the chance to make some money with their computers. Because they’re able to use the spare resources of their own computers and get paid for that time. So there’s a financial incentive for an IT department to use this on the public Internet.
We also have a local intranet version of this system where a big company that does drug design, for example, might have tens of thousands of computers on their intranet, internal network, and they would run Popular Power solely within their own network — and do their own work inside of it. We’re seeing a lot of demand for that type of capability.
JavaWorld: I see. So you can go to someone like, say, Pfizer, or Burroughs-Wellcome, and say, “How would you like to turn all these computers that are sitting on overnight, heating up your offices, and turn them into a virtual supercomputer?”
Nelson Minar: For your own application.
JavaWorld: With all the various distributed supercomputing projects appearing, (following the success of SETI@Home, again), what do you think are going to be the deciding factors in attracting participants to a company like yours?
Nelson Minar: There are several key points, I think, that are going to encourage people to run our system.
On one hand, you want to have the system not impact your own life on your own computer. If somebody’s running Popular Power at home they probably don’t want to be bothered by it when they’re using their computer on their own. So, for instance, Popular Power, especially in Windows, runs as a screen saver. It only activates when you’re literally not using your computer.
We’ve also put a fair amount of work into making sure that as soon as you move the mouse or return to the computer, it goes out of memory immediately. It’s the “you don’t want Popular Power to annoy you” incentive.
Another part is, “How does it improve your life? How does it help you?” There are a couple of different incentives here, the simplest one is that we’re going to pay people for their time. So, if you run Popular Power, you can make somewhere between and 5 a month. That’s 50 percent off on your Net bill, if you’re on a dial-up.
Another incentive is the ability to contribute to these nonprofit projects — that’s actually been the primary incentive in the past. People just volunteer to do this stuff because it was fun, or they wanted to help out. It’s amazing how many people really are motivated by that.
JavaWorld: So, more incentives?
Nelson Minar: Actually … the first of these distributed computing efforts I ran was called DESCHALL. It was back in 1997, cracking a DES key. And there was this wonderful war that was going on between MIT, where I was a student, and Carnegie-Mellon, where we were running neck-and-neck. And, every week, they’d publish new stats, and one of us would pull ahead and start teasing the other people, and say, “Yeah, your computers are too slow. There are not enough of you.” And they would work really hard to sign up a bunch of people and beat us the next week.
And, that kind of competition was great for the project, right? Suddenly thousands more computers are signing up.
JavaWorld: So, you’re harnessing not only excess compute time, but excess testosterone?
Nelson Minar: [laughter] There is a certain geek pride in that kind of thing. Again, it’s the feeling that you’re part of something bigger. In this case you’re not just part of all of Popular Power, but you’re part of the neighborhood, which is your local friends. And you can see how the neighborhood’s doing as well.
JavaWorld: Let’s go technical now. What projects have you worked on in the past that have influenced the design of the system?
Nelson Minar: The big project there is the Hive project, which was at the MIT Media Lab, which I headed up. Hive is an open source toolkit. You can actually get it off the Net [see Resources]. And what I was doing with Hive was trying to build peer-to-peer, decentralized systems with mobile agents.
It’s a lot of the same rough technologies. You have lots of nodes out there that are all contributing to a larger network of computation. You have mobile code moving around the system. With Hive, I was trying very aggressively to build a really decentralized system. What it came out looking a lot like was Jini, but with a different design philosophy underneath it. I was working on device integration networks, so the applications were similar to some of the early Jini work.
JavaWorld: What sort of hardware are you using for your servers and where did it come from?
Nelson Minar: The servers are actually also built on Java. We’re using WebLogic Server as our server platform. And the hardware they run on is currently VA Linux rack mounts. That will probably change some time in the future, but that’s what we started out with.
JavaWorld: So you don’t anticipate moving from one hardware server platform to another to be a concern?
Nelson Minar: No, not at all. The server is designed to be easily installable.
JavaWorld: Because of the portability of Java?
Nelson Minar: Correct. The big issue for us is that we need a server-side Java VM that is reliable and scalable. And the Linux Java is a little farther behind than I would like it to be. So, you know, at some point, we might switch over to Sun, just to get a more reliable Java environment.
I’m a big Linux fan, just as an aside, here. And I really much prefer to use Linux hardware and Linux OS, but as I said, the VMs are a little further behind in Linux right now.
JavaWorld: Does your system allow a node to work on several different projects, depending on whatever needs doing, or what your priorities are? Or does a node have to be signed up to do a particular project?
Nelson Minar: Oh, no. The nodes are able to switch projects at will.
JavaWorld: And they can switch projects? This week I’m doing influenza, and key cracking next week, and it just takes turns? Or do I need to switch projects, manually?
Nelson Minar: On the client side, you don’t need to switch anything. There are a couple of ways that you interact with the jobs in our system. First of all, there’s a screen saver that can change with the job. If it’s running influenza, we have a flu screen saver. If you’re running, to pick a simple example, some sort of search-engine application, you might see some animation of the Internet being searched. So you would see different jobs on your screen.
One of the unique things about Popular Power is we have a way for the user to express their preference about what kind of job they run. You can choose, “Do I want to run more nonprofit applications, or more profit applications?” There’s a little slider in the preferences panel.
And that way, you have some say over whether your time is being donated to the public good. Or are you going to be making money with your spare time? But, in terms of the technical architecture, every single work unit that we hand you, every task, can be different. A different job, different code, everything.
JavaWorld: We’re going to start sliding towards Java now. How do you prevent users from modifying what their node is doing? How do you keep people from hacking it, either because they’re trying to be nice, or because they’re not?
Nelson Minar: This is a really interesting problem in distributed systems with mobile code. I often refer to it as the problem of protecting the program from the computer it’s running on. The deep answer is that at some level, you can’t 100 percent prevent someone from hacking the client. They own the computer. They have complete control over the system. I believe (and some mathematician would have to back me up), but I believe that it’s technically impossible to do that exactly.
Instead, we have a bunch of strategies to manage that risk in the client. Part of it is that the client is obfuscated. It’s a little hard to tell what’s going on. A fair amount of effort would have to be put into hacking it. I’m sure someone could figure it out if they wanted to; people reverse engineer everything. But, the license forbids you from doing that, and the code is obfuscated to the point where it’s hard to do that.
The major line of defense for us, though, is to exploit the redundancy of our network, to verify the work that’s done. So, if you hand out the same task to several different clients, and the results come back and are not the same, then you know something funny happened. There are some obfuscation and some encryption techniques that can make it pretty hard to do any manipulation of the job. I mean, hard enough that someone would have to have a real incentive to do that. For most of the kinds of jobs we’re talking about, people don’t have that incentive. It’s a risk management strategy that I feel really, really comfortable with.
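The redundancy check described above can be sketched in Java as a simple quorum vote. This is an illustration under assumptions, not Popular Power's verifier: the class `RedundancyCheck`, the method `agreedResult`, and the idea of a fixed quorum are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical verifier: accepts a result only if enough clients agree on it. */
class RedundancyCheck {
    /**
     * Returns the first result reported by at least `quorum` clients, or
     * empty if no result reaches that threshold, which signals that
     * something went wrong (a bug, a faulty machine, or tampering).
     * Results are compared with equals(), so R needs a sensible equals/hashCode.
     */
    static <R> Optional<R> agreedResult(List<R> reported, int quorum) {
        Map<R, Integer> votes = new HashMap<>();
        for (R result : reported) {
            int count = votes.merge(result, 1, Integer::sum);
            if (count >= quorum) {
                return Optional.of(result);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The same task was handed to three clients; one returned something different.
        System.out.println(agreedResult(List.of("abc", "abc", "xyz"), 2).isPresent());
        // Two clients disagree and nobody reaches the quorum.
        System.out.println(agreedResult(List.of("abc", "xyz"), 2).isPresent());
    }
}
```

When no quorum is reached, a dispatcher would presumably hand the task out again rather than trust either answer.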
JavaWorld: How did you choose Java for your implementation?
Nelson Minar: The key reason for using Java on the client is that it provides a security sandbox. It provides a way to restrict what the code can do. So, part of our system is that we’re taking code from customers and then pushing it out to these thousands of computers all over the Internet. We don’t want to be in the position of having to read every customer’s code and make sure that it doesn’t do anything wrong. And I’m not just worried about our customer trying to steal stuff off somebody’s hard drive, or being malicious. I’m also worried about simple bugs in their code. Those are both grouped under the term Byzantine failure.
The only way to securely make sure that a Byzantine failure doesn’t harm the client’s machine is some sort of restriction like the Java sandbox. And Java’s the premier technology for doing that, and so that’s our choice.
JavaWorld: What has been the greatest challenge with your Java implementation?
Nelson Minar: Again, on the client side, I think the greatest challenge doesn’t have anything specifically to do with Java. There’s a lot of failure handling you have to do. So we’ve got this giant distributed system, and we want to make sure, if something goes wrong, that the error-recovery strategy is appropriate. Java makes that easier with the exception framework, right? So, if you manage exceptions through your system well, then that works.
The big rule for exception handling is something I don’t see written in Java books very often: methods should rethrow exceptions. When I started out writing code in Java four or five years ago, I would always catch the exception, and print an error and then continue. And, one day, a while ago, I realized that’s the wrong thing to do. You want to throw the exception further up, until someone can handle it. And, once I figured that out, it became very easy to write code that handled errors in a more appropriate way.
It’s just a little tip that good programmers know, and it takes a while to learn. So, the Java exception handling allows us to catch those errors, but you still have to do a lot of distributed systems design.
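The exception-handling tip above can be illustrated with a short Java sketch. All names here (`ExceptionPropagation`, `fetchWorkUnit`, and so on) are hypothetical. The first wrapper shows the catch-print-continue habit; the second lets the exception travel up to a caller that knows what to do about it.

```java
import java.io.IOException;

class ExceptionPropagation {
    /** Hypothetical low-level step that can fail. */
    static String fetchWorkUnit(boolean serverUp) throws IOException {
        if (!serverUp) {
            throw new IOException("server unreachable");
        }
        return "work-unit";
    }

    /** Anti-pattern: catch, print, continue. The caller cannot tell that anything failed. */
    static String fetchAndSwallow(boolean serverUp) {
        try {
            return fetchWorkUnit(serverUp);
        } catch (IOException e) {
            System.err.println("error: " + e.getMessage());
            return null; // the failure is now disguised as an ordinary value
        }
    }

    /** Preferred: declare the exception and let it reach a caller that can act on it. */
    static String fetchAndPropagate(boolean serverUp) throws IOException {
        return fetchWorkUnit(serverUp);
    }

    public static void main(String[] args) {
        try {
            fetchAndPropagate(false);
        } catch (IOException e) {
            // This level owns the recovery policy, for example scheduling a retry.
            System.out.println("will retry later: " + e.getMessage());
        }
    }
}
```

The swallowing version tends to fail later and farther away, typically as a `NullPointerException` with no hint of the original cause.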
A simple example: The client tries to connect to the server, and the server’s down, right? For some reason, the Net’s off line, or whatever. You can’t get to it. When should the client try to reconnect? And if you do that wrong, then you can end up in a situation where all of your clients are trying to connect to your server right when it comes back up. And that’s a big problem, so you have to do careful backoff strategies.
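One common way to keep every client from reconnecting at the same instant is exponential backoff with random jitter. The Java sketch below shows that general technique; the interview does not say which strategy Popular Power used, and the class name `ReconnectBackoff` and its parameters are hypothetical.

```java
import java.util.Random;

/** Hypothetical reconnect policy: exponential backoff with random jitter. */
class ReconnectBackoff {
    private final long baseMillis;
    private final long maxMillis;
    private final Random random;

    ReconnectBackoff(long baseMillis, long maxMillis, Random random) {
        this.baseMillis = baseMillis;
        this.maxMillis = maxMillis;
        this.random = random;
    }

    /**
     * Delay before retry number `attempt` (0-based). The ceiling doubles with
     * each attempt until it reaches maxMillis, and the actual delay is drawn
     * uniformly from [0, ceiling], so clients that failed together spread out
     * instead of all reconnecting at the same moment.
     * Assumes baseMillis is small enough that shifting it by 30 bits fits in a long.
     */
    long delayMillis(int attempt) {
        int shift = Math.max(0, Math.min(attempt, 30));
        long ceiling = Math.min(maxMillis, baseMillis << shift);
        return (long) (random.nextDouble() * (ceiling + 1));
    }

    public static void main(String[] args) {
        ReconnectBackoff backoff = new ReconnectBackoff(1_000, 60_000, new Random());
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.println("attempt " + attempt + ": wait " + backoff.delayMillis(attempt) + " ms");
        }
    }
}
```

The jitter matters as much as the doubling: without it, clients that lost the server at the same time would still retry in lockstep.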
JavaWorld: So, basically, the hard part of your system is getting that whole distributed computing piece right. And it’s not a concern about Java; you’d have to do that in any other language. It sounds like Java actually makes that easier.
Nelson Minar: Java makes that easier, certainly. I’m trying to think of something that Java would make harder for us.
JavaWorld: Because of the p-code thing, with Java everyone always jumps on performance. So why don’t we hit that?
Nelson Minar: Performance is certainly one issue. Actually, I think at this point it’s not really a problem. Certainly, in general, Java performance is pretty good. I think if you take any sufficiently complicated program, a Java program’s going to run as fast as any C program. Programs are not slow because the execution environment’s slow. It’s more the algorithms and the design that go into it. Certainly the JIT technologies, like HotSpot, have made raw Java performance pretty good.
The other thing is, with Popular Power, we have just so much computation that you could afford to be a little slower. So, in one sense, we’re trading off a little bit of speed for a much greater sense of reliability and security, and therefore more computer supply. And so that trade-off works really well for us.
JavaWorld: That’s great. So, what JVMs are you using?
Nelson Minar: Currently, on the Windows client we’re using the stock Sun JRE 1.2.2. On the Linux client, I believe we’re using the Blackdown 1.2 release. I forget which version of that. On the server, we’ve actually been experimenting with a bunch of different VMs.
JavaWorld: To what extent are you finding differences between the VMs?
Nelson Minar: I think all the VMs we’re using are all derived from Sun code right now. And we’ve had some experience with the IBM VM in there, too, so I should probably toss that in.
We’re finding that all of those really work very effectively, in that they all implement Java 2 correctly. The real difference we see is performance and how fast they run, which is a function of the JIT. And then also, how reliable they are. And, some of them are better than others.
JavaWorld: Do you have concerns about scalability, then?
Nelson Minar: I’m not too worried about scalability. Basically our server looks a lot like a Web server. The clients post an HTTP request to us and we respond with a document. And people know how to build really big scalable Web servers. That’s the biggest thing people have been working on the last four years. So we use WebLogic Server as our core server technology, which has some really nice clustering technology, some fail over, and a very efficient design, especially in the I/O subsystem. So that scales really effectively. And this is a great example where server-side Java has been successful.
JavaWorld: What are the other basic high-level requirements for your design? It sounds to me like the two basic high-level requirements that I hear are security/error handling stuff; and code mobility. And Java obviously meets those things with the sandbox. And then, of course, the second one is just your programming and class loaders. So, what other high-level requirements are missing from that description of what you’re doing?
Nelson Minar: On the client side, actually our client breaks up into two pieces. There’s the Java portion and then a native code portion. On the Java portion, I think we’ve hit the major ones. The last one I would throw in there is ease of network programming, and the fact that Java has high-level objects for sockets and things makes it a lot easier to write network code in Java.
The other requirement on the client side, which we’re not using Java for, is the interface to the operating system. On Windows, we have a little program that’s a screen saver. And when Windows invokes the screen saver, that in turn invokes a Java VM underneath to do the work. And so that native code there is stuff that knows about, like, the Windows registry and how to be a screen saver in Windows, and how to catch mouse events, and that kind of thing. And we’re not using Java for that, because it’s really too native-code specific. In the Linux version it’s just a shell script. I mean, it’s pretty simple. So, that’s another client side requirement. Again, Java doesn’t match that so directly.
There’s another piece of this which we haven’t talked about yet, which I think maybe you’d be interested in. We also, in the current system, require that the customer’s code be written in Java. And you asked about performance of that, and that works well. Oftentimes, it depends on the vertical market, whether that customer’s application code is in Java now or is easy to port to Java.
When people think supercomputing, they often first think of FORTRAN. Well, that’s the old-school way of thinking of things. And we’re finding that in a lot of markets, like the financial industries, people’s new performance-intensive code is being written in Java. And there’s actually an interesting history here, where, especially on Wall Street, people were working in Smalltalk, and then they moved to Objective C, and now they use Java. It’s the parallel evolution. C/C++ is what most people think of. But we’re finding that in several of the vertical markets we’re looking at, a lot of the newer customer code is being written in Java already. So that works really well for us.
JavaWorld: What about XML?
Nelson Minar: Yeah, actually, we’re using XML in an interesting way. As I mentioned before, our communication protocols are wrapped in HTTP. Those messages themselves consist of XML and serialized Java objects. So, we use XML to send metadata around the system. It’s a nice, formatted data stream, and then we serialize Java objects consisting of the actual code that we’re using.
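A rough Java sketch of a message that pairs XML metadata with a serialized Java object, in the spirit of the description above. The wire format is not given in the interview, so the element names, `TaskPayload`, and `MessageCodec` are made up for illustration. The sketch uses standard Java serialization from `java.io`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical payload; any Serializable object would do. */
class TaskPayload implements Serializable {
    private static final long serialVersionUID = 1L;
    final String jobName;
    final int[] data;

    TaskPayload(String jobName, int[] data) {
        this.jobName = jobName;
        this.data = data;
    }
}

class MessageCodec {
    /**
     * Builds a small XML metadata header. Element names are invented for this
     * sketch, and jobName is assumed to contain no characters needing XML escaping.
     */
    static String metadataXml(String jobName, int payloadBytes) {
        return "<message><job>" + jobName + "</job><payloadBytes>"
                + payloadBytes + "</payloadBytes></message>";
    }

    /** Serializes an object to bytes with standard Java serialization. */
    static byte[] serialize(Serializable object) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        }
        return bytes.toByteArray();
    }

    /** Reads an object back; only safe for bytes from a trusted source. */
    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] body = serialize(new TaskPayload("influenza", new int[] {1, 2, 3}));
        System.out.println(metadataXml("influenza", body.length));
        TaskPayload received = (TaskPayload) deserialize(body);
        System.out.println(received.jobName);
    }
}
```

The XML part stays readable and language-neutral for routing and bookkeeping, while the binary part carries the Java-specific object.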
JavaWorld: Anything else?
Nelson Minar: You’ve done a very thorough job covering the space. I guess the one other thing I’d like to emphasize is that we’ve been up and running with this stuff since April, so we actually have a fair amount of experience now in keeping a network like this working. We’re doing real jobs with, I think, real socially relevant computation in the nonprofit side, and we’re working on the customer relationships to be the first company to do paying jobs as well. I feel like we’re in a pretty strong position. Java has been a big help for us in doing that because it gave us the head start in the technology that we needed.
JavaWorld: OK, great. Well, I think that’s about it. Thanks very much for your time.
Nelson Minar: Thanks.