Computing on the edge

CTO Justin Chapweske talks about Onion Networks’ WebRAID end-to-end content delivery solution

EVERYONE SEES APPLICATION logic moving out to the edge of the network over time, but the prevailing wisdom of the day is that IT organizations will have to rely on CDNs (content delivery networks) to make that happen. Justin Chapweske, CTO of Onion Networks, disagrees. He argues that IT organizations can move application logic to the edge of their own networks using software developed by his company. In an interview with InfoWorld Editor in Chief Michael Vizard and Test Center Director Steve Gillmor, Chapweske, who was a pioneer of peer-to-peer networking with a product called Swarmcast from startup OpenCola, explains why WebRAID technology from his new company is ultimately the most cost-effective way to deliver computing at the edge.

InfoWorld: What is Onion Networks?

Chapweske: Onion Networks is an enterprise content delivery company, and our technology really focuses on leveraging and augmenting existing infrastructure investments within an enterprise such as caches, Web servers, and even desktop PCs. We use that to move content quickly, reliably, and as cost effectively as possible. So rather than building out this whole huge new network, we look at what’s already in the enterprise, all of those existing network resources, and leverage them to provide a solution.

InfoWorld: How does this differ from the approach being taken by the providers of CDNs?

Chapweske: You don’t have to go and build a $100 million Akamai-like thing just for your own business. You’ve already got all these caches in place. Our technology really allows enterprise application developers to integrate CDN technology directly into the enterprise software. So not only is it for enabling content delivery applications, like distributing training videos and that sort of thing, our technology can actually integrate very well with back-end enterprise applications to give them the ability to move around huge amounts of content.

InfoWorld: How would you describe the software architecture you have put in place?

Chapweske: We basically have an end-to-end content delivery solution. Most CDN technologies essentially have a content delivery cloud, where you have all these servers that are moving this content between each other, and then client machines access this cloud and use HTTP to download the content from the cloud. Now within those types of CDNs, you have essentially a last-mile problem, where if the connection between the cloud and the … content consumer becomes congested or goes down, there’s no logic on that last mile to be able to compensate for that. Our technology is actually very end-to-end, so we have technology that can sit on the content consumer side. This is what’s called a WebRAID technology, which transfers content reliably between the storage and the operating system, and very quickly. WebRAID technology is essentially RAID over HTTP. That allows us to transfer content from multiple Web sites, multiple caches and peers on the network in parallel. So we get very fast downloads and we can get very reliable downloads. In the case of a traditional CDN, where you might have congestion on this last mile or that link may go down entirely, the WebRAID technology is intelligent enough to find all sorts of other ways to get that content on the network. It can aggressively locate content on the network. And reliability is incredibly important, because if they’re transferring multigigabyte files using just straight HTTP libraries and that download fails, usually you have to start all over again. So we can integrate this WebRAID library directly into enterprise applications to provide that fault-tolerant content delivery.
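The parallel, fault-tolerant download Chapweske describes can be illustrated with a small sketch. This is not Onion Networks' code or the WebRAID API, neither of which appears in the interview. It assumes each source can serve an arbitrary byte range of the same file, and it replaces real HTTP servers with a hypothetical in-memory `Mirror` class so the example runs without a network. The names `Mirror`, `fetch_chunk`, and `parallel_download` are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class Mirror:
    """Hypothetical in-memory stand-in for a server that honors byte ranges."""

    def __init__(self, name, data, up=True):
        self.name = name
        self.data = data
        self.up = up

    def fetch_range(self, start, end):
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        return self.data[start:end]


def fetch_chunk(mirrors, index, chunk_size):
    """Fetch one chunk, falling back to other mirrors if a source fails."""
    start = index * chunk_size
    end = start + chunk_size
    # Rotate the first mirror tried so chunks are spread across sources.
    for offset in range(len(mirrors)):
        mirror = mirrors[(index + offset) % len(mirrors)]
        try:
            return mirror.fetch_range(start, end)
        except ConnectionError:
            continue  # try another replica instead of restarting the download
    raise ConnectionError(f"no mirror could serve chunk {index}")


def parallel_download(mirrors, total_size, chunk_size=4):
    """Download all chunks concurrently and reassemble them in order."""
    n_chunks = (total_size + chunk_size - 1) // chunk_size
    with ThreadPoolExecutor(max_workers=len(mirrors)) as pool:
        # pool.map yields results in chunk order, so a plain join is enough.
        chunks = pool.map(
            lambda i: fetch_chunk(mirrors, i, chunk_size), range(n_chunks)
        )
        return b"".join(chunks)


payload = b"stand-in for a multigigabyte file"
mirrors = [
    Mirror("web-server", payload),
    Mirror("branch-cache", payload, up=False),  # simulates a dead link
    Mirror("desktop-peer", payload),
]
result = parallel_download(mirrors, len(payload))
```

Because each chunk is requested independently, a failed source costs only a retry of that chunk on another replica rather than a restart of the whole transfer, which is the property the answer above emphasizes.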

InfoWorld: What impact do you foresee Onion Networks having on the deployment of Web services?

Chapweske: [WebRAID] allows you to create essentially content-driven Web services that move very large amounts of content and have it still be economical to do so, instead of a totally centralized approach where you have a single server spitting out all the content.

InfoWorld: One of the hardest things to deal with on a network is video. How does Onion Networks help distribute video efficiently on a network?

Chapweske: One of the really unique things about WebRAID is we can actually do parallel streaming. We can use this technology to download video files from multiple points in parallel. WebRAID can actually download content from T1 lines in parallel, giving you super-fast downloads in the range of multimegabytes per second. Furthermore, if it’s a video- or an audio-type application, you can also view the content as it is being downloaded. That’s huge for training, for video on demand, and those types of applications. Jibe announced they have teamed up with us to create a media delivery application specifically for video on demand. Our technology enables you to cost-effectively distribute full-blown movies. That’s not something that you can do with an Akamai-type system; it’s just too expensive. And it’s not something you can do without a parallel download system because it’d be too slow and the user experience wouldn’t be there. We also think training is going to be an absolutely phenomenal market. Instead of flying everybody everywhere and dealing with the hassle and the costs involved with that, companies can have a plug-and-play solution that they can deploy to 10,000 employees to distribute a video, literally overnight.

InfoWorld: Akamai intends to make its solution available to purchase and install behind the firewall. Why should IT organizations opt for your approach over Akamai’s?

Chapweske: There’s always going to be the cost perspective. We have a software-based solution that we can stick right in place and [use to] leverage existing enterprise resources without a huge infrastructure. We can take content from the network and deploy it on the enterprise in a very fault-tolerant and cost-effective fashion. If you’re not using a technology like WebRAID, your applications have no way to deal with network errors or to find other routes because Akamai is a purely DNS-based solution. If you look up one address and that address is not accepting at the moment because maybe that link is down on your LAN, our WebRAID technology could actually discover other replicas in the network and make that content distribution totally reliable. So this is going to provide an ROI that is much, much quicker than a hardware-intensive cache-based CDN. If you’re talking about organizations that have many small branch offices, it might not make sense to deploy a hardware-based solution to all of those different branch offices. But our software-based solution can go to those places with zero administration cost, so it’s very flexible.

InfoWorld: How secure is your approach to content delivery?

Chapweske: Because it’s a software-based solution, we can have intelligence at many times more points in the network than a cache-based solution can. Our technology is actually much more secure than a cache-based solution because you may have thousands upon thousands of servers and caches out on the Internet, but from an administrative perspective, it’s basically impossible to assure that none of those caches are being hacked into. With our Content-Addressable Web technology, we can automatically detect tampering and corruption and we can repair it. When you’re talking about thousands of machines, you need to have a technology in place that can assure that even if these different machines are compromised, the content’s going to be delivered intact, and that’s what our technology does. The ability for us to have that many more points of intelligence in our network allows us to monitor the network much more closely, and we can actually locate other replicas on the same subnet.
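One common way to detect tampering on untrusted caches is to compare each downloaded copy against a cryptographic digest obtained from a trusted source. The sketch below illustrates that general idea only. The interview does not say which hash function or protocol the Content-Addressable Web uses, so SHA-256 and the function names `digest` and `fetch_verified` are assumptions made for this example.

```python
import hashlib


def digest(data: bytes) -> str:
    """Hex digest of the content (SHA-256 is an assumption, not from the interview)."""
    return hashlib.sha256(data).hexdigest()


def fetch_verified(expected_digest, replicas):
    """Return the first replica whose bytes match the trusted digest.

    `replicas` is a list of (name, data) pairs standing in for copies
    fetched from caches that may have been compromised.
    """
    for name, data in replicas:
        if digest(data) == expected_digest:
            return name, data
        # A mismatch means corruption or tampering: skip this copy
        # and "repair" by moving on to another replica.
    raise ValueError("no replica holds an intact copy")


original = b"training video bytes"
trusted = digest(original)  # published by the content owner
replicas = [
    ("cache-1", b"tampered video bytes"),
    ("cache-2", original),
]
source, content = fetch_verified(trusted, replicas)
```

In this sketch the client never has to trust any individual cache. It only needs the digest from a trusted origin, and any copy that fails the check is discarded in favor of another replica.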

InfoWorld: What does it take to deploy your software?

Chapweske: It’s very trivial. It’s simply a Web browser plug-in. You don’t have to deploy any extra hardware. There is just one machine that sits on the back end, that’s it. And that machine on the back end, you simply point it at your existing Web service, and it’s a proxy. It automatically discovers everything from there. We’ve really focused everything on just plug-and-play and leveraging all those existing intranet sites. All you have to do is modify the Web pages on your intranet and the system automatically discovers and distributes it. I guarantee you that with the current networks that exist in companies [and] the current desktop machines and the current caches, there’s so much extra storage and capacity available that this technology can optimize, the ROI is just instantaneous.

InfoWorld: Where will your technology manifest itself in a more public way?

Chapweske: We’re going to build a system called the Open Content Network, which is going to be the largest CDN in existence. We’re going to be using it to distribute open-source code, public domain information, and Linux distributions. This will provide a great showcase of what the technology is capable of. It will honestly allow system administrators to download freely available content faster than commercial content.

InfoWorld: How do you expect this technology to change the way ISPs think about offering services?

Chapweske: One thing that we’re seeing is that more and more ISPs are looking at their business models and seeing that providing content delivery capabilities to their customers is a huge value-add. What they want to do is to be able to team up with other ISPs to create interoperable CDNs that they can transfer content between each other and provide a real value-add for their customers. But having to trust every single other CDN to secure your content is folly. By using our Content-Addressable Web technology, which we’re pushing forward as a standard, you can do that securely. This is a concept that we call the transient Web.

InfoWorld: How does the concept further the development of wireless applications?

Chapweske: The concept [involves] some very powerful yet simple extensions to HTTP that allow you to download content and allow you to discover mirrors of content on the network. Machines go up and down all the time; they’re very unreliable. And to distribute content from those types of unreliable machines, we have the Content-Addressable Web to reliably deal with distributed content from those machines. If they’re going up and down, we can grab a part of a file from one computer, another part of a file from another computer, and be able to reconstruct the whole thing. So it’s very reliable in the face of horrible network conditions. It’s perfect for wireless-type applications where you have more and more users roaming around, [who] are mobile, coming in close contact with each other and wanting to distribute content.

InfoWorld: Where do multicast and other Web standards come into play?

Chapweske: We’re one of the very few companies that has expertise in reliable multicast and we’ve been working with a lot of companies to push forward multicast standards. We’ve got a number of technologies that we’ve contributed to those standards, such as forward error-correction encoding, especially in terms of the peer-to-peer content delivery. We’re really pushing forward the Content-Addressable Web technology as being a major standard in this space. We’ve seen no other proposal to date that comes anywhere near the capabilities of what the Content-Addressable Web does. It will allow you to securely distribute content from completely untrusted caches. Once these technologies are built into existing caches, Web servers, and those sorts of things, the types of applications you can build are simply mind-boggling. Instead of sending out little XML documents between the Web services, we’re talking about full-blown, multigigabyte medical images and stuff like that. What we’re really doing [with] this is just trying to incrementally enhance the Internet and be able to have future interoperability with other people’s products. … The biggest thing that this will provide is that people will be able to write Web services that deal with huge amounts of content, and don’t have to worry about all the problems of sites going up and down, how fast they are, and those sorts of things.

InfoWorld: How does all this fit into the concept of the Semantic Web?

Chapweske: The Semantic Web is sort of the end goal of this whole thing. The goal is to make it so that computers can understand every application and piece of content on the network. What the Content-Addressable Web really allows is for the Semantic Web to reliably retrieve that content. A key part of the Content-Addressable Web is that content itself is given a unique name. It’s not based on the location. So no matter where a piece of content exists on the network, computers can address that content and access that content and not worry that the original Web site might be down. In order for the Semantic Web to deliver very large amounts of data, you need the facilities that the Content-Addressable Web provides. So it’s a layered approach.
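The idea of naming content independently of its location can be sketched as follows. This is an illustration under stated assumptions rather than the Content-Addressable Web specification: the `sha256:` name format, the `content_name` function, and the `LocationIndex` class are all hypothetical, and the hosts are made-up examples.

```python
import hashlib


def content_name(data: bytes) -> str:
    """Derive a name from the bytes themselves (hypothetical naming scheme)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


class LocationIndex:
    """Hypothetical index mapping a content name to every place it is stored."""

    def __init__(self):
        self._locations = {}

    def announce(self, name, location):
        """Record that `location` holds a copy of the named content."""
        self._locations.setdefault(name, set()).add(location)

    def withdraw(self, location):
        """Remove a location everywhere, for example when a site goes down."""
        for holders in self._locations.values():
            holders.discard(location)

    def locate(self, name):
        """Return the known locations for a name, sorted for stable output."""
        return sorted(self._locations.get(name, ()))


document = b"quarterly report"
name = content_name(document)

index = LocationIndex()
index.announce(name, "http://origin.example/report")
index.announce(name, "http://mirror.example/report")

index.withdraw("http://origin.example/report")  # the original site goes down
remaining = index.locate(name)
```

Because the name depends only on the bytes, identical content published at different sites resolves to the same name, and a lookup still succeeds after the original Web site disappears as long as one replica remains.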

InfoWorld: So at the end of the day, how is this approach any different from the first run people made at peer-to-peer computing?

Chapweske: The technology we’re doing now is really a second generation of the whole thing. A lot of the other peer-to-peer companies that are pushing content delivery solutions have fairly immature solutions. A lot of the focus I’ve seen at most companies has been purely media distribution, and they’re very proprietary systems. They don’t integrate well with existing infrastructure. They require you to use their own applications, their own user interfaces, whereas our technology is completely designed to integrate with everything. If you already have a network of mirrors and caches, our system will simply augment that and provide reliable transfers. It’s very transparent. There’s no special end-user application that the users have to use. That makes it a lot more compelling to do peer-to-peer content delivery within the enterprise.

Source: www.infoworld.com