Making speech technology mainstream
General Magic’s CTO talks about speech technology, Web services, and choosing between .Net and J2EE
General Magic, a public company since 1995, proudly counts another General, General Motors, among its premier partners and customers, but it wasn’t always so. Founded in 1993 by Apple, AT&T, Sony, Motorola, Philips, Matsushita, NTT, France Telecom, and British Telecom, the company originally targeted innovative GUIs for handheld devices. In 1997 it shifted its focus to voice technology and developed magicTalk, one of the first unified voice messaging platforms, along with its Portico product.
By the time the new century rolled in, however, it had become obvious that unified messaging was not going to set the world on fire. When a new CEO, Kathie Layton, was appointed to head up a turnaround, she in turn tapped Pat Haleftiras as the new CTO. Layton and Haleftiras saw the potential for an enterprise software play that used Java componentry to take speech technology mainstream. Neither has wavered from that vision, and now the CTO, with 25 years of software development experience under his belt, is in the thick of the action with products for Web application servers and Web services. InfoWorld sent Editor at Large Ephraim Schwartz, whose daily beat includes speech technology, to talk to Pat Haleftiras about the company’s new direction and its newest speech-enabled solutions.
InfoWorld: General Magic’s myTalk platform is currently being deployed by General Motors, is that correct?
Haleftiras: Yes, myTalk is the technology behind OnStar’s Virtual Advisor, which is the largest VXML [VoiceXML] deployment in production in the world today. It is a marriage of Java and VXML. It was over the course of helping OnStar with the Virtual Advisor that we came to understand the dynamics of what it takes to build an enterprise-class, large-scale voice application. We took that experience, and it now manifests itself in the magicTalk Enterprise Platform, which we have taken to the general market.
InfoWorld: Then you selected J2EE (Java 2 Enterprise Edition) over, say, Microsoft technology?
Haleftiras: There are only two viable software component models in the enterprise: .Net or J2EE. We chose to go down the J2EE path.
InfoWorld: Why?
Haleftiras: In our experience, J2EE predominates in enterprise environments. We are a small company and we need to pick our spots, so we focus on the marriage between VXML and J2EE. The solution can dynamically generate VXML [responses]. We can accommodate another standard [such as SALT, Speech Application Language Tags] if one emerges, but the underlying engine that drives everything will be a J2EE platform.
InfoWorld: So it runs on all the standard J2EE application servers?
Haleftiras: BEA, IBM, and JBoss.
InfoWorld: Not Oracle?
Haleftiras: No. When we first started, IBM and BEA had 60 percent of the marketplace. We brought in JBoss [an open-source app server] to offer a lower-end, lightweight solution, because there are a lot of call centers without much infrastructure. For $500 you can get service and have access to the source code.
InfoWorld: Voice technology hasn’t exactly set the world on fire. What do you think is happening to change that?
Haleftiras: Voice has evolved from a black-box point solution to what we have today: you can take our software and layer it on top of a J2EE app server, and it becomes IVR [interactive voice response] on steroids.
InfoWorld: Steroids? Please explain.
Haleftiras: Traditionally, a recorded voice would instruct a caller to press 1. The next level added a voice recognition box to the black box. Now a caller could “press or say 1,” and that is still the predominant voice-enabled system out there today. But over the last several years these systems have gotten more sophisticated. “Press or say 1” is turning into “Welcome to the XYZ company. Would you like sales or service?” We are beginning to craft interfaces that are more usable and support more complex interactions with the user. And now we are just entering the third stage of development.
At this third stage, standards-based implementations are emerging. The metaphor for VXML is Web development, where you have an HTML page with GIF files and headers and tags, and when an HTTP request comes in, those components get pushed out to the browser. VXML is similar, except that the VXML page gets pushed out to a voice browser, a VXML interpreter that resides in the network or in, say, an AT&T facility. We host Virtual Advisor in our own facility, which has racks of voice gateways, and those gateways house the VXML interpreter. As a VXML page is pushed to a gateway, it is interpreted, turned into an analog signal, and sent out over the phone line, wireline or cellular, and that is what you hear.
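The page-serving model described here can be illustrated with a short Java sketch. It assumes a server-side component that builds a VoiceXML 2.0 menu on each request, much as a servlet builds an HTML page, and that a voice gateway then fetches and interprets the result. The class name, method names, and target URLs are hypothetical; this is not General Magic’s code or the magicTalk API.

```java
import java.util.List;

// Hypothetical sketch: a server-side helper that builds a VoiceXML 2.0 menu on
// each request, much as a servlet or JSP builds an HTML page. The class name,
// method names, and "/<option>.vxml" target URLs are illustrative only.
final class VxmlMenuBuilder {

    // Escape XML special characters so back-end data cannot break the markup.
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Builds a menu the caller can answer by voice ("sales") or keypad ("1").
    // Options are assumed to be single words usable in a URL path.
    static String menuPage(String company, List<String> options) {
        if (options.isEmpty() || options.size() > 9) {
            throw new IllegalArgumentException("need 1 to 9 options for keys 1-9");
        }
        StringBuilder v = new StringBuilder();
        v.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
         .append("<vxml version=\"2.0\" xmlns=\"http://www.w3.org/2001/vxml\">\n")
         .append("  <menu id=\"main\">\n")
         .append("    <prompt>Welcome to ").append(escapeXml(company))
         .append(". Would you like ").append(escapeXml(String.join(" or ", options)))
         .append("?</prompt>\n");
        for (int i = 0; i < options.size(); i++) {
            String opt = escapeXml(options.get(i));
            // The choice text is the spoken grammar; dtmf is the matching key.
            v.append("    <choice dtmf=\"").append(i + 1)
             .append("\" next=\"/").append(opt).append(".vxml\">")
             .append(opt).append("</choice>\n");
        }
        v.append("  </menu>\n</vxml>\n");
        return v.toString();
    }

    public static void main(String[] args) {
        System.out.print(menuPage("XYZ Company", List.of("sales", "service")));
    }
}
```

Because the page is generated on the server, the same menu can be rebuilt per caller from back-end data, which is the argument for running it on a J2EE app server rather than in a standalone box.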
InfoWorld: And how does a typical enterprise company benefit from instituting voice?
Haleftiras: The beauty for the enterprise is that it turns every wireline and cell phone into an enterprise access device. The benefit is that it extends the reach of enterprise services to a much broader audience: customers, partners, suppliers, and employees.
InfoWorld: Where do you go from here?
Haleftiras: Remember, HTML pages used to be static. Now, with the J2EE and .Net platforms, server-side processing ties into back-end systems and constructs the HTML page dynamically, based on data from those back-end systems or on information gained from the interaction with the user. What we have done is construct a framework that brings all that to the voice world. We can literally adapt the voice user interface based either on back-end events or on information gleaned from the conversation as it happens. As we learn more about the individual from these two sources, we can change the course of the conversation.
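A minimal Java sketch of that idea, under stated assumptions: on each turn, the next prompt is chosen from a back-end event and from facts learned earlier in the same call. The class, the map keys, and the prompt wording are invented for illustration and do not reflect how the magicTalk framework is actually built.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: each turn, the next prompt is chosen from two inputs,
// a back-end event and facts learned earlier in the same call. All names,
// keys, and prompt wording are invented for illustration.
final class AdaptiveDialog {

    // Facts gleaned during the conversation, e.g. from recognized utterances.
    private final Map<String, String> learned = new HashMap<>();

    void learn(String key, String value) {
        learned.put(key, value);
    }

    // backEndAlert stands in for an event from a back-end system, such as a
    // past-due payment; null means nothing is pending.
    String nextPrompt(String backEndAlert) {
        if (backEndAlert != null) {
            return address("before we continue, " + backEndAlert
                    + ". Would you like to handle that now?");
        }
        if ("billing".equals(learned.get("topic"))) {
            return address("would you like to hear your balance or make a payment?");
        }
        return address("would you like sales or service?");
    }

    // Uses the caller's name once we know it; otherwise capitalizes the prompt.
    private String address(String body) {
        String name = learned.get("name");
        return name == null
                ? Character.toUpperCase(body.charAt(0)) + body.substring(1)
                : name + ", " + body;
    }

    public static void main(String[] args) {
        AdaptiveDialog dialog = new AdaptiveDialog();
        System.out.println(dialog.nextPrompt(null));
        dialog.learn("name", "Pat");
        dialog.learn("topic", "billing");
        System.out.println(dialog.nextPrompt(null));
        System.out.println(dialog.nextPrompt("your payment is past due"));
    }
}
```

The back-end event is checked first here on the assumption that a pending account issue should interrupt the normal flow; a real system might weigh the two sources differently.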
InfoWorld: But will the enterprise have to invest even more money in infrastructure?
Haleftiras: This represents bringing voice to the Web infrastructure. Enterprises have already spent huge amounts of money building their Web application infrastructure, with racks of servers loaded with WebLogic and WebSphere supporting both customer-facing and internal applications. Now they can leverage not only that hardware and software but also their people, bringing a new dimension of ROI to the infrastructure they have invested in.
InfoWorld: So you can leverage the back end, but what does it bring to the front end?
Haleftiras: Even today, with the proliferation of Web and Internet technologies, 76 percent of stakeholder interactions with an enterprise occur over the phone. Even with the expected growth of the Internet and the Web, the phone will remain the predominant channel for the enterprise, and that is why enterprises need to pay close attention to this.
InfoWorld: Let’s get into some of the details. You say the magicTalk Enterprise Platform is architected on the J2EE platform. Can you tell the readers what that means to them?
Haleftiras: We provide a voice-specific application architecture. We’ve crafted everything from frameworks to best practices to libraries of reusable componentry that enable a Java developer to build voice applications that can be loaded onto a J2EE app server. J2EE wasn’t designed for this, so we provide the bridge between standards-based J2EE and voice apps.
InfoWorld: So a Java developer can add speech components. But Java developers aren’t speech technology experts. Don’t you still need a speech engineer to design a program that uses a voice response interface?
Haleftiras: The Java components take the form of Talklets, which are dialog components, and within them we provide the prompts and grammars, down to the smallest dialog component. If you want to build a credit card app, you need to get digits, so we provide a component called Get Digits. For example, we provide the number 1 in wave files with seven different intonations. Remember, a 1 is pronounced differently depending on where it falls in a number, at the beginning, middle, or end, and that knowledge is brought to the Java developer. He doesn’t need a Ph.D. in speech to know that. It is incorporated in the reusable componentry, and we are adding more tools to manipulate responses.
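The position-dependent pronunciation can be sketched in Java as a playlist builder that picks a recording for each digit. This is a hypothetical simplification, not the Talklets API: it models only three positions (beginning, middle, end) rather than the seven intonations mentioned, and the file-naming scheme is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of a "Get Digits"-style playback helper that picks a
// recording for each digit according to where it falls in the number. The
// interview mentions seven intonations for "1"; this sketch models only three
// positions, and the file-naming scheme is an assumption.
final class DigitPrompts {

    enum Position { INITIAL, MEDIAL, FINAL }

    // A lone digit is treated as FINAL, since it ends the utterance.
    static Position positionOf(int index, int length) {
        if (index == length - 1) return Position.FINAL;
        if (index == 0) return Position.INITIAL;
        return Position.MEDIAL;
    }

    // Maps "121" to [digit_1_initial.wav, digit_2_medial.wav, digit_1_final.wav].
    static List<String> playlist(String digits) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < digits.length(); i++) {
            char c = digits.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("not a digit: " + c);
            }
            String pos = positionOf(i, digits.length()).name().toLowerCase(Locale.ROOT);
            files.add("digit_" + c + "_" + pos + ".wav");
        }
        return files;
    }

    public static void main(String[] args) {
        System.out.println(playlist("4111"));
    }
}
```

Packaging this choice inside a reusable component is what lets a Java developer read back a card number without knowing the prosody rules behind it.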
InfoWorld: Is there a J2EE developer kit that works with Sun’s J2EE Developer?
Haleftiras: Those are integrated development environments, like Forte for Java. Our Java components can be introduced into those environments, leveraging the tools the Java and Web development communities already use.
InfoWorld: Will there be a Web services standard for adding speech applications to a service?
Haleftiras: The standards evolving today represent a new programming model, and the voice community can accommodate them today. We are already communicating with large Web services using SOAP [Simple Object Access Protocol].
InfoWorld: What do you see as the benefit of Web services?
Haleftiras: The idea is that you can take application functionality from multiple places, string it together, and build a virtual process that doesn’t exist in any one place. Now think about that Web service with voice communications built around it. This is where it is all going: as you start building inter-enterprise Web services applications, you wrap those Web services with a conversational voice interface.
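The composition idea can be sketched in Java with two stand-in services chained into one process and a voice layer that turns the result into a spoken reply. The in-memory maps replace what would be remote Web service calls (for example, over SOAP), and every name and value is hypothetical.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: two independent services chained into one "virtual
// process," then wrapped in a conversational reply. The in-memory maps stand
// in for remote Web service calls (e.g., over SOAP); all data is invented.
final class VoiceWrappedProcess {

    // Service 1 (e.g., a retailer): order ID -> shipment tracking number.
    static final Map<String, String> ORDER_SERVICE = Map.of("A100", "TRK-42");
    // Service 2 (e.g., a carrier): tracking number -> delivery day.
    static final Map<String, String> CARRIER_SERVICE = Map.of("TRK-42", "Thursday");

    static Optional<String> trackingNumberFor(String orderId) {
        return Optional.ofNullable(ORDER_SERVICE.get(orderId));
    }

    static Optional<String> deliveryDayFor(String trackingNumber) {
        return Optional.ofNullable(CARRIER_SERVICE.get(trackingNumber));
    }

    // The composed process exists in neither service on its own.
    static Optional<String> deliveryDayForOrder(String orderId) {
        return trackingNumberFor(orderId).flatMap(VoiceWrappedProcess::deliveryDayFor);
    }

    // Voice layer: turn the composed result into something a caller hears.
    static String spokenReply(String orderId) {
        return deliveryDayForOrder(orderId)
                .map(day -> "Your order will arrive on " + day + ".")
                .orElse("Sorry, I couldn't find a delivery date for that order.");
    }

    public static void main(String[] args) {
        System.out.println(spokenReply("A100"));
        System.out.println(spokenReply("B200"));
    }
}
```

Using `Optional` lets a failure in either service fall through to a single spoken fallback rather than a dead end in the call.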
InfoWorld: Companies are thinking now about their portal strategy and how they can add depth to portals by making them the integration point behind the UI for many applications. Where does magicTalk fit into that?
Haleftiras: The portal companies ought to be talking to us, Sun included, and the voice channel should be an addition to their existing channels. I believe it is imminent, but I can’t say more than that at this point. Voice should be strategic to any multichannel offering, and multiple channels are what portals are all about. With the magicTalk platform, it is now easy to incorporate voice into an enterprise portal offering.
InfoWorld: Call centers and customer service in general are highly expensive undertakings. GM uses live agents to respond when a customer pushes the blue OnStar button, and it is rumored they are losing money big time on OnStar and looking for more ways to automate it. Where does General Magic fit in?
Haleftiras: I would love to talk to you about OnStar, but I can’t. We host OnStar’s Virtual Advisor, and contractually we are precluded from talking about it. But the call center concept and reducing cost was the original value proposition for the IVR players. We can take that value proposition and make a quantum leap forward. If IVR offloaded some percentage of phone calls, we expect voice to offload a significantly higher percentage. The IVR players were able to handle fairly straightforward requests for information. Now, with the ability to handle more complex interactions, we can offload calls that previously required a human. For example, today we can turn a call about account renewals or payments into an automated transaction, where previously you had to get a human involved.
InfoWorld: So at the end of the day, is that the benefit of voice technology to the enterprise?
Haleftiras: Automating a larger percentage of call center calls is the low-hanging fruit, and it is where everyone starts and focuses. But the bigger picture is about productivity and business process efficiencies: turning a 60-second process into an 8-second process by using speed dial to access information. If you have a few hundred thousand employees, each with 10 interactions a day at 130 seconds per interaction, and you can save 100 seconds per call, per employee, per day, [those are] pretty big numbers. The beauty for the enterprise is that they can attack the call center in the near term, but the long-term payoff will be in other dimensions. [They can] get a short-term ROI and set themselves up for long-term ROI as they deploy more applications on the same platform.
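A back-of-the-envelope version of that arithmetic, assuming 200,000 employees as a stand-in for “a few hundred thousand” (the interview gives no exact headcount) and taking the 10 calls a day and 100 seconds saved per call at face value:

```latex
\underbrace{200{,}000}_{\text{employees (assumed)}}
\times \underbrace{10}_{\text{calls/day}}
\times \underbrace{100\,\text{s}}_{\text{saved per call}}
= 2 \times 10^{8}\,\text{s/day} \approx 55{,}600\,\text{hours/day}
```

Per employee, that is 10 × 100 s = 1,000 s, or roughly 17 minutes a day.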
InfoWorld: What keeps you up at night?
Haleftiras: Well, in addition to my technology duties, I’ve taken over the enterprise sales team. I am in the trenches, looking customers in the face, and that is unique for a CTO. We are seeing a lot of activity, but the enterprise sales cycle is a good nine to 12 months or more. GM [General Magic] is 120 people, so the dynamics of all that keep me up, but I’ve also learned that technology for the sake of technology doesn’t cut it.