Program multimedia with JMF, Part 1
Go multimedia by learning how the Java Media Framework compares to your stereo system
The Java Media Framework (JMF) is a large and versatile API used to process time-based media. Its complexity, however, can take weeks to master. With that in mind, this article introduces JMF the easy way. We start by drawing an analogy with a stereo system, then proceed with discussions of JMF’s most common interfaces and classes. Finally, we’ll see working code that demonstrates part of the API’s capabilities.
This article, the first of a two-part series, focuses on the JMF architecture. Part 2 will focus mostly on code, showing how to register capture devices and how to play and capture audio and video data.
JMF, currently at version 2.1, is Sun’s initiative to bring time-based media processing to Java. Time-based media is data that changes meaningfully with respect to time, such as audio and video clips, MIDI sequences, and animations. Note that Sun recently announced it would release the JMF 2.1 source code under the Sun Community Source Licensing Program (SCSL). As a complete reference implementation, JMF 2.1 enables you to do nearly anything imaginable with multimedia. Among other uses, JMF can:
- Play various multimedia files in a Java applet or application. The formats supported include AU, AVI, MIDI, MPEG, QuickTime, and WAV.
- Play streaming media from the Internet.
- Capture audio and video with your microphone and video camera, then store the data in a supported format.
- Process time-based media and change the content-type format.
- Transmit audio and video in realtime on the Internet.
- Broadcast live radio or television programs.
However, before you write a JMF application, you need to fully understand the JMF architecture, its interfaces, and its classes.
The JMF architecture
To easily understand the JMF architecture, take your stereo system as a comparison. When you play a Sarah Vaughan CD with your CD player, the CD provides the music data to the system. This data has been previously captured using microphones and other devices in the recording studio. The microphone serves as an audio capture device. The CD itself is a data source to the stereo system.
The CD player outputs the music signal to the speaker — the output device. However, we can also plug earphones into the CD player, in which case the earphones act as the output device.
JMF uses the same model. As you read on, you will come across terms such as:
- Data source
- Capture device
- Player
- Processor
- DataSink
- Format
- Manager
Let’s look at these terms in more detail.
Data source
A data source encapsulates the media stream much like a music CD. In JMF, a `DataSource` object represents audio media, video media, or a combination of the two. A `DataSource` can be a file or an incoming stream from the Internet. Once you determine its location or protocol, the `DataSource` encapsulates both the media location and the protocol and software used to deliver the media. Once created, a `DataSource` can be fed into a `Player` to be rendered, with the `Player` unconcerned about where the `DataSource` originated or what its original form was.

Media data can be obtained from various sources, such as local or network files, or live Internet broadcasts. As such, `DataSource`s can be classified according to how a data transfer initiates:
- Pull data source: The client initiates the data transfer and controls the data flow from the source. HTTP and FILE serve as examples of established protocols for this type of data.
- Push data source: The server initiates the data transfer and controls the data flow from a push data source. Push data source examples include broadcast media and video on demand.
As we will discuss in Part 2 of this series, several data sources can be combined into one. For example, if you are capturing a live scene, chances are you have two data sources: audio and video. In that situation, you might want to combine these two for easier control.
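As a minimal sketch of the pull case, a `DataSource` can be created from a `MediaLocator` via the `Manager` class. This assumes JMF is installed, and the file path is a placeholder — point it at any supported media file:

```java
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.protocol.DataSource;

public class DataSourceSketch {
    public static void main(String[] args) throws Exception {
        // A pull data source: the client (this program) drives the transfer.
        // The file path is a placeholder for any supported media file.
        MediaLocator locator = new MediaLocator("file:///tmp/sample.wav");

        // Manager picks the protocol handler (FILE, HTTP, RTP, ...) that
        // matches the locator and wraps it in a DataSource.
        DataSource source = Manager.createDataSource(locator);
        source.connect();                       // open the underlying stream
        System.out.println(source.getContentType());
        source.disconnect();
    }
}
```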
Capture device
A capture device represents the hardware you use to capture data, such as a microphone, a still camera, or a video camera. Captured media data can be fed into a `Player` to be rendered, processed to convert the data into another format, or stored for future use.
Capture devices can be categorized as either push or pull sources. With a pull source, the user controls when to capture an image. As an example, think of a still camera where a user clicks a button to take the shot. In contrast, a microphone acts as a push source because it continuously provides a stream of audio data.
Player
A `Player` takes as input a stream of audio or video data and renders it to a speaker or a screen, much like a CD player reads a CD and outputs music to the speaker. A `Player` can have states, which exist naturally because a `Player` has to prepare itself and its data source before it can start playing the media. To understand this, insert a CD into your stereo and play the fourth song on the CD. What happens? The CD player does not instantly play the song. It first has to find the track where the fourth song begins and make some other preparations. After about half a second (depending on your CD player), you start to hear the music. Likewise, the JMF `Player` must do some preparation before you can hear the audio or see the video. In normal operations, a `Player` steps through each state until it reaches the final state. JMF defines six states in a `Player`:
- Unrealized: In this state, the `Player` object has been instantiated. Like a newborn baby who does not yet recognize its environment, a newly instantiated `Player` does not yet know anything about its media.
- Realizing: A `Player` moves from the unrealized state to the realizing state when you call the `Player`'s `realize()` method. In the realizing state, the `Player` is in the process of determining its resource requirements. A realizing `Player` often downloads assets over the network.
- Realized: Transitioning from the realizing state, the `Player` comes into the realized state. In this state the `Player` knows what resources it needs and has information about the type of media it is to present. It can also provide visual components and controls, and its connections to other objects in the system are in place.
- Prefetching: When the `prefetch()` method is called, a `Player` moves from the realized state into the prefetching state. A prefetching `Player` is preparing to present its media. During this phase, the `Player` preloads its media data, obtains exclusive-use resources, and does whatever else is needed to play the media data.
- Prefetched: The state where the `Player` has finished prefetching media data — it’s ready to start.
- Started: This state is entered when you call the `start()` method. The `Player` is now ready to present the media data.
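These transitions happen asynchronously, and JMF reports them to your application as events. A minimal sketch (assuming JMF is installed; the media path is a placeholder) that walks a `Player` through its states and listens for the transitions might look like this:

```java
import javax.media.ControllerEvent;
import javax.media.ControllerListener;
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Player;
import javax.media.PrefetchCompleteEvent;
import javax.media.RealizeCompleteEvent;
import javax.media.StartEvent;

public class PlayerStates {
    public static void main(String[] args) throws Exception {
        // Unrealized: the Player exists but knows nothing about its media yet.
        Player player = Manager.createPlayer(new MediaLocator("file:///tmp/sample.wav"));

        // Each asynchronous state change is delivered as a ControllerEvent.
        player.addControllerListener(new ControllerListener() {
            public void controllerUpdate(ControllerEvent event) {
                if (event instanceof RealizeCompleteEvent) {
                    System.out.println("realized");   // resource needs determined
                } else if (event instanceof PrefetchCompleteEvent) {
                    System.out.println("prefetched"); // media data preloaded
                } else if (event instanceof StartEvent) {
                    System.out.println("started");    // presentation begins
                }
            }
        });

        // start() implicitly realizes and prefetches if necessary.
        player.start();
    }
}
```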
Processor
A `Processor` is a type of `Player`. In the JMF API, the `Processor` interface extends `Player`. As such, a `Processor` supports the same presentation controls as a `Player`. Unlike a `Player`, though, a `Processor` has control over what processing is performed on the input media stream.

In addition to rendering a data source, a `Processor` can also output media data through a `DataSource` so it can be presented by another `Player` or `Processor`, further processed by another `Processor`, or converted to some other format.

Besides the six aforementioned `Player` states, a `Processor` includes two additional states that occur after the unrealized state but before the `Processor` enters the realizing state:
- Configuring: A `Processor` enters the configuring state from the unrealized state when the `configure()` method is called. A `Processor` exists in the configuring state while it connects to the `DataSource`, demultiplexes the input stream, and accesses information about the format of the input data.
- Configured: From the configuring state, a `Processor` moves into the configured state when it is connected to the `DataSource` and the data format has been determined.

As with a `Player`, a `Processor` transitions to the realized state when the `realize()` method is called.
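The extra states can be sketched in code as follows. This is a simplified outline, assuming JMF is installed; the input file is a placeholder, and polling `getState()` stands in for the event listening that production code would use:

```java
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Processor;
import javax.media.format.FileTypeDescriptor;
import javax.media.protocol.DataSource;

public class ProcessorSketch {
    public static void main(String[] args) throws Exception {
        Processor processor =
                Manager.createProcessor(new MediaLocator("file:///tmp/sample.avi"));

        // Unrealized -> configuring -> configured: the Processor connects to
        // its DataSource and inspects the input tracks.
        processor.configure();
        while (processor.getState() < Processor.Configured) {
            Thread.sleep(50);   // crude wait; real code listens for events
        }

        // In the configured state we may choose the output content type ...
        processor.setContentDescriptor(new FileTypeDescriptor(FileTypeDescriptor.WAVE));

        // ... then continue through realized, as with a plain Player.
        processor.realize();
        while (processor.getState() < Processor.Realized) {
            Thread.sleep(50);
        }

        // The processed media is exposed as a new DataSource for further use.
        DataSource output = processor.getDataOutput();
        System.out.println(output.getContentType());
    }
}
```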
DataSink
The `DataSink` is a base interface for objects that read media content delivered by a `DataSource` and render the media to some destination. One example `DataSink` is a file-writer object that stores the media in a file.
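A file-writing `DataSink` can be sketched like this. The helper method and output path are hypothetical, and the `Processor` passed in is assumed to be already realized with a file-compatible output format:

```java
import javax.media.DataSink;
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Processor;
import javax.media.protocol.DataSource;

public class DataSinkSketch {
    // Writes a realized Processor's output to disk. The output locator
    // below is a placeholder path.
    static void writeToFile(Processor processor) throws Exception {
        DataSource output = processor.getDataOutput();

        // The DataSink reads from the DataSource and renders it to a
        // destination -- here, a file.
        DataSink sink = Manager.createDataSink(
                output, new MediaLocator("file:///tmp/captured.wav"));
        sink.open();        // connect to the destination
        sink.start();       // begin transferring media data
        processor.start();  // begin producing media data
    }
}
```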
Format
A `Format` object represents an object’s exact media format. The format itself carries no encoding-specific parameters or global-timing information; it describes the format’s encoding name and the type of data the format requires. `Format` subclasses include `AudioFormat` and `VideoFormat`. In turn, `VideoFormat` contains six direct subclasses:

- `H261Format`
- `H263Format`
- `IndexedColorFormat`
- `JPEGFormat`
- `RGBFormat`
- `YUVFormat`
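For example, a CD-quality audio format can be described with an `AudioFormat` instance (a small sketch, assuming JMF is on the classpath):

```java
import javax.media.format.AudioFormat;

public class FormatSketch {
    public static void main(String[] args) {
        // CD-quality linear audio: 44.1 kHz sample rate, 16-bit samples,
        // 2 channels (stereo).
        AudioFormat audio = new AudioFormat(AudioFormat.LINEAR, 44100, 16, 2);

        // The encoding name and the data requirements live on the Format itself.
        System.out.println(audio.getEncoding());
        System.out.println(audio.getSampleRate());
        System.out.println(audio.getChannels());
    }
}
```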
Manager
A manager, an intermediary object, integrates implementations of key interfaces that can be used seamlessly with existing classes. No real-world equivalent exists in the stereo system, but you can imagine a manager as a versatile object that can interface two different objects. For example, with `Manager` you can create a `Player` from a `DataSource`. JMF offers four managers:

- `Manager`: Use `Manager` to create `Player`s, `Processor`s, `DataSource`s, and `DataSink`s. For example, if you want to render a `DataSource`, you can use `Manager` to create a `Player` for it.
- `PackageManager`: This manager maintains a registry of packages that contain JMF classes, such as custom `Player`s, `Processor`s, `DataSource`s, and `DataSink`s.
- `CaptureDeviceManager`: This manager maintains a registry of available capture devices.
- `PlugInManager`: This manager maintains a registry of available JMF plug-in processing components.
Create a Player
With JMF multimedia programming, one of your most important tasks is to create a `Player`. You create a `Player` by calling the `Manager`'s `createPlayer()` method. The `Manager` uses the URL or `MediaLocator` of the media that you specify to create an appropriate `Player`. Once you have a `Player`, you can obtain the `Player` object's visual component — where the `Player` presents the visual representation of its media. You can then add this visual component to your application window or applet.
To display a `Player` object's visual component, you must:

- Obtain the visual component by calling the `getVisualComponent()` method
- Add the visual component to the application window or applet
A `Player` can also include a control panel with buttons to start, stop, and pause the media stream, as well as to control the volume, just like the similar buttons on your CD player.
Many of the `Player`'s methods can be called only when the `Player` is in the realized state. To guarantee that it is in this state, you can use the `Manager`'s `createRealizedPlayer()` method to create the `Player`. This method provides a convenient way to create and realize a `Player` in a single step. When it is called, it blocks until the `Player` is realized.
Further, `start()` can be invoked after a `Player` is created but before it reaches the prefetched state. `start()` attempts to transition the `Player` to the started state from whatever state it is currently in. For example, you can call the `start()` method immediately after a `Player` is instantiated. The `start()` method will then implicitly call all necessary methods to bring the `Player` into the started state.
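Putting these pieces together, a small AWT application that plays a media file could look like the sketch below. It assumes JMF is installed, and the media path is a placeholder:

```java
import java.awt.BorderLayout;
import java.awt.Component;
import java.awt.Frame;
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Player;

public class PlayerFrame {
    public static void main(String[] args) throws Exception {
        // createRealizedPlayer() blocks until the Player reaches the
        // realized state, so its components are immediately available.
        Player player = Manager.createRealizedPlayer(
                new MediaLocator("file:///tmp/movie.mpg"));

        Frame frame = new Frame("JMF Player");
        frame.setLayout(new BorderLayout());

        // The visual component shows the video; it is null for audio-only media.
        Component video = player.getVisualComponent();
        if (video != null) {
            frame.add(video, BorderLayout.CENTER);
        }

        // The control panel supplies start/stop/pause and volume controls.
        Component controls = player.getControlPanelComponent();
        if (controls != null) {
            frame.add(controls, BorderLayout.SOUTH);
        }

        frame.pack();
        frame.setVisible(true);
        player.start();   // transition the rest of the way to started
    }
}
```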
Capture media data
Media capture is another important task in JMF programming. You can capture media data using a capture device such as a microphone or a video camera. It can then be processed and rendered, or stored in a media format. To capture media data, you need to:
- Locate the capture device you want to use by querying the `CaptureDeviceManager`
- Obtain a `CaptureDeviceInfo` object for the device
- Get a `MediaLocator` from the `CaptureDeviceInfo` object and use it to create a `DataSource`
- Create either a `Player` or a `Processor` using the `DataSource`
- Start the `Player` or `Processor` to begin the capture process
You use the `CaptureDeviceManager` to access capture devices available on the system. This manager acts as the central registry for all capture devices available to JMF. You can obtain a list of available devices by calling the `getDeviceList()` method. A capture device is represented by a `CaptureDeviceInfo` object. You use the `CaptureDeviceManager`'s `getDevice()` method to get the `CaptureDeviceInfo` for a particular capture device.

To use the capture device to capture media data, you then need to get the device's `MediaLocator` from its `CaptureDeviceInfo` object. You can either use this `MediaLocator` to construct a `Player` or a `Processor` directly, or use the `MediaLocator` to construct a `DataSource` that you can use as the input to a `Player` or `Processor`. Use the `Player`'s or `Processor`'s `start()` method to initiate the capture process.
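The steps above can be sketched for an audio device as follows. This assumes JMF is installed and at least one audio capture device has been registered (for example, with the JMFRegistry tool):

```java
import java.util.Vector;
import javax.media.CaptureDeviceInfo;
import javax.media.CaptureDeviceManager;
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Player;
import javax.media.format.AudioFormat;

public class CaptureSketch {
    public static void main(String[] args) throws Exception {
        // 1. Query the registry for devices that can deliver linear audio.
        Vector devices = CaptureDeviceManager.getDeviceList(
                new AudioFormat(AudioFormat.LINEAR));
        if (devices.isEmpty()) {
            System.out.println("no audio capture device registered");
            return;
        }

        // 2. Take the first matching device's CaptureDeviceInfo.
        CaptureDeviceInfo device = (CaptureDeviceInfo) devices.firstElement();

        // 3. Its MediaLocator identifies the device to the rest of JMF.
        MediaLocator locator = device.getLocator();

        // 4-5. Build a Player directly from the locator and start capturing.
        Player player = Manager.createRealizedPlayer(locator);
        player.start();
    }
}
```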
Realtime multimedia processing
With JMF you can also send or receive a live media broadcast, such as live radio and television broadcasts, or realtime teleconferences over the Internet or an intranet.
One characteristic of realtime media transport differs from accessing static data: the protocol does not guarantee that every packet will arrive safely, so the application must cope with lost data and keep delays small. When working with realtime media streams, you play the media data without waiting for the complete stream to download. Likewise, transmitting across the Internet in realtime requires enough network bandwidth that the recipient can play the media data continuously.
Realtime traffic needs its own protocol to transfer packets of realtime media streams. The Internet Engineering Task Force (IETF) has devised the Real-Time Transport Protocol (RTP), which suits applications that transmit realtime data such as audio, video, or simulation data over multicast or unicast network services. RTP is network- and transport-protocol independent and is often used over the User Datagram Protocol (UDP).
There’s no guarantee that RTP data packets will arrive in the order in which they were sent. In fact, there’s no guarantee they will arrive at all. It’s up to the receiver to reconstruct the sender’s packet sequence and detect lost packets using the information provided in the packet header. RTP itself does not address resource reservation and does not guarantee quality-of-service for realtime services. Instead, it is augmented by the Real-Time Transport Control Protocol (RTCP), which allows data-delivery monitoring in a manner scalable to large multicast networks.
Applications that use RTP can be categorized into RTP servers (applications that need to send data over the network) and RTP clients (those that need to receive data from the network). However, some applications, such as teleconferencing, establish RTP sessions to capture and transmit, as well as receive data.
JMF provides the APIs defined in the `javax.media.rtp`, `javax.media.rtp.event`, and `javax.media.rtp.rtcp` packages for RTP stream playback and transmission. The JMF RTP APIs work seamlessly with JMF's capture devices, players, processors, and processing capabilities. In addition to the four managers described above, another manager coordinates an RTP session: `SessionManager`. It keeps track of the session participants and the streams being transmitted, handles the RTCP control channel, and supports RTCP for both senders and receivers.
Transmit RTP media streams
There are two ways to transmit RTP streams:
- The simplest way to transmit RTP data is to use a `MediaLocator` that describes the RTP session parameters to construct an RTP `DataSink`, by calling the `Manager` object's `createDataSink()` method.
- You can also use the `SessionManager` to create send streams for the content and control the transmission. After you retrieve the output `DataSource` from a `Processor`, you call the `SessionManager`'s `createSendStream()` and `startSession()` methods.
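The first, simpler option can be sketched as follows. The multicast address, port, and session parameters are placeholders, and the `Processor` passed in is assumed to be realized with an RTP-compatible output format:

```java
import javax.media.DataSink;
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Processor;

public class RtpTransmitSketch {
    // Sends a realized Processor's output to an RTP session.
    static void transmit(Processor processor) throws Exception {
        // RTP locator syntax: rtp://<address>:<port>/<content type>/<ttl>
        MediaLocator session = new MediaLocator("rtp://224.1.1.1:22224/audio/1");

        // An RTP-aware DataSink pushes the packets onto the network.
        DataSink sink = Manager.createDataSink(processor.getDataOutput(), session);
        sink.open();
        sink.start();
        processor.start();
    }
}
```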
Receive and play RTP media streams
Receiving and playing RTP media streams normally come in one package. Just as with static media data, you create a `Player` using the `Manager` object's `createPlayer()` method, passing a `MediaLocator` as the argument. Alternatively, you can construct a `Player` object by retrieving the `DataSource` from the stream and passing it as the argument to the `Manager`'s `createPlayer()` method. Which option you use depends on what is available: if you know the `MediaLocator`, use the first option; if you have the data source, use the second. Either way, the `DataSource` comes from the `SessionManager` that receives the realtime media from the network. A separate player is used for each stream the session manager receives.
If you use a `MediaLocator` to construct the `Player`, the `MediaLocator` carries the parameters of the RTP session, and you can present only the first RTP stream detected in the session. If you want to play back multiple RTP streams in a session, you need to use the `SessionManager` and construct a `Player` for each `ReceiveStream`.
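The `MediaLocator` option is the simpler of the two and can be sketched as below. The session address and port are placeholders and must match the transmitting side:

```java
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Player;

public class RtpReceiveSketch {
    public static void main(String[] args) throws Exception {
        // The locator names the RTP session; only the first stream
        // detected in the session will be presented.
        MediaLocator session = new MediaLocator("rtp://224.1.1.1:22224/audio");

        // createPlayer() blocks here until data arrives on the session.
        Player player = Manager.createPlayer(session);
        player.start();
    }
}
```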
Conclusion
In this article, you have learned how similar JMF is to your stereo system. You have been introduced to JMF's `Player`, `CaptureDevice`, `DataSink`, `Manager`, and more. We also examined RTP for live media broadcasts. Due to the vastness of the API, however, it is impossible to include everything here, so you'll be well served by examining the "JMF API Guide" and "JMF 2.0 API Specification," both from java.sun.com (see Resources).
In Part 2 we will look at JMF-based code — the fun part of the series. You will learn, for example, how to play music and movie files in your applet or application with a few simple lines of code. Until then, rock on.