Program multimedia with JMF, Part 1
Go multimedia by learning how the Java Media Framework compares to your stereo system
The Java Media Framework (JMF) is a large and versatile API used to process time-based media. Its complexity, however, can take weeks to master. With that in mind, this article introduces JMF the easy way. We start by drawing an analogy with a stereo system, then proceed with discussions of JMF’s most common interfaces and classes. Finally, we’ll see working code that demonstrates part of the API’s capabilities.
This article, the first of a two-part series, focuses on the JMF architecture. Part 2 will focus mostly on code, showing how to register capture devices and how to play and capture audio and video data.
JMF, currently at version 2.1, is Sun’s initiative to bring time-based media processing to Java. Time-based media is data that changes meaningfully with respect to time, such as audio and video clips, MIDI sequences, and animations. Note that Sun recently announced it would release the JMF 2.1 source code under the Sun Community Source Licensing Program (SCSL). As a complete reference implementation, JMF 2.1 enables you to do nearly anything imaginable with multimedia. Among other uses, JMF can:
- Play various multimedia files in a Java applet or application. The formats supported include AU, AVI, MIDI, MPEG, QuickTime, and WAV.
- Play streaming media from the Internet.
- Capture audio and video with your microphone and video camera, then store the data in a supported format.
- Process time-based media and change the content-type format.
- Transmit audio and video in realtime on the Internet.
- Broadcast live radio or television programs.
However, before you write a JMF application, you need to fully understand the JMF architecture, its interfaces, and its classes.
The JMF architecture
To easily understand the JMF architecture, take your stereo system as a comparison. When you play a Sarah Vaughan CD with your CD player, the CD provides the music data to the system. This data has been previously captured using microphones and other devices in the recording studio. The microphone serves as an audio capture device. The CD itself is a data source to the stereo system.
The CD player outputs the music signal to the speaker — the output device. However, we can also plug earphones into the CD player, in which case the earphones act as the output device.
JMF uses the same model. As you read on, you will come across terms such as:
- Data source
- Capture device
- Player
- Processor
- DataSink
- Format
- Manager
Let’s look at these terms in more detail.
Data source
A data source encapsulates the media stream much like a music CD. In JMF, a `DataSource` object represents audio media, video media, or a combination of the two. A `DataSource` can be a file or an incoming stream from the Internet. Once you determine its location or protocol, the `DataSource` encapsulates both the media location and the protocol and software used to deliver the media. Once created, a `DataSource` can be fed into a `Player` to be rendered, with the `Player` unconcerned about where the `DataSource` originated or what its original form was.

Media data can be obtained from various sources, such as local or network files, or live Internet broadcasts. As such, `DataSource`s can be classified according to how a data transfer initiates:
- Pull data source: The client initiates the data transfer and controls the data flow from the source. HTTP and FILE serve as examples of established protocols for this type of data.
- Push data source: The server initiates the data transfer and controls the data flow from a push data source. Push data source examples include broadcast media and video on demand.
As we will discuss in Part 2 of this series, several data sources can be combined into one. For example, if you are capturing a live scene, chances are you have two data sources: audio and video. In that situation, you might want to combine these two for easier control.
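As a minimal sketch of the pull case, a `DataSource` can be created from a `MediaLocator` via the `Manager` class. This assumes JMF is installed, and the file path is a placeholder — point it at any supported media file:

```java
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.protocol.DataSource;

public class DataSourceSketch {
    public static void main(String[] args) throws Exception {
        // A pull data source: the client (this program) drives the transfer.
        // The file path is a placeholder for any supported media file.
        MediaLocator locator = new MediaLocator("file:///tmp/sample.wav");

        // Manager picks the protocol handler (FILE, HTTP, RTP, ...) that
        // matches the locator and wraps it in a DataSource.
        DataSource source = Manager.createDataSource(locator);
        source.connect();                       // open the underlying stream
        System.out.println(source.getContentType());
        source.disconnect();
    }
}
```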
Capture device
A capture device represents the hardware you use to capture data, such as a microphone, a still camera, or a video camera. Captured media data can be fed into a `Player` to be rendered, processed to convert the data into another format, or stored for future use.
Capture devices can be categorized as either push or pull sources. With a pull source, the user controls when to capture an image. As an example, think of a still camera where a user clicks a button to take the shot. In contrast, a microphone acts as a push source because it continuously provides a stream of audio data.
Player
A `Player` takes as input a stream of audio or video data and renders it to a speaker or a screen, much like a CD player reads a CD and outputs music to the speaker. A `Player` can have states, which exist naturally because a `Player` has to prepare itself and its data source before it can start playing the media. To understand this, insert a CD into your stereo and play the fourth song on the CD. What happens? The CD player does not instantly play the song. It first has to find the track where the fourth song begins and make some other preparations. After about half a second (depending on your CD player), you start to hear the music. Likewise, the JMF `Player` must do some preparation before you can hear the audio or see the video. In normal operations, a `Player` steps through each state until it reaches the final state. JMF defines six states in a `Player`:
- Unrealized: In this state, the `Player` object has been instantiated. Like a newborn baby who does not yet recognize its environment, a newly instantiated `Player` does not yet know anything about its media.
- Realizing: A `Player` moves from the unrealized state to the realizing state when you call the `Player`'s `realize()` method. In the realizing state, the `Player` is in the process of determining its resource requirements. A realizing `Player` often downloads assets over the network.
- Realized: Transitioning from the realizing state, the `Player` comes into the realized state. In this state the `Player` knows what resources it needs and has information about the type of media it is to present. It can also provide visual components and controls, and its connections to other objects in the system are in place.
- Prefetching: When the `prefetch()` method is called, a `Player` moves from the realized state into the prefetching state. A prefetching `Player` is preparing to present its media. During this phase, the `Player` preloads its media data, obtains exclusive-use resources, and does whatever else is needed to play the media data.
- Prefetched: The state where the `Player` has finished prefetching media data — it’s ready to start.
- Started: This state is entered when you call the `start()` method. The `Player` is now ready to present the media data.
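These transitions happen asynchronously, and JMF reports them to your application as events. A minimal sketch (assuming JMF is installed; the media path is a placeholder) that walks a `Player` through its states and listens for the transitions might look like this:

```java
import javax.media.ControllerEvent;
import javax.media.ControllerListener;
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Player;
import javax.media.PrefetchCompleteEvent;
import javax.media.RealizeCompleteEvent;
import javax.media.StartEvent;

public class PlayerStates {
    public static void main(String[] args) throws Exception {
        // Unrealized: the Player exists but knows nothing about its media yet.
        Player player = Manager.createPlayer(new MediaLocator("file:///tmp/sample.wav"));

        // Each asynchronous state change is delivered as a ControllerEvent.
        player.addControllerListener(new ControllerListener() {
            public void controllerUpdate(ControllerEvent event) {
                if (event instanceof RealizeCompleteEvent) {
                    System.out.println("realized");   // resource needs determined
                } else if (event instanceof PrefetchCompleteEvent) {
                    System.out.println("prefetched"); // media data preloaded
                } else if (event instanceof StartEvent) {
                    System.out.println("started");    // presentation begins
                }
            }
        });

        // start() implicitly realizes and prefetches if necessary.
        player.start();
    }
}
```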
Processor
A `Processor` is a type of `Player`. In the JMF API, the `Processor` interface extends `Player`. As such, a `Processor` supports the same presentation controls as a `Player`. Unlike a `Player`, though, a `Processor` has control over what processing is performed on the input media stream.

In addition to rendering a data source, a `Processor` can also output media data through a `DataSource` so it can be presented by another `Player` or `Processor`, further processed by another `Processor`, or converted to some other format.

Besides the six aforementioned `Player` states, a `Processor` includes two additional states that occur after the unrealized state but before the `Processor` enters the realizing state:
- Configuring: A `Processor` enters the configuring state from the unrealized state when the `configure()` method is called. A `Processor` exists in the configuring state while it connects to the `DataSource`, demultiplexes the input stream, and accesses information about the format of the input data.
- Configured: From the configuring state, a `Processor` moves into the configured state when it is connected to the `DataSource` and the data format has been determined.

As with a `Player`, a `Processor` transitions to the realized state when the `realize()` method is called.
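The extra states can be sketched in code as follows. This is a simplified outline, assuming JMF is installed; the input file is a placeholder, and polling `getState()` stands in for the event listening that production code would use:

```java
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Processor;
import javax.media.format.FileTypeDescriptor;
import javax.media.protocol.DataSource;

public class ProcessorSketch {
    public static void main(String[] args) throws Exception {
        Processor processor =
                Manager.createProcessor(new MediaLocator("file:///tmp/sample.avi"));

        // Unrealized -> configuring -> configured: the Processor connects to
        // its DataSource and inspects the input tracks.
        processor.configure();
        while (processor.getState() < Processor.Configured) {
            Thread.sleep(50);   // crude wait; real code listens for events
        }

        // In the configured state we may choose the output content type ...
        processor.setContentDescriptor(new FileTypeDescriptor(FileTypeDescriptor.WAVE));

        // ... then continue through realized, as with a plain Player.
        processor.realize();
        while (processor.getState() < Processor.Realized) {
            Thread.sleep(50);
        }

        // The processed media is exposed as a new DataSource for further use.
        DataSource output = processor.getDataOutput();
        System.out.println(output.getContentType());
    }
}
```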
DataSink
The `DataSink` is a base interface for objects that read media content delivered by a `DataSource` and render the media to some destination. One example `DataSink` is a file-writer object that stores the media in a file.
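A file-writing `DataSink` can be sketched like this. The helper method and output path are hypothetical, and the `Processor` passed in is assumed to be already realized with a file-compatible output format:

```java
import javax.media.DataSink;
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Processor;
import javax.media.protocol.DataSource;

public class DataSinkSketch {
    // Writes a realized Processor's output to disk. The output locator
    // below is a placeholder path.
    static void writeToFile(Processor processor) throws Exception {
        DataSource output = processor.getDataOutput();

        // The DataSink reads from the DataSource and renders it to a
        // destination -- here, a file.
        DataSink sink = Manager.createDataSink(
                output, new MediaLocator("file:///tmp/captured.wav"));
        sink.open();        // connect to the destination
        sink.start();       // begin transferring media data
        processor.start();  // begin producing media data
    }
}
```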
Format
A `Format` object represents an object’s exact media format. The format itself carries no encoding-specific parameters or global-timing information; it describes the format’s encoding name and the type of data the format requires. `Format` subclasses include `AudioFormat` and `VideoFormat`. In turn, `VideoFormat` contains six direct subclasses:

- `H261Format`
- `H263Format`
- `IndexedColorFormat`
- `JPEGFormat`
- `RGBFormat`
- `YUVFormat`
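For example, a CD-quality audio format can be described with an `AudioFormat` instance (a small sketch, assuming JMF is on the classpath):

```java
import javax.media.format.AudioFormat;

public class FormatSketch {
    public static void main(String[] args) {
        // CD-quality linear audio: 44.1 kHz sample rate, 16-bit samples,
        // 2 channels (stereo).
        AudioFormat audio = new AudioFormat(AudioFormat.LINEAR, 44100, 16, 2);

        // The encoding name and the data requirements live on the Format itself.
        System.out.println(audio.getEncoding());
        System.out.println(audio.getSampleRate());
        System.out.println(audio.getChannels());
    }
}
```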
Manager
A manager, an intermediary object, integrates implementations of key interfaces that can be used seamlessly with existing classes. No real-world equivalent exists in the stereo system, but you can imagine a manager as a versatile object that can interface two different objects. For example, with `Manager` you can create a `Player` from a `DataSource`. JMF offers four managers:

- `Manager`: Use `Manager` to create `Player`s, `Processor`s, `DataSource`s, and `DataSink`s. For example, if you want to render a `DataSource`, you can use `Manager` to create a `Player` for it.
- `PackageManager`: This manager maintains a registry of packages that contain JMF classes, such as custom `Player`s, `Processor`s, `DataSource`s, and `DataSink`s.
- `CaptureDeviceManager`: This manager maintains a registry of available capture devices.
- `PlugInManager`: This manager maintains a registry of available JMF plug-in processing components.
Create a Player
With JMF multimedia programming, one of your most important tasks is to create a `Player`. You create a `Player` by calling the `Manager`'s `createPlayer()` method. The `Manager` uses the URL or `MediaLocator` of the media that you specify to create an appropriate `Player`. Once you have a `Player`, you can obtain the `Player` object's visual component — where the `Player` presents the visual representation of its media. You can then add this visual component to your application window or applet.
To display a `Player` object's visual component, you must:

- Obtain the visual component by calling the `getVisualComponent()` method
- Add the visual component to the application window or applet
A `Player` can also include a control panel with buttons to start, stop, and pause the media stream, as well as to control the volume, just like the similar buttons on your CD player.
Many of the `Player`'s methods can be called only when the `Player` is in the realized state. To guarantee that it is in this state, you can use the `Manager`'s `createRealizedPlayer()` method to create the `Player`. This method provides a convenient way to create and realize a `Player` in a single step. When it is called, it blocks until the `Player` is realized.
Further, `start()` can be invoked after a `Player` is created but before it reaches the prefetched state. `start()` attempts to transition the `Player` to the started state from whatever state it is currently in. For example, you can call the `start()` method immediately after a `Player` is instantiated. The `start()` method will then implicitly call all necessary methods to bring the `Player` into the started state.
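Putting these pieces together, a small AWT application that plays a media file could look like the sketch below. It assumes JMF is installed, and the media path is a placeholder:

```java
import java.awt.BorderLayout;
import java.awt.Component;
import java.awt.Frame;
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Player;

public class PlayerFrame {
    public static void main(String[] args) throws Exception {
        // createRealizedPlayer() blocks until the Player reaches the
        // realized state, so its components are immediately available.
        Player player = Manager.createRealizedPlayer(
                new MediaLocator("file:///tmp/movie.mpg"));

        Frame frame = new Frame("JMF Player");
        frame.setLayout(new BorderLayout());

        // The visual component shows the video; it is null for audio-only media.
        Component video = player.getVisualComponent();
        if (video != null) {
            frame.add(video, BorderLayout.CENTER);
        }

        // The control panel supplies start/stop/pause and volume controls.
        Component controls = player.getControlPanelComponent();
        if (controls != null) {
            frame.add(controls, BorderLayout.SOUTH);
        }

        frame.pack();
        frame.setVisible(true);
        player.start();   // transition the rest of the way to started
    }
}
```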
Capture media data
Media capture is another important task in JMF programming. You can capture media data using a capture device such as a microphone or a video camera. It can then be processed and rendered, or stored in a media format. To capture media data, you need to:
- Locate the capture device you want to use by querying the `CaptureDeviceManager`
- Obtain a `CaptureDeviceInfo` object for the device
- Get a `MediaLocator` from the `CaptureDeviceInfo` object and use it to create a `DataSource`
- Create either a `Player` or a `Processor` using the `DataSource`
- Start the `Player` or `Processor` to begin the capture process
You use the `CaptureDeviceManager` to access capture devices available on the system. This manager acts as the central registry for all capture devices available to JMF. You can obtain a list of available devices by calling the `getDeviceList()` method. A capture device is represented by a `CaptureDeviceInfo` object. You use the `CaptureDeviceManager`'s `getDevice()` method to get the `CaptureDeviceInfo` for a particular capture device.

To use the capture device to capture media data, you then need to get the device's `MediaLocator` from its `CaptureDeviceInfo` object. You can either use this `MediaLocator` to construct a `Player` or a `Processor` directly, or use the `MediaLocator` to construct a `DataSource` that you can use as the input to a `Player` or `Processor`. Use the `Player`'s or `Processor`'s `start()` method to initiate the capture process.
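The steps above can be sketched for an audio device as follows. This assumes JMF is installed and at least one audio capture device has been registered (for example, with the JMFRegistry tool):

```java
import java.util.Vector;
import javax.media.CaptureDeviceInfo;
import javax.media.CaptureDeviceManager;
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Player;
import javax.media.format.AudioFormat;

public class CaptureSketch {
    public static void main(String[] args) throws Exception {
        // 1. Query the registry for devices that can deliver linear audio.
        Vector devices = CaptureDeviceManager.getDeviceList(
                new AudioFormat(AudioFormat.LINEAR));
        if (devices.isEmpty()) {
            System.out.println("no audio capture device registered");
            return;
        }

        // 2. Take the first matching device's CaptureDeviceInfo.
        CaptureDeviceInfo device = (CaptureDeviceInfo) devices.firstElement();

        // 3. Its MediaLocator identifies the device to the rest of JMF.
        MediaLocator locator = device.getLocator();

        // 4-5. Build a Player directly from the locator and start capturing.
        Player player = Manager.createRealizedPlayer(locator);
        player.start();
    }
}
```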
Realtime multimedia processing
With JMF you can also send or receive a live media broadcast, such as live radio and television broadcasts, or realtime teleconferences over the Internet or an intranet.
One characteristic of realtime media transport differs from accessing static data: the protocol does not guarantee that every packet will arrive safely, so the application must cope with lost data and keep delays small. When working with realtime media streams, you play the media data without waiting for the complete stream to download. Likewise, transmitting across the Internet in realtime requires enough network bandwidth that the recipient can play the media data continuously.
Realtime traffic needs its own protocol to transfer packets of realtime media streams. The Internet Engineering Task Force (IETF) has devised the Real-Time Transport Protocol (RTP), which suits applications that transmit realtime data such as audio, video, or simulation data over multicast or unicast network services. RTP is network- and transport-protocol independent and is often used over the User Datagram Protocol (UDP).
There’s no guarantee that RTP data packets will arrive in the order in which they were sent. In fact, there’s no guarantee they will arrive at all. It’s up to the receiver to reconstruct the sender’s packet sequence and detect lost packets using the information provided in the packet header. RTP itself does not address resource reservation and does not guarantee quality-of-service for realtime services. Instead, it is augmented by the Real-Time Transport Control Protocol (RTCP), which allows data-delivery monitoring in a manner scalable to large multicast networks.
Applications that use RTP can be categorized into RTP servers (applications that need to send data over the network) and RTP clients (those that need to receive data from the network). However, some applications, such as teleconferencing, establish RTP sessions to capture and transmit, as well as receive data.
JMF provides the APIs defined in the `javax.media.rtp`, `javax.media.rtp.event`, and `javax.media.rtp.rtcp` packages for RTP stream playback and transmission. The JMF RTP APIs work seamlessly with JMF's capture devices, players, processors, and processing capabilities. In addition to the four managers described above, another manager coordinates an RTP session: `SessionManager`. It keeps track of the session participants and the streams being transmitted, handles the RTCP control channel, and supports RTCP for both senders and receivers.
Transmit RTP media streams
There are two ways to transmit RTP streams:
- The simplest way to transmit RTP data is to use a `MediaLocator` that describes the RTP session parameters to construct an RTP `DataSink`, by calling the `Manager` object's `createDataSink()` method.
- You can also use the `SessionManager` to create send streams for the content and control the transmission. After you retrieve the output `DataSource` from a `Processor`, you call the `SessionManager`'s `createSendStream()` and `startSession()` methods.
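The first, simpler option can be sketched as follows. The multicast address, port, and session parameters are placeholders, and the `Processor` passed in is assumed to be realized with an RTP-compatible output format:

```java
import javax.media.DataSink;
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Processor;

public class RtpTransmitSketch {
    // Sends a realized Processor's output to an RTP session.
    static void transmit(Processor processor) throws Exception {
        // RTP locator syntax: rtp://<address>:<port>/<content type>/<ttl>
        MediaLocator session = new MediaLocator("rtp://224.1.1.1:22224/audio/1");

        // An RTP-aware DataSink pushes the packets onto the network.
        DataSink sink = Manager.createDataSink(processor.getDataOutput(), session);
        sink.open();
        sink.start();
        processor.start();
    }
}
```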
Receive and play RTP media streams
Receiving and playing RTP media streams normally come in one package. Just as with static media data, you create a `Player` using the `Manager` object's `createPlayer()` method, passing a `MediaLocator` as the argument. Alternatively, you can construct a `Player` object by retrieving the `DataSource` from the stream and passing it as the argument to the `Manager`'s `createPlayer()` method. Which option you use depends on what is available: if you know the `MediaLocator`, use the first option; if you have the data source, use the second. Either way, the `DataSource` comes from the `SessionManager` that receives the realtime media from the network. A separate player is used for each stream the session manager receives.
If you use a `MediaLocator` to construct the `Player`, the `MediaLocator` carries the parameters of the RTP session, and you can present only the first RTP stream detected in the session. If you want to play back multiple RTP streams in a session, you need to use the `SessionManager` and construct a `Player` for each `ReceiveStream`.
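The `MediaLocator` option is the simpler of the two and can be sketched as below. The session address and port are placeholders and must match the transmitting side:

```java
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Player;

public class RtpReceiveSketch {
    public static void main(String[] args) throws Exception {
        // The locator names the RTP session; only the first stream
        // detected in the session will be presented.
        MediaLocator session = new MediaLocator("rtp://224.1.1.1:22224/audio");

        // createPlayer() blocks here until data arrives on the session.
        Player player = Manager.createPlayer(session);
        player.start();
    }
}
```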
Conclusion
In this article, you have learned how similar JMF is to your stereo system. You have been introduced to JMF's `Player`, `CaptureDevice`, `DataSink`, `Manager`, and more. We also examined RTP for live media broadcasts. Due to the vastness of the API, however, it is impossible to include everything here, so you'll be well served by examining the "JMF API Guide" and "JMF 2.0 API Specification," both from java.sun.com (see Resources).
In Part 2 we will look at JMF-based code — the fun part of the series. You will learn, for example, how to play music and movie files in your applet or application with a few simple lines of code. Until then, rock on.