Last modified: Mon Sep 8 1997

Multimedia server taxonomy and evaluation

by Tobias Öbrink , Department of Teleinformatics, KTH

A presentation of activities in WP7 for the MERCI meeting at INRIA Sophia Antipolis, April 1997.


Table of Contents


1 Motivation

Digital audio, video and computer-supported cooperation tools are hyped. VoD and Multimedia servers are sprouting up everywhere like weeds. There are several presentation languages and distributed programming languages waiting for streaming material to display in flashy shows on ITV, networked kiosks, home computers, you name it. Every platform vendor has presented its own high-performance solution, and a lot of heavyweight players such as Microsoft, Netscape and Progressive Networks have launched their own products. The IETF and ITU-T work hard to keep up with the current avalanche-like development and try to cooperate with each other, as well as with the commercial vendors, to produce working standards.

During the last year I have been collecting information about research projects and commercial development efforts, as well as the standardisation efforts related to networked digital multimedia in general and MoD in particular. I have a lot of information, but no common structure in which to present it. When I look at all these announcements of new products and all the new research papers on Multimedia Server systems, I wish I had some classification scheme to use when comparing the different solutions: some general taxonomy that is not tied to a specific product or research prototype.

2 Workplan

We will develop a taxonomy of Multimedia Server-related elements to help in comparing different Multimedia Server solutions with regard to

In parallel to this effort, I will continue to collect information about, and test, Multimedia servers to produce a survey of existing solutions using the above taxonomy. It will be an iterative process where the two parts complement each other.

The survey will of course become outdated relatively soon, but my hope is that the taxonomy will have a more lasting value.

3 Current status

3.1 Taxonomy

A preliminary draft of a taxonomy can be found in Appendix A. Here's a short summary:

3.2 Multimedia server survey

Appendix B contains a lot of information, but so far it is in a quite unstructured state with a lot of dangling loose ends. Here's a listing of the Multimedia servers included so far:

1 Terminology

1.1 Media type

A digital encoding of some information, either digitized or computer-generated. The Media Types are

1.2 Stream

A "stream" of packets containing data encoded in one media type.

1.3 Participant

A collection of streams originating from and operated by a "participant".

1.4 Source

The host from which a specific Stream originates. There can be several Participants sending from the same Source.

1.5 Session

One or more Participants participating in a conference, lecture or call.

2 Standards

2.1 Coding and Compression

Some Multimedia servers have limitations as to the encoding formats of the Streams they play.

2.2 Session Control and Signalling

RTSP - Replay session control and signalling, either interleaved with the playback stream or carried separately.
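
For illustration only, starting a replay in RTSP style could look like the exchange sketched below; the server name, presentation URL, ports and session identifier are invented, and the header set is trimmed down.

    C->S: SETUP rtsp://server.example.com/archive/lecture1 RTSP/1.0
          CSeq: 1
          Transport: RTP/AVP;unicast;client_port=4588-4589

    S->C: RTSP/1.0 200 OK
          CSeq: 1
          Session: 12345678
          Transport: RTP/AVP;unicast;client_port=4588-4589;server_port=6256-6257

    C->S: PLAY rtsp://server.example.com/archive/lecture1 RTSP/1.0
          CSeq: 2
          Session: 12345678
          Range: npt=0-

    S->C: RTSP/1.0 200 OK
          CSeq: 2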

2.2.1 Invitation of a Media Server to a conference

SIP - Either for recording or Indirect playback

2.2.2 Direct Playback initiation

2.3 Transport

Playback Streams by IP multicast and unicast, both reliable and unreliable: RTP/RTCP, UDP, TCP, SRM, etc.

Session Description by HTTP, FTP, SIP, SAP or some other way.

2.4 User Interface

Many Multimedia servers use a Tcl/Tk-based GUI, others use a WWW-based interface.

2.5 Interoperability support

IP, RTP, H.32x - standards for ensuring interoperation across different networks, platforms and codecs.

2.6 Security

Insecure network, end-points, proxies and mirrors.

3 Functionality

3.1 Main field of application

Lecture-on-demand, news-on-demand, movie-on-demand, conference recorder, network-based multimedia presentation.

3.2 Extensibility

APIs and modules for extending the server's Functionality, Interoperability and Security.

3.3 Playback

3.3.1 Playback types

Direct playback - A new session is created by the Multimedia-server for playback.

Indirect playback - The Multimedia-server joins an already existing conference.

3.3.2 Functions

Search and Browse recorded Sessions, Participants, Streams.

Browse active session playbacks.

Start, kill, set speed, direction, jump to bookmark.

Visual Fast forward and Rewind by creating new indexes

Floor control for user interaction.

Explicitly invite a list of participants

Set a minimal playback session scope (TTL)

Replicate original sender information in the playback session.

3.3.3 Playback timing

3.4 Recording

3.4.1 User Functions

Browse active sessions, participants and streams, and "point-and-click" those you want to record (using RTCP or SDP, SAP, SIP, or the IP header).

Delayed recording. Instruct the Multimedia server to start recording at a certain time.

3.4.2 RTP Functions

Use RTP sequence numbers to reconstruct incoming streams by
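
As a sketch of my own (not the method of any particular server), reconstruction by sequence number could look like this, assuming plain RTP packets where the 16-bit sequence number sits in bytes 2-3 of the fixed header:

    import struct

    def rtp_seq(packet):
        # The 16-bit sequence number is bytes 2-3 of the fixed RTP header.
        return struct.unpack("!H", packet[2:4])[0]

    def reconstruct(packets):
        # Drop duplicates and reorder by sequence number.
        # (A real recorder must also unwrap the 16-bit counter when it
        # wraps from 65535 back to 0; omitted here for brevity.)
        unique = {}
        for p in packets:
            unique.setdefault(rtp_seq(p), p)
        return [unique[s] for s in sorted(unique)]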

3.5 Editing

3.5.1 Functions

Edit recorded material.

Filter/merge and other post-processing of Sessions, Participants, Streams and at sub-packet level (block, sample, object).

Orchestration (creating "artificial" time-stamps) using the "atomic" time steps of the level 0 index, or creating new Streams based on processing at sub-packet level.

Analysis tools to help editing and statistics.

Editing quality and other aspects of media coding.

Eliminate initial silence in Streams, Participants, Sessions.

Create layered encodings from non-layered originals.

3.6 Admission control

The Multimedia server should have some admission control scheme for access to the server.

The User interaction channel should be secured.

4 System design

Client/Server with powerful, heavy servers with lots of resources, minimal client programs and playback tools.

Distributed Multimedia server system with lots of small distributed servers using a common frontend or a lot of frontends using a common communication channel.

4.1 Client

4.1.1 Supported Playback tools

Which applications can be used to play back the replayed streams.

4.1.2 User interaction client

Handles all direct user interaction with the Multimedia server

4.2 Server

4.2.1 Multicast address allocation

For Direct playback to a multicast address we need to assign one or more unique multicast address/port pairs. Such a mechanism is supported by

4.2.2 Inviting Participants

The Playback initiator optionally supplies a list of Participants to receive the Playback. These can be invited using the Session Invitation Protocol provided they have tools handling this protocol.

In case no support for SIP can be relied upon, a Session Description Protocol packet can be generated and sent to the Playback initiator. It is then up to the Playback initiator to contact the other Participants.
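
For illustration, such a generated Session Description Protocol packet could look like the sketch below; the addresses, ports, TTL and session name are invented (payload type 0 is PCM audio, 31 is H.261 video):

    v=0
    o=mserver 2890844526 1 IN IP4 130.237.0.1
    s=Playback of recorded lecture
    c=IN IP4 224.2.170.78/15
    t=0 0
    m=audio 49170 RTP/AVP 0
    m=video 51372 RTP/AVP 31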

4.2.3 Setting scope of Playback

Using the list of Participants given by the Playback initiator, a minimal scope (TTL) for the Playback session can be computed (e.g. by using traceroute).

Using a Session Description Protocol packet given by the Playback initiator, the same TTL can be used for the Playback session.

Optionally a TTL can be given by the Playback initiator.
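
As a rough sketch of the traceroute alternative (my own illustration, assuming a Unix traceroute in the path and counting one printed line per hop, which is only approximately right on some systems); the margin of 5 extra hops is an arbitrary choice:

    import subprocess

    def hops_to(host):
        # Count the lines that traceroute prints; roughly one line per hop.
        out = subprocess.check_output(["traceroute", "-n", host])
        return len(out.decode().strip().splitlines())

    def minimal_ttl(participants, margin=5):
        # TTL large enough to reach the most distant Participant.
        return max(hops_to(h) for h in participants) + margin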

4.2.4 Multiplexing and Demultiplexing

of Streams, Participants, Sources, and Sessions, using unique sender IDs in RTCP sender reports or UDP/IP address/port pairs and payload-specific information.
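
A small sketch of my own: incoming packets on one port can be sorted per sender transport address together with the 32-bit SSRC identifier carried in bytes 8-11 of each RTP packet:

    import socket
    import struct

    def demultiplex(port, npackets=1000):
        # Collect packets per (source address, source port, SSRC).
        # (Joining a multicast group is omitted for brevity.)
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("", port))
        streams = {}
        for _ in range(npackets):
            packet, (addr, sport) = sock.recvfrom(2048)
            if len(packet) < 12:
                continue                      # too short to be RTP
            ssrc = struct.unpack("!I", packet[8:12])[0]
            streams.setdefault((addr, sport, ssrc), []).append(packet)
        return streams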

4.2.5 Replicating original sender information

Sender information exists in the vat protocol and in RTP. This information, or parts of it, can be played back to create a more or less genuine replay of the original Session, also providing awareness-related and other information.

4.2.6 Reliable Multicast support

To be able to correctly record a Session including tools that use reliable multicast, the Multimedia server must participate actively, e.g. request retransmission of lost packets. This means that the Multimedia server must support the protocols used by the applications.

4.2.7 A Single point of contact

A single entrypoint for Playback, Recording, Editing, Admission control.

A common interface for all functionality (except playback?), preferably graphical and WWW-based.

4.3 Communication

4.3.1 Multicasted User Interaction

High interactivity reduces the scalability gains.

How to manage multicast addresses to take advantage of the fact that a few users are looking at the same clip.

The scalability gain appears close to the receivers/sender.

Handling multicast addresses is (so far) more computationally expensive than handling ordinary unicast addresses.

In the case where we have a "moderator" and a bunch of "listeners", multicast delivery is still of use.

4.3.2 Reliable Multicast

Realtime-reliability tradeoffs based on retransmission are "unbalanced" in a 1:N system.

Realtime-reliability tradeoffs based on buffering at the receiver side seem to help a bit.

A lower-level (link or network layer) reliability guarantee (QoS) seems to be the best alternative.

4.3.3 Limit scope of a Multicasted Session

Use TTL-scope. The Playback initiator controls the size of the potential audience by setting TTL to a suitable value.

Use encryption (which will affect real-time performance).

4.3.4 Invitation of a Media Server to a conference

Either for recording or Indirect playback. Uses SIP.

4.3.5 Heterogeneous networks

The Internet is the target; therefore support for layered coding and transmission is needed.

4.4 Storage strategy

4.4.1 Hierarchical storage

Store data by

and use indexes for arbitrary access to streams
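
As an illustration of the index idea (my own sketch, not taken from any of the surveyed servers), a level 0 index can simply note the arrival time and file offset of every packet written, so that a later jump or fast forward can seek directly into the file:

    import time

    def record(packets, datafile):
        # Append packets to a flat file and build a (time, offset, length) index.
        index = []
        start = time.time()
        with open(datafile, "wb") as f:
            for packet in packets:
                index.append((time.time() - start, f.tell(), len(packet)))
                f.write(packet)
        return index

    def seek(index, seconds):
        # File offset of the first packet recorded at or after `seconds`.
        for t, offset, length in index:
            if t >= seconds:
                return offset
        return None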

4.4.2 Data stored on Streams

4.4.3 Data stored on Participant

4.4.4 Data stored on Session

4.4.5 Indexing

4.4.6 Database Management

Different solutions are better for different media types.

4.5 Hardware solution

Whether the Multimedia server depends on certain hardware configuration(s).

5 Performance

5.1 Network related

5.2 Media related

5.3 Hardware related

5.4 User friendliness

Consumers are mostly humans; therefore we get the following constraints:

With these constraints in mind we see that a good Multimedia server should behave as follows (see also the sketch after the list):

  1. Have a relatively long initial delay before replay when data is buffered at the receiver
  2. Prioritize the replay of media at the receiver in the following order: Video, Audio, Data
  3. Prioritize the transfer and reception of media in the following order: Interactions, Audio, Data, Video.
  4. Buffer a little Audio, and much Data before replay at the receiver.
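
A minimal sketch of point 1 on the receiver side, assuming each packet arrives together with a media timestamp in seconds; the 2 second initial delay is only a placeholder:

    import time

    def playout(receive_queue, play, initial_delay=2.0):
        # Delay playback by `initial_delay` seconds so that a buffer builds up,
        # then release each packet according to its original timestamp.
        first_ts = None
        start = None
        for ts, packet in receive_queue:      # (media timestamp in seconds, data)
            if first_ts is None:
                first_ts, start = ts, time.time() + initial_delay
            due = start + (ts - first_ts)
            wait = due - time.time()
            if wait > 0:
                time.sleep(wait)
            play(packet)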

1 SGI WebFORCE and MediaBase

Integrated Web-media servers that deliver interactive, real-time, high-quality MPEG1 and MPEG2 video and audio streams to Web clients via IP and ATM networks.

28.8 Kbps to 8 Mbps bandwidth for Internet and intranet applications. Cosmo MediaBase 2.0 will run on the Origin(TM) systems. http://www.sgi.com/Products/WebFORCE/Products/Mediabase/

2 XMovie project, University of Mannheim

see http://www.informatik.uni-mannheim.de/~keller/XMovie/XMovie.html

A CMClient (Continuous Media client) in the Xserver machine using MTP (Movie Transmission Protocol) on TCP and UDP to communicate with a Video server that is located somewhere else. The X11 protocol is extended to handle continuous media streams. The CMClient uses a shared memory buffer and the extended X11 to communicate with the Xserver. The CMClient uses the AF (Audio File) protocol and the extended X11 to communicate with the AudioFile server that also resides in the Xserver machine. The playback application uses the MCAM (Movie Control, Access and Management) protocol to communicate with the CMClient.

The Xserver doesn't have to reside in the same machine as the playback application, and the extended X11 needs less bandwidth for CM. This is a good solution for viewing video streams on dumb Xterminals (no disk, very little memory).

Of course, neither the local network nor the Xserver machine can cope with more than 2-3 CMClients at a time and a few more player applications (watching the same stream), because all decoding has to be done in the CMClient and the Xserver has to mediate the full video frames to the playback application. And the AudioFile server?

But can this extended X11 support sharing of X applications?

3 Voyager project, Argonne National Laboratory, Illinois

The Voyager system has a distributed client-server architecture, where the server side consists of a set of cgi-scripts for session browsing, recording, and playback; a pair of low-level daemons for record and playback; a Perl-based architecture for gluing the layers together (using nPerl for inter-daemon communication (1)); and a relational database to act as a central repository of information for the system as a whole. The design goal is to serve large numbers of high-bandwidth streams (nominally 5 Mbps or more) in both record and playback mode.

We use the IBM TigerShark multimedia filesystem on the server machine, in order to do striping across multiple disk server nodes in the SP2 we're using as a server.

On the client side we just use the mbone tools -- vic/vat or the Precept tools.

We demonstrated an earlier version of Voyager at Supercomputing '95 -- see http://www.iway.org/video/index.html for a little info; information on the current version is available at http://voyager.mcs.anl.gov/Voyager/.

(1) Nexus/Perl -- see http://www.mcs.anl.gov/nexus/nperl/

4 MBone-VCR on demand Service, University of Mannheim

A while ago I implemented one of the recording and playback applications on the MBone, the MBone-VCR, but unfortunately it is a little outdated since I didn't have enough time to look after it after I left ICSI (where I developed it). Anyhow, some ideas were still in my mind, and together with two students in Mannheim we could start a new project a couple of weeks ago that we called "MBone-VCR on demand Service".

Well, we just started, so there are lots of things we have to investigate further, but since John brought the issue up, I thought this might be a good point to throw our ideas in the pot as well...

A brief overview of our project goals so far:

There are a lot more things to mention but I think you get the idea and we have enough to talk and discuss about. We are at the very beginning of the project and would be happy to have lively discussion and new ideas that we may include in the architecture.

Contact:

Wieland Holfelder

University of Mannheim, Praktische Informatik IV

L 15, 16, 68131 Mannheim, Germany

Email: whd@pi4.informatik.uni-mannheim.de

Phone: +49-621-292-3300

Fax: +49-621-292-5745

http://www.informatik.uni-mannheim.de/~whd

5 Oracle Video Client/Server

This is a short summary from the Oracle Video Server manual. We haven't been able to verify the information in the manual yet because of a lack of funds and personnel to do the installation and run the tests.

5.1 Oracle Video Client

Platform:

PC/Windows 3.11
API:

16-bit OLE API, e.g. for Visual Basic 4.0.
Format:

MPEG-1
Comment:

Oracle Media Net uses UDP/IP, so let's implement a new client.

5.2 Oracle Video Server

Platform:

Sun, HP/Unix
API:

System Services API, C
Formats:

MPEG-1 implemented
Supports storing of Quicktime, MPEG-1 & 2, TrueMotion, DVI, JPEG, etc.
Can be expanded through System Services API, C
Features:

Multi-user,
Striping & RAID Disk access
Comments:

I like the disk handling. The API is bearable.
Performance not affected by disk fragmentation.

5.3 Oracle Media Net

Overview:

API:

Media Net RPC, Media Net XDR, Semaphores, Media Net Transport Layer API, System Services API, C
Protocol:

UDP/IP
Possible to expand to e.g. RTP/UDP/IP.
Features:

Oracle Media Net Logger Process
Oracle Media Net Name Server (Servers and clients have their own unique name space built on top of the underlying transport layer (UDP))
Oracle Media Net Address Server (Maps Media Net addresses to corresponding
physical addresses (UDP == physical address?))
Oracle Media Net Process Server (Typical portmap-process)
Connection Service (???)

5.4 Interactive Applications Objects SDK Object Reference

A typical convenience-function library in C. I cannot see any objects, though.

5.5 Other comments and questions

1) Total Bandwidth demand?

bw_demand = #streams * average_bitrate * 1.2
(the factor 1.2 accounts for administrative loss and junk overhead; see the sketch after these notes)
2) FDDI network interface

30 streams @ 2 Mbps
40 streams @ 1.5 Mbps
3) Max output from Sparc center 1000

4 FDDI cards:
120 streams @ 2 Mbps => 240 Mbps
bw_demand = 240 * 1.2 = 288 Mbps
Needs approx. 256 MB RAM?
4) Media clips and all that

Is supported.
5) Videopump

Max # video streams = 20 (MPEG version).
This is the max # video streams from Oracle Video Server, which I think is an
implementation of Oracle Media Server. We can probably write our own media pump
programs for the Media Server that can encode/transmit formats other than
MPEG over UDP.
6) RTP MPEG-1 & 2 payloads defined

7) RTP JPEG payloads defined

JPEG also supported in the System Services API of the Media Server
according to the documentation.
8) RTP over UDP/IP

Because RTP has

Can use the System Services API to encapsulate any format in RTP, but how to handle RTCP messaging?
9) Do students have to run their own Mrouted at home to be able to receive?

10) How many ISDN BRI or Modem 28.8 connections on 240 Mbps?

11) Media Net RPC

Is it implemented or not?

12) Oracle Media Server usability

Oracle Video Server and Oracle Media Net.
Can be used on a LAN for a few streams, but we have to modify it to other formats
and bitrates.
For Internet use and ISDN/Modem users we need a lot of modifications.
Current Server user- and programmer interface is horrible.
Disk- and request handling in the server is good.
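
As a quick check of the bandwidth-demand formula in point 1 above (a throwaway sketch; the 1.2 overhead factor and the 120 streams at 2 Mbps are taken from the notes):

    def bw_demand(streams, avg_bitrate_mbps, overhead=1.2):
        # Total bandwidth demand including administrative loss / junk overhead.
        return streams * avg_bitrate_mbps * overhead

    print(bw_demand(120, 2.0))   # 288.0 Mbps, as in point 3 above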

5.6 Sun MediaCenter

KTH/Teleinformatics has got a Sun MediaCenter 20 for evaluation. This is another video server platform, equipped with three 100 Mbps Ethernet interfaces and the ability to stream MPEG over UDP/IP. The client runs on Sun/Solaris and PC.

http://www.sun.com/products-n-solutions/hw/servers/smc_external.html

5.7 The RealMedia Model, Progressive Networks

We have six elements in our current model: Servers, Splitters, License Servers, System Managers, Players, and Encoders. These elements form a tree hierarchy for any given stream. Each stream has an identifier (URL) which identifies the server originating it and the object in that server's namespace (the RealAudio file). Both live and on-demand streams (the former originating from live encoders and the latter originating from files) appear in a single hierarchical namespace maintained by the server.

The protocol used, RTSP, is currently under standardisation in the IETF MMUSIC WG.

See also http://www.real.com/products/server/index.html for a list of Streaming Media Servers.

5.7.1 Progressive Networks RealPlayer

Stereo audio at 28.8 kbps, near-CD quality at higher bitrates, AM-quality audio at 14.4 kbps. Newscast-quality video at 28.8 kbps and full motion at higher bitrates.

Player currently available for Windows95/NT and Macintosh PowerPC. In the future also available for

Linux 1.2.x with an OSS (Voxware) 3.0 sound driver

Linux 2.0.x with an OSS 3.5.x sound driver on a Pentium machine

Solaris 2.5

Solaris 2.4

SunOS 4.1.x

Irix 5.3

FreeBSD 2.0

See http://www.real.com/sysreq.html for detailed Player system requirements.

5.7.2 RealVideo Encoder 1.0 beta 1

is currently available for Windows 95/NT. RealVideo Encoder plug-in for Adobe Premiere is available for Macintosh Power PC, and comes with the Windows encoder.

See http://www.real.com/products/encoder/realvideo/sysreq.html for detailed RealVideo Encoder system requirements.

5.7.3 RealAudio Encoder 3.0

is currently available for Windows 95/NT, Macintosh PowerPC, Linux, HP/UX, AIX, FreeBSD, Irix and Digital Unix platforms. RealAudio Encoder 3.0 beta is currently available for Solaris 2.5. RealAudio Encoder 2.0 is still available for users with a Macintosh 68040 with FPU, and for users with SunOS 4.1. The RealAudio Xtras for SoundEdit 16 are currently available only with RealAudio Encoder 2.0.

See http://www.real.com/products/encoder/realaudio/sysreq.html for detailed RealAudio Encoder system requirements.

5.8 StreamWorks, Xing Technologies

You first need a StreamWorks Encoder/Transmitter. It is a PC box we ship to your address by DHL, and it comes either in a PC box form or a rack-mount configuration. It enables you to make your source streams from standard AV inputs (RCA, etc.). A full audio/video system costs $6,500. It is fully PAL compatible. The Audio and Video Transmitter is the one you would probably use for your audio and video broadcasts.

In order to send streams across the Internet, LANs or WANs, one needs our StreamWorks Server, which is software that you download from our site and that sits on your web server machine, UNIX box, or whatever you are currently using. Our products handle everything from 8.5 kbps (AM-quality mono audio) all the way up to 112 kbps (CD-quality audio, or 30 frames per second of MPEG-1 video (VHS quality) with AM-type audio on top of that).

5.9 VXtreme client/server

A PC application based on a proprietary codec running over modem speeds (14.4 kbps, 28.8 kbps).

5.10 The multicast Media-On-Demand/mMOD system, Luleå Tekniska Universitet

http://ctrl.cdt.luth.se/~peppar/progs/mMOD/

MBone audio, video, whiteboard, NetText, mWeb or any other UDP multicast or unicast session can be recorded and played back in two modes: raw UDP or parsed RTP. In parsed RTP recording it checks for duplicates and out-of-order packets and rearranges packets based on sequence numbers. All packets get a timestamp when recorded.

The mMOD system consists of distributed stand-alone VCR-programs and distributed Web controller-interfaces.

Messages are sent on a special multicast/port-pair or TCP/port.

Multicasted messages:

alive_message = All active VCR-programs respond with their unique ID, session name, owner, playing/recording state, media and control-port number (see the sketch below).
Multicasted or direct messages:

stop, pause, continue, skip, rewind, jump
No group control of a multicasted session is possible. Only the owner controls a session.
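
As an illustration of how such a multicast control query could be implemented (my own sketch; the group address, port and message syntax are invented and do not reflect mMOD's actual wire format):

    import socket

    def query_vcrs(group="224.2.2.2", port=9876, timeout=2.0):
        # Send an alive query to the control group and collect the answers
        # that active VCR-programs send back within the timeout.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 15)
        sock.settimeout(timeout)
        sock.sendto(b"alive", (group, port))
        answers = []
        try:
            while True:
                data, sender = sock.recvfrom(1024)
                answers.append((sender, data))
        except socket.timeout:
            pass
        return answers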

The mMOD is completely written in Java v1.0.2 and is available for SUN/Solaris and Windows95/NT4.0.

5.10.1 The VCR-program

is a stand-alone program that also includes a minimal HTTP server listening on the control port. This enables 1) a Java client that connects directly to the VCR, and 2) use of the mWeb slide-show presentation feature.

5.10.2 The Web-controller

uses the MIME type application/x-sdp and SDP files for launching appropriate helper applications on the client side. The Web-controller has CGI-based and Java user interfaces.

5.11 VivoActive from Vivo Software

Windows 95 or NT, PowerMac. H.263 video compression and G.723 audio compression. Once the original .AVI file is converted to a .VIV file, the Web site developer simply uses the HTML <EMBED> command to embed the VIVO file into the Web page, the same way he or she would embed a .GIF or .JPG file. Free VivoActive Players are available for a variety of browsers and platforms (plug-ins for Netscape Navigator on the Macintosh, Windows 95, Windows NT and Windows 3.1, and an ActiveX control for Microsoft Internet Explorer on Windows 95 or NT). The video is streamed from the Web server via HTTP over TCP, just like the rest of the page.

5.12 IP/TV from Precept Software

http://hydra.precept.com/products/ip-tv.htm

IP/TV uses RTP over standard IP Multicast to deliver full-screen, full-motion video to desktop PCs over private IP routed networks and the Internet. IP/TV has three elements:

5.13 NetShow 2.0 from Microsoft

Microsoft NetShow 2.0 provides an easy, powerful way to stream multimedia across the Internet and intranets. It is a complete, high-performance system for broadcasting live and on-demand audio, video and mixed multimedia. NetShow technology is built on open industry standards. NetShow supports the ASF file format, which is capable of using media in MOV, AVI, WAV, BMP, DIB, JPEG and other standard formats. NetShow also supports the H.263 video standard, as well as transport and protocol standards such as UDP/IP, TCP/IP, HTTP, RTP and IP multicast. NetShow 2.0 beta software is available now without charge from the Microsoft Web site at http://www.microsoft.com/netshow/

5.14 Video Conference Recorder (VCR) from UCL

Stuart Clayman, A Video Conference Recorder - Design and implementation, Department of Computer Science, University College London, December 1995.

Records and plays UDP, TCP and IP packets. Telnet-based user interface.

5.15 VDOLive from VDONet Corporation

No data yet

5.16 Vosaic

No data yet


Maintained by Tobias Öbrink