Coding a library for the creation of audio and video stream files for the internet using open source projects.
Kristaly, Dominic Mircea; Sisak, Francisc; Bujdei, Catalin et al.
1. INTRODUCTION
E-learning platforms developed for the internet today must include
modern tools for communication between the teacher/tutor and the
students.
The usual communication channels on such a platform are internal
messaging, e-mail, whiteboard, chat, blog and forum.
The communication can be:
1) one-way (only one actor can speak, the others just listen)--this
type of communication is used when presenting a course in the
traditional manner; it is useful for transmitting new information in a
very short time (e-courses, whiteboard, blogs);
2) two-way (a usual dialog in which both actors can send
messages)--used in chatrooms, forums and e-mail; this type of
communication can be used successfully to clear up aspects that are
more difficult to understand and/or when the actors cannot meet in
person.
The latter category can be split into:
1) asynchronous communication: the period between sending the
message and receiving the answer is longer than in a usual,
face-to-face conversation. The sender posts a message now and has to
check later for the response. This type of communication is useful when
the schedules of the actors cannot be synchronized or when the response
requires thorough documentation. Some of the tools that implement this
type of communication are e-mail, forums and blogs;
2) synchronous communication: the conversation progresses in a
ping-pong manner; the sender writes a message and the receiver answers
immediately. This type of communication is, in most cases, more
productive than the asynchronous one, but both actors must be available
at the same time. The tool that fits this type of communication is the
chat.
With the development of Web 2.0 technologies, the users'
expectations grew: new tools are required to offer both tutors and
students exciting new means of communication, in order to improve the
teaching/learning process. A few such tools are:
* wikis--a collection of webpages and a mechanism that allows the
user to contribute;
* whiteboard--the tutor/teacher can explain by drawing on a virtual
blackboard;
* virtual classroom--a tool for audio-video conferencing with an
integrated chatroom (Kristaly et al., 2008).
In teaching information technology related courses, for example,
the teacher must instruct the students how to use different software
applications, which is not very easy to do without showing the
teacher's screen. For this, an application sharing tool is
invaluable. This type of application usually runs on a one-way
communication channel, from the teacher to one or more students.
The application sharing tool considered in this paper allows the
teacher to share with the students the image of a software application
running on the teacher's machine, by means of video streaming. The
application was developed for the e-learning platform of a European
Leonardo da Vinci project--VET TREND. The teacher starts a server
program and selects the process to be broadcast. The clients connect to
the server from an ActiveX component that is embedded into a webpage of
the e-learning platform.
The problem with this particular application is that it does not
offer the possibility to record the activity of the teacher for later
viewing. A new module must be developed that allows the activities to
be recorded in a multimedia format that is playable on the internet.
2. ASSESSING THE CHOICES
The first decision to be taken is the file format in which the
image and sound will be stored. The chosen format must allow streaming
over the internet and must be easy to implement in the new module.
There are four formats that come to mind when streaming is
involved: Microsoft's Windows Media Video (WMV), Real Media Video
(RM, RAM), Apple's Quicktime (MOV), Adobe's Flash Video (FLV).
One very important aspect is keeping the total cost of the module
as low as possible. This can be achieved by using open source projects
and by minimizing the changes to the existing code of the sharing
application.
The considered e-learning platform includes a virtual classroom
developed using Adobe Flash technologies, able to play external Flash
files, so it is natural to choose the FLV format for the recording.
The video encoding format chosen is Screen video, which uses zlib
compression for the frames. For the audio data, the MP3 format was
chosen for its high compression ratio.
FLV files are transferred over RTMP connections with the Adobe
Flash Media Server[TM]. RTMP, or Real Time Messaging Protocol, is a
proprietary protocol developed by Adobe Systems for streaming audio,
video and data over the internet between a Flash player and a server. A
Flash video player published for Flash Player 7 or above can also play
FLV files directly, served with the MIME type video/x-flv (libflv,
2008).
The FLV format is also a good choice because it is platform
independent: all the data is stored in big-endian byte order. In
addition, there are many free and open-source players for Flash movies,
for example JW FLV Media Player for embedding an FLV stream into a
webpage, or FLV Player as a stand-alone player. An FLV file stores
synchronized audio and video as one single interleaved audio-video
stream.
The second decision is related to the programming language that
will be used to create this module. The sharing application is written
in Visual C++ 6.0, as a COM component; the FLV library was developed in
Visual C# 2008 as a DLL.
3. LIBRARY'S ARCHITECTURE
Figure 1 illustrates the architecture of the library, as well as
the flow of data.
[FIGURE 1 OMITTED]
The sound is captured using the WaveIn API and fed into the Audio
Tag Generator, where it is converted into MP3 format by the LAME
Encoder. The output of the generator consists of FLV audio tags.
The frames captured by the sharing application are prepared by the
video adapter for the Video Tag Generator, which produces FLV video
tags. The blocks in the Screen Video Packets are compressed with zlib
(Lame, 2008).
The AVMUX creates the FLV stream and synchronizes the sound with
the image. A Stream recorder writes the generated FLV stream to a file
on a persistent storage device.
4. IMPLEMENTATION DETAILS
4.1 The FLV file format
The FLV file format is very simple, as shown in the following
figure (Adobe Systems, 2008).
Fig. 2. The FLV file format.
FLV Header
FLV Body:
  Previous Tag Size #0--UI32--Always 0
  FLV Tag #1 (Audio, Video or Script Data Object)
  Previous Tag Size #1--UI32--Size of Tag #1
  ...
  FLV Tag #N (Audio, Video or Script Data Object)
  Previous Tag Size #N--UI32--Size of Tag #N
The FLV tags are sections of the file that contain the audio, video
or script information. These tags also contain the timestamp data that
controls the way the file will be played.
The video data is segmented and encapsulated in FLV Video tags.
The audio data is likewise segmented and encapsulated in FLV Audio
tags. After each tag, a 32-bit unsigned integer stores the size of that
tag (Adobe Systems, 2008).
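The layout above can be sketched in a few lines of Python. This is an
illustration, not the library's C# code; the field widths (a 9-byte
header, an 11-byte tag header with 24-bit size and timestamp fields)
follow the Adobe video file format specification, and the function
names are hypothetical.

```python
import struct

TAG_AUDIO, TAG_VIDEO, TAG_SCRIPT = 8, 9, 18  # tag type codes


def flv_header(has_audio: bool = True, has_video: bool = True) -> bytes:
    """Return the FLV header followed by Previous Tag Size #0."""
    flags = (0x04 if has_audio else 0) | (0x01 if has_video else 0)
    # Signature "FLV", version 1, flags, header length (9) as big-endian UI32.
    header = b"FLV" + struct.pack(">BBI", 1, flags, 9)
    return header + struct.pack(">I", 0)  # Previous Tag Size #0 is always 0


def flv_tag(tag_type: int, timestamp_ms: int, payload: bytes) -> bytes:
    """Return one FLV tag followed by its Previous Tag Size field."""
    if len(payload) > 0xFFFFFF:
        raise ValueError("payload does not fit in a 24-bit size field")
    tag = (
        bytes([tag_type])
        + len(payload).to_bytes(3, "big")                # data size, UI24
        + (timestamp_ms & 0xFFFFFF).to_bytes(3, "big")   # timestamp, low 24 bits
        + bytes([(timestamp_ms >> 24) & 0xFF])           # timestamp, high 8 bits
        + b"\x00\x00\x00"                                # stream ID, always 0
        + payload
    )
    # The trailing UI32 counts the 11-byte tag header plus the payload.
    return tag + struct.pack(">I", len(tag))
```

All multi-byte fields are written in big-endian order, which is what
makes the resulting file independent of the platform that produced it.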
One video tag contains information about one frame:
* Frame type: specifies whether the frame is a keyframe (the frame
is seekable and is stored in its entirety), an inter frame (only the
changes from the previous frame are stored, not the whole frame), a
disposable inter frame, a generated keyframe or a video info/command
frame.
* Codec used: Sorenson H.263, Screen video, On2 VP6 or AVC.
* Video frame payload, different for each codec type.
For the developed library, the Screen video codec is used. It is
quite easy to implement and it is fast (Munoz, 2008).
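As an illustration of how the frame type and the codec are stored, the
sketch below packs both into the first byte of the video tag payload
(frame type in the upper four bits, codec in the lower four). The
numeric codes are those of the Adobe video file format specification;
the Python names are hypothetical and the code is not taken from the
library.

```python
FRAME_TYPES = {
    1: "keyframe",
    2: "inter frame",
    3: "disposable inter frame",
    4: "generated keyframe",
    5: "video info/command frame",
}
CODECS = {2: "Sorenson H.263", 3: "Screen video", 4: "On2 VP6", 7: "AVC"}


def pack_video_header(frame_type: int, codec_id: int) -> int:
    """Combine frame type and codec ID into the first payload byte."""
    if frame_type not in FRAME_TYPES or codec_id not in CODECS:
        raise ValueError("unknown frame type or codec")
    return (frame_type << 4) | codec_id


def unpack_video_header(byte: int) -> tuple:
    """Split the first payload byte back into (frame type, codec ID)."""
    return byte >> 4, byte & 0x0F
```

With these codes, a Screen video keyframe starts with the byte 0x13 and
a Screen video inter frame with 0x23.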
An audio tag defines what type of audio data is stored in the
attached sound data: sound format (PCM, MP3, Nellymoser, G.711, AAC or
device-specific sound), sound rate (5.5 kHz, 11 kHz, 22 kHz or 44 kHz),
sound size (8 bit or 16 bit), sound type (mono or stereo) and sound
data (the payload in the format specified by the sound format). For the
FLV library, the audio data is compressed using MP3; in this way, the
size of the file is significantly smaller (Cardoso, 2008).
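The audio parameters listed above fit into a single byte at the start
of the audio tag payload. The sketch below shows one way to build that
byte, assuming the bit layout of the Adobe specification (four bits of
sound format, two bits of rate, one bit of sample size, one bit of
channel type); the function name and the rate table keys are
illustrative.

```python
SOUND_FORMAT_MP3 = 2
# Rate codes for 5.5 kHz, 11 kHz, 22 kHz and 44 kHz, keyed by the exact rate in Hz.
SOUND_RATES = {5512: 0, 11025: 1, 22050: 2, 44100: 3}


def pack_audio_header(sound_format: int, sample_rate: int,
                      sixteen_bit: bool, stereo: bool) -> int:
    """Build the first byte of an FLV audio tag payload."""
    if not 0 <= sound_format <= 15:
        raise ValueError("sound format must fit in four bits")
    if sample_rate not in SOUND_RATES:
        raise ValueError("unsupported sample rate")
    return (
        (sound_format << 4)
        | (SOUND_RATES[sample_rate] << 2)
        | (int(sixteen_bit) << 1)
        | int(stereo)
    )
```

For example, MP3 at 44 kHz, 16 bit, stereo gives the byte 0x2F, and MP3
at 22 kHz, 16 bit, mono gives 0x2A.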
4.2 Compressing the video data
The video data is encapsulated into a Screen Video Packet that
contains the width and height of the frame in pixels and the width and
height of a block (Adobe Systems, 2008).
To compress the video data, the frame is split into blocks. Block
width and height range from 16 to 256 pixels, in multiples of 16. The
block size must not change except at a keyframe. Each block is
compressed with the open source zlib library.
If the frame is an inter frame, not all the blocks need to be
written, only the ones that changed from the previous frame.
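The block scheme described above can be sketched as follows. This is a
simplified illustration and not the library's code. It assumes the
packet layout of the Adobe specification (block size stored in four
bits as size/16 - 1 next to a 12-bit image dimension, and every block
preceded by a 16-bit length, with 0 marking an unchanged block). It
also assumes that the frame buffer holds 24-bit pixels row by row
starting with the bottom row, so that the first block written is the
bottom-left one. The function names are hypothetical.

```python
import struct
import zlib
from typing import Optional

BPP = 3  # bytes per pixel; 24-bit pixels are assumed


def _block(buf: bytes, width: int, rows: range, left: int, right: int) -> bytes:
    """Copy the pixels of one rectangular block out of a packed frame."""
    stride = width * BPP
    return b"".join(buf[r * stride + left * BPP: r * stride + right * BPP]
                    for r in rows)


def encode_screen_video_packet(frame: bytes, prev: Optional[bytes],
                               width: int, height: int, block: int = 32) -> bytes:
    """Encode a frame; prev=None gives a keyframe, otherwise an inter frame."""
    if block % 16 or not 16 <= block <= 256:
        raise ValueError("block size must be a multiple of 16 in 16..256")
    if not (0 < width < 4096 and 0 < height < 4096):
        raise ValueError("dimensions must fit in 12 bits")
    if len(frame) != width * height * BPP:
        raise ValueError("frame buffer has the wrong size")
    code = block // 16 - 1
    out = bytearray(struct.pack(">HH", (code << 12) | width, (code << 12) | height))
    for top in range(0, height, block):
        rows = range(top, min(top + block, height))  # edge blocks are clipped
        for left in range(0, width, block):
            right = min(left + block, width)
            cur = _block(frame, width, rows, left, right)
            if prev is not None and cur == _block(prev, width, rows, left, right):
                out += b"\x00\x00"  # unchanged block: length 0, no data
                continue
            data = zlib.compress(cur)
            if len(data) > 0xFFFF:
                raise ValueError("compressed block exceeds the 16-bit length field")
            out += struct.pack(">H", len(data)) + data
    return bytes(out)
```

For a screen that changes little between captures, most blocks of an
inter frame reduce to a two-byte zero length, which is why this codec
suits application sharing.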
4.3 Compressing the audio data
The audio data is captured from the primary sound device and
compressed in MP3 format using the LAME encoder (Lame, 2008).
The main issue is how to synchronize the sound with the video; the
idea is to interleave the video and audio tags so that neither stream
loads ahead of the other.
The MP3 data is organized in frames that contain a fixed number of
samples (576 or 1152, depending on the sampling rate). The frames must
be written in the FLV audio tag entirely, without fragmenting them, so
a block of MP3 sound data always contains a number of samples that is a
multiple of 576 or 1152 (mp3-tech, 2008).
The ideal number of samples can be determined by dividing the
sampling rate by the video frame rate. This number must be rounded to a
multiple of 576 or 1152, so full MP3 frames will be written to the file.
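The calculation can be illustrated with a short Python sketch. It
assumes the usual MPEG audio convention that Layer III frames hold 1152
samples at sampling rates of 32 kHz and above and 576 samples at lower
rates; the function names are illustrative and do not come from the
library.

```python
def samples_per_mp3_frame(sample_rate: int) -> int:
    """Samples in one MP3 frame (assumed: 1152 at >= 32 kHz, else 576)."""
    return 1152 if sample_rate >= 32000 else 576


def audio_samples_per_video_frame(sample_rate: int, fps: float) -> int:
    """Round the ideal sample count to a whole number of MP3 frames."""
    ideal = sample_rate / fps                 # samples played during one video frame
    frame = samples_per_mp3_frame(sample_rate)
    whole_frames = max(1, round(ideal / frame))
    return whole_frames * frame
```

At 44100 Hz and 10 video frames per second the ideal count is 4410
samples, which is about 3.83 MP3 frames; rounding to 4 frames gives
4608 samples per video frame.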
To keep the MP3 stream in sync with the video playback, the MP3
frames must be distributed as uniformly as possible among the video
frames, and appropriate SeekSamples values must be provided in the
audio data header.
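One possible way to obtain such a uniform distribution is sketched
below. It is an assumption about how the scheduling could be done, not
a description of the library: for every video frame it writes as many
whole MP3 frames as have become due by the end of that video frame, so
the rounding error never accumulates. SeekSamples handling is left out,
and an integer frame rate is assumed.

```python
from typing import List


def mp3_frames_per_video_frame(sample_rate: int, fps: int,
                               n_video_frames: int,
                               samples_per_frame: int) -> List[int]:
    """For each video frame, the number of whole MP3 frames to write with it."""
    schedule = []
    written = 0
    for i in range(1, n_video_frames + 1):
        # Whole MP3 frames that fit in the audio played up to the end of
        # video frame i (integer arithmetic avoids rounding drift).
        due = (i * sample_rate) // (fps * samples_per_frame)
        schedule.append(due - written)
        written = due
    return schedule
```

For 44100 Hz audio, 10 video frames per second and 1152 samples per MP3
frame, the first ten video frames receive 3, 4, 4, 4, 4, 3, 4, 4, 4 and
4 MP3 frames, so the audio never lags the video by more than one MP3
frame.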
5. CONCLUSION
The developed library can be used in any software project, not only
in the context presented in this paper.
At the present time, the FLV format is the best solution for
streaming audio and video content over the internet: it can be embedded
very easily in webpages or Flash applications and offers very good
performance.
6. REFERENCES
Cardoso, I. C# MP3 Compressor, http://www.codeproject.com (May 1st,
2008)
Kristaly, D.M.; Sisak, F.; Truican, I.; Moraru, S.A. & Sandu, F.
(2008). Web 2.0 technologies in web application development, Proceedings
of the 1st PETRA Conference--Workshop PTLIE, Athens, Greece
Munoz, I. A full-duplex audio player in C# using the waveIn/waveOut
APIs, http://www.codeproject.com (May 1st, 2008)
SWF Format Specification, Adobe Systems, April 2008, electronic
version
Video File Format Specification--Version 9, Adobe Systems, April
2008, electronic version
*** http://klaus.geekserver.net/libflv, Accessed on: 2008-05-03
*** http://lame.sourceforge.net, Accessed on: 2008-05-01
*** http://www.mp3-tech.org, Accessed on: 2008-04-23