Article information

Title: Coding a library for the creation of audio and video stream files for the internet using open source projects.
Authors: Kristaly, Dominic Mircea; Sisak, Francisc; Bujdei, Catalin; et al.
Journal: Annals of DAAAM & Proceedings
Print ISSN: 1726-9679
Publication year: 2008
Issue: January
Language: English
Publisher: DAAAM International Vienna
Abstract: E-learning platforms developed for the internet today must include modern tools for communication between the teacher/tutor and the students.
Keywords: Internet; Libraries; Open source software; Public software; Streaming media

1. INTRODUCTION

E-learning platforms developed for the internet today must include modern tools for communication between the teacher/tutor and the students.

The usual communication channels on such a platform are internal messaging, e-mail, whiteboard, chat, blog and forum.

The communication can be:

1) one-way (only one actor can speak; the others just listen)--this type of communication is used when presenting a course in the traditional manner and is useful for transmitting new information in a very short time (e-courses, whiteboard, blogs);

2) two-way (an ordinary dialog in which both actors can send messages)--used in chat rooms, forums and e-mail; this type of communication can be used successfully to clear up aspects that are more difficult to understand and/or when the actors cannot meet in person.

The latter category can be split into:

1) asynchronous communication: the interval between sending a message and receiving the answer is longer than in an ordinary face-to-face conversation. The sender posts a message now and has to check later for the response. This type of communication is useful when the schedules of the actors cannot be synchronized or when the response requires thorough documentation. Some of the tools that implement this type of communication are e-mail, forums and blogs;

2) synchronous communication: the conversation progresses in a ping-pong manner; the sender writes a message and the receiver answers immediately. This type of communication is, in most cases, more productive than the asynchronous one, but both actors must be available at the same time. The tool that fits this type of communication is chat.

With the development of Web 2.0 technologies, users' expectations have grown: new tools are required that offer both tutors and students new, engaging means of communication in order to improve the teaching/learning process. To mention just a few such tools:

* wikis--a collection of webpages and a mechanism that allows the user to contribute;

* whiteboard--the tutor/teacher can explain by drawing on a virtual blackboard;

* virtual classroom--a tool for audio-video conferencing with an integrated chat room (Kristaly et al., 2008).

In teaching information technology courses, for example, the teacher must show the students how to use various software applications, which is not easy to do without a view of the teacher's screen. For this, an application sharing tool is invaluable. This type of application usually runs on a one-way communication channel, from the teacher to one or more students.

The application sharing tool considered in this paper allows the teacher to share with the students the image of a software application running on the teacher's machine by means of video streaming. The application was developed for the e-learning platform of a European Leonardo da Vinci project--VET TREND. The teacher starts a server program and selects the process to broadcast. The clients connect to the server from an ActiveX component embedded in a webpage of the e-learning platform.

The problem with this particular application is that it doesn't offer the possibility to record the activity of the teacher for later viewing. A new module must be developed that allows the recording of the activities in a multimedia format, playable on the internet.

2. ASSESSING THE CHOICES

The first decision to be made is the file format in which the image and sound will be stored. The chosen format must allow streaming over the internet and must be easy to implement in the new module.

There are four formats that come to mind when streaming is involved: Microsoft's Windows Media Video (WMV), Real Media Video (RM, RAM), Apple's QuickTime (MOV) and Adobe's Flash Video (FLV).

One very important aspect is keeping the total cost of the module as low as possible. This can be achieved by using open source projects and by minimizing the changes to the existing code of the sharing application.

The considered e-learning platform includes a virtual classroom developed using Adobe Flash technologies, able to play external Flash files, so it is natural to choose the FLV format for the recording.

The video encoding format chosen is Screen video, which uses zlib compression for the frames. For the audio data, the MP3 format was chosen for its high compression ratios.

The FLV files are transferred over RTMP connections with the Adobe Flash Media Server. RTMP, or Real Time Messaging Protocol, is a proprietary protocol developed by Adobe Systems for streaming audio, video and data over the internet between a Flash player and a server. A Flash video player published for Flash Player 7 or above can also play FLV files directly with the MIME type video/x-flv (libflv, 2008).

The FLV format is also a good choice because it is platform independent: all the data is stored in big-endian byte order. In addition, there are many free and open-source players for Flash movies, for example JW FLV Media Player for embedding an FLV stream into a webpage, or FLV Player as a stand-alone player. An FLV file encodes synchronized audio and video streams and consists of one single audio-video stream.

The second decision concerns the programming language used to create this module. The sharing application is written in Visual C++ 6.0 as a COM component; the FLV library was developed in Visual C# 2008 as a DLL.

3. LIBRARY'S ARCHITECTURE

Figure 1 illustrates the architecture of the library, as well as the flow of data.

[FIGURE 1 OMITTED]

The sound is captured using the WaveIn API and fed into the Audio Tag Generator, where it is converted into MP3 format by the LAME encoder. The output of the generator consists of FLV audio tags.

The frames captured by the sharing application are prepared by the video adapter for the Video Tag Generator, which produces FLV video tags. The blocks in the Screen Video Packets are compressed with zlib (Lame, 2008).

The AVMUX creates the FLV stream and synchronizes the sound with the image. A stream recorder writes the generated FLV stream to a file on a persistent storage device.

4. IMPLEMENTATION DETAILS

4.1 The FLV file format

The FLV file format is very simple, as shown in Fig. 2 (Adobe Systems, 2008).

Fig. 2. The FLV file format.

FLV Header
FLV Body
    Previous Tag Size #0 (UI32): always 0
    FLV Tag #1 (Audio, Video or Script Data Object)
    Previous Tag Size #1 (UI32): size of Tag #1
    ...
    FLV Tag #N (Audio, Video or Script Data Object)
    Previous Tag Size #N (UI32): size of Tag #N

The FLV tags are sections of the file that contain the audio, video or script information. These tags also contain the timestamp data that controls the way the file will be played.

The video data is segmented and encapsulated in FLV video tags. The audio data is likewise segmented and encapsulated in FLV audio tags. After each tag, a 32-bit unsigned integer stores the size of that tag (Adobe Systems, 2008).
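The layout in Fig. 2 can be sketched in a few lines of Python (used here only for brevity; the library itself was written in C#). This is a minimal sketch, not the library's code: the field widths follow the author's reading of Adobe's published FLV specification, and the names flv_header and flv_tag are hypothetical.

```python
import struct

# Tag type codes used in the FLV tag header.
TAG_AUDIO, TAG_VIDEO, TAG_SCRIPT = 8, 9, 18


def flv_header(has_audio=True, has_video=True):
    """Return the 9-byte FLV header followed by Previous Tag Size #0."""
    flags = (0x04 if has_audio else 0) | (0x01 if has_video else 0)
    # Signature "FLV", version 1, type flags, header length (9), all big-endian.
    header = b"FLV" + struct.pack(">BBI", 1, flags, 9)
    return header + struct.pack(">I", 0)  # Previous Tag Size #0 is always 0


def flv_tag(tag_type, timestamp_ms, payload):
    """Return one FLV tag followed by its Previous Tag Size field."""
    if len(payload) > 0xFFFFFF:
        raise ValueError("payload does not fit the 24-bit DataSize field")
    tag = bytes([tag_type])
    tag += len(payload).to_bytes(3, "big")               # DataSize (UI24)
    tag += (timestamp_ms & 0xFFFFFF).to_bytes(3, "big")  # Timestamp, low 24 bits
    tag += bytes([(timestamp_ms >> 24) & 0xFF])          # TimestampExtended
    tag += (0).to_bytes(3, "big")                        # StreamID, always 0
    tag += payload
    # The trailing size counts the 11-byte tag header plus the payload.
    return tag + struct.pack(">I", len(tag))
```

A file would then be the output of flv_header() followed by the concatenated results of flv_tag(...) for each audio, video or script tag, in timestamp order.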

One video tag contains information about one frame:

* Frame type: specifies whether the frame is a keyframe (the frame is seekable and is stored in full), an inter frame (only the changes from the previous frame are stored, not all the frame data), a disposable inter frame, a generated keyframe or a video info/command frame.

* Codec used: Sorenson H.263, Screen video, On2 VP6 or AVC.

* Video frame payload, different for each codec type.

For the developed library, the Screen video codec is used. It is quite easy to implement and it is fast (Munoz, 2008).
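As an illustration of the frame type and codec fields listed above, the following Python sketch packs them into the single byte that precedes the video payload. The numeric codes follow the author's reading of Adobe's FLV specification; the dictionary and function names are hypothetical and are not taken from the library.

```python
# Frame type occupies the high 4 bits, codec ID the low 4 bits.
FRAME_TYPES = {
    "keyframe": 1,
    "inter": 2,
    "disposable_inter": 3,
    "generated_keyframe": 4,
    "info_command": 5,
}
CODEC_IDS = {
    "sorenson_h263": 2,
    "screen_video": 3,
    "on2_vp6": 4,
    "avc": 7,
}


def video_tag_header(frame_type, codec):
    """Pack frame type and codec ID into the first byte of a video tag payload."""
    return bytes([(FRAME_TYPES[frame_type] << 4) | CODEC_IDS[codec]])


def parse_video_tag_header(first_byte):
    """Split the first payload byte back into (frame type code, codec ID)."""
    return first_byte >> 4, first_byte & 0x0F
```

With these codes, a Screen video keyframe starts with the byte 0x13 and a Screen video inter frame with 0x23.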

An audio tag describes the audio data it carries: sound format (PCM, MP3, Nellymoser, G.711, AAC or device-specific sound), sound rate (5.5 kHz, 11 kHz, 22 kHz or 44 kHz), sound size (8 bit or 16 bit), sound type (mono or stereo) and the sound data itself (the payload, in the format specified by the sound format). For the FLV library, the audio data is compressed using MP3, which makes the file significantly smaller (Cardoso, 2008).
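The four descriptive fields fit into one byte at the start of the audio tag payload. The sketch below shows one way to pack them in Python; the bit layout and numeric codes follow the author's reading of Adobe's FLV specification, only a subset of the sound formats is listed, and all names are hypothetical.

```python
# Sound format: 4 bits, sound rate: 2 bits, sound size: 1 bit, sound type: 1 bit.
SOUND_FORMATS = {"pcm": 0, "mp3": 2, "nellymoser": 6, "aac": 10}
# Nominal sampling rates in Hz (5.5, 11, 22 and 44 kHz).
SOUND_RATES = {5512: 0, 11025: 1, 22050: 2, 44100: 3}


def audio_tag_header(sound_format, rate_hz, bits, stereo):
    """Pack the audio description into the first byte of an audio tag payload."""
    if bits not in (8, 16):
        raise ValueError("sound size must be 8 or 16 bits")
    value = SOUND_FORMATS[sound_format] << 4
    value |= SOUND_RATES[rate_hz] << 2
    value |= (1 if bits == 16 else 0) << 1
    value |= 1 if stereo else 0
    return bytes([value])
```

For example, MP3 at 44 kHz, 16 bit, stereo gives the byte 0x2F; the MP3 frames would follow this byte in the tag payload.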

4.2 Compressing the video data

The video data is encapsulated into a Screen Video Packet that contains the width and height of the frame in pixels and the width and height of a block (Adobe Systems, 2008).

To compress the video data, the frame is split into blocks. Blocks have a width and height that range from 16 to 256 in multiples of 16. The block size must not change except at a keyframe. Each block is compressed with the open source zlib library.

If the frame is an inter frame, not all the blocks need to be written, just the ones that changed from the previous frame.
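The block scheme can be sketched as follows in Python. This is a simplified illustration under stated assumptions, not the library's implementation: the frame is assumed to arrive as a bytes buffer of bottom-up rows with 3 bytes (B, G, R) per pixel, the field widths (a 4-bit block size code, a 12-bit image size and a 16-bit size prefix per block) follow the author's reading of Adobe's specification, and the function names are hypothetical.

```python
import zlib


def _block(buf, stride, x0, x1, rows):
    """Cut one block (columns x0..x1 of the given rows) out of a frame buffer."""
    return b"".join(buf[r * stride + 3 * x0 : r * stride + 3 * x1] for r in rows)


def encode_screen_video_packet(frame, prev, width, height, block_w=32, block_h=32):
    """Encode one frame; pass prev=None for a keyframe, else the previous frame."""
    for size in (block_w, block_h):
        if size % 16 or not 16 <= size <= 256:
            raise ValueError("block size must be a multiple of 16 in 16..256")
    if not (0 < width < 4096 and 0 < height < 4096):
        raise ValueError("image dimensions must fit in 12 bits")
    stride = width * 3
    out = bytearray()
    # 4-bit block size code (size / 16 - 1) followed by the 12-bit image size.
    out += (((block_w // 16 - 1) << 12) | width).to_bytes(2, "big")
    out += (((block_h // 16 - 1) << 12) | height).to_bytes(2, "big")
    for y in range(0, height, block_h):          # rows of blocks, bottom to top
        rows = range(y, min(y + block_h, height))
        for x in range(0, width, block_w):       # blocks, left to right
            x1 = min(x + block_w, width)
            block = _block(frame, stride, x, x1, rows)
            if prev is not None and block == _block(prev, stride, x, x1, rows):
                out += b"\x00\x00"               # unchanged block: size 0, no data
                continue
            data = zlib.compress(block)
            if len(data) > 0xFFFF:
                raise ValueError("compressed block exceeds the 16-bit size field")
            out += len(data).to_bytes(2, "big") + data
    return bytes(out)
```

In an inter frame where nothing changed, the packet therefore shrinks to the 4-byte header plus two zero bytes per block, which is what makes the codec cheap for mostly static screen content.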

4.3 Compressing the audio data

The audio data is captured from the primary sound device and compressed in MP3 format using the LAME encoder (Lame, 2008).

The main issue is how to synchronize the sound with the video; the idea is to interleave the video and audio tags so that neither stream is loaded ahead of the other.
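One simple way to interleave the two kinds of tags is to merge them by timestamp. The Python sketch below assumes each stream is already available as a list of (timestamp in milliseconds, payload) pairs sorted by timestamp; this representation, the function name and the choice to place audio before video at equal timestamps are assumptions made for illustration, not details from the library.

```python
import heapq


def interleave(video_tags, audio_tags):
    """Merge two timestamp-sorted tag sequences into one playback-ordered stream.

    Each input yields (timestamp_ms, payload) pairs. The output yields
    (timestamp_ms, kind, payload) with kind "audio" or "video".
    """
    # The second tuple element is a tie-breaker: audio (0) before video (1).
    audio = ((ts, 0, "audio", data) for ts, data in audio_tags)
    video = ((ts, 1, "video", data) for ts, data in video_tags)
    merged = heapq.merge(audio, video, key=lambda item: item[:2])
    for ts, _, kind, data in merged:
        yield ts, kind, data
```

Because the merge is lazy, the same function would work on generators that produce tags while the recording is still in progress.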

The MP3 data is organized in frames that contain a fixed number of samples (576 or 1152, depending on the sampling rate). The frames must be written to the FLV audio tag whole, without fragmenting them, so a block of MP3 sound data always contains a number of samples that is a multiple of 576 or 1152 (mp3-tech, 2008).

The ideal number of samples can be determined by dividing the sampling rate by the video frame rate. This number must be rounded to a multiple of 576 or 1152, so full MP3 frames will be written to the file.

To keep the MP3 stream in sync with the video playback, the MP3 frames must be distributed as uniformly as possible among the video frames, and appropriate SeekSamples values must be provided in the audio data header.
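The arithmetic can be made concrete with a short Python sketch. It assumes an integer video frame rate and that rates of 32 kHz and above use 1152 samples per MP3 frame while lower rates use 576; the running-total scheme used to spread the MP3 frames is one possible approach, not necessarily the one used in the library, and all function names are hypothetical. SeekSamples handling is not covered.

```python
def samples_per_mp3_frame(sample_rate):
    """1152 samples for 32/44.1/48 kHz, 576 for the lower sampling rates."""
    return 1152 if sample_rate >= 32000 else 576


def rounded_samples_per_video_frame(sample_rate, fps):
    """Ideal sample count per video frame, rounded to whole MP3 frames."""
    spf = samples_per_mp3_frame(sample_rate)
    frames = max(1, round(sample_rate / fps / spf))
    return frames * spf


def mp3_frames_per_video_frame(sample_rate, fps, video_frames):
    """Number of whole MP3 frames to write alongside each video frame.

    A running total keeps the audio written so far as close as possible to
    the elapsed video time, so the rounding error never accumulates.
    """
    spf = samples_per_mp3_frame(sample_rate)
    counts, written = [], 0
    for i in range(1, video_frames + 1):
        due = (i * sample_rate) // (fps * spf)  # MP3 frames due after frame i
        counts.append(due - written)
        written = due
    return counts
```

For 44100 Hz audio and 10 video frames per second, the ideal value is 4410 samples per video frame, which rounds to 4608 (four MP3 frames). Over one second, mp3_frames_per_video_frame(44100, 10, 10) returns [3, 4, 4, 4, 4, 3, 4, 4, 4, 4], that is 38 MP3 frames in total.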

5. CONCLUSION

The developed library can be used in any software project, not only in the context presented in this paper.

At present, the FLV format is the best solution for streaming audio and video content over the internet: it can be embedded very easily in webpages or Flash applications and offers very good performance.

6. REFERENCES

Cardoso, I. C# MP3 Compressor, http://www.codeproject.com (May, 1st 2008)

Kristaly, D.M.; Sisak, F.; Truican, I.; Moraru, S.A. & Sandu, F. (2008). Web 2.0 technologies in web application development, Proceedings of the 1st PETRA Conference--Workshop PTLIE, Athens, Greece

Munoz, I. A full-duplex audio player in C# using the waveIn/waveOut APIs, http://www.codeproject.com (May, 1st 2008)

SWF Format Specification, Adobe Systems, April 2008, electronic version

Video File Format Specification--Version 9, Adobe Systems, April 2008, electronic version

*** http://klaus.geekserver.net/libflv, Accessed on: 2008-05-03

*** http://lame.sourceforge.net, Accessed on: 2008-05-01

*** http://www.mp3-tech.org, Accessed on: 2008-04-23