Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
We investigate how machine learning algorithms can optimize the quality of service (QoS) offered by real-time adaptive multimedia applications. These applications can adapt their internal settings (e.g., video sizes, audio and video codecs) in real time to the unpredictably changing capacity of the network. Traditional adaptive applications simply select a set of settings that consumes less than the available bandwidth. We propose a novel approach in which the selected set of settings is the one that offers the best user-perceived QoS among all combinations that satisfy the bandwidth restrictions. We use a genetic algorithm to decide when to trigger the adaptation process depending on the network conditions (e.g., loss rate, jitter). Additionally, the new set of settings is selected according to a set of rules that model the user-perceived QoS. These rules are learned using the SLIPPER rule induction algorithm over a set of examples extracted from scores provided by real users. We demonstrate that the proposed approach guarantees a good user-perceived QoS even when the network conditions are constantly changing.
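
The selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate list, the `kbps` and `qos` fields, and the scores are hypothetical, and the learned SLIPPER rules are replaced here by a precomputed QoS score per candidate.

```python
def pick_settings(candidates, available_kbps):
    """Return the feasible candidate with the highest QoS score, or None."""
    # Keep only the settings that fit within the available bandwidth.
    feasible = [c for c in candidates if c["kbps"] <= available_kbps]
    # Among those, prefer the one users are assumed to rate best.
    return max(feasible, key=lambda c: c["qos"], default=None)


# Hypothetical candidates: bandwidth cost and an assumed user-perceived score.
candidates = [
    {"name": "low", "kbps": 200, "qos": 2.1},
    {"name": "medium", "kbps": 600, "qos": 3.8},
    {"name": "high", "kbps": 1500, "qos": 4.5},
]

best = pick_settings(candidates, available_kbps=800)
print(best["name"])  # medium
```

A traditional adaptive application would accept any feasible candidate; the sketch differs only in ranking the feasible ones by perceived quality.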

2.
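胡成, 任平安, 李文莉. 《微机发展》, 2011(10): 85-87, 91
FFmpeg is an open-source, cross-platform multimedia solution that is often ported to embedded systems. Porting FFmpeg to Android extends the set of codec formats and standards that Android supports, but the limited processing power and small memory of current phones severely reduce FFmpeg's audio and video decoding efficiency, so the decoded audio and video data fall out of sync. This paper studies a timestamp-based multimedia audio/video synchronization model, integrates it into FFmpeg, and tests the algorithm on the Android platform. The experiments show that the timestamp-based synchronization model effectively keeps multimedia data synchronized.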
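
As a rough illustration of what timestamp-based synchronization compares, the sketch below converts presentation timestamps from stream ticks to seconds and measures the drift between the two streams. The function names and the clock rates (90 kHz video, 48 kHz audio) are assumptions for illustration; the sketch does not call FFmpeg and is not the model from the paper.

```python
from fractions import Fraction


def pts_to_seconds(pts: int, time_base: Fraction) -> float:
    """Convert a presentation timestamp in stream ticks to seconds."""
    return float(pts * time_base)


def av_drift(video_pts: int, video_tb: Fraction,
             audio_pts: int, audio_tb: Fraction) -> float:
    """Positive result: video is ahead of audio; negative: video is behind."""
    return pts_to_seconds(video_pts, video_tb) - pts_to_seconds(audio_pts, audio_tb)


# Assumed clocks: 90 kHz for video, 48 kHz for audio.
drift = av_drift(180000, Fraction(1, 90000), 94080, Fraction(1, 48000))
print(f"video leads audio by {drift * 1000:.1f} ms")
```

A player can use this drift to decide whether to delay, show, or skip the next video frame.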

3.
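A cross-media retrieval method based on content correlation   Cited by: 12 (self-citations: 0; by others: 12)
To overcome the single-modality restriction of traditional content-based multimedia retrieval, a new cross-media retrieval method is proposed. It analyzes the canonical correlation, in the statistical sense, between content features of different modalities and resolves the heterogeneity of the feature vectors through subspace mapping. It also uses prior knowledge from relevance feedback to refine the topology of the multimodal datasets in the subspace, yielding an accurate measure of cross-media correlation. Experiments on image and audio data verify the effectiveness of the correlation-learning-based cross-media retrieval method.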

4.
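A fast real-time adaptive audio mixing scheme for multimedia conferencing   Cited by: 12 (self-citations: 0; by others: 12)
樊星, 顾伟康, 叶秀清. 《软件学报》, 2005, 16(1): 108-115
In multimedia conferencing, the multipoint control unit (MCU) provides centralized processing of audio, video, and data for multipoint conferences. Audio processing is the most basic of these capabilities and the one with the strictest real-time requirements. Addressing the practical needs of multipoint multimedia conferencing, this paper classifies and analyzes several adaptive multipoint speech mixing schemes and proposes a high-performance mixing scheme based on self-aligned weighting. The scheme does not use the saturation arithmetic that is widely used in real-time multimedia processing, so it introduces no new noise; it has low algorithmic complexity, and its mixed output is subjectively comfortable to listen to. The scheme also parallelizes well, is easy to implement on hardware such as DSPs, and can be widely applied in implementing multimedia conferencing systems.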
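
The abstract does not spell out the self-aligned weighting scheme, so the sketch below only illustrates the general idea of mixing without saturation: when the summed frame would overflow the 16-bit range, the whole frame is scaled by one weight instead of clipping individual samples. The function name and the per-frame weighting rule are assumptions, not the paper's algorithm.

```python
def mix_frames(frames, limit=32767):
    """Mix equal-length frames of 16-bit PCM samples without hard clipping."""
    if not frames:
        return []
    # Sum the corresponding samples of all participants.
    summed = [sum(samples) for samples in zip(*frames)]
    peak = max((abs(s) for s in summed), default=0)
    if peak <= limit:
        return summed
    # Scale the whole frame by limit/peak using integer arithmetic.
    # Floor division rounds toward negative infinity, and the result
    # always stays within [-limit, limit].
    return [s * limit // peak for s in summed]


print(mix_frames([[1000, -2000], [500, 500]]))    # [1500, -1500]
print(mix_frames([[30000, 100], [30000, 100]]))   # [32767, 109]
```

Scaling preserves the waveform shape within a frame, whereas saturation flattens peaks and adds distortion; a practical mixer would also smooth the weight across frames to avoid audible gain jumps.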

5.
Keyframe-based video summarization using Delaunay clustering   Cited by: 1 (self-citations: 0; by others: 1)
Recent advances in technology have made tremendous amounts of multimedia information available to the general population. An efficient way of dealing with this new development is to develop browsing tools that distill multimedia data into information-oriented summaries. Such an approach will not only suit resource-poor environments such as wireless and mobile, but also enhance browsing on the wired side for applications like digital libraries and repositories. Automatic summarization and indexing techniques will give users an opportunity to browse and select multimedia documents of their choice for complete viewing later. In this paper, we present a technique by which we can automatically gather the frames of interest in a video for purposes of summarization. Our proposed technique is based on using Delaunay triangulation for clustering the frames in videos. We represent the frame contents as multi-dimensional point data and use Delaunay triangulation for clustering them. We propose a novel video summarization technique using Delaunay clusters that generates good-quality summaries with fewer frames and less redundancy than other schemes. In contrast to many other clustering techniques, the Delaunay clustering algorithm is fully automatic, with no user-specified parameters, and is well suited for batch processing. We demonstrate these and other desirable properties of the proposed algorithm by testing it on a collection of videos from the Open Video Project. We provide a meaningful comparison between the results of the proposed summarization technique and those of the Open Video storyboard and K-means clustering. We evaluate the results in terms of metrics that measure the content representational value of the proposed technique.

6.
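A campus-network-based shared teaching system for multimedia classrooms   Cited by: 6 (self-citations: 0; by others: 6)
This paper proposes a teaching mode that enables real-time shared classroom teaching across multiple multimedia classrooms in a campus network environment, describes the object-oriented implementation of the system, and gives a basic algorithmic description of the multimedia synchronization problem in the system.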

7.
Nurcan, Wenye. 《Computer Networks》, 2008, 52(13): 2558-2567
Wireless multimedia sensor networks (WMSN) are formations of a large number of compact form-factor computing devices that can capture multimedia content, such as video and audio, and communicate it over wireless channels. The efficiency of a WMSN heavily depends on the correct orientation (i.e., view) of its individual sensory units in the field. In this paper, we study the problem of self-orientation in WMSN, that is, finding the most beneficial orientation for all multimedia sensors to maximize multimedia coverage. We propose a new algorithm to determine a node's multimedia coverage and find the sensor orientation that minimizes the negative effect of occlusions and overlapping regions in the sensing field. Our approach enables multimedia sensor nodes to compute their directional coverage, leading to an efficient and self-configurable sensor orientation calculation. Using simulations, we show that the occlusion-free viewpoint approach increases the multimedia coverage significantly. The self-orientation methodology is designed in the form of a distributed algorithm, making it a suitable candidate for deployment in practical systems.

8.
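An implementation method for automatic segmentation and tracking of moving video objects   Cited by: 32 (self-citations: 3; by others: 29)
With the development of the MPEG-4 compression standard, research on segmenting and tracking moving video objects has become extremely important. In the MPEG-4 video coding standard, each frame of a video sequence is represented by video object planes (VOPs) in order to support content-based interaction. Generating VOPs requires effectively segmenting the moving objects in the video sequence and tracking how they change over time. To this end, a joint spatio-temporal method for segmenting and tracking moving video objects is proposed and implemented. The method first applies a hypothesis test on the fourth-order statistic of consecutive frame differences to locate the moving object and automatically separate moving regions from the background. Within the moving regions, the watershed algorithm from mathematical morphology is used to extract the object contour precisely. Finally, the extracted object serves as a template, and the Hausdorff distance is used to track and extract the moving object in subsequent frames. Experimental results show that the method segments and tracks moving video objects effectively, reduces computational complexity, and requires few tuning parameters.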
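
The Hausdorff distance used in the tracking step measures how far two point sets (for example, template edge pixels and candidate edge pixels) are from each other. The sketch below is the textbook definition on small 2D point lists; the function names are illustrative, and the brute-force search is an assumption made for clarity rather than the paper's implementation.

```python
from math import dist


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


template = [(0, 0), (1, 0)]
candidate = [(0, 0), (4, 0)]
print(hausdorff(template, candidate))  # 3.0
```

A tracker would evaluate this distance for the template placed at several candidate positions in the next frame and keep the position with the smallest value.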

9.
Most implementations of workstation-based multimedia information systems cannot support a continuous display of high-resolution audio and video data and suffer from frequent disruptions and delays, termed hiccups. This is due to the low I/O bandwidth of current disk technology, the high bandwidth requirement of multimedia objects, and the large size of these objects, which requires them to be almost always disk resident. A parallel multimedia information system and the key technical ideas that enable it to support a real-time display of multimedia objects are described. In this system, a multimedia object is declustered across several disk drives, enabling the system to utilize the aggregate bandwidth of multiple disks to retrieve an object in real time. Then, the workload of an application is distributed evenly across the disk drives to maximize the processing capability of the system. To support simultaneous display of several multimedia objects for different users, two alternative approaches are described. The first approach multitasks a disk drive among several requests, while the second replicates the data and dedicates resources to each individual request. The trade-offs associated with each approach are investigated using a simulation model.
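
Declustering can be pictured as striping the blocks of one object across the disks so that consecutive blocks can be read in parallel. The sketch below assumes simple round-robin placement and identical disks; both are illustrative assumptions, since the abstract does not state the placement policy.

```python
def decluster(num_blocks, num_disks):
    """Assign block i to disk i mod num_disks (round-robin striping)."""
    layout = [[] for _ in range(num_disks)]
    for block in range(num_blocks):
        layout[block % num_disks].append(block)
    return layout


def aggregate_bandwidth(per_disk_mbps, num_disks):
    """Ideal combined transfer rate when all disks are read in parallel."""
    return per_disk_mbps * num_disks


print(decluster(7, 3))             # [[0, 3, 6], [1, 4], [2, 5]]
print(aggregate_bandwidth(20, 3))  # 60
```

With this layout, an object whose display needs more bandwidth than one disk can deliver may still be shown continuously, as long as the aggregate rate covers the requirement.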

10.
Advances in the media and entertainment industries, including streaming audio and digital TV, present new challenges for managing and accessing large audio-visual collections. Current content management systems support retrieval using low-level features, such as motion, color, and texture. However, low-level features often have little meaning for naive users, who much prefer to identify content using high-level semantics or concepts. This creates a gap between systems and their users that must be bridged for these systems to be used effectively. To this end, we first present a knowledge-based video indexing and content management framework for domain-specific videos (using basketball video as an example). We provide a solution for exploring video knowledge by mining associations from video data. Explicit definitions and evaluation measures (e.g., temporal support and confidence) for video associations are proposed by integrating the distinct features of video data. Our approach uses video processing techniques to find visual and audio cues (e.g., court field, camera motion activities, and applause), introduces multilevel sequential association mining to explore associations among the audio and visual cues, classifies the associations by assigning each a class label, and uses their appearances in the video to construct video indices. Our experimental results demonstrate the performance of the proposed approach.

11.
In this paper, an audio-driven algorithm for the detection of speech and music events in multimedia content is introduced. The proposed approach is based on the hypothesis that short-time frame-level discrimination performance can be enhanced by identifying transition points between longer, semantically homogeneous segments of audio. In this context, a two-step segmentation approach is employed in order to initially identify transition points between the homogeneous regions and subsequently classify the derived segments using a supervised binary classifier. The transition point detection mechanism is based on the analysis and composition of multiple self-similarity matrices, generated using different audio feature sets. The implemented technique aims at discriminating events focusing on transition point detection with high temporal resolution, a target that is also reflected in the adopted assessment methodology. Thereafter, multimedia indexing can be efficiently deployed (for both audio and video sequences), incorporating the processes of high resolution temporal segmentation and semantic annotation extraction. The system is evaluated against three publicly available datasets and experimental results are presented in comparison with existing implementations. The proposed algorithm is provided as an open source software package in order to support reproducible research and encourage collaboration in the field.

12.
Silence detection and removal is an essential building block of any multimedia video conferencing system. It reduces the bandwidth requirements of the underlying network transport service and helps to maintain an acceptable end-to-end delay for audio. We analyze the requirements for a silence detection algorithm hosted on a multimedia communication system and propose a novel low-complexity algorithm operating in the non-linear domain. After discussing the constraints imposed by the architecture of the system hardware (computer, packet-based network), we show that several recently proposed silence detection algorithms fail to meet all of these constraints. A new approach is then introduced, based on the small- and large-signal behavior of the speech waveform in the μ-law domain. The new algorithm is compared with a recent design that meets several of our requirements; experimental results indicate that it performs significantly better in the particular environment at hand.
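
To make the idea of working in a companded domain concrete, the sketch below applies the continuous μ-law compression curve and flags a frame as silent when its average compressed magnitude is low. It assumes the companding law meant in the abstract is μ-law with μ = 255; the threshold value and the averaging rule are illustrative assumptions, not the algorithm proposed in the paper.

```python
import math

MU = 255.0  # standard mu value for 8-bit mu-law companding


def mu_law(x):
    """Compress a sample in [-1, 1] with the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def is_silent(frame, threshold=0.2):
    """Treat a frame as silent if its mean compressed magnitude is small."""
    if not frame:
        return True
    level = sum(abs(mu_law(x)) for x in frame) / len(frame)
    return level < threshold


print(is_silent([0.0, 0.001, -0.001]))  # True
print(is_silent([0.5, -0.5]))           # False
```

Because the μ-law curve expands small amplitudes, low-level speech and background noise are easier to separate with a simple threshold than in the linear domain.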

13.
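Automatic extraction of moving objects using a modified Hausdorff distance   Cited by: 1 (self-citations: 1; by others: 0)
The new audio-visual coding standard MPEG-4 adds support for content-based functionality. It segments a video sequence into semantically meaningful video objects (VOs); a "snapshot" of a video object at a given instant is called a video object plane (VOP), and a series of VOPs represents a moving object. VOP segmentation is quite difficult, mainly because physical objects are usually not expressed by low-level features such as luminance, color, or optical flow, so classical segmentation methods cannot produce meaningful results. To extract such moving video objects effectively, an automatic VOP segmentation method based on a modified Hausdorff object tracker is proposed. An initial model is extracted first; the tracker then follows the object through the subsequent frames of the sequence, and the model is updated frame by frame to accommodate rotation and changes in the object's shape in later frames. Finally, the video object is extracted according to the resulting series of binary models. In addition, to improve segmentation quality and reduce complexity, a static-background filtering technique is used to remove the static background. Experimental results show that the algorithm is effective.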
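
The abstract does not define its modification of the Hausdorff distance. One common averaged variant replaces the maximum over points with the mean of nearest-neighbor distances, which makes the measure less sensitive to a few outlier pixels. The sketch below shows that variant only as an assumption about what "modified" might mean here.

```python
from math import dist


def mean_nearest_distance(a, b):
    """Average distance from each point in a to its nearest point in b."""
    return sum(min(dist(p, q) for q in b) for p in a) / len(a)


def modified_hausdorff(a, b):
    """Averaged Hausdorff variant: max of the two mean nearest distances."""
    return max(mean_nearest_distance(a, b), mean_nearest_distance(b, a))


model = [(0, 0), (1, 0)]
frame_points = [(0, 0), (4, 0)]
print(modified_hausdorff(model, frame_points))  # 1.5
```

For these two point sets the classical Hausdorff distance is 3.0, so the averaged variant halves the influence of the single distant point.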

14.
Interaction and integration of multimodal media types such as visual, audio, and textual data in video are the essence of video semantic analysis. Contextual information propagation is useful for both intra- and inter-shot correlations. However, the traditional concatenated vector representation of videos weakens the power of propagation and compensation among the multiple modalities. In this paper, we introduce a higher-order tensor framework for video analysis. We represent the image frames, audio, and text in video shots as data points in the form of 3rd-order tensors. We then propose a novel dimension reduction algorithm that explicitly considers the manifold structure of the tensor space from contextual, temporally associated, co-occurring multimodal media data. Our algorithm inherently preserves the intrinsic structure of the submanifold from which tensorshots are sampled and is also able to map out-of-sample data points directly. We propose a new transductive support tensor machines algorithm to train an effective classifier using a large amount of unlabeled data together with the labeled data. Experimental results on the TRECVID 2005 data set show that our method improves the performance of video semantic concept detection.

15.
16.
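This paper presents a multi-channel MPEG-4 real-time multimedia compression board system based on the TI DM642 DSP and the PCI bus. Video is captured by the SAA7144H video decoder and audio by the PCM1801U; the multimedia data are fed to the DM642 DSP, which performs hardware compression and bitstream multiplexing. Images and bitstreams are then transferred over the PCI bus to the PC for preview and storage. The design introduces a core approach to optimizing the video coding algorithm on the DSP, which was implemented and validated during development. The efficient DSP-optimized video algorithm makes the multi-channel board cost-effective, and the system's low CPU usage fully meets the needs of video surveillance.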

17.
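Complex data mining with hidden Markov models   Cited by: 3 (self-citations: 0; by others: 3)
Hidden Markov models (HMMs) are used to perform complex data mining on multimedia databases, where the hard problem is audio and video recognition. Building on audio and video recognition algorithms, a recognition method conforming to the HMM framework is constructed. Experiments show that the system reaches a sound recognition rate of up to 96.67% and a detection rate of up to 87.81% for feature values in video.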
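
HMM-based recognition typically scores an observation sequence against one model per class and picks the class with the highest likelihood. The sketch below shows the forward algorithm that computes this likelihood for a discrete HMM. The two-state toy model and its probabilities are hypothetical; the abstract does not describe the models actually used.

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Likelihood of a non-empty observation sequence under a discrete HMM."""
    # Initialization: probability of starting in each state and emitting obs[0].
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    # Induction: propagate through the transitions, then emit the next symbol.
    for o in obs[1:]:
        alpha = {
            s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    # Termination: sum over all possible final states.
    return sum(alpha.values())


# Hypothetical toy model: state A always emits "x", state B always emits "y".
states = ["A", "B"]
start_p = {"A": 0.5, "B": 0.5}
trans_p = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.5, "B": 0.5}}
emit_p = {"A": {"x": 1.0, "y": 0.0}, "B": {"x": 0.0, "y": 1.0}}

print(forward(["x", "y"], states, start_p, trans_p, emit_p))  # 0.25
```

For long sequences, a practical implementation would work with log probabilities or rescale `alpha` at each step to avoid numerical underflow.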

18.
Traditional browsing of large multimedia documents (e.g., video, audio) is primarily sequential. In the absence of an index structure, browsing and searching for relevant information in a long video, audio, or other multimedia document becomes difficult. Manual annotation can be used to mark various segments of such documents. Different segments can be combined to create new annotated segments, thus creating hierarchical annotation structures. Given the lack of structure in media data, it is natural for different users to have different views on the same media data. Therefore, different users can create different annotation structures. Users may also share some or all of each other's annotation structures. The annotation structure can be browsed or played back as a composed video consisting of different segments. Finally, the annotation structures can be manipulated dynamically by different users to alter views on a document. BRAHMA is a multimedia environment for browsing and retrieval of multimedia documents based on such hierarchical annotation structures.

19.
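3D terrain recognition and retrieval based on level of detail and minimum spanning tree   Cited by: 5 (self-citations: 1; by others: 5)
肖俊, 庄越挺, 吴飞. 《软件学报》, 2003, 14(11): 1955-1963
Images, video, audio, and graphics are all information carriers in multimedia data streams, and analyzing the content of such data can greatly ease their use and management. Content-based image (video) and audio retrieval has made considerable progress, but effective methods for recognizing and retrieving graphics, especially 3D graphics, remain rare. An algorithm for recognizing and retrieving similar 3D objects is proposed. The algorithm first uses a level-of-detail model to reduce the triangle mesh of a 3D object and then extracts the object's features. Because the extracted features have very high dimensionality, a minimum spanning tree (MST) is used to reduce the features of each 3D object. Based on the reduced features, a support-vector-machine-based method for 3D object recognition and retrieval is implemented. The algorithm was applied to recognizing 3D hilly and mountainous terrain with good results.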
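
The abstract does not explain how the MST is built over the features or how it reduces them, so the sketch below only shows the underlying building block: constructing a minimum spanning tree with Prim's algorithm. The graph representation (nodes numbered from 0, edges as `(weight, u, v)` tuples) and the example weights are assumptions for illustration.

```python
import heapq


def minimum_spanning_tree(n, edges):
    """Prim's algorithm for a connected undirected graph with nodes 0..n-1, n >= 1.

    edges is a list of (weight, u, v) tuples; returns a list of (u, v, weight).
    """
    adj = {i: [] for i in range(n)}
    for w, u, v in edges:
        adj[u].append((w, u, v))
        adj[v].append((w, v, u))
    visited = {0}
    heap = list(adj[0])
    heapq.heapify(heap)
    tree = []
    while heap and len(visited) < n:
        w, u, v = heapq.heappop(heap)  # cheapest edge leaving the tree so far
        if v in visited:
            continue
        visited.add(v)
        tree.append((u, v, w))
        for edge in adj[v]:
            if edge[2] not in visited:
                heapq.heappush(heap, edge)
    return tree


edges = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (5, 2, 3), (3, 1, 3)]
tree = minimum_spanning_tree(4, edges)
print(tree)  # [(0, 1, 1), (1, 2, 2), (1, 3, 3)]
```

In a feature-reduction setting, the nodes could be feature points and the weights their pairwise distances, with the tree keeping only the strongest structural links.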

20.
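Synchronization techniques for remote audio and video playback   Cited by: 3 (self-citations: 0; by others: 3)
Synchronized playback of audio and video is one of the key technologies and main difficulties in implementing multimedia application systems. This paper examines the mechanisms of audio/video synchronization, builds a synchronization model in which audio is the time master, following the idea of reference-point-based synchronization, proposes an audio/video synchronization algorithm applied at the decoder, and gives the synchronization criteria and the actual processing procedure. The algorithm has been applied in a distributed multimedia monitoring system based on the MPEG-4 coding standard. Experiments show that it achieves accurate and reliable continuous synchronized playback of audio and video and generalizes well.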
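
With audio as the time master, the decoder compares each video frame's presentation time with the current audio clock and decides what to do with the frame. The sketch below shows one plausible decision rule; the 40 ms tolerance and the three actions are assumptions, since the abstract does not give the actual criteria.

```python
def sync_action(video_pts_s: float, audio_clock_s: float,
                threshold_s: float = 0.04) -> str:
    """Decide how to handle a video frame relative to the audio master clock."""
    diff = video_pts_s - audio_clock_s
    if diff > threshold_s:
        return "wait"  # frame is early: hold it until the audio catches up
    if diff < -threshold_s:
        return "drop"  # frame is late: skip it to catch up with the audio
    return "show"      # within tolerance: display it now


print(sync_action(1.00, 1.00))  # show
print(sync_action(1.20, 1.00))  # wait
print(sync_action(0.80, 1.00))  # drop
```

Audio is usually chosen as the master because listeners notice audio glitches more readily than an occasional repeated or skipped video frame.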


Copyright©北京勤云科技发展有限公司  京ICP备09084417号