Similar literature (20 results)
1.
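Research on speech coding algorithms in video conferencing systems   Total citations: 3, self-citations: 0, citations by others: 3
This paper introduces video conferencing systems and the international standards for their speech coding. Drawing on practical experience in developing a video conferencing system, it focuses on three standards commonly used in real systems: G.711, G.722, and G.728, and improves the three coding algorithms that correspond to these standards. Using the implementation results of the improved algorithms together with an evaluation framework for speech coding, the authors compare and evaluate the three algorithms and present selected experimental results and comparison charts.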
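
As a rough illustration of the simplest of the three standards named above, the sketch below shows the continuous mu-law companding curve that underlies one variant of G.711. It is not the paper's improved algorithm and not a bit-exact G.711 codec (the standard uses a segmented approximation of this curve); the function names and the uniform 8-bit quantizer are assumptions made for illustration.

```python
import math

MU = 255.0  # companding constant of the mu-law curve


def mu_law_compress(x: float) -> float:
    """Map a sample in [-1, 1] to a companded value in [-1, 1]."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_8bit(x: float) -> int:
    """Hypothetical helper: compand, then quantize uniformly to 0..255."""
    y = mu_law_compress(x)
    return int(round((y + 1.0) / 2.0 * 255))


def decode_8bit(code: int) -> float:
    """Hypothetical helper: undo encode_8bit up to quantization error."""
    y = code / 255.0 * 2.0 - 1.0
    return mu_law_expand(y)
```

Because the curve is logarithmic, quiet samples receive finer quantization steps than loud ones, which is why 8 bits per sample are adequate for telephone-quality speech.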

2.
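A discussion of transparency evaluation for audio watermarking algorithms   Total citations: 1, self-citations: 0, citations by others: 1
Many publications discuss techniques for evaluating watermark robustness at length, but transparency evaluation is also very important in the field of digital watermark evaluation. This paper therefore focuses on evaluating the transparency of the embedding functions of digital audio watermarking algorithms. It gives a general definition of transparency, selects attack schemes from SMBA, and uses both original and modified attacks to evaluate the transparency of audio watermarking algorithms. The paper also points out that the detection results depend on the audio content.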

3.
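A two-level content-based segmentation method for audio streams   Total citations: 5, self-citations: 0, citations by others: 5
Content-based audio stream segmentation is an important and difficult problem in multimedia data analysis. Most traditional audio stream segmentation methods are based on small-scale audio classification, but such methods generally produce too many false segmentation points, which seriously degrades their effectiveness in practice. The authors' research shows that classification accuracy for large-scale audio clips is markedly higher than for small-scale audio clips, and that this trend is independent of the choice of classifier. Based on this observation, and with the aim of reducing false segmentation points, the authors propose a new audio stream segmentation method. First, a segmentation method based on large-scale audio classification performs a coarse segmentation of the audio stream to reduce false segmentation points. Then a segmentation-point evaluation function is defined and used to locate segmentation points more precisely within the boundary regions. Experimental results show that the method locates segmentation points fairly accurately while reducing false segmentation points to one quarter of those produced by traditional methods.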

4.
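A mathematical model is used to analyze the response time and smoothness of binomial algorithms in a static network environment, and the relationship between parameter selection and performance is discussed. Because earlier studies examined binomial algorithms only in static network environments, which differ considerably from real networks, a network topology with dynamic conditions is designed, and the performance of several "TCP-compatible" binomial algorithms with different parameters is simulated and analyzed. NS simulation results show that, for applications such as real-time streaming media, an appropriate choice of binomial algorithm parameters can both meet application requirements and ensure better compatibility with TCP.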
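
The abstract does not state the update rule, so the sketch below assumes the commonly cited form of binomial congestion control: the window grows by alpha / w^k per round trip without loss and shrinks by beta * w^l on loss, with k + l = 1 usually given as the TCP-compatibility condition. The toy loss pattern and all function names are hypothetical; this is not the paper's NS simulation.

```python
def binomial_step(w, loss, k, l, alpha=1.0, beta=0.5, w_min=1.0):
    """One round-trip update of the congestion window w (assumed rule)."""
    if loss:
        w = w - beta * (w ** l)   # decrease on loss
    else:
        w = w + alpha / (w ** k)  # increase otherwise
    return max(w_min, w)


def simulate(k, l, rtts=200, loss_every=50, w0=10.0):
    """Toy trace: one loss every `loss_every` round trips (hypothetical)."""
    w, trace = w0, []
    for t in range(1, rtts + 1):
        w = binomial_step(w, loss=(t % loss_every == 0), k=k, l=l)
        trace.append(w)
    return trace


def largest_drop(trace):
    """Largest single-step window reduction, a crude smoothness measure."""
    return max(a - b for a, b in zip(trace, trace[1:]))


aimd = simulate(k=0.0, l=1.0)   # TCP-like additive increase, multiplicative decrease
sqrt_ = simulate(k=0.5, l=0.5)  # "SQRT" member of the binomial family
```

Comparing `largest_drop(aimd)` with `largest_drop(sqrt_)` shows the trade-off the abstract discusses: parameter choices with a smaller `l` reduce the window less abruptly, which suits streaming media, at the price of slower reaction to congestion.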

5.
This paper presents a method for embedding data in an audio file container. The method is resistant to attacks based on histogram comparison. Its advantages are exact reversibility and the ability to run in real time. The cost of these benefits is a twofold increase in the amount of data and the emergence of high-frequency components in the stego audio file compared with the empty container. The paper also presents simulation results for the method's algorithms and an evaluation of the relative capacity of the container. The method can also be used for watermark embedding in audio files.

6.
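Digital watermarking has properties such as robustness, transparency, and complexity. Most publications on watermark evaluation concentrate on robustness, but complexity evaluation is also very important in the field of digital watermark evaluation. This work therefore focuses on evaluating the complexity of the embedding functions of digital audio watermarking algorithms. It introduces the concept of basic-scheme evaluation, selects two typical audio watermarking algorithms, and evaluates them against the criteria of this scheme, while also discussing the embedding parameters and the audio test set applied to the basic scheme. The complexity evaluation results for the two algorithms are presented and compared.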

7.
In this paper we propose several novel algorithms for multi-video summarization. The first and essential algorithm, Video Maximal Marginal Relevance (Video-MMR), mimics the principle of a classical text summarization algorithm, Maximal Marginal Relevance (MMR). Video-MMR rewards relevant keyframes and penalizes redundant keyframes, relying only on visual features. We extend Video-MMR to Audio Video Maximal Marginal Relevance (AV-MMR) by exploiting audio features. We also propose Balanced AV-MMR, which exploits additional semantic features, the balance between audio and visual information, and the balance of temporal information across the videos of a set. The proposed algorithms are generic and suitable for summarizing various video genres in a multi-video set using multimodal information. Large-scale subjective and objective evaluation shows that our series of MMR algorithms for multi-video summarization is effective.
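
The reward-and-penalty principle described above can be sketched as a greedy selection loop. This is a generic MMR sketch, not the authors' Video-MMR: it assumes a precomputed pairwise similarity matrix `sim`, and it takes a keyframe's relevance to be its mean similarity to all other keyframes, which is an illustrative choice. The names `mmr_select` and `lam` are hypothetical.

```python
def mmr_select(sim, n_select, lam=0.7):
    """Greedily pick keyframe indices by maximal marginal relevance.

    sim[i][j] is the similarity between keyframes i and j (assumed given).
    lam weights relevance against redundancy with already chosen frames.
    """
    n = len(sim)
    denom = max(n - 1, 1)
    # Assumed relevance: mean similarity to every other keyframe.
    relevance = [sum(sim[i][j] for j in range(n) if j != i) / denom
                 for i in range(n)]
    selected = []
    candidates = set(range(n))
    while candidates and len(selected) < n_select:
        def score(i):
            # Penalty: similarity to the closest frame already selected.
            redundancy = max((sim[i][s] for s in selected), default=0.0)
            return lam * relevance[i] - (1.0 - lam) * redundancy
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Frames 0 and 1 are near duplicates; frame 2 is visually distinct.
example_sim = [[1.0, 0.9, 0.2],
               [0.9, 1.0, 0.2],
               [0.2, 0.2, 1.0]]
summary = mmr_select(example_sim, n_select=2, lam=0.5)
```

With `lam` close to 1 the loop simply ranks by relevance; lowering it makes the selection skip near-duplicate keyframes in favor of distinct ones.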

8.
In many applications, audio signals are corrupted by noise. In particular, adaptive filters are frequently applied to white noise reduction in audio. Recent work offers some insights into using an artificial intelligence method called artificial hydrocarbon networks (AHNs) for filtering audio signals. The scope of this paper is therefore to design and implement a novel approach that applies artificial hydrocarbon networks to adaptive filtering of audio signals. Three experiments were conducted. Results demonstrate that AHNs can reduce noise in audio signals. A comparison between the proposed algorithm and an FIR filter is also provided. The short-time objective intelligibility (STOI) value and the signal-to-noise ratio (SNR) were used for evaluation. Finally, the proposed training method for finding the parameters of the AHN filter can also be used in other fields of application.

9.
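A speech-driven facial animation synthesis method based on a state-asynchronous dynamic Bayesian network model (SA-DBN) is proposed. Perceptual linear prediction features of the audio and active appearance model (AAM) features of the facial images are extracted from an audio-visual speech database to train the model parameters. For a given input speech signal, the corresponding optimal AAM feature sequence is learned according to the maximum likelihood estimation principle and is then used to synthesize the facial image sequence and the facial animation. Subjective evaluation of the synthesized animations shows that, compared with a DBN model whose audio and visual states are synchronous, SA-DBN, by limiting the maximum degree of asynchrony between the audio speech states and the visual speech states, produces clear and natural facial animation in which mouth movement is highly consistent with the input speech.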

10.
The goal of this paper is to develop an audio quality metric that can accurately quantify subjective quality over audio fidelities ranging from highly impaired to perceptually lossless. As one example of its utility, such a metric would allow scalable audio coding algorithms to be easily optimized over their entire operating ranges. We have found that the ITU-recommended objective quality metric, ITU-R BS.1387, does not accurately predict subjective audio quality over the wide range of fidelity levels of interest to us. In developing the desired universal metric, we use as a starting point the model output variables (MOVs) that make up BS.1387 as well as the energy equalization truncation threshold which has been found to be particularly useful for highly impaired audio. To combine these MOVs into a single quality measure that is both accurate and robust, we have developed a hybrid least-squares/minimax optimization procedure. Our test results show that the minimax-optimized metric is up to 36% lower in maximum absolute error compared to a similar metric designed using the conventional least-squares procedure.

11.
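Adaptive filters are complex to program, and it is difficult to test their actual performance in engineering applications in the form of a virtual instrument system. To address these problems, this paper uses the adaptive filter toolkit provided with LabVIEW 8.6 to design adaptive filters based on the least mean square (LMS) algorithm and the recursive least squares (RLS) algorithm, and analyzes how sensitive the filters are to the parameters that affect the two algorithms. The filter performance is then verified using audio signals. Simulation results show that the designed adaptive filters are full-featured, offer a good human-machine interface, support rapid development by engineers, and have good practical engineering value.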
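
To make the first of the two algorithms concrete, here is a minimal LMS sketch in plain Python rather than LabVIEW. It is the textbook update rule, not the toolkit implementation the paper uses; the helper `fir`, the function `lms`, and the step size and tap count in the demonstration are all assumptions chosen for illustration.

```python
import random


def fir(x, h):
    """Filter sequence x with FIR coefficients h (zero initial state)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def lms(x, d, num_taps, mu):
    """Adapt FIR weights so the filtered input x tracks the desired signal d.

    mu is the step size: larger values converge faster but can go unstable.
    Returns the final weights and the error sequence.
    """
    w = [0.0] * num_taps
    taps = [0.0] * num_taps  # most recent input samples, newest first
    errors = []
    for x_n, d_n in zip(x, d):
        taps = [x_n] + taps[:-1]
        y_n = sum(w_k * t_k for w_k, t_k in zip(w, taps))
        e_n = d_n - y_n
        # LMS update: move the weights along the instantaneous gradient.
        w = [w_k + mu * e_n * t_k for w_k, t_k in zip(w, taps)]
        errors.append(e_n)
    return w, errors


# Demonstration: identify a hypothetical unknown 3-tap system from noise.
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
unknown = [0.5, -0.3, 0.1]
weights, err = lms(noise, fir(noise, unknown), num_taps=3, mu=0.05)
```

The step size `mu` is the parameter whose sensitivity matters most for LMS; RLS replaces it with a forgetting factor and converges faster at a higher computational cost per sample.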

12.
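With the development of information technology, audio and video streaming is being applied ever more widely. To improve the performance of audio and video streaming, particularly for live broadcasting, and to earn better user ratings, this work introduces an MPC control algorithm into the existing video streaming architecture based on the dynamic adaptive streaming over HTTP standard, combining model predictive control (MPC) with a bitrate adaptation algorithm. The work optimizes AAC, determines the prediction model, tests the factors that affect audio-video synchronization together with the PSNR of the Y component, tests segment duration and frame-skipping delay, and computes a final QoE user evaluation metric to assess the quality of the audio and video streaming technique. Simulation experiments show that, compared with earlier related algorithms, the approach achieves better QoE values across different live-broadcast scenarios and network environments, with a clearly higher average QoE metric of 1237.2826. The analysis concludes that the MPC-based audio-video synchronization bitrate adaptation algorithm performs best on all measures.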

13.
Performance evaluation of evolutionary heuristics in dynamic environments   Total citations: 2, self-citations: 2, citations by others: 0
In recent years, there has been a growing interest in applying genetic algorithms to dynamic optimization problems. In this study, we present an extensive performance evaluation and comparison of 13 leading evolutionary algorithms with different characteristics on a common platform, using the moving peaks benchmark and varying a set of problem parameters including shift length, change frequency, correlation value, and number of peaks in the landscape. To compare the solution quality and efficiency of the algorithms, results are reported in terms of both the offline error metric and the dissimilarity factor, a novel comparison metric presented in this paper that is based on signal similarity. The computational effort of each algorithm is reported in terms of the average number of fitness evaluations and the average execution time. Our experimental evaluation indicates that the hybrid methods outperform related work with respect to solution quality for various parameters of the given benchmark problem. Specifically, hybrid methods provide up to 24% improvement in offline error and up to 30% improvement in dissimilarity factor, at the cost of more computational effort than other methods.

14.
An audio fingerprint is a compact yet very robust representation of the perceptually relevant parts of an audio signal. It can be used for content-based audio identification, even when the audio is severely distorted. Audio compression changes the fingerprint slightly. We show that these small fingerprint differences due to compression can be used to estimate the signal-to-noise ratio (SNR) of the compressed audio file compared to the original. This is a useful content-based distortion estimate when the original, uncompressed audio file is unavailable. The method uses the audio fingerprints only. For stochastic signals distorted by additive noise, an analytical expression is obtained for the average fingerprint difference as a function of the SNR level. This model is based on an analysis of the Philips robust hash (PRH) algorithm. We show that for uncorrelated signals, the bit error rate (BER) is approximately inversely proportional to the square root of the SNR of the signal. This model is extended to correlated signals and music. For an experimental verification of our proposed model, we divide the field of audio fingerprinting algorithms into three categories. From each category, we select an algorithm that is representative of that category. Experiments show that the behavior predicted by the stochastic model for the PRH also holds for the two other algorithms.

15.
Content-based audio authentication algorithms provide a way to verify the veracity and integrity of audio content. On the basis of pseudo-Zernike moments, an audio content authentication algorithm that is robust against feature-analysed substitution attacks is proposed, addressing security weaknesses in existing content-based audio authentication schemes. First, the audio signal is cut into non-overlapping frames, each frame is divided into two segments, and each segment is scrambled. Then, synchronization codes generated from a pseudo-random sequence and watermark bits generated from pseudo-Zernike moments are embedded in the first and second segments, respectively, by quantizing the modulus of the pseudo-Zernike moments. The scrambled segments used to generate and extract the watermark are unknown to attackers, so it is difficult for attackers to obtain the generated and extracted watermark in order to perform a feature-analysed substitution attack. The proposed synchronization code and watermark embedding method is inaudible and tolerates common signal processing operations well. Compared with existing audio watermarking algorithms based on pseudo-Zernike moments, the algorithm increases the embedding capacity and improves the security of the watermarking system.

16.
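To explore the role of Gaussian mixture models (GMMs) in speaker recognition, a GMM-based speaker recognition system is designed. The system consists of four modules: audio signal preprocessing, voice activity detection, speaker model building, and audio signal recognition. The first three modules form the model training part of the system, and the last module forms the recognition part. The voice activity detector in the second module, which is built from GMMs, is the novel contribution of the study. Audio-visual meetings from the Augmented Multi-party Interaction (AMI) meeting corpus are used to test some of the system's tunable parameters and its recognition error rate. Simulation results show that, with the help of the voice activity detector and several filtering algorithms, the system reaches a recognition accuracy of 83.02% on audio signals that contain overlapping speech.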

17.
In the age of digital information, audio data has become an important part of many modern computer applications, and audio classification has become a focus of research in audio processing and pattern recognition. Automatic audio classification is very useful for audio indexing, content-based audio retrieval, and on-line audio distribution, but extracting the most common and salient themes from unstructured raw audio data is a challenge. In this paper, we propose effective algorithms to automatically classify audio clips into one of six classes: music, news, sports, advertisement, cartoon, and movie. For these categories, a number of acoustic features, including linear predictive coefficients, linear predictive cepstral coefficients, and mel-frequency cepstral coefficients, are extracted to characterize the audio content. Support vector machines are applied to classify audio into the respective classes by learning from training data. The proposed method then extends to classification with a radial basis function neural network (RBFNN). An RBFNN applies a nonlinear transformation followed by a linear transformation to achieve a higher dimension in the hidden space. Experiments on different genres within these categories show that the classification results are significant and the approach is effective.

18.
Given a large audio database of music recordings, the goal of classical audio identification is to identify a particular audio recording by means of a short audio fragment. Even though recent identification algorithms show a significant degree of robustness towards noise, MP3 compression artifacts, and uniform temporal distortions, the notion of similarity is rather close to identity. In this paper, we address a higher-level retrieval problem, which we refer to as audio matching: given a short query audio clip, the goal is to automatically retrieve all excerpts from all recordings within the database that musically correspond to the query. In our matching scenario, as opposed to classical audio identification, we allow semantically motivated variations as they typically occur in different interpretations of a piece of music. To this end, this paper presents an efficient and robust audio matching procedure that works even in the presence of significant variations, such as nonlinear temporal, dynamical, and spectral deviations, where existing algorithms for audio identification would fail. Furthermore, the combination of various deformation- and fault-tolerance mechanisms allows us to employ standard indexing techniques to obtain an efficient, index-based matching procedure, thus providing an important step towards semantically searching large-scale real-world music collections.

19.
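Current digital watermarking algorithms focus mainly on images and video; watermarking algorithms for audio and speech data products remain relatively little studied, and more effective algorithms for such data are needed. This paper uses a binary image as the watermark and audio as the host, embedding the watermark in the coefficients of the discrete Fourier transform domain. In the discrete cosine transform domain, spread-spectrum techniques are used to embed the watermark information by quantizing the transform-domain coefficients, and the original audio signal is not required during watermark extraction.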
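
The abstract does not give the quantization rule, so the following sketch assumes a simple parity-based scheme to show how quantizing a coefficient can carry one watermark bit and still allow extraction without the original audio. The coefficients are plain numbers standing in for transform-domain values; `embed_bit`, `extract_bit`, and `step` are hypothetical names, and the spread-spectrum stage is omitted.

```python
def embed_bit(coeff, bit, step):
    """Move coeff to the nearest multiple of step whose index parity is bit."""
    q = round(coeff / step)
    if q % 2 != bit:
        # Shift to the neighbouring multiple on the side closer to coeff.
        q += 1 if coeff / step >= q else -1
    return q * step


def extract_bit(coeff, step):
    """Blind extraction: only the (possibly distorted) coefficient is needed."""
    return round(coeff / step) % 2


# Hypothetical coefficients and four pixels of a binary watermark image.
coeffs = [0.37, -1.42, 2.05, 5.9]
bits = [1, 0, 1, 1]
marked = [embed_bit(c, b, step=0.5) for c, b in zip(coeffs, bits)]
recovered = [extract_bit(c, step=0.5) for c in marked]
```

The step size sets the trade-off: a larger `step` survives stronger distortion (any perturbation below half a step leaves the bit intact) but changes each coefficient by up to one full step, which makes the watermark more audible.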

20.
Audio support for an object-oriented database-management system   Total citations: 1, self-citations: 0, citations by others: 1
We describe the development of the data type audio in an object-oriented database management system (DBMS). The interface of the data type includes operations to store, retrieve, and manipulate audio data. Additionally, a transport protocol supports continuous recording and presentation at the users' workstations in a client-server environment. Design considerations are outlined and lead us to use no compression algorithms and to handle parametrized sample rates and sizes transparently for the user. Specific manipulation operations, such as low-pass filtering and dynamic compression, are described in detail. The implementation of an interactive audio tool shows that the data type audio can be used in the same way as conventional data types. We give an outlook on further built-in support of time-dependent media that a comprehensive multimedia DBMS should offer.
