
1.

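余迟 黄刘生 杨威 陈志立 缪海波 《小型微型计算机系统》2012,33(7):1445-1449

Mobile communication networks have now reached their third generation, and as 3G networks continue to develop, it is necessary to study covert communication over them. Hiding information in coding parameters during speech encoding is currently an active research topic. The pitch period is a principal parameter describing the characteristics of the sound source. Exploiting the fact that modifications to the pitch period are hard to detect, this paper proposes an algorithm that hides information in the pitch period parameter of 3G speech during the adaptive codebook search, so that secret information can be embedded during 3G communication. The algorithm resists compressed-domain speech steganalysis based on voiced-speech characteristics. Simulation results show that the method has very little effect on the quality of the synthesized speech and performs well for covert information transmission.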

2.

Source cell phone verification from speech recordings using sparse representation

《Digital Signal Processing》2017

Source recording device recognition is an important emerging research field in digital media forensics. The literature has mainly focused on the source recording device identification problem, whereas few studies have addressed the source recording device verification problem. Sparse representation based classification methods have shown promise in many applications. This paper proposes a source cell phone verification scheme based on sparse representation. It is divided into three schemes, which use an exemplar dictionary, an unsupervised learned dictionary, and a supervised learned dictionary respectively. Specifically, the discriminative dictionary learned by the supervised learning algorithm, which, unlike the unsupervised learning algorithm, considers representational and discriminative power simultaneously, is used to further improve the performance of verification systems based on sparse representation. Gaussian supervectors (GSVs) based on MFCCs, which have been shown to be effective in capturing the intrinsic characteristics of recording devices, are used for constructing and learning the dictionaries. SCUTPHONE, a corpus of speech recordings from 15 cell phones, is presented. Evaluation experiments are conducted on three corpora of speech recordings from cell phones and demonstrate the effectiveness of the proposed methods for cell phone verification. In addition, the influence of the number of target examples in the exemplar dictionary and of the size of the unsupervised learned dictionary on source cell phone verification performance is also analyzed.

3.

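Research on an FFmpeg Multimedia Synchronous Transmission Algorithm on the Android System

胡成 任平安 李文莉 《微机发展》2011,(10):85-87,91

FFmpeg is an open-source, cross-platform multimedia solution that is often ported to embedded systems. Porting FFmpeg to Android extends the set of codec format standards the system supports, but hardware constraints of current phones, such as limited processing power and small memory, severely reduce FFmpeg's efficiency in decoding audio and video streams, so the decoded audio and video data fall out of sync. This paper studies a timestamp-based multimedia audio-video synchronization algorithm model, introduces it into FFmpeg, and runs experiments with the algorithm on the Android platform. The experiments show that the timestamp-based audio-video synchronization model effectively keeps multimedia data synchronized.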
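The abstract does not give the details of its synchronization model, so the following is only a minimal sketch of one common timestamp-based approach, assuming the audio clock is used as the master and each video frame carries a presentation timestamp (PTS) in seconds. The function names and the 40 ms tolerance are hypothetical choices for illustration and are not taken from the paper or from the FFmpeg API.

```python
def sync_action(video_pts, audio_clock, threshold=0.04):
    """Decide what to do with a decoded video frame.

    video_pts   -- presentation timestamp of the frame, in seconds
    audio_clock -- current playback position of the audio, in seconds
    threshold   -- tolerated drift in seconds (assumed value, not from the paper)
    """
    diff = video_pts - audio_clock
    if diff > threshold:
        return "wait"   # frame is early: delay display until audio catches up
    if diff < -threshold:
        return "drop"   # frame is late: discard it so video can catch up
    return "show"       # frame is within tolerance: display it now


def schedule(video_pts_list, audio_clock, threshold=0.04):
    """Apply sync_action to a batch of frame timestamps."""
    return [(pts, sync_action(pts, audio_clock, threshold)) for pts in video_pts_list]
```

With the audio clock at 1.00 s, a frame stamped 1.10 s would be held back and a frame stamped 0.90 s would be dropped, which is how decoding delays on a slow device are kept from accumulating into visible drift.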

4.

Distributed audio recorder using smart phones and proximity connections

Gergely Hományi Lóránt Farkas Kristóf Aczél 《Computer Standards & Interfaces》2011,33(3):315-324

A set of smart phones distributed in space may constitute a powerful recording device that is able to create audio/video recordings of social events in a location-aware manner. Such events are formal and informal meetings with two or more participants. If each person has a smart phone, it can be programmed to record that person's speech and skip the speech of the other participants. Proximity networking capabilities let the individually recorded content be aggregated and distributed to each smart phone. In this paper we analyze such a scenario, in which Symbian S60 platforms with a Bluetooth version 1.2 stack were used. A distributed recording and content delivery framework is proposed and its performance analyzed by means of simulation and measurements. The validation of our algorithm has been performed using a prototype implementation over these platforms.

5.

Detecting and locating digital audio forgeries based on singularity analysis with wavelet packet

Jiaorong Chen Shijun Xiang Hongbin Huang Weiping Liu 《Multimedia Tools and Applications》2016,75(4):2303-2325

Audio watermarking and signatures are widely used for authentication. However, these techniques become powerless in many practical situations because they require additional information. Audio forensic techniques are therefore necessary for digital audio. In this paper, we propose an audio forensics scheme to detect and locate forgery operations on speech audio in the time domain (including deletion, insertion, substitution and splicing) by performing discrete wavelet packet decomposition and analyzing singularity points of audio signals. We first analyze the forgery operations and find that the audio signals often generate new singular points because the correlation among samples close to the tampering position is reduced or broken. Then we use singularity analysis based on wavelet packets and design five parameters (which differ with the sample rate of the digital audio file) to propose an approach that can detect and locate audio forgeries in the time domain. Finally, extensive experimental results demonstrate that the proposed method can identify whether a given speech file has previously been tampered with (e.g., part of the content deleted or replaced) and further locate the forged positions in the time domain.

6.

A new error concealment method for consecutive frame loss based on CELP speech

Jie Yang Sheng Sheng Yu Yi Gao 《Computers & Electrical Engineering》2010,36(5):1014-1020

Lower bit-rate speech coding by digital signal processing is becoming more and more important with the development of communication technology. A speech codec should maintain good quality under various conditions, such as diverse channels, different speakers and background noise. When the transmission environment is poor and channel coding cannot effectively control errors, error concealment is applied. Generally speaking, error concealment is based on an extrapolation or repetition method, in which the speech coding parameters are extrapolated or repeated from the parameters of the surrounding good frames received. This paper focuses on the Adaptive Multi-Rate (AMR) speech coding standard and discusses two points: the value of the pitch lag when consecutive frames are lost, and the recovery of the codebook gain for good frames after continuous bad frames. Objective and subjective experimental results confirm that the proposed algorithm achieves better speech quality.

7.

AMR mode selection enhancement in 3G networks

Igor D. D. Curcio Juha Kalliokulju Miikka Lundan 《Multimedia Tools and Applications》2006,28(3):259-281

This paper describes methods for mode selection in multirate speech codecs such as AMR (Adaptive Multi-Rate), the mandatory speech codec in mobile networks defined by 3GPP (3rd Generation Partnership Project). The multirate functionality was originally developed to cope with changing radio conditions. The algorithms described in this paper are applicable to IP-based mobile networks, where encoded speech data is encapsulated using RTP (Real-time Transport Protocol). The main advantages offered by these techniques are improved speech quality and congestion control along the network path between two mobile terminals.

8.

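Design of a Voice Recording Function for Mobile Terminals Running the Nucleus System

田磊《单片机与嵌入式系统应用》2009,(7):60-62,68

For the hardware platform of mobile devices running the Nucleus system, this paper analyzes the working principle of the Adaptive Multi-Rate (AMR) codec algorithm and proposes a design for a voice recording function based on AMR speech compression, focusing on the design of multimedia voice software for mobile phones. The software was debugged in a cross-compilation environment and runs well.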

9.

Articulatory and excitation source features for speech recognition in read, extempore and conversation modes

K. E. Manjunath K. Sreenivasa Rao 《International Journal of Speech Technology》2016,19(1):121-134

In our previous work, we explored articulatory and excitation source features to improve the performance of phone recognition systems (PRSs) using read speech corpora. In this work, we extend the use of articulatory and excitation source features to developing PRSs for the extempore and conversation modes of speech, in addition to read speech. It is well known that the overall performance of a speech recognition system depends heavily on the accuracy of phone recognition. Therefore, the objective of this paper is to enhance the accuracy of phone recognition systems using articulatory and excitation source features in addition to conventional spectral features. The articulatory features (AFs) are derived from the spectral features using feedforward neural networks (FFNNs). We consider five AF groups, namely manner, place, roundness, frontness and height. Five different AF-based tandem PRSs are developed using the combination of Mel frequency cepstral coefficients (MFCCs) and AFs derived from FFNNs. Hybrid PRSs are developed by combining the evidence from the AF-based tandem PRSs using a weighted combination approach. The excitation source information is derived by processing the linear prediction residual of the speech signal. The vocal tract information is captured using MFCCs. The combination of vocal tract and excitation source features is used for developing PRSs. The PRSs are developed using hidden Markov models. A Bengali speech database is used for developing PRSs for the read, extempore and conversation modes of speech. The results are analyzed and the performance is compared across the different modes of speech. The results show that using either articulatory or excitation source features along with MFCCs improves the performance of PRSs in all three modes of speech. The improvement obtained using AFs is much higher than that obtained using excitation source features.

10.

Low-complexity feature-mapped speech bandwidth extension

Gustafsson H. Lindgren U.A. Claesson I. 《IEEE transactions on audio, speech, and language processing》2006,14(2):577-588

Today's telecommunications systems use a limited audio signal bandwidth. A typical bandwidth is 0.3-3.4 kHz, but recently it has been suggested that mobile phone networks will support an audio signal bandwidth of 50 Hz-7 kHz, since an increased bandwidth improves the sound quality of the speech signals. Since only a few telephones will initially have this facility, a method is suggested that extends the conventional narrow-band speech signal into a wide-band speech signal using the receiving telephone only. This gives the impression of a wide-band speech signal. The proposed speech bandwidth extension method is based on models of speech acoustics and fundamentals of human hearing. The extension maps each speech feature separately. Care has been taken to deal with implementation aspects, such as noisy speech signals, speech signal delays, computational complexity, and processing memory usage.

11.

A Survey of Research on Speech Pronunciation Models

张金光《计算机工程与应用》2018,54(12):27-34

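This paper surveys speech pronunciation models, discussing speech sound models and speech action models in turn. Speech sound models study the acoustic principles of speech production and use audio signal processing to reconstruct the speech waveform. Differences in how the relationship between the sound source and resonance is understood, and in how resonance is analyzed, have produced three kinds of model: spectral analysis models, formant models, and physiological articulation models. Speech action models study the movement of the articulatory organs and use image signal processing to reconstruct articulatory movements. By modeling method, they fall into three classes: physiological function models, geometric feature models, and statistical parametric models.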

12.

An Adaptive Power-Saving Strategy for IEEE 802.16 Systems

周向军《计算机工程与科学》2010,32(6):16-18

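For voice over IP (VoIP) services based on the Adaptive Multi-Rate (AMR) speech coder in IEEE 802.16 systems, this paper proposes an adaptive power-saving strategy. The strategy periodically examines the speech frame information of a two-way conversation to determine whether both the uplink and the downlink have entered a silence period, and then adaptively adjusts the power-saving mode parameters. The performance of the strategy is analyzed in terms of energy saving, packet loss rate, and system signaling overhead, and simulation experiments are carried out. The theoretical analysis and simulation results show that, while guaranteeing a given packet loss rate, the new strategy reduces energy consumption by more than 13.4% compared with the conventional strategy.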

13.

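Architecture of a Voice Application Development Environment for Android Smartphones

周巍 何涛 林嘉宇 《微处理机》2011,32(6):28-32

The rapid development of 3G wireless network technology has driven continuous improvement in the hardware performance of mobile terminals. A new generation of smartphones, characterized by multiple functions, multiple applications, and "intelligent" services, is gradually replacing traditional feature phones. Focusing on voice application development for smartphones, this paper takes smartphones running the Android operating system, describes the architecture of their voice application development environment, and proposes a voice application development solution based on this architecture.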

14.

Why are mobile phones annoying?

Andrew Monk Jenni Carroll Sarah Parker Mark Blythe 《Behaviour & Information Technology》2004,23(1):33-41

Sixty-four members of the public were exposed to the same staged conversation either while waiting in a bus station or travelling on a train. Half of the conversations were by mobile phone, so that only one end of the conversation was heard, and half were co-present face-to-face conversations. The volume of the conversations was controlled at one of two levels: the actors' usual speech level and exaggeratedly loud. Following exposure to the conversation, participants were approached and asked to give verbal ratings on six scales. Analysis of variance showed that mobile phone conversations were significantly more noticeable and annoying than face-to-face conversations at the same volume when the content of the conversation was controlled. Indeed, this effect of medium was as large as the effect of loudness. Various explanations of this effect are explored, with their practical implications.

15.

Reading dynamically displayed text

《Behaviour & Information Technology》2012,31(1):33-41

16.

Research on Bluetooth Encrypted Voice Synchronization for Mobile Communication Networks

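洪鹏程 黄一才 郁滨 《计算机工程与应用》2020,56(13):131-136

Embedding an encryption module in a Bluetooth terminal allows voice to be transmitted securely between the handset and the mobile communication network. To address the synchronization problem for Bluetooth encrypted voice, the paper extracts time-domain features of sinusoidal signals, exploits the fact that waveform codebooks pass through the channel transparently, designs an initial synchronization header and a periodic synchronization header, and proposes a scheme for establishing initial voice synchronization and a scheme for detecting and recovering voice synchronization. Simulation experiments and analysis show that the schemes have low time and computational overhead and effectively achieve initial synchronization, synchronization detection, and recovery.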

17.

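Research on a Mobile Phone-Based Forensic Investigation Model

吴叶科 宋如顺 陈波 《计算机时代》2010,(12):24-26

This paper gives a definition of mobile phone forensics, compares it with computer forensics, and analyzes the differences between the two. Based on the characteristics and difficulties of mobile phone forensics, it proposes a mobile phone-based forensic investigation model and analyzes the specific activities at each stage of the model. The model offers some guidance for forensic practitioners.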

18.

Discriminating Between Pitched Sources in Music Audio (total citations: 1, self-citations: 0, citations by others: 1)

Every M.R. 《IEEE transactions on audio, speech, and language processing》2008,16(2):267-277

19.

A method to compensate the influence of speech codec in speaker recognition

José R. Calvo de Lara Flavio J. Reyes Diaz Gabriel Hernández Sierra Orlando Jimenez Alcazar 《International Journal of Speech Technology》2018,21(4):975-985

The recognition of a person by his or her voice, or "speaker recognition", is a biometric specialty increasingly used in electronic commerce, electronic banking transactions and forensic investigations, among others. Speaker recognition relies on the discriminative information contained in a person's speech, and its main challenge is the variability that exists between different speech samples of the same person used for training and evaluation, known as "session variability". When speech is transmitted over the internet, for example, the coding-decoding process ("codec") causes loss of such information and reduces the effectiveness of speaker recognition. Some methods have been proposed to mitigate this effect. This work studies the degree to which this information is affected by some commonly used codec types and proposes our own solution to compensate for the session variability introduced by the codec. The influence of some types of codec on the quality of the sample was first evaluated with a set of synthesized speech samples. Later, experiments were carried out with speech samples from international competitions, retransmitted over two different codecs, and the effect on speaker recognition effectiveness was checked. Finally, the variability compensation was applied, improving recognition effectiveness, measured by the equal error rate, by 20.8% for the g.722 codec and 27.8% for the gsm 6.20 codec.
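The results above are reported in terms of the equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As an illustration only, a minimal sketch of how an EER can be estimated from two lists of verification scores follows. The function name and the simple threshold sweep are assumptions made for this example and do not describe the authors' evaluation code.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    target_scores   -- scores of genuine (same-speaker) trials
    impostor_scores -- scores of impostor (different-speaker) trials
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: impostor trials scored at or above it.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Keep the point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

For example, `equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1])` returns 0.25: at a threshold of 0.6, one of four genuine trials is rejected and one of four impostor trials is accepted.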

20.

An audio-visual corpus for multimodal automatic speech recognition

Andrzej Czyzewski Bozena Kostek Piotr Bratoszewski Jozef Kotus Marcin Szykulski 《Journal of Intelligent Information Systems》2017,49(2):167-192
