20 similar documents found (search time: 0 ms)
2.
Source recording device recognition is an important emerging research field in digital media forensics. The literature has mainly focused on the source recording device identification problem, whereas few studies have addressed the source recording device verification problem. Sparse-representation-based classification methods have shown promise in many applications. This paper proposes a source cell phone verification scheme based on sparse representation. It can be divided into three schemes that use an exemplar dictionary, an unsupervised learned dictionary, and a supervised learned dictionary, respectively. Specifically, a discriminative dictionary learned by a supervised learning algorithm, which considers representational and discriminative power simultaneously (unlike unsupervised learning), is used to further improve the performance of verification systems based on sparse representation. Gaussian supervectors (GSVs) based on MFCCs, which have been shown to be effective in capturing the intrinsic characteristics of recording devices, are used for constructing and learning the dictionary. SCUTPHONE, a corpus of speech recordings from 15 cell phones, is presented. Evaluation experiments are conducted on three corpora of speech recordings from cell phones and demonstrate the effectiveness of the proposed methods for cell phone verification. In addition, the influence of the number of target examples in the exemplar dictionary and of the size of the unsupervised learned dictionary on verification performance is analyzed.
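The verification rule described above can be made concrete. Below is a minimal Python sketch of SRC-style scoring over an exemplar dictionary, with random vectors standing in for GSVs; the GSV extraction, the learned dictionaries, and all dimensions are assumptions for illustration only:

```python
# Minimal sketch of sparse-representation-based verification over an
# exemplar dictionary (hypothetical data; GSV extraction not shown).
import numpy as np
from sklearn.linear_model import orthogonal_mp
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
dim = 512                                      # assumed GSV dimensionality
target_gsvs = rng.normal(size=(20, dim))       # GSVs from the claimed phone
background_gsvs = rng.normal(size=(100, dim))  # GSVs from other phones

# Columns of the exemplar dictionary are l2-normalized supervectors.
D = normalize(np.vstack([target_gsvs, background_gsvs]), axis=1).T
n_target = target_gsvs.shape[0]

def verification_score(test_gsv, D, n_target, n_nonzero=10):
    """SRC-style score: residual using background atoms minus residual
    using target atoms; larger means more likely the claimed device."""
    x = test_gsv / np.linalg.norm(test_gsv)
    coef = orthogonal_mp(D, x, n_nonzero_coefs=n_nonzero)
    coef_t = coef.copy(); coef_t[n_target:] = 0.0   # keep target atoms
    coef_b = coef.copy(); coef_b[:n_target] = 0.0   # keep background atoms
    res_t = np.linalg.norm(x - D @ coef_t)
    res_b = np.linalg.norm(x - D @ coef_b)
    return res_b - res_t

score = verification_score(rng.normal(size=dim), D, n_target)
print(f"verification score: {score:.3f}")  # threshold for accept/reject
```

A positive score means the target atoms reconstruct the test supervector better than the background atoms; thresholding the score yields the accept/reject decision.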
3.
Schultz T., Black A. W., Vogel S., Woszczyna M. IEEE Transactions on Audio, Speech, and Language Processing, 2006, 14(2): 403-411
Speech translation research has made significant progress over the years, with many high-visibility efforts showing that translation of spontaneously spoken speech from and to diverse languages is possible and applicable in a variety of domains. As languages and domains continue to expand, practical concerns such as portability and reconfigurability come into play: system maintenance becomes a key issue, and data is never sufficient to cover changing domains across varying languages. In this paper, we discuss strategies to overcome the limits of today's speech translation systems. In the first part, we describe our layered system architecture, which allows for easy component integration, resource sharing across components, comparison of alternative approaches, and migration toward hybrid desktop/PDA or stand-alone PDA systems. In the second part, we show how flexibility and reconfigurability are implemented by relying more radically on learning approaches, using our English-Thai two-way speech translation system as a concrete example.
4.
As communication becomes increasingly automated and transnational, the need for rapid, computer-aided speech translation grows. The Janus-II system uses paraphrasing and interactive error correction to boost performance. Janus-II operates on spontaneous conversational human dialogue in limited domains with vocabularies of 3,000 or more words; current experiments involve 10,000- to 40,000-word vocabularies. It accepts English, German, Japanese, Spanish, and Korean input, which it translates into any other of these languages. Beyond translating syntactically well-formed speech or carefully structured human-to-machine utterances, Janus-II research has focused on the more difficult task of translating spontaneous conversational speech between humans, which naturally requires a suitable database and task domain.
5.
Vidal E., Casacuberta F., Rodriguez L., Civera J., Hinarejos C. D. M. IEEE Transactions on Audio, Speech, and Language Processing, 2006, 14(3): 941-951
Current machine translation systems are far from perfect. However, such systems can be used in computer-assisted translation to increase the productivity of the (human) translation process. The idea is to use a text-to-text translation system to produce portions of target-language text that can be accepted or amended by a human translator using text or speech. These user-validated portions are then used by the text-to-text translation system to produce further, hopefully improved, suggestions. There are different alternatives for using speech in a computer-assisted translation system, from pure dictated translation to simple acceptance of partial translations by reading parts of the suggestions made by the system. In all cases, information from the text to be translated can be used to constrain the speech decoding search space. While pure dictation seems to be among the most attractive settings, perfect speech decoding is unfortunately not possible with current speech processing technology, and human error correction would still be required. Therefore, approaches that achieve higher speech recognition accuracy by using increasingly constrained models in the recognition process are explored here, all within the statistical framework. Empirical results support the potential usefulness of speech within the computer-assisted translation paradigm.
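To make the idea of constraining speech decoding with translation-side information concrete, here is a toy Python sketch that filters an ASR n-best list by prefix consistency with MT suggestions. The strings and scores are invented, and the n-best rescoring is a shortcut: a real system would constrain the decoder search itself, as the paper describes.

```python
# Toy sketch: among ASR hypotheses, keep only those consistent with being
# a prefix of some MT suggestion, then pick the best acoustic/LM score.
def select_hypothesis(asr_nbest, mt_suggestions):
    """asr_nbest: list of (hypothesis, log_score) from the recognizer.
    mt_suggestions: target-text suggestions from the MT system."""
    def is_prefix(hyp, suggestion):
        h, s = hyp.split(), suggestion.split()
        return h == s[:len(h)]
    valid = [(h, sc) for h, sc in asr_nbest
             if any(is_prefix(h, s) for s in mt_suggestions)]
    return max(valid, key=lambda t: t[1], default=(None, float("-inf")))

nbest = [("the house is", -4.1), ("the horse is", -3.9), ("a house", -5.0)]
suggestions = ["the house is white", "the house is very white"]
print(select_hypothesis(nbest, suggestions))  # ('the house is', -4.1)
```

Note how the constraint overrules the raw score ranking: "the horse is" scores best acoustically but is rejected because no suggestion supports it.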
6.
As multimedia becomes the dominant form of entertainment through an ever-increasing range of digital formats, there has been growing interest in obtaining information from entertainment media. Speech is one of the core resources in multimedia, providing a foundation for the extraction of semantic information; detecting speech is thus a critical first step for speech-based information retrieval systems. This work focuses on speech detection in one of the dominant forms of entertainment media: feature films. A novel approach for voice activity detection (VAD) in film audio is proposed. The approach uses correlation to analyze associations of Mel-frequency cepstral coefficient (MFCC) pairs in speech and non-speech data. This information then drives feature selection for the creation of MFCC cross-covariance feature vectors (MFCC-CCs), which are used to train a random forest classifier on a binary speech/non-speech classification problem on audio data from entertainment media. The classifier is evaluated on a number of test sets, achieves a classification accuracy of up to 94%, and demonstrates competitive results against state-of-the-art and contemporary VAD algorithms.
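As a rough illustration of the MFCC-CC pipeline, the following Python sketch builds cross-covariance features from MFCC trajectories and trains a random forest on synthetic clips; the paper's correlation-driven feature selection and its film-audio data are not reproduced here, and the synthetic "speech" is only a crude voiced-signal proxy:

```python
# Minimal sketch of MFCC cross-covariance (MFCC-CC) features feeding a
# random forest speech/non-speech classifier on synthetic clips.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def mfcc_cc(y, sr, n_mfcc=13):
    """Upper-triangular cross-covariances of MFCC trajectories per clip."""
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    cov = np.cov(m)                                      # pairwise covariances
    iu = np.triu_indices(n_mfcc)
    return cov[iu]

sr = 16000
rng = np.random.default_rng(1)
clips, labels = [], []
for _ in range(40):
    noise = rng.normal(scale=0.1, size=sr)               # "non-speech"
    t = np.arange(sr) / sr                               # crude voiced proxy
    buzz = 0.5 * np.sin(2 * np.pi * 150 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
    clips += [noise, buzz + rng.normal(scale=0.05, size=sr)]
    labels += [0, 1]

X = np.array([mfcc_cc(c, sr) for c in clips])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print("train accuracy:", clf.score(X, labels))
```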
7.
Sean A. Ramprashad. International Journal of Speech Technology, 1999, 2(4): 359-372
A two-stage hybrid embedded speech/audio coding structure and algorithm are proposed. The first stage consists of a core speech coder that provides a minimum output bit rate and acceptable performance on clean speech inputs. The second stage is a perceptual/transform-based coder that provides a separate, optional bitstream for enhancing the core stage output. The two-stage structure can be used to enhance the quality of an existing codec without modifying the original coding algorithm; in this regard it can be considered a value-added option for a standard (existing) system. The structure can also be used in systems in which many users/systems force the coding algorithm to work simultaneously under multiple constraints of bit rate, complexity, delay, and coding quality. Informal testing of the algorithm has been done using ITU-T standard G.723.1 at 5.3 kb/s as the core coder, with a maximum combined bit rate of 16 kb/s from the core and enhancement stages. The tests show that the second stage significantly improves the quality of the core output for music and for speech with background noise. Compared to the non-embedded fixed-rate standard LD-CELP G.728 at 16 kb/s, the quality of the two-stage structure is generally lower on these inputs; the embedded feature does affect quality. On clean speech, the quality of the two-stage structure at 16 kb/s is close to, if not better than, that of G.728 at 16 kb/s.
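The embedded core-plus-enhancement idea can be shown with a toy Python example in which a coarse uniform quantizer stands in for the core coder and a finer quantizer codes its residual. This is purely illustrative, not G.723.1 or the perceptual transform coder used in the paper:

```python
# Toy two-stage embedded coder: a coarse "core" layer plus an optional
# enhancement layer coding the core residual. A decoder that receives
# only the core bits still reconstructs a usable (coarser) signal.
import numpy as np

def quantize(x, step):
    return np.round(x / step).astype(int)       # encoder side

def dequantize(q, step):
    return q * step                             # decoder side

rng = np.random.default_rng(0)
signal = rng.normal(size=160)                   # one 20 ms frame at 8 kHz

core_step, enh_step = 0.5, 0.1
core_bits = quantize(signal, core_step)         # always transmitted
core_hat = dequantize(core_bits, core_step)     # core-only reconstruction
enh_bits = quantize(signal - core_hat, enh_step)  # optional enhancement
full_hat = core_hat + dequantize(enh_bits, enh_step)

mse = lambda a, b: float(np.mean((a - b) ** 2))
print(f"core-only MSE: {mse(signal, core_hat):.4f}")
print(f"core+enhancement MSE: {mse(signal, full_hat):.4f}")
```

The key property, as in the paper, is that the enhancement bitstream can be dropped at any point without breaking the core decoder.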
8.
Kikui G., Yamamoto S., Takezawa T., Sumita E. IEEE Transactions on Audio, Speech, and Language Processing, 2006, 14(5): 1674-1682
This paper investigates issues in preparing corpora for developing speech-to-speech translation (S2ST). It is impractical to create a broad-coverage parallel corpus from dialog speech alone. An alternative approach is to have bilingual experts write conversational-style texts in the target domain, together with translations, but this risks losing fidelity to actual utterances. This paper focuses on balancing the tradeoff between these two kinds of corpora through the analysis of two newly developed corpora in the travel domain: a bilingual parallel corpus with 420 K utterances and a collection of in-domain dialogs using actual S2ST systems. We found that the first corpus effectively covers the utterances in the second corpus if complemented with a small number of utterances taken from monolingual dialogs. We also found that the characteristics of in-domain utterances become closer to those of the first corpus when more restrictive conditions and instructions are given to speakers. These results suggest the possibility of bootstrap-style development of corpora and S2ST systems, where an initial S2ST system is developed from parallel texts and then gradually improved with in-domain utterances collected by the system as restrictions are relaxed.
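The coverage analysis mentioned above can be sketched as a simple computation in Python. The utterances below are invented, and real analyses would use looser matching than exact string equality:

```python
# Sketch: what fraction of in-domain dialog utterances is covered by a
# written parallel corpus (exact-match at the utterance level)?
def coverage(dialog_utterances, corpus_utterances):
    corpus = {u.strip().lower() for u in corpus_utterances}
    hits = sum(u.strip().lower() in corpus for u in dialog_utterances)
    return hits / len(dialog_utterances)

corpus = ["how much is this?", "where is the station?", "thank you"]
dialogs = ["Where is the station?", "Is breakfast included?", "Thank you"]
print(f"coverage: {coverage(dialogs, corpus):.0%}")  # 67%
```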
9.
JMF Technology and the Implementation of Real-Time Voice Communication
This article introduces the basic concepts of streaming media, describes the support for real-time media streams provided by the JMF RTP APIs, and explains how to send and receive streaming media data over a network.
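The JMF RTP APIs (Java) hide the packet-level details; as a language-agnostic illustration of what such a stack does on the wire, here is a Python sketch that hand-packs a minimal RFC 3550 RTP header around a G.711-style payload. The addresses, port, SSRC, and payload contents are all illustrative, not taken from the article:

```python
# Minimal RTP (RFC 3550) packetization by hand: 12-byte header followed
# by a 20 ms G.711 mu-law payload, sent over UDP.
import socket
import struct

def rtp_packet(payload, seq, timestamp, ssrc, payload_type=0):
    """Build a version-2 RTP packet: no padding/extension/CSRC, marker 0."""
    header = struct.pack(
        "!BBHII",
        0x80,                  # V=2, P=0, X=0, CC=0
        payload_type & 0x7F,   # M=0, PT=0 (PCMU)
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc,
    )
    return header + payload

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
payload = bytes(160)           # 20 ms of 8 kHz mu-law silence
for seq in range(3):
    pkt = rtp_packet(payload, seq, seq * 160, ssrc=0x1234ABCD)
    sock.sendto(pkt, ("127.0.0.1", 5004))
```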
10.
Single-channel speech enhancement using implicit Wiener filter for high-quality speech communication
Jaiswal Rahul Kumar, Yeduri Sreenivasa Reddy, Cenkeramaddi Linga Reddy. International Journal of Speech Technology, 2022, 25(3): 745-758
Speech enables easy human-to-human communication as well as human-to-machine interaction. However, the quality of speech degrades due to background...
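For reference, a classic single-channel Wiener-gain enhancer can be sketched as follows in Python. This is the textbook formulation, with the noise spectrum estimated from leading frames assumed to be speech-free; it is not the paper's implicit-Wiener variant:

```python
# Classic Wiener-gain speech enhancement: estimate noise PSD from the
# first frames, compute a per-bin gain SNR/(SNR+1), apply, and resynthesize.
import numpy as np
from scipy.signal import stft, istft

def wiener_enhance(noisy, sr, noise_frames=10):
    f, t, Y = stft(noisy, fs=sr, nperseg=512)
    noise_psd = np.mean(np.abs(Y[:, :noise_frames]) ** 2, axis=1, keepdims=True)
    snr = np.maximum(np.abs(Y) ** 2 / noise_psd - 1.0, 0.0)  # ML a priori SNR
    gain = snr / (snr + 1.0)                                  # Wiener gain
    _, enhanced = istft(gain * Y, fs=sr, nperseg=512)
    return enhanced

sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 300 * t)
noisy = clean + 0.3 * np.random.default_rng(0).normal(size=sr)
enhanced = wiener_enhance(noisy, sr)
print("noisy MSE:", float(np.mean((noisy - clean) ** 2)))
print("enhanced MSE:", float(np.mean((enhanced[:sr] - clean) ** 2)))
```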
14.
Verónica López-Ludeña, Rubén San-Segundo, Carlos González Morcillo, Juan Carlos López, José M. Pardo Muñoz. Expert Systems with Applications, 2013, 40(4): 1312-1322
This paper describes a new version of a speech-to-sign-language translation system with new tools and characteristics for increasing its adaptability to a new task or semantic domain. The system is made up of a speech recognizer (for decoding the spoken utterance into a word sequence), a natural language translator (for converting the word sequence into a sequence of signs from the sign language), and a 3D avatar animation module (for playing back the signs). To increase adaptability, this paper presents improvements in all three main modules for automatically generating the task-dependent information from a parallel corpus: automatic generation of Spanish variants when building the vocabulary and language model for the speech recognizer, an acoustic adaptation module for the recognizer, data-oriented language and translation models for the machine translator, and a list of signs to design. The avatar animation module includes a new editor for rapid design of the required signs. These developments reduce the effort of adapting a Spanish to Spanish Sign Language (LSE: Lengua de Signos Española) translation system to a new domain. The whole translation system achieves a Sign Error Rate (SER) below 10% and a BLEU score above 90%, while the effort for adapting the system to a new domain has been reduced by more than 50%.
15.
Nicolas Staelens, Jonas De Meulenaere, Lizzy Bleumers, Glenn Van Wallendael, Jan De Cock, Koen Geeraert, Nick Vercammen, Wendy Van den Broeck, Brecht Vermeulen, Rik Van de Walle, Piet Demeester. Multimedia Systems, 2012, 18(6): 445-457
Lip synchronization is considered a key parameter during interactive communication. For video conferencing and television broadcasting, the differential delay between audio and video should remain below certain thresholds, as recommended by several standardization bodies. However, research has also shown that these thresholds can be relaxed, depending on the targeted application and use case. In this article, we investigate the influence of lip sync on the ability to perform real-time language interpretation during video conferencing, and we determine lip sync visibility thresholds applicable to this use case. We conducted a subjective experiment with expert interpreters, who were required to perform simultaneous translation, and with non-experts. Our results show that significant differences are obtained when conducting subjective experiments with expert interpreters. Because interpreters are primarily focused on performing the simultaneous translation, their lip sync detectability thresholds are higher than existing recommended thresholds. Primary focus and the targeted application and use case are therefore important factors when selecting lip sync acceptability thresholds.
17.
Theories of computer-mediated communication typically rest on the assumption that communication via computers lacks visual and auditory cues. However, recent technological advances such as webcams and microphones, and their increased use, call this assumption into question. Moreover, the question arises of what characterizes individuals who use such devices. Drawing on a survey of 1,060 adolescents, we found that 57% of adolescents at least occasionally used webcams during instant messaging, while 32% at least sometimes used microphones. Adolescents who perceived the lack of visual cues in online communication as important used webcams less frequently. For early and middle adolescents, greater social anxiety reduced webcam use, whereas higher private self-consciousness increased it. Our results suggest that the nature of computer-mediated communication may change considerably in the coming years, and theories of computer-mediated communication need to integrate these changes more strongly into theory building.
18.
This article introduces the silence suppression algorithm used in speech compression and multimedia technology. Its voice activity detection algorithm and comfort noise generator algorithm reduce the transmission bit rate during speech gaps, achieving discontinuous transmission.
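A minimal Python sketch of the scheme follows, assuming an energy-threshold voice activity detector and a Gaussian comfort noise generator; real systems use standardized SID frames and spectrally shaped noise rather than the scalar noise descriptor used here:

```python
# Sketch of silence suppression / DTX: an energy-based VAD gates
# transmission, and the receiver fills gaps with comfort noise matched
# to the last reported noise level.
import numpy as np

def dtx_transmit(frames, threshold=0.01):
    """Yield (is_speech, payload): speech frames verbatim, silence as a
    single noise-power scalar (a stand-in for a SID frame)."""
    for frame in frames:
        energy = float(np.mean(frame ** 2))
        if energy > threshold:
            yield True, frame                   # transmit the speech frame
        else:
            yield False, energy                 # transmit descriptor only

def receive(stream, frame_len=160, rng=np.random.default_rng(0)):
    out = []
    for is_speech, payload in stream:
        if is_speech:
            out.append(payload)
        else:                                   # comfort noise generation
            out.append(rng.normal(scale=np.sqrt(payload), size=frame_len))
    return np.concatenate(out)

rng = np.random.default_rng(1)
speech = [rng.normal(scale=0.3, size=160) for _ in range(5)]
silence = [rng.normal(scale=0.01, size=160) for _ in range(5)]
received = receive(dtx_transmit(silence + speech + silence))
print("reconstructed samples:", received.shape[0])
```

During the silent stretches only a scalar is sent per frame instead of 160 samples, which is the bit-rate saving the article describes.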
20.
Moon-Sang Lee, Sang-Kwon Lee, Joonwon Lee, Seung-Ryoul Maeng. Computer Architecture Letters, 2006, 5(1): 26-29
User-level communication alleviates the software overhead of the communication subsystem by allowing applications to access the network interface directly. For that purpose, efficient translation of virtual addresses to physical addresses is critical. In this study, we propose a system-call-based address translation scheme in which every translation is done by the kernel, instead of by a translation cache on the network interface controller as in previous cache-based address translation. In our experiments, the scheme achieves up to a 4.5% reduction in application execution time compared to the previous cache-based approach.
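The tradeoff can be illustrated with a toy Python simulation: an LRU translation cache whose misses must be filled by the kernel versus a kernel system call on every translation. The cycle costs and access trace below are invented purely for illustration, not measurements from the paper:

```python
# Toy comparison of NIC translation-cache lookups (misses fall back to
# the kernel) versus a system call per translation.
from collections import OrderedDict

def cache_cost(trace, capacity, hit_cost=1, miss_cost=300):
    cache, total = OrderedDict(), 0
    for page in trace:
        if page in cache:
            cache.move_to_end(page)
            total += hit_cost
        else:
            total += miss_cost                 # kernel fills the entry
            cache[page] = True
            if len(cache) > capacity:
                cache.popitem(last=False)      # LRU eviction
    return total

def syscall_cost(trace, per_call=200):
    return per_call * len(trace)               # kernel translates every time

trace = [i % 64 for i in range(10_000)]        # cyclic access to 64 pages
print("cache (32 entries):", cache_cost(trace, 32))
print("syscall every time:", syscall_cost(trace))
```

Under this cyclic trace, the 32-entry LRU cache misses on every access and loses to the constant-cost system call, which mirrors the paper's argument that kernel-based translation can beat a thrashing translation cache on the NIC.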