首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
In the future, the world of telecommunications will be vastly different than it is today. The driving force will be the seamless integration of real time communications (e.g. voice, video, music, etc.) and data into a single network, with ubiquitous access to that network anywhere, anytime, and by a wide range of devices. The only currently available ubiquitous access device to the network is the telephone, and the only ubiquitous user access technology mode is spoken voice commands and natural language dialogues with machines. In the future, new access devices and modes will augment speech in this role, but are unlikely to supplant the telephone and access by speech anytime soon. Speech technologies have progressed to the point where they are now viable for a broad range of communications services, including: compression of speech for use over wired and wireless networks; speech synthesis, recognition, and understanding for dialogue access to information, people, and messaging; and speaker verification for secure access to information and services. The paper provides brief overviews of these technologies, discusses some of the unique properties of wireless, plain old telephone service, and Internet protocol networks that make voice communication and control problematic, and describes the types of voice services available in the past and today, and those that we foresee becoming available over the next several years  相似文献   

2.
This article has provided a brief overview of how the marriage of speech and language technologies with the Web is changing the way people communicate and access information. Advances are happening at a rapid pace through improved algorithms, wider penetration of the Web, availability of data, and faster computing. As the Web continues to evolve, research initiatives need to continue to address difficult challenges in areas of information mining of multimedia data, multimodal search of Web and media contents, speaker recognition for reducing Internet fraud, interactive question/answering for Web-based self-service, Web mining for knowledge discovery, two-way language translation, and Web page personalization and ranking. Progress in these areas will help to generate exciting new business opportunities for mobile Internet, secure voice print, ubiquitous multilingual communication, IPTV, globalization of customer care, and information search of massive amounts of data.  相似文献   

3.
This paper improves and presents an advanced method of the voice conversion system based on Gaussian Mixture Models (GMM) models by changing the time-scale of speech. The Speech Transformation and Representation using Adaptive Interpolation of weiGHTed spectrum (STRAIGHT) model is adopted to extract the spectrum features, and the GMM models are trained to generate the conversion function. The spectrum features of a source speech will be converted by the conversion function. The time-scale of speech is changed by extracting the converted features and adding to the spectrum. The conversion voice was evaluated by subjective and objective measurements. The results confirm that the transformed speech not only approximates the characteristics of the target speaker, but also more natural and more intelligible.  相似文献   

4.
采用动态时间归正算法(DTW)和支持向量机(SVM)相结合产生一个新的基于径向基函数的DTW核函数实现语音识别,该方法在小词汇量及孤立词识别方面相对传统的隐马尔可夫模型有较大优势。为了满足语音识别系统对实时性和便携性的要求,提出了基于DTW/SVM的混合方法在TMS320C6711DSP芯片中实现的应用研究;给出了语音识别系统的原理框图,其中采用Mel倒谱系数为语音特征参数,应用了可变窗长端点检测技术;阐述了DSP设计中系统的软硬件设计方案及具体的接口电路,该系统使得语音识别更为快速便捷,并且具有一定的通用性。  相似文献   

5.
王大巍 《电子技术》2010,47(7):21-22
语音口令识别是语音信息处理的一个重要研究方向,本文给出一种基于嵌入式系统的语音口令识别系统的设计方案,硬件系统的核心芯片是嵌入式微处理器,语音口令识别算法采用连续隐马尔克夫模型。实验结果表明,将语音识别系统与嵌入式系统相结合,可以使语音口令识别系统广泛应用于便携式设备中。  相似文献   

6.
7.
语音识别可实现人机交互和语音控制,在工业控制、消费电子等领域都有广泛应用。结合人发音的生理结构的特点,使用LPMCC(LPC倒谱美尔变换)作为特征向量,采用动态规划算法作为核心识别算法,在TMS320VC5402芯片上实现了特定人、孤立词的高性能实时识别系统。  相似文献   

8.
Speech enhancement algorithms play an important role in speech signal processing. Over the past several decades, many algorithms have been studied for speech enhancement. A speech enhancement algorithm uses a noise removal method and a statistical model filter to analyze the speech signal in the frequency domain. Spectral subtraction and Wiener filters have been used as representative algorithms. These algorithms have excellent speech enhancement performance, but suffer from deterioration in performance due to specific noise or low signal-to-noise ratio (SNR) environments. In addition, according to estimations of erroneous noise, a noise existing in a voice signal is maintained so that a spectrum corresponding to a voice signal is distorted, or a frame corresponding to a voice signal cannot be retrieved, and voice recognition performance deteriorates. The problem of deterioration in speech recognition performance arises from the difference between speech recognition and training model. We use silence-feature normalization model as a methodology to improve the recognition rate resulting from the difference in the noisy environments. Conventional silence-feature normalization has a problem in that the silent part of the energy increases, which affects recognition performance due to unclear boundaries categorizing the voice. In this study, we use the cepstrum feature of the noise signals in the silence-feature normalization model to improve the performance of silence-feature normalization in a signal with a low SNR by setting a reference value for voiced and unvoiced classification. As a result of recognition rate confirmation, the recognition rates improve in performance, compared with other methods.  相似文献   

9.
Web caching has been widely used to alleviate Internet traffic congestion in World Wide Web (WWW) services. To reduce download throughput, an effective strategy on web cache management is needed to exploit web usage information in order to make a decision on evicting the document stored in case of cache saturation. This paper presents a so-called Learning Based Replacement algorithm (LBR), a hybrid approach towards an efficient replacement model for web caching by incorporating a machine learning technique (naive Bayes) into the LRU replacement method to improve prediction of possibility that an existing page will be revised by a succeeding request, from access history in a web log. The learned knowledge includes information on which URL objects in cache should be kept or evicted. The learning-based model is acquired to represent the hidden aspect of user request pattern for predicting the re-reference possibility. By a number of experiments, the LBR gains potential improvement of prediction on revisit probability, hit rate and byte hit rate overtraditional methods; LRU, LFU, and GDSF, respectively.  相似文献   

10.
随着Internet/Intranet的快速发展和普及,丰富的Web资源构成一个巨大的全球信息仓库。在海量数据空间中快速、准确地获取用户所需成为Web检索系统研究的焦点。将一种全新的网页自动分类技术引入WWW信息抽取领域,解决网上信息有效获取的问题。获取网站分类体系,设计的Web信息自动归类算法,可通过Web数据抽取机制以及Web信息分类技术实现检索结果的分类和层次化展示,使用户快捷准确地从WWW上获取所需信息。  相似文献   

11.
邮包校核语音识别系统的实时实现   总被引:6,自引:0,他引:6       下载免费PDF全文
本文研究开发了一套邮包信息校核语音识别系统.该系统利用中大词汇量非特定人连续语音识别技术实时实现了邮包信息的语音校核.系统可以识别普通话或四川话语音,可识别的词汇量约为4500条.系统还采用了拒识技术与说话人自适应技术,提高了整个系统的稳健性.实验表明对普通话的首选识别率达到98.7%,前三选识别率达到99.9%.对四川话的首选识别率达到95.9%,前三选识别率达到98.6%,对无关语音的正确拒识率达到85%,对口音较重的说话人经过自适应后识别率可提高5-8个百分点.  相似文献   

12.
Telephone services based on stored voice have existed for many years in the guise of the Speaking Clock and other recorded information services. Digital speech storage, computer control and customer signalling, based on automatic speech recognition or multi-frequency keypads, make possible better voice-based information and message services. New voice services equipment that implements some of these facilities has just entered service.  相似文献   

13.
简要分析连续语音识别技术原理,介绍了语音识别网格构建海量多媒体新闻素材检索系统,该技术显著提升了多媒体新闻制播体系的素材资产化水平,为视音频媒体的多媒体内容资源检索带来了革命性变化。以中国国际广播电台(China Radio International,CRI)为例,描述了语音识别网格技术所带来的实际应用效果。  相似文献   

14.
In this paper, a novel subspace projection approach is proposed for analysis of speech signal under stressed condition. The subspace projection method is based on the assumption of orthogonality between speech subspace and stress subspace. Speech and stress subspaces contain speech and stress information, respectively. The projection of stressed speech vectors onto the speech subspace will separate speech-specific information. In this work, the speech subspace consists of neutral speech vectors. Speech and stress recognition techniques are used to verify the orthogonal relation between speech and stress subspaces. The evaluation database consists of 119 word vocabulary under neutral, angry, sad and Lombard conditions. Hidden Markov models for speech and stress recognition are used with mel-frequency cepstral coefficient features for evaluation of estimated speech and stress information.  相似文献   

15.
步兵战车强噪声背景下由于强背景噪声的存在,既影响了口令识别的正确率,又降低了指挥所后台监听的清晰度,为了提高语音质量,本文对口令数据进行增强处理.为此,本文提出了一种基于升降编解码全卷积神经网络(Increase De-crease Encoder Decode Convolution Neural Network,I...  相似文献   

16.
李洪伟  马琳  李海峰 《信号处理》2023,39(4):639-648
语音是人类表达思想和感情交流最重要的工具,是人类文化的重要组成部分。语音情感识别作为情感计算中的重要课题已经成为国际上的研究热点,受到越来越多的关注。已有神经科学研究表明,大脑是产生调节情感的物质基础。因此,在语音情感的研究中,我们不能仅考虑语音信号自身,还应将大脑的活动信号融入语音情感识别中,以实现更高准确率的情感识别。基于上述思想,本文提出了一种基于核典型相关分析(KCCA)的语音特征提取方法。该方法将语音特征与脑电图(EEG)特征映射到高维希尔伯特空间,并计算二者的最大相关系数。KCCA将语音特征在高维希尔伯特空间上向与脑电特征相关性最大的方向投影,最终得到包含脑电信息的语音特征。本文方法将与语音情感相关的脑电信息融入语音情感特征提取中,所提特征能够更准确的表征情感。同时,本方法在理论上具有良好的可迁移性,当所提脑电特征足够准确与具有代表性时,KCCA建模得到的投影向量具有通用性,可直接用于新的语音情感数据集中而无需重新采集和计算相应的脑电信号。在自建语音情感数据库与公开语音情感数据库MSP-IMPROV上的实验结果表明,使用投影语音特征进行语音情感分类的方法优于使用原始音频特征...  相似文献   

17.
This paper presents the results of a study into how structured dialogue techniques in IVR systems can be extended to allow more natural speech interaction with the caller. A spoken language system was produced which allows callers to set reminder calls or bar outgoing or incoming telephone calls. In spite of the limits of speech recognition performance, the resulting system had a highly natural speech interface which allowed the caller to optionally offer one or more pieces of information at a time.  相似文献   

18.
As the use of voice response systems employing synthetic speech becomes more widespread in consumer products, industrial and military applications, and aids for the handicapped, it will be necessary to develop reliable methods of comparing different synthesis systems and of assessing how human observers perceive and respond to the speech generated by these systems. The selection of a specific voice response system for a particular application depends on a wide variety of factors only one of which is the inherent intelligibility of the speech generated by the synthesis routines. In this paper, we describe the results of several studies that applied measures of phoneme intelligibility, word recognition, and comprehension to assess the perception of synthetic speech. Several techniques were used to compare performance of different synthesis systems with natural speech and to learn more about how humans perceive synthetic speech generated by rule. Our findings suggest that the perception of synthetic speech depends on an interaction of several factors including the acoustic-phonetic properties of the speech signal, the requirements of the perceptual task, and the previous experience of the listener. Differences in perception between natural speech and high-quality synthetic speech appear to be related to the redundancy of the acoustic-phonetic information encoded in the speech signal.  相似文献   

19.
基于WLAN接入的DHCP+Web认证关键技术分析   总被引:2,自引:0,他引:2  
高波  张正风  徐旭 《电信科学》2008,24(5):32-36
DHCP Web认证是国内外WLAN公众用户通用的接入认证技术,本文详细分析了DHCP Web认证技术及流程,重点讨论3种IP地址分配实现方式、用户在线检测机制、个性化页面推送技术及跨域漫游等技术.  相似文献   

20.
在利用深度学习方式进行语音分离的领域,常用卷积神经网络(RNN)循环神经网络进行语音分离,但是该网络模型在分离过程中存在梯度下降问题,分离结果不理想。针对该问题,该文利用长短时记忆网络(LSTM)进行信号分离探索,弥补了RNN网络的不足。多路人声信号分离较为复杂,现阶段所使用的分离方式多是基于频谱映射方式,没有有效利用语音信号空间信息。针对此问题,该文结合波束形成算法和LSTM网络提出了一种波束形成LSTM算法,在TIMIT语音库中随机选取3个说话人的声音文件,利用超指向波束形成算法得到3个不同方向上的波束,提取每一波束中频谱幅度特征,并构建神经网络预测掩蔽值,得到待分离语音信号频谱并重构时域信号,进而实现语音分离。该算法充分利用了语音信号空间特征和信号频域特征。通过实验验证了不同方向语音分离效果,在60°方向该算法与IBM-LSTM网络相比,客观语音质量评估(PESQ)提高了0.59,短时客观可懂(STOI)指标提高了0.06,信噪比(SNR)提高了1.13 dB,另外两个方向上,实验结果同样证明了该算法较IBM-LSTM算法和RNN算法具有更好的分离性能。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号