1.

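Large-Vocabulary Continuous Speech Recognition Based on Triphone Dynamic Bayesian Network Models  Total citations: 1, self-citations: 0, citations by others: 1

吕国云, 赵荣椿, 张艳宁, 樊养余, Sahli Hichem 《数据采集与处理》2009,24(1)

To account for coarticulation in continuous speech, context-dependent triphone units are introduced into the word-phone DBN (WP-DBN) model and the word-phone-state DBN (WPS-DBN) model, yielding two novel single-stream DBN models: a word-triphone DBN (WT-DBN) model and a word-triphone-state DBN (WTS-DBN) model. The WTS-DBN model is a triphone model whose recognition unit is the triphone, and it explicitly emulates a hidden Markov model (HMM) with tied triphone states. Large-vocabulary speech recognition experiments show that, on clean speech, the recognition rate of the WTS-DBN model is higher than that of the HMM, WT-DBN, WP-DBN, and WPS-DBN models by 20.53%, 40.77%, 42.72%, and 7.52%, respectively.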

2.

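Research on HTK-Based Continuous Phoneme Recognition for Uyghur

米日古力·阿布都热素, 米吉提·阿不力米提, 艾克白尔·帕塔尔, 艾斯卡尔·艾木都拉 《计算机工程与应用》2013,(22):150-154,172

With the goal of building a baseline platform for Uyghur continuous phoneme recognition, this work uses HTK (the Hidden Markov Model Toolkit) and studies, for the first time, several key techniques in its language-dependent stages. Based on the linguistic characteristics of Uyghur, a base text was designed for building the language model and the speech corpus; a fairly large speech corpus was recorded to specific technical requirements; phonemes were chosen as the recognition units and Uyghur acoustic models were trained; with a letter-based N-gram language model, spoken sentences were recognized as sentences of letter sequences; and the recognition rates of the 32 Uyghur phonemes were tallied, with an analysis of the easily confused phonemes and the causes of confusion, laying a foundation for further improving the recognition rate.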
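The per-phoneme tally and confusion analysis mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: it assumes the reference and recognized phonemes have already been aligned into pairs, and the function name `phoneme_stats` and the placeholder symbols are hypothetical.

```python
from collections import Counter

def phoneme_stats(pairs):
    """Per-phoneme accuracy and ranked confusions from aligned pairs.

    pairs: iterable of (reference_phoneme, recognized_phoneme) tuples.
    Assumption: alignment has been done elsewhere, so only
    substitutions are counted (no insertions or deletions).
    """
    total, correct, confusions = Counter(), Counter(), Counter()
    for ref, hyp in pairs:
        total[ref] += 1
        if ref == hyp:
            correct[ref] += 1
        else:
            confusions[(ref, hyp)] += 1
    accuracy = {p: correct[p] / total[p] for p in total}
    # most_common() ranks (reference, recognized) pairs by frequency
    return accuracy, confusions.most_common()

# Placeholder symbols, not real Uyghur data
aligned = [("a", "a"), ("a", "e"), ("b", "b"), ("a", "e"), ("b", "p")]
accuracy, ranked = phoneme_stats(aligned)
print(accuracy)
print(ranked[0])
```

Ranking the off-diagonal pairs in this way is one simple route to the "easily confused phonemes" that the abstract refers to.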

3.

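Audio-Visual Continuous Speech Recognition and Phoneme Segmentation Based on Dynamic Bayesian Networks

吕国云, 蒋冬梅, 蒋晓悦, 赵荣椿, 侯云舒, 孙阿利, H. Sahli, W. Verhelst 《计算机应用》2007,27(7):1670-1673

Two single-stream monophone dynamic Bayesian network (DBN) models are constructed for continuous speech recognition based on audio and on video features; by describing the specific relationship between each word and its corresponding phonemes, the models also produce a time segmentation of the phonemes. For recognition based on audio features, experiments show that at low signal-to-noise ratios (0 to 15 dB) the recognition rate of the DBN model is on average 12.79% higher than that of the HMM, while on clean speech the phoneme time segmentation from the DBN model is very close to that of a triphone HMM. For speech recognition based on video features, the DBN model's recognition rate is 2.47% higher than the HMM's. Finally, the asynchrony between the phoneme time segmentations of the audio and video data is analyzed, laying a foundation for audio-visual continuous speech recognition with multi-stream DBN models and for determining the asynchrony between audio and video.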

4.

Continuous Speech Recognition for Minority Languages Lacking Natural Corpora

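武晓敏, 达瓦·伊德木草, 吾守尔·斯拉木 《计算机工程》2012,38(12):129-131

Taking Uyghur as an example, this paper studies continuous speech recognition for minority languages that lack natural corpora. HTK is used to build seed models from a small amount of manually labeled data, which then bootstrap acoustic model construction on a large speech data set; the palmkit tool generates the statistical language model, and Julius performs continuous speech recognition. In the experiments, monophone acoustic models were built from 6,400 short spontaneous utterances by 64 native Uyghur speakers, and a class-based 3-gram language model was generated from 100 MB of text and a 60,000-word dictionary. Test results show a recognition rate of 72.5%, 4.2 percentage points higher than with HTK alone.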
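To make the idea of a class-based 3-gram concrete, here is a minimal sketch that estimates P(word | two preceding classes) as P(class | two preceding classes) × P(word | class) from raw counts. It is an unsmoothed maximum-likelihood illustration only; it does not reproduce how palmkit estimates or backs off its models, and the function names, the `word2class` mapping, and the toy sentences are all hypothetical.

```python
from collections import Counter

def train_class_trigram(sentences, word2class):
    """Count class trigrams, their bigram histories, and word emissions."""
    tri, bi, emit, cls = Counter(), Counter(), Counter(), Counter()
    for words in sentences:
        classes = ["<s>", "<s>"] + [word2class[w] for w in words] + ["</s>"]
        for i in range(2, len(classes)):
            tri[tuple(classes[i - 2:i + 1])] += 1
            bi[tuple(classes[i - 2:i])] += 1
        for w in words:
            emit[(word2class[w], w)] += 1
            cls[word2class[w]] += 1
    return tri, bi, emit, cls

def class_trigram_prob(word, history, model, word2class):
    """P(word | history), where history is the two preceding classes."""
    tri, bi, emit, cls = model
    c = word2class[word]
    h = tuple(history)
    if bi[h] == 0 or cls[c] == 0:
        return 0.0  # unseen history or class; a real model would back off
    return (tri[h + (c,)] / bi[h]) * (emit[(c, word)] / cls[c])

# Toy data: two words share the class "B"
word2class = {"one": "A", "two": "B", "three": "B"}
model = train_class_trigram([["one", "two"], ["one", "three"]], word2class)
p_two = class_trigram_prob("two", ("<s>", "A"), model, word2class)
print(p_two)
```

Sharing statistics across a class is what lets such a model cope with sparse text: "two" and "three" each appear once, yet both benefit from the two observations of the class sequence.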

5.

An HTK-Based Optimization Algorithm for Speech Recognition Networks

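杨善茜, 黄汉明, 蒋正锋, 李锐 《计算机工程》2010,36(14):169-171

The HParse command of the Hidden Markov Model Toolkit (HTK) takes a task grammar that the user defines in the form of regular expressions and generates the low-level speech recognition network representation that HTK uses, but not every sentence can be expressed as a regular expression. To address this, an HTK-based speech recognition network algorithm is proposed for optimizing the recognition network, and its implementation is described in detail. Experimental results show that, with the recognition rate preserved, the optimized network takes less time in the speech recognition system, so the algorithm is effective.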

6.

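Research on Isolated-Word Recognition for the Hengyang Dialect

李荣华, 赵征鹏 《计算机系统应用》2017,26(5):247-252

Chinese speech recognition has achieved notable research results. However, regional variation in China (as the saying goes, the accent changes every ten li) leaves Chinese recognition systems with low recognition rates and poor performance on dialects. To address this shortcoming, an HTK-based isolated-word recognition system for the Hengyang dialect was built. The system uses the HTK 3.4.1 toolkit, takes the phoneme as the basic recognition unit, extracts 39-dimensional Mel-frequency cepstral coefficient (MFCC) features, builds hidden Markov models (HMMs), and uses the Viterbi algorithm for model training and matching, achieving isolated-word recognition of the Hengyang dialect. Comparative experiments evaluated system performance under different phoneme models and different numbers of Gaussian mixture components. The results show that system performance improves substantially when 39-dimensional MFCCs and 5 Gaussian mixture components are combined with the HMM.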
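As a rough illustration of the Viterbi matching step named in this abstract, the sketch below finds the most likely HMM state path in the log domain. It is a generic textbook formulation, not HTK's implementation; the function name, the two-state left-to-right model, and the per-frame likelihoods are made-up values for illustration.

```python
import math

def viterbi(obs_loglik, log_trans, log_init):
    """Return the most likely state path and its log probability.

    obs_loglik[t][s]: log-likelihood of frame t under state s
    log_trans[i][j]:  log transition probability from state i to state j
    log_init[s]:      log initial probability of state s
    """
    n_states = len(log_init)
    # delta[s]: best log score of any path that ends in state s
    delta = [log_init[s] + obs_loglik[0][s] for s in range(n_states)]
    backptr = []
    for t in range(1, len(obs_loglik)):
        new_delta, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j]
                             + obs_loglik[t][j])
        delta = new_delta
        backptr.append(ptr)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda s: delta[s])
    best_score = delta[state]
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, best_score

NEG_INF = float("-inf")
log_init = [0.0, NEG_INF]                      # must start in state 0
log_trans = [[math.log(0.5), math.log(0.5)],   # stay in 0 or move to 1
             [NEG_INF, 0.0]]                   # state 1 is absorbing
obs = [[math.log(0.9), math.log(0.1)],
       [math.log(0.8), math.log(0.2)],
       [math.log(0.1), math.log(0.9)]]
path, score = viterbi(obs, log_trans, log_init)
print(path, math.exp(score))
```

In an actual recognizer the per-frame log-likelihoods would come from Gaussian mixture densities evaluated on the MFCC vectors; here they are supplied directly to keep the example short.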

7.

A Dynamic Bayesian Network Speech Recognition Model Incorporating Articulatory Features

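王风娜, 蒋冬梅, 宋培岩 《计算机工程与应用》2009,45(8):178-181

A new asynchronous whole-word/articulatory-feature speech recognition model based on dynamic Bayesian networks (DBNs), AWA-DBN, is constructed, in which each word is described by the movements of its articulatory features; the conditional probability distributions of the articulatory feature nodes and the asynchrony-check nodes are defined. Speech recognition experiments on the standard digit speech corpus Aurora5.0 compare it with the whole-word/state DBN (WS-DBN, in which each word consists of a fixed number of whole-word states) and the whole-word/phone DBN (WP-DBN, in which each word consists of its corresponding phone sequence). Although WS-DBN achieves the highest recognition rate, it is suited only to small-vocabulary isolated-word recognition; AWA-DBN and WP-DBN can model large-vocabulary continuous speech, and AWA-DBN achieves a higher recognition rate and greater robustness than WP-DBN.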

8.

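HTK-Based Uyghur Continuous Digit Speech Recognition  Total citations: 2, self-citations: 0, citations by others: 2

蔡琴, 吾守尔·斯拉木 (CAI Qin, WUSHOUR·silamu) 《现代计算机》2007,(4)

Following the working principles of HTK, the training steps for Uyghur continuous digit recognition were designed, embedded re-estimation training was carried out, phoneme-level HMMs and a language model were built, and speaker-independent, small-vocabulary speech recognition of Uyghur numbers below one hundred million was achieved.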

9.

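Design and Implementation of an HTK-Based Chinese Voice Ticketing System  Total citations: 1, self-citations: 0, citations by others: 1

饶耀全, 吴小培, 吕钊 《工业控制计算机》2010,23(10):58-61

To address the low efficiency, slow speed, and limited intelligence of ticket sales at railway stations, a Chinese voice ticketing system based on HTK (the HMM Tool Kit) is proposed. The basic principles of the key techniques, including HTK-based speech recognition, are described in detail, and the key code of the implementation is given. In speech recognition tests, sentence-level accuracy was 98.00% and word-level accuracy reached 98.67%. The results indicate that the proposed voice ticketing system is feasible and practical.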

10.

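Research on Speech Recognition and Phoneme Segmentation Based on Dynamic Bayesian Networks  Total citations: 1, self-citations: 1, citations by others: 0

孙阿利, 蒋冬梅, 吕国云, Hichem Sahli, Werner Verhelst 《计算机应用研究》2007,24(10):104-106

A speech recognition modeling method based on dynamic Bayesian networks (DBNs) is studied. GMTK (the Graphical Models Toolkit) is used to build phoneme-level audio-stream DBN models for training and recognition; the results are compared with those of conventional hidden Markov model (HMM) based recognition, and word and phoneme segmentation results are given. Experiments show that, under all tested signal-to-noise ratios, DBN-based recognition results are comparable to HMM-based results and show some robustness to noise, and that the phoneme segmentation is fairly accurate.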

11.

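Research, Development, and Application of an Integrated Voice Platform System

齐忠, 吴春英, 王静, 曾义 《微计算机信息》2006,22(18):244-245

To address the costly resources and hardware conflicts of voice switches, an integrated voice platform was developed. The system provides a unified software platform for multiple different voice services that must share the same voice switch, and logically encapsulates the hardware switch so that each voice service has exclusive use of a virtual switch, achieving a network-distributed structure and resource sharing.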

12.

Architecture Design of a VoiceXML-Based Voice Value-Added Service Platform

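王文林, 廖建新, 王纯, 朱晓民 《计算机工程》2007,33(12):256-258

Voice value-added services are growing rapidly but still lack a unified specification. This paper designs the architecture of a voice value-added service platform based on VoiceXML (Voice Extensible Markup Language), standardizes the development and management interfaces for voice value-added services, discusses the functions and structure of the service execution platform and the service management platform, analyzes several key flows for implementing services on this architecture, and compares the architecture with current IVR (Interactive Voice Response) platforms.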

13.

Design and implementation of QoS-provisioning system for voice over IP  Total citations: 1, self-citations: 0, citations by others: 1

Shenquan Wang Mai Z. Dong Xuan Wei Zhao 《Parallel and Distributed Systems, IEEE Transactions on》2006,17(3):276-288

In this paper, we address issues in implementing voice over IP (VoIP) services in packet switching networks. VoIP has been identified as a critical real-time application in the network QoS research community and has been implemented in commercial products. To provide competent quality of service for VoIP systems comparable to traditional PSTN systems, a call admission control (CAC) mechanism has to be introduced to prevent packet loss and over-queuing. Several well-designed CAC mechanisms, such as the site-utilization-based CAC and the link-utilization-based CAC mechanisms, have been in place. However, the existing commercial VoIP systems have not been able to adequately apply and support these CAC mechanisms and, hence, have been unable to provide QoS guarantees to voice over IP networks. We have designed and implemented a QoS-provisioning system that can be seamlessly integrated with the existing VoIP systems to overcome their weakness in offering QoS guarantees. A practical implementation of our QoS-provisioning system has been realized.
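The general idea behind link-utilization-based call admission control can be sketched as follows: a call is admitted only if, on every link of its path, the reserved bandwidth stays within a utilization bound. This is a simplified illustration under stated assumptions, not the mechanism from the paper; the class name, link labels, capacities, and the 64 kbps call rate are all hypothetical.

```python
class LinkUtilizationCAC:
    """Admit a call only if no link on its path exceeds a utilization bound."""

    def __init__(self, capacity_kbps, max_utilization):
        self.capacity = dict(capacity_kbps)   # link name -> capacity in kbps
        self.max_util = max_utilization       # e.g. 0.5 means 50% of capacity
        self.reserved = {link: 0.0 for link in self.capacity}

    def admit(self, path, rate_kbps):
        # Check every link first so a rejected call reserves nothing
        for link in path:
            limit = self.max_util * self.capacity[link]
            if self.reserved[link] + rate_kbps > limit:
                return False
        for link in path:
            self.reserved[link] += rate_kbps
        return True

    def release(self, path, rate_kbps):
        # Return the bandwidth when the call ends
        for link in path:
            self.reserved[link] = max(0.0, self.reserved[link] - rate_kbps)

cac = LinkUtilizationCAC({"a-b": 1000, "b-c": 200}, max_utilization=0.5)
first = cac.admit(["a-b", "b-c"], 64)    # fits on both links
second = cac.admit(["a-b", "b-c"], 64)   # would push b-c past its bound
third = cac.admit(["a-b"], 64)           # a-b alone still has room
print(first, second, third)
```

Checking all links before reserving on any of them keeps the bookkeeping consistent when a call is rejected partway along its path.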

14.

Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis

Chung-Hsien Wu Chi-Chun Hsia Te-Hsien Liu Jhing-Fa Wang 《IEEE transactions on audio, speech, and language processing》2006,14(4):1109-1116

This paper presents an expressive voice conversion model (DeBi-HMM) as the post processing of a text-to-speech (TTS) system for expressive speech synthesis. DeBi-HMM is named for its duration-embedded characteristic of the two HMMs for modeling the source and target speech signals, respectively. Joint estimation of source and target HMMs is exploited for spectrum conversion from neutral to expressive speech. Gamma distribution is embedded as the duration model for each state in source and target HMMs. The expressive style-dependent decision trees achieve prosodic conversion. The STRAIGHT algorithm is adopted for the analysis and synthesis process. A set of small-sized speech databases for each expressive style is designed and collected to train the DeBi-HMM voice conversion models. Several experiments with statistical hypothesis testing are conducted to evaluate the quality of synthetic speech as perceived by human subjects. Compared with previous voice conversion methods, the proposed method exhibits encouraging potential in expressive speech synthesis.
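For reference, a gamma state-duration model of the kind this abstract mentions can be written as below. The shape-rate parameterization and the symbols $\alpha_j$, $\beta_j$ (shape and rate for state $j$) and $d$ (duration) are assumptions made for illustration; the abstract does not say which parameterization or estimator the authors used.

```latex
p_j(d) = \frac{\beta_j^{\alpha_j}}{\Gamma(\alpha_j)} \, d^{\alpha_j - 1} e^{-\beta_j d}, \quad d > 0,
\qquad
\mathbb{E}[d] = \frac{\alpha_j}{\beta_j}, \quad
\operatorname{Var}[d] = \frac{\alpha_j}{\beta_j^{2}}
```

Under this parameterization, a method-of-moments fit from a sample mean $\mu$ and variance $\sigma^2$ of observed state durations would give $\alpha_j = \mu^2 / \sigma^2$ and $\beta_j = \mu / \sigma^2$.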

15.

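Design and Implementation of the IVR Subsystem in a Music Express Platform

戴蕊, 罗红 《微机发展》2008,(1)

Voice IVR services combine voice and SMS as forms of interactive participation and support flexible interactive billing; they bring operators and SPs considerable revenue and profit and are among the most common forms of voice value-added service today. This paper focuses on the IVR service subsystem that serves as the user voice interface of the music express platform. The system connects the IP network with the telephone network, carries out the corresponding service functions according to the user's key presses, and features a flexible structure, easy service extension, high operational stability, and easy maintenance. The system's network structure, functions, equipment implementation, and software implementation are described, along with its position and role within the overall music download platform.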

16.

Adaptive low-latency peer-to-peer streaming and its application

Leslie S. Liu Roger Zimmermann 《Multimedia Systems》2006,11(6):497-512

Peer-to-peer (P2P) streaming is emerging as a viable communications paradigm. Recent research has focused on building efficient and optimal overlay multicast trees at the application level. A few commercial products are being implemented to provide voice services through P2P streaming platforms. However, even though many P2P protocols from the research community claim to be able to support large scale low-latency streaming, none of them have been adopted by a commercial voice system so far. This gap between advanced research prototypes and commercial implementations shows that there is a lack of a practical and scalable P2P system design that can provide low-latency service in a real implementation. After analyzing existing P2P system designs, we found two important issues that could lead to improvements. First, many existing designs that aim to build a low-latency streaming platform often make the unreasonable assumption that the processing time involved at each node is zero. However, in a real implementation, these delays can add up to a significant amount of time after just a few overlay hops and make interactive applications difficult. Second, scant attention has been paid to the fact that even in a conversation involving a large number of users, only a few of the users are actually actively speaking at a given time. We term these users, who have more critical demands for low-latency, active users. In this paper, we detail the design of a novel peer-to-peer streaming architecture called ACTIVE. We then present a complete commercial scale voice chat system called AudioPeer that is powered by the ACTIVE protocol. The ACTIVE system significantly reduces the end-to-end delay experienced among active users while at the same time being capable of providing streaming services to very large multicast groups. ACTIVE uses realistic processing assumptions at each node and dynamically optimizes the streaming structure while the group of active users changes over time. Consequently, it provides virtually all users with the low-latency service that before was only possible with a centralized approach. We present results from both simulations and our real implementation, which clearly show that our ACTIVE system is a feasible approach to scalable, low-latency P2P streaming.

17.

The DYNAMOS approach to support context-aware service provisioning in mobile environments  Total citations: 1, self-citations: 0, citations by others: 1

Oriana Santtu 《Journal of Systems and Software》2007,80(12):1956-1972

To efficiently make use of information and services available in ubiquitous environments, mobile users need novel means for locating relevant content, where relevance has a user-specific definition. In the DYNAMOS project, we have investigated a hybrid approach that enhances context-aware service provisioning with peer-to-peer social functionalities. We have designed and implemented a system platform and application prototype running on smart phones to support this novel conception of service provisioning. To assess the feasibility of our approach in a real-world scenario, we conducted field trials in which the research subject was a community of recreational boaters.

18.

A cloud robotics approach towards dialogue-oriented robot speech

Komei Sugiura Yoshinori Shiga Hisashi Kawai Teruhisa Misu Chiori Hori 《Advanced Robotics》2015,29(7):449-456

19.

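Design and Implementation of a VoiceXML-Based Voice Interaction Platform

庾锡昌, 刘伟平, 武晋, 黄红斌 《计算机工程与设计》2007,28(8):1969-1972

A call-center voice interaction platform based on VoiceXML (Voice Extensible Markup Language) is designed and implemented. The platform is built around the VoiceXML interpreter of the open-source OpenVXI project, is designed on a three-tier C/S framework, and, by integrating 杭州三汇 (Hangzhou Sanhui) voice boards, implements basic call-center functions such as speech synthesis, speech recognition, and call handling. With this platform, telecom operators can not only set up call centers easily but also work with SPs/CPs (service providers/content providers) to launch a variety of voice value-added services. The overall architecture of the platform is presented, the implementation of the key techniques is described in detail, and a system test case with its results is given.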

20.

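A Policy-Based Controllable Service Discovery and Dynamic Routing Model

初宁, 钟晨 《计算机工程与设计》2012,33(5):1857-1862

How to improve existing service discovery models so that they adapt to dynamically changing service runtime environments and select the Web services that best match user needs is attracting growing research attention. A policy-based controllable service discovery and dynamic routing model (P-WSDRM) is proposed. The model supports attribute definitions for abstract services, service instances, and service discoverers; supports publishing and discovering services that carry attribute descriptions; and introduces a policy decision mechanism so that service discoverers can perform service discovery and instance routing based on defined policies. A prototype of the model has been implemented on Linux with a directory service.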