
1.

Optimal weight tuning method for unit selection cost functions in syllable based text-to-speech synthesis

N.P. Narendra K. Sreenivasa Rao 《Applied Soft Computing》2013,13(2):773-781

This paper proposes a method for tuning the weights of unit selection cost functions in a syllable-based text-to-speech (TTS) synthesis system. In this work, the unit selection cost functions, namely target cost and concatenation cost, are designed to suit syllables. The method tunes the weights so that perceptual preference patterns are taken into account when selecting units. It uses a genetic algorithm to derive the optimal weights, with a fitness function designed to map perceptual preference patterns onto the weights of the unit selection cost functions. The effectiveness of the proposed method is evaluated with both subjective and objective measures. The results show that the derived optimal weights synthesize better-quality speech than manually tuned weights.
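The weight-tuning idea in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not give the fitness function, the encoding, or the genetic operators, so the preference-agreement fitness, the names `total_cost`, `fitness`, and `tune_weights`, and the toy preference data are hypothetical.

```python
import random

def total_cost(weights, subcosts):
    """Weighted sum of sub-costs (e.g. target and concatenation sub-costs)."""
    return sum(w * c for w, c in zip(weights, subcosts))

def fitness(weights, preference_pairs):
    """Fraction of listener preferences the weights reproduce (assumed fitness).

    Each pair is (subcosts_of_preferred, subcosts_of_rejected); the weights
    agree with a listener when the preferred candidate gets the lower cost.
    """
    agree = sum(1 for preferred, rejected in preference_pairs
                if total_cost(weights, preferred) < total_cost(weights, rejected))
    return agree / len(preference_pairs)

def tune_weights(preference_pairs, n_weights, pop_size=30, generations=40, seed=0):
    """Tiny genetic algorithm: elitist selection, one-point crossover, mutation."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_weights)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda w: fitness(w, preference_pairs), reverse=True)
        elite = ranked[: pop_size // 2]          # the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_weights)    # one-point crossover (needs n_weights >= 2)
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_weights)         # Gaussian mutation of one gene, clamped to [0, 1]
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda w: fitness(w, preference_pairs))

# Hypothetical data: listeners preferred the candidate with the lower first sub-cost.
preference_pairs = [
    ((0.2, 0.9), (0.8, 0.1)),
    ((0.1, 0.7), (0.6, 0.2)),
    ((0.3, 0.8), (0.9, 0.3)),
]
best = tune_weights(preference_pairs, n_weights=2)
```

With this toy data, any weight vector that emphasizes the first sub-cost strongly enough reproduces all three preferences; a real system would use many more listening-test pairs and sub-costs.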

2.

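Evolutionary selection algorithm for multi-type speech features

Zhang Xiaoheng, Xie Wenbin, Li Yongming 《Computer Engineering and Applications》2016,52(14):150-155

Obtaining speech features through feature selection is currently one of the more effective approaches to speaker recognition. However, the optimal speech features vary with the specific application environment. This paper therefore proposes a wrapper-based genetic feature selection algorithm over four types of speech features (FSF-WrGAF). The algorithm extracts four types of speech feature parameters and performs wrapper-style dynamic feature selection using a chain-like agent genetic algorithm and GMM-UBM to achieve high recognition accuracy. Several metrics are used to test the algorithm's performance. Experimental results show that the algorithm is simple to implement and yields a clear improvement, performing significantly better than comparable algorithms on several metrics (recognition rate, EER, and DET curve).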

3.

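Distributed database query optimization based on a parallel genetic max-min ant colony algorithm

Lin Jiming, Ban Wenjiao, Wang Junyi, Tong Jichao 《Journal of Computer Applications》2016,36(3):675-680

In distributed databases, relations and their fragments are stored as multiple replicas across multiple sites, which enlarges the query search space and time complexity and thereby lowers the efficiency of searching for a query execution plan (QEP). To address this problem, a parallel genetic max-min ant colony algorithm (PGA-MMAS) based on the design criteria of a fragment allocation selector (FSS) is proposed. First, the FSS is designed with reference to a real enterprise distributed information management system; it heuristically selects better relation replicas to reduce the query join cost and shrink the search space of PGA-MMAS. Then, exploiting the fast convergence of the genetic algorithm (GA), the final join relations are encoded and subjected to parallel genetic operations to obtain a set of relatively good QEPs, which are converted into the initial pheromone distribution of the parallel max-min ant system (MMAS) so that it finds the globally optimal QEP more quickly. Finally, simulation experiments are carried out for different numbers of relations. The results show that the FSS-based PGA-MMAS searches for the optimal QEP more efficiently than the original GA and than the FSS-based GA, MMAS, and GA-MMAS. Verification in a practical engineering application shows that the high-quality QEPs found by the proposed algorithm can improve the efficiency of multi-relation queries in distributed databases.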

4.

Syllable based text to speech synthesis system using auto associative neural network prosody prediction (Total citations: 1; self-citations: 0; citations by others: 1)

Sudhakar Sangeetha Sekar Jothilakshmi 《International Journal of Speech Technology》2014,17(2):91-98

This paper presents the design and development of an Auto Associative Neural Network (AANN) based unrestricted prosodic information synthesizer. An unrestricted text-to-speech (TTS) system is capable of synthesizing speech from different domains with improved quality. This paper deals with a corpus-driven text-to-speech system based on the concatenative synthesis approach. Concatenative speech synthesis involves concatenating basic units to synthesize intelligible, natural-sounding speech. A corpus-based method (unit selection) uses a large inventory from which units are selected and concatenated. Prosody prediction is performed with a five-layer auto associative neural network, which helps improve the quality of speech synthesis. Syllables are used as the basic units of the speech synthesis database. The database consisting of the units along with their annotated information is called the annotated speech corpus. A clustering technique is applied to the annotated speech corpus to provide a way of selecting the appropriate unit for concatenation, based on the lowest total join cost of the speech unit. Discontinuities at unit boundaries are reduced using the mel-LPC smoothing technique. Experiments were carried out for the Dravidian language Tamil, and the results demonstrate the improved intelligibility and naturalness of the proposed method. The proposed system is applicable to all languages if the syllabification rules are changed.

5.

A New Korean Corpus-Based Text-to-Speech System

Sanghun Kim Youngjik Lee Keikichi Hirose 《International Journal of Speech Technology》2002,5(2):105-116

This paper describes a new Korean Text-to-Speech (TTS) system based on a large speech corpus. Conventional concatenative TTS systems still produce machine-like synthetic speech. The poor naturalness is caused by excessive prosodic modification using a small speech database. To cope with this problem, we utilized a dynamic unit selection method based on a large speech database without prosodic modification. The proposed TTS system adopts triphones as synthesis units. We designed a new sentence set maximizing phonetic or prosodic coverage of Korean triphones. All the utterances were segmented automatically into phonemes using a speech recognizer. With the segmented phonemes, we achieved a synthesis unit cost of zero if two synthesis units were placed consecutively in an utterance. This reduces the number of concatenating points that may occur due to concatenating mismatches. In this paper, we present data concerning the realization of major prosodic variations through a consideration of prosodic phrase break strength. The phrase break was divided into four kinds of strength based on pause length. Using phrase break strength, triphones were further classified to reflect major prosodic variations. To predict phrase break strength on texts, we adopted an HMM-like Part-of-Speech (POS) sequence model. The performance of the model showed 73.5% accuracy for 4-level break strength prediction. For unit selection, a Viterbi beam search was performed to find the most appropriate triphone sequence, which has the minimum continuation cost of prosody and spectrum at concatenating boundaries. From the informal listening test, we found that the proposed Korean corpus-based TTS system showed better naturalness than the conventional demisyllable-based one.
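The search step described in this abstract can be sketched as a small dynamic program. This is an illustration under stated assumptions, not the paper's implementation: it runs a full Viterbi search without beam pruning, identifies each unit by a hypothetical `(utterance_id, index)` pair, and uses a made-up join cost that is zero only when two units were adjacent in the same recorded utterance.

```python
def select_units(candidates, target_cost, join_cost):
    """Viterbi search for the minimum-cost unit sequence.

    candidates[t] lists the candidate units for target position t.
    Returns (best_path, total_cost).
    """
    # best[t][u] = (cheapest cost of any path ending in u at position t, predecessor)
    best = [{u: (target_cost(0, u), None) for u in candidates[0]}]
    for t in range(1, len(candidates)):
        layer = {}
        for u in candidates[t]:
            prev, cost = min(
                ((p, c + join_cost(p, u)) for p, (c, _) in best[t - 1].items()),
                key=lambda pc: pc[1],
            )
            layer[u] = (cost + target_cost(t, u), prev)
        best.append(layer)
    # Backtrack from the cheapest final unit.
    u = min(best[-1], key=lambda k: best[-1][k][0])
    total = best[-1][u][0]
    path = [u]
    for t in range(len(candidates) - 1, 0, -1):
        u = best[t][u][1]
        path.append(u)
    return path[::-1], total

def adjacency_join_cost(left, right):
    """Assumed join cost: 0 if the units are consecutive in one utterance, else 1."""
    same_utterance = left[0] == right[0]
    return 0.0 if same_utterance and right[1] == left[1] + 1 else 1.0

# Hypothetical candidates; the target cost is held at zero to isolate the join cost.
candidates = [
    [("u1", 0), ("u2", 5)],
    [("u1", 1), ("u3", 2)],
    [("u3", 3), ("u1", 2)],
]
path, total = select_units(candidates, lambda t, u: 0.0, adjacency_join_cost)
# path == [("u1", 0), ("u1", 1), ("u1", 2)], total == 0.0
```

The zero-cost path is the one that reuses three consecutive units from utterance `u1`, which mirrors the abstract's point that consecutive corpus units avoid concatenation points. A beam search would additionally keep only the cheapest few entries of each `layer`.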

6.

Subjective evaluation of join cost and smoothing methods for unit selection speech synthesis

Vepa J. King S. 《IEEE transactions on audio, speech, and language processing》2006,14(5):1763-1771

In unit selection-based concatenative speech synthesis, join cost (also known as concatenation cost), which measures how well two units can be joined together, is one of the main criteria for selecting appropriate units from the inventory. Usually, some form of local parameter smoothing is also needed to disguise the remaining discontinuities. This paper presents a subjective evaluation of three join cost functions and three smoothing methods. We also describe the design and performance of a listening test. The three join cost functions were taken from our previous study, where we proposed join cost functions derived from spectral distances, which have good correlations with perceptual scores obtained for a range of concatenation discontinuities. This evaluation allows us to further validate their ability to predict concatenation discontinuities. The units for synthesis stimuli are obtained from a state-of-the-art unit selection text-to-speech system: rVoice from Rhetorical Systems Ltd. In this paper, we report listeners' preferences for each join cost in combination with each smoothing method.
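As a rough illustration of a join cost derived from a spectral distance, the sketch below takes the Euclidean distance between the feature vectors of the two frames that meet at the join. The abstract does not say which three distances were evaluated, so the Euclidean choice, the function name `join_cost`, and the toy feature values are assumptions.

```python
import math

def join_cost(left_frame, right_frame):
    """Euclidean distance between the boundary frames of two units.

    left_frame is the last frame of the left unit and right_frame the first
    frame of the right unit, each a vector of spectral features.
    """
    if len(left_frame) != len(right_frame):
        raise ValueError("boundary frames must have the same dimensionality")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(left_frame, right_frame)))

# Hypothetical boundary frames: identical spectra join at zero cost.
smooth = join_cost([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # 0.0
rough = join_cost([0.0, 0.0], [3.0, 4.0])              # 5.0
```

A larger value predicts a more audible discontinuity; smoothing would then be applied around the joins that remain.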

7.

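Unit selection speech synthesis method incorporating automatic error detection

Sun Xiaohui, Ling Zhenhua, Dai Lirong 《Journal of Data Acquisition and Processing》2016,31(2):385-392

A unit selection speech synthesis method incorporating automatic error detection is proposed. The method aims to design unit selection criteria that are more consistent with subjective listening perception, so as to improve the naturalness of synthesized speech. First, a crowdsourcing web platform is used to collect listeners' subjective evaluations of synthesized speech quickly and in large quantities, replacing the traditional approach of collecting such evaluations from experts with linguistic knowledge. Then, based on these subjective evaluation data, features such as syllable duration, unit cost, and acoustic parameter distance are extracted from the corresponding speech to build a synthesis error detector based on a support vector machine. At synthesis time, the detector is used to rescore the N paths output by conventional unit selection in order to determine the optimal unit sequence. Preference listening test results show that the method effectively improves the naturalness of synthesized speech.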
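The rescoring step can be sketched as follows. This is only an illustration: the abstract's detector is a support vector machine trained on crowdsourced judgments, whereas the stand-in below is a hand-written linear score, and the names `rescore_nbest`, `toy_error_score`, and the feature values are hypothetical.

```python
def rescore_nbest(nbest, error_score):
    """Pick the candidate path that the detector scores as least error-prone."""
    return min(nbest, key=error_score)

def toy_error_score(path):
    """Stand-in for a trained detector: higher means more likely a synthesis error."""
    return 0.5 * path["unit_cost"] + 2.0 * path["acoustic_distance"]

# Hypothetical N-best list from a conventional unit selection search (N = 3).
nbest = [
    {"units": ["a1", "b4", "c2"], "unit_cost": 1.0, "acoustic_distance": 0.9},
    {"units": ["a3", "b4", "c7"], "unit_cost": 1.4, "acoustic_distance": 0.2},
    {"units": ["a1", "b2", "c2"], "unit_cost": 1.2, "acoustic_distance": 0.6},
]
best = rescore_nbest(nbest, toy_error_score)
# best["units"] == ["a3", "b4", "c7"]
```

The first path has the lowest unit cost, yet the second wins after rescoring, which is the point of adding a detector on top of the conventional cost.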

8.

Variable-Length Unit Selection in TTS Using Structural Syntactic Cost

Wu C.-H. Hsia C.-C. Chen J.-F. Wang J.-F. 《IEEE transactions on audio, speech, and language processing》2007,15(4):1227-1235

This paper presents a variable-length unit selection scheme based on syntactic cost to select text-to-speech (TTS) synthesis units. The syntactic structure of a sentence is derived from a probabilistic context-free grammar (PCFG), and represented as a syntactic vector. The syntactic difference between target and candidate units (words or phrases) is estimated by the cosine measure with the inside probability of PCFG acting as a weight. Latent semantic analysis (LSA) is applied to reduce the dimensionality of the syntactic vectors. The dynamic programming algorithm is adopted to obtain a concatenated unit sequence with minimum cost. A syntactic property-rich speech database is designed and collected as the unit inventory. Several experiments with statistical testing are conducted to assess the quality of the synthetic speech as perceived by human subjects. The proposed method outperforms a synthesizer that does not consider syntactic properties. The structural syntax estimates the substitution cost better than the acoustic features alone.

9.

Heuristic search through islands

《Artificial Intelligence》1986,29(3):339-347

A heuristic search strategy via islands is suggested to significantly decrease the number of nodes expanded. Algorithm I, which searches through a set of island nodes (“island set”), is presented assuming that the island set contains at least one node on an optimal cost path. This algorithm is shown to be admissible and expands no more nodes than A*. For cases where the island set does not contain an optimal cost path (or any path), Algorithm I', a modification of Algorithm I, is suggested. This algorithm ensures a suboptimal cost path (which may be optimal) and in extreme cases falls back to A*.

10.

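Research on a Chinese speech synthesis system based on a statistical prosody model (Total citations: 2; self-citations: 4; citations by others: 2)

Tao Jianhua, Zhao Sheng, Cai Lianhong 《Journal of Chinese Information Processing》2002,16(1):2-7

This paper discusses an approach that uses statistical models to analyze the Chinese prosodic hierarchy and to model prosody, and on this basis builds a Chinese speech synthesis system. It also describes in detail the construction of the prosodic cost function and an algorithm for training its parameters automatically. In addition, the paper analyzes how interactions among prosodic features affect the selection of syllable units, and finally implements a syllable unit selection model for Chinese speech synthesis in continuous speech. Tests show that the proposed statistical-model-based approach to prosodic hierarchy analysis and prosody modeling can be applied well to building a Chinese speech synthesis system and gives the synthesized speech good naturalness.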

11.

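QoS-aware optimal service selection combining semantic and transactional properties

Yang Wanchun, Zhang Chenxi, Mu Bin 《Journal of Computer Applications》2016,36(8):2207-2212

Service selection that is aware of service level agreement (SLA) levels is an NP-hard problem. To address the dimension and granularity issues in service selection, a quality of service (QoS) aware optimal service selection model combining semantic and transactional properties is proposed. The model optimizes service selection along three dimensions (semantic link matching degree, QoS, and transactions) and includes an encoding strategy that supports multiple granularities. To address the high time complexity of service selection, a hybrid optimization algorithm combining clonal selection with a genetic algorithm is proposed. The algorithm first uses a dynamic fitness function to eliminate, generation by generation, individuals that violate the constraints. Second, it assigns priorities to the transactional properties and, based on these priorities, designs knowledge-based heuristic crossover and mutation operators to ensure that individuals satisfy the transactional requirements. Finally, it applies clonal selection to elite individuals within the genetic algorithm to strengthen the search for the optimal solution. In simulation experiments, the algorithm outperforms the genetic algorithm in both the accuracy and the success rate of service selection; its running time is slightly higher than that of the genetic algorithm but far lower than that of exhaustive search. The experimental results show that the proposed algorithm can guarantee the quality of service selection at a small time cost.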

12.

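English speech synthesis with concatenation unit selection based on CART decision trees

Pei Dingyu, Chai Peiqi, Zeng Lingping 《Computer Engineering》2006,32(3):223-225

Against the background of developing an English text-to-speech system, English speech is synthesized using a concatenative synthesis method based on a large corpus. Given that English is polysyllabic and has an unlimited vocabulary, three concatenation units of different lengths are used: word, syllable, and phone. The CART (classification and regression tree) decision tree method is introduced to preselect speech units in the large corpus, and a corresponding unit selection algorithm is designed. Experiments show that the method produces clear, natural synthesized speech and improves the efficiency of unit selection.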

13.

A Text-to-Speech Platform for Variable Length Optimal Unit Searching Using Perception Based Cost Functions

Minkyu Lee Daniel P. Lopresti Joseph P. Olive 《International Journal of Speech Technology》2003,6(4):347-356

In concatenative Text-to-Speech, the size of the speech corpus is closely related to synthetic speech quality. In this paper, we describe our work on a new corpus-based Bell Labs' TTS system. This encompasses large acoustic inventories with a rich set of annotations, models and data structures for representing and managing such inventories, and an optimal unit selection algorithm that accommodates a broad range of possible cost criteria. We also propose a new method for setting weights in the cost functions based on a perceptual preference test. Our results show that this approach can successfully predict human preference patterns. Synthetic speech using weights determined in this manner consistently demonstrates smoother transitions and higher voice quality than speech using manually set weights.

14.

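Application of a variable neighborhood genetic algorithm to workshop logistics scheduling

Yang Haiyan, Wang Shuying 《Computer Systems & Applications》2021,30(12):288-298

For the logistics scheduling problem in a discrete workshop with automated guided vehicles (AGVs), a multi-objective logistics scheduling optimization model is built with the objectives of minimizing the time penalty cost of logistics tasks and minimizing the total travel distance of the transport vehicles, and a Pareto-based multi-objective hybrid variable neighborhood search genetic algorithm (VNSGA-II) is designed. Built on a genetic algorithm, it achieves multi-objective optimization by evaluating population quality with the Pareto ranking and crowding distance calculation of NSGA-II. To improve the search ability and keep the algorithm from falling into local optima, an elite-preserving memory is added to protect elite individuals, and the local search ability of variable neighborhood search is exploited during the search: six random neighborhood structures are designed to suit the characteristics of the model. Adjustment strategies based on an insertion neighborhood for critical AGVs and an exchange neighborhood for critical logistics tasks are also proposed to further reduce cost. Finally, taking the logistics scheduling of a discrete workshop as a case study, the problem is solved with VNSGA-II, the elitist fast non-dominated sorting genetic algorithm II (Nondominated Sorting Genetic Algorithm II, NSGA-II), and the Strength Pareto Evolutionary Algorithm 2 (SPEA2). The results show that VNSGA-II obtains a better Pareto solution set, verifying the effectiveness and feasibility of the algorithm.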
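The Pareto comparison underlying this kind of two-objective scheduling can be sketched briefly. The code shows only the non-dominated filter for minimization, not NSGA-II's full layered sorting or crowding distance, and the objective values and the names `dominates` and `pareto_front` are hypothetical.

```python
def dominates(a, b):
    """True if solution a is no worse than b on every objective and better on one.

    Both objectives are minimized, e.g. (time penalty cost, total travel distance).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the points that no other point dominates (the first Pareto layer)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical schedules as (time penalty cost, total travel distance).
schedules = [(10, 50), (12, 40), (11, 60), (15, 35), (16, 45)]
front = pareto_front(schedules)
# front == [(10, 50), (12, 40), (15, 35)]
```

Here `(11, 60)` is dominated by `(10, 50)` and `(16, 45)` by `(12, 40)`, so three trade-off schedules remain for the decision maker.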

15.

Generation bidding strategy in a pool based electricity market using Shuffled Frog Leaping Algorithm

《Applied Soft Computing》2014

In an electricity market, generation companies need suitable bidding models to maximize their profits. Each supplier therefore bids strategically, choosing its bidding coefficients to counter the competitors' bidding strategies. In this paper, the optimal bidding strategy problem is solved using a novel algorithm based on the Shuffled Frog Leaping Algorithm (SFLA). SFLA is a memetic meta-heuristic designed to seek a global optimal solution by performing a heuristic search. It combines the benefits of the genetic-based Memetic Algorithm (MA) and the social behavior-based Particle Swarm Optimization (PSO). As a result, it searches more precisely, which avoids premature convergence and the need to select operators. The proposed method therefore overcomes the shortcomings of the Genetic Algorithm (GA) and PSO, namely operator selection and premature convergence. An important merit of the proposed SFLA is its faster convergence. The proposed method is verified numerically through computer simulations on the IEEE 30-bus system with 6 suppliers and a practical 75-bus Indian system with 15 suppliers. The results show that SFLA takes less computational time and produces higher profits than Fuzzy Adaptive PSO (FAPSO), PSO, and GA.

16.

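Variable-length unit selection based on stable-segment boundaries in speech synthesis

Wang Xin, Wu Zhiyong, Cai Lianhong 《Journal of Software》2014,25(S2):63-69

Speech synthesis is an important medium in human-machine spoken interaction, and unit selection algorithms have long been a research focus in concatenative speech synthesis. Building on the conventional cost-function-based unit selection algorithm for concatenative synthesis, this work applies the stable-segment boundary model of diphones to words and syllables, and then uses a hierarchical variable-length unit selection algorithm over the three unit models to choose the best unit sequence from the corpus and concatenate it into the final speech. On the one hand, the algorithm uses a hierarchical, unified variable-length selection strategy to select, as far as possible, larger units with better prosodic properties and acoustic continuity, which significantly reduces the number of concatenation points and absorbs into larger units those concatenation points where coarticulation or segmentation errors might occur. On the other hand, it changes the conventional boundary type of concatenation units through stable-segment segmentation, making full use of the good concatenation properties of diphone stable-segment boundaries and thereby improving the continuity and naturalness of the synthesized speech. Evaluation results show that the method yields a significant improvement in synthesis quality over conventional diphone concatenative synthesis.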

17.

Code generation based on formal BURS theory and heuristic search

A. Nymeyer J.-P. Katoen 《Acta Informatica》1997,34(8):597-635

BURS theory provides a powerful mechanism to efficiently generate pattern matches in a given expression tree. BURS, which stands for bottom-up rewrite system, is based on term rewrite systems, to which costs are added. We formalise the underlying theory, and derive an algorithm that computes all pattern matches. This algorithm terminates if the term rewrite system is finite. We couple this algorithm with the well-known search algorithm A* that carries out pattern selection. The search algorithm is directed by a cost heuristic that estimates the minimum cost of code that has yet to be generated. The advantage of using a search algorithm is that we need to compute only those costs that may be part of an optimal rewrite sequence (and not the costs of all possible rewrite sequences as in dynamic programming). A system that implements the algorithms presented in this work has been built. Received: 20 November 1995 / 26 June 1996

18.

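Application of a discrete firefly algorithm to high-speed train operation adjustment

Duan Shaonan, Dai Shenghua 《Computer Engineering and Applications》2018,54(15):209-213

Train operation adjustment is a special kind of NP-complete problem: because of its many constraints, huge search space, and narrow range of feasible solutions, the optimal solution is hard to obtain. In view of the characteristics of high-speed train operation adjustment, a discrete firefly algorithm (DFA) tailored to the practical problem is proposed, based on the firefly algorithm (FA), a representative and promising intelligent algorithm. To increase the diversity of the firefly swarm and keep the algorithm from falling into local optima, a perturbation mechanism based on variable neighborhood search is adopted. The algorithm is applied to the high-speed train operation adjustment problem, and a comparative analysis on test cases shows that the adjustment plan computed by the discrete firefly algorithm is better than that of an ordinary heuristic algorithm.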

19.

Clonal Strategy Algorithm Based on the Immune Memory (Total citations: 4; self-citations: 0; citations by others: 4)

Ruo-Chen Liu Li-Cheng Jiao and Hai-Feng Du 《Journal of Computer Science and Technology》2005,20(5):728-734

Based on the clonal selection theory and immune memory mechanism in the natural immune system, a novel artificial immune system algorithm, Clonal Strategy Algorithm based on the Immune Memory (CSAIM), is proposed in this paper. The algorithm realizes the evolution of the antibody population and the evolution of the memory unit at the same time, and by using the clonal selection operator, global optimization can be combined with local search. According to antibody-antibody (Ab-Ab) affinity and antibody-antigen (Ab-Ag) affinity, the algorithm can adaptively allot the scales of the memory unit and the antibody population. It is proved theoretically that CSAIM converges with probability 1. Computer simulations on eight benchmark functions and one instance of the traveling salesman problem (TSP) show that CSAIM converges quickly, enhances the diversity of the population, and avoids premature convergence to some extent.

20.

Estimation of Glottal Closure Instants in Voiced Speech Using the DYPSA Algorithm

Patrick A. Naylor Anastasis Kounoudes Jon Gudnason Mike Brookes 《IEEE transactions on audio, speech, and language processing》2007,15(1):34-43

We present the Dynamic Programming Projected Phase-Slope Algorithm (DYPSA) for automatic estimation of glottal closure instants (GCIs) in voiced speech. Accurate estimation of GCIs is an important tool that can be applied to a wide range of speech processing tasks including speech analysis, synthesis and coding. DYPSA is automatic and operates using the speech signal alone without the need for an EGG signal. The algorithm employs the phase-slope function and a novel phase-slope projection technique for estimating GCI candidates from the speech signal. The most likely candidates are then selected using a dynamic programming technique to minimize a cost function that we define. We review and evaluate three existing methods of GCI estimation and compare the new DYPSA algorithm to them. Results are presented for the APLAWD and SAM databases, for which 95.7% and 93.1% of GCIs are correctly identified, respectively.