1.
We develop a top-performing model for automatic, accurate, and language-independent prediction of sentence-level statistical machine translation (SMT) quality, with or without looking at the translation outputs. We derive various feature functions measuring the closeness of a given test sentence to the training data and the difficulty of translating the sentence. We describe mono feature functions, which are based on statistics of only one side of the parallel training corpora, and duo feature functions, which incorporate statistics involving both the source and target sides of the training data. Overall, we describe novel, language-independent, and SMT-system-extrinsic features for predicting SMT performance, which also rank high in feature-ranking evaluations. We experiment with different learning settings, with or without looking at the translations, which helps differentiate the contribution of different feature sets. We apply partial least squares and feature subset selection, both of which improve the results, and we present a ranking of the top features selected for each learning setting, providing an exhaustive analysis of the extrinsic features used. We show that by looking only at the test source sentences, without using the translation outputs at all, we can achieve better performance than a baseline system using features dependent on the SMT model that generated the translations. Furthermore, when also looking at the translation outputs, our prediction system achieves the 2nd best performance overall according to the official results of the quality estimation task (QET) challenge. Our representation and features achieve the top performance in QET among the models using the SVR learning model.
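As a rough illustration of the learning setting described above (sentence-level QE as regression with an SVR), here is a minimal sketch; the feature matrix, score range, and hyper-parameters are placeholders rather than the paper's mono/duo features or tuned settings.

```python
# Hypothetical sketch: sentence-level quality estimation with SVR.
# The features and scores below are random placeholders, not the paper's
# mono/duo feature functions or real human judgements.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n_train, n_test, n_feats = 800, 200, 17          # e.g., a 17-feature baseline QE set
X_train = rng.normal(size=(n_train, n_feats))    # per-sentence feature vectors
y_train = rng.uniform(1, 5, size=n_train)        # sentence-level quality scores (e.g., 1-5)
X_test = rng.normal(size=(n_test, n_feats))
y_test = rng.uniform(1, 5, size=n_test)

# Scale features, then fit an epsilon-SVR with an RBF kernel.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X_train, y_train)
pred = model.predict(X_test)
print("MAE:", mean_absolute_error(y_test, pred))
```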
2.
Vidal E. Casacuberta F. Rodriguez L. Civera J. Hinarejos C.D.M. 《IEEE transactions on audio, speech, and language processing》2006,14(3):941-951
Current machine translation systems are far from perfect. However, such systems can be used in computer-assisted translation to increase the productivity of the (human) translation process. The idea is to use a text-to-text translation system to produce portions of target-language text that can be accepted or amended by a human translator using text or speech. These user-validated portions are then used by the text-to-text translation system to produce further, hopefully improved, suggestions. There are different alternatives for using speech in a computer-assisted translation system, from pure dictated translation to the simple determination of acceptable partial translations by reading parts of the suggestions made by the system. In all cases, information from the text to be translated can be used to constrain the speech decoding search space. While pure dictation seems to be among the most attractive settings, perfect speech decoding unfortunately does not seem possible with current speech processing technology, and human error correction would still be required. Therefore, approaches that allow for higher speech recognition accuracy by using increasingly constrained models in the speech recognition process are explored here. All these approaches are presented under the statistical framework. Empirical results support the potential usefulness of using speech within the computer-assisted translation paradigm.
3.
Litian Sun Toshihiko Yamasaki Kiyoharu Aizawa 《Multimedia Tools and Applications》2018,77(5):5189-5213
The amount of visual data available on the Web is growing explosively, and it is becoming increasingly important to explore methods for automatically estimating the quality of this content in a manner that is consistent with human aesthetic perception. The key to this challenging problem is to design an appropriate set of features to extract the aesthetic properties from content. Most previous studies designed sets of aesthetic features based on photographic criteria, which were unavoidably limited to specific examples and lacked an interpretation grounded in the mechanism of human aesthetic perception. According to psychological theory, visual complexity is an important property of a stimulus, because it directly influences the viewer's arousal level, which is believed to be closely related to aesthetic perception. In this study, we propose an alternative set of features for aesthetic estimation based on a visual complexity principle. We extracted the visual complexity properties of an input image in terms of its composition, shape, and distribution. In addition, we demonstrated that the proposed features are consistent with human perception of complexity on our visual complexity dataset. Next, we employed these features for photo-aesthetic quality estimation using a large-scale dataset. Various experiments were conducted under different conditions, and comparisons with state-of-the-art methods show that the proposed visual complexity features outperform photography-rule-based features and even perform better than deep features.
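A toy sketch of how simple visual complexity cues can be computed from an image; the edge-density and compressibility measures below are illustrative stand-ins, not the composition, shape, and distribution features proposed in the paper.

```python
# Toy illustration of image "visual complexity" cues (not the paper's features):
# edge density from gradient magnitudes and a compressibility ratio from zlib.
import zlib
import numpy as np

rng = np.random.default_rng(9)
img = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)   # placeholder grayscale image

gy, gx = np.gradient(img.astype(float))
edge_density = float(np.mean(np.hypot(gx, gy) > 30))           # fraction of strong-gradient pixels

raw = img.tobytes()
compress_ratio = len(zlib.compress(raw, 9)) / len(raw)         # closer to 1 = harder to compress

print(f"edge density: {edge_density:.3f}, compression ratio: {compress_ratio:.3f}")
```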
4.
Quality estimation of speech is essential for monitoring and maintaining the quality of service at different nodes of modern telecommunication networks. It is also required when selecting codecs in speech communication systems. Non-intrusive speech quality evaluation does not require the original clean speech signal as a reference, and is therefore valuable for evaluating speech quality at any node of the communication network. In this paper, non-intrusive speech quality assessment of narrowband speech is performed by Gaussian Mixture Model (GMM) training using several combinations of auditory perception and speech production features, which include principal components of Lyon’s auditory model features, MFCC, LSF, and their first and second differences. Results are obtained and compared for several combinations of auditory features on three sets of databases. The results are also compared with ITU-T Recommendation P.563 for non-intrusive speech quality assessment. It is found that many combinations of these feature sets outperform the ITU-T P.563 Recommendation under the test conditions.
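The mapping from acoustic features to a quality score via a trained GMM can be sketched as generic joint-GMM regression; the placeholder features and MOS labels below are assumptions, not the paper's actual training data or procedure.

```python
# Hypothetical sketch of GMM-based quality mapping: fit a joint GMM over
# [acoustic features, MOS] and predict MOS for new features via the conditional
# expectation E[MOS | features]. Feature extraction (Lyon's auditory model,
# MFCC, LSF, deltas) is assumed to have happened upstream.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
d = 24                                   # dimensionality of the acoustic features
X = rng.normal(size=(2000, d))           # per-utterance feature vectors (placeholders)
y = rng.uniform(1.0, 4.5, size=2000)     # subjective MOS labels (placeholders)

gmm = GaussianMixture(n_components=8, covariance_type="full", random_state=0)
gmm.fit(np.hstack([X, y[:, None]]))      # joint model over features and MOS

def predict_mos(x):
    """MMSE estimate of MOS for feature vector x under the joint GMM."""
    mu_x, mu_y = gmm.means_[:, :d], gmm.means_[:, d]
    S_xx, S_yx = gmm.covariances_[:, :d, :d], gmm.covariances_[:, d, :d]
    # Responsibility of each component for x, using the marginal over features.
    log_r = np.array([np.log(w) + multivariate_normal.logpdf(x, m, c)
                      for w, m, c in zip(gmm.weights_, mu_x, S_xx)])
    r = np.exp(log_r - log_r.max())
    r /= r.sum()
    # Per-component conditional mean of MOS, mixed by responsibility.
    cond = np.array([mu_y[k] + S_yx[k] @ np.linalg.solve(S_xx[k], x - mu_x[k])
                     for k in range(gmm.n_components)])
    return float(r @ cond)

print("predicted MOS:", round(predict_mos(X[0]), 2))
```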
5.
Jesús González-Rubio J. Ramón Navarro-Cerdán Francisco Casacuberta 《Machine Translation》2013,27(3-4):281-301
Quality estimation (QE) for machine translation is usually addressed as a regression problem where a learning model is used to predict a quality score from a (usually highly redundant) set of features that represent the translation. This redundancy hinders model learning and thus penalizes the performance of quality estimation systems. We propose different dimensionality reduction methods based on partial least squares regression to overcome this problem, and compare them against several reduction methods previously used in the QE literature. Moreover, we study how the use of such methods influences the performance of different learning models. Experiments carried out on the English-Spanish WMT12 QE task showed that it is possible to improve prediction accuracy while significantly reducing the size of the feature sets.
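A minimal sketch of PLS-based dimensionality reduction for a redundant QE feature set, assuming synthetic features and scores; the component count and the downstream learner are illustrative, not the paper's configuration.

```python
# Minimal sketch of PLS-based dimensionality reduction before regression,
# assuming a generic (redundant) feature matrix X and quality scores y.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(1500, 80))                                # highly redundant feature set
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=1500)    # synthetic quality target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
pls = PLSRegression(n_components=10)       # project 80 features onto 10 latent components
pls.fit(X_tr, y_tr)
print("MAE:", mean_absolute_error(y_te, pls.predict(X_te).ravel()))

# The latent scores can also feed a separate learner (e.g., an SVR):
X_tr_latent = pls.transform(X_tr)
X_te_latent = pls.transform(X_te)
```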
6.
Most evaluation metrics for machine translation (MT) require reference translations for each sentence in order to produce a score reflecting certain aspects of its quality. The de facto metrics, BLEU and NIST, are known to have good correlation with human evaluation at the corpus level, but this is not the case at the segment level. As an attempt to overcome these two limitations, we address the problem of evaluating the quality of MT as a prediction task, where reference-independent features are extracted from the input sentences and their translations, and a quality score is obtained based on models produced from training data. We show that this approach yields better correlation with human evaluation than commonly used metrics, even with models trained on different MT systems, language pairs, and text domains.
7.
Adil Raja R. M. A. Azad Colin Flanagan Conor Ryan 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2011,15(1):89-94
Estimating the quality of Voice over Internet Protocol (VoIP) speech as perceived by humans is considered a formidable task, partly because of the relatively large number of variables involved as determinants of quality. Moreover, discerning the significance of one variable over another is difficult. In this paper a novel approach based on genetic programming (GP) is presented, which maps the effect of network traffic parameters onto listeners' perception of speech quality. The ITU-T Recommendation P.862 (PESQ) algorithm is used as the reference model in this research. The GP-discovered models provide effective VoIP quality estimation and are highly correlated with ITU-T Recommendation P.862 (PESQ). They also outperform ITU-T Recommendation P.563 in estimating the effect that packet loss has on speech quality. The GP-discovered models prove suited to real-time and in vivo evaluation of VoIP calls, and they are deployable on a wide variety of hardware platforms.
8.
Watanabe S. Sako A. Nakamura A. 《IEEE transactions on audio, speech, and language processing》2006,14(3):855-872
We describe the automatic determination of a large and complicated acoustic model for speech recognition using variational Bayesian estimation and clustering (VBEC). We propose an efficient method for decision tree clustering based on a Gaussian mixture model (GMM) and an efficient model search algorithm for finding an appropriate acoustic model topology within the VBEC framework. GMM-based decision tree clustering for triphone HMM states features a novel approach designed to reduce the overly large number of computations to a practical level by utilizing the statistics of monophone hidden Markov model states. The model search algorithm also reduces the search space by utilizing the characteristics of the acoustic model. The experimental results confirmed that VBEC automatically and rapidly yielded an optimum model topology with the highest performance.
9.
Iosif Mporas Todor Ganchev Nikos Fakotakis 《International Journal of Speech Technology》2008,11(2):73-85
In this paper we propose a method for improving the segmentation of speech waveforms into phonetic units. The proposed method is based on the well-known Viterbi time-alignment algorithm and utilizes the phonetic boundary predictions from multiple speech parameterization techniques. Specifically, we use the phone-transition position prediction that is most appropriate for the boundary type as the initial point from which to start Viterbi time-alignment for predicting the successor phonetic boundary. The proposed method was evaluated on the TIMIT database using several Fourier-based and wavelet-based speech parameterization algorithms that are well known in the area of speech processing. For a tolerance of 20 milliseconds, the experimental results indicated an improvement in absolute segmentation accuracy of approximately 0.70% compared to the baseline speech segmentation scheme.
10.
《Advanced Engineering Informatics》2015,29(3):680-695
Stockpile blending has long been considered an effective method for controlling the quality and maintaining the grade consistency of delivered bulk material, such as iron ore. However, major challenges remain in predicting the quality of a stockpile during blending (stacking and reclaiming) operations, because the chemical composition of the ore body is not always available during blending. Consequently, the performance of current stockpile management systems falls short of expectations. This paper details an innovative modelling approach to estimate the quality of a stockpile during the blending process. The geometric model created from laser scanning data is capable of recording the dynamic shapes of the stockpile using mathematical equations. The quality of the stockpile can therefore be calculated with a high degree of accuracy once the chemical analysis is completed, yielding a quality-embedded geometric model. Furthermore, this geometric model is associated with the kinematic model of a Bucket Wheel Reclaimer (BWR) to achieve precise automatic control of reclaiming operations. The link between these two models also allows the quality of every single cut made by the BWR to be calculated in advance. Using the calculation results in conjunction with precise machine control, the output grade can be predicted, planned and controlled, which will optimise the efficiency and effectiveness of stockpile blending.
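One way to picture a quality-embedded geometric model is a voxelised stockpile in which each cell carries tonnage and an assayed grade, so the grade of a planned reclaimer cut can be computed in advance; the sketch below uses a simple slab-shaped cut and synthetic assay values, not the paper's laser-scan geometry or BWR kinematics.

```python
# Illustrative sketch (not the paper's model): represent a stockpile as a voxel
# grid in which each voxel stores tonnage and an assayed grade, then predict the
# average grade of one reclaimer cut as the tonnage-weighted mean over the
# voxels the cut removes. The cut geometry here is a simple slab for brevity.
import numpy as np

rng = np.random.default_rng(3)
nx, ny, nz = 200, 40, 20                  # voxels along length, width, height
tonnage = np.full((nx, ny, nz), 2.5)      # tonnes per voxel (placeholder density)
fe_grade = 58.0 + rng.normal(scale=1.2, size=(nx, ny, nz))   # % Fe per voxel

def cut_grade(x_start, x_end, z_top):
    """Tonnage-weighted grade of a slab cut between two bench positions."""
    t = tonnage[x_start:x_end, :, :z_top]
    g = fe_grade[x_start:x_end, :, :z_top]
    return float((t * g).sum() / t.sum())

# Predict the grade of successive cuts before the bucket wheel reclaimer takes them.
for i in range(0, 200, 50):
    print(f"cut {i:3d}-{i + 50:3d}: {cut_grade(i, i + 50, 20):.2f} % Fe")
```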
11.
This paper presents a new speech enhancement system that works in the wavelet domain. The core of the system is an improved WaveShrink module. First, different parameters of WaveShrink are studied; then, based on the features of the speech signal, an improved wavelet-based speech enhancement system is proposed. The system uses a novel thresholding algorithm and introduces a new method for threshold selection. Moreover, the efficiency of the system has been increased by selecting more suitable parameters for voiced, unvoiced and silence regions separately. The proposed system has been evaluated on different sentences under various noise conditions. The results show an appreciable improvement in the performance of the system in comparison with similar approaches.
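A generic wavelet-shrinkage denoising loop, sketched here for orientation only; it uses a standard universal threshold on a toy signal rather than the paper's improved thresholding and region-dependent (voiced/unvoiced/silence) parameters.

```python
# Generic wavelet-shrinkage denoising sketch (not the paper's improved WaveShrink):
# decompose the noisy signal, soft-threshold the detail coefficients with a
# universal threshold estimated from the finest scale, and reconstruct.
import numpy as np
import pywt

rng = np.random.default_rng(4)
fs = 8000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 220 * t) * np.hanning(fs)          # toy "speech" signal
noisy = clean + 0.2 * rng.normal(size=fs)

coeffs = pywt.wavedec(noisy, "db8", level=5)
sigma = np.median(np.abs(coeffs[-1])) / 0.6745                 # noise estimate from finest details
thr = sigma * np.sqrt(2 * np.log(len(noisy)))                  # universal threshold
denoised_coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
denoised = pywt.waverec(denoised_coeffs, "db8")[: len(noisy)]

print("input SNR:", 10 * np.log10(np.sum(clean**2) / np.sum((noisy - clean) ** 2)))
print("output SNR:", 10 * np.log10(np.sum(clean**2) / np.sum((denoised - clean) ** 2)))
```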
12.
To address the exposure bias and poor translation diversity of current machine translation models, we propose QR-Transformer, a Chinese-Korean neural machine translation model based on reinforcement learning and machine translation quality estimation. First, a sentence-level evaluation mechanism is introduced to guide the model so that its predictions do not converge entirely on the reference translations; second, reinforcement learning is adopted as the guiding strategy so that the model optimizes the target sequence at the sentence level; finally, monolingual corpora are incorporated during training and multi-granularity data preprocessing is performed to alleviate data sparsity. Experiments show that QR-Transformer effectively improves Chinese-Korean neural machine translation: compared with the Transformer, BLEU increases by 5.39 and the QE score drops by 5.16 in the Chinese-to-Korean direction, while BLEU increases by 2.73 and the QE score drops by 2.82 in the Korean-to-Chinese direction.
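The sentence-level reward idea can be sketched with a REINFORCE-style update, shown below under placeholder logits and a fixed reward; QR-Transformer's actual architecture, sampling strategy, and QE-based reward are not reproduced here.

```python
# Conceptual sketch of sentence-level reward guidance (REINFORCE-style), in the
# spirit of the abstract; the decoder outputs, sampling, and reward (e.g., a
# sentence BLEU or QE score minus a baseline) are placeholders.
import torch

vocab, seq_len = 1000, 12
logits = torch.randn(seq_len, vocab, requires_grad=True)      # stand-in decoder outputs
log_probs = torch.log_softmax(logits, dim=-1)

sampled = torch.multinomial(log_probs.exp(), num_samples=1).squeeze(1)   # sampled translation
seq_log_prob = log_probs[torch.arange(seq_len), sampled].sum()

reward = 0.63                          # placeholder sentence-level reward
loss = -reward * seq_log_prob          # push up the probability of high-reward samples
loss.backward()
print("gradient norm:", logits.grad.norm().item())
```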
13.
14.
《Computer Speech and Language》2014,28(6):1287-1297
This paper proposes an efficient speech data selection technique that can identify those data that will be well recognized. Conventional confidence measure techniques can also identify well-recognized speech data. However, those techniques require a lot of computation time for speech recognition processing to estimate confidence scores. Speech data with low confidence should not go through the time-consuming recognition process since they will yield erroneous spoken documents that will eventually be rejected. The proposed technique can select the speech data that will be acceptable for speech recognition applications. It rapidly selects speech data with high prior confidence based on acoustic likelihood values and using only speech and monophone models. Experiments show that the proposed confidence estimation technique is over 50 times faster than the conventional posterior confidence measure while providing equivalent data selection performance for speech recognition and spoken document retrieval.
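An illustrative sketch of likelihood-based prior confidence for data selection, assuming placeholder MFCC frames and stand-in GMMs for the "speech" and monophone models; the paper's exact scoring and thresholds will differ.

```python
# Illustrative sketch of likelihood-based prior confidence (not the paper's exact
# procedure): score each utterance by its average frame log-likelihood under a
# small set of monophone GMMs, normalise by a background speech GMM, and keep
# only utterances whose score exceeds a threshold before full recognition.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
frames = [rng.normal(size=(rng.integers(200, 600), 13)) for _ in range(50)]  # MFCC frames per utterance

# Stand-ins for pre-trained models: a background speech GMM and per-phone GMMs.
background = GaussianMixture(n_components=4, random_state=0).fit(np.vstack(frames))
phone_gmms = [GaussianMixture(n_components=2, random_state=k).fit(np.vstack(frames)) for k in range(3)]

def prior_confidence(x):
    """Average frame log-likelihood ratio of the best phone model vs. background."""
    best = np.max([g.score_samples(x) for g in phone_gmms], axis=0)
    return float(np.mean(best - background.score_samples(x)))

threshold = 0.0
selected = [i for i, x in enumerate(frames) if prior_confidence(x) > threshold]
print(f"selected {len(selected)} of {len(frames)} utterances for full recognition")
```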
15.
16.
We perform a systematic analysis of the effectiveness of features for the problem of predicting the quality of machine translation (MT) at the sentence level. Starting from a comprehensive feature set, we apply a technique based on Gaussian processes, a Bayesian non-linear learning method, to automatically identify features leading to accurate model performance. We consider application to several datasets across different language pairs and text domains, with translations produced by various MT systems and scored for quality according to different evaluation criteria. We show that selecting features with this technique leads to significantly better performance in most datasets, as compared to using the complete feature sets or a state-of-the-art feature selection approach. In addition, we identify a small set of features which seem to perform well across most datasets.
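The feature-relevance idea can be sketched with a GP regressor that uses one RBF length-scale per feature (automatic relevance determination) and ranks features by inverse length-scale; the data and kernel settings below are synthetic assumptions, not the paper's setup.

```python
# Hedged sketch of GP-based feature relevance: fit a GP regressor with a
# per-feature RBF length-scale (ARD) and rank features by inverse length-scale.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(6)
n, d = 300, 12
X = rng.normal(size=(n, d))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.3, size=n)   # only features 0 and 3 matter

kernel = RBF(length_scale=np.ones(d)) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

length_scales = np.asarray(gp.kernel_.k1.length_scale)   # fitted per-feature length-scales
relevance = 1.0 / length_scales                           # short length-scale = relevant feature
ranking = np.argsort(-relevance)
print("features ranked by relevance:", ranking)
```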
17.
Objective: Depression is a serious mental disorder that significantly affects patients' daily life and work. Current clinical assessment of depression relies almost entirely on clinical interviews or questionnaires, and lacks systematic and effective means of mining pattern information closely related to depression. To help clinicians diagnose the severity of a patient's depression, a growing number of methods for automatic depression recognition have emerged in the field of affective computing. To effectively mine and encode the discriminative affective information contained in people's faces, this paper proposes an automatic depression recognition framework based on dynamic facial features and sparse coding. Method: For facial feature extraction, a new dynamic feature descriptor that can mine macro- and micro-structural facial information at a deep level is proposed, namely median robust local binary patterns from three orthogonal planes (MRELBP-TOP). Because the frame-level MRELBP-TOP features are high-dimensional and partly redundant, random projection (RP) is applied to reduce their dimensionality while removing redundancy and retaining the key information. In addition, to further characterize the high-level pattern information after dimensionality reduction, sparse coding (SC) is employed to abstract a compact feature representation. Finally, a support vector machine is used to estimate the degree of depression, i.e., to predict the Beck Depression Inventory-II (BDI-II) score. Results: On the AVEC2013 (the continuous audiovisual emotion and depression 2013) and AVEC2014 test sets, the root mean square error (RMSE) between the estimated and true depression scores is 9.70 and 9.22, respectively, improving recognition accuracy by 29% and 15% over the baseline algorithms. The experimental results show that the proposed method outperforms most current video-based depression recognition methods. Conclusion: This paper constructs a facial-expression-based depression recognition framework that achieves effective estimation of depression severity, and proposes the frame-level feature descriptor MRELBP-TOP, which effectively improves the accuracy of depression recognition.
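The shape of the pipeline (random projection, sparse coding, SVR) can be sketched as follows; the frame-level descriptors are random placeholders rather than MRELBP-TOP features, and the dictionary and regressor settings are illustrative.

```python
# Hedged sketch of the pipeline shape described above (random projection,
# sparse coding, SVR); inputs and labels are synthetic placeholders.
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.svm import SVR

rng = np.random.default_rng(7)
n_clips, n_frames, dim = 60, 40, 2048
frames = rng.normal(size=(n_clips, n_frames, dim))        # frame-level descriptors per video clip
bdi = rng.uniform(0, 45, size=n_clips)                    # BDI-II scores (placeholder labels)

# 1) Random projection to reduce frame-level dimensionality.
rp = GaussianRandomProjection(n_components=128, random_state=0)
proj = rp.fit_transform(frames.reshape(-1, dim)).reshape(n_clips, n_frames, -1)

# 2) Learn a dictionary and sparse-code the projected frames; pool codes per clip.
dico = MiniBatchDictionaryLearning(n_components=64, alpha=1.0, random_state=0)
codes = dico.fit_transform(proj.reshape(-1, 128)).reshape(n_clips, n_frames, -1)
clip_features = np.abs(codes).mean(axis=1)                # average pooling over frames

# 3) Regress depression severity with an SVR.
svr = SVR(kernel="rbf", C=10.0).fit(clip_features, bdi)
print("predicted BDI-II for first clip:", svr.predict(clip_features[:1]))
```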
18.
Binary translation is an important technique for porting programs as it allows binary code for one platform to execute on another. It is widely used in virtual machines and emulators. However, implementing a correct (and efficient) binary translator is still very challenging because many delicate details must be handled smartly. Manually identifying mistranslated instructions in an application program is difficult, especially when the application is large. Therefore, automatic validation tools are needed urgently to uncover hidden problems in a binary translator. We developed a new validation tool for binary translators. In our validation tool, the original binary code and the translated binary code run simultaneously. Both versions of the binary code continuously send their architecture states and the stored values, which are the values stored into memory cells, to a third process, the validator. Since most mistranslated instructions will result in wrong architecture states during execution, our validator can catch most mistranslated instructions emitted by a binary translator by comparing the corresponding architecture states. Corresponding architecture states may differ due to (1) translation errors, (2) different (but correct) memory layouts, and (3) return values of certain system calls. The need to differentiate the three sources of differences makes comparing architecture states very difficult, if not impossible. In our validator, we take special care to make memory layouts exactly the same and make the corresponding system calls always return exactly the same values in the original and in the translated binaries. Therefore, any differences in the corresponding architecture states indicate mistranslated instructions emitted by the binary translator. Besides solving the architecture-state-comparison problems, we also propose several methods to speed up the automatic validation. The first is the validation-block method, which reduces the number of validations while keeping the accuracy of instruction-level validation. The second is quick validation, which provides extremely fast validation at the expense of less accurate error information. Our validator can be applied to different binary translators. In our experiment, the validator has successfully validated programs translated by static, dynamic, and hybrid binary translators.
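The core comparison step can be pictured as follows: assuming identical memory layouts and system-call results, the validator flags any block whose architecture state or stored values differ between the two executions. This is a toy sketch, not the authors' tool.

```python
# Toy sketch of the state-comparison idea: the original and translated programs
# each report their architecture state after every validation block; the
# validator reports mismatches, which point at mistranslated instructions.
from dataclasses import dataclass, field

@dataclass
class ArchState:
    regs: dict                                     # register name -> value
    stores: list = field(default_factory=list)     # (address, value) pairs written in the block

def compare_states(block_id, original: ArchState, translated: ArchState):
    """Return a list of mismatches between the two executions for one block."""
    mismatches = []
    for reg, val in original.regs.items():
        if translated.regs.get(reg) != val:
            mismatches.append(f"block {block_id}: register {reg}: {val} != {translated.regs.get(reg)}")
    if original.stores != translated.stores:
        mismatches.append(f"block {block_id}: stored values differ")
    return mismatches

# Example: with identical layouts and system-call results assumed, any
# difference indicates a mistranslated instruction.
orig = ArchState(regs={"eax": 7, "ebx": 0}, stores=[(0x1000, 42)])
trans = ArchState(regs={"eax": 7, "ebx": 1}, stores=[(0x1000, 42)])
for msg in compare_states(3, orig, trans):
    print(msg)
```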
19.
Kikui G. Yamamoto S. Takezawa T. Sumita E. 《IEEE transactions on audio, speech, and language processing》2006,14(5):1674-1682
This paper investigates issues in preparing corpora for developing speech-to-speech translation (S2ST). It is impractical to create a broad-coverage parallel corpus only from dialog speech. An alternative approach is to have bilingual experts write conversational-style texts in the target domain, with translations. There is, however, a risk of losing fidelity to the actual utterances. This paper focuses on balancing a tradeoff between these two kinds of corpora through the analysis of two newly developed corpora in the travel domain: a bilingual parallel corpus with 420K utterances and a collection of in-domain dialogs using actual S2ST systems. We found that the first corpus is effective for covering utterances in the second corpus if complemented with a small number of utterances taken from monolingual dialogs. We also found that the characteristics of in-domain utterances become closer to those of the first corpus when more restrictive conditions and instructions are given to speakers. These results suggest the possibility of a bootstrap-style development of corpora and S2ST systems, where an initial S2ST system is developed with parallel texts and is then gradually improved with in-domain utterances collected by the system as restrictions are relaxed.
20.
Automatic speech recognition (ASR) systems follow a well-established pattern recognition approach: signal-processing-based feature extraction at the front-end and likelihood evaluation of feature vectors at the back-end. Mel-frequency cepstral coefficients (MFCCs) are the features widely used in state-of-the-art ASR systems; they are derived from the logarithmic spectral energies of the speech signal using a Mel-scale filterbank. In the filterbank analysis for MFCCs, there is no consensus on the spacing and number of filters to use under various noise conditions and applications. In this paper, we propose a novel approach that uses particle swarm optimization (PSO) and a genetic algorithm (GA) to optimize the parameters of the MFCC filterbank, such as the central and side frequencies. The experimental results show that the new front-end outperforms the conventional MFCC technique. All investigations are conducted using two separate classifiers, HMM and MLP, for Hindi vowel recognition in typical field conditions as well as in noisy environments.