Found 20 similar documents; search took 15 milliseconds.
1.
We develop a top-performing model for automatic, accurate, and language-independent prediction of sentence-level statistical machine translation (SMT) quality, with or without looking at the translation outputs. We derive various feature functions measuring the closeness of a given test sentence to the training data and the difficulty of translating the sentence. We describe mono feature functions that are based on statistics of only one side of the parallel training corpora, and duo feature functions that incorporate statistics involving both the source and target sides of the training data. Overall, we describe novel, language-independent, SMT-system-extrinsic features for predicting SMT performance, which also rank high in feature-ranking evaluations. We experiment with different learning settings, with or without looking at the translations, which helps differentiate the contribution of different feature sets. We apply partial least squares and feature subset selection, both of which improve the results, and we present a ranking of the top features selected for each learning setting, providing an exhaustive analysis of the extrinsic features used. We show that by looking only at the test source sentences, without using the translation outputs at all, we can achieve better performance than a baseline system using SMT-model-dependent features from the system that generated the translations. Furthermore, when also looking at the translation outputs, our prediction system achieves the 2nd-best performance overall according to the official results of the quality estimation task (QET) challenge. Our representation and features achieve the top performance in QET among the models using the SVR learning model.
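Closeness-to-training-data features of the kind described above can be illustrated with a simple n-gram coverage score: the fraction of a test sentence's n-grams that also occur somewhere in the training corpus. This is only a minimal sketch of the idea; the function names and toy corpus are illustrative, not taken from the paper.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def coverage_feature(sentence, corpus, n=2):
    """Fraction of the sentence's n-grams seen anywhere in the corpus.

    A high value suggests the sentence is 'close' to the training data
    and therefore likely easier to translate."""
    seen = set()
    for line in corpus:
        seen.update(ngrams(line.split(), n))
    grams = ngrams(sentence.split(), n)
    if not grams:
        return 0.0
    return sum(g in seen for g in grams) / len(grams)

corpus = ["the cat sat on the mat", "the dog sat down"]
easy = coverage_feature("the cat sat down", corpus)        # all bigrams seen
hard = coverage_feature("quantum flux capacitor", corpus)  # none seen
```

A real QE system would compute such coverage at several n-gram orders, on both source and target sides, and feed the values into the learner alongside the other features.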
2.
Litian Sun, Toshihiko Yamasaki, Kiyoharu Aizawa. Multimedia Tools and Applications, 2018, 77(5): 5189-5213
The amount of visual data available on the Web is growing explosively, and it is becoming increasingly important to explore methods for automatically estimating the quality of this content in a manner consistent with the aesthetic perceptions of humans. The key to this challenging problem is to design an appropriate set of features for extracting aesthetic properties from content. Most previous studies designed aesthetic features based on photographic criteria, which were unavoidably limited to specific examples and lacked an interpretation grounded in the mechanism of human aesthetic perception. According to psychological theory, visual complexity is an important property of a stimulus because it directly influences the viewer’s arousal level, which is believed to be closely related to aesthetic perception. In this study, we propose an alternative set of features for aesthetic estimation based on a visual-complexity principle. We extract visual-complexity properties from an input image in terms of composition, shape, and distribution. In addition, we demonstrate that the proposed features are consistent with human perception of complexity on our visual-complexity dataset. We then employ these features for photo-aesthetic quality estimation using a large-scale dataset. Experiments conducted under various conditions and comparisons with state-of-the-art methods show that the proposed visual-complexity features outperform photography-rule-based features and even deep features.
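One crude way to operationalise visual complexity, in the spirit of the psychological motivation above (though not the authors' actual feature set), is a compression-based proxy: regular images compress well, complex ones do not. A minimal sketch over raw pixel bytes:

```python
import random
import zlib

def complexity_proxy(pixels):
    """Crude visual-complexity proxy: how poorly a row-major byte image
    compresses. Near 0 for perfectly regular content, near (or above) 1
    for incompressible noise."""
    raw = bytes(pixels)
    return len(zlib.compress(raw, 9)) / max(len(raw), 1)

flat = [128] * 4096                                 # uniform image: low complexity
rng = random.Random(0)
noisy = [rng.randrange(256) for _ in range(4096)]   # noise: high complexity
```

The paper instead derives complexity from composition, shape, and distribution; this sketch only shows why a single scalar complexity measure can already separate trivially simple stimuli from busy ones.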
3.
Vidal E., Casacuberta F., Rodriguez L., Civera J., Hinarejos C.D.M. IEEE Transactions on Audio, Speech, and Language Processing, 2006, 14(3): 941-951
Current machine translation systems are far from perfect. However, such systems can be used in computer-assisted translation to increase the productivity of the (human) translation process. The idea is to use a text-to-text translation system to produce portions of target-language text that can be accepted or amended by a human translator using text or speech. These user-validated portions are then used by the text-to-text translation system to produce further, hopefully improved, suggestions. There are different alternatives for using speech in a computer-assisted translation system, from pure dictated translation to simply marking acceptable partial translations by reading parts of the suggestions made by the system. In all cases, information from the text to be translated can be used to constrain the speech-decoding search space. While pure dictation seems to be among the most attractive settings, perfect speech decoding is unfortunately not possible with current speech processing technology, and human error correction would still be required. Therefore, approaches that allow for higher speech recognition accuracy by using increasingly constrained models in the speech recognition process are explored here. All these approaches are presented within a statistical framework. Empirical results support the potential usefulness of using speech within the computer-assisted translation paradigm.
4.
Quality estimation of speech is essential for monitoring and maintaining the quality of service at different nodes of modern telecommunication networks. It is also required when selecting codecs for speech communication systems. Non-intrusive speech quality evaluation requires no original clean speech signal as a reference, and it is therefore valuable for evaluating the quality of speech at any node of the communication network. In this paper, non-intrusive quality assessment of narrowband speech is performed by training Gaussian Mixture Models (GMMs) on several combinations of auditory-perception and speech-production features, including principal components of Lyon’s auditory model features, MFCC, LSF, and their first and second differences. Results are obtained and compared for several combinations of auditory features on three sets of databases, and are also compared with ITU-T Recommendation P.563 for non-intrusive speech quality assessment. Many combinations of these feature sets are found to outperform ITU-T Recommendation P.563 under the test conditions.
5.
Jesús González-Rubio, J. Ramón Navarro-Cerdán, Francisco Casacuberta. Machine Translation, 2013, 27(3-4): 281-301
Quality estimation (QE) for machine translation is usually addressed as a regression problem, where a learning model is used to predict a quality score from a (usually highly redundant) set of features that represent the translation. This redundancy hinders model learning and thus penalizes the performance of quality estimation systems. We propose different dimensionality reduction methods based on partial least squares regression to overcome this problem, and compare them against several reduction methods previously used in the QE literature. Moreover, we study how the use of such methods influences the performance of different learning models. Experiments carried out on the English-Spanish WMT12 QE task showed that it is possible to improve prediction accuracy while significantly reducing the size of the feature sets.
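A single partial-least-squares component can be fit with a few lines of linear algebra: project the (centered) feature matrix onto a direction chosen to maximise covariance with the quality scores, then regress the scores on that projection. A minimal plain-Python sketch under that assumption; real QE systems would use a library implementation with multiple components and deflation:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pls1_one_component(X, y):
    """Fit one PLS component on centered data.

    Returns (w, q): a unit projection direction w proportional to X^T y
    (maximising covariance with y), and the regression coefficient q of
    y on the resulting scores t = X w."""
    d = len(X[0])
    w = [sum(X[i][j] * y[i] for i in range(len(X))) for j in range(d)]
    norm = dot(w, w) ** 0.5
    w = [wj / norm for wj in w]
    t = [dot(row, w) for row in X]          # latent scores
    q = dot(y, t) / dot(t, t)               # regress y on the scores
    return w, q

def predict(x, w, q):
    return dot(x, w) * q

# Two perfectly redundant features; PLS collapses them into one direction.
X = [[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]]
y = [1.0, 2.0, -1.0, -2.0]
w, q = pls1_one_component(X, y)
```

The toy data shows the point of the paper: with redundant features, one latent dimension already carries all the predictive signal.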
6.
Adil Raja, R. M. A. Azad, Colin Flanagan, Conor Ryan. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 2011, 15(1): 89-94
Estimating the quality of Voice over Internet Protocol (VoIP) as perceived by humans is considered a formidable task, partly because of the relatively large number of variables that act as determinants of quality and the difficulty of discerning the significance of one variable over another. In this paper, a novel approach based on genetic programming (GP) is presented that maps the effect of network traffic parameters onto listeners’ perception of speech quality. The ITU-T Recommendation P.862 (PESQ) algorithm is used as the reference model in this research. The models discovered by GP provide effective VoIP quality estimation and are highly correlated with ITU-T Recommendation P.862 (PESQ). They also outperform ITU-T Recommendation P.563 in estimating the effect that packet loss has on speech quality. The GP-discovered models prove suited to real-time, in vivo evaluation of VoIP calls and are deployable on a wide variety of hardware platforms.
7.
Most evaluation metrics for machine translation (MT) require reference translations for each sentence in order to produce a score reflecting certain aspects of its quality. The de facto metrics, BLEU and NIST, are known to correlate well with human evaluation at the corpus level, but this is not the case at the segment level. In an attempt to overcome these two limitations, we address the problem of evaluating MT quality as a prediction task, where reference-independent features are extracted from the input sentences and their translations, and a quality score is obtained from models produced on training data. We show that this approach yields better correlation with human evaluation than commonly used metrics, even with models trained on different MT systems, language pairs, and text domains.
8.
Watanabe S., Sako A., Nakamura A. IEEE Transactions on Audio, Speech, and Language Processing, 2006, 14(3): 855-872
We describe the automatic determination of a large and complicated acoustic model for speech recognition using variational Bayesian estimation and clustering (VBEC). We propose an efficient method for decision-tree clustering based on a Gaussian mixture model (GMM), together with an efficient model-search algorithm for finding an appropriate acoustic-model topology within the VBEC framework. GMM-based decision-tree clustering of triphone HMM states features a novel approach that reduces an overly large number of computations to a practical level by utilizing the statistics of monophone hidden Markov model states. The model-search algorithm also reduces the search space by exploiting the characteristics of the acoustic model. The experimental results confirmed that VBEC automatically and rapidly yielded an optimum model topology with the highest performance.
9.
Iosif Mporas, Todor Ganchev, Nikos Fakotakis. International Journal of Speech Technology, 2008, 11(2): 73-85
In this paper we propose a method for improving the segmentation of speech waveforms into phonetic units. The proposed method is based on the well-known Viterbi time-alignment algorithm and utilizes phonetic boundary predictions from multiple speech parameterization techniques. Specifically, we use the phone-transition position prediction that is most appropriate for the boundary type as the initial point from which Viterbi time-alignment predicts the successor phonetic boundary. The method was evaluated on the TIMIT database using several Fourier-based and wavelet-based speech parameterization algorithms that are well known in speech processing. At a tolerance of 20 milliseconds, the experimental results indicated an absolute improvement in segmentation accuracy of approximately 0.70% over the baseline speech segmentation scheme.
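The tolerance-based evaluation used above can be sketched directly: a reference phonetic boundary counts as correctly detected if some predicted boundary falls within the tolerance (20 ms here). The function and toy boundary times below are illustrative, not from the paper:

```python
def segmentation_accuracy(predicted, reference, tol=0.020):
    """Fraction of reference boundaries (in seconds) matched by at least
    one predicted boundary within +/- tol seconds."""
    hits = sum(any(abs(p - r) <= tol for p in predicted) for r in reference)
    return hits / len(reference)

ref = [0.10, 0.25, 0.40, 0.62]    # reference phone boundaries (s)
pred = [0.11, 0.26, 0.41, 0.90]   # hypothesised boundaries (s)
acc = segmentation_accuracy(pred, ref)  # 3 of 4 within 20 ms
```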
10.
This paper presents a new speech enhancement system that works in the wavelet domain. The core of the system is an improved WaveShrink module. First, the different parameters of WaveShrink are studied; then, based on the features of the speech signal, an improved wavelet-based speech enhancement system is proposed. The system uses a novel thresholding algorithm and introduces a new method for threshold selection. Moreover, the efficiency of the system has been increased by selecting more suitable parameters for voiced, unvoiced, and silence regions separately. The proposed system has been evaluated on different sentences under various noise conditions. The results show a plausible improvement in the performance of the system in comparison with similar approaches.
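The WaveShrink idea the system builds on, shrinking wavelet detail coefficients toward zero (soft thresholding), can be sketched with a one-level Haar transform. This is a generic illustration of the underlying mechanism, not the paper's improved thresholding algorithm or its threshold-selection method:

```python
def haar_forward(x):
    """One-level Haar transform: (approximation, detail) coefficient lists."""
    a = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    return a, d

def haar_inverse(a, d):
    out = []
    for ai, di in zip(a, d):
        out += [ai + di, ai - di]
    return out

def soft_threshold(coeffs, thr):
    """WaveShrink-style soft thresholding: shrink magnitudes by thr."""
    return [max(abs(c) - thr, 0.0) * (1 if c >= 0 else -1) for c in coeffs]

def denoise(x, thr):
    """Threshold the detail band, keep the approximation band intact."""
    a, d = haar_forward(x)
    return haar_inverse(a, soft_threshold(d, thr))

cleaned = denoise([1.0, 1.2, 3.0, 2.8], 0.1)  # small details removed
```

Practical systems apply this over several decomposition levels with a data-driven threshold; the paper's contribution is precisely in choosing that threshold and its parameters per voiced/unvoiced/silence region.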
11.
12.
We perform a systematic analysis of the effectiveness of features for the problem of predicting the quality of machine translation (MT) at the sentence level. Starting from a comprehensive feature set, we apply a technique based on Gaussian processes, a Bayesian non-linear learning method, to automatically identify features leading to accurate model performance. We consider application to several datasets across different language pairs and text domains, with translations produced by various MT systems and scored for quality according to different evaluation criteria. We show that selecting features with this technique leads to significantly better performance in most datasets, as compared to using the complete feature sets or a state-of-the-art feature selection approach. In addition, we identify a small set of features which seem to perform well across most datasets.
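The feature-selection mechanism behind Gaussian-process models with ARD (automatic relevance determination) kernels is that each feature gets its own learned lengthscale, and a small lengthscale marks a feature along which the output varies quickly, i.e. an influential one. A minimal sketch of the ranking step only, with hypothetical feature names and lengthscales (fitting the GP itself is omitted):

```python
def rank_features_by_ard(lengthscales, names):
    """Rank features by ARD relevance: smaller learned lengthscale means
    the target is more sensitive to that feature, so it ranks higher."""
    scored = sorted(zip(names, lengthscales), key=lambda pair: pair[1])
    return [name for name, _ in scored]

# Hypothetical lengthscales a trained ARD-kernel GP might report:
names = ["src_length", "lm_perplexity", "oov_rate"]
ranked = rank_features_by_ard([5.0, 0.3, 1.2], names)
```

Feature selection then keeps the top of this ranking and refits, which is how a comprehensive feature set can be pruned to the small high-performing subset the paper identifies.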
13.
Binary translation is an important technique for porting programs, as it allows binary code for one platform to execute on another; it is widely used in virtual machines and emulators. However, implementing a correct (and efficient) binary translator is still very challenging because many delicate details must be handled smartly. Manually identifying mistranslated instructions in an application program is difficult, especially when the application is large; therefore, automatic validation tools are urgently needed to uncover hidden problems in a binary translator. We developed a new validation tool for binary translators. In our validation tool, the original binary code and the translated binary code run simultaneously. Both versions of the binary code continuously send their architecture states and their stored values, i.e., the values stored into memory cells, to a third process, the validator. Since most mistranslated instructions will result in wrong architecture states during execution, our validator can catch most mistranslated instructions emitted by a binary translator by comparing the corresponding architecture states. Corresponding architecture states may differ due to (1) translation errors, (2) different (but correct) memory layouts, and (3) return values of certain system calls. The need to differentiate these three sources of differences makes comparing architecture states very difficult, if not impossible. In our validator, we take special care to make the memory layouts exactly the same and to make corresponding system calls return exactly the same values in the original and translated binaries. Therefore, any differences in the corresponding architecture states indicate mistranslated instructions emitted by the binary translator. Besides solving the architecture-state-comparison problems, we also propose several methods to speed up the automatic validation. The first is the validation-block method, which reduces the number of validations while keeping the accuracy of instruction-level validation. The second is quick validation, which provides extremely fast validation at the expense of less accurate error information. Our validator can be applied to different binary translators. In our experiment, the validator successfully validated programs translated by static, dynamic, and hybrid binary translators.
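The core comparison step of such a validator can be sketched as a diff over register-state snapshots, with a mask for state that has been deliberately excluded (e.g. registers holding values the tool has not yet canonicalised). Register names and values below are illustrative, not from the paper:

```python
def compare_states(original, translated, ignore=frozenset()):
    """Compare two architecture-state snapshots (register -> value dicts)
    and return the sorted names of registers that disagree.

    Registers listed in `ignore` are skipped; in the paper's design the
    ignore set can stay empty because memory layouts and system-call
    results are forced to be identical in both binaries."""
    return sorted(
        reg for reg in original
        if reg not in ignore and original[reg] != translated.get(reg)
    )

orig_state = {"eax": 7, "ebx": 0, "eip": 0x8048000}
trans_state = {"eax": 7, "ebx": 99, "eip": 0x8048000}
mismatches = compare_states(orig_state, trans_state)  # flags the bad register
```

The validation-block and quick-validation methods then control how often such comparisons are made, trading precision of the error location against speed.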
14.
Kikui G., Yamamoto S., Takezawa T., Sumita E. IEEE Transactions on Audio, Speech, and Language Processing, 2006, 14(5): 1674-1682
This paper investigates issues in preparing corpora for developing speech-to-speech translation (S2ST). It is impractical to create a broad-coverage parallel corpus from dialog speech alone. An alternative approach is to have bilingual experts write conversational-style texts in the target domain, with translations; there is, however, a risk of losing fidelity to actual utterances. This paper focuses on balancing the tradeoff between these two kinds of corpora through the analysis of two newly developed corpora in the travel domain: a bilingual parallel corpus with 420 K utterances and a collection of in-domain dialogs using actual S2ST systems. We found that the first corpus is effective for covering utterances in the second corpus if complemented with a small number of utterances taken from monolingual dialogs. We also found that the characteristics of in-domain utterances become closer to those of the first corpus when more restrictive conditions and instructions are given to speakers. These results suggest the possibility of a bootstrap style of corpus and S2ST system development, in which an initial S2ST system is developed from parallel texts and then gradually improved with in-domain utterances collected by the system as restrictions are relaxed.
15.
Jha Tulika, Kavya Ramisetty, Christopher Jabez, Arunachalam Vasan. International Journal of Speech Technology, 2022, 25(3): 707-725
Speech emotion recognition is one of the fastest growing areas of interest in the field of affective computing. Emotion detection aids...
16.
Automatic speech recognition (ASR) systems follow a well-established pattern recognition approach: signal-processing-based feature extraction at the front-end and likelihood evaluation of feature vectors at the back-end. Mel-frequency cepstral coefficients (MFCCs) are the features widely used in state-of-the-art ASR systems; they are derived from logarithmic spectral energies of the speech signal computed with a Mel-scale filterbank. In the filterbank analysis for MFCC there is no consensus on the spacing and number of filters to use across noise conditions and applications. In this paper, we propose a novel approach that uses particle swarm optimization (PSO) and a genetic algorithm (GA) to optimize the parameters of the MFCC filterbank, such as the central and side frequencies. The experimental results show that the new front-end outperforms the conventional MFCC technique. All the investigations are conducted using two separate classifiers, HMM and MLP, for Hindi vowel recognition in typical field conditions as well as in noisy environments.
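The conventional filterbank that the PSO/GA front-end would perturb places filter centers uniformly on the Mel scale. A minimal sketch of that baseline computation (the optimizer itself is omitted; the optimized front-end would treat these centers as free parameters instead):

```python
import math

def hz_to_mel(f):
    """HTK-style Mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_center_frequencies(n_filters, f_low, f_high):
    """Center frequencies (Hz) of n_filters triangular filters whose
    centers are equally spaced on the Mel scale between f_low and f_high."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

centers = mel_center_frequencies(10, 0.0, 4000.0)
```

In the proposed approach, a PSO or GA candidate would be a vector of such center (and side) frequencies, scored by the recognition accuracy of the resulting front-end.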
17.
Batista Gracieth Cavalcanti, Silva Washington Luis Santos de Oliveira, Duarte Lopes, Saotome Osamu. Multimedia Tools and Applications, 2019, 78(22): 31709-31731
This paper proposes the implementation of an Automatic Speech Recognition (ASR) process through extraction of Mel-Frequency Cepstral Coefficients (MFCCs) from...
18.
K. Sreenivasa Rao, Shashidhar G. Koolagudi, Ramu Reddy Vempada. International Journal of Speech Technology, 2013, 16(2): 143-160
In this paper, global and local prosodic features extracted from sentences, words, and syllables are proposed for speech emotion (affect) recognition. Duration, pitch, and energy values are used to represent the prosodic information for recognizing emotions from speech. Global prosodic features represent gross statistics such as the mean, minimum, maximum, standard deviation, and slope of the prosodic contours; local prosodic features represent the temporal dynamics of the prosody. Global and local prosodic features are analyzed separately and in combination at different levels for the recognition of emotions. We have also explored words and syllables at different positions (initial, middle, and final) separately, to analyze their contribution to the recognition of emotions. All the studies are carried out using a simulated Telugu emotion speech corpus (IITKGP-SESC), and the results are compared with results on the internationally known Berlin emotion speech corpus (Emo-DB). Support vector machines are used to develop the emotion recognition models. The results indicate that recognition performance using local prosodic features is better than that using global prosodic features. Words in the final position of sentences and syllables in the final position of words exhibit more emotion-discriminative information than words and syllables in other positions.
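The global prosodic statistics listed above (mean, minimum, maximum, standard deviation, slope) are straightforward to compute from a frame-wise contour. A minimal sketch with an illustrative pitch contour; the dictionary keys and toy values are not from the paper:

```python
def global_prosodic_features(contour):
    """Gross statistics of a prosodic contour (e.g. frame-wise pitch in Hz):
    mean, min, max, standard deviation, and least-squares slope per frame."""
    n = len(contour)
    mean = sum(contour) / n
    var = sum((v - mean) ** 2 for v in contour) / n
    t_mean = (n - 1) / 2
    num = sum((i - t_mean) * (v - mean) for i, v in enumerate(contour))
    den = sum((i - t_mean) ** 2 for i in range(n))
    return {
        "mean": mean,
        "min": min(contour),
        "max": max(contour),
        "std": var ** 0.5,
        "slope": num / den,   # Hz per frame, from a least-squares line fit
    }

feats = global_prosodic_features([100.0, 110.0, 120.0, 130.0])  # rising pitch
```

Local prosodic features, by contrast, would keep the frame-wise dynamics of the contour rather than collapsing it to these five numbers.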
19.