Similar Articles
1.
In this paper, a decision‐tree‐based Markov model for phrase break prediction is proposed. The model combines the ability of a decision tree to classify using non‐homogeneous features with the ability of a Markov process to model temporal break sequences. For the experiments, a text corpus tagged with parts of speech and three break‐strength levels is prepared and evaluated. A complex feature set, textual conditions, and prior knowledge are utilized, and chunking rules are applied to the search results. The proposed model reduces the error rate by about 11.6% compared with the conventional classification model.
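The pairing of a per‐juncture classifier with a Markov model over break sequences can be illustrated with Viterbi decoding. The following is a minimal sketch, not the authors' implementation: it assumes the decision tree supplies a probability distribution over the three break levels at each word juncture and uses those posteriors directly as emission scores (a common hybrid simplification). `viterbi_breaks` and all probabilities are hypothetical.

```python
import math

def viterbi_breaks(posteriors, transitions, initial):
    """Return the most likely sequence of break levels, one per word juncture.

    posteriors[t][b]  -- classifier probability of break level b at juncture t
                         (e.g. the leaf distribution of a decision tree; assumed)
    transitions[a][b] -- P(break b at t | break a at t-1), the Markov component
    initial[b]        -- probability of break level b at the first juncture
    """
    labels = range(len(initial))
    # Work in log space so long sentences do not underflow.
    score = [math.log(initial[b]) + math.log(posteriors[0][b]) for b in labels]
    backpointers = []
    for t in range(1, len(posteriors)):
        new_score, pointers = [], []
        for b in labels:
            # Best previous break level for arriving at level b.
            prev = max(labels, key=lambda a: score[a] + math.log(transitions[a][b]))
            pointers.append(prev)
            new_score.append(score[prev] + math.log(transitions[prev][b])
                             + math.log(posteriors[t][b]))
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final break level.
    best = max(labels, key=lambda b: score[b])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]
```

With uniform transitions the decoder reduces to picking the most probable break at each juncture; a "sticky" transition matrix lets the Markov component override weak local evidence.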

2.
We propose the VS3‐NET model for question answering in machine‐reading comprehension, a task that searches a given context for an appropriate answer. VS3‐NET trains latent variables for each question using variational inference, building on a model composed of simple recurrent unit‐based sentence encoding and self‐matching networks. Question types vary, and the answers depend on the type of question. To perform efficient inference and learning, we introduce neural question‐type models to approximate the prior and posterior distributions of the latent variables, and we use these approximated distributions to optimize a reparameterized variational lower bound. The context in machine‐reading comprehension usually comprises several sentences, and performance degrades as the context becomes longer. We therefore model a hierarchical structure using sentence encoding to mitigate this degradation. Experimental results show that the proposed VS3‐NET model achieves an exact‐match score of 76.8% and an F1 score of 84.5% on the SQuAD test set.
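The reparameterized lower bound rests on two standard pieces: drawing a Gaussian latent variable as a deterministic function of noise, and a closed‐form KL divergence between the approximate posterior and the prior. A minimal sketch, assuming diagonal Gaussian latent variables (the abstract does not state the distribution family); the function names are hypothetical, and the question‐type networks that would output `mu` and `log_var` are omitted.

```python
import math
import random

def reparameterize(mu, log_var, rng=None):
    """Draw z = mu + sigma * eps with eps ~ N(0, I).

    In an autodiff framework this form keeps z differentiable in mu and log_var,
    which is what makes the lower bound trainable by backpropagation.
    """
    if rng is None:
        rng = random.Random(0)
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def gaussian_kl(mu_q, log_var_q, mu_p, log_var_p):
    """KL(q || p) between diagonal Gaussians, e.g. posterior vs. prior over z."""
    kl = 0.0
    for mq, lvq, mp, lvp in zip(mu_q, log_var_q, mu_p, log_var_p):
        kl += 0.5 * (lvp - lvq + (math.exp(lvq) + (mq - mp) ** 2) / math.exp(lvp) - 1.0)
    return kl

def lower_bound(answer_log_likelihood, mu_q, log_var_q, mu_p, log_var_p):
    """ELBO = E_q[log p(answer | z, context)] - KL(q || p)."""
    return answer_log_likelihood - gaussian_kl(mu_q, log_var_q, mu_p, log_var_p)
```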

3.
The MPEG‐D unified speech and audio coding (USAC) standardization process was initiated by MPEG to develop an audio codec that provides consistent quality for mixed speech and music content. The current USAC reference model consists of frequency‐domain (FD) and linear‐prediction‐domain (LPD) core modules, which are controlled by a signal classifier tool. In this letter, we propose a single‐mode LPD USAC structure that uses an adaptive windowing‐based transform‐coded excitation module. We tested the system on the official test items for all mono evaluation modes. The experimental results show that both the objective and subjective performance of the proposed single‐mode USAC system are better than those of the dual‐mode FD/LPD USAC system.
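The abstract does not describe the window‐switching rule, so the following is only a hypothetical sketch of adaptive windowing in a transform coder: a sine window that satisfies the Princen‐Bradley perfect‐reconstruction condition, and an energy‐ratio attack detector that selects a short window for transients. The block count, threshold, and window lengths are assumptions, not USAC parameters.

```python
import math

def sine_window(half_len):
    """Sine window of length 2 * half_len.

    w[n]^2 + w[n + half_len]^2 = 1 (Princen-Bradley condition), so overlap-added
    frames reconstruct perfectly with 50% overlap.
    """
    total = 2 * half_len
    return [math.sin(math.pi * (n + 0.5) / total) for n in range(total)]

def is_transient(frame, n_blocks=8, ratio_threshold=10.0):
    """Hypothetical attack detector: a sharp energy jump between consecutive sub-blocks."""
    size = len(frame) // n_blocks
    energies = [sum(x * x for x in frame[i * size:(i + 1) * size]) + 1e-12
                for i in range(n_blocks)]
    return any(energies[i] / energies[i - 1] > ratio_threshold for i in range(1, n_blocks))

def choose_window_half_length(frame, long_half=1024, short_half=128):
    """Short windows limit pre-echo around attacks; long windows give finer frequency resolution."""
    return short_half if is_transient(frame) else long_half
```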

4.
In this paper, alternative dynamic features for speech recognition are proposed. The goal of this work is to improve speech recognition accuracy by deriving a representation of distinctive dynamic characteristics from the speech spectrum. The work was inspired by two temporal dynamics of the speech signal: the highly non‐stationary nature of speech, and the inter‐frame change of the speech spectrum. We use a sub‐frame spectrum analyzer to capture very rapid spectral changes within a speech analysis frame. In addition, we measure spectral fluctuations in a more complex manner than traditional dynamic features such as delta and double‐delta coefficients. To evaluate the proposed features, speech recognition tests were conducted in smartphone environments. The experimental results show that simply combining the conventional feature streams with the proposed features improves the recognition accuracy of a hidden Markov model–based speech recognizer.
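For reference, the conventional delta and double‐delta features that the proposed features are compared against can be computed with the standard regression formula. This sketch covers only that baseline (the sub‐frame spectral features are not described in enough detail to reproduce); the regression width of 2 and edge‐frame replication are common choices assumed here.

```python
def delta(features, width=2):
    """Regression-based delta coefficients (the conventional baseline).

    features: list of frames, each a list of coefficients (e.g. MFCCs).
    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with the first and last frames replicated at the edges.
    """
    last = len(features) - 1
    denom = 2 * sum(n * n for n in range(1, width + 1))

    def frame(t):
        # Clamp the index so edge frames are replicated.
        return features[min(max(t, 0), last)]

    return [
        [sum(n * (frame(t + n)[d] - frame(t - n)[d]) for n in range(1, width + 1)) / denom
         for d in range(len(features[t]))]
        for t in range(len(features))
    ]

def delta_delta(features, width=2):
    """Double-delta (acceleration) coefficients: the delta of the deltas."""
    return delta(delta(features, width), width)
```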

5.
In this paper, we propose a rank‐weighted reconstruction feature to improve the robustness of a feed‐forward deep neural network (FFDNN)‐based acoustic model. In an FFDNN‐based acoustic model, the input feature is constructed by vectorizing a submatrix created by slicing the feature vectors of the frames within a context window. In this type of feature construction, the context window size is important because it determines the amount of trivial information (such as redundancy) and discriminative information (such as temporal context) in the input features. However, we questioned whether a single parameter can sufficiently control the amount of information. We therefore investigated input feature construction from the perspectives of rank and nullity, and we propose a rank‐weighted reconstruction feature that retains speech information components while reducing trivial components. The proposed method was evaluated on TIMIT phone recognition and Wall Street Journal (WSJ) tasks. It reduced the phone error rate on TIMIT from 18.4% to 18.0% and the word error rate on WSJ from 4.70% to 4.43%.
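The two ways of forming an FFDNN input can be contrasted in a short NumPy sketch. `stacked_input` is the conventional vectorized context submatrix; `rank_weighted_input` is a hypothetical reading of rank‐weighted reconstruction in which each singular component of the submatrix is rescaled before vectorizing. The weighting scheme is an assumption, since the abstract does not define it.

```python
import numpy as np

def context_submatrix(features, t, half_window):
    """Slice frames t-k..t+k (edge frames replicated) into a (2k+1) x D submatrix."""
    T = features.shape[0]
    idx = np.clip(np.arange(t - half_window, t + half_window + 1), 0, T - 1)
    return features[idx]

def stacked_input(features, t, half_window):
    """Conventional FFDNN input: the vectorized context submatrix."""
    return context_submatrix(features, t, half_window).reshape(-1)

def rank_weighted_input(features, t, half_window, weights):
    """Hypothetical rank-weighted reconstruction.

    weights[i] multiplies the i-th largest singular value of the submatrix, so
    dominant components can be kept and weak (possibly trivial) ones attenuated.
    """
    sub = context_submatrix(features, t, half_window)
    u, s, vt = np.linalg.svd(sub, full_matrices=False)
    return ((u * (s * weights)) @ vt).reshape(-1)
```

With all weights equal to one, the reconstruction reproduces the conventional stacked input exactly.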

6.
We predict performance metrics of cloud services using statistical learning, whereby the behaviour of a system is learned from observations. Specifically, we collect device and network statistics from a cloud testbed and apply regression methods to predict, in real time, client‐side service metrics for video streaming and key‐value store services. Results from extensive evaluation on our testbed indicate that our method accurately predicts service metrics in real time (for instance, with a mean absolute error below 16% for video frame rate and read latency). Further, our method is service agnostic in the sense that it takes operating system and network statistics as input instead of service‐specific metrics. We show that feature set reduction significantly improves prediction accuracy in our case while also reducing model computation time. We find that prediction accuracy decreases when both services, rather than a single one, run on the testbed simultaneously, or when the network quality on the path between the server cluster and the client deteriorates. Finally, we discuss the design and implementation of a real‐time analytics engine that processes streams of device statistics and service metrics from testbed sensors and produces model predictions through online learning.
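A minimal sketch of the prediction setup, assuming ordinary least squares as the regression method (the abstract does not name one) and assuming the quoted error is the mean absolute error normalized by the mean measured value. `X` stands for device and network statistics and `y` for a client‐side metric such as video frame rate; all names are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept column (one simple regression choice)."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply the fitted coefficients, including the intercept, to new statistics."""
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ coef

def nmae(y_true, y_pred):
    """Mean absolute error normalized by the mean of the measured metric."""
    return np.mean(np.abs(y_true - y_pred)) / np.mean(y_true)
```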

7.
We consider the feature recombination technique in a multiband approach to speaker identification and verification. Because conventional feature recombination is ineffective in broadband noisy environments, we propose a new subband feature recombination that uses subband likelihoods, together with a subband reliable‐feature selection technique based on an adaptive noise model. In the decision step of speaker recognition, a few very low likelihood scores from unreliable features can cause a speaker recognition system to make an incorrect decision. To overcome this problem, reliable‐feature selection adjusts the likelihood scores of unreliable features by comparing them with those of an adaptive noise model, which is estimated by maximum a posteriori adaptation using noise features obtained directly from the noisy test speech. To evaluate the proposed methods in noisy environments, we use the TIMIT database and the NTIMIT database, its telephone version. The proposed subband feature recombination with subband reliable‐feature selection outperforms the conventional feature recombination system with reliable‐feature selection.
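The scoring pipeline can be sketched with a few hypothetical pieces: a relevance‐factor MAP update of a Gaussian noise‐model mean from noise frames of the test utterance, a Gaussian log‐likelihood, and a reliable‐feature rule that floors each subband score at the noise‐model score before the subbands are recombined by summation. The flooring rule and the relevance factor of 16 are assumptions; the abstract does not specify how the adjustment is made.

```python
import math

def map_adapt_mean(prior_mean, noise_frames, relevance=16.0):
    """Relevance-factor MAP update of a Gaussian noise-model mean.

    The adapted mean moves from the prior toward the sample mean of the noise
    frames, more so as more frames are observed.
    """
    n = len(noise_frames)
    return [(relevance * m + sum(frame[d] for frame in noise_frames)) / (relevance + n)
            for d, m in enumerate(prior_mean)]

def gaussian_loglik(x, mean, var):
    """Log-likelihood of feature vector x under a diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def reliable_score(speaker_ll, noise_ll):
    """Hypothetical rule: a feature better explained by the noise model is treated as
    unreliable, and its score is floored at the noise-model score so that a single
    very low value cannot dominate the decision."""
    return max(speaker_ll, noise_ll)

def recombined_score(subband_speaker_ll, subband_noise_ll):
    """Recombine subbands by summing the adjusted per-subband log-likelihoods."""
    return sum(reliable_score(s, n) for s, n in zip(subband_speaker_ll, subband_noise_ll))
```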

8.
A two‐band excitation model for hidden Markov model‐based speech synthesis, based on estimation of the optimum maximum voiced frequency (MVF), is presented. The MVF is estimated with an analysis‐by‐synthesis scheme that selects the value yielding the minimum spectral distortion of the synthesized speech. Experimental results show that the proposed method significantly improves synthetic speech quality.
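A hypothetical NumPy sketch of two‐band excitation and an analysis‐by‐synthesis MVF search: the excitation is a pulse train below the MVF and white noise above it, and each candidate MVF is scored by its log‐spectral distortion against a target. For brevity the target here is an excitation signal rather than natural speech, no gain matching or vocal‐tract filtering is applied, and the candidate grid is an assumption.

```python
import numpy as np

def two_band_excitation(f0, mvf, fs, n, seed=0):
    """Periodic pulse train below the MVF, white noise above it (no gain matching)."""
    pulses = np.zeros(n)
    pulses[::int(round(fs / f0))] = 1.0
    noise = np.random.default_rng(seed).standard_normal(n)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Voiced band from the pulse train, unvoiced band from the noise.
    spectrum = np.where(freqs < mvf, np.fft.rfft(pulses), np.fft.rfft(noise))
    return np.fft.irfft(spectrum, n)

def log_spectral_distortion(a, b, eps=1e-10):
    """RMS difference of the two log power spectra, in dB."""
    pa = np.abs(np.fft.rfft(a)) ** 2 + eps
    pb = np.abs(np.fft.rfft(b)) ** 2 + eps
    return np.sqrt(np.mean((10 * np.log10(pa / pb)) ** 2))

def estimate_mvf(target, f0, fs, candidates, seed=0):
    """Analysis-by-synthesis: keep the candidate whose synthetic excitation is closest."""
    errors = [log_spectral_distortion(target, two_band_excitation(f0, m, fs, len(target), seed))
              for m in candidates]
    return candidates[int(np.argmin(errors))]
```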

9.
An adaptive speech streaming method is proposed to improve the perceived speech quality of a software‐based multipoint control unit (SW‐based MCU) over IP networks. First, the proposed method predicts whether the speech packet to be transmitted will be lost. To this end, it learns the pattern of packet losses in the IP network and then predicts the loss of the packet to be transmitted over that network. The method also classifies each speech frame as silence, unvoiced, speech onset, or voiced. Based on the results of packet loss prediction and speech classification, the method determines the appropriate amount and bitrate of redundant speech data (RSD) sent with the primary speech data (PSD) to help the speech decoder restore the speech signals of lost packets. Specifically, when a packet is predicted to be lost, the amount and bitrate of the RSD are increased by reducing the bitrate of the PSD. The method is demonstrated using a support vector machine to learn the packet loss pattern and the adaptive multi‐rate narrowband (AMR‐NB) codec to assign different speech coding rates. The results show that, compared with conventional methods for restoring lost speech signals, the proposed method significantly improves the perceived speech quality of an SW‐based MCU under various packet loss conditions in an IP network.
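The allocation step can be sketched as a lookup on the two per‐frame decisions. The bitrates below are the eight AMR‐NB modes; the per‐class policy, the 12.2 kbps budget, and the function name are hypothetical. The loss prediction is taken as a given Boolean (the SVM predictor is not shown), and only the bitrate dimension of the redundancy is modeled.

```python
# The eight AMR-NB codec modes, in kbit/s.
AMR_NB_RATES_KBPS = (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2)

def allocate_bitrates(frame_class, loss_predicted, budget_kbps=12.2):
    """Return (primary_kbps, redundant_kbps) for one frame under a hypothetical policy."""
    if frame_class == "silence" or not loss_predicted:
        # No redundancy needed: spend the whole budget on the primary stream.
        return max(r for r in AMR_NB_RATES_KBPS if r <= budget_kbps), 0.0
    # Loss predicted: onsets and voiced frames get a richer redundant copy.
    redundant = 5.90 if frame_class in ("onset", "voiced") else 4.75
    # Lower the primary bitrate until primary + redundant fits the budget.
    primary = max(r for r in AMR_NB_RATES_KBPS if r + redundant <= budget_kbps)
    return primary, redundant
```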

10.
Feature selection is very important for feature‐based relation classification tasks. Whereas most existing work on feature selection relies on linguistic information acquired using parsers, this letter proposes new features, including probabilistic and semantic relatedness features, that explicitly capture the relatedness between patterns and particular relation types. The impact of each feature set is evaluated using both a chi‐square estimator and a performance evaluation. The experiments show that relatedness features have a greater impact than existing well‐known linguistic features, and that their contribution cannot be substituted by other commonly used linguistic feature sets.
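The chi‐square estimator is commonly computed from a 2x2 contingency table of feature presence against relation type. A minimal sketch of that statistic and a ranking helper; the table layout and function names are illustrative assumptions, and the relatedness features themselves are not reproduced.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 feature/relation contingency table.

    a: instances with the feature and the relation type
    b: instances with the feature and another relation type
    c: instances without the feature but with the relation type
    d: instances with neither
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def rank_features(tables):
    """tables: {feature_name: (a, b, c, d)}; names sorted by decreasing chi-square."""
    return sorted(tables, key=lambda f: chi_square(*tables[f]), reverse=True)
```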

11.
Unified speech and audio coding (USAC) is one of the latest coding technologies. It is based on a switchable coding structure and has demonstrated the highest levels of performance for both speech and music content. In this paper, we propose an extended version of USAC with a single mode of operation, which requires no switching system, obtained by extending the linear prediction coding mode. The main idea of this extension is to adopt the advantages of frequency‐domain coding schemes, such as windowing and transition control. Subjective test results indicate that the proposed scheme covers speech, music, and mixed streams with adequate levels of performance, and the quality levels obtained are comparable with those of USAC.

12.
In this paper, we propose a classification‐based approach for hybridizing statistical machine translation (SMT) and rule‐based machine translation (RBMT). Both the training dataset used to learn the proposed classifier and the feature extraction method affect the hybridization quality. To create such a training dataset, a previous approach used automatic evaluation metrics to determine which of a set of component machine translation (MT) systems gave the more accurate translation (a comparative method). The most accurate translation was then labeled with the MT system that produced it. In this previous approach, when the metric scores were low, there was a high level of uncertainty as to which component MT system actually produced the better translation. To reduce such uncertainty or classification error, we propose an alternative labeling approach: a cut‐off method. In our experiments, using the cut‐off method in the proposed classifier, we achieved a translation accuracy of 81.5%, a 5.0% improvement over existing methods.
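One possible reading of the cut‐off labeling, sketched under assumptions: a training sentence receives the label of the better‐scoring system only when the winning metric score reaches a threshold, and low‐scoring, uncertain cases are left out. The abstract does not say whether the cut‐off applies to the score itself or to the score gap, so this rule and all names are hypothetical.

```python
def label_with_cutoff(samples, cutoff):
    """Build classifier training labels from automatic-metric scores.

    samples: list of (features, smt_score, rbmt_score), one per source sentence.
    """
    labeled = []
    for features, smt_score, rbmt_score in samples:
        if max(smt_score, rbmt_score) < cutoff:
            continue  # both scores low: too uncertain which system is better
        # Ties go to SMT (an arbitrary choice in this sketch).
        labeled.append((features, "SMT" if smt_score >= rbmt_score else "RBMT"))
    return labeled
```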

13.
Jihoon Park, Minsoo Hahn. ETRI Journal, 2015, 37(6): 1211-1219
In a hidden Markov model–based speech synthesis system using a two‐band excitation model, the maximum voiced frequency (MVF) is the most important excitation parameter, because the quality of the synthetic speech depends on it. This paper proposes an enhanced MVF estimation scheme based on a peak‐picking method. In the proposed scheme, both local peaks and peak lobes are picked from the spectrum of the linear predictive residual signal. The average of the normalized distances of the local peaks and peak lobes is calculated and used as a feature to estimate the MVF. The results of both objective and subjective tests show that the proposed scheme improves synthetic speech quality compared with a conventional scheme, on a mobile device as well as in a PC environment.
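The front end of a peak‐picking MVF estimator can be sketched in NumPy: autocorrelation‐method LPC, inverse filtering to obtain the residual, and simple local‐maximum picking on the residual's magnitude spectrum. This is a hypothetical illustration; peak‐lobe detection and the normalized‐distance feature are not defined in the abstract and are omitted, and the order of 16 and the Hann window are assumptions.

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method LPC: predictor a with x[n] ~ sum_k a[k] * x[n-1-k]."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

def lp_residual(x, a):
    """Inverse-filter x with A(z) = 1 - sum_k a[k] z^-(k+1)."""
    return np.convolve(x, np.concatenate(([1.0], -a)))[:len(x)]

def local_peaks(spectrum):
    """Indices of local maxima in a magnitude spectrum (neighbour comparison)."""
    return [i for i in range(1, len(spectrum) - 1)
            if spectrum[i] > spectrum[i - 1] and spectrum[i] >= spectrum[i + 1]]

def residual_peaks(x, fs, order=16):
    """Frequencies (Hz) of the local peaks in the LP residual's magnitude spectrum."""
    e = lp_residual(x, lpc(x, order))
    mag = np.abs(np.fft.rfft(e * np.hanning(len(e))))
    freqs = np.fft.rfftfreq(len(e), d=1.0 / fs)
    return freqs[local_peaks(mag)]
```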

14.
In this letter, we propose an unsupervised framework for speech noise reduction based on recent developments in low‐rank and sparse matrix decomposition. The proposed framework separates the speech signal directly from noisy speech by decomposing the noisy speech spectrogram into three submatrices: the noise structure matrix, the clean speech structure matrix, and the residual noise matrix. Evaluations on the Noisex‐92 dataset show that, at an input SNR of -5 dB, the proposed method achieves a signal‐to‐distortion ratio approximately 2.48 dB higher than that of the robust principal component analysis method and 3.23 dB higher than that of the non‐negative matrix factorization method.
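A simplified sketch of a low‐rank plus sparse decomposition into three matrices: alternating minimization of tau*||L||_* + lam*||S||_1 + 0.5*||X - L - S||_F^2, where each step is an exact proximal update (singular value thresholding for L, soft thresholding for S) and the leftover X - L - S serves as the residual matrix. Treating the low‐rank part as noise and the sparse part as speech is an assumption often made in this line of work, not necessarily the paper's model; `tau`, `lam`, and the iteration count are hypothetical.

```python
import numpy as np

def soft_threshold(x, lam):
    """Elementwise shrinkage: the proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def singular_value_threshold(x, tau):
    """Shrink singular values: the proximal operator of the nuclear norm."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt

def decompose(spectrogram, tau, lam, n_iter=100):
    """Split a magnitude spectrogram into low-rank, sparse, and residual parts."""
    low_rank = np.zeros_like(spectrogram)
    sparse = np.zeros_like(spectrogram)
    for _ in range(n_iter):
        # Exact minimization over one block while the other is held fixed.
        low_rank = singular_value_threshold(spectrogram - sparse, tau)
        sparse = soft_threshold(spectrogram - low_rank, lam)
    residual = spectrogram - low_rank - sparse
    return low_rank, sparse, residual
```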

15.
A Chinese question answering system based on syntactic analysis and answer classification
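Using distance information between the interrogative word and the predicate, this paper performs a detailed sentence‐pattern analysis of questions and then a shallow syntactic analysis of answer sentences. On this basis, a question feature set, an answer‐sentence feature set, and a combined feature set are extracted as classification features, and a maximum entropy model and a support vector machine are introduced to train answer‐extraction classifiers. Classifiers trained on different feature combinations were tested on five types of factoid questions, reaching F‐scores of 70.87% and 85.75%, respectively.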
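The distance feature between the interrogative word and the predicate could be extracted as follows. This sketch assumes a question that is already segmented and POS‐tagged; the tags `WH` and `V` are placeholders rather than a real tagset, and taking the nearest predicate is an assumption. The resulting value would be one input, alongside the other feature sets, to the maximum entropy or SVM classifier.

```python
def wh_predicate_distance(tagged_question, wh_tag="WH", verb_tag="V"):
    """Signed token distance from the interrogative word to the nearest predicate.

    tagged_question: list of (token, tag) pairs. A positive value means the
    predicate follows the question word. Returns None if either is missing.
    """
    wh = [i for i, (_, tag) in enumerate(tagged_question) if tag == wh_tag]
    verbs = [i for i, (_, tag) in enumerate(tagged_question) if tag == verb_tag]
    if not wh or not verbs:
        return None
    w = wh[0]
    v = min(verbs, key=lambda i: abs(i - w))
    return v - w
```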

16.
Battiti's mutual information feature selector (MIFS) and its variants are used in many classification applications. Because they ignore feature synergy, MIFS and its variants may introduce a large bias when features act jointly. In addition, MIFS and its variants estimate feature redundancy without regard to the corresponding classification task. In this paper, we propose an automated greedy feature selection algorithm called conditional mutual information‐based feature selection (CMIFS). Based on the link between interaction information and conditional mutual information, CMIFS accounts for both redundancy and synergy interactions among features and identifies discriminative features. In addition, CMIFS ties feature redundancy evaluation to the classification task, which decreases the probability of mistaking important features for redundant ones during the search. The experimental results show that CMIFS achieves higher best classification accuracy than MIFS and its variants with the same or a smaller number of features (nearly 50% fewer).
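For concreteness, the MIFS baseline and the conditional mutual information that CMIFS builds on can be sketched for discrete features. `mifs` implements Battiti's greedy criterion I(f; C) - beta * sum over selected s of I(f; s); `conditional_mutual_information` uses the chain rule I(X; Y | Z) = I(X; Y, Z) - I(X; Z). CMIFS's exact selection criterion is not given in the abstract and is not reproduced; the function names are hypothetical.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """I(X; Y) in nats for two equally long sequences of discrete values."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(c / n * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def conditional_mutual_information(x, y, z):
    """I(X; Y | Z) = I(X; (Y, Z)) - I(X; Z)."""
    return mutual_information(x, list(zip(y, z))) - mutual_information(x, z)

def mifs(features, labels, k, beta=1.0):
    """Battiti's MIFS: greedily pick k features maximizing relevance minus redundancy."""
    selected, remaining = [], list(range(len(features)))
    while remaining and len(selected) < k:
        def score(j):
            relevance = mutual_information(features[j], labels)
            redundancy = sum(mutual_information(features[j], features[s]) for s in selected)
            return relevance - beta * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, if x is the XOR of y and z, I(x; y) is 0 but I(x; y | z) is ln 2: this is the kind of synergy that the pairwise terms in MIFS cannot detect.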

17.
A new mesh reconstruction scheme for approximating a surface from a set of unorganized 3D points is proposed. The proposed method, called the shrink‐wrapped boundary face (SWBF) algorithm, produces the final surface by iteratively shrinking an initial mesh generated from the definition of the boundary faces. SWBF overcomes the genus‐0 (spherical topology) restriction of previous shrink‐wrapping‐based mesh generation techniques and can be applied to any type of surface topology. Furthermore, SWBF is significantly faster than the related algorithm of Jeong et al., because it requires only a local nearest‐point search in the shrinking process. Our experiments show that SWBF is very robust and efficient for surface reconstruction from an unorganized point cloud.
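The shrinking stage can be illustrated with a minimal NumPy sketch that moves each mesh vertex part of the way toward its nearest sample point until the mesh stops moving. It is hypothetical and covers vertex positions only: mesh connectivity and the boundary‐face initialization are not modeled, and a brute‐force search stands in for the local nearest‐point search.

```python
import numpy as np

def nearest_points(vertices, cloud):
    """Brute-force nearest cloud point for each mesh vertex."""
    d2 = ((vertices[:, None, :] - cloud[None, :, :]) ** 2).sum(axis=2)
    return cloud[np.argmin(d2, axis=1)]

def shrink_step(vertices, cloud, alpha=0.5):
    """Move every vertex a fraction alpha of the way toward its nearest sample point."""
    return vertices + alpha * (nearest_points(vertices, cloud) - vertices)

def shrink_wrap(vertices, cloud, alpha=0.5, n_iter=20, tol=1e-6):
    """Iterate shrink steps until no vertex moves more than tol (or n_iter is reached)."""
    for _ in range(n_iter):
        new = shrink_step(vertices, cloud, alpha)
        if np.max(np.linalg.norm(new - vertices, axis=1)) < tol:
            return new
        vertices = new
    return vertices
```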

18.
Human activity recognition (HAR) has become an effective computer vision tool for video surveillance systems. In this paper, a novel biometric system that can detect human activities in 3D space is proposed. To implement HAR, joint angles obtained using an RGB‐depth sensor are used as features. Because HAR operates in the time domain, the angle information is stored using the sliding kernel method. The Haar wavelet transform (HWT) is applied to preserve the information in the features before the data dimension is reduced. Dimension reduction using an averaging algorithm is also applied to decrease the computational cost, which provides faster performance while maintaining high accuracy. Before classification, a proposed thresholding method with the inverse HWT is applied to extract the final feature set. Finally, the k‐nearest neighbor (k‐NN) algorithm is used to recognize the activity from the given data. The method compares favorably with results obtained using other machine learning algorithms.
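The feature chain after joint‐angle extraction can be sketched in plain Python: a one‐level orthonormal Haar transform and its inverse, averaging‐based dimension reduction, a hypothetical detail‐coefficient thresholding step, and a majority‐vote k‐NN classifier. Window handling, the threshold value, and all names are assumptions, since the abstract does not give them.

```python
import math
from collections import Counter

def haar_step(signal):
    """One level of the orthonormal Haar transform (signal length must be even)."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def inverse_haar_step(approx, detail):
    """Exact inverse of haar_step."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out

def threshold_denoise(signal, threshold):
    """Hypothetical thresholding: zero small detail coefficients, then invert the HWT."""
    approx, detail = haar_step(signal)
    detail = [d if abs(d) >= threshold else 0.0 for d in detail]
    return inverse_haar_step(approx, detail)

def average_pool(vector, factor):
    """Dimension reduction by averaging consecutive groups of `factor` values."""
    return [sum(vector[i:i + factor]) / factor for i in range(0, len(vector), factor)]

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label); majority vote among k nearest (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```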

19.
Chinese Journal of Electronics, 2017, (6): 1111-1117
Accurate classification of subjective and objective sentences is an important preparatory step for micro‐blog sentiment analysis. Because a single feature type cannot provide enough subjective information for classification, we propose a support vector machine (SVM)‐based classification model for Chinese micro‐blogs that uses multiple features. We extracted subjective features from parts of speech (POS) and from the dependency relationships between words, and constructed a 3‐POS subjective pattern set and a dependency template set. We fused these two types of features and used an SVM‐based model to classify Chinese micro‐blog text. The experimental results showed that the performance of the classification model improved significantly when multiple features were used.
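The two feature types can be fused as count vectors before being passed to an SVM. In this sketch the POS tags, the 3‐POS pattern set, and the dependency templates (head POS, relation, dependent POS) are hypothetical placeholders; in the paper the sets are constructed from data, and SVM training (for example with a library such as scikit‐learn) is not shown.

```python
def pos_trigram_features(pos_tags, pattern_set):
    """Count how often each subjective 3-POS pattern occurs in a sentence."""
    trigrams = [tuple(pos_tags[i:i + 3]) for i in range(len(pos_tags) - 2)]
    return [trigrams.count(p) for p in pattern_set]

def dependency_features(dependencies, template_set):
    """dependencies: (head_pos, relation, dependent_pos) triples from a parser."""
    return [dependencies.count(t) for t in template_set]

def fused_features(pos_tags, dependencies, pattern_set, template_set):
    """Concatenate both feature types into one vector for the classifier."""
    return (pos_trigram_features(pos_tags, pattern_set)
            + dependency_features(dependencies, template_set))
```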

20.
The sentence‐based pronunciation evaluation task is defined in the context of subjective criteria. Three subjective criteria (the minimum subjective word score, the mean subjective word score, and first impression) are proposed and modeled as combinations of word‐based assessments. The subjective criteria are then approximated with objective sentence pronunciation scores obtained by combining word‐based metrics. No a priori studies of common mistakes are required, and class‐based language models are used to incorporate both incorrect and correct pronunciations. Incorrect pronunciations are incorporated automatically by using a competitive lexicon and the phonetic rules of the students' mother and target languages. This procedure is applicable to any second‐language learning context, and subjective‐objective sentence score correlations of 0.5 or greater can be achieved when the proposed sentence‐based pronunciation criteria are approximated with combinations of word‐based scores. Finally, the subjective‐objective sentence score correlations reported here are comparable with those published elsewhere for methods that require a priori studies of pronunciation errors.
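Two of the word‐to‐sentence combinations mirror the minimum and mean criteria directly, and agreement with human judgments can be measured with a Pearson correlation. A minimal sketch with hypothetical names; the first‐impression criterion is not specified in the abstract and is omitted.

```python
import math
import statistics

def sentence_scores(word_scores):
    """Objective sentence scores from word-level scores, mirroring the min and mean criteria."""
    return {"min": min(word_scores), "mean": statistics.fmean(word_scores)}

def pearson(x, y):
    """Pearson correlation between objective and subjective sentence scores."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
```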
