Similar Documents
20 similar documents found (search time: 10 ms)
1.
Natural languages are known for their expressive richness. Many sentences can be used to represent the same underlying meaning. Only modelling the observed surface word sequence can result in poor context coverage and generalization, for example, when using n-gram language models (LMs). This paper proposes a novel form of language model, the paraphrastic LM, that addresses these issues. A phrase-level paraphrase model, statistically learned from standard text data with no semantic annotation, is used to generate multiple paraphrase variants. LM probabilities are then estimated by maximizing their marginal probability. Multi-level language models estimated at both the word level and the phrase level are combined. An efficient weighted finite-state transducer (WFST) based paraphrase generation approach is also presented. Significant error rate reductions of 0.5–0.6% absolute were obtained over the baseline n-gram LMs on two state-of-the-art recognition tasks for English conversational telephone speech and Mandarin Chinese broadcast speech using a paraphrastic multi-level LM modelling both word and phrase sequences. When further combined with word- and phrase-level feed-forward neural network LMs, significant error rate reductions of 0.9% absolute (9% relative) and 0.5% absolute (5% relative) were obtained over the baseline n-gram and neural network LMs, respectively.
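The marginalization idea in this abstract can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: it assumes a hypothetical paraphrase model has already produced weighted word-sequence variants, and accumulates probability-weighted (fractional) bigram counts over them.

```python
from collections import Counter

def marginal_bigram_counts(variants):
    """Accumulate expected bigram counts over weighted paraphrase variants.

    `variants` is a list of (word_sequence, probability) pairs assumed to come
    from a phrase-level paraphrase model; the probabilities sum to 1.
    """
    counts = Counter()
    for words, prob in variants:
        for bigram in zip(words, words[1:]):
            counts[bigram] += prob   # fractional, probability-weighted count
    return counts

# Two paraphrases of the same underlying meaning (illustrative weights).
variants = [
    (["thank", "you", "very", "much"], 0.7),
    (["thanks", "a", "lot"], 0.3),
]
counts = marginal_bigram_counts(variants)
print(counts[("thank", "you")])   # 0.7
```

Normalizing such expected counts would give paraphrastic LM probabilities in the spirit of the abstract, though the paper's actual estimation and WFST machinery are far more involved.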

2.
3.
Fuzzy models of language structures
Statistical distributions of language structures reflect important regularities controlling the informational and psycho-physiological processes which accompany the generation of verbal language or printed texts. In this paper, fuzzy quantitative models of language statistics are constructed. The suggested models are based on the assumption of a superposition of two kinds of uncertainty: probabilistic and possibilistic. The realization of this superposition in statistical distributions is achieved by a splitting procedure applied to the probability measure. In this way, fuzzy versions of the generalized binomial, Fucks', and Zipf-Mandelbrot's distributions are constructed, describing the probabilistic and possibilistic organization of language at any level: morphological, syntactic, or phonological.
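For readers unfamiliar with the (crisp, non-fuzzy) Zipf-Mandelbrot law mentioned above, a minimal sketch follows; the parameter values are illustrative only, not taken from the paper.

```python
def zipf_mandelbrot(ranks, a=1.0, b=2.7):
    """Unnormalized Zipf-Mandelbrot weights f(r) = 1 / (r + b)**a."""
    return [1.0 / (r + b) ** a for r in ranks]

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

ranks = range(1, 1001)                     # word ranks 1..1000
probs = normalize(zipf_mandelbrot(ranks))  # a valid probability distribution
# Lower rank (more frequent word) receives higher probability.
print(probs[0] > probs[1] > probs[999])    # True
```

The paper's contribution is to make such distributions fuzzy by splitting the probability measure; the sketch only shows the classical starting point.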

4.
Since the method of computer simulation was introduced to the study of human behavior, a number of computer models of language acquisition have been proposed and implemented on computer systems. Because of the varied purposes of simulation and the diverse theoretical backgrounds of the model builders, there are notable dissimilarities among the models. This paper reviews the existing computer models of language acquisition. The comparison is done in terms of the basic components of the models: (a) input to the computer system, (b) how linguistic knowledge and the process of learning are represented, and (c) what is initially built into the system and what is learned. Based on the review, the potential usefulness of computer simulation methods in the area of child language is discussed.

5.
As an attempt to associate a real number with a language, entropies of languages have been computed by Banerji, Kuich, and others. In this paper, measures over languages are presented as mappings from languages to real numbers. These measures satisfy additivity, while entropies do not. Two kinds of measures, p-measure and ω-measure, are defined, and a method for computing them is given for regular languages and context-free languages. Some properties of these measures are applied to show the nonregularity of several languages.

6.
Investigations of tidal processes in rivers can be performed in physical and numerical models. Moreover, in recent years a hybrid model technique has been developed, in which physical and numerical models are combined on-line. Each investigation area is covered by the model type whose particular advantages are best suited to it; in this way inefficiencies are avoided. The application of hybrid models requires software support for the numerical model as well as for data acquisition and control of the physical model, all integrated into one system. Moreover, the software has to be written for application under real-time conditions. It is modular in structure and has a problem-oriented input language.

7.
8.
9.
綦艳霞  沈慧丽  陈朝晖  顾斌 《计算机应用》2012,32(12):3525-3528
For real-time systems composed of periodic behaviors and mode-switching mechanisms, the SPARDL requirements modelling language has been proposed; this paper gives a detailed Event-B interpretation of the corresponding SPARDL models. The semantics of SPARDL is explained through Event-B, and a refinement framework based on the features of SPARDL models is proposed for the development of Event-B models. Finally, a case study demonstrates the effectiveness of this approach to modelling and verifying SPARDL models with Event-B.

10.
The aim of this work is to show the ability of stochastic regular grammars to generate accurate language models which can be well integrated, allocated and handled in a continuous speech recognition system. For this purpose, a syntactic version of the well-known n-gram model, called k-testable language in the strict sense (k-TSS), is used. The complete definition of a k-TSS stochastic finite state automaton is provided in the paper. One of the difficulties arising in representing a language model through a stochastic finite state network is that the recursive schema involved in the smoothing procedure must be adopted in the finite state formalism to achieve an efficient implementation of the backing-off mechanism. The use of the syntactic back-off smoothing technique applied to k-TSS language modelling allowed us to obtain a self-contained smoothed model integrating several k-TSS automata in a unique smoothed and integrated model, which is also fully defined in the paper. The proposed formulation leads to a very compact representation of the model parameters learned at training time: probability distribution and model structure. The dynamic expansion of the structure at decoding time allows an efficient integration in a continuous speech recognition system using a one-step decoding procedure. An experimental evaluation of the proposed formulation was carried out on two Spanish corpora. These experiments showed that regular grammars generate accurate language models (k-TSS) that can be efficiently represented and managed in real speech recognition systems, even for high values of k, leading to very good system performance.
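The smoothing problem this abstract addresses can be illustrated with a much simpler stand-in. The paper uses syntactic back-off over k-TSS automata; the sketch below instead uses plain linear interpolation of bigram and unigram estimates (k = 2, a made-up interpolation weight), just to show why a lower-order model is needed for unseen histories.

```python
from collections import Counter

def train(corpus):
    """Count unigrams and bigrams from a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    return unigrams, bigrams

def smoothed_prob(w, h, unigrams, bigrams, lam=0.7):
    """Interpolated P(w | h): a simpler stand-in for syntactic back-off."""
    total = sum(unigrams.values())
    p_uni = unigrams[w] / total
    p_bi = bigrams[(h, w)] / unigrams[h] if unigrams[h] else 0.0
    return lam * p_bi + (1 - lam) * p_uni

corpus = [["a", "b", "a", "c"], ["a", "b", "b"]]
uni, bi = train(corpus)
# "b" follows "a" in 2 of "a"'s 3 occurrences; unigram P("b") = 3/7.
print(smoothed_prob("b", "a", uni, bi))
```

Because both component distributions sum to one over the vocabulary for a seen history, the interpolated model does too; the k-TSS formulation achieves the analogous guarantee inside a single smoothed finite state automaton.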

11.
A linguistic ontology of space for natural language processing
We present a detailed semantics for linguistic spatial expressions supportive of computational processing that draws substantially on the principles and tools of ontological engineering and formal ontology. We cover language concerned with space, actions in space and spatial relationships, and develop an ontological organization that relates such expressions to general classes of fixed semantic import. The result is given as an extension of a linguistic ontology, the Generalized Upper Model, an organization which has been used for over a decade in natural language processing applications. We describe the general nature and features of this ontology and show how we have extended it for working particularly with space. Treating the semantics of natural language expressions concerning space in this way offers a substantial simplification of the general problem of relating natural spatial language to its contextualized interpretation. Example specifications based on natural language examples are presented, as well as an evaluation of the ontology's coverage, consistency, predictive power, and applicability.

12.
We discuss development of a word-unigram language model for online handwriting recognition. First, we tokenize a text corpus into words, contrasting with tokenization methods designed for other purposes. Second, we select for our model a subset of the words found, discussing deviations from an N-most-frequent-words approach. From a 600-million-word corpus, we generated a 53,000-word model which eliminates 45% of word-recognition errors made by a character-level-model baseline system. We anticipate that our methods will be applicable to offline recognition as well, and to some extent to other recognizers, such as speech recognizers and video retrieval systems. Received: November 1, 2001 / Revised version: July 22, 2002
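The N-most-frequent-words baseline the authors deviate from is easy to sketch. This toy version uses a hypothetical corpus string and naive whitespace tokenization, whereas the paper emphasizes that tokenization itself is a design choice.

```python
from collections import Counter

def build_lexicon(corpus_text, n_words=3):
    """Tokenize naively on whitespace and keep the n most frequent words."""
    tokens = corpus_text.lower().split()
    freqs = Counter(tokens)
    return [word for word, _ in freqs.most_common(n_words)]

corpus = "the cat sat on the mat the cat ran"
lexicon = build_lexicon(corpus, n_words=3)
print(lexicon)   # most frequent words first
```

A word-unigram model would then assign each lexicon word its relative frequency; the paper's selection step refines exactly this kind of frequency-ranked list.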

13.
In statistical language models, how to integrate diverse linguistic knowledge in a general framework that captures long-distance dependencies is a challenging issue. In this paper, an improved language model incorporating linguistic structure into the maximum entropy framework is presented. The proposed model combines a trigram with structural knowledge of base phrases: the trigram captures local relations between words, while the base-phrase structure represents long-distance relations between syntactic structures. Knowledge of syntax, semantics and vocabulary is integrated into the maximum entropy framework. Experimental results show that the proposed model reduces language model perplexity by 24% and increases the sign language recognition rate by about 3% compared with the trigram model.
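Perplexity, the evaluation metric reported above, is a standard quantity worth making concrete. A minimal sketch, assuming the model has already assigned a probability to each test word:

```python
import math

def perplexity(probs):
    """Perplexity of a model that assigned `probs` to N test words:
    PP = exp(-(1/N) * sum(log p_i)).  Lower is better."""
    n = len(probs)
    return math.exp(-sum(math.log(p) for p in probs) / n)

# A model assigning uniform probability 1/8 to each of 4 test words
# has perplexity exactly 8: it is as "surprised" as a fair 8-way choice.
print(perplexity([0.125] * 4))   # ~8.0
```

A 24% perplexity reduction thus means the improved model is, on average, choosing among roughly a quarter fewer equally likely continuations per word.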

14.
We outline an approach to parsing based on system modelling. The underlying assumption, which determines the limits of the approach, is that a narrative natural language text constitutes a symbolic model of the system described, written for the purpose of communicating static and/or dynamic system aspects.

15.
16.
In knowledge discovery in a text database, extracting and returning a subset of information highly relevant to a user's query is a critical task. In a broader sense, this is essentially the identification of certain personalized patterns that drives such applications as Web search engine construction, customized text summarization and automated question answering. A related problem of text snippet extraction has been previously studied in information retrieval. In these studies, common strategies for extracting and presenting text snippets to meet user needs either process document fragments that have been delimited a priori or use a sliding window of a fixed size to highlight the results. In this work, we argue that text snippet extraction can be generalized if the user's intention is better utilized. Our approach overcomes the rigidness of existing approaches by dynamically returning more flexible start-end positions of text snippets, which are also semantically more coherent. This is achieved by constructing and using statistical language models which effectively capture the commonalities between a document and the user intention. Experiments indicate that our proposed solutions provide effective personalized information extraction services.
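The contrast with fixed windows can be illustrated with a toy scorer. This is only a sketch of the idea under strong assumptions: it replaces the paper's statistical language models with a crude overlap-density score, and searches all variable-length spans rather than sliding a fixed window.

```python
def best_snippet(doc_words, query_words, max_len=5):
    """Pick the start-end span with the highest query-overlap density.

    Toy stand-in for the language-model scoring in the paper:
    score = matched query words / span length; ties prefer longer spans.
    """
    query = set(query_words)
    best, best_score = (0, 1), -1.0
    for start in range(len(doc_words)):
        for end in range(start + 1, min(start + max_len, len(doc_words)) + 1):
            score = sum(w in query for w in doc_words[start:end]) / (end - start)
            longer = (end - start) > (best[1] - best[0])
            if score > best_score or (score == best_score and longer):
                best, best_score = (start, end), score
    return best

doc = "the bank raised interest rates again last week".split()
start, end = best_snippet(doc, ["interest", "rates"])
print(doc[start:end])   # ['interest', 'rates']
```

The point of the example is structural: the returned start-end pair adapts to the query, which is exactly the flexibility a fixed-size window cannot provide.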

17.
Neural Computing and Applications - Text summarization resolves the issue of capturing essential information from a large volume of text data. Existing methods either depend on the end-to-end...

18.
Unconstrained off-line continuous handwritten text recognition is a very challenging task which has been recently addressed by different promising techniques. This work presents our latest contribution to this task, integrating neural network language models in the decoding process of three state-of-the-art systems: one based on bidirectional recurrent neural networks, another based on hybrid hidden Markov models and, finally, a combination of both. Experimental results obtained on the IAM off-line database demonstrate that consistent word error rate reductions can be achieved with neural network language models when compared with statistical N-gram language models on the three tested systems. The best word error rate, 16.1%, reported with ROVER combination of systems using neural network language models significantly outperforms current benchmark results for the IAM database.

19.
Two statistical language models have been investigated for their effectiveness in upgrading the accuracy of a Chinese character recognizer. The baseline model is of a lexical-analytic nature: it segments a sequence of character images according to the maximum matching of words, with consideration of word binding forces. A model of bigram statistics of word classes is then investigated and compared against the baseline model in terms of recognition rate improvement on the image recognizer. On average, the baseline language model improves the recognition rate by about 7%, while the bigram statistics model improves it by about 10%.
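Word-class bigram models of the kind investigated here are commonly written as P(w_i | w_{i-1}) ≈ P(w_i | c_i) · P(c_i | c_{i-1}), where c_i is the class of w_i. A hedged sketch with toy classes and made-up probabilities (not the paper's classes or data):

```python
def class_bigram_prob(w, prev_class, word_class,
                      p_word_given_class, p_class_given_class):
    """P(w | previous class) under the class-bigram factorization."""
    c = word_class[w]
    return p_word_given_class[(w, c)] * p_class_given_class[(c, prev_class)]

# Toy model: two classes, VERB and NOUN (illustrative numbers only).
word_class = {"eat": "VERB", "drink": "VERB", "rice": "NOUN"}
p_word_given_class = {("eat", "VERB"): 0.6, ("drink", "VERB"): 0.4,
                      ("rice", "NOUN"): 1.0}
p_class_given_class = {("NOUN", "VERB"): 0.8, ("VERB", "VERB"): 0.2,
                       ("VERB", "NOUN"): 0.9, ("NOUN", "NOUN"): 0.1}

# P("rice" | previous word was a VERB) = P(rice|NOUN) * P(NOUN|VERB) = 0.8
print(class_bigram_prob("rice", "VERB", word_class,
                        p_word_given_class, p_class_given_class))
```

Sharing statistics across words in a class is what lets such a model outperform purely lexical matching when per-word bigram counts are sparse, which is consistent with the improvement reported above.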

20.
In this work, we propose and compare two different approaches to a two-level language model. Both are based on phrase classes but consider different ways of handling phrases within the classes. We provide a complete formulation consistent with the two approaches. The proposed language models were integrated into an Automatic Speech Recognition (ASR) system and evaluated in terms of Word Error Rate. Several series of experiments were carried out on a spontaneous human–machine dialogue corpus in Spanish, in which users asked for information about long-distance trains by telephone. The results show that integrating phrases into classes with the proposed language models improves the performance of an ASR system. Moreover, the results indicate that the history length giving the best performance is related to the features of the model itself; thus, not all the models achieve their best results with the same history length.
