Similar literature
Found 20 similar documents (search time: 342 ms)
1.
A cost function is developed, based on information-theoretic concepts, that measures the complexity of a stochastic context-free grammar, as well as the discrepancy between its language and a given stochastic language sample. This function is used to guide a search procedure that finds simple grammars whose languages are good fits to a sample. Reasonable results have been obtained in a variety of cases, including parenthesis and addition strings, Basic English (the first 25 sentences in English Through Pictures) and chain-encoded chromosome boundaries.

2.
The aim of this work is to show the ability of stochastic regular grammars to generate accurate language models which can be well integrated, allocated and handled in a continuous speech recognition system. For this purpose, a syntactic version of the well-known n-gram model, called k-testable language in the strict sense (k-TSS), is used. The complete definition of a k-TSS stochastic finite state automaton is provided in the paper. One of the difficulties arising in representing a language model through a stochastic finite state network is that the recursive schema involved in the smoothing procedure must be adopted in the finite state formalism to achieve an efficient implementation of the backing-off mechanism. The use of the syntactic back-off smoothing technique applied to k-TSS language modelling allowed us to obtain a self-contained smoothed model integrating several k-TSS automata in a single smoothed, integrated model, which is also fully defined in the paper. The proposed formulation leads to a very compact representation of the model parameters learned at training time: probability distribution and model structure. The dynamic expansion of the structure at decoding time allows an efficient integration in a continuous speech recognition system using a one-step decoding procedure. An experimental evaluation of the proposed formulation was carried out on two Spanish corpora. These experiments showed that regular grammars generate accurate language models (k-TSS) that can be efficiently represented and managed in real speech recognition systems, even for high values of k, leading to very good system performance.
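
The backing-off idea mentioned in this abstract can be illustrated with a small sketch. This is not the paper's syntactic back-off over k-TSS automata; it is a plain bigram model that backs off to unigrams using absolute discounting, chosen here only because it is easy to check by hand. The function name `train_backoff_bigram` and the `discount` parameter are assumptions made for this illustration.

```python
from collections import Counter, defaultdict


def train_backoff_bigram(sentences, discount=0.5):
    """Return (p, vocabulary), where p(word, history) is a bigram
    probability that backs off to unigrams for unseen pairs.

    Illustrative absolute-discounting scheme, not the k-TSS method.
    """
    unigrams = Counter(w for s in sentences for w in s)
    total = sum(unigrams.values())
    followers = defaultdict(Counter)  # history -> counts of next words
    for s in sentences:
        for h, w in zip(s, s[1:]):
            followers[h][w] += 1

    def p_unigram(w):
        return unigrams[w] / total

    def p(w, h):
        seen = followers.get(h)
        if not seen:  # unseen history: rely on unigrams alone
            return p_unigram(w)
        n = sum(seen.values())
        unseen_mass = sum(p_unigram(v) for v in unigrams if v not in seen)
        if unseen_mass == 0.0:  # every word was seen after h: no discount
            return seen[w] / n
        if w in seen:  # discounted relative frequency
            return (seen[w] - discount) / n
        # Probability mass freed by discounting, shared among unseen words
        # in proportion to their unigram probabilities.
        reserved = discount * len(seen) / n
        return reserved * p_unigram(w) / unseen_mass

    return p, sorted(unigrams)


p, vocab = train_backoff_bigram([["a", "b", "a", "c"], ["a", "b", "b"]])
print({w: round(p(w, "a"), 4) for w in vocab})
```

For each history, the discounted probabilities of seen words and the redistributed mass for unseen words sum to one, which is the property any back-off scheme has to preserve.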

3.
We have created a diagnostic/prognostic software tool for the analysis of complex systems, such as monitoring the “running health” of helicopter rotor systems. Although our software is not yet deployed for real-time in-flight diagnosis, we have successfully analyzed the data sets of actual helicopter rotor failures supplied to us by the US Navy. In this paper, we discuss both critical techniques supporting the design of our stochastic diagnostic system and issues related to its full deployment. We also present four examples of its use. Our diagnostic system, called DBAYES, is composed of a logic-based, first-order, and Turing-complete set of software tools for stochastic modeling. We use this language for modeling time-series data supplied by sensors on mechanical systems. The inference scheme for these software tools is based on a variant of Pearl’s loopy belief propagation algorithm [Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Francisco, CA: Morgan Kaufmann]. Our language contains variables that can capture general classes of situations, events, and relationships. A Turing-complete language is able to reason about potentially infinite classes and situations, similar to the analysis of dynamic Bayesian networks. Since the inference algorithm is based on a variant of loopy belief propagation, the language includes expectation maximization type learning of parameters in the modeled domain. In this paper we briefly present the theoretical foundations for our first-order stochastic language and then demonstrate time-series modeling and learning in the context of fault diagnosis.

4.
5.
Probabilistic k-testable models (usually known as k-gram models in the case of strings) can be easily identified from samples and allow for smoothing techniques to deal with unseen events during pattern classification. In this paper, we introduce the family of stochastic k-testable tree languages and describe how these models can approximate any stochastic rational tree language. The model is applied to the task of learning a probabilistic k-testable model from a sample of parsed sentences. In particular, a parser for a natural language grammar that incorporates smoothing is shown.

6.
In this paper, we propose an implicit gradient descent algorithm for the classic k-means problem. The implicit gradient step or backward Euler is solved via stochastic fixed-point iteration, in which we randomly sample a mini-batch gradient in every iteration. It is the average of the fixed-point trajectory that is carried over to the next gradient step. We draw connections between the proposed stochastic backward Euler and the recent entropy stochastic gradient descent for improving the training of deep neural networks. Numerical experiments on various synthetic and real datasets show that the proposed algorithm provides better clustering results compared to k-means algorithms in the sense that it decreases the objective function (the cluster) and is much more robust to initialization.
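
The procedure described in this abstract can be sketched for one-dimensional data as follows. Only the overall scheme comes from the abstract: an implicit step y = x - step * grad(y), solved by fixed-point iteration with a fresh mini-batch gradient each time, with the trajectory average carried over. The function names, step size, iteration counts, batch size and toy data are all assumptions made for this illustration, not the authors' settings.

```python
import random


def nearest(centroids, x):
    """Index of the centroid closest to the 1-D point x."""
    return min(range(len(centroids)), key=lambda j: (centroids[j] - x) ** 2)


def objective(centroids, points):
    """Mean of half the squared distance to the nearest centroid."""
    return sum(0.5 * (centroids[nearest(centroids, x)] - x) ** 2
               for x in points) / len(points)


def minibatch_gradient(centroids, batch):
    """Gradient of the objective restricted to one mini-batch."""
    grad = [0.0] * len(centroids)
    for x in batch:
        j = nearest(centroids, x)
        grad[j] += (centroids[j] - x) / len(batch)
    return grad


def stochastic_backward_euler(points, centroids, step=0.5, outer=100,
                              inner=10, batch_size=4, seed=0):
    rng = random.Random(seed)
    x = list(centroids)
    for _ in range(outer):
        y = list(x)
        running_sum = [0.0] * len(x)
        for _ in range(inner):
            batch = rng.sample(points, batch_size)
            g = minibatch_gradient(y, batch)
            # Fixed-point map for the implicit step y = x - step * grad(y).
            y = [xj - step * gj for xj, gj in zip(x, g)]
            running_sum = [s + yj for s, yj in zip(running_sum, y)]
        # Carry the average of the trajectory over to the next step.
        x = [s / inner for s in running_sum]
    return x


points = [0.0, 0.5, 1.0, 9.0, 9.5, 10.0]
start = [4.0, 6.0]
result = stochastic_backward_euler(points, start)
print(objective(start, points), objective(result, points))
```

On this toy data the two centroids move from the poor starting guess toward the two groups of points, so the objective after the run is lower than at the start.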

7.
This paper investigates whether perceived risk online is affected by the language in which a user browses a given website. In order to achieve this objective and test the proposed hypotheses, a 2 × 2 between-subjects experimental design was chosen, using two independent variables with two levels each, namely: culture (Spanish vs. British) and processing language (Spanish vs. English). The final sample comprised 491 individuals (264 Spanish and 227 British). Half the sample browsed in their mother tongue, and the other half in a second language. The results showed that Spanish users perceive less risk when browsing in English than in Spanish, while for the British there was no difference, in terms of perceived risk, between browsing in Spanish or English. Another interesting finding is the moderating effect of message involvement on the processing of information from the website, and thus its effect on the user’s perception of risk.

8.
Although much has been researched about feedback on traditional paper or word-processed compositions, responded to offline, little has yet been done on compositions published and responded to in a Web 2.0 environment. This study therefore investigates the anonymous asynchronous non-reciprocal feedback given by 139 peers online to 56 English compositions published on the Storybird website by Taiwanese English major university students of two proficiency levels. Feedback responses were downloaded and submitted to detailed qualitative analysis leading to a taxonomy of feedback types which also provided quantitative findings. Overall, the feedback was unlike that often reported in traditional studies of feedback given to non-native speakers by peers or teachers. Instead of a corrective and language-oriented focus we found more attention paid to content, with a strong element of genuinely communicative response approximating feedback as conversation, consistent with the social function of Web 2.0. There was also evidence of respondents adjusting their feedback to the proficiency of the writer, not just in giving more language-oriented feedback to weaker writers but also in mitigating its impact by greater use of interpersonal cues and communicative responses.

9.
The aim of this study was to investigate the role of English as a Foreign Language (EFL) learners’ metacognitive listening strategies awareness and podcast-use readiness in using podcasting technology for learning English as a foreign language. One hundred and forty-one EFL students completed the Metacognitive Awareness Listening Questionnaire (MALQ), which assessed their awareness and perceived use of listening strategies in five components: planning-evaluation, directed attention, person knowledge, mental translation, and problem solving. They also completed a questionnaire that assessed their readiness to use podcasting in terms of familiarity, attitude, and experience. Information on participants’ frequency of podcast use for learning English, frequency of internet use, and digital device ownership was also obtained. The analysis revealed that podcasting use was significantly related to metacognitive listening strategies awareness in general and to all of its components except mental translation strategies; the strongest correlation was found with problem solving strategies (r = .49, p < 0.01). Podcasting use was also found to be significantly related to perceived podcast-use readiness and internet use hours. Further, multiple regressions showed that perceived podcast-use readiness, problem solving, and person knowledge, in order of predictive power, were good predictors of podcasting use for learning English as a foreign language.

10.
This article explores the ways in which instant messaging (IM) texts are produced by a group of university students in Hong Kong. Even though there exists a body of research on linguistic issues of computer-mediated communication (CMC) in non-Western contexts, much emphasis has been placed on the features of CMC English used by ESL learners. Instead of focusing on one particular language, this article reports on a number of language-related issues that are specific to the Hong Kong CMC context such as the use of Chinese and English, invented Cantonese spellings, and code-mixing. Drawing upon qualitative data such as observational notes and interviews, my study analyzes the text-making practices associated with the use of IM (ICQ and MSN Messenger) within the New Literacy Studies (NLS) framework [Gee, James Paul. (1996). Social linguistics and literacies. London: Routledge; Barton, David, Hamilton, Mary, & Ivanič, Roz (Eds.). (2000). Situated literacies. London: Routledge; Street, Brian V. (1998). New literacies in theory and practice: What are the implications for language in education? Linguistics and Education, 10(1), 1-24], which is a social practice approach to the study of reading and writing in real-life contexts. This article concludes by arguing that learning to produce texts in IM involves an entirely different process from that of formal language learning in the classroom. In a multilingual society like Hong Kong, teachers and educators need to be aware of such differences so as to bridge the gap between actual uses of language in students’ private lives and the form of language used in the formal classroom context.

11.
The difficulty of expressing database queries was examined as a function of the language used. Two distinctly different query methods were investigated. One used a standard database query language, SQL, requiring users to express an English query using a formal syntax and appropriate combinations of boolean operators. The second used a newly designed Truth-table Exemplar-Based Interface (TEBI), which only required subjects to be able to choose exemplars from a system-generated table representing a sample database. Through users' choices of critical exemplars, the system could distinguish between interpretations of an otherwise ambiguous English query. Performance was measured by number correct, time to complete queries, and confidence in query correctness. Individual difference analyses were done to examine the relationship between subjects' characteristics and ability to express database queries. Subjects' performance was observed to be both better, and more resistant to variability in age and levels of cognitive skills, when using TEBI than when using SQL to specify queries. Possible reasons for these differences are discussed.

12.
Ron, Dana; Singer, Yoram; Tishby, Naftali. Machine Learning, 1996, 25(2-3): 117-149
We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions generated by general probabilistic automata, we prove that the algorithm we present can efficiently learn distributions generated by PSAs. In particular, we show that for any target PSA, the KL-divergence between the distribution generated by the target and the distribution generated by the hypothesis the learning algorithm outputs can be made small with high confidence in polynomial time and sample complexity. The learning algorithm is motivated by applications in human-machine interaction. Here we present two applications of the algorithm. In the first one we apply the algorithm in order to construct a model of the English language, and use this model to correct corrupted text. In the second application we construct a simple stochastic model for E. coli DNA.
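
Two ideas in this abstract lend themselves to a short sketch: predicting the next symbol from the longest stored suffix of the history (the "variable memory length" idea) and comparing two distributions with the KL-divergence. The code below is an illustration under stated assumptions, not the paper's learning algorithm: the table `model`, its probabilities, and the function names are hypothetical, and the KL-divergence here compares next-symbol distributions only, whereas the paper's guarantee concerns distributions over strings.

```python
import math


def predict(suffix_model, history):
    """Next-symbol distribution for the longest stored suffix of history."""
    for start in range(len(history) + 1):
        context = history[start:]  # longest suffix first, '' last
        if context in suffix_model:
            return suffix_model[context]
    raise KeyError("suffix_model must contain the empty context ''")


def kl_divergence(p, q):
    """KL(p || q) in bits for two distributions over the same symbols."""
    return sum(pv * math.log2(pv / q[s]) for s, pv in p.items() if pv > 0)


# Hypothetical model: contexts of different lengths, made-up probabilities.
model = {
    "": {"a": 0.5, "b": 0.5},
    "a": {"a": 0.2, "b": 0.8},
    "ba": {"a": 0.9, "b": 0.1},
}

print(predict(model, "abba"))  # uses the two-symbol context "ba"
print(predict(model, "bb"))    # no stored suffix except the empty context
print(kl_divergence(model["a"], model[""]))
```

The lookup tries the whole history first and shortens it from the left, so the amount of memory used depends on which contexts were stored.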

13.
14.
This paper describes an evolutionary approach to the problem of inferring stochastic context-free grammars from finite language samples. The approach employs a distributed, steady-state genetic algorithm, with a fitness function incorporating a prior over the space of possible grammars. Our choice of prior is designed to bias learning towards structurally simpler grammars. Solutions to the inference problem are evolved by optimizing the parameters of a covering grammar for a given language sample. Full details are given of our genetic algorithm (GA) and of our fitness function for grammars. We present the results of a number of experiments in learning grammars for a range of formal languages. Finally we compare the grammars induced using the GA-based approach with those found using the inside-outside algorithm. We find that our approach learns grammars that are both compact and fit the corpus data well.

15.
Stochastic programming with step decision rules (SPSDR) aims to produce efficient solutions to multistage stochastic optimization problems. SPSDR, like plain multistage Stochastic Programming (SP), operates on a Monte Carlo “computing sample” of moderate size that approximates the stochastic process. Unlike SP, SPSDR does not strive to build a balanced event tree out of that sample. Rather, it defines a solution as a special type of decision rule, with the property that the decisions at each stage are piecewise constant functions on the sample of scenarios. Those pieces define a partition of the set of scenarios at each stage t, but the partition at t+1 need not be a refinement of the partition at t. However, the rule is constructed so that the non-anticipativity condition is met, a necessary condition to make the rules operational. To validate the method we show how to extend a non-anticipatory decision rule to arbitrary scenarios within a very large validation sample of scenarios. We apply three methods, SPSDR, SP and Robust Optimization, to the same 12-stage problem in supply chain management, and compare them with respect to different objectives and performance criteria. It appears that SPSDR performs better than SP in that it produces a more accurate estimate (prediction) of the value achieved by its solution on the validation sample, and also that the achieved value is better.

16.
Dictionaries and related language reference works constitute a rich but under-exploited resource for the history of languages and of language study in the Middle Ages. Unfortunately, the size and complexity of typical medieval dictionaries make editions and analyses by traditional methods prohibitively expensive in time and money. Using as an example the Latin-Middle English dictionary Medulla grammatice, the paper describes some central problems in the study of medieval English lexicography and the solutions provided by computers, which, with their immense speed, profound memory, and perfect accuracy can help scholars analyze, edit, and promulgate medieval documents and the linguistic data they contain.

17.
18.
Communication in global software development is hindered by language differences in countries with a lack of English-speaking professionals. Machine translation is a technology that uses software to translate from one natural language to another. The progress of machine translation systems has been steady in the last decade. For now, machine translation technology is particularly appealing because it might be used, in the form of cross-language chat services, in countries that are entering into global software projects. However, despite the recent progress of the technology, we still lack a thorough understanding of how real-time machine translation affects communication. In this paper, we present a set of empirical studies with the goal of assessing to what extent real-time machine translation can be used in distributed, multilingual requirements meetings instead of English. Results suggest that, despite being far from 100% accurate, real-time machine translation is not disruptive of the conversation flow and, therefore, is accepted with favor by participants. However, stronger effects can be expected to emerge when language barriers are more critical. Our findings add to the evidence about the recent advances of machine translation technology and provide some guidance to global software engineering practitioners regarding the losses and gains of using English as a lingua franca in multilingual group communication, as in the case of computer-mediated requirements meetings.

19.
In this paper we use results and techniques from the theory of rational power series to show that the complement of a one-letter stochastic language is stochastic, but that the family of stochastic languages is closed neither under union and intersection nor under product and homomorphism. We also give a condition on the poles of a rational one-variable power series r to ensure that the stochastic language defined by r and any cut-point is rational.
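
As a concrete picture of a one-letter stochastic language with a cut-point, the sketch below assumes the usual definition: a probabilistic automaton assigns each word an acceptance probability, and the language consists of the words whose probability strictly exceeds the cut-point. The two-state automaton, its numbers and the function names are made up for this illustration and do not come from the paper.

```python
def acceptance_probability(initial, transition, final, n):
    """Probability that the automaton accepts the one-letter word a^n."""
    state = list(initial)
    for _ in range(n):
        # One step of the row vector times the transition matrix.
        state = [sum(state[i] * transition[i][j] for i in range(len(state)))
                 for j in range(len(state))]
    return sum(s * f for s, f in zip(state, final))


def in_language(initial, transition, final, n, cut_point):
    """Membership of a^n: acceptance probability strictly above the cut-point."""
    return acceptance_probability(initial, transition, final, n) > cut_point


# Hypothetical automaton: start in state 0, move to the accepting,
# absorbing state 1 with probability 1/2 on each letter.
initial = [1.0, 0.0]
transition = [[0.5, 0.5],
              [0.0, 1.0]]
final = [0.0, 1.0]

print([n for n in range(6) if in_language(initial, transition, final, n, 0.8)])
```

Here the acceptance probability of a^n is 1 - (1/2)^n, so with cut-point 0.8 the language contains a^n exactly for n of at least 3; in this particular example the resulting language happens to be regular.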

20.
Learning from observation (LfO), also known as learning from demonstration, studies how computers can learn to perform complex tasks by observing and thereafter imitating the performance of a human actor. Although there has been a significant amount of research in this area, there is no agreement on a unified terminology or evaluation procedure. In this paper, we present a theoretical framework based on Dynamic Bayesian Networks (DBNs) for the quantitative modeling and evaluation of LfO tasks. Additionally, we provide evidence showing that: (1) the information captured through the observation of agent behaviors occurs as the realization of a stochastic process (and often not just as a sample of a state-to-action map); (2) learning can be simplified by introducing dynamic Bayesian models with hidden states for which the learning and model evaluation tasks can be reduced to minimization and estimation of some stochastic similarity measures such as cross-entropy.
