Similar Articles (20 results)
1.
Recently Clark and Eyraud (2007) [10] have shown that substitutable context-free languages, which capture an aspect of natural language phenomena, are efficiently identifiable in the limit from positive data. Generalizing their work, this paper presents a polynomial-time learning algorithm for new subclasses of multiple context-free languages with variants of substitutability.
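As a rough, word-level sketch of the substitutability idea driving this line of work (this is illustrative only, not the paper's learner for multiple context-free languages, and the toy sentences are made up): two substrings are merged into one class as soon as they occur in a single common context in the positive sample.

```python
# Cluster substrings of a positive sample that share at least one context (l, r);
# for substitutable languages, sharing one context implies full interchangeability.
from collections import defaultdict

def substitutable_classes(sample):
    sentences = [tuple(s.split()) for s in sample]
    contexts = defaultdict(set)               # substring -> set of contexts it occurs in
    for w in sentences:
        for i in range(len(w)):
            for j in range(i + 1, len(w) + 1):
                contexts[w[i:j]].add((w[:i], w[j:]))
    parent = {s: s for s in contexts}         # union-find over substrings
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    by_context = defaultdict(list)
    for sub, ctxs in contexts.items():
        for c in ctxs:
            by_context[c].append(sub)
    for group in by_context.values():         # same context => same class
        for sub in group[1:]:
            parent[find(sub)] = find(group[0])
    classes = defaultdict(set)
    for sub in contexts:
        classes[find(sub)].add(sub)
    return [cls for cls in classes.values() if len(cls) > 1]

# ("cat",) and ("dog",) end up in one class: they share the context (("the",), ("sleeps",))
print(substitutable_classes(["the cat sleeps", "the dog sleeps"]))
```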

2.
We discuss new efficient learning algorithms for certain subclasses of regular and even linear languages based on the notion of terminal distinguishability introduced by Radhakrishnan and Nagaraja. The learning model we use is identification in the limit from positive samples as proposed by Gold and further studied by Angluin and many others. All classes we introduce in this paper are modifications of the language families TDRL (terminal distinguishable regular) and TDELL (terminal distinguishable even linear) defined by Radhakrishnan and Nagaraja. A tradeoff between the power of the language class and the time complexity of the identification algorithm is observed when the size of the underlying alphabet is considered as an additional parameter. Extending the classes of efficiently learnable languages is also important from the viewpoint of applications of the algorithms. One of these extensions is obtained basically by making use of the concept of control language which is known from formal language theory and has been employed for learning theoretic purposes in particular by Takada.

3.
Yokomori, Takashi. Machine Learning, 1995, 19(2): 153–179.
This paper deals with the polynomial-time learnability of a language class in the limit from positive data, and discusses the learning problem of a subclass of deterministic finite automata (DFAs), called strictly deterministic automata (SDAs), in the framework of learning in the limit from positive data. We first discuss the difficulty of Pitt's definition in the framework of learning in the limit from positive data, by showing that any class of languages with an infinite descending chain property is not polynomial-time learnable in the limit from positive data. We then propose new definitions for polynomial-time learnability in the limit from positive data. We show in our new definitions that the class of SDAs is iteratively, consistently polynomial-time learnable in the limit from positive data. In particular, we present a learning algorithm that learns any SDA M in the limit from positive data, satisfying the properties that (i) the time for updating a conjecture is at most O(lm), (ii) the number of implicit prediction errors is at most O(ln), where l is the maximum length of all positive data provided, m is the alphabet size of M and n is the size of M, (iii) each conjecture is computed from only the previous conjecture and the current example, and (iv) at any stage the conjecture is consistent with the sample set seen so far. This is in marked contrast to the fact that the class of DFAs is neither learnable in the limit from positive data nor polynomial-time learnable in the limit.
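To make constraints (iii) and (iv) concrete, here is a schematic sketch of an iterative, consistent learner in the sense used above; the toy update rule (conjecturing the finite set of strings seen so far) is a placeholder standing in for Yokomori's SDA learner, not the algorithm itself.

```python
# Schematic protocol: each conjecture is computed from only the previous
# conjecture and the current example, and is always consistent with the data
# seen so far. The toy learner conjectures the finite language of examples seen.

def iterative_learner(conjecture, example):
    """Update rule: new conjecture from previous conjecture + current example only."""
    if conjecture is None:
        return frozenset([example])
    if example in conjecture:            # conjecture already predicts the example
        return conjecture
    return conjecture | {example}        # implicit prediction error: revise

def run(positive_presentation):
    conjecture = None
    for w in positive_presentation:
        conjecture = iterative_learner(conjecture, w)
        assert w in conjecture           # consistency with the sample seen so far
    return conjecture

print(sorted(run(["a", "ab", "a", "abb"])))
```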

4.
5.
In this paper we introduce a paradigm for learning in the limit of potentially infinite languages from all positive data and negative counterexamples provided in response to the conjectures made by the learner. Several variants of this paradigm are considered that reflect different conditions/constraints on the type and size of negative counterexamples and on the time for obtaining them. In particular, we consider the models where (1) a learner gets the least negative counterexample; (2) the size of a negative counterexample must be bounded by the size of the positive data seen so far; (3) a counterexample can be delayed. Learning power, limitations of these models, relationships between them, as well as their relationships with classical paradigms for learning languages in the limit (without negative counterexamples) are explored. Several surprising results are obtained. In particular, for Gold's model of learning, which requires a learner to syntactically stabilize on correct conjectures, learners getting negative counterexamples immediately turn out to be as powerful as ones that do not get them for an indefinitely (but finitely) long time (or are only told that their latest conjecture is not a subset of the target language, without any specific negative counterexample). Another result shows that for behaviorally correct learning (where semantic convergence is required from a learner) with negative counterexamples, a learner making just one error in almost all its conjectures has the “ultimate power”: it can learn the class of all recursively enumerable languages. Yet another result demonstrates that sometimes positive data and negative counterexamples provided by a teacher are not enough to compensate for full positive and negative data.

6.
7.
This paper describes a machine learning system that discovered a “negative motif” in transmembrane domain identification from amino acid sequences, and reports its experiments on protein data using the PIR database. We introduce a decision tree whose nodes are labeled with regular patterns. As a hypothesis, the system produces such a decision tree for a small number of randomly chosen positive and negative examples from PIR. Experiments show that our system finds reasonable hypotheses very successfully. As a theoretical foundation, we show that the class of languages defined by decision trees of depth at most d over k-variable regular patterns is polynomial-time learnable in the sense of probably approximately correct (PAC) learning for any fixed d, k ≥ 0.
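A minimal sketch of the hypothesis class (a decision tree whose internal nodes test regular patterns): classification walks down the tree, branching on whether the sequence matches the node's pattern. The two patterns and the class labels below are invented placeholders, not the motifs found by the paper's system.

```python
import re

class PatternNode:
    def __init__(self, pattern=None, yes=None, no=None, label=None):
        self.pattern = re.compile(pattern) if pattern else None
        self.yes, self.no, self.label = yes, no, label

    def classify(self, seq):
        if self.pattern is None:                 # leaf: return its class label
            return self.label
        branch = self.yes if self.pattern.search(seq) else self.no
        return branch.classify(seq)

# a depth-2 tree over two hypothetical regular patterns
tree = PatternNode(
    pattern=r"[AILMFWV]{15,}",                   # long hydrophobic stretch (illustrative)
    yes=PatternNode(label="transmembrane"),
    no=PatternNode(
        pattern=r"R..R",                         # hypothetical "negative motif"
        yes=PatternNode(label="non-transmembrane"),
        no=PatternNode(label="transmembrane"),
    ),
)
print(tree.classify("MKAILLLLLVVVVAAAAILMFWVR"))
```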

8.
A pattern is a finite string of constants and variables (cf. [1]). The language of a pattern is the set of all strings which can be obtained by substituting non-null strings of constants for the variables of the pattern. In the present paper, we consider the problem of learning pattern languages from examples. As a main result we present an inconsistent polynomial-time algorithm which identifies every pattern language in the limit. Furthermore, we investigate inference of arbitrary pattern languages within the framework of learning from good examples. Finally, we show that every pattern language can be identified in polynomial time from polynomially many disjointness queries only.
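To illustrate the definition above, here is a small sketch that checks membership in a pattern language by compiling the pattern into a backreference regex: variables stand for non-null constant strings, and repeated variables must receive the same substitution. The pattern used is an arbitrary example, and this brute-force membership check is not one of the paper's learning algorithms.

```python
import re

def pattern_to_regex(pattern):
    """pattern: list of items; strings starting with 'x' are variables."""
    parts, seen = [], set()
    for item in pattern:
        if item.startswith("x"):
            if item in seen:
                parts.append("(?P=%s)" % item)      # repeated variable: same substitution
            else:
                parts.append("(?P<%s>.+)" % item)   # first occurrence: any non-null string
                seen.add(item)
        else:
            parts.append(re.escape(item))           # constant part
    return re.compile("^" + "".join(parts) + "$")

p = pattern_to_regex(["x1", "ab", "x2", "x1"])   # the pattern  x1 a b x2 x1
print(bool(p.match("cabdc")))    # True:  x1 = "c", x2 = "d"
print(bool(p.match("abab")))     # False: x1 would have to be the empty string
```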

9.
There exist several works that study the class of reversible languages defined as the union closure of 0-reversible languages, their properties and suitable representations. In this work we define and study the class of locally reversible languages, defined as the union closure of k-reversible languages. We characterize the class and prove that it is a local (positive) variety of formal languages. We also extend the definition of quasi-reversible automata to deal with locally reversible languages and propose a polynomial algorithm to obtain, for any given locally k-reversible language, a quasi-k-reversible automaton.
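For reference, a quick sketch (our illustration, not taken from the paper) of Angluin's structural condition for the 0-reversible base case mentioned above: a DFA is 0-reversible exactly when its reverse is also deterministic, i.e. it has at most one accepting state and no state has two incoming transitions on the same symbol.

```python
def is_zero_reversible(transitions, finals):
    """transitions: dict (state, symbol) -> state; finals: set of accepting states."""
    if len(finals) > 1:
        return False
    seen = set()                              # (target_state, symbol) pairs already used
    for (_, symbol), target in transitions.items():
        if (target, symbol) in seen:          # two incoming edges with the same symbol
            return False
        seen.add((target, symbol))
    return True

# a*b : 0 --a--> 0, 0 --b--> 1  (strings of a's followed by a single b)
print(is_zero_reversible({(0, "a"): 0, (0, "b"): 1}, finals={1}))   # True
```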

10.
A new algorithm is presented for testing if a regular language is locally threshold testable. The new algorithm is slower than existing algorithms, but its correctness proof is shorter. The proof idea is to restate the problem in Presburger arithmetic.

11.
A natural approach towards powerful machine learning systems is to enable options for additional machine/user interactions, for instance by allowing the system to ask queries about the concept to be learned. This motivates the development and analysis of adequate formal learning models. In the present paper, we investigate two different types of query learning models in the context of learning indexable classes of recursive languages: Angluin's original model and a relaxation thereof, called learning with extra queries. In the original model the learner is restricted to query languages belonging to the target class, while in the new model it is allowed to query other languages, too. As usual, the following standard types of queries are considered: superset, subset, equivalence, and membership queries. The learning capabilities of the resulting query learning models are compared to one another and to different versions of Gold-style language learning from only positive data and from positive and negative data (including finite learning, conservative inference, and learning in the limit). A complete picture of the relations among all these models is elaborated. A couple of interesting differences and similarities between query learning and Gold-style learning are observed. In particular, query learning with extra superset queries coincides with conservative inference from only positive data. This result documents the naturalness of the new query model.

12.
Combinatorial property testing deals with the following relaxation of decision problems: given a fixed property and an input x, one wants to decide whether x satisfies the property or is “far” from satisfying it. The main focus of property testing is in identifying large families of properties that can be tested with a certain number of queries to the input. In this paper we study the relation between the space complexity of a language and its query complexity. Our main result is that for any space complexity s(n) ≤ log n there is a language with space complexity O(s(n)) and query complexity 2^Ω(s(n)). Our result has implications with respect to testing languages accepted by certain restricted machines. Alon et al. [FOCS 1999] have shown that any regular language is testable with a constant number of queries. It is well known that any language in space o(log log n) is regular, thus implying that such languages can be so tested. It was previously known that there are languages in space O(log n) that are not testable with a constant number of queries, and Newman [FOCS 2000] raised the question of closing the exponential gap between these two results. A special case of our main result resolves this problem, as it implies that there is a language in space O(log log n) that is not testable with a constant number of queries. It was also previously known that the class of testable properties cannot be extended to all context-free languages. We further show that one cannot even extend the family of testable languages to the class of languages accepted by single counter machines.
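As a textbook-style illustration of the query model discussed above (not a construction from the paper): a tester for the all-zeros property of a binary string makes O(1/ε) queries regardless of the input length, accepting strings that satisfy the property and rejecting, with high probability, strings that are ε-far from it.

```python
import math
import random

def test_all_zeros(query, n, eps, confidence=0.99):
    """query(i) returns the i-th bit; the tester makes O(1/eps) queries only."""
    # enough samples so that an eps-far input is caught with the given confidence
    num_queries = math.ceil(math.log(1 - confidence, 1 - eps)) if eps < 1 else 1
    for _ in range(num_queries):
        if query(random.randrange(n)) == 1:
            return False          # reject: found a witness against the property
    return True                   # accept (wrong only with small probability)

s = [0] * 900 + [1] * 100         # this input is 0.1-far from the all-zeros string
print(test_all_zeros(lambda i: s[i], len(s), eps=0.1))
```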

13.
14.
15.
We develop a novel learning algorithm RTI for identifying a deterministic real-time automaton (DRTA) from labeled time-stamped event sequences. The RTI algorithm is based on the current state of the art in deterministic finite-state automaton (DFA) identification, called evidence-driven state-merging (EDSM). In addition to having a DFA structure, a DRTA contains time constraints between occurrences of consecutive events. Although this seems a small difference, we show that the problem of identifying a DRTA is much more difficult than the problem of identifying a DFA: identifying only the time constraints of a DRTA given its DFA structure is already NP-complete. In spite of this additional complexity, we show that RTI is a correct and complete algorithm that converges efficiently (from polynomial time and data) to the correct DRTA in the limit. To the best of our knowledge, this is the first algorithm that can identify a timed automaton model from time-stamped event sequences.

16.
This paper analyzes the core ideas and technical characteristics of generic programming in some depth, surveys the current state of generic programming in language implementations and its shortcomings, and focuses on the authors' work addressing those shortcomings: extending type parameters and their constraint mechanism to support the design of general, efficient algorithms and data structures. Using the Java language as the implementation vehicle, it describes in detail how fairly complete generic programming can be realized with existing object technology, giving a first example of such generic programming in an existing object-oriented language.
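Not the paper's Java implementation, but an analogous sketch of the core idea in Python: a type parameter with an explicit constraint (a bound), so that a generic data structure can rely on exactly the operations the constraint guarantees while remaining reusable across element types.

```python
import bisect
from typing import Generic, List, Protocol, TypeVar

class Comparable(Protocol):
    def __lt__(self, other) -> bool: ...

T = TypeVar("T", bound=Comparable)      # the constraint ("bound") on the type parameter

class SortedBag(Generic[T]):
    """A generic container usable with any element type satisfying the bound."""
    def __init__(self) -> None:
        self._items: List[T] = []

    def add(self, item: T) -> None:
        bisect.insort(self._items, item)   # relies only on the '<' guaranteed by the bound

    def smallest(self) -> T:
        return self._items[0]

bag: SortedBag[int] = SortedBag()
for x in (3, 1, 2):
    bag.add(x)
print(bag.smallest())   # 1
```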

17.
Learning Syntax by Automata Induction
In this paper we propose an explicit computer model for learning natural language syntax based on Angluin's (1982) efficient induction algorithms, using a complete corpus of grammatical example sentences. We use these results to show how inductive inference methods may be applied to learn substantial, coherent subparts of at least one natural language (English) that are not susceptible to the kinds of learning envisioned in linguistic theory. As two concrete case studies, we show how to learn English auxiliary verb sequences (such as could be taking, will have been taking) and the sequences of articles and adjectives that appear before noun phrases (such as the very old big deer). Both systems can be acquired in a computationally feasible amount of time using either positive examples, or, in an incremental mode, with implicit negative examples (examples outside a finite corpus are considered to be negative examples). As far as we know, this is the first computer procedure that learns a full-scale range of noun subclasses and noun phrase structure. The generalizations and the time required for acquisition match our knowledge of child language acquisition for these two cases. More importantly, these results show that just where linguistic theories admit to highly irregular subportions, we can apply efficient automata-theoretic learning algorithms. Since the algorithm works only for fragments of language syntax, we do not believe that it suffices for all of language acquisition. Rather, we would claim that language acquisition is nonuniform and susceptible to a variety of acquisition strategies; this algorithm may be one of these.

18.
We develop theory on the efficiency of identifying (learning) timed automata. In particular, we show that: (i) deterministic timed automata cannot be identified efficiently in the limit from labeled data and (ii) one-clock deterministic timed automata can be identified efficiently in the limit from labeled data. We prove these results based on the distinguishability of these classes of timed automata. More specifically, we prove that the languages of deterministic timed automata cannot, and that one-clock deterministic timed automata can, be distinguished from each other using strings of length bounded by a polynomial. In addition, we provide an algorithm that identifies one-clock deterministic timed automata efficiently from labeled data. Our results have interesting consequences for the power of clocks that are of interest also outside the scope of the identification problem.

19.
Mitochondrial protein sequence alignment based on protein CGR
A new protein sequence alignment method is proposed based on the protein chaos game representation (PCGR). By computing the distances between the PCGR points of two sequences, all locally similar fragments can be found. The amino acids are grouped into 4 and 7 classes according to their physicochemical properties, and alignments are carried out both with and without this grouping. To describe the alignment results more intuitively, a dot-matrix plot is used to present the alignment data; it not only shows all identical fragments shared by the two sequences but also reflects their overall similarity.
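A sketch of the chaos-game step described above: each residue class is assigned a corner of the unit square, and every residue moves the current point halfway toward its class corner, giving one PCGR point per position. The 4-class physicochemical grouping below is an assumed illustrative grouping (the paper also uses a 7-class variant), and the two short sequences are made up; runs of near-identical points between two sequences indicate locally similar fragments.

```python
CORNERS = {0: (0.0, 0.0), 1: (0.0, 1.0), 2: (1.0, 0.0), 3: (1.0, 1.0)}
CLASS_OF = {aa: c for c, group in enumerate(
    ["AVLIMFWP",    # hydrophobic (assumed grouping)
     "GSTCYNQ",     # polar, uncharged
     "DE",          # acidic
     "KRH"]         # basic
) for aa in group}

def pcgr_points(sequence):
    x, y = 0.5, 0.5                       # start at the centre of the square
    points = []
    for aa in sequence:
        cx, cy = CORNERS[CLASS_OF[aa]]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points

# distance between corresponding PCGR points of two (made-up) sequences
for p, q in zip(pcgr_points("MKDELVR"), pcgr_points("MKDKLVR")):
    print(round(abs(p[0] - q[0]) + abs(p[1] - q[1]), 4))
```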

20.
Language identification is the task of identifying a language from a given spoken utterance. The main task of a language identifier is to design an efficient algorithm which helps a machine to identify correctly a particular language from a given audio sample. We propose here a hybrid approach for identifying a language which is a combination of Vector Quantization (VQ) and Gaussian Mixture Models (GMM). A brief review of work carried out in the area of speaker identification using the VQ-GMM hybrid approach is discussed here. We have carried out experiments for identifying four Indian languages: Assamese, Bengali, Hindi and Indian English. The experiments were carried out on our own recorded standard language database collected from 50 speakers. Speech features were extracted using MFCCs. Results show that, after applying the hybrid approach, accuracy is best with the highest mixture order, and with an increase in mixture order, accuracy increases uniformly for all four languages. It is also concluded that the hybrid approach gives better results when compared with the baseline GMM system.
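A hedged sketch of such a VQ-GMM pipeline (the file names, sampling rate and mixture order below are illustrative assumptions, not the paper's setup): MFCC features per utterance, one GMM per language initialised from a K-means (vector-quantisation) codebook, and identification by maximum average log-likelihood.

```python
import numpy as np
import librosa
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def mfcc_features(path, sr=16000, n_mfcc=13):
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # frames x coefficients

def train_language_model(train_paths, mixtures=32):
    X = np.vstack([mfcc_features(p) for p in train_paths])
    codebook = KMeans(n_clusters=mixtures, n_init=10).fit(X)    # VQ step: initial codebook
    gmm = GaussianMixture(n_components=mixtures, covariance_type="diag",
                          means_init=codebook.cluster_centers_)
    return gmm.fit(X)

def identify(path, models):
    X = mfcc_features(path)
    return max(models, key=lambda lang: models[lang].score(X))  # highest avg log-likelihood

# hypothetical training file lists, one per target language
models = {lang: train_language_model(paths) for lang, paths in {
    "Assamese": ["asm_01.wav"], "Bengali": ["ben_01.wav"],
    "Hindi": ["hin_01.wav"], "Indian English": ["eng_01.wav"]}.items()}
print(identify("test_utterance.wav", models))
```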
