Similar documents
1.
A set of words is factorially balanced if the set of all the factors of its words is balanced. We prove that if all words of a factorially balanced set have a finite index, then this set is a subset of the set of factors of a Sturmian word. Moreover, by characterizing the set of factors of a given length n of a Sturmian word in terms of the left special factor of length n−1 of this Sturmian word, we provide an enumeration formula for the number of sets of words that arise as the set of factors of length n of some Sturmian word.

2.
We characterize all quasiperiodic Sturmian words: A Sturmian word is not quasiperiodic if and only if it is a Lyndon word. Moreover, we study links between Sturmian morphisms and quasiperiodicity.
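
Both notions have simple finite analogues that the abstract does not restate. The following minimal Python sketch uses the standard definitions (a finite Lyndon word is lexicographically strictly smaller than all of its proper non-empty suffixes; a quasiperiod is a proper factor whose occurrences cover the whole word), which are assumptions added here for intuition:

```python
def is_lyndon(w: str) -> bool:
    """A (finite) Lyndon word is lexicographically strictly smaller than
    every proper non-empty suffix of itself."""
    return all(w < w[i:] for i in range(1, len(w)))

def covers(q: str, w: str) -> bool:
    """q is a quasiperiod (cover) of w if q != w and every position of w
    lies inside some occurrence of q."""
    if q == w or not q:
        return False
    occ = [i for i in range(len(w) - len(q) + 1) if w.startswith(q, i)]
    covered = {j for i in occ for j in range(i, i + len(q))}
    return len(covered) == len(w)

def is_quasiperiodic(w: str) -> bool:
    """Any cover of w is necessarily a prefix of w, so proper prefixes
    are the only candidates to test."""
    return any(covers(w[:k], w) for k in range(1, len(w)))

# is_lyndon('aab') is True; 'abaaba' is quasiperiodic (covered by 'aba').
```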

3.
A balanced word is one in which any two factors of the same length contain the same number of each letter of the alphabet, up to one. Finite binary balanced words are called Sturmian words. A Sturmian word is bispecial if it can be extended to the left and to the right with both letters while remaining a Sturmian word. There is a deep relation between bispecial Sturmian words and Christoffel words, which are the digital approximations of Euclidean segments in the plane. In 1997, J. Berstel and A. de Luca proved that palindromic bispecial Sturmian words are precisely the maximal internal factors of primitive Christoffel words. We extend this result by showing that bispecial Sturmian words are precisely the maximal internal factors of all Christoffel words. Our characterization allows us to give an enumerative formula for bispecial Sturmian words. We also investigate the minimal forbidden words for the language of Sturmian words.
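
As an illustration, the small Python sketch below checks the balance property exactly as stated above and builds a lower Christoffel word by the usual arithmetic digitization of a line segment; the construction shown is one standard formulation, not taken from the paper itself:

```python
def is_balanced(w: str) -> bool:
    """Check the balance property: any two factors of the same length
    contain the same number of each letter, up to one."""
    letters = set(w)
    for ell in range(1, len(w)):
        factors = {w[i:i + ell] for i in range(len(w) - ell + 1)}
        for a in letters:
            counts = [f.count(a) for f in factors]
            if max(counts) - min(counts) > 1:
                return False
    return True

def christoffel(p: int, q: int) -> str:
    """Lower Christoffel word of slope p/q (p and q coprime) over {'a','b'}:
    the digitization from below of the segment from (0,0) to (q,p)."""
    n = p + q
    return ''.join('a' if (i * p) % n > ((i - 1) * p) % n else 'b'
                   for i in range(1, n + 1))

# christoffel(2, 3) == 'aabab'; its internal factor 'aba' is balanced,
# and is_balanced('aabab') is also True.
```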

4.

In order to assess whether productivity based on extrapolated data is a good predictor for longer texts, an experimental study was conducted. Two full-sized text input devices for touch typing and two miniaturized devices for tapping were used, all featuring a QWERTY layout, in a repeated-measurement design. Twenty subjects were exposed to both a task within the limit of working memory (nine words) and four running memory tasks (approx. 275 words). For miniaturized tapping keyboards, extrapolated data significantly underestimated both entry speed (uncorrected wpm, by up to 17%) and character error rate (by up to 61%), whereas it significantly overestimated the ratio of correct words (by up to 62%) in the running memory tasks. Further, error-corrected entry speed was significantly overestimated, by up to a factor of 2.7. Results based on extrapolated productivity metrics must therefore be interpreted with caution. Running memory tasks with a text length of more than 32 words are needed to assess the productivity of text input devices if tapping is used.
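
The productivity metrics named here can be computed with standard text-entry formulas. The Python sketch below assumes the common conventions (five characters per "word", and a minimum-string-distance character error rate) rather than the exact definitions used in the study:

```python
def uncorrected_wpm(transcribed: str, seconds: float) -> float:
    """Entry speed in words per minute, using the common conventions of
    five characters per word and |T| - 1 typed characters."""
    return ((len(transcribed) - 1) / 5.0) * (60.0 / seconds)

def msd(a: str, b: str) -> int:
    """Minimum string distance (Levenshtein: insertions, deletions,
    substitutions) between presented and transcribed text."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def char_error_rate(presented: str, transcribed: str) -> float:
    """Character error rate: MSD divided by the length of the longer string."""
    return msd(presented, transcribed) / max(len(presented), len(transcribed))

# uncorrected_wpm('the quick brown fox', 12.0) == 18.0 wpm;
# char_error_rate('the quick', 'teh quick') is about 0.22.
```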

6.
Blanchet-Sadri et al. have shown that Avoidability, or the problem of deciding the avoidability of a finite set of partial words over an alphabet of size k≥2, is NP-hard [F. Blanchet-Sadri, R. Jungers, J. Palumbo, Testing avoidability on sets of partial words is hard, Theoret. Comput. Sci. 410 (2009) 968-972]. Building on their work, we analyze in this paper the complexity of natural variations on the problem. While some of them are NP-hard, others are shown to be efficiently decidable. Using some combinatorial properties of de Bruijn graphs, we establish a correspondence between lengths of cycles in such graphs and periods of avoiding words, resulting in a tight bound for periods of avoiding words. We also prove that Avoidability can be solved in polynomial space, and reduces in polynomial time to the problem of deciding the avoidability of a finite set of partial words of equal length over the binary alphabet. We give a polynomial bound on the period of an infinite avoiding word, in the case of sets of full words, in terms of two parameters: the length and the number of words in the set. We give a polynomial space algorithm to decide if a finite set of partial words is avoided by a non-ultimately periodic infinite word. The same algorithm also decides if the number of words of length n avoiding a given finite set of partial words grows polynomially or exponentially with n.
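
For full (hole-free) words, the de Bruijn-graph connection mentioned above gives a simple brute-force decision procedure: the set is avoidable exactly when the graph of allowed length-L windows contains a cycle, since such a cycle spells out a periodic avoiding word. A minimal Python sketch of this full-word case (the partial-word machinery of the paper is not reproduced here):

```python
from itertools import product

def avoidable(forbidden, alphabet):
    """Decide whether some infinite word over `alphabet` avoids every word
    in `forbidden` as a factor; full (hole-free) words only, by brute force.
    Vertices are allowed windows of length L-1, edges allowed windows of
    length L; an infinite avoiding word exists iff the graph has a cycle."""
    L = max(len(w) for w in forbidden)
    def clean(u):
        return not any(x in u for x in forbidden)
    edges = [''.join(t) for t in product(alphabet, repeat=L) if clean(''.join(t))]
    succ = {}
    for e in edges:
        succ.setdefault(e[:-1], set()).add(e[1:])
    # a finite digraph has a cycle iff repeatedly deleting vertices with no
    # remaining outgoing edge does not delete everything
    vertices = set(succ) | {v for s in succ.values() for v in s}
    changed = True
    while changed:
        changed = False
        for v in list(vertices):
            if not (succ.get(v, set()) & vertices):
                vertices.discard(v)
                changed = True
    return bool(vertices)

# avoidable(['aa', 'bb'], 'ab') is True (ababab... avoids both);
# avoidable(['aa', 'ab', 'bb'], 'ab') is False.
```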

7.
We define a code in a sofic shift as a set of blocks of symbols of the shift such that any block of the shift has at most one decomposition into code words. It is maximal if it is not strictly included in another one. Such a code is complete in the sofic shift if any block of the shift occurs within some concatenation of code words. We prove that a maximal code in an irreducible sofic shift is complete in this shift. We give an explicit construction of a regular completion of a regular code in a sofic shift. This extends the well known result of Ehrenfeucht and Rozenberg to the case of codes in sofic systems. We also give a combinatorial proof of a result concerning the polynomial of a code in a sofic shift.

8.
In designing data structures for text databases, it is valuable to know how many different words are likely to be encountered in a particular collection. For example, vocabulary accumulation is central to index construction for text database systems; it is useful to be able to estimate the space requirements and performance characteristics of the main-memory data structures used for this task. However, it is not clear how many distinct words will be found in a text collection or whether new words will continue to appear after inspecting large volumes of data. We propose practical definitions of a word and investigate new word occurrences under these models in a large text collection. We inspected around two billion word occurrences in 45 GB of World Wide Web documents and found just over 9.74 million different words in 5.5 million documents; overall, 1 word in 200 was new. We observe that new words continue to occur, even in very large datasets, and that choosing stricter definitions of what constitutes a word has only limited impact on the number of new words found.
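
The kind of measurement described can be reproduced with a few lines of Python; the tokenizer below is only one possible "definition of a word" (an assumption, since the paper's exact definitions are not reproduced in this abstract):

```python
import re

def vocabulary_growth(lines, checkpoint=1_000_000):
    """Stream over a text collection, counting word occurrences and distinct
    words, and reporting how often a previously unseen word appears."""
    seen, total = set(), 0
    for line in lines:
        # one possible definition of a "word": a lowercased alphanumeric run;
        # the paper compares several such definitions
        for token in re.findall(r"[a-z0-9]+", line.lower()):
            total += 1
            seen.add(token)
            if total % checkpoint == 0:
                print(f"{total:,} occurrences, {len(seen):,} distinct "
                      f"(about 1 in {total // len(seen)} occurrences was new)")
    return total, len(seen)

# vocabulary_growth(open('crawl.txt', encoding='utf-8', errors='ignore'))
```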

9.
Document classification and summarization are very important for document text retrieval. Generally, humans can recognize fields such as "Sports" or "Politics" based on specific words, called Field Association (FA) words, in those document fields. The traditional method causes misleading redundant words (unnecessary words) to be registered, because the quality of the resulting FA words depends on learning data pre-classified by hand. Recall and precision of document classification are therefore degraded if the hand-classified fields are ambiguous. We propose two criteria: deleting unnecessary words with low frequencies, and deleting unnecessary words using category information. Moreover, using the proposed criteria, unnecessary words can be deleted from the FA word dictionary created by the traditional method. Experimental results showed that 25% of 38,372 FA word candidates were identified as unnecessary and deleted automatically when the presented method was used. Furthermore, precision and F-measure were improved by 26% and 15%, respectively, compared with the traditional method.
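
The abstract does not spell the two criteria out, so the sketch below is only a plausible reading: candidates are dropped when their frequency is too low, or when they are spread over too many categories to associate with a single field. The thresholds and the category test are assumptions, not the paper's exact rules:

```python
from collections import Counter

def prune_fa_candidates(field_occurrences, min_freq=3, max_fields=2):
    """Prune Field Association word candidates.  `field_occurrences` maps a
    candidate word to the list of document fields it was observed in.
    Criterion 1: drop candidates whose total frequency is below min_freq.
    Criterion 2 (an assumed reading of "using category information"): drop
    candidates spread over more than max_fields distinct fields, since they
    do not associate with any single field."""
    kept = {}
    for word, fields in field_occurrences.items():
        if len(fields) >= min_freq and len(set(fields)) <= max_fields:
            kept[word] = Counter(fields)
    return kept

# prune_fa_candidates({'penalty': ['Sports'] * 12,
#                      'however': ['Sports'] * 9 + ['Politics'] * 11 + ['Science'] * 8})
# keeps 'penalty' and drops the field-neutral 'however'.
```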

10.
A language L is closed if L = L*. We consider an operation on closed languages, L−*, that is an inverse to Kleene closure. It is known that if L is closed and regular, then L−* is also regular. We show that the analogous result fails to hold for the context-free languages. Along the way we find a new relationship between the unbordered words and the prime palstars of Knuth, Morris, and Pratt. We use this relationship to enumerate the prime palstars, and we prove that neither the language of all unbordered words nor the language of all prime palstars is context-free.
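
The objects named here are easy to experiment with. The following Python sketch uses the standard definitions (a border is a non-empty proper prefix that is also a suffix; a palstar, after Knuth, Morris, and Pratt, is a concatenation of non-empty even-length palindromes), which the abstract assumes but does not restate:

```python
def is_unbordered(w: str) -> bool:
    """A word is unbordered if no non-empty proper prefix is also a suffix."""
    return not any(w[:k] == w[-k:] for k in range(1, len(w)))

def is_palstar(w: str) -> bool:
    """A palstar is a concatenation of non-empty even-length palindromes;
    the empty word counts as the empty product."""
    if w == "":
        return True
    return any(w[:k] == w[:k][::-1] and is_palstar(w[k:])
               for k in range(2, len(w) + 1, 2))

def is_prime_palstar(w: str) -> bool:
    """A prime palstar is a non-empty palstar that is not the concatenation
    of two non-empty palstars."""
    return (bool(w) and is_palstar(w)
            and not any(is_palstar(w[:k]) and is_palstar(w[k:])
                        for k in range(2, len(w), 2)))

# is_unbordered('aab') is True; 'abba' is a prime palstar, 'abbaabba' is not.
```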

11.
Identification of Specific Information in Multi-Carrier Data Streams (多载体数据流中的特定信息识别研究)
郑德权, 胡熠, 于浩, 赵铁军, 王青松. 《软件学报》 (Journal of Software), 2003, 14(9): 1538-1543
A new method is proposed for identifying specific information carried in multi-carrier data streams. Following matching rules for feature words and their Pinyin, and based on statistical natural-language theory, the method uses automatic inductive learning to take the part-of-speech transition values obtained from a corpus as system knowledge. An effective knowledge-approximation strategy is then used to judge the relationship between a feature word in a real data stream and its context, yielding an evaluation score for the feature words in the real text; this score measures how closely all feature words appearing in the real data stream match the context collocation rules for feature words learned from the corpus. If the evaluation score of the whole data stream exceeds a threshold, the data stream is blocked. Experimental results show that an experimental system built on this method for identifying and monitoring harmful information in multi-carrier data streams achieves good results.

12.
We show that language inclusion for languages of infinite words defined by nondeterministic automata can be tested in polynomial time if the automata are unambiguous and have simple acceptance conditions, namely safety or reachability conditions. An automaton with safety condition accepts an infinite word if there is a run that never visits a forbidden state, and an automaton with reachability condition accepts an infinite word if there is a run that visits an accepting state at least once.

13.
We present a new modeling approach for speaker recognition that uses the maximum-likelihood linear regression (MLLR) adaptation transforms employed by a speech recognition system as features for support vector machine (SVM) speaker models. This approach is attractive because, unlike standard frame-based cepstral speaker recognition models, it normalizes for the choice of spoken words in text-independent speaker verification without data fragmentation. We discuss the basics of the MLLR-SVM approach, and show how it can be enhanced by combining transforms relative to multiple reference models, with excellent results on recent English NIST evaluation sets. We then show how the approach can be applied even if no full word-level recognition system is available, which allows its use on non-English data even without matching speech recognizers. Finally, we examine how two recently proposed algorithms for intersession variability compensation perform in conjunction with MLLR-SVM.
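
In outline, the approach treats the entries of each speaker-adapted MLLR transform as a fixed-length feature vector and trains a discriminative speaker model on top. The sketch below (using scikit-learn, with the ASR front end abstracted away and the paper's kernel and normalization choices replaced by generic assumptions) shows the general shape, not the actual system:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def mllr_features(transforms):
    """Flatten the MLLR adaptation matrices estimated for one utterance or
    conversation side (one [A | b] transform per reference model or
    regression class) into a single fixed-length feature vector."""
    return np.concatenate([np.asarray(t).ravel() for t in transforms])

def train_speaker_model(target_vectors, impostor_vectors):
    """One-vs-rest SVM speaker model over MLLR-transform features; the
    paper's kernel and normalization choices are not reproduced here."""
    X = np.vstack([target_vectors, impostor_vectors])
    y = np.r_[np.ones(len(target_vectors)), np.zeros(len(impostor_vectors))]
    model = make_pipeline(StandardScaler(), SVC(kernel="linear", probability=True))
    model.fit(X, y)
    return model
```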

14.
15.
In this paper, we study the problem of adding a large number of new words into a Chinese thesaurus according to their definitions in a Chinese dictionary, while minimizing the effort of hand tagging. To deal with the problem, we first use a supervised learning technique to learn a set of defining formats for each class in the thesaurus, which tries to characterize the regularities in the definitions of the words in the class. We then use traditional techniques from graph theory to derive a minimal subset of the new words to be added into the thesaurus, which meets the following condition: if we add the new words in the subset into the thesaurus by hand, the other new words can be added into the thesaurus automatically by matching their definitions with the defining formats of each class in the thesaurus. The method uses little, if any, language-specific or thesaurus-specific knowledge, and can be applied to the thesauri of other languages.

16.
Music Information Retrieval Using Social Tags and Audio
In this paper we describe a novel approach to applying text-based information retrieval techniques to music collections. We represent tracks with a joint vocabulary consisting of both conventional words, drawn from social tags, and audio muswords, representing characteristics of automatically-identified regions of interest within the signal. We build vector space and latent aspect models indexing words and muswords for a collection of tracks, and show experimentally that retrieval with these models is extremely well-behaved. We find in particular that retrieval performance remains good for tracks by artists unseen by our models in training, and even if tags for their tracks are extremely sparse.
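
Concretely, each track becomes a short text document over a joint vocabulary of tags and muswords, after which off-the-shelf text retrieval applies. A minimal vector-space sketch follows; the track data and musword token names are invented for illustration, and the latent aspect models of the paper are not shown:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Each track is one "document": its social tags plus musword tokens assigned
# to automatically identified audio regions (all names below are invented).
tracks = {
    "track_1": "indie guitar mellow mw_0042 mw_0017 mw_0042",
    "track_2": "electronic dance mw_0101 mw_0017",
}

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(tracks.values())

def retrieve(query: str):
    """Rank tracks by cosine similarity between the query and each track's
    joint tag + musword vector."""
    scores = cosine_similarity(vectorizer.transform([query]), matrix).ravel()
    return sorted(zip(tracks, scores), key=lambda kv: -kv[1])

# retrieve('mellow guitar mw_0017') ranks track_1 first.
```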

17.
Reasoning about multithreaded object-oriented programs is difficult, due to the non-local nature of object aliasing, data races, and deadlocks. We propose a programming model that prevents data races and deadlocks, and supports local reasoning in the presence of object aliasing and concurrency. Our programming model builds on the multi-threading and synchronization primitives as they are present in current mainstream languages. Java or C# programs developed according to our model can be annotated by means of stylized comments to make the use of the model explicit. We show that such annotated programs can be formally verified to comply with the programming model. In other words, if the annotated program verifies, the underlying Java or C# program is guaranteed to be free from data races and deadlocks, and it is sound to reason locally about program behavior. Our approach supports immutable objects as well as static fields and static initializers. We have implemented a verifier for programs developed according to our model in a custom build of the Spec# programming system, and have validated our approach on a case study.

18.
A set of words X over a finite alphabet A is said to be unavoidable if all but finitely many words in A* have a factor in X. We examine the problem of calculating the cardinality of minimal unavoidable sets of words of uniform length; we correct an error in [8], state a conjecture offering a formula for the minimum size of these so-called n-good sets for all values of n, and show that the conjecture is correct in an infinite number of cases.

19.
Recently the second and third authors characterized quasiperiodic Sturmian words, proving that a Sturmian word is non-quasiperiodic if and only if it is an infinite Lyndon word. Here we extend this study to episturmian words (a natural generalization of Sturmian words) by describing all the quasiperiods of an episturmian word, which yields a characterization of quasiperiodic episturmian words in terms of their directive words. Even further, we establish a complete characterization of all episturmian words that are Lyndon words. Our main results show that, unlike the Sturmian case, there is a much wider class of episturmian words that are non-quasiperiodic, besides those that are infinite Lyndon words. Our key tools are morphisms and directive words, in particular normalized directive words, which we introduced in an earlier paper. Also of importance is the use of return words to characterize quasiperiodic episturmian words, since such a method could be useful in other contexts.
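
Return words are the key finite tool mentioned at the end. Under the usual convention (the return words of a factor u are the words separating consecutive occurrences of u), they are easy to compute on a long finite prefix, as this small Python sketch shows:

```python
def return_words(u: str, w: str) -> set:
    """Return words of the factor u in the (long, finite) word w: for each
    pair of consecutive occurrences of u at positions i < j, the factor
    w[i:j] (Durand's convention)."""
    occ = [i for i in range(len(w) - len(u) + 1) if w.startswith(u, i)]
    return {w[i:j] for i, j in zip(occ, occ[1:])}

# On the Fibonacci (hence Sturmian) prefix 'abaababaabaab', the factor 'aba'
# has return words {'aba', 'ab'}; a Sturmian factor always has exactly two.
```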

20.
Data words are words with additional edges that connect pairs of positions carrying the same data value. We consider a natural model of automaton walking on data words, called Data Walking Automaton, and study its closure properties, expressiveness, and the complexity of some basic decision problems. Specifically, we show that the class of deterministic Data Walking Automata is closed under all Boolean operations, and that the class of non-deterministic Data Walking Automata has decidable emptiness, universality, and containment problems. We also prove that deterministic Data Walking Automata are strictly less expressive than non-deterministic Data Walking Automata, which in turn are captured by Class Memory Automata.
