Similar Documents
20 similar documents found.
1.
ABSTRACT

The running key cipher uses meaningful text as the key. Since the message also consists of meaningful text, the ciphertext is obtained by combining valid words. Automated attacks can find all such combinations of valid words that yield a given ciphertext. The results of these attacks are presented in this paper.
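
A minimal sketch of running-key encryption and the enumeration idea behind such attacks (illustrative only; the key text, alphabet handling, and candidate words are assumptions, not the paper's implementation):

```python
import string

ALPHABET = string.ascii_uppercase

def running_key_encrypt(plaintext: str, key_text: str) -> str:
    """Combine the message with a running key drawn from meaningful text (mod 26)."""
    pt = [c for c in plaintext.upper() if c in ALPHABET]
    kt = [c for c in key_text.upper() if c in ALPHABET]
    if len(kt) < len(pt):
        raise ValueError("key text must be at least as long as the message")
    return "".join(
        ALPHABET[(ALPHABET.index(p) + ALPHABET.index(k)) % 26]
        for p, k in zip(pt, kt)
    )

def candidate_plaintexts(ciphertext, key_words):
    """Enumerate (key word, recovered plaintext) pairs: because both key and message
    are valid words, every decryption under a dictionary word is a candidate."""
    out = []
    for kw in key_words:
        if len(kw) != len(ciphertext):
            continue
        pt = "".join(
            ALPHABET[(ALPHABET.index(c) - ALPHABET.index(k)) % 26]
            for c, k in zip(ciphertext.upper(), kw.upper())
        )
        out.append((kw, pt))
    return out

# Encrypt "ATTACK" under a running key taken from "THEQUICKBROWNFOX",
# then enumerate decryptions under a few candidate key words.
ct = running_key_encrypt("ATTACK", "THEQUICKBROWNFOX")
print(ct)
print(candidate_plaintexts(ct, ["THEQUI", "SECRET", "WINTER"]))
```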

2.
Characters are the basic building blocks of a written language, and statistical study of character structure is fundamental to natural language processing, providing a theoretical basis for character attribute analysis, input method design, sorting, speech synthesis, and studies of character information entropy. By analyzing the structural features of the Tibetan script, this paper divides Tibetan characters into single-component and compound characters, and further classifies compound characters by the structural positions and the number of their components. A system model and algorithms for Tibetan character-structure statistics were designed, statistics on Tibetan character structure were collected from a 450 MB corpus containing roughly 85 million Tibetan characters, a distribution table of Tibetan character structures was built, and the statistical results were analyzed.
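
A minimal sketch of one way such per-syllable structure counts could be tallied from a corpus (the Unicode ranges, the tsheg-based segmentation, and the single/compound threshold are illustrative assumptions, not the paper's system):

```python
from collections import Counter

TSHEG = "\u0F0B"  # Tibetan intersyllabic mark separating syllables

def component_count(syllable: str) -> int:
    """Count code points in the Tibetan block as a rough proxy for components."""
    return sum(1 for ch in syllable if "\u0F00" <= ch <= "\u0FFF")

def structure_distribution(text: str) -> Counter:
    """Tally syllables as 'single' (one component) vs 'compound-N' (N components)."""
    dist = Counter()
    for syllable in text.split(TSHEG):
        n = component_count(syllable)
        if n == 0:
            continue
        dist["single" if n == 1 else f"compound-{n}"] += 1
    return dist

# Toy example on the two syllables of "\u0F56\u0F7C\u0F51\u0F0B\u0F61\u0F72\u0F42".
print(structure_distribution("\u0F56\u0F7C\u0F51\u0F0B\u0F61\u0F72\u0F42"))
```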

3.
Large displays enable users to perform several tasks simultaneously. Under such circumstances, notification information provided through the concept of ambient displays plays a vital role in assisting users to switch among tasks. This paper presents the experimental results of a notification system design in the peripheral region of large displays. The aim is to provide guidance for notification information design by investigating detection and discrimination performance of human observers when visual notification information is presented away from the foveal region and viewed using peripheral vision. The proposed notification system was designed using an array of glyphs. Each glyph is a small gray square with a fixed size of 60 × 60 pixels. By changing the gray levels of adjacent glyphs dynamically, a glyph array presents a particular dynamic pattern. The experiments involved testing factors that comprised the visual angle, size and shape of glyph arrays, frequency of temporal modulation, phase shift of each pattern, and number of stimuli. The results show that glyph arrays are detected accurately if they are larger, even at wide viewing angles, and that the number of glyphs in a glyph array affects the performance more than the shapes of glyph arrays do. Furthermore, the discrimination performance is higher when both the frequency and phase are manipulated simultaneously (multidimensional design), compared with the case when each of these dimensions is varied separately (single-dimensional design). When the number of stimuli is set at 8, for example, users can maintain an accuracy rate of 70% for the multidimensional design, whereas the accuracy rate is only approximately 60% for the single-dimensional design.

4.
User-Driven Visual Search of Microblogs
Objective: As a social and information-sharing platform, microblogging produces hundreds of millions of posts per day, so efficiently finding the information a user is interested in has become a pressing problem. We propose a novel user-driven visual search method for microblog information. Method: A user's interests are modeled with feature words and their weights, and on this basis a relevance relation between users and feature words is established. When searching, the method first locates the microblog users related to the query terms, and then filters the query-relevant posts from those users' microblogs. In addition, an attention-propagation algorithm is used to expand the search; the returned feature words and microblog users are visualized, and interaction is provided so that the user can inspect the posts related to a selected feature word or user. Results: Experimental results show that with this method users can efficiently locate the microblog information they are interested in. Conclusion: Using users as a bridge greatly narrows the search range of microblog information, the attention-propagation algorithm expands the search, and the results are presented visually. Experiments show that the method enables users to quickly find the information they are interested in.

5.
Because computers currently lack a unified and effective formal description of Chinese character glyphs and a method for comparing them, it is impossible to draw and input all possible Chinese characters or to compare and analyze glyphs by computer. This paper proposes a stroke-segment grid method for describing Chinese character glyphs, with primitives of appropriate granularity that are unambiguous and standardized; it can describe the skeletal similarities and differences of all possible glyphs, including mis-written characters, variant characters in ancient texts, and composed characters. A glyph comparison algorithm is then given on this basis: it automatically extracts the simple and compound strokes a glyph contains, adaptively chooses compound or simple strokes as the comparison unit depending on the glyph, and takes the sum of vector distances between the optimally paired strokes of the two glyphs as the comparison result. Experimental results show that the method has strong descriptive power, and that the comparison algorithm achieves high accuracy for structurally standard glyphs; it can support the drawn input of arbitrary Chinese characters and various applications oriented toward glyph comparison and analysis.
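
A minimal sketch of the "sum of vector distances over optimally paired strokes" idea (the stroke representation, brute-force pairing, and penalty value are illustrative assumptions, not the paper's algorithm):

```python
import math
from itertools import permutations

# A stroke segment is represented by its two grid endpoints: (x0, y0, x1, y1).
def stroke_distance(a, b):
    """Distance between two stroke segments, compared endpoint by endpoint."""
    return math.dist(a[:2], b[:2]) + math.dist(a[2:], b[2:])

def glyph_distance(g1, g2, unpaired_penalty=4.0):
    """Sum of distances over the best pairing of strokes (brute force, small glyphs);
    strokes left without a partner incur a fixed penalty."""
    if len(g1) > len(g2):
        g1, g2 = g2, g1
    best = float("inf")
    for perm in permutations(range(len(g2)), len(g1)):
        cost = sum(stroke_distance(g1[i], g2[j]) for i, j in enumerate(perm))
        cost += unpaired_penalty * (len(g2) - len(g1))
        best = min(best, cost)
    return best

# Two toy glyph skeletons: a cross and a slightly shifted cross.
cross = [(0, 1, 2, 1), (1, 0, 1, 2)]
shifted = [(0.2, 1, 2.2, 1), (1.1, 0, 1.1, 2)]
print(glyph_distance(cross, shifted))
```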

6.
Lexical collocations have particular statistical distributions. We have developed a set of statistical techniques for retrieving and identifying collocations from large textual corpora. The techniques we developed are able to identify collocations of arbitrary length as well as flexible collocations. These techniques have been implemented in a lexicographic tool, Xtract, which is able to automatically acquire collocations with high retrieval performance. Xtract works in three stages. The first stage is based on a statistical technique for identifying word pairs involved in a syntactic relation. The words can appear in the text in any order and can be separated by an arbitrary number of other words. The second stage is based on a technique to extract n-word collocations (or n-grams) in a much simpler way than related methods. These collocations can involve closed class words such as particles and prepositions. A third stage is then applied to the output of stage one and applies parsing techniques to sentences involving a given word pair in order to identify the proper syntactic relation between the two words. A secondary effect of the third stage is to filter out a number of candidate collocations as irrelevant and thus produce higher quality output. In this paper we present an overview of Xtract and we describe several uses for Xtract and the knowledge it retrieves such as language generation and machine translation. Frank Smadja is in the Department of Computer Science at Columbia University and has been working on lexical collocations for his doctoral thesis.
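
A minimal sketch of the first-stage idea, retrieving word pairs whose co-occurrence within a window is statistically unusual (the window size, frequency threshold, and z-score filter are illustrative assumptions; Xtract's actual statistics are richer):

```python
import math
from collections import Counter

def candidate_collocations(tokens, window=5, min_count=3, z_threshold=2.0):
    """Return word pairs that co-occur within a sliding window of `window` tokens
    (in either order) significantly more often than the average frequent pair."""
    pair_counts = Counter()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + window]:
            pair_counts[tuple(sorted((w, v)))] += 1
    frequent = [c for c in pair_counts.values() if c >= min_count]
    if not frequent:
        return []
    mean = sum(frequent) / len(frequent)
    std = math.sqrt(sum((c - mean) ** 2 for c in frequent) / len(frequent)) or 1.0
    return [(pair, c) for pair, c in pair_counts.items()
            if c >= min_count and (c - mean) / std >= z_threshold]

text = ("stock market rose sharply while the stock market index closed higher "
        "and the stock market remained volatile").split()
print(candidate_collocations(text, min_count=2, z_threshold=1.0))
```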

7.
This paper proposes a knowledge-based system to recognize historical Mongolian documents in which the words exhibit remarkable variation and character overlapping. According to the characteristics of Mongolian word formation, the system combines a holistic scheme and a segmentation-based scheme for word recognition. Several types of words and isolated suffixes that cannot be segmented into glyph-units or do not require segmentation are recognized using the holistic scheme. The remaining words are recognized using the segmentation-based scheme, which is the focus of this paper. We exploit the knowledge of the glyph characteristics to segment words into glyph-units in the segmentation-based scheme. Convolutional neural networks are employed not only for word recognition in the holistic scheme, but also for glyph-unit recognition in the segmentation-based scheme. Based on the analysis of recognition errors in the segmentation-based scheme, the system is enhanced by integrating three strategies into glyph-unit recognition. These strategies involve incorporating baseline information, glyph-unit grouping, and recognizing under-segmented and over-segmented fragments. The proposed system achieves 80.86 % word accuracy on the Mongolian Kanjur test samples.

8.
Abstract

A uniform data structure is defined to represent sentence components (words) for a parallel processor of the natural language. The basic notation of rough sets is used for the representation of sentence elements in the computer, to exhibit their context and to select proper meanings. An approximation space reflecting the context of any word in a sentence is defined by induction from the most general meanings of words adjoining that word in the graph model of the sentence and from a contextual knowledge base. The contextual knowledge base specifies the meaning of each word with respect to the most general words expected to adjoin that word in the graph model of the sentence.
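
A minimal sketch of the rough-set machinery the abstract builds on: the lower and upper approximations of a set of words with respect to a partition into equivalence classes (the toy partition and target set are illustrative assumptions):

```python
def rough_approximation(partition, target):
    """Lower/upper approximation of `target` w.r.t. a partition into equivalence classes:
    lower = classes fully contained in target, upper = classes that intersect target."""
    target = set(target)
    lower, upper = set(), set()
    for block in partition:
        block = set(block)
        if block <= target:
            lower |= block
        if block & target:
            upper |= block
    return lower, upper

# Toy approximation space: words grouped by their most general meanings.
partition = [{"bank", "shore"}, {"money", "loan"}, {"river", "stream"}]
target = {"bank", "money", "loan"}  # words compatible with a 'finance' reading
lower, upper = rough_approximation(partition, target)
print("lower:", lower)
print("upper:", upper)
```

A word whose context falls only in the upper approximation remains ambiguous; one whose context falls in the lower approximation can be assigned the corresponding meaning outright.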

9.
In this paper we address the question of how to quickly model glyph-based Geographic Information System visualizations. Our solution is based on using shape grammars to set up the different aspects of a visualization, including the geometric content of the visualization, methods for resolving layout conflicts and interaction methods. Our approach significantly increases modelling efficiency over similarly flexible systems currently in use.

10.
Abstract

A method for evaluating the effectiveness of different feature combinations and training strategies is described. Preliminary tests have been made using two groups of feature combinations derived from SPOT High Resolution Visible (HRV) data and two sets of training samples. The method is objective, and needs no ground confirmation or interaction from the image analyst. It is recommended as a surrogate for detailed accuracy assessment when attempting to find an optimum set of training pixels or feature combinations for image classification.

11.

This paper describes a spell checking system that learns user behavior. Based on that knowledge, the system suggests, with high likelihood, correct replacements for incorrect words and declares unknown but correct words to be correct. The system relies on three dictionaries, a so-called user history file, and two logic modules to carry out the learning and spell checking. Tests have shown that the system is very fast and highly reliable. Specifically, the top-ranked replacement word for an incorrect word was the correct word 96% of the time. Words that were not in the large dictionary but that were nevertheless correct, for example persons' names, compound words, and control commands, were declared to be correct 82% of the time. An incorrect word was never observed to be accepted as correct.
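
A minimal sketch of ranking replacement candidates by combining dictionary lookup, string similarity, and a per-user history of past corrections (the dictionaries, weights, and history format are illustrative assumptions, not the paper's three-dictionary design):

```python
import difflib

DICTIONARY = ["received", "receive", "recipe", "relieved"]
USER_HISTORY = {"recieved": {"received": 5, "relieved": 1}}  # past corrections
USER_ACCEPTED = {"Smadja", "grep"}  # unknown words the user has marked as correct

def check_word(word):
    """Return 'correct', or a ranked list of replacement suggestions."""
    if word in DICTIONARY or word in USER_ACCEPTED:
        return "correct"
    # Candidates from the dictionary, scored by string similarity...
    scores = {c: difflib.SequenceMatcher(None, word, c).ratio()
              for c in difflib.get_close_matches(word, DICTIONARY, n=5, cutoff=0.6)}
    # ...then boosted by how often the user previously chose that replacement.
    for repl, freq in USER_HISTORY.get(word, {}).items():
        scores[repl] = scores.get(repl, 0.0) + 0.1 * freq
    return sorted(scores, key=scores.get, reverse=True)

print(check_word("recieved"))  # user history pushes 'received' to the top
print(check_word("grep"))      # previously accepted by the user, so 'correct'
```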

12.

The concept of a rough finite-state semi-automaton, in which the result of any transition is a rough set of states, is formulated and then extended to that of a rough finite-state automaton by adding the set of accepting states. The behavior of such an automaton is defined and turns out to be a rough set of input words.

13.
Ralph Erskine, Cryptologia, 2013, 37(4): 332-336
Abstract

Simple substitution ciphers are a class of puzzles often found in newspapers, in which each plaintext letter is mapped to a fixed ciphertext letter and spaces are preserved. This article describes a system for automatically solving such puzzles, even when the ciphertext is too short for statistical analysis and when the puzzle contains non-dictionary words. The approach is based around a dictionary attack; several important performance optimizations are described, as well as effective techniques for dealing with non-dictionary words. Quantitative performance results for several variations of the approach and for two other implementations are presented.
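
A minimal sketch of the core dictionary-attack idea: index dictionary words by their letter pattern and build a consistent cipher-to-plain mapping word by word with backtracking (the tiny dictionary and conflict handling are illustrative assumptions; the paper's optimizations and non-dictionary-word handling go well beyond this):

```python
from collections import defaultdict

def pattern(word):
    """Canonical letter pattern, e.g. 'hello' -> (0, 1, 2, 2, 3)."""
    seen = {}
    return tuple(seen.setdefault(c, len(seen)) for c in word)

def solve(ciphertext_words, dictionary):
    """Try to build a consistent cipher->plain letter mapping from pattern matches."""
    by_pattern = defaultdict(list)
    for w in dictionary:
        by_pattern[pattern(w)].append(w)

    def extend(mapping, remaining):
        if not remaining:
            return mapping
        cword, rest = remaining[0], remaining[1:]
        for cand in by_pattern[pattern(cword)]:
            new = dict(mapping)
            ok = True
            for c, p in zip(cword, cand):
                # Reject if c is already mapped elsewhere or p is already taken.
                if new.get(c, p) != p or (p in new.values() and new.get(c) != p):
                    ok = False
                    break
                new[c] = p
            if ok:
                result = extend(new, rest)
                if result is not None:
                    return result
        return None

    return extend({}, ciphertext_words)

# "xqy zkx ukx gf xqy okx" encrypts "the cat sat on the mat".
dictionary = ["the", "cat", "sat", "mat", "on"]
print(solve(["xqy", "zkx", "ukx", "gf", "xqy", "okx"], dictionary))
```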

14.
Context: Traceability relations among software artifacts often tend to be missing, outdated, or lost. For this reason, various traceability recovery approaches, based on Information Retrieval (IR) techniques, have been proposed. The performances of such approaches are often influenced by "noise" contained in software artifacts (e.g., recurring words in document templates or other words that do not contribute to the retrieval itself). Aim: As a complement and alternative to stop word removal approaches, this paper proposes the use of a smoothing filter to remove "noise" from the textual corpus of artifacts to be traced. Method: We evaluate the effect of a smoothing filter in traceability recovery tasks involving different kinds of artifacts from five software projects, applying three different IR methods, namely Vector Space Models, Latent Semantic Indexing, and the Jensen–Shannon similarity model. Results: Our study indicates that, with the exception of some specific kinds of artifacts (i.e., tracing test cases to source code), the proposed approach is able to significantly improve the performances of traceability recovery and to remove "noise" that simple stop word filters cannot remove. Conclusions: The obtained results not only help to develop traceability recovery approaches able to work in the presence of noisy artifacts, but also suggest that smoothing filters can be used to improve performances of other software engineering approaches based on textual analysis.
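
A minimal sketch of the Vector Space Model step used by such IR-based traceability recovery, ranking target artifacts by TF-IDF cosine similarity to a source artifact (the toy artifacts and tokenization are illustrative assumptions; the smoothing filter itself is not shown):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors (dicts) for a list of token lists."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(d).items()} for d in docs]

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy corpus: one requirement (source) and three code artifacts (targets).
artifacts = [
    "user login password authentication".split(),       # requirement
    "login form validate password user session".split(),
    "report export csv printer".split(),
    "authenticate user credentials password hash".split(),
]
vecs = tfidf_vectors(artifacts)
ranking = sorted(range(1, 4), key=lambda i: cosine(vecs[0], vecs[i]), reverse=True)
print(ranking)  # target artifacts ordered by similarity to the requirement
```

Latent Semantic Indexing or the Jensen–Shannon model would replace the TF-IDF/cosine step, while the candidate ranking idea stays the same.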

15.
Context: Software has become an innovative solution nowadays for many applications and methods in science and engineering. Ensuring the quality and correctness of software is challenging because each program has different configurations and input domains. To ensure the quality of software, all possible configurations and input combinations need to be evaluated against their expected outputs. However, this exhaustive test is impractical because of time and resource constraints due to the large domain of inputs and configurations. Thus, different sampling techniques have been used to sample these input domains and configurations. Objective: Combinatorial testing can be used to effectively detect faults in software-under-test. This technique uses combinatorial optimization concepts to systematically minimize the number of test cases by considering the combinations of inputs. This paper proposes a new strategy to generate combinatorial test suites using Cuckoo Search concepts. Method: Cuckoo Search is used in the design and implementation of a strategy to construct optimized combinatorial sets. The strategy consists of different construction algorithms, which are combined to serve the Cuckoo Search. Results: The efficiency and performance of the new technique were proven through different experiment sets. The effectiveness of the strategy is assessed by applying the generated test suites to a real-world case study for the purpose of functional testing. Conclusion: Results show that the generated test suites can detect faults effectively. In addition, the strategy also opens a new direction for the application of Cuckoo Search in the context of software engineering.
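
A much-simplified sketch of generating a pairwise-covering test suite with a cuckoo-style search: random candidate test cases ("nests") compete on how many uncovered value pairs they hit, and the worst candidates are abandoned and regenerated each round (the parameters, fitness function, and omission of Lévy flights are illustrative simplifications, not the paper's strategy):

```python
import random
from itertools import combinations

def pairs_covered(test, uncovered):
    """Value pairs of this test case that are still uncovered."""
    return {((i, test[i]), (j, test[j]))
            for i, j in combinations(range(len(test)), 2)} & uncovered

def generate_suite(domains, nests=20, abandon=0.25, seed=0):
    rng = random.Random(seed)
    uncovered = {((i, a), (j, b))
                 for i, j in combinations(range(len(domains)), 2)
                 for a in domains[i] for b in domains[j]}
    suite = []
    population = [[rng.choice(d) for d in domains] for _ in range(nests)]
    while uncovered:
        # Score each nest by how many new pairs it covers; keep the best as a test case.
        scored = sorted(population, key=lambda t: len(pairs_covered(t, uncovered)),
                        reverse=True)
        gained = pairs_covered(scored[0], uncovered)
        if gained:
            suite.append(scored[0])
            uncovered -= gained
        # Abandon the worst fraction of nests and replace them with random ones.
        keep = scored[: int(nests * (1 - abandon))]
        population = keep + [[rng.choice(d) for d in domains]
                             for _ in range(nests - len(keep))]
    return suite

domains = [["on", "off"], ["ipv4", "ipv6"], ["tcp", "udp"], ["gzip", "none"]]
suite = generate_suite(domains)
print(len(suite), "test cases")
print(suite)
```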

16.

Optical character recognition (OCR) has proved a powerful tool for the digital analysis of printed historical documents. However, its ability to localize and identify individual glyphs is challenged by the tremendous variety in historical type design, the physicality of the printing process, and the state of conservation. We propose to mitigate these problems by a downstream fine-tuning step that corrects for pathological and undesirable extraction results. We implement this idea by using a joint energy-based model which classifies individual glyphs and simultaneously prunes potential out-of-distribution (OOD) samples like rubrications, initials, or ligatures. During model training, we introduce specific margins in the energy spectrum that aid this separation and explore the glyph distribution’s typical set to stabilize the optimization procedure. We observe strong classification at 0.972 AUPRC across 42 lower- and uppercase glyph types on a challenging digital reproduction of Johannes Balbus’ Catholicon, matching the performance of purely discriminative methods. At the same time, we achieve OOD detection rates of 0.989 AUPRC and 0.946 AUPRC for OOD ‘clutter’ and ‘ligatures’ which substantially improves upon recently proposed OOD detection techniques. The proposed approach can be easily integrated into the postprocessing phase of current OCR to aid reproduction and shape analysis research.


17.
In this survey article, we review glyph-based visualization techniques that have been exploited when visualizing spatial multivariate medical data. To classify these techniques, we derive a taxonomy of glyph properties that is based on classification concepts established in information visualization. Considering both the glyph visualization as well as the interaction techniques that are employed to generate or explore the glyph visualization, we are able to classify glyph techniques into two main groups: those supporting pre-attentive and those supporting attentive processing. With respect to this classification, we review glyph-based techniques described in the medical visualization literature. Based on the outcome of the literature review, we propose design guidelines for glyph visualizations in the medical domain.

18.
Abstract. This paper describes a method for the correction of optically read Devanagari character strings using a Hindi word dictionary. The word dictionary is partitioned in order to reduce the search space besides preventing forced matching to an incorrect word. The dictionary partitioning strategy takes into account the underlying OCR process. The dictionary words at the top level have been divided into two partitions, namely a short-words partition and the remaining-words partition. The short-words partition is sub-partitioned using the envelope information of the words. The envelope consists of the number of top, lower, and core modifiers along with the number of core characters. Devanagari characters are written in three strips; most of the characters, referred to as core characters, are written in the middle strip. The remaining words are further partitioned using tags. A tag is a string of fixed length associated with each partition. The correction process uses a distance matrix for assigning a penalty for a mismatch. The distance matrix is based on the information about errors that the classification process is known to make and the confidence figure that the classification process associates with its output. An improvement of approximately 20% in recognition performance is obtained. For a short word, 590 words are searched on average from 14 sub-partitions of the short-words partition before an exact match is found. The average number of partitions and the average number of words increase to 20 and 1585, respectively, when an exact match is not found. For tag-based partitions, on average, 100 words from 30 partitions are compared when either an exact match or a word within the preset threshold distance is found. If an exact match or a match within a preset threshold is not found, the average number of partitions becomes 75 and 450 words on average are compared. To the best of our knowledge this is the first work on the use of a Hindi word dictionary for OCR post-processing.
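
A minimal sketch of matching an OCR output string against dictionary words with a penalty matrix, where the substitution cost reflects known classifier confusions (the toy confusion costs, threshold, and Latin-script example are illustrative assumptions, not the paper's distance matrix):

```python
# Lower substitution cost for character pairs the OCR is known to confuse.
CONFUSION_COST = {("b", "h"): 0.3, ("r", "t"): 0.4}

def sub_cost(a, b):
    if a == b:
        return 0.0
    return CONFUSION_COST.get((a, b), CONFUSION_COST.get((b, a), 1.0))

def weighted_edit_distance(ocr_word, dict_word):
    """Levenshtein distance with confusion-aware substitution penalties."""
    m, n = len(ocr_word), len(dict_word)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)
    for j in range(1, n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,                                   # deletion
                d[i][j - 1] + 1.0,                                   # insertion
                d[i - 1][j - 1] + sub_cost(ocr_word[i - 1], dict_word[j - 1]),
            )
    return d[m][n]

def correct(ocr_word, partition, threshold=1.0):
    """Pick the closest word in the dictionary partition, if within the threshold."""
    best = min(partition, key=lambda w: weighted_edit_distance(ocr_word, w))
    return best if weighted_edit_distance(ocr_word, best) <= threshold else ocr_word

print(correct("hook", ["book", "look", "hoot"]))  # 'book': confusing b/h is cheap
```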

19.
ABSTRACT

The exponential recurrence phenomenon has been reported in the study of gaps between repetitions of words in a text. The phenomenon has applications in several computer-based natural language systems. In this article, four leading statistical models of text generation are evaluated, and we identify the Simon–Yule model of Zipf's law as a promising approach. A realistic refinement of the Simon–Yule model is made to allow for a decreasing entry rate of new words. Simulation methods are used to show that the exponential recurrence phenomenon is preserved under this change in assumptions. Significant implications of the approach are discussed.
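
A minimal sketch of simulating Simon–Yule-style text generation with a decreasing new-word entry rate and collecting gaps between repetitions of a word (the entry-rate schedule and gap statistic are illustrative assumptions, not the article's refined model):

```python
import random

def simulate(n_tokens=20000, alpha0=0.1, decay=5e-5, seed=1):
    """Generate a token stream: with probability alpha (decreasing) introduce a new
    word; otherwise repeat a token drawn uniformly from the history, which picks
    existing words in proportion to their past frequency (rich-get-richer)."""
    rng = random.Random(seed)
    text, next_word = [], 0
    for t in range(n_tokens):
        alpha = alpha0 / (1.0 + decay * t)   # decreasing entry rate of new words
        if not text or rng.random() < alpha:
            text.append(next_word)
            next_word += 1
        else:
            text.append(rng.choice(text))
    return text

def recurrence_gaps(text, word):
    """Gaps between successive occurrences of `word` in the token stream."""
    positions = [i for i, w in enumerate(text) if w == word]
    return [b - a for a, b in zip(positions, positions[1:])]

text = simulate()
gaps = recurrence_gaps(text, 0)  # gaps for the very first (and frequent) word
print(len(gaps), sum(gaps) / max(len(gaps), 1))
```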

20.
This paper describes the special writing conventions of the Kazakh-specific letters and the current state of processing encoded Kazakh characters. It points out that the widely used letter-substitution approach does not conform to the relevant international and national standards, causes incorrect sorting of Kazakh text, and makes functions such as script conversion and speech synthesis harder to implement. To address these shortcomings, three improvements to the letter-substitution approach are proposed: representing the special letters themselves by combining them with the symbol ""; among the glyph forms of the special letters that carry the symbol, including only the isolated-character form bearing "" in the OpenType font; and using glyph-substitution rules to recognize contexts in which a special letter is not adjacent to a Kazakh letter. To facilitate the application of the improved approach, the paper describes how to set up OpenType glyph-substitution rules consistent with these improvements.
