Similar Documents
 A total of 20 similar documents were found (search time: 144 ms).
1.
A language-independent syntax-directed pretty printer has been implemented as the first step towards building a language-independent syntax-directed editor. The syntax-directed pretty printer works in two phases: the grammar processing phase and the program processing phase. In the grammar processing phase, a grammar which contains a context-free grammar and information for the parser and pretty printer is processed, and all files needed by the second phase are written. With these files, the syntax-directed pretty printer works for the language of the grammar. The syntax-directed editor would use the same grammar processing phase to construct the files needed to make it work for a specific language. In the program processing phase, programs in the language of the grammar are parsed and parse trees are built. If syntax errors are found, error messages are produced and error recovery is done. The parse trees are pretty printed according to the pretty printer specifications given in the grammar, resulting in well-indented, syntactically clear programs.
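The following Python sketch is not from the paper; it is a minimal, hypothetical illustration of the program processing phase, pretty-printing an already-built parse tree according to per-rule indentation hints, loosely in the spirit of grammar-supplied pretty-printer specifications. The Node class, the indent_rules table and the sample tree are assumptions invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A parse-tree node: a rule (or token) name plus its children."""
    name: str
    text: str = ""                      # terminal text, if any
    children: list = field(default_factory=list)

# Hypothetical pretty-printer specification: which rules open an indented block.
indent_rules = {"block", "if_body"}

def pretty_print(node: Node, depth: int = 0) -> str:
    """Emit one line per terminal, indenting whenever an indenting rule is entered."""
    pad = "    " * depth
    if node.text:                       # terminal: print its text on its own line
        return pad + node.text + "\n"
    child_depth = depth + 1 if node.name in indent_rules else depth
    return "".join(pretty_print(c, child_depth) for c in node.children)

# A tiny hand-built tree standing in for parser output.
tree = Node("program", children=[
    Node("stmt", text="if x > 0 then"),
    Node("block", children=[Node("stmt", text="y := 1;")]),
    Node("stmt", text="end"),
])
print(pretty_print(tree), end="")
```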

2.
Benchmarks are heavily used in different areas of computer science to evaluate algorithms and tools. In program analysis and testing, open-source and commercial programs are routinely used as benchmarks to evaluate different aspects of algorithms and tools. Unfortunately, many of these programs are written by programmers who introduce different biases, not to mention that it is very difficult to find programs that can serve as benchmarks with high reproducibility of results. We propose a novel approach for generating random benchmarks for evaluating program analysis and testing tools and compilers. Our approach uses stochastic parse trees, where language grammar production rules are assigned probabilities that specify the frequencies with which instantiations of these rules will appear in the generated programs. We implemented our tool for Java and applied it to generate a set of large benchmark programs of up to 5M lines of code each with which we evaluated different program analysis and testing tools and compilers. The generated benchmarks let us independently rediscover several issues in the evaluated tools. Copyright © 2014 John Wiley & Sons, Ltd.
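As a loose illustration of stochastic parse trees (not the authors' tool), the hypothetical Python sketch below expands a toy grammar whose productions carry probabilities, so that more probable rules appear more frequently in the generated programs; the grammar, the probabilities and the depth cap are invented for the example.

```python
import random

# Toy probabilistic grammar: nonterminal -> [(probability, right-hand side), ...]
# Symbols that are not keys of GRAMMAR are treated as terminals.
GRAMMAR = {
    "stmt":   [(0.5, ["assign"]), (0.3, ["if"]), (0.2, ["assign", "stmt"])],
    "if":     [(1.0, ["if (", "cond", ") {", "stmt", "}"])],
    "assign": [(1.0, ["x", "=", "expr", ";"])],
    "expr":   [(0.6, ["1"]), (0.4, ["expr", "+", "1"])],
    "cond":   [(1.0, ["x", ">", "0"])],
}

def generate(symbol: str, depth: int = 0, max_depth: int = 12) -> str:
    """Expand `symbol` by sampling productions according to their probabilities."""
    rules = GRAMMAR.get(symbol)
    if rules is None:
        return symbol                          # terminal
    if depth >= max_depth:                     # crude guard against unbounded recursion:
        rules = [rules[0]]                     # fall back to the first (non-recursive) rule
    rhs = random.choices([r for _, r in rules], weights=[p for p, _ in rules])[0]
    return " ".join(generate(s, depth + 1, max_depth) for s in rhs)

random.seed(0)
print(generate("stmt"))
```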

3.
Because Java Web applications involve complex business scenarios and impose strict requirements on the structural validity of input data, existing testing methods and tools suffer from a low proportion of valid test cases when testing Java Web applications. To address this problem, this paper proposes a parse-tree-based grey-box fuzzing method for Java Web applications. First, the input packets of a Java Web application are syntactically modeled to build a parse tree that distinguishes delimiters from data blocks, and a seed pool is attached to every leaf node of the parse tree so that the individual data blocks of a test case are isolated; inputs that conform to the application's business format are then generated by packet concatenation, which raises the proportion of valid test cases. To retain high-quality data blocks, each data-block seed is assigned its own weight during testing according to execution feedback from the program under test. To penetrate deep paths, data-block seed features are extracted from the corresponding seed pool by conditional-probability learning. A parse-tree-based grey-box fuzzing system for Java Web applications, PTreeFuzz, was implemented; test results show that it achieves better testing accuracy than existing tools.
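PTreeFuzz itself is not reproduced here; the hypothetical Python fragment below sketches only the input-assembly idea described above: a fixed parse-tree skeleton whose leaves each draw from their own seed pool, with per-seed weights nudged by (simulated) execution feedback. Names such as SeedPool and the fake_coverage feedback function are invented for illustration.

```python
import random

class SeedPool:
    """One pool of candidate data blocks for a single leaf of the parse tree."""
    def __init__(self, seeds):
        self.seeds = list(seeds)
        self.weights = [1.0] * len(self.seeds)   # one weight per data-block seed

    def pick(self):
        i = random.choices(range(len(self.seeds)), weights=self.weights)[0]
        return i, self.seeds[i]

    def reward(self, i, gain):
        self.weights[i] += gain                  # keep high-quality blocks likely

# Skeleton of an HTTP-like request: delimiters are fixed, leaves come from pools.
pools = {
    "path":  SeedPool(["/login", "/search", "/admin"]),
    "param": SeedPool(["q=1", "q=%27--", "id=0"]),
}

def build_input():
    """Concatenate fixed delimiters with sampled data blocks into one test case."""
    picks = {name: pool.pick() for name, pool in pools.items()}
    request = f"GET {picks['path'][1]}?{picks['param'][1]} HTTP/1.1"
    return request, picks

def fake_coverage(request):
    # Stand-in for execution feedback from the program under test.
    return len(set(request)) / len(request)

random.seed(1)
for _ in range(100):
    req, picks = build_input()
    gain = fake_coverage(req)
    for name, (i, _) in picks.items():
        pools[name].reward(i, gain)
print(build_input()[0])
```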

4.
Elementary dependency relationships between words within parse trees produced by robust analyzers on a corpus help automate the discovery of semantic classes relevant for the underlying domain. We introduce two methods for extracting elementary syntactic dependencies from normalized parse trees. The groupings which are obtained help identify coarse-grain semantic categories and isolate lexical idiosyncrasies belonging to a specific sublanguage. A comparison shows a satisfactory overlap with an existing nomenclature for medical language processing. This symbolic approach is efficient on medium-size corpora which resist statistical clustering methods, but seems more appropriate for specialized texts. This revised version was published online in July 2006 with corrections to the Cover Date.

5.
Elementary dependency relationships between words within parse trees produced by robust analyzers on a corpus help automate the discovery of semantic classes relevant for the underlying domain. We introduce two methods for extracting elementary syntactic dependencies from normalized parse trees. The groupings which are obtained help identify coarse-grain semantic categories and isolate lexical idiosyncrasies belonging to a specific sublanguage. A comparison shows a satisfactory overlap with an existing nomenclature for medical language processing. This symbolic approach is efficient on medium-size corpora which resist statistical clustering methods, but seems more appropriate for specialized texts. This revised version was published online in July 2006 with corrections to the Cover Date.

6.
We show in this paper that, when it is possible, parsing with regular expressions instead of context-free grammars is desirable. We present efficient algorithms for performing different tasks that concern parsing: producing the external representation and the internal representation of parse trees; producing all possible parse trees or a single one. Each of our algorithms to produce a parse tree from an input string has an optimal time complexity, linear in the length of the string. Moreover, ambiguous regular expressions can be used. Received: 21 October 1997 / 30 May 2000

7.
Program plagiarism detection is the task of detecting plagiarized code pairs among a set of source files. In this paper, we propose a code plagiarism detection system that uses a parse tree kernel. Our parse tree kernel calculates a similarity value between two source files in terms of their parse tree similarity. Since parse trees contain the essential syntactic structure of source code, the system effectively handles structural information. The contributions of this paper are two-fold. First, we propose a parse tree kernel that is optimized for program source code. The evaluation shows that our system based on this kernel outperforms well-known baseline systems. Second, we collected a large number of real-world Java source files from a university programming class. This test set was manually analyzed and tagged by two independent human annotators to mark plagiarized code. It can be used to evaluate the performance of various detection systems in real-world environments. The experiments with the test set show that the performance of our plagiarism detection system reaches 93% of the level of the human annotators.
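The paper's kernel is not reproduced here; as a rough, hypothetical sketch of the general idea, the Python code below computes a simple subtree kernel in the style of Collins and Duffy (matching production fragments accumulated over all node pairs) on toy parse trees and normalizes it into a similarity score.

```python
from dataclasses import dataclass
from functools import lru_cache

@dataclass(frozen=True)
class Tree:
    label: str
    children: tuple = ()

def collect(t: Tree) -> list:
    """All nodes of a tree, in preorder."""
    out = [t]
    for c in t.children:
        out.extend(collect(c))
    return out

def kernel(t1: Tree, t2: Tree, decay: float = 0.5) -> float:
    """Sum, over all node pairs, the (decayed) number of common subtree fragments."""
    @lru_cache(maxsize=None)
    def C(a: Tree, b: Tree) -> float:
        # Nodes match only if they use the same production (label + child labels).
        if a.label != b.label or [c.label for c in a.children] != [c.label for c in b.children]:
            return 0.0
        if not a.children:                       # matching leaves / pre-terminals
            return decay
        prod = decay
        for ca, cb in zip(a.children, b.children):
            prod *= 1.0 + C(ca, cb)
        return prod
    return sum(C(a, b) for a in collect(t1) for b in collect(t2))

def similarity(t1: Tree, t2: Tree) -> float:
    """Normalized kernel value in [0, 1]."""
    return kernel(t1, t2) / (kernel(t1, t1) * kernel(t2, t2)) ** 0.5

# Two toy parse trees for small code fragments.
a = Tree("stmt", (Tree("assign", (Tree("id"), Tree("expr", (Tree("num"),)))),))
b = Tree("stmt", (Tree("assign", (Tree("id"), Tree("expr", (Tree("id"),)))),))
print(round(similarity(a, b), 3))
```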

8.
A software reverse analysis method based on IDA Pro
Binary program translation, as a principal means of software reverse analysis, plays an active role. This paper presents a program translation method: the binary of an application is disassembled into an assembly-language program with IDA Pro; an assembly grammar designed on pushdown-automaton principles recognizes the assembly file; and corresponding translation rules and optimization measures convert the assembly language into an intermediate language. The resulting intermediate language is highly readable, general, and easy to understand. The method achieves a high degree of automation and reduces the code volume of the target program, so its application can effectively cut the time and effort that software analysts and debuggers spend tracing code. An example of program translation with the method is given.
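As a highly simplified, hypothetical illustration of the rule-based translation step (not the paper's grammar or rule set), the Python sketch below maps a few disassembled x86-style instructions onto a readable three-address intermediate form using pattern rules; the instruction subset and the output format are invented.

```python
import re

# Translation rules: regex over one disassembled line -> intermediate-language template.
RULES = [
    (re.compile(r"mov\s+(\w+),\s*(\w+)"), r"\1 := \2"),
    (re.compile(r"add\s+(\w+),\s*(\w+)"), r"\1 := \1 + \2"),
    (re.compile(r"sub\s+(\w+),\s*(\w+)"), r"\1 := \1 - \2"),
    (re.compile(r"cmp\s+(\w+),\s*(\w+)"), r"flags := compare(\1, \2)"),
    (re.compile(r"jz\s+(\w+)"),           r"if flags.zero goto \1"),
    (re.compile(r"call\s+(\w+)"),         r"call \1"),
    (re.compile(r"ret"),                  r"return"),
]

def to_intermediate(asm_lines):
    """Translate each assembly line with the first matching rule; mark unknowns."""
    out = []
    for line in asm_lines:
        line = line.strip().lower()
        for pattern, template in RULES:
            m = pattern.fullmatch(line)
            if m:
                out.append(m.expand(template))
                break
        else:
            out.append(f"; untranslated: {line}")
    return out

asm = ["mov eax, ebx", "add eax, 4", "cmp eax, 10", "jz done", "call helper", "ret"]
print("\n".join(to_intermediate(asm)))
```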

9.
We study CFG parse tree enumeration in this paper. By dividing the set of all parse trees into infinite hierarchies according to the height of parse trees, the hierarchical lexicographic order on the set of parse trees is established. Then grammar-based algorithms for counting and enumerating CFG parse trees in this order are presented. To generate a parse tree of height n, the time complexity is O(n). If τ is a lowest parse tree for its yield, then O(n) = O(∥τ∥ + 1), where ∥τ∥ is the length of the sentence (yield) generated by τ. The sentence can be obtained as a by-product of the parse tree. To compute a sentence from its parse tree (which need not be a lowest one), the time complexity is O(node) + O(∥τ∥ + 1), where node is the number of non-leaf nodes of the parse tree τ. To generate both a complete lowest parse tree and its yield at the same time, the time complexity is O(∥τ∥ + 1). Supported by the National Natural Science Foundation of China (Grant Nos. 60273023, 60721061)
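The enumeration algorithm itself is not reproduced; as a small, hypothetical illustration of the counting side of the problem, the Python sketch below counts the parse trees of height at most h (and exactly h) that a toy CFG derives from each nonterminal, by dynamic programming over height. The grammar and the height convention (terminals are leaves of height 0) are assumptions made for the example.

```python
from functools import lru_cache

# Toy CFG: nonterminal -> list of right-hand sides (lists of symbols).
# Symbols that are not keys of GRAMMAR are terminals.
GRAMMAR = {
    "S": [["S", "S"], ["(", "S", ")"], ["(", ")"]],
}

def is_nonterminal(sym: str) -> bool:
    return sym in GRAMMAR

@lru_cache(maxsize=None)
def trees_up_to(symbol: str, h: int) -> int:
    """Number of parse trees rooted at `symbol` whose height is at most h."""
    if not is_nonterminal(symbol):
        return 1 if h >= 0 else 0
    if h <= 0:
        return 0
    total = 0
    for rhs in GRAMMAR[symbol]:
        prod = 1
        for sym in rhs:
            prod *= trees_up_to(sym, h - 1)
        total += prod
    return total

def trees_of_height(symbol: str, h: int) -> int:
    """Exactly height h = (at most h) - (at most h-1)."""
    return trees_up_to(symbol, h) - trees_up_to(symbol, h - 1)

for h in range(1, 6):
    print(h, trees_of_height("S", h))
```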

10.
We first propose a modular framework for recursive composition of P systems. This modular approach provides encapsulation and information hiding, facilitating the design of P programs for complex algorithms. Using this framework, we developed a P program that solves the classical version of the Byzantine agreement problem, for N participants connected in a complete graph, according to the well known Byzantine agreement algorithm based on EIG trees. We prove the correctness of this modular composition and conclude with a list of open problems.

11.
Probabilistic context-free grammars (PCFGs) provide a simple way to represent a particular class of distributions over sentences in a context-free language. Efficient parsing algorithms for answering particular queries about a PCFG (i.e., calculating the probability of a given sentence, or finding the most likely parse) have been developed and applied to a variety of pattern-recognition problems. We extend the class of queries that can be answered in several ways: (1) allowing missing tokens in a sentence or sentence fragment, (2) supporting queries about intermediate structure, such as the presence of particular nonterminals, and (3) flexible conditioning on a variety of types of evidence. Our method works by constructing a Bayesian network to represent the distribution of parse trees induced by a given PCFG. The network structure mirrors that of the chart in a standard parser, and is generated using a similar dynamic programming approach. We present an algorithm for constructing Bayesian networks from PCFGs, and show how queries or patterns of queries on the network correspond to interesting queries on PCFGs. The network formalism also supports extensions to encode various context sensitivities within the probabilistic dependency structure.
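The Bayesian-network construction is not reproduced here; as background for the most basic PCFG query mentioned above, the hypothetical Python sketch below computes the probability of a sentence under a toy PCFG in Chomsky normal form with the standard inside (CKY-style) dynamic program, i.e. the chart structure that the abstract says the network mirrors. The grammar and probabilities are invented.

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form: A -> B C (binary) and A -> 'w' (lexical).
BINARY = {          # A: [(B, C, prob), ...]
    "S":  [("NP", "VP", 1.0)],
    "VP": [("V", "NP", 1.0)],
}
LEXICAL = {         # A: [(word, prob), ...]
    "NP": [("she", 0.4), ("fish", 0.6)],
    "V":  [("eats", 1.0)],
}

def inside_probability(words, start="S"):
    """P(sentence | grammar): sum over all parse trees, via the inside algorithm."""
    n = len(words)
    chart = defaultdict(float)      # chart[(i, j, A)] = inside prob of A spanning words[i:j]
    for i, w in enumerate(words):
        for a, rules in LEXICAL.items():
            for word, p in rules:
                if word == w:
                    chart[(i, i + 1, a)] += p
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for a, rules in BINARY.items():
                for b, c, p in rules:
                    for k in range(i + 1, j):
                        chart[(i, j, a)] += p * chart[(i, k, b)] * chart[(k, j, c)]
    return chart[(0, n, start)]

print(inside_probability(["she", "eats", "fish"]))   # 0.4 * 1.0 * 0.6 = 0.24
```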

12.
To model combinatorial decision problems involving uncertainty and probability, we introduce scenario based stochastic constraint programming. Stochastic constraint programs contain both decision variables, which we can set, and stochastic variables, which follow a discrete probability distribution. We provide a semantics for stochastic constraint programs based on scenario trees. Using this semantics, we can compile stochastic constraint programs down into conventional (non-stochastic) constraint programs. This allows us to exploit the full power of existing constraint solvers. We have implemented this framework for decision making under uncertainty in stochastic OPL, a language which is based on the OPL constraint modelling language [Van Hentenryck et al., 1999]. To illustrate the potential of this framework, we model a wide range of problems in areas as diverse as portfolio diversification, agricultural planning and production/inventory management.
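The stochastic OPL implementation is not reproduced; as a rough, hypothetical illustration of compiling a stochastic program into its scenarios, the Python sketch below fixes a decision variable (an order quantity) once, enumerates the scenarios of a discrete stochastic demand, and evaluates each candidate decision against every scenario for a chance constraint and an expected cost; all numbers are invented.

```python
# Discrete distribution of the stochastic variable (demand): value -> probability.
DEMAND = {5: 0.2, 10: 0.5, 15: 0.3}
HOLD_COST, SHORT_COST = 1.0, 4.0          # per-unit costs (made up)
SERVICE_LEVEL = 0.8                        # demand must be met with probability >= 0.8

def evaluate(order_qty):
    """Expand the scenario tree (one stage, three scenarios) for a fixed decision."""
    expected_cost, p_met = 0.0, 0.0
    for demand, prob in DEMAND.items():
        leftover = max(order_qty - demand, 0)
        shortage = max(demand - order_qty, 0)
        expected_cost += prob * (HOLD_COST * leftover + SHORT_COST * shortage)
        if shortage == 0:
            p_met += prob
    return expected_cost, p_met

# Solve by brute force over the (small) decision domain, as a compiled CSP solver might.
feasible = [(q, *evaluate(q)) for q in range(0, 21) if evaluate(q)[1] >= SERVICE_LEVEL]
best = min(feasible, key=lambda t: t[1])
print(f"order {best[0]} units: expected cost {best[1]:.2f}, P(demand met) {best[2]:.2f}")
```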

13.
Reusing software through copying and pasting is a continuous plague in software development despite the fact that it creates serious maintenance problems. Various techniques have been proposed to find duplicated redundant code (also known as software clones). A recent study has compared these techniques and shown that token-based clone detection based on suffix trees is fast but yields clone candidates that are often not syntactic units. Current techniques based on abstract syntax trees—on the other hand—find syntactic clones but are considerably less efficient. This paper describes how we can make use of suffix trees to find syntactic clones in abstract syntax trees. This new approach is able to find syntactic clones in linear time and space. The paper reports the results of a large case study in which we empirically compare the new technique to other techniques using the Bellon benchmark for clone detectors. The Bellon benchmark consists of clone pairs validated by humans for eight software systems written in C or Java from different application domains. The new contributions of this paper over the conference paper are the additional analysis of Java programs, the exploration of an alternative path that uses parse trees instead of abstract syntax trees, and the investigation of the impact on recall and precision when clone analyses insist on consistent parameter renaming.
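The suffix-tree construction of the paper is not reproduced here; as a rough stand-in for the core step, the hypothetical Python sketch below serializes two small ASTs into token sequences and reports repeated token windows of a minimum length, a simple n-gram substitute for suffix-tree-based detection of duplicated syntactic fragments.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Ast:
    kind: str
    children: list = field(default_factory=list)

def serialize(node, out=None):
    """Preorder serialization of an AST into a token sequence."""
    if out is None:
        out = []
    out.append(node.kind)
    for c in node.children:
        serialize(c, out)
    out.append("END")            # close marker so windows map back to subtrees
    return out

def clone_candidates(tokens, min_len=4):
    """Group positions of identical token windows of length `min_len`."""
    windows = defaultdict(list)
    for i in range(len(tokens) - min_len + 1):
        windows[tuple(tokens[i:i + min_len])].append(i)
    return {w: pos for w, pos in windows.items() if len(pos) > 1}

# Two functions sharing an identical if-statement subtree (a syntactic clone).
shared = Ast("If", [Ast("Cmp", [Ast("Id"), Ast("Num")]), Ast("Return", [Ast("Id")])])
f = Ast("Func", [Ast("Assign", [Ast("Id"), Ast("Num")]), shared])
g = Ast("Func", [Ast("Call", [Ast("Id")]), shared])

tokens = serialize(f) + ["|"] + serialize(g)      # "|" separates the two files
for window, positions in clone_candidates(tokens).items():
    print(positions, " ".join(window))
```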

14.
Program diagnosis systems were developed to help users solve programming problems. By providing guidance on errors and misconceptions, these systems can help users write programs and understand their dynamic behavior. Features of software visualization, which aim at providing visual and concrete depictions of the abstractions and operations of programs, have also been shown to make programs more understandable. The main theme of this paper is to assess the usefulness of incorporating features of software visualization into the design of program diagnosis systems intended for novices. We report an empirical evaluation to assess the effectiveness of supporting visualization features during problem solving. The system used in the evaluation integrates visualization and immediacy features and supports a model-tracing-based approach to program diagnosis. Unlike other similar systems, our prototype system supports a more flexible style of interaction by increasing the grain size of diagnosis to a complete programming statement. The evaluation reported here suggests that, when supported with visualization features, systems for program diagnosis tend to be more effective in helping users during problem solving.

15.
Traditional Earley parsers operate in two phases: first recognizing the input, then constructing the forest of parse trees. Practically speaking, this quirk makes it awkward to use in a compiler-compiler, because semantic actions attached to rules are only executed after the fact. We address this problem by identifying safe Earley sets, points during the recognition phase at which partial parse trees can be constructed; this means that semantic actions may be executed on the fly. A secondary benefit is that Earley sets can be deleted during recognition, resulting in a substantial savings of both space and time.

16.
Web page main-text extraction based on logical lines and maximum acceptance distance
Extracting the main text of a web page is a foundational task for many Internet applications and a problem that must be solved. The current mainstream approach is based on the DOM tree and requires parsing each page into its DOM structure. Given the great variety of page sources and structures on today's Internet, DOM-tree-based processing suffers not only from insufficient performance but also from problems with extraction precision. To address these issues, this paper proposes a new method for main-text extraction that does not depend on the DOM tree. Instead, it derives heuristic rules from the way people write web pages, combines them with related statistical regularities, takes the logical line as the basic processing unit, and extracts the main text based on a maximum acceptance distance. Experiments show that the method extracts web page main text efficiently and with high precision.
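As a very rough, hypothetical illustration of the general approach (not the paper's exact rules), the Python sketch below strips tags from an HTML page, treats each remaining non-empty line as a logical line, marks lines above a length threshold as content, and keeps the largest region in which gaps between content lines never exceed a maximum acceptance distance; the thresholds are invented.

```python
import re

MAX_GAP = 2          # maximum acceptance distance between content lines (assumed)
MIN_TEXT = 20        # minimum characters for a line to count as content (assumed)

def logical_lines(html: str):
    """Drop scripts/styles and tags, keep non-empty text lines as logical lines."""
    html = re.sub(r"(?s)<(script|style).*?</\1>", "", html)
    text = re.sub(r"<[^>]+>", "", html)
    return [ln.strip() for ln in text.splitlines() if ln.strip()]

def extract_main_text(html: str) -> str:
    lines = logical_lines(html)
    content_idx = [i for i, ln in enumerate(lines) if len(ln) >= MIN_TEXT]
    if not content_idx:
        return ""
    # Grow regions of content lines whose indices are within MAX_GAP of each other,
    # then keep the largest region as the page's main text.
    regions, current = [], [content_idx[0]]
    for i in content_idx[1:]:
        if i - current[-1] <= MAX_GAP:
            current.append(i)
        else:
            regions.append(current)
            current = [i]
    regions.append(current)
    best = max(regions, key=lambda r: sum(len(lines[i]) for i in r))
    return "\n".join(lines[i] for i in range(best[0], best[-1] + 1))

page = """<html><body><div>Home | About</div>
<p>The quick brown fox jumps over the lazy dog, again and again.</p>
<p>Readable body text usually comes in consecutive long lines like this one.</p>
<div>Copyright 2024</div></body></html>"""
print(extract_main_text(page))
```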

17.
Anne D. Wilson. Software, 1984, 14(9): 807-816
Utility programs discussed in this paper were initially developed to automate the ‘drawing’ of the non-binary trees which occur when using a Jackson-type methodology for program design; these trees represent either data structures or program structures, showing the selective, sequential and iterative relationships between nodes. First, a language was developed to allow information about tree structures to be input to the programs. Techniques and algorithms are described which enabled utility programs to be coded to process this data, transforming it into printable diagrammatic representations of trees, pseudo-code representations of program trees, and PL/I or Cobol programs. Secondly, a merge operation for trees had to be defined, reflecting the step in Warnier's and Jackson's methodologies of combining data tree structures to provide a program structure. Further algorithms are described which enable this step to be performed automatically; they include checks that each particular merge can be performed. Because the aims and concepts have been fluid, some of the ideas have been discarded and programs have been rewritten. Valuable insights were gained from pursuing both the rejected and the acceptable ideas; these are described in this paper.

18.
Induction of decision trees is one of the most successful approaches to supervised machine learning. Branching programs are a generalization of decision trees and, by the boosting analysis, exponentially more efficiently learnable than decision trees. However, this advantage has not been seen to materialize in experiments. Decision trees are easy to simplify using pruning. Reduced error pruning is one of the simplest decision tree pruning algorithms. For branching programs no pruning algorithms are known. In this paper we prove that reduced error pruning of branching programs is infeasible. Finding the optimal pruning of a branching program with respect to a set of pruning examples that is separate from the set of training examples is NP-complete. Because of this intractability result, we have to consider approximating reduced error pruning. Unfortunately, it turns out that even finding an approximate solution of arbitrary accuracy is computationally infeasible. In particular, reduced error pruning of branching programs is APX-hard. Our experiments show that, despite the negative theoretical results, heuristic pruning of branching programs can reduce their size without significantly altering the accuracy.
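Reduced error pruning itself is easy to state for decision trees, where it remains tractable; the hypothetical Python sketch below prunes any subtree whose replacement by a majority-label leaf does not increase the error on a held-out pruning set. The tree, the data and the binary-feature encoding are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional
from collections import Counter

@dataclass
class Node:
    feature: Optional[int] = None    # feature index tested at this node (None for leaves)
    left: Optional["Node"] = None    # branch for feature value 0
    right: Optional["Node"] = None   # branch for feature value 1
    label: Optional[int] = None      # prediction if this node is (or becomes) a leaf

def predict(node, x):
    while node.feature is not None:
        node = node.right if x[node.feature] else node.left
    return node.label

def errors(node, data):
    return sum(predict(node, x) != y for x, y in data)

def reduced_error_prune(node, prune_set):
    """Bottom-up: replace a subtree by a majority-label leaf if that does not
    increase the number of errors on the pruning set."""
    if node.feature is None or not prune_set:
        return node
    left_set = [(x, y) for x, y in prune_set if not x[node.feature]]
    right_set = [(x, y) for x, y in prune_set if x[node.feature]]
    node.left = reduced_error_prune(node.left, left_set)
    node.right = reduced_error_prune(node.right, right_set)
    majority = Counter(y for _, y in prune_set).most_common(1)[0][0]
    leaf = Node(label=majority)
    return leaf if errors(leaf, prune_set) <= errors(node, prune_set) else node

# A small tree that overfits: the right-right leaf memorizes noise.
tree = Node(feature=0,
            left=Node(label=0),
            right=Node(feature=1, left=Node(label=1), right=Node(label=0)))
prune_set = [((0, 0), 0), ((1, 0), 1), ((1, 1), 1), ((1, 1), 1)]
pruned = reduced_error_prune(tree, prune_set)
print("right subtree pruned to leaf:", pruned.right.feature is None, "label:", pruned.right.label)
```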

19.
This paper considers the problem of finding relevant answers to multi-sentence questions, which is an urgent problem in many applied fields. In particular, this problem arises in industrial systems that are aimed at providing goods and services. One of the major approaches to this problem is that a set of potential answers obtained using a keyword search is repeatedly re-ordered by comparing syntactic answer parse trees with a question parse tree. This work modifies the parse-tree-based approach and improves it by passing to a more exact representation of the semantic and syntactic structure of text: the consideration of text paragraphs as the unit of analyzed information. The approach was implemented in software and the results were placed in open access as an add-on for the Apache SOLR search engine, by which the suggested technology can be easily integrated with industrial search systems.

20.
The genetic programming (GP) paradigm, which applies the Darwinian principle of evolution to hierarchical computer programs, has been applied with breakthrough success in various scientific and engineering applications. However, one of the main drawbacks of GP has been the often large amount of computational effort required to solve complex problems. Much disparate research has been conducted over the past 25 years to devise innovative methods to improve the efficiency and performance of GP. This paper attempts to provide a comprehensive overview of this work related to Canonical Genetic Programming based on parse trees and originally championed by Koza (Genetic programming: on the programming of computers by means of natural selection. MIT, Cambridge, 1992). Existing approaches that address various techniques for performance improvement are identified and discussed with the aim to classify them into logical categories that may assist with advancing further research in this area. Finally, possible future trends in this discipline and some of the open areas of research are also addressed.
