Similar Documents
 20 similar documents found.
1.
Our confidence in the future performance of any algorithm, including optimization algorithms, depends on how carefully we select test instances, so that the generalization of algorithm performance to future instances can be inferred. In recent work, we established a methodology to generate a 2-d representation of the instance space, comprising a set of known test instances. This instance space shows the similarities and differences between instances using measurable features or properties, and enables the performance of algorithms to be viewed across the instance space, from which generalizations can be inferred. The power of this methodology lies in the insights it generates into algorithm strengths and weaknesses, by examining the regions in instance space where strong performance can be expected. The representation of the instance space depends, however, on the choice of test instances. In this paper we present a methodology for generating new test instances with controllable properties by filling observed gaps in the instance space. This enables the generation of rich new sets of test instances to better support the understanding of algorithm strengths and weaknesses. The methodology is demonstrated on graph colouring as a case study.
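A minimal sketch of the gap-filling step, assuming instances are described by a numeric feature matrix; the PCA projection, the grid resolution, and returning empty-cell centres as generation targets are illustrative assumptions, not the authors' exact method:

```python
import numpy as np
from sklearn.decomposition import PCA

def find_instance_space_gaps(features, grid_size=10):
    """Project instance features to 2-d and return the centres of empty
    grid cells, i.e. target coordinates for generating new test instances."""
    proj = PCA(n_components=2).fit_transform(features)
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    # Bin every known instance into a grid_size x grid_size grid.
    occupied = set(map(tuple,
        np.floor((proj - lo) / (hi - lo) * (grid_size - 1e-9)).astype(int)))
    gaps = []
    for i in range(grid_size):
        for j in range(grid_size):
            if (i, j) not in occupied:   # no known instance falls in this cell
                gaps.append(lo + (np.array([i, j]) + 0.5) * (hi - lo) / grid_size)
    return gaps
```

New instances would then be evolved (e.g. with a generator over instance parameters) until their projected features land in one of the returned cells.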

2.
Instance selection is becoming more and more relevant due to the huge amount of data that is constantly being produced. However, although current algorithms are useful for fairly large datasets, scaling problems arise when the number of instances reaches hundreds of thousands or millions. In the best case, these algorithms have efficiency O(n²), n being the number of instances. When we face huge problems, scalability is an issue, and most algorithms are not applicable. This paper presents a divide-and-conquer recursive approach to the problem of instance selection for instance-based learning on very large problems. Our method divides the original training set into small subsets to which the instance selection algorithm is applied. The selected instances are then rejoined into a new training set, and the same procedure, partitioning and application of an instance selection algorithm, is repeated. In this way, our approach applies the philosophy of divide-and-conquer in a recursive manner. The proposed method is able to match, and in the case of storage reduction even improve on, the results of well-known standard algorithms, with a very significant reduction of execution time. An extensive comparison on 30 datasets from the UCI Machine Learning Repository shows the usefulness of our method. Additionally, the method is applied to 5 huge datasets with 300,000 to more than a million instances, with very good results and fast execution times.
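The recursive scheme described above lends itself to a compact sketch; the partition size and the pluggable `base_select` routine (any standard instance selection algorithm returning the indices to keep) are assumptions for illustration:

```python
import numpy as np

def recursive_instance_selection(X, y, base_select, subset_size=1000):
    """Divide-and-conquer instance selection: run base_select on small random
    partitions, rejoin the survivors, and recurse until no further reduction."""
    n = len(X)
    if n <= subset_size:
        return np.asarray(base_select(X, y))
    order = np.random.permutation(n)
    kept = []
    for start in range(0, n, subset_size):            # partition the training set
        part = order[start:start + subset_size]
        kept.extend(part[base_select(X[part], y[part])])
    kept = np.array(kept)
    if len(kept) == n:                                # nothing was removed: stop
        return kept
    # Recurse on the rejoined, already-reduced training set; map indices back.
    return kept[recursive_instance_selection(X[kept], y[kept], base_select, subset_size)]
```

Because `base_select` only ever runs on subsets of bounded size, the quadratic cost of the base algorithm is paid on small chunks rather than on the full dataset.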

3.
Text classification is usually based on constructing a model, learned from training examples, to automatically classify text documents. However, as the size of text document repositories grows rapidly, the storage requirements and computational cost of model learning become higher. Instance selection is one way to address these limitations; its aim is to reduce the data size by filtering out noisy data from a given training dataset. In this paper, we introduce a novel algorithm for this task, namely a biological-based genetic algorithm (BGA). BGA builds a notion of “biological evolution” into the evolutionary process, in which the most streamlined process also complies with reasonable rules; in other words, after long-term evolution, organisms find the most efficient way to allocate resources and evolve. Consequently, the algorithm closely simulates natural evolution, making it both efficient and effective. Experimental results on the TechTC-100 and Reuters-21578 datasets show that BGA outperforms five state-of-the-art algorithms. In particular, using BGA to select text documents not only yields the largest dataset reduction rate, but also requires the least computational time. Moreover, BGA allows the k-NN and SVM classifiers to provide similar or slightly better classification accuracy than GA.
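The BGA itself is not specified in this abstract, but the plain GA baseline it is compared against can be sketched; the bit-mask encoding, truncation selection, and the user-supplied `fitness` function (rewarding classifier accuracy on the kept documents plus the reduction rate) are assumptions:

```python
import random

def ga_instance_select(n_docs, fitness, pop_size=30, generations=50, p_mut=0.01):
    """Plain GA for instance selection: evolve boolean keep-masks over the
    training documents and return the fittest mask found."""
    population = [[random.random() < 0.5 for _ in range(n_docs)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[:pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_docs)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if random.random() < p_mut else g for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)
```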

4.
Instance-spanning constraints (ISC) are the instrument for establishing controls across multiple instances of one or several processes. A multitude of applications call for ISC support. Consider, for example, the bundling and unbundling of cargo across several instances of a logistics process, or dependencies between examinations in different medical treatment processes. Non-compliance with ISC can lead to severe consequences and penalties, e.g., dangerous effects due to undesired drug interactions. ISC might stem from regulatory documents, extracted by domain experts. Another source for ISC is process execution logs. Process execution logs store execution information for process instances and hence, inherently, the effects of ISC. Discovering ISC from process execution logs can support ISC design and implementation (if the ISC was not known beforehand) as well as the validation of the ISC during its lifetime. This work contributes a categorization of ISC as well as four algorithms for discovering ISC candidates from process execution logs. The discovered ISC candidates are put into the context of the associated processes and can be further validated with domain experts. The algorithms are prototypically implemented and evaluated on artificial and real-world process execution logs. The results facilitate ISC design as well as validation and hence contribute to digitalized ISC and compliance management.

5.
郑黎晓  许智武  陈海明 《软件学报》2011,22(11):2564-2576
This paper proposes a sentence generation algorithm for context-free grammars. For a given grammar, the algorithm generates a sentence set that satisfies the branch coverage criterion of the grammar. Combined with strategies for length control, redundancy elimination, and sentence-set size control, the generated sentences are short and non-redundant, and the sentence set is small. The application of the algorithm to test data generation for grammar-based software systems is examined. Experimental results show that the generated test data have a strong ability to reveal program faults and can help testers improve testing speed.
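A minimal sketch of branch-coverage sentence generation; the grammar encoding (a dict mapping each nonterminal to its list of alternative right-hand sides), the random expansion strategy, and the depth-based length control are illustrative assumptions, not the paper's algorithm:

```python
import random

def expand(grammar, symbol, covered, depth=0, max_depth=8):
    """Expand one symbol, preferring production alternatives not yet covered;
    beyond max_depth, pick the shortest alternative to keep sentences short."""
    if symbol not in grammar:                          # terminal symbol
        return [symbol]
    alts = grammar[symbol]
    fresh = [a for a in alts if (symbol, tuple(a)) not in covered]
    if depth >= max_depth:
        alt = min(alts, key=len)                       # length control
    else:
        alt = random.choice(fresh or alts)
    covered.add((symbol, tuple(alt)))
    return [t for s in alt for t in expand(grammar, s, covered, depth + 1, max_depth)]

def covering_sentence_set(grammar, start, budget=500):
    """Generate sentences until every production alternative is covered
    (branch coverage) or the budget runs out; duplicates are discarded."""
    total = sum(len(alts) for alts in grammar.values())
    covered, sentences = set(), set()
    while len(covered) < total and budget > 0:
        sentences.add(" ".join(expand(grammar, start, covered)))
        budget -= 1
    return sentences

# Example: covering_sentence_set({"S": [["a", "S", "b"], ["c"]]}, "S")
```

This sketch assumes every nonterminal has a non-recursive shortest alternative, so the depth cut-off terminates.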

6.
Generating Japanese from Chinese Case-Relation Representations   (Total citations: 3; self-citations: 1; by others: 3)
This paper describes the design and implementation of the Japanese generation subsystem of a transfer-based Chinese-Japanese machine translation system. It first describes a case-relation-based Chinese dependency parse tree, whose nodes record syntactic, semantic, and case-relation information. Then, in view of the characteristics of Japanese, the main problems in Japanese generation are analyzed, including target-word selection, determination of conjugation forms for predicates (verbs and adjectives), and particle insertion. The architecture of the rule-based Japanese generation system is presented, focusing on the design and implementation of the generation rule system. Finally, examples of rule descriptions and translations are given, and preliminary ideas for further improving the system are proposed.

7.
Text-based image retrieval may perform poorly due to irrelevant and/or incomplete text surrounding the images in web pages. In such situations, the visual content of the images can be leveraged to improve image ranking performance. In this paper, we look into this problem of image re-ranking and propose a system that automatically constructs multiple candidate “multi-instance bags (MI-bags)”, which are likely to contain relevant images. These automatically constructed bags are then utilized by ensembles of Multiple Instance Learning (MIL) classifiers, and the images are re-ranked according to the final classification responses. Our method is unsupervised in the sense that the only input to the system is the text query itself, without any user feedback or annotation. The experimental results demonstrate that constructing multiple-instance bags based on the retrieval order and utilizing ensembles of MIL classifiers greatly enhances retrieval performance, achieving results on par with or better than the state-of-the-art.
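A sketch of the two central steps under stated assumptions: positive bags are consecutive slices of the initial text-based ranking (each slice likely to contain at least one relevant image), the lowest-ranked images serve as the negative bag, and re-ranking averages the responses of an ensemble whose classifiers expose a scikit-learn-style `decision_function`:

```python
def build_mi_bags(ranked_images, bag_size=15, n_pos_bags=4):
    """Slice the top of the text-based ranking into candidate positive bags
    and take the tail of the ranking as the negative bag."""
    pos_bags = [ranked_images[i * bag_size:(i + 1) * bag_size]
                for i in range(n_pos_bags)]
    neg_bag = ranked_images[-bag_size:]
    return pos_bags, neg_bag

def rerank(ranked_images, mil_ensemble, featurize):
    """Re-rank images by the mean decision value of the trained MIL ensemble."""
    def score(img):
        feats = [featurize(img)]
        return sum(clf.decision_function(feats)[0]
                   for clf in mil_ensemble) / len(mil_ensemble)
    return sorted(ranked_images, key=score, reverse=True)
```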

8.
Backward demodulation is a simplification technique used in saturation-based theorem proving with superposition and ordered paramodulation. It requires instance retrieval, i.e., search for instances of some term in a typically large set of terms. Path indexing is a family of indexing techniques that can be used to solve this problem efficiently. We propose a number of powerful optimisations to standard path indexing. We also describe a novel framework that combines path indexing with relational joins. The main advantage of the proposed scheme is flexibility, which we illustrate by sketching how to adapt the scheme to instance retrieval modulo commutativity and backward subsumption on multi-literal clauses.
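A toy path index under stated assumptions: terms are nested tuples like `("f", "a", ("g", "x"))`, a path is a sequence of (function symbol, argument position) pairs, and query variables are written `"?"`. Real provers add the compiled query trees and relational-join machinery the paper describes; candidates returned here must still be confirmed by an actual matching test:

```python
from collections import defaultdict

def paths(term, prefix=()):
    """Yield (path, symbol) pairs for every position in a term."""
    sym = term[0] if isinstance(term, tuple) else term
    yield prefix, sym
    if isinstance(term, tuple):
        for i, arg in enumerate(term[1:]):
            yield from paths(arg, prefix + ((sym, i),))

class PathIndex:
    def __init__(self):
        self.entries = defaultdict(set)   # (path, symbol) -> ids of terms
        self.terms = {}

    def insert(self, tid, term):
        self.terms[tid] = term
        for p, sym in paths(term):
            self.entries[(p, sym)].add(tid)

    def instance_candidates(self, query):
        """Ids of indexed terms that may be instances of `query`: intersect
        the id sets of all non-variable positions of the query."""
        sets = [self.entries.get((p, sym), set())
                for p, sym in paths(query) if sym != "?"]
        return set.intersection(*sets) if sets else set(self.terms)
```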

9.
Discovering the conditions under which an optimization algorithm or search heuristic will succeed or fail is critical for understanding the strengths and weaknesses of different algorithms, and for automated algorithm selection. Large scale experimental studies - studying the performance of a variety of optimization algorithms across a large collection of diverse problem instances - provide the resources to derive these conditions. Data mining techniques can be used to learn the relationships between the critical features of the instances and the performance of algorithms. This paper discusses how we can adequately characterize the features of a problem instance that have impact on difficulty in terms of algorithmic performance, and how such features can be defined and measured for various optimization problems. We provide a comprehensive survey of the research field with a focus on six combinatorial optimization problems: assignment, traveling salesman, and knapsack problems, bin-packing, graph coloring, and timetabling. For these problems - which are important abstractions of many real-world problems - we review hardness-revealing features as developed over decades of research, and we discuss the suitability of more problem-independent landscape metrics. We discuss how the features developed for one problem may be transferred to study related problems exhibiting similar structures.

10.
Generating relevant models   (Total citations: 2; self-citations: 0; by others: 2)
Manthey and Bry's model generation approach to theorem proving for FOPC has been greeted with considerable interest. Unfortunately the original presentation of the technique can become arbitrarily inefficient when applied to problems whose statements contain large amounts of irrelevant information. We show how to avoid these problems whilst retaining nearly all the advantages of the basic approach.

11.
Nowadays, the availability of large collections of data requires techniques and tools capable of linking data together, by retrieving potentially useful relations among them and by helping to associate data that represent the same or similar real-world objects. One of the main problems in developing data linking techniques and tools is understanding the quality of the results produced by the matching process. In this paper, we describe our experience of instance matching and data linking evaluation in the context of the Ontology Alignment Evaluation Initiative (IM@OAEI). Our goal is to validate different proposed methods, identify the most promising techniques and directions for improvement, and, subsequently, guide further research in the area as well as the development of robust tools for real-world tasks.

12.
This paper describes a first attempt to base a paraphrase generation system upon Meľčuk and Žolkovskij's linguistic meaning-text (MT) model whose purpose is to establish correspondences between meanings, represented by networks, and (ideally) all synonymous texts having this meaning. The system described here contains a Prolog implementation of a small explanatory and combinatorial dictionary (the MT lexicon) and, using unification and backtracking, generates from a given network the sentences allowed by the dictionary and the lexical transformations of the model. The passage from a net to the final texts is done through a series of transformations of intermediary structures that closely correspond to MT utterance representations (semantic, deep-syntax, surface-syntax, and morphological representations). These are graphs and trees with labeled arcs. The Prolog unification (equality predicate) was extended to extract information from these representations and build new ones. The notion of utterance path, used by many authors, is replaced by that of covering by defining subnetworks.

13.
Test Case Generation Based on UML Statecharts   (Total citations: 4; self-citations: 0; by others: 4)
It is difficult to generate class test cases directly from UML statechart diagrams that contain hierarchical and concurrent structures. This paper proposes a method for generating test cases from UML statecharts: the statechart is first transformed into a FREE (Flattened Regular Expression) model, and class test cases are then generated from the FREE model. Test coverage criteria for the FREE model are proposed, together with an algorithm that derives finite transition sequences from the FREE model.
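A sketch of deriving finite transition sequences from a flattened state machine; the triple encoding of transitions, the simple transition-coverage criterion, and the greedy path extension are illustrative assumptions, not the paper's algorithm:

```python
def transition_sequences(transitions, initial, max_len=50):
    """Greedily build event sequences from the initial state until every
    transition (src, event, dst) is covered or no further coverage is possible."""
    outgoing = {}
    for src, event, dst in transitions:
        outgoing.setdefault(src, []).append((event, dst))
    uncovered = set(transitions)
    sequences = []
    while uncovered:
        state, path, progress = initial, [], False
        while len(path) < max_len and uncovered:
            options = outgoing.get(state, [])
            if not options:
                break
            # Prefer an uncovered transition; otherwise walk a covered one.
            fresh = [(e, d) for e, d in options if (state, e, d) in uncovered]
            event, dst = (fresh or options)[0]
            if fresh:
                uncovered.discard((state, event, dst))
                progress = True
            path.append(event)
            state = dst
        if not progress:        # no new transition reachable: give up
            break
        sequences.append(path)
    return sequences

# Example:
# transition_sequences([("s0","a","s1"), ("s1","b","s0"), ("s1","c","s2")], "s0")
```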

14.
Information-Model-Driven Information System Development and the Meta-Information System   (Total citations: 3; self-citations: 0; by others: 3)
梁军  何建邦 《计算机科学》2003,30(3):117-119
Information system development has passed through four stages: computation-centered, data-centered, object-centered (unifying data and processing), and the currently emerging model-centered stage. In the model-centered stage, information system development becomes a process driven by an information model (Information Model), which runs through every phase of analysis, design, implementation, deployment, maintenance, and management. This calls for an information system, built around the information model, that supports and manages the development and operation of information systems, that is, a meta-information system.

15.
Object Tracking Based on Efficient Multiple Instance Learning   (Total citations: 1; self-citations: 0; by others: 1)
彭爽  彭晓明 《计算机应用》2015,35(2):466-469
Tracking algorithms based on multiple instance learning (MIL) can alleviate the drift problem to a large extent. However, the algorithm runs relatively slowly and its accuracy leaves room for improvement, because the strong-classifier update strategy adopted by MIL is inefficient and the classifier update speed is inconsistent with the speed at which the target's appearance changes. This paper proposes a new strong-classifier update strategy that greatly improves the efficiency of the MIL algorithm, together with a mechanism for dynamically updating the classifier learning rate, so that the updated classifier better matches the target's appearance and tracking accuracy improves. Experiments comparing the proposed algorithm with MIL and the weighted-MIL tracker (WMIL) show that the proposed algorithm achieves the best efficiency and tracking accuracy of the three, and performs particularly well when the background contains no distractor objects similar in appearance to the tracked target.
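The dynamic learning-rate idea can be sketched in isolation; tying the update rate to how well the current classifier already explains the new target patch, and the Gaussian weak-classifier parameterization, are assumptions about the mechanism rather than the paper's exact formulas:

```python
def dynamic_learning_rate(confidence, lr_min=0.05, lr_max=0.4):
    """Adapt the update rate to the speed of appearance change: low classifier
    confidence on the newly tracked patch means the appearance has drifted
    from the model, so update faster; high confidence means update gently."""
    confidence = min(max(confidence, 0.0), 1.0)     # clamp to [0, 1]
    return lr_min + (1.0 - confidence) * (lr_max - lr_min)

def update_gaussian_weak_classifier(mu, sigma, patch_mu, patch_sigma, confidence):
    """Blend the stored Gaussian parameters with the new frame's statistics."""
    lr = dynamic_learning_rate(confidence)
    return (1 - lr) * mu + lr * patch_mu, (1 - lr) * sigma + lr * patch_sigma
```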

16.
UMLTGF: A Tool for Generating Test Cases from UML Activity Diagrams Using a Gray-Box Method   (Total citations: 8; self-citations: 0; by others: 8)
UML has become the de facto standard for modeling languages, and generating test cases from UML analysis and design models poses a new challenge for object-oriented software testing. To generate test cases directly from the activity diagrams of a UML design model, this paper gives a formal definition of UML activity diagrams and a gray-box testing method. The method first analyzes all execution paths of the activity diagram (each path is called a test scenario), and then generates test cases from the activities represented by the nodes and transitions of each test scenario, together with their input/output variables and the associated constraints. Based on this method, a tool for automatically generating test cases, UMLTGF, has been implemented; it extracts activity diagram information from Rational Rose specification files and generates the corresponding test cases. The tool can improve the efficiency of software testing and reduce testing cost.

17.
18.
The problem of traffic sign recognition is generally approached by first constructing a classifier, trained on relevant image features extracted from traffic signs, to recognize new, unknown traffic signs. Feature selection and instance selection are two important data preprocessing steps in data mining, the former aimed at removing irrelevant and/or redundant features from a given dataset and the latter at discarding faulty data. However, there has thus far been no study examining the impact of performing feature and instance selection on traffic sign recognition performance. Given that genetic algorithms (GA) have been widely used for these types of data preprocessing tasks in related studies, we introduce a novel genetic-based biological algorithm (GBA). GBA builds a notion of “biological evolution” into the evolutionary process, in which the most streamlined process also complies with reasonable rules; in other words, after long-term evolution, organisms find the most efficient way to allocate resources and evolve. Similarly, we closely simulate natural evolution in the algorithm, to obtain an approach that is both efficient and effective. Experiments comparing the performance of GBA and a GA on the German Traffic Sign Recognition Benchmark show that GBA outperforms the GA in terms of reduction rate, classification accuracy, and computational cost.

19.
The economic goals related to test generation are quite important for the software industry. Manufacturers, ever seeking to increase their productivity, need to avoid malfunctions at the time of system specification: the later defects are detected, the greater the cost. Consequently, the development of techniques and tools able to efficiently support engineers in charge of elaborating specifications constitutes a major challenge, whose fallout concerns not only sectors of critical applications but all those where poor design could be extremely harmful to the brand image of a product. This article describes the design and implementation of a set of tools allowing software developers to validate UML (Unified Modeling Language) specifications. This toolset belongs to the AGATHA environment, an automated test generator developed at CEA/LIST. The AGATHA toolset is designed to validate specifications of communicating concurrent units described using an EIOLTS formalism (Extended Input Output Labeled Transition System). The goal of the work described in this paper is to provide an interface between UML and the EIOLTS formalism, making it possible to use AGATHA on UML specifications. We first describe the translation of UML models into the EIOLTS formalism, and the translation of the results of the behavior analysis provided by AGATHA back into UML. Then we present the AGATHA toolset, focusing in particular on how AGATHA overcomes several combinatorial-explosion problems. We expose the concepts of symbolic calculus and detection of redundant paths, which are the main principles of AGATHA's kernel. This kernel computes all the symbolic behaviors of a system specified in EIOLTS and automatically generates tests by way of constraint solving. Finally, we apply our method to an example and explain the different results that are computed.

20.
Prototyping is an efficient and effective way to understand and validate system requirements at an early stage of software development. In this paper, we present an approach for transforming UML system requirement models with OCL specifications into executable prototypes that check multiplicity and invariant constraints. Generally, a use case in UML can be described as a sequence of system operations. A system operation can be formally defined by a pair of preconditions and postconditions, specified in OCL in the context of the conceptual class model. By analyzing the semantics of the preconditions and postconditions, the execution of an operation can be prototyped as a sequence of primitive actions that first check the precondition and then enforce the postcondition by transferring the system from a pre-state to a post-state step by step. The primitive actions are basic manipulations of the system state (an object diagram), including finding objects and links, creating and removing objects and links, and checking and setting attribute values. Based on this approach, we have developed a tool for automatic prototype generation and analysis, AutoPA3.0.
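A sketch of this execution model under stated assumptions: the system state is a small in-memory object diagram, and an operation is prototyped as a precondition check, a sequence of primitive actions, and a final invariant check (the class names and helpers here are illustrative, not AutoPA's API):

```python
class ObjectDiagram:
    """System state: objects with attribute dicts, plus binary links."""
    def __init__(self):
        self.objects, self.links = {}, set()

    # Primitive actions derived from OCL postconditions:
    def create(self, oid, **attrs):  self.objects[oid] = dict(attrs)
    def remove(self, oid):           self.objects.pop(oid)
    def link(self, a, b):            self.links.add((a, b))
    def unlink(self, a, b):          self.links.discard((a, b))
    def set_attr(self, oid, key, v): self.objects[oid][key] = v

def run_operation(state, precondition, actions, invariant):
    """Prototype one system operation: check the OCL precondition, apply the
    postcondition as primitive actions step by step, then re-check invariants."""
    if not precondition(state):
        raise ValueError("precondition violated")
    for action in actions:
        action(state)
    if not invariant(state):
        raise ValueError("invariant violated in post-state")
```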
