Similar Documents
20 similar documents found.
1.
Many techniques have been proposed for credit risk assessment, from statistical models to artificial intelligence methods. In recent years, different approaches to classifier ensembles have been applied successfully to credit scoring problems, and they have generally proved more accurate than single prediction models. The present paper goes a step further by introducing composite ensembles that jointly use different strategies for diversity induction. Accordingly, the combination of data resampling algorithms (bagging and AdaBoost) and attribute subset selection methods (random subspace and rotation forest) for the construction of composite ensembles is explored with the aim of improving prediction performance. The experimental results and statistical tests show that this new two-level classifier ensemble is an appropriate solution for credit scoring problems, performing better than traditional single ensembles and very significantly better than individual classifiers.
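For illustration, here is a minimal sketch (not the paper's exact method) of such a two-level composite ensemble in recent scikit-learn: AdaBoost base learners are wrapped in a bagging layer that also samples attribute subsets per bag, combining data resampling with attribute subset selection. The credit data is synthetic.

```python
# Two-level composite ensemble sketch: bagging + random-subspace-style
# feature sampling on the outer level, AdaBoost on the inner level.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

composite = BaggingClassifier(
    estimator=AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1), n_estimators=25),
    n_estimators=10,
    max_features=0.5,   # attribute subset selection per bag
    bootstrap=True,     # data resampling per bag
    random_state=0,
)
print(cross_val_score(composite, X, y, cv=5).mean())
```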

2.
In a large software system, knowing which files are most likely to be fault-prone is valuable information for project managers. They can use such information to prioritize software testing and allocate resources accordingly. However, our experience shows that it is difficult to collect and analyze fine-grained test defects in a large and complex software system. On the other hand, previous research has shown that companies can safely use cross-company data with nearest neighbor sampling to predict their defects when they are unable to collect local data. In this study we analyzed 25 projects of a large telecommunication system. To predict the defect proneness of modules we trained models on the publicly available NASA MDP data. In our experiments we used static call graph based ranking (CGBR) as well as nearest neighbor sampling to construct method-level defect predictors. Our results suggest that, for the analyzed projects, at least 70% of the defects can be detected by inspecting only (i) 6% of the code using a Naïve Bayes model, or (ii) 3% of the code using the CGBR framework.
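A hedged sketch of the nearest neighbor sampling idea, on synthetic data standing in for the NASA MDP metrics: each local module keeps only its nearest cross-company training rows, and a Naïve Bayes model is trained on that filtered set.

```python
# Cross-company defect prediction with nearest neighbor sampling (sketch).
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
cross_X = rng.normal(size=(500, 10))           # cross-company metric vectors
cross_y = (cross_X[:, 0] > 0.5).astype(int)    # synthetic defect labels
local_X = rng.normal(size=(100, 10))           # local modules to predict

# Keep the union of the 10 nearest cross-company rows per local row.
nn = NearestNeighbors(n_neighbors=10).fit(cross_X)
idx = np.unique(nn.kneighbors(local_X, return_distance=False))

model = GaussianNB().fit(cross_X[idx], cross_y[idx])
print(model.predict_proba(local_X)[:5, 1])     # defect-proneness scores
```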

3.
In this paper, architectures and methods of decision aggregation in classifier ensembles are investigated. Typically, ensembles are designed in such a way that each classifier is trained independently and decision fusion is performed as a post-processing module. In this study, however, we are interested in making the fusion a more adaptive process. We first propose a new architecture that utilizes the features of a problem to guide the decision fusion process. By using both the features and the classifiers' outputs, the recognition strengths and weaknesses of the different classifiers are identified. This information is used to improve the overall generalization capability of the system. Furthermore, we propose a co-operative training algorithm that allows the final classification to determine whether further training should be carried out on the components of the architecture. The performance of the proposed architecture is assessed by testing it on several benchmark problems. The new architecture shows improvement over existing aggregation techniques. Moreover, the proposed co-operative training algorithm provides a means to limit user intervention, and maintains a level of accuracy that is competitive with that of most other approaches.
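The sketch below is only a rough analogue of the proposed architecture: a scikit-learn stacked fusion layer that sees both the base classifiers' outputs and the raw features (passthrough=True), so that the fusion can adapt to different regions of the input space.

```python
# Feature-aware decision fusion via stacking with feature passthrough.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=800, n_features=15, random_state=1)

fusion = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=1)),
                ("nb", GaussianNB())],
    final_estimator=LogisticRegression(max_iter=1000),
    passthrough=True,   # feed original features to the fusion model too
)
print(cross_val_score(fusion, X, y, cv=5).mean())
```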

4.
Ensemble classification remains one of the most popular techniques in contemporary machine learning, characterized by both high efficiency and stability. An ideal ensemble comprises mutually complementary individual classifiers with high diversity and accuracy. This may be achieved, e.g., by training individual classification models on feature subspaces. Random Subspace is the best-known method based on this principle. Its main limitation lies in its stochastic nature: it cannot be considered a stable classifier suitable for real-life applications. In this paper, we propose an alternative approach, the Deterministic Subspace method, capable of creating subspaces in a guided and repeatable manner. Thus, our method always converges to the same final ensemble for a given dataset. We describe the general algorithm and three dedicated measures used in the feature selection process. Finally, we present the results of an experimental study, which demonstrate the usefulness of the proposed method.
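A minimal sketch of the deterministic-subspace idea, assuming mutual information as the ranking criterion (the paper proposes its own three measures): features are ranked once and dealt round-robin into k subspaces, so repeated runs always produce the same ensemble.

```python
# Deterministic subspace construction sketch: rank features, then assign
# them round-robin to k subspaces; no randomness in the subspace step.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=12, random_state=2)

k = 3  # number of subspaces / ensemble members
ranking = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1]
subspaces = [ranking[i::k] for i in range(k)]  # round-robin assignment

ensemble = [DecisionTreeClassifier(random_state=0).fit(X[:, s], y)
            for s in subspaces]
votes = np.stack([clf.predict(X[:, s]) for clf, s in zip(ensemble, subspaces)])
majority = (votes.mean(axis=0) > 0.5).astype(int)
print("training accuracy:", (majority == y).mean())
```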

5.
Incremental construction of classifier and discriminant ensembles
We discuss approaches to incrementally constructing an ensemble. The first constructs an ensemble of classifiers by choosing a subset from a larger set, and the second constructs an ensemble of discriminants, where a classifier is used for some classes only. We investigate criteria including accuracy, significant improvement, diversity, correlation, and the role of search direction. For discriminant ensembles, we test subset selection and trees. Fusion is by voting or by a linear model. Using 14 classifiers on 38 data sets, incremental search finds small, accurate ensembles in polynomial time. The discriminant ensemble uses a subset of discriminants and is simpler, interpretable, and accurate. We see that an incremental ensemble has higher accuracy than bagging and the random subspace method, and accuracy comparable to AdaBoost's, but with fewer classifiers.
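A simplified sketch of incremental (greedy forward) ensemble construction with accuracy as the sole criterion; the paper additionally studies diversity, correlation, and search direction.

```python
# Greedy forward selection of ensemble members: repeatedly add the
# classifier that most improves majority-vote validation accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=800, n_features=20, random_state=3)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=3)

pool = [DecisionTreeClassifier(max_depth=d, max_features=0.7,
                               random_state=s).fit(X_tr, y_tr)
        for d in (1, 2, 3, 5) for s in range(3)]
preds = np.stack([c.predict(X_val) for c in pool])

chosen, best = [], 0.0
while True:
    gains = []
    for i in range(len(pool)):
        if i in chosen:
            gains.append(-1.0)
            continue
        vote = (preds[chosen + [i]].mean(axis=0) > 0.5).astype(int)
        gains.append((vote == y_val).mean())
    i = int(np.argmax(gains))
    if gains[i] <= best:            # stop when no candidate helps
        break
    chosen.append(i)
    best = gains[i]
print("selected members:", chosen, "validation accuracy:", best)
```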

6.
The overproduce-and-choose strategy, which is divided into the overproduction and selection phases, has traditionally focused on finding the most accurate subset of classifiers at the selection phase, and using it to predict the class of all the samples in the test data set. It is, therefore, a static classifier ensemble selection strategy. In this paper, we propose a dynamic overproduce-and-choose strategy which combines optimization and dynamic selection in a two-level selection phase to allow the selection of the most confident subset of classifiers to label each test sample individually. The optimization level is intended to generate a population of highly accurate candidate classifier ensembles, while the dynamic selection level applies measures of confidence to reveal the candidate ensemble with the highest degree of confidence in the current decision. Experimental results comparing the proposed method to a static overproduce-and-choose strategy and a classical dynamic classifier selection approach demonstrate that our method outperforms both of these selection-based methods, and also performs better than combining the decisions of all classifiers in the initial pool.
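A sketch of the dynamic selection level only, with pre-built random forests standing in for the candidate ensembles produced by the optimization level: each test sample is labeled by the candidate whose averaged class probabilities are most confident for that sample.

```python
# Per-sample dynamic selection among candidate ensembles by confidence.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=4)

candidates = [RandomForestClassifier(n_estimators=n,
                                     random_state=s).fit(X_tr, y_tr)
              for n, s in ((10, 0), (25, 1), (50, 2))]

probs = np.stack([c.predict_proba(X_te) for c in candidates])
confidence = probs.max(axis=2)         # per-candidate, per-sample confidence
winner = confidence.argmax(axis=0)     # most confident candidate per sample
pred = probs[winner, np.arange(len(X_te))].argmax(axis=1)
print("dynamic-selection accuracy:", (pred == y_te).mean())
```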

7.
Due to limited control over changing operational conditions and personal physiology, systems used for video-based face recognition are confronted with complex and changing pattern recognition environments. Although a limited amount of reference data is initially available during enrollment, new samples often become available over time, through re-enrollment, post-analysis and labeling of operational data, etc. Adaptive multi-classifier systems (AMCSs) are therefore desirable for the design and incremental update of facial models. For real-time recognition of individuals appearing in video sequences, facial regions are captured with one or more cameras, and an AMCS must perform fast and efficient matching against the facial models of individuals enrolled in the system. In this paper, an incremental learning strategy based on particle swarm optimization (PSO) is proposed to efficiently evolve heterogeneous classifier ensembles in response to new reference data. This strategy is applied to an AMCS where all parameters of a pool of fuzzy ARTMAP (FAM) neural network classifiers (i.e., a swarm of classifiers), each one corresponding to a particle, are co-optimized such that both error rate and network size are minimized. To provide a high level of accuracy over time while minimizing the computational complexity, the AMCS integrates information from multiple diverse classifiers, where learning is guided by an aggregated dynamical niching PSO (ADNPSO) algorithm that optimizes networks according to both of these objectives. Moreover, pools of FAM networks are evolved to maintain (1) genotype diversity of solutions around local optima in the optimization search space and (2) phenotype diversity in the objective space. Accurate, low-cost ensembles are thereby designed by selecting classifiers on the basis of accuracy and of both genotype and phenotype diversity. For proof-of-concept validation, the proposed strategy is compared to AMCSs where incremental learning of FAM networks is guided through mono- and multi-objective optimization. Performance is assessed in terms of video-based error rate and resource requirements under different incremental learning scenarios, where new data is extracted from real-world video streams (IIT-NRC and MoBo). Simulation results indicate that the proposed strategy provides a level of accuracy comparable to that of mono-objective optimization and reference face recognition systems, yet requires a fraction of the computational cost (between 16% and 20% of a mono-objective strategy, depending on the database and scenario).
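The following is a heavily simplified illustration of co-optimizing error rate and model size with a particle swarm: a plain global-best PSO tunes a decision tree, with node count standing in for network size, whereas the paper evolves fuzzy ARTMAP networks with a dynamic niching PSO.

```python
# Plain global-best PSO minimizing an aggregate of error rate and size.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=5)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=5)

def fitness(p):                      # aggregate of error and model size
    depth = int(np.clip(p[0], 1, 15))
    leaf = int(np.clip(p[1], 1, 50))
    clf = DecisionTreeClassifier(max_depth=depth, min_samples_leaf=leaf,
                                 random_state=0).fit(X_tr, y_tr)
    err = 1.0 - clf.score(X_val, y_val)
    return err + 0.001 * clf.tree_.node_count

rng = np.random.default_rng(5)
pos = rng.uniform([1, 1], [15, 50], size=(12, 2))   # 12 particles, 2 dims
vel = np.zeros_like(pos)
pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
for _ in range(20):                  # standard global-best PSO update
    gbest = pbest[pbest_f.argmin()]
    r1, r2 = rng.random((2, 12, 1))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel
    f = np.array([fitness(p) for p in pos])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
print("best (depth, min_leaf):", pbest[pbest_f.argmin()], pbest_f.min())
```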

8.
Corporate organizations sometimes offer similar software products in certain domains, due to former company mergers or to the complexity of the organization. The functional overlap of such products is an opportunity for future systematic reuse to reduce software development and maintenance costs. We have therefore tailored existing domain analysis methods to our organization to identify commonalities and variabilities among such products and to assess the potential for software product line (SPL) approaches. As an exploratory case study, we report on our experiences and lessons learned from conducting the domain analysis in four application cases with large-scale software products. We learned that the outcome of a domain analysis was often a smaller integration scenario rather than an SPL, and that business case calculations were less relevant to the stakeholders and managers from the business units during this phase. We also learned that architecture reconstruction using a simple block-diagram notation aids domain analysis, and that large parts of our approach were reusable across application cases.

9.
Recent research in fault classification has shown the importance of accurately selecting the features to be used as inputs to the diagnostic model. In this work, a multi-objective genetic algorithm (MOGA) is considered for the feature selection phase. Then, two different techniques for using the selected features to develop the fault classification model are compared: a single classifier based on the feature subset with the best classification performance, and an ensemble of classifiers working on different feature subsets. The motivation for developing ensembles of classifiers is that they can achieve higher accuracies than single classifiers. An important issue for an ensemble to be effective is diversity in the predictions of the base classifiers which constitute it, i.e., their tendency to err on different sub-regions of the pattern space. In order to show the benefits of having diverse base classifiers in the ensemble, two different ensembles have been developed: in the first, the base classifiers are constructed on feature subsets found by MOGAs aimed at maximizing the fault classification performance and minimizing the number of features in the subsets; in the second, diversity among classifiers is added to the MOGA search as a third objective function to maximize. In both cases, a voting technique is used to combine the predictions of the base classifiers into the ensemble output. For verification, numerical experiments are conducted on a case of multiple-fault classification in rotating machinery, and the results achieved by the two ensembles are compared with those obtained by a single optimal classifier.
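As a sketch of the objective functions only (a full MOGA such as NSGA-II would evolve the masks): classification error and subset size are computed for a candidate feature mask, and pairwise disagreement serves as the diversity objective between two candidate base classifiers.

```python
# Objective functions for MOGA-style feature selection with diversity.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=12, random_state=6)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=6)

def predictions(mask):
    cols = np.flatnonzero(mask)
    clf = KNeighborsClassifier().fit(X_tr[:, cols], y_tr)
    return clf.predict(X_val[:, cols])

def objectives(mask):                       # to minimize: error, subset size
    err = 1.0 - (predictions(mask) == y_val).mean()
    return err, int(mask.sum())

def disagreement(mask_a, mask_b):           # diversity objective: to maximize
    return (predictions(mask_a) != predictions(mask_b)).mean()

rng = np.random.default_rng(6)
masks = rng.random((2, 12)) < 0.5
masks[:, 0] = True                          # keep both masks non-empty
a, b = masks
print(objectives(a), objectives(b), disagreement(a, b))
```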

10.
11.
UML is a commonly used graphical language for the modelling of software. Studies of UML's effectiveness have examined projects that develop software systems from scratch, yet the maintenance of software consumes a large share of the overall time and effort required to develop software systems. This study therefore focuses on the use of UML in software maintenance. We wish to elicit the software modelling practices used during maintenance in industry and to understand what are perceived as the hurdles and benefits of modelling. In order to achieve a high level of realism, we performed a case study in a multinational company's ICT department. The analysis is based on 31 interviews with employees who work on software maintenance projects. The interviewees played different roles and provided complementary views on the use, hurdles and benefits of software modelling and the use of UML. Our study uncovered a broad range of modelling-related practices, which are presented in a theoretical framework that illustrates how these practices are linked to the specific goals and context of software engineering projects. We present a list of recommended practices that contribute to the increased effectiveness of software modelling. The use of software modelling notations (like UML) is considered beneficial for software maintenance, but needs to be tailored to its context. Various practices that contribute to the effective use of modelling are commonly overlooked, suggesting that a more consciously holistic approach is required to integrate modelling practices into the overall software engineering approach.

12.
Requirements views, such as coverage and status views, are an important asset for monitoring and managing software development projects. We have developed a method that automates the process of reconstructing these views, and we have built a tool, ReqAnalyst, that supports this method. This paper investigates the extent to which requirements views can be automatically generated in order to monitor requirements in industrial practice. The paper focuses on monitoring the requirements in test categories and test cases. To retrieve the necessary data, an information retrieval technique called Latent Semantic Indexing was used. The method was applied in an industrial study. A number of requirements views were defined, and experiments were carried out with different reconstruction settings for generating these views. Finally, we explored how these views can help developers during the software development process.

Marco Lormans is a PhD researcher in the Software Engineering department of Delft University of Technology and a consultant at Logica. He received an MSc in computer science from Delft University of Technology. His research interests encompass (global) software development, in particular the specification and management of requirements, and software quality assurance. Arie van Deursen is a full professor at Delft University of Technology, where he heads the Software Engineering Research Group. He obtained his MSc degree in computer science in 1990 from the Vrije Universiteit, Amsterdam. From 1996 until 2006 he was a research leader at CWI, the Dutch National Institute for Research in Mathematics and Computer Science. His research interests include software evolution and reverse engineering, as well as model-driven approaches to software engineering. He is one of the co-founders of the Software Improvement Group, an Amsterdam-based software consultancy firm in the area of software system analysis. He has served on numerous program committees in the areas of software evolution, maintenance, and software engineering in general, and was program chair for the IEEE Working Conference on Reverse Engineering in 2002 and 2003. Hans-Gerhard Gross received an MSc in Computer Science (1996) from the University of Applied Sciences, Berlin, Germany, and a PhD in Software Engineering (2000) from the University of Glamorgan, Wales, UK. Following his PhD, Dr. Gross joined the Fraunhofer Institute for Experimental Software Engineering in Kaiserslautern, Germany, where he was responsible for a number of public research projects, devising software testing strategies, and for consulting projects with major German software organizations. Since 2005, Dr. Gross has been an Assistant Professor at Delft University of Technology, The Netherlands. His research interests encompass all phases of software development in general, and software testing in particular.
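A hedged sketch of the Latent Semantic Indexing step described in the abstract above, with invented requirement and test-case texts: both document sets are embedded in a shared low-rank space and linked by cosine similarity to yield a rough coverage view.

```python
# LSI-style traceability sketch: TF-IDF + truncated SVD + cosine links.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

requirements = ["the system shall log failed login attempts",
                "reports can be exported as pdf"]
test_cases = ["verify a failed login is written to the audit log",
              "check pdf export of the monthly report",
              "measure page load time under heavy load"]

vec = TfidfVectorizer().fit(requirements + test_cases)
svd = TruncatedSVD(n_components=2, random_state=0)
lsi = svd.fit_transform(vec.transform(requirements + test_cases))

links = cosine_similarity(lsi[:2], lsi[2:])   # requirements x test cases
for req, row in zip(requirements, links):
    print(req, "->", [f"{s:.2f}" for s in row])
```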

13.
The success of software development using third-party components depends highly on the ability to select a suitable component for the intended application. The evidence shows that there is limited knowledge about current industrial off-the-shelf (OTS) selection practices. As a result, there is often a gap between theory and practice, and the proposed methods for supporting selection are rarely adopted in industrial practice. This paper's goal is to investigate the actual industrial practice of component selection in order to provide an initial empirical basis that allows the reconciliation of research and industrial endeavors. The study consisted of semi-structured interviews with 23 employees from 20 different software-intensive companies that mostly develop web information system applications. It provides qualitative information that helps to further understand these practices, and it emphasizes some aspects that have been overlooked by researchers. For instance, although the literature claims that component repositories are important for locating reusable components, these are hardly used in industrial practice; instead, other resources that have received little attention are used for this purpose. Practices and potential market niches for software-intensive companies have also been identified. The results are valuable from both the research and the industrial perspectives, as they provide a basis for formulating well-substantiated hypotheses and more effective improvement strategies.

14.
This article describes a philosophy and a practical application of software organization for the control of industrial robots. In order to apply an industrial robot to a work cell as a component or device, not only is a language for commanding robot motion necessary, but powerful robot functions are also required to realize a robot work cell with a hierarchical structure. First, we outline industrial robot control and select the required functions. Second, we discuss software organizations and the key points in realizing them. Finally, as a practical application, we introduce a robot system based on a personal computer that is widely available in Japan.

15.
Information Fusion, 2009, 10(2): 150-162
Information fusion research has recently focused on the characteristics of the decision profiles of ensemble members in order to optimize performance. These characteristics are particularly important in the selection of ensemble members. However, even though the control of overfitting is a challenge in machine learning problems, much less work has been devoted to the control of overfitting in selection tasks. The objectives of this paper are: (1) to show that overfitting can be detected at the selection stage; and (2) to present strategies to control overfitting. Decision trees and k-nearest-neighbor classifiers are used to create homogeneous ensembles, while single- and multi-objective genetic algorithms are employed as search algorithms at the selection stage. In this study, we use bagging and random subspace methods for ensemble generation. The classification error rate and a set of diversity measures are applied as search criteria. We show experimentally that the selection of classifier ensembles conducted by genetic algorithms is prone to overfitting, especially in the multi-objective case. In this study, the partial validation, backwarding and global validation strategies are tailored to the classifier ensemble selection problem and compared. This comparison allows us to show that a global validation strategy should be applied to control overfitting in pattern recognition systems involving an ensemble member selection task. Furthermore, this study has helped us establish that the global validation strategy can be used to measure the relationship between diversity and classification performance when diversity measures are employed as single-objective functions.
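A minimal sketch of the global validation idea, with exhaustive search standing in for the genetic algorithm: candidate ensembles are ranked on an optimization set, but the ensemble finally kept is the one that performs best on a held-out validation set, guarding the selection step against overfitting.

```python
# Global validation for ensemble member selection (sketch).
import numpy as np
from itertools import combinations
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=900, n_features=20, random_state=7)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5,
                                              random_state=7)
X_opt, X_gv, y_opt, y_gv = train_test_split(X_rest, y_rest, test_size=0.5,
                                            random_state=7)

pool = [DecisionTreeClassifier(max_features=0.5,
                               random_state=s).fit(X_tr, y_tr)
        for s in range(7)]

def vote_acc(members, Xs, ys):
    votes = np.mean([pool[i].predict(Xs) for i in members], axis=0)
    return ((votes > 0.5).astype(int) == ys).mean()

# Rank candidates on the optimization set, then validate them globally.
cands = sorted(combinations(range(7), 3),
               key=lambda m: vote_acc(m, X_opt, y_opt), reverse=True)[:5]
best = max(cands, key=lambda m: vote_acc(m, X_gv, y_gv))
print("kept ensemble:", best, "validation acc:", vote_acc(best, X_gv, y_gv))
```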

16.
Almost every sufficiently complex software system today is configurable. Conditional compilation is a simple variability-implementation mechanism that is widely used in open-source projects and industry. In particular, the C preprocessor (CPP) is very popular in practice, and it is also (again) gaining interest in academia. Although there have been several attempts to understand and improve CPP, there is a lack of understanding of how it is used in open-source and industrial systems and of whether different usage patterns have emerged. The background is that much research on configurable systems and product lines concentrates on open-source systems, simply because they are available for study in the first place. This leads to the potentially problematic situation that it is unclear whether the results obtained from these studies are transferable to industrial systems. We aim to narrow this gap by comparing the use of CPP in open-source projects and in industry (especially the embedded-systems domain), based on a substantial set of subject systems and well-known variability metrics, including size, scattering, and tangling metrics. A key result of our empirical study is that the analyzed open-source systems and the considered embedded systems from industry are similar with regard to most of the metrics we studied, including systems that were developed in industry and made open source at some point. Our study thus indicates that, regarding CPP as a variability-implementation mechanism, insights, methods, and tools developed in studies of open-source systems are transferable to industrial systems, at least with respect to the metrics we considered.
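As a toy illustration of one of the variability metrics discussed, the script below computes the scattering of configuration macros, i.e., in how many files and conditional blocks each macro occurs; real studies rely on full-fledged analysis infrastructures.

```python
# Toy scanner for CPP scattering: count files and #if/#ifdef blocks
# in which each configuration macro appears.
import re
from collections import defaultdict
from pathlib import Path

COND = re.compile(r"^\s*#\s*(?:if|ifdef|ifndef|elif)\b(.*)", re.MULTILINE)
MACRO = re.compile(r"\b[A-Z_][A-Z0-9_]{2,}\b")

def scattering(src_root):
    files, blocks = defaultdict(set), defaultdict(int)
    for path in Path(src_root).rglob("*.[ch]"):
        text = path.read_text(errors="ignore")
        for cond in COND.findall(text):
            for macro in set(MACRO.findall(cond)):
                files[macro].add(path)
                blocks[macro] += 1
    return {m: (len(files[m]), blocks[m]) for m in files}

# Example: print the ten most scattered macros of a checked-out project.
for macro, (nfiles, nblocks) in sorted(scattering(".").items(),
                                       key=lambda kv: -kv[1][0])[:10]:
    print(f"{macro}: {nfiles} files, {nblocks} conditional blocks")
```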

17.
It is more important to handle exceptions properly than to prevent them from occurring, because they arise from so many different causes. In embedded systems, a vast number of exceptions are caused by hardware devices. In such cases, numerous software components are involved in these device-originated exceptions, ranging from the device itself to the device driver, the kernel, and applications. Therefore, it takes a lot of time to debug software that fails to handle exceptions. This paper proposes a lightweight device exception testing method and a related automation tool, AMOS v3.0. The proposed method artificially triggers realistic device exceptions at runtime and monitors in detail how software components handle them. AMOS v3.0 has been applied to the exception testing of car-infotainment systems in an automobile company. The results of this industrial field study revealed that 39.13% of the failures in exception handling were caused by applications, 36.23% by device drivers, and 24.64% by the kernel. We conclude that the proposed method is highly effective, in that it allows developers to identify the root cause of exception-handling failures.
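A language-neutral sketch of the fault-injection idea (AMOS itself targets embedded C stacks): a device-read call is wrapped so that it raises realistic I/O errors at runtime, and one observes whether the caller's exception handling copes. All names here are invented placeholders.

```python
# Runtime fault injection into a device-read call (illustrative only).
import errno
import random

def flaky(read_fn, p_fail=0.3, rng=random.Random(0)):
    """Wrap read_fn so that it raises OSError at rate p_fail."""
    def wrapped(*args, **kwargs):
        if rng.random() < p_fail:
            raise OSError(errno.EIO, "injected device I/O error")
        return read_fn(*args, **kwargs)
    return wrapped

def read_sensor():                  # stands in for a real device read
    return 42

read = flaky(read_sensor)
handled = 0
for _ in range(1000):
    try:
        read()
    except OSError:
        handled += 1                # the application-level handler took over
print(f"injected faults handled: {handled}")
```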

18.
The satisfiability problem (SAT) is a fundamental problem in mathematical logic, constraint satisfaction, VLSI engineering, and computing theory. Methods for solving the satisfiability problem play an important role in the development of computing theory and systems. In this paper, we give a BDD (Binary Decision Diagram) SAT solver for practical asynchronous circuit design. The BDD SAT solver consists of a structural SAT formula preprocessor and a complete, incremental SAT algorithm that is able to find an optimal solution. The preprocessor decomposes a large SAT formula representing the circuit into a number of smaller SAT formulas, which avoids the problem of solving very large SAT formulas. Each small SAT formula is solved efficiently by the BDD SAT algorithm, and the results of these subproblems are then integrated to form the solution of the original problem. According to recent industrial assessments, this BDD SAT solver provides solutions to practical, industrial asynchronous circuit design problems.
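A sketch of the decomposition idea only: the CNF formula is split into variable-disjoint components that are solved independently and merged; a tiny DPLL procedure stands in for the paper's BDD-based algorithm.

```python
# Decompose a CNF formula into variable-disjoint components and solve
# each small piece independently; literals are signed integers.
def dpll(clauses, assign=None):
    assign = dict(assign or {})
    clauses = [c for c in clauses
               if not any(assign.get(abs(l)) == (l > 0) for l in c)]
    clauses = [[l for l in c if abs(l) not in assign] for c in clauses]
    if any(not c for c in clauses):
        return None                      # conflict: an empty clause remains
    if not clauses:
        return assign                    # all clauses satisfied
    v = abs(clauses[0][0])
    for val in (True, False):
        result = dpll(clauses, {**assign, v: val})
        if result is not None:
            return result
    return None

def components(clauses):                 # group clauses sharing variables
    groups = []                          # list of (var_set, clause_list)
    for c in clauses:
        vs, cs = {abs(l) for l in c}, [c]
        for g in [g for g in groups if g[0] & vs]:
            groups.remove(g)
            vs |= g[0]
            cs += g[1]
        groups.append((vs, cs))
    return [cs for _, cs in groups]

formula = [[1, -2], [2, 3], [-3, 1], [4, 5], [-4, -5]]  # two components
solution = {}
for part in components(formula):
    sub = dpll(part)
    if sub is None:
        solution = None
        break
    solution.update(sub)
print(solution)
```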

19.
Search-based software testing promises the ability to generate and evaluate large numbers of test cases at minimal cost. From an industrial perspective, this could enable an increase in product quality without a matching increase in the time and effort required to achieve it. Search-based software testing, however, is a set of quite complex techniques and approaches that do not immediately translate into a process usable by most companies. For example, even if engineers receive proper education and training in these new approaches, it can be hard to develop a general fitness function that covers all contingencies. Furthermore, in industrial practice, the knowledge and experience of domain specialists are often key to effective testing and thus to the overall quality of the final software system, but it is not clear how such domain expertise can be utilized in a search-based system. This paper presents an interactive search-based software testing (ISBST) system designed to operate in an industrial setting, with the explicit aim of requiring only limited expertise in software testing. It uses SBST to search for test cases for an industrial software module, while allowing domain specialists to use their experience and intuition to interactively guide the search. In addition to presenting the system, this paper reports on an evaluation of the system in a company developing a framework for embedded software controllers. A sequence of workshops provided regular feedback and validation for the design and improvement of the ISBST system. Once developed, the ISBST system was evaluated by four electrical and systems engineers from the company (the 'domain specialists' in this context), who used it to develop test cases for a commonly used controller module. As well as evaluating the utility of the ISBST system, the study generated interaction data that were used in subsequent laboratory experimentation to validate the underlying search-based algorithm in the presence of realistic, but repeatable, interactions. The results confirm that automated software testing tools in general, and search-based tools in particular, can leverage input from domain specialists while generating tests. Furthermore, the evaluation highlighted the benefits of using such an approach to explore areas that current testing practices do not cover, or cover insufficiently.
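A minimal sketch of the interactive ingredient, under invented placeholders for the controller under test and its objectives: a hill-climbing input search whose fitness is a weighted sum of objectives, with the weights modeling the domain specialist's interactively expressed preferences.

```python
# Interactive search sketch: specialist-chosen weights steer the fitness.
import random

def controller(inp):                    # hypothetical module under test
    return sum(inp) % 7, max(inp) - min(inp)

def objectives(inp):                    # e.g., push outputs to extremes
    out1, out2 = controller(inp)
    return [out1, out2]

def search(weights, steps=200, rng=random.Random(1)):
    best = [rng.randint(0, 9) for _ in range(4)]
    def score(x):
        return sum(w * o for w, o in zip(weights, objectives(x)))
    for _ in range(steps):              # simple hill climbing over inputs
        cand = list(best)
        cand[rng.randrange(4)] = rng.randint(0, 9)
        if score(cand) > score(best):
            best = cand
    return best

print(search(weights=[1.0, 0.0]))   # specialist emphasizes objective 1
print(search(weights=[0.0, 1.0]))   # ...then redirects toward objective 2
```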

20.
In this paper, a simulation method is proposed to generate a set of classifier outputs with specified individual accuracies and fixed pairwise agreement. A diversity measure (kappa) is used to control the agreement among classifiers when building the classifier teams. The generated team outputs can be used to study the behaviour of class-label combination methods, such as voting rules, over multiple dependent classifiers.
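A hedged sketch of such a simulation for two classifiers: joint correctness outcomes are drawn so that the individual accuracies are p1 and p2, while the probability p11 of being correct together tunes the pairwise agreement, measured afterwards as kappa.

```python
# Simulate two classifier output vectors with given accuracies and a
# controllable probability of joint correctness; report observed kappa.
import numpy as np
from sklearn.metrics import cohen_kappa_score

def simulate(n, p1, p2, p11, n_classes=2, seed=0):
    assert max(0.0, p1 + p2 - 1.0) <= p11 <= min(p1, p2)
    rng = np.random.default_rng(seed)
    y = rng.integers(n_classes, size=n)                # true labels
    probs = [p11, p1 - p11, p2 - p11, 1 - p1 - p2 + p11]
    joint = rng.choice(4, size=n, p=probs)             # both/1st/2nd/neither
    def outputs(correct):
        wrong = (y + rng.integers(1, n_classes, size=n)) % n_classes
        return np.where(correct, y, wrong)
    o1 = outputs((joint == 0) | (joint == 1))
    o2 = outputs((joint == 0) | (joint == 2))
    return y, o1, o2

y, o1, o2 = simulate(10000, p1=0.8, p2=0.8, p11=0.75)  # high overlap
print("accuracies:", (o1 == y).mean(), (o2 == y).mean())
print("kappa:", cohen_kappa_score(o1, o2))
```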
