Found 20 similar articles.
1.
The aim of bankruptcy prediction in data mining and machine learning is to develop an effective model that provides high prediction accuracy. The prior literature has developed and studied various classification techniques, among which classifier ensembles that combine multiple classifiers have outperformed many single classifiers. However, three critical issues can affect the performance of classifier ensembles: the classification technique adopted, the method used to combine the multiple classifiers, and the number of classifiers to be combined. Since few studies have examined these issues, this paper conducts a comprehensive comparison of classifier ensembles built from three widely used classification techniques, namely multilayer perceptron (MLP) neural networks, support vector machines (SVM), and decision trees (DT), using two well-known combination methods, bagging and boosting, and different numbers of combined classifiers. Experimental results on three public datasets show that DT ensembles composed of 80–100 classifiers using the boosting method perform best. The Wilcoxon signed-rank test also shows that boosted DT ensembles perform significantly differently from the other classifier ensembles. Moreover, a further study on a real-world Taiwan bankruptcy dataset likewise demonstrates the superiority of boosted DT ensembles over the others.
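A minimal sketch of the comparison described above, using scikit-learn's bagging and AdaBoost over decision trees and sweeping the ensemble size; the synthetic dataset and parameter values are placeholders, not the paper's bankruptcy setup.

```python
# Sketch: compare bagged vs. boosted decision-tree ensembles of varying size.
# The dataset and sizes are illustrative; the paper used three public
# bankruptcy datasets and ensembles of up to 100 classifiers.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

for n in (20, 40, 60, 80, 100):
    bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=n,
                            random_state=0)
    boost = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                               n_estimators=n, random_state=0)
    print(n,
          cross_val_score(bag, X, y, cv=5).mean(),
          cross_val_score(boost, X, y, cv=5).mean())
```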
2.
We propose to apply an adequate form of an ensemble's output to a final additional classifier – the post-aggregation element – as a method of improving the ensemble's performance. Our experimental results show that a Gate-Generated Functional Weight Classifier serves this objective as a post-aggregation stage, both in situations in which data are available everywhere and when some features are missing for the post-aggregation task – a case relevant to distributed classification problems. Post-aggregation techniques can be especially useful for massive ensembles (integrated by many learners) – such as most committees, which do not allow trainable first aggregations – and for human decision fusion, where it is unclear which features are considered in such processes.
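The paper's Gate-Generated Functional Weight Classifier is not an off-the-shelf component; a plain stacking meta-classifier stands in below to illustrate the general idea of feeding an ensemble's soft outputs into a final trainable stage.

```python
# Sketch of post-aggregation: feed the ensemble members' posterior outputs
# into a final trainable classifier. Logistic regression stands in for the
# paper's Gate-Generated Functional Weight Classifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=800, n_features=15, random_state=1)
base = [("rf", RandomForestClassifier(random_state=1)),
        ("svm", SVC(probability=True, random_state=1))]
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression(),
                           stack_method="predict_proba")
print(cross_val_score(stack, X, y, cv=5).mean())
```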
3.
Context: Software documentation is an integral part of any software development process. However, software practitioners are often concerned about the value, degree of usage and usefulness of documentation during development and maintenance. Objective: Motivated by the needs of NovAtel Inc. (NovAtel), a world-leading company developing software systems in support of global navigation satellite systems, and based on the results of a former systematic mapping study, we aimed at a better understanding of the usage and usefulness of various technical documents during software development and maintenance. Method: We utilized the results of a former systematic mapping study and performed an industrial case study at NovAtel. Starting from the jointly defined analysis goals, the research method incorporates qualitative and quantitative analysis of 55 documents (design, test and process related) and 1630 of their revisions. In addition, we conducted a survey on the usage and usefulness of documents, in which 25 staff members of the industrial partner, all with a medium to high level of experience, participated. Results: In the context of the case study, a number of findings were derived, including that (1) technical documentation was consulted least frequently for maintenance purposes and most frequently as an information source for development, (2) source code was most frequently the preferred information source during software maintenance, (3) there is no significant difference between the usage of the various documentation types during development and maintenance, and (4) the initial hypotheses that up-to-dateness, accuracy and preciseness have the highest impact on the usefulness of technical documentation were supported. Conclusions: The usage of documentation differs across purposes and depends on the type of information needed as well as the task to be completed (e.g., development or maintenance). The results have been confirmed to be helpful for the company under study, which is currently implementing some of the recommendations given.
4.
Context: Enterprise software systems (e.g., enterprise resource planning software) are often deployed in different contexts (e.g., different organizations or different business units or branches of one organization). However, even though organizations, business units or branches have the same or similar business goals, they may differ in how they achieve these goals. Thus, many enterprise software systems are subject to variability and are adapted depending on the context in which they are used. Objective: Our goal is to provide a snapshot of variability in large-scale enterprise software systems. We aim at understanding the types of variability that occur in large industrial enterprise software systems, and at identifying how variability is handled in such systems. Method: We performed an exploratory case study in two large software organizations, involving two large enterprise software systems. Data were collected through interviews and document analysis, and were analyzed following a grounded theory approach. Results: We identified seven types of variability (e.g., functionality, infrastructure) and eight mechanisms to handle variability (e.g., add-ons, code switches). Conclusions: We provide generic types for classifying variability in enterprise software systems, and reusable mechanisms for handling such variability. Some of the variability types and handling mechanisms found in real-world enterprise software systems extend existing concepts and theories; others confirm findings from previous research on variability in software in general and are therefore not specific to enterprise software systems. Our findings also offer a theoretical foundation for describing variability handling in practice. Future work needs to provide more evaluations of the theoretical foundations and refine the variability handling mechanisms into more detailed practices.
5.
Many techniques have been proposed for credit risk assessment, from statistical models to artificial intelligence methods. In recent years, different classifier-ensemble approaches have been successfully applied to credit scoring problems and have generally proved more accurate than single prediction models. The present paper goes one step further by introducing composite ensembles that jointly use different strategies for diversity induction. Accordingly, the combination of data resampling algorithms (bagging and AdaBoost) and attribute subset selection methods (random subspace and rotation forest) for constructing composite ensembles is explored with the aim of improving prediction performance. The experimental results and statistical tests show that this new two-level classifier ensemble constitutes an appropriate solution for credit scoring problems, performing better than traditional single ensembles and very significantly better than individual classifiers.
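A minimal sketch of one such two-level composite ensemble: an outer random-subspace layer whose members are themselves AdaBoost ensembles. In scikit-learn, bagging with feature subsampling and no bootstrapping amounts to the random subspace method; the parameters are illustrative, not the paper's credit-scoring setup.

```python
# Sketch of a composite (two-level) ensemble: random subspace over AdaBoost.
# bootstrap=False with max_features<1.0 turns bagging into random subspace.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=30, random_state=2)

composite = BaggingClassifier(
    AdaBoostClassifier(n_estimators=25, random_state=2),
    n_estimators=10, max_features=0.5, bootstrap=False, random_state=2)
print(cross_val_score(composite, X, y, cv=5).mean())
```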
6.
This paper presents a case study of a software product company that has successfully integrated practices from software product line engineering and agile software development. We show how practices from the two fields support the company’s strategic and tactical ambitions, respectively. We also discuss how the company integrates strategic, tactical and operational processes to optimize collaboration and consequently improve its ability to meet market needs, opportunities and challenges. The findings from this study are relevant to software product companies seeking ways to balance agility and product management. The findings also contribute to research on industrializing software engineering. 相似文献
7.
Michał Koziarski, Bartosz Krawczyk, Michał Woźniak. Pattern Analysis & Applications, 2017, 20(4): 981–990
Ensemble classification remains one of the most popular techniques in contemporary machine learning, characterized by both high efficiency and stability. An ideal ensemble comprises mutually complementary individual classifiers characterized by high diversity and accuracy. This may be achieved, e.g., by training individual classification models on feature subspaces; Random Subspace is the best-known method based on this principle. Its main limitation lies in its stochastic nature, which prevents it from being considered a stable classifier suitable for real-life applications. In this paper, we propose an alternative approach, the Deterministic Subspace method, capable of creating subspaces in a guided and repeatable manner; our method therefore always converges to the same final ensemble for a given dataset. We describe the general algorithm and three dedicated measures used in the feature selection process. Finally, we present the results of an experimental study that demonstrates the usefulness of the proposed method.
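The paper's exact subspace measures are not reproduced here; the sketch below illustrates the deterministic idea with one assumed criterion: rank features by mutual information with the class and deal them round-robin into k subspaces, so the same dataset always yields the same ensemble.

```python
# Sketch of deterministic subspace construction: a fixed ranking criterion
# (mutual information here, as an assumed stand-in for the paper's measures)
# plus round-robin assignment gives a fully repeatable ensemble.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=3)
k = 5
order = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1]
subspaces = [order[i::k] for i in range(k)]  # round-robin: deterministic

models = [DecisionTreeClassifier(random_state=0).fit(X[:, s], y)
          for s in subspaces]

def predict(Xnew):
    # majority vote over the per-subspace trees (binary labels assumed)
    votes = np.array([m.predict(Xnew[:, s]) for m, s in zip(models, subspaces)])
    return (votes.mean(axis=0) > 0.5).astype(int)

print((predict(X) == y).mean())
```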
8.
In a large software system, knowing which files are most likely to be fault-prone is valuable information for project managers, who can use it to prioritize software testing and allocate resources accordingly. However, our experience shows that it is difficult to collect and analyze fine-grained test defects in a large and complex software system. On the other hand, previous research has shown that companies unable to collect local data can safely use cross-company data with nearest-neighbor sampling to predict their defects. In this study we analyzed 25 projects of a large telecommunication system. To predict the defect proneness of modules, we trained models on publicly available NASA MDP data. In our experiments we used static call graph based ranking (CGBR) as well as nearest-neighbor sampling to construct method-level defect predictors. Our results suggest that, for the analyzed projects, at least 70% of the defects can be detected by inspecting only (i) 6% of the code using a Naïve Bayes model, or (ii) 3% of the code using the CGBR framework.
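A sketch of the inspection-ranking idea: train Naïve Bayes on module metrics, rank modules by predicted defect probability, and measure how many defects land in the top slice. Synthetic data stands in for the NASA MDP metrics, and the slice is counted in modules rather than lines of code, a simplification of the paper's measure.

```python
# Sketch of defect-inspection ranking with a Naive Bayes predictor.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, n_features=10, weights=[0.85],
                           random_state=4)  # ~15% defective, like skewed data
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=4)

proba = GaussianNB().fit(Xtr, ytr).predict_proba(Xte)[:, 1]
order = np.argsort(proba)[::-1]          # most defect-prone modules first
top = order[: int(0.10 * len(order))]    # inspect the top 10% of modules
print("defects caught:", yte[top].sum() / yte.sum())
```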
9.
Nayer M. Wanas, Mohamed S. Kamel. Pattern Recognition, 2006, 39(9): 1781–1794
In this paper, architectures and methods of decision aggregation in classifier ensembles are investigated. Typically, ensembles are designed so that each classifier is trained independently and decision fusion is performed as a post-processing module. In this study, however, we are interested in making the fusion a more adaptive process. We first propose a new architecture that utilizes the features of a problem to guide the decision fusion process. By using both the features and the classifiers' outputs, the recognition strengths and weaknesses of the different classifiers are identified, and this information is used to improve the overall generalization capability of the system. Furthermore, we propose a co-operative training algorithm that allows the final classification to determine whether further training should be carried out on the components of the architecture. The performance of the proposed architecture is assessed on several benchmark problems. The new architecture shows improvement over existing aggregation techniques. Moreover, the proposed co-operative training algorithm limits the user's intervention while maintaining a level of accuracy competitive with most other approaches.
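A sketch of feature-guided fusion under assumed stand-in components: a stacking combiner that sees both the base classifiers' posterior outputs and the raw input features (scikit-learn's passthrough option), so it can learn where each base model is strong. The paper's specific architecture and co-operative training loop are not reproduced.

```python
# Sketch of fusion guided by both features and classifier outputs.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=800, n_features=12, random_state=5)
fusion = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=5)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),
    passthrough=True)  # feed the original features to the fusion stage too
print(cross_val_score(fusion, X, y, cv=5).mean())
```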
10.
The purpose of this research was to study various fusion strategies in which the levels of correlation between features and of auto-correlation within features could be controlled. The strategies were chosen to reflect decision-level fusion (ISOC and ROC); feature-level fusion, via a single Generalized Regression Neural Network (GRNN) employing all available features; and an intermediate level of fusion that employs the outputs of individual classifiers, in this case posterior probability estimates, before they are subjected to thresholds and mapped into decisions. The latter scheme fuses the posterior probability estimates by employing them as features in a probabilistic neural network. Correlation was injected into the data set both within a feature set (auto-correlation) and across feature sets, and sample size was varied for a two-class problem. The fusion methods were then extended to three classifiers, and a method is demonstrated that selects the optimal classifier ensemble.
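A minimal sketch of the intermediate fusion level: each base classifier's posterior estimates, computed on a held-out split, become the feature vector for a second-stage model. A k-NN classifier stands in for the probabilistic neural network used in the paper, which has no scikit-learn implementation; data and models are placeholders.

```python
# Sketch of intermediate-level fusion on posterior probability estimates.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=900, n_features=10, random_state=6)
Xa, Xr, ya, yr = train_test_split(X, y, test_size=0.5, random_state=6)
Xb, Xc, yb, yc = train_test_split(Xr, yr, test_size=0.4, random_state=6)

bases = [GaussianNB().fit(Xa, ya),
         LogisticRegression(max_iter=1000).fit(Xa, ya)]

def posteriors(Xs):
    # stack each base model's class-1 posterior into a fusion feature vector
    return np.hstack([m.predict_proba(Xs)[:, 1:] for m in bases])

fuser = KNeighborsClassifier().fit(posteriors(Xb), yb)
print(fuser.score(posteriors(Xc), yc))
```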
11.
The number of applications for smartphones and tablets has grown exponentially in recent years. Many of these applications are supported by so-called Location Based Services, which are expected to provide reliable real-time localization anytime and anywhere, whether outdoors or indoors. Even though world-wide outdoor localization has been successfully realized through the well-known Global Navigation Satellite System technology, a counterpart large-scale deployment for indoor use is not yet available. In previous work, we introduced a novel technology for indoor localization based on a WiFi fingerprint approach. In this paper, we describe how to enhance that approach through the combination of hierarchical localization and fuzzy classifier ensembles. The approach has been tested and validated at the University of Edinburgh, yielding promising results.
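The structure of hierarchical localization can be sketched as a two-stage classifier: a coarse model first predicts the floor, then a per-floor model predicts the zone. Synthetic RSSI-like readings stand in for real fingerprints, and random forests replace the paper's fuzzy classifier ensembles for brevity; only the pipeline shape is illustrated.

```python
# Sketch of hierarchical WiFi-fingerprint localization (floor, then zone).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
n, n_aps = 600, 8
X = rng.normal(-70, 10, size=(n, n_aps))  # fake RSSI readings in dBm
floor = rng.integers(0, 2, n)             # placeholder floor labels
zone = rng.integers(0, 4, n)              # placeholder zone labels

coarse = RandomForestClassifier(random_state=7).fit(X, floor)
fine = {f: RandomForestClassifier(random_state=7)
           .fit(X[floor == f], zone[floor == f]) for f in (0, 1)}

def localize(x):
    f = coarse.predict(x.reshape(1, -1))[0]           # which floor?
    return f, fine[f].predict(x.reshape(1, -1))[0]    # which zone on it?

print(localize(X[0]))
```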
12.
We discuss approaches to incrementally constructing an ensemble. The first constructs an ensemble of classifiers by choosing a subset from a larger set; the second constructs an ensemble of discriminants, where a classifier is used for some classes only. We investigate criteria including accuracy, significant improvement, diversity, correlation, and the role of search direction. For discriminant ensembles, we test subset selection and trees. Fusion is by voting or by a linear model. Using 14 classifiers on 38 data sets, incremental search finds small, accurate ensembles in polynomial time. The discriminant ensemble uses a subset of discriminants and is simpler, interpretable, and accurate. We find that an incremental ensemble has higher accuracy than bagging and the random subspace method, and accuracy comparable to AdaBoost with fewer classifiers.
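A minimal sketch of incremental (forward) ensemble construction: greedily add whichever remaining classifier most improves validation accuracy, stopping when no candidate helps. Majority voting combines members; the pool, data, and stopping rule are illustrative assumptions, not the paper's exact criteria.

```python
# Sketch of greedy forward selection of ensemble members.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=800, n_features=15, random_state=8)
Xtr, Xval, ytr, yval = train_test_split(X, y, random_state=8)

pool = [DecisionTreeClassifier(random_state=8).fit(Xtr, ytr),
        GaussianNB().fit(Xtr, ytr),
        KNeighborsClassifier().fit(Xtr, ytr),
        LogisticRegression(max_iter=1000).fit(Xtr, ytr)]

def vote_acc(members):
    # majority vote of the members on the validation set (binary labels)
    votes = np.array([m.predict(Xval) for m in members])
    return (((votes.mean(axis=0) > 0.5).astype(int)) == yval).mean()

ensemble, best = [], 0.0
while pool:
    acc, cand = max(((vote_acc(ensemble + [c]), c) for c in pool),
                    key=lambda t: t[0])
    if acc <= best:          # stop when no candidate improves accuracy
        break
    ensemble.append(cand); pool.remove(cand); best = acc
print(len(ensemble), best)
```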
13.
The overproduce-and-choose strategy, which is divided into overproduction and selection phases, has traditionally focused on finding the most accurate subset of classifiers at the selection phase and using it to predict the class of all samples in the test data set; it is therefore a static classifier ensemble selection strategy. In this paper, we propose a dynamic overproduce-and-choose strategy which combines optimization and dynamic selection in a two-level selection phase, allowing the most confident subset of classifiers to be selected to label each test sample individually. The optimization level generates a population of highly accurate candidate classifier ensembles, while the dynamic selection level applies measures of confidence to reveal the candidate ensemble with the highest degree of confidence in the current decision. Experiments comparing the proposed method to a static overproduce-and-choose strategy and a classical dynamic classifier selection approach demonstrate that our method outperforms both of these selection-based methods, and also performs better than combining the decisions of all classifiers in the initial pool.
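A sketch of the dynamic selection level only: among several candidate ensembles, pick, per test sample, the one whose averaged posterior is most confident, and use its prediction. Candidate generation (the paper's optimization level) is replaced here by fixed splits of a randomized tree pool, an assumed simplification.

```python
# Sketch of per-sample dynamic ensemble selection by confidence.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=800, n_features=12, random_state=9)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=9)

pool = [DecisionTreeClassifier(max_features=0.5, random_state=s).fit(Xtr, ytr)
        for s in range(12)]
candidates = [pool[i::3] for i in range(3)]  # three candidate ensembles

preds = []
for x in Xte:
    x = x.reshape(1, -1)
    probs = [np.mean([m.predict_proba(x)[0] for m in c], axis=0)
             for c in candidates]
    best = max(probs, key=lambda p: p.max())  # most confident ensemble wins
    preds.append(best.argmax())
print((np.array(preds) == yte).mean())
```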
14.
Due to limited control over changing operational conditions and personal physiology, systems used for video-based face recognition are confronted with complex and changing pattern recognition environments. Although only a limited amount of reference data is initially available during enrollment, new samples often become available over time through re-enrollment, post-analysis and labeling of operational data, etc. Adaptive multi-classifier systems (AMCSs) are therefore desirable for the design and incremental update of facial models. For real-time recognition of individuals appearing in video sequences, facial regions are captured with one or more cameras, and an AMCS must perform fast and efficient matching against the facial models of the individuals enrolled in the system. In this paper, an incremental learning strategy based on particle swarm optimization (PSO) is proposed to efficiently evolve heterogeneous classifier ensembles in response to new reference data. This strategy is applied to an AMCS in which all parameters of a pool of fuzzy ARTMAP (FAM) neural network classifiers (i.e., a swarm of classifiers), each corresponding to a particle, are co-optimized such that both error rate and network size are minimized. To provide a high level of accuracy over time while minimizing computational complexity, the AMCS integrates information from multiple diverse classifiers, where learning is guided by an aggregated dynamical niching PSO (ADNPSO) algorithm that optimizes the networks according to both of these objectives. Moreover, pools of FAM networks are evolved to maintain (1) genotype diversity of solutions around local optima in the optimization search space and (2) phenotype diversity in the objective space. Accurate and low-cost ensembles are thereby designed by selecting classifiers on the basis of accuracy and of both genotype and phenotype diversity. For proof-of-concept validation, the proposed strategy is compared to AMCSs where incremental learning of FAM networks is guided through mono- and multi-objective optimization. Performance is assessed in terms of video-based error rate and resource requirements under different incremental learning scenarios, with new data extracted from real-world video streams (IIT-NRC and MoBo). Simulation results indicate that the proposed strategy provides a level of accuracy comparable to that of mono-objective optimization and reference face recognition systems, yet requires only a fraction of the computational cost (between 16% and 20% of that of a mono-objective strategy, depending on the database and scenario).
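The "swarm of classifiers" idea can be sketched with generic PSO: each particle encodes one classifier's hyperparameters, and the swarm minimizes error plus a small penalty on model size. This is a plain PSO illustration under assumed encodings (a decision tree's depth and leaf size), not the paper's ADNPSO algorithm or its fuzzy ARTMAP networks.

```python
# Minimal PSO sketch: particles are classifier configurations, the cost is
# error plus a model-size penalty (a crude scalarization of two objectives).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=11)
rng = np.random.default_rng(11)

def cost(p):
    depth, leaf = max(1, int(round(p[0]))), max(1, int(round(p[1])))
    clf = DecisionTreeClassifier(max_depth=depth, min_samples_leaf=leaf,
                                 random_state=0)
    err = 1 - cross_val_score(clf, X, y, cv=3).mean()
    return err + 0.005 * depth           # penalize larger models

pos = rng.uniform([1, 1], [15, 20], size=(8, 2))   # 8 particles, 2 params
vel = np.zeros_like(pos)
pbest, pbest_cost = pos.copy(), np.array([cost(p) for p in pos])
for _ in range(10):
    g = pbest[pbest_cost.argmin()]                 # global best position
    vel = (0.7 * vel + 1.5 * rng.random((8, 2)) * (pbest - pos)
           + 1.5 * rng.random((8, 2)) * (g - pos))
    pos = np.clip(pos + vel, [1, 1], [15, 20])
    c = np.array([cost(p) for p in pos])
    better = c < pbest_cost
    pbest[better], pbest_cost[better] = pos[better], c[better]
print(pbest[pbest_cost.argmin()], pbest_cost.min())
```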
15.
Heiko Koziolek, Thomas Goldschmidt, Thijmen de Gooijer, Dominik Domis, Stephan Sehestedt, Thomas Gamer, Markus Aleksy. Empirical Software Engineering, 2016, 21(2): 411–448
Corporate organizations sometimes offer similar software products in certain domains due to former company mergers or due to the complexity of the organization. The functional overlap of such products is an opportunity for future systematic reuse to reduce software development and maintenance costs. Therefore, we have tailored existing domain analysis methods to our organization to identify commonalities and variabilities among such products and to assess the potential for software product line (SPL) approaches. As an exploratory case study, we report on our experiences and lessons learned from conducting the domain analysis in four application cases with large-scale software products. We learned that the outcome of a domain analysis was often a smaller integration scenario instead of an SPL and that business case calculations were less relevant for the stakeholders and managers from the business units during this phase. We also learned that architecture reconstruction using a simple block diagram notation aids domain analysis and that large parts of our approach were reusable across application cases.
16.
Recent research in fault classification has shown the importance of accurately selecting the features to be used as inputs to the diagnostic model. In this work, a multi-objective genetic algorithm (MOGA) is considered for the feature selection phase. Two different techniques for using the selected features to develop the fault classification model are then compared: a single classifier based on the feature subset with the best classification performance, and an ensemble of classifiers working on different feature subsets. The motivation for developing ensembles of classifiers is that they can achieve higher accuracy than single classifiers. An important requirement for an ensemble to be effective is diversity in the predictions of its base classifiers, i.e., their capability of erring on different sub-regions of the pattern space. To show the benefits of having diverse base classifiers in the ensemble, two different ensembles have been developed: in the first, the base classifiers are constructed on feature subsets found by MOGAs aimed at maximizing the fault classification performance and minimizing the number of features in the subsets; in the second, diversity among classifiers is added to the MOGA search as a third objective function to maximize. In both cases, a voting technique effectively combines the predictions of the base classifiers into the ensemble output. For verification, numerical experiments are conducted on a case of multiple-fault classification in rotating machinery, and the results achieved by the two ensembles are compared with those obtained by a single optimal classifier.
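The multi-objective core can be sketched without a full genetic algorithm: score feature subsets on (accuracy, subset size) and keep the Pareto-optimal ones. Random sampling stands in for MOGA evolution to keep the example short, and the diversity objective is omitted; all parameters are illustrative.

```python
# Sketch of multi-objective feature-subset evaluation with a Pareto filter.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=15, n_informative=5,
                           random_state=10)
rng = np.random.default_rng(10)

scored = []
for _ in range(30):                      # random subsets stand in for a MOGA
    mask = rng.random(X.shape[1]) < 0.4
    if not mask.any():
        continue
    acc = cross_val_score(DecisionTreeClassifier(random_state=0),
                          X[:, mask], y, cv=3).mean()
    scored.append((acc, int(mask.sum()), mask))

# keep subsets not dominated on (maximize accuracy, minimize size)
pareto = [s for s in scored
          if not any(o[0] >= s[0] and o[1] <= s[1] and o[:2] != s[:2]
                     for o in scored)]
print([(round(a, 3), n) for a, n, _ in pareto])
```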
17.
Ana M. Fernández-Sáez, Michel R. V. Chaudron, Marcela Genero. Empirical Software Engineering, 2018, 23(6): 3281–3345
UML is a commonly used graphical language for the modelling of software. Works on UML's effectiveness have studied projects that develop software systems from scratch, yet the maintenance of software consumes a large share of the overall time and effort required to develop software systems. This study therefore focuses on the use of UML in software maintenance. We wish to elicit the software modelling practices used during maintenance in industry and to understand what are perceived as the hurdles and benefits of modelling. To achieve a high level of realism, we performed a case study in a multinational company's ICT department. The analysis is based on 31 interviews with employees who work on software maintenance projects. The interviewees played different roles and provided complementary views on the use, hurdles and benefits of software modelling and of UML. Our study uncovered a broad range of modelling-related practices, which are presented in a theoretical framework that illustrates how these practices are linked to the specific goals and context of software engineering projects. We present a list of recommended practices that contribute to increasing the effectiveness of software modelling. The use of software modelling notations (like UML) is considered beneficial for software maintenance but needs to be tailored to its context. Various practices that contribute to the effective use of modelling are commonly overlooked, suggesting that a more conscious, holistic approach to integrating modelling practices into the overall software engineering approach is required.
18.
19.
Marco Lormans, Arie van Deursen, Hans-Gerhard Gross. Empirical Software Engineering, 2008, 13(6): 727–760
Requirements views, such as coverage and status views, are an important asset for monitoring and managing software development projects. We have developed a method that automates the process of reconstructing these views, and we have built a tool, ReqAnalyst, that supports this method. This paper presents an investigation of the extent to which requirements views can be automatically generated in order to monitor requirements in industrial practice. The paper focuses on monitoring the requirements in test categories and test cases. To retrieve the necessary data, an information retrieval technique called Latent Semantic Indexing was used. The method was applied in an industrial study: a number of requirements views were defined, and experiments were carried out with different reconstruction settings for generating these views. Finally, we explored how these views can help developers during the software development process.
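A minimal sketch of the Latent Semantic Indexing step: project requirements and test-case descriptions into a shared low-rank space and link each requirement to its most similar test case. The texts are invented placeholders and the rank is far smaller than a realistic setting; this illustrates the technique, not ReqAnalyst's implementation.

```python
# Sketch of LSI-based traceability between requirements and test cases.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

requirements = ["the system shall log all user authentication attempts",
                "reports shall be exportable as pdf documents"]
test_cases = ["verify that failed logins are written to the audit log",
              "check pdf export of the monthly report",
              "measure page load time under heavy traffic"]

docs = requirements + test_cases
tfidf = TfidfVectorizer().fit_transform(docs)
lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# cosine similarity between requirement and test-case vectors in LSI space
sims = cosine_similarity(lsi[:len(requirements)], lsi[len(requirements):])
for i, row in enumerate(sims):
    print(f"req {i} -> test {row.argmax()} (similarity {row.max():.2f})")
```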
Marco Lormans is a PhD researcher in the Software Engineering department of Delft University of Technology and a consultant at Logica. He received an MSc in computer science from Delft University of Technology. His research interests encompass (global) software development, in particular the specification and management of requirements, and software quality assurance. Arie van Deursen is a full professor at Delft University of Technology, where he heads the Software Engineering Research Group. He obtained his MSc degree in computer science in 1990 from the Vrije Universiteit, Amsterdam. From 1996 until 2006 he was a research leader at CWI, the Dutch National Institute for Research in Mathematics and Computer Science. His research interests include software evolution and reverse engineering, as well as model-driven approaches to software engineering. He is one of the co-founders of Software Improvement Group, an Amsterdam-based software consultancy firm in the area of software system analysis. He has served on numerous program committees in the areas of software evolution, maintenance, and software engineering in general, and was program chair for the IEEE Working Conference on Reverse Engineering in 2002 and 2003. Hans-Gerhard Gross received an MSc in Computer Science (1996) from the University of Applied Sciences, Berlin, Germany, and a PhD in Software Engineering (2000) from the University of Glamorgan, Wales, UK. Following his PhD, Dr. Gross joined the Fraunhofer Institute for Experimental Software Engineering in Kaiserslautern, Germany, where he was responsible for a number of public research projects, devising software testing strategies, and for consulting projects with major German software organizations. Since 2005, Dr. Gross has been employed as Assistant Professor at Delft University of Technology, The Netherlands. His research interests encompass all phases of software development in general, and software testing in particular.
20.
Claudia Ayala, Øyvind Hauge. Journal of Systems and Software, 2011, 84(4): 620–637
The success of software development using third-party components depends highly on the ability to select a suitable component for the intended application. The evidence shows that there is limited knowledge about current industrial OTS (off-the-shelf) selection practices. As a result, there is often a gap between theory and practice, and the methods proposed for supporting selection are rarely adopted in industrial practice. This paper's goal is to investigate the actual industrial practice of component selection in order to provide an initial empirical basis that allows the reconciliation of research and industrial endeavors. The study consisted of semi-structured interviews with 23 employees from 20 different software-intensive companies, most of which develop web information system applications. It provides qualitative information that helps to further the understanding of these practices, and it emphasizes some aspects that have been overlooked by researchers. For instance, although the literature claims that component repositories are important for locating reusable components, these are hardly used in industrial practice; instead, other resources that have received little attention are used for this purpose. Practices and potential market niches for software-intensive companies have also been identified. The results are valuable from both the research and the industrial perspectives, as they provide a basis for formulating well-substantiated hypotheses and more effective improvement strategies.