Similar articles (20 results)
1.
A critical problem in software development is the monitoring, control and improvement of the processes of software developers. Software processes are often not explicitly modeled, and manuals to support the development work contain abstract guidelines and procedures. Consequently, there are huge differences between ‘actual’ and ‘official’ processes: “the actual process is what you do, with all its omissions, mistakes, and oversights. The official process is what the book, i.e., a quality manual, says you are supposed to do” (Humphrey in A discipline for software engineering. Addison-Wesley, New York, 1995). Software developers lack support to identify, analyze and better understand their processes. Consequently, process improvements are often not based on an in-depth understanding of the ‘actual’ processes, but on organization-wide improvement programs or ad hoc initiatives of individual developers. In this paper, we show that, based on particular data from software development projects, the underlying software development processes can be extracted and that more realistic process models can be constructed automatically. This is called software process mining (Rubin et al. in Process mining framework for software processes. Software process dynamics and agility. Springer Berlin, Heidelberg, 2007). The goal of process mining is to better understand the development processes, to compare constructed process models with the ‘official’ guidelines and procedures in quality manuals and, subsequently, to improve development processes. This paper reports on process mining case studies in a large industrial company in The Netherlands. The subject of the process mining is a particular process: the change control board (CCB) process. The results of process mining are fed back to practice in order to subsequently improve the CCB process.
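To make the idea concrete, here is a minimal sketch (our illustration, not the paper's method) of the first step many discovery algorithms share: counting a directly-follows relation over the traces of an event log. The activity names are invented, loosely styled after a change-control process.

```python
from collections import Counter

def directly_follows(event_log):
    """Count how often activity a is directly followed by activity b
    across all cases; such a graph is a common starting point for
    process discovery algorithms."""
    graph = Counter()
    for case in event_log:
        for a, b in zip(case, case[1:]):
            graph[(a, b)] += 1
    return graph

# Hypothetical CCB-style traces (activity names are assumptions).
log = [
    ['submit', 'analyse', 'approve', 'implement'],
    ['submit', 'analyse', 'reject'],
    ['submit', 'analyse', 'approve', 'implement'],
]
for (a, b), n in sorted(directly_follows(log).items()):
    print(f'{a} -> {b}: {n}')
```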

2.
Tackling data with gap-interval time is an important issue faced by the temporal database community. While a number of interval logics have been developed, less work has been reported on gap-interval time. To represent and handle data with time, a ‘when’ clause is generally added to each conventional operator so as to incorporate the time dimension in temporal databases; this ‘when’ clause is essentially a temporal logic sentence. Unfortunately, though several temporal database models have dealt with data with gap-interval time, they still apply interval calculus methods to gap-intervals. Certainly, it is inadequate to tackle data with gap-interval time using interval calculus methods in historical databases. Consequently, which temporal expressions are valid in the ‘when’ clause for tackling data with gap-interval time? Further, which temporal operations and relations can be used in the ‘when’ clause? To solve these problems, a formal tool for supporting data with gap-interval time must be explored. For this reason, a gap-interval-based logic for historical databases is established in this paper. In particular, we discuss how to determine the temporal relationships after an event splits. This can be used to describe the temporal forms of tuple splitting in historical databases. Received 2 February 1999 / Revised 9 May 1999 / Accepted in revised form 20 November 1999
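As an illustration of why plain interval calculus falls short, a gap-interval can be modelled as a set of disjoint intervals. The sketch below is an assumption of this rewrite, not the paper's formalism: it normalises a set of [start, end] spans into the disjoint form a gap-interval requires.

```python
def normalise(spans):
    """Merge overlapping [start, end] spans into disjoint ones; a gap-interval
    is exactly such a set, which a single interval cannot represent."""
    out = []
    for s, e in sorted(spans):
        if out and s <= out[-1][1]:
            out[-1][1] = max(out[-1][1], e)   # overlap: extend the last span
        else:
            out.append([s, e])                 # gap: start a new span
    return out

# An employee's two employment periods form one gap-interval:
print(normalise([[1990, 1995], [1998, 2003]]))  # [[1990, 1995], [1998, 2003]]
print(normalise([[1990, 1995], [1994, 2003]]))  # [[1990, 2003]] - no gap left
```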

3.
Computing LTS Regression for Large Data Sets
Data mining aims to extract previously unknown patterns or substructures from large databases. In statistics, this is what methods of robust estimation and outlier detection were constructed for; see e.g. Rousseeuw and Leroy (1987). Here we will focus on least trimmed squares (LTS) regression, which is based on the subset of h cases (out of n) whose least squares fit possesses the smallest sum of squared residuals. The coverage h may be set between n/2 and n. The computation time of existing LTS algorithms grows too quickly with the size of the data set, precluding their use for data mining. In this paper we develop a new algorithm called FAST-LTS. The basic ideas are an inequality involving order statistics and sums of squared residuals, and techniques which we call ‘selective iteration’ and ‘nested extensions’. We also use an intercept adjustment technique to improve the precision. For small data sets FAST-LTS typically finds the exact LTS, whereas for larger data sets it gives more accurate results than existing algorithms for LTS and is faster by orders of magnitude. This allows us to apply FAST-LTS to large databases.
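For readers unfamiliar with LTS, the sketch below illustrates the concentration ("C-step") idea that FAST-LTS builds on: fit least squares on a subset, then keep the h cases with the smallest squared residuals. The paper's ‘selective iteration’ and ‘nested extensions’ are omitted, and the plain random-restart scheme here is a simplification.

```python
import numpy as np

def c_step(X, y, subset, h):
    """One concentration step: LS fit on the current subset, then keep the
    h cases with the smallest squared residuals."""
    beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
    residuals = (y - X @ beta) ** 2
    return np.argsort(residuals)[:h]

def lts_fit(X, y, h, n_starts=50, seed=0):
    """Crude LTS: random h-subsets refined by C-steps until they stabilise.
    Include a column of ones in X if an intercept is wanted."""
    rng = np.random.default_rng(seed)
    n = len(y)
    best_obj, best_beta = np.inf, None
    for _ in range(n_starts):
        subset = rng.choice(n, size=h, replace=False)
        for _ in range(20):
            new = c_step(X, y, subset, h)
            if set(new) == set(subset):   # converged
                break
            subset = new
        beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
        obj = np.sort((y - X @ beta) ** 2)[:h].sum()   # LTS objective
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta, best_obj
```

Each C-step can only decrease the trimmed sum of squares, which is why iterating it from many starting subsets converges quickly in practice.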

4.
An Overview of Data Mining and Knowledge Discovery
With massive amounts of data stored in databases, mining information and knowledge from databases has become an important issue in recent research. Researchers in many different fields have shown great interest in data mining and knowledge discovery in databases. Several emerging applications in information-providing services, such as data warehousing and on-line services over the Internet, also call for various data mining and knowledge discovery techniques to understand user behavior better, to improve the service provided, and to increase business opportunities. In response to such a demand, this article provides a comprehensive survey of the data mining and knowledge discovery techniques developed recently, and introduces some real application systems as well. In conclusion, the article also lists some problems and challenges for further research.

5.
The need for efficient and flexible information retrieval over multi-structured data stored in databases and networks is growing significantly. Flexibility in particular plays a key role in acquiring the relevant information desired by users in the retrieval process. However, most existing approaches are each dedicated to a single kind of content and data structure, e.g., relational databases or natural text. In this work, we propose a “Multi-Structure Information Retrieval” (MSIR) approach applicable to various types of contents and data structures, adapting only a small part of the approach to each data structure. The power of this approach comes from the use of invariant feature information obtained from byte patterns in the files through a mathematical transformation. The experimental evaluation of the proposed approach on both artificial and real data indicates its high feasibility. Fuminori Adachi: He received his Master of Engineering from Osaka University in ’03 and has been enrolled in the doctoral course of Osaka University since ’03. His current research interests include scientific discovery, data mining and machine learning techniques. Takashi Washio, Ph.D.: He received his Ph.D. from Tohoku University in ’88. In ’88, he became a visiting researcher at the Massachusetts Institute of Technology. In ’90, he joined Mitsubishi Research Institute Inc., and he has been working for Osaka University since ’96. His current research interests include scientific discovery, data mining and machine learning techniques. Atsushi Fujimoto: He has been enrolled in the master course of Osaka University since ’03. His current research interests include correlation analysis, data mining and machine learning techniques. Hiroshi Motoda, Ph.D.: He received his Ph.D. from the University of Tokyo in ’72. In ’67, he joined Hitachi Ltd., and he has been working for Osaka University since ’96. His current research interests include scientific discovery, data mining and machine learning. Hidemitsu Hanafusa: He received his Master of Engineering from Keio University in ’83. In ’83, he joined The Kansai Electric Power Co. Inc. (KEPCO). He researched maintenance support systems at INSS from ’97 to ’02. He now works at KEPCO.

6.
Data Mining: A Key Contribution to E-business
Data mining consists of extracting knowledge from huge volumes of data, allowing better business decisions to be made. In this paper, we show how data mining is integrated into the knowledge discovery process. We highlight its potential applications and the techniques that are often used to perform it. Association rule mining is presented as a case study. Furthermore, we show through an integrated architecture how data mining can contribute to e-business via the new technologies. Finally, we present some commercially available architectures.
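Since association rule mining is the paper's case study, a minimal Apriori-style frequent-itemset miner is sketched below; the basket data and support threshold are invented for illustration. Rules such as {bread} → {butter} then follow by comparing the counts of an itemset and its subsets.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent-itemset search using the Apriori property:
    every subset of a frequent itemset must itself be frequent."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    frequent = {}
    k_sets = [frozenset([i]) for i in sorted(items)]
    while k_sets:
        counts = {c: sum(c <= t for t in transactions) for c in k_sets}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-sets into (k+1)-candidates, then prune.
        cands = {a | b for a, b in combinations(level, 2)
                 if len(a | b) == len(a) + 1}
        k_sets = [c for c in cands
                  if all(frozenset(s) in level
                         for s in combinations(c, len(c) - 1))]
    return frequent

baskets = [{'bread', 'butter'}, {'bread', 'butter', 'milk'}, {'milk'}]
print(apriori(baskets, min_support=2))   # includes {bread, butter}: 2
```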

7.
LEARNING IN RELATIONAL DATABASES: A ROUGH SET APPROACH
Knowledge discovery in databases, or data mining, is an important direction in the development of data and knowledge-based systems. Because of the huge amount of data stored in large numbers of existing databases, and because the amount of data generated in electronic form is growing rapidly, it is necessary to develop efficient methods to extract knowledge from databases. An attribute-oriented rough set approach has been developed for knowledge discovery in databases. The method integrates the machine-learning paradigm, especially learning-from-examples techniques, with rough set techniques. An attribute-oriented concept tree ascension technique is first applied in generalization, which substantially reduces the computational complexity of database learning processes. Then the cause-effect relationships among the attributes in the database are analyzed using rough set techniques, and the unimportant or irrelevant attributes are eliminated. Thus concise and strong rules with little or no redundant information can be learned efficiently. Our study shows that attribute-oriented induction combined with rough set theory provides an efficient and effective mechanism for knowledge discovery in database systems.
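The rough-set notion behind eliminating irrelevant attributes can be shown with the positive region: an attribute is dispensable if dropping it leaves the positive region unchanged. The toy decision table below is hypothetical, a sketch of the principle rather than the paper's algorithm.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return blocks.values()

def positive_region(rows, cond_attrs, decision):
    """Rows whose cond-attribute equivalence class is consistent on the decision."""
    pos = set()
    for block in partition(rows, cond_attrs):
        if len({rows[i][decision] for i in block}) == 1:
            pos.update(block)
    return pos

# Hypothetical decision table: 'colour' turns out to be dispensable.
table = [
    {'size': 'big',   'colour': 'red',  'class': 'yes'},
    {'size': 'big',   'colour': 'blue', 'class': 'yes'},
    {'size': 'small', 'colour': 'red',  'class': 'no'},
]
full = positive_region(table, ['size', 'colour'], 'class')
print(positive_region(table, ['size'], 'class') == full)   # True
```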

8.
It is frequently the case that data mining is carried out in an environment which contains noisy and missing data. This is particularly likely to be true when the data were originally collected for different purposes, as is commonly the case in data warehousing. In this paper we discuss the use of domain knowledge, e.g., integrity constraints or a concept hierarchy, to re-engineer the database and allocate sets to which missing or unacceptable outlying data may belong. Attribute-oriented knowledge discovery has proved to be a powerful approach for mining multi-level data in large databases. Such methods are set-oriented in that attribute values are considered to belong to subsets of the domain. These subsets may be provided directly by the database or derived from a knowledge base using inductive logic programming to re-engineer the database. In this paper we develop an algorithm which allows us to aggregate imprecise data and use it for multi-level rule induction and knowledge discovery.
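A minimal sketch of the set-oriented idea: raw attribute values are mapped into higher-level concepts via a hierarchy, so noisy or unknown values can still be aggregated. The hierarchy and records below are invented; the paper's actual re-engineering uses richer domain knowledge.

```python
# Hypothetical concept hierarchy (child value -> parent concept).
hierarchy = {
    'terrier': 'dog', 'poodle': 'dog',
    'siamese': 'cat', 'persian': 'cat',
}

def generalise(records, attr, hierarchy):
    """Replace attr values by their parent concept; values outside the
    hierarchy (noise, missing data) fall back to the whole domain 'ANY',
    so they can still take part in multi-level aggregation."""
    return [{**r, attr: hierarchy.get(r[attr], 'ANY')} for r in records]

rows = [{'pet': 'terrier', 'n': 3},
        {'pet': 'siamese', 'n': 1},
        {'pet': '???',     'n': 2}]   # a noisy entry
print(generalise(rows, 'pet', hierarchy))
# [{'pet': 'dog', 'n': 3}, {'pet': 'cat', 'n': 1}, {'pet': 'ANY', 'n': 2}]
```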

9.
DSM as a knowledge capture tool in CODE environment
A design structure matrix (DSM) provides a simple, compact, and visual representation of a complex system/process. This paper shows how DSM, a systems engineering tool, is applied as a knowledge capture (acquisition) tool in a generic NPD process. The acquired knowledge (identified in the DSM) is provided in the form of questionnaires, which are organized into five performance indicators of the organization, namely ‘Marketing’, ‘Technical’, ‘Financial’, ‘Resource Management’, and ‘Project Management’. An industrial application is carried out for knowledge validation. It is found from the application that the acquired knowledge helps NPD teams, managers and stakeholders to benchmark their NPD endeavor and select areas on which to focus their improvement efforts (up to 80% valid).

10.
Fingerprint classification is a challenging pattern recognition problem which plays a fundamental role in most of the large fingerprint-based identification systems. Due to the intrinsic class ambiguity and the difficulty of processing very low quality images (which constitute a significant proportion), automatic fingerprint classification performance is currently below operating requirements, and most of the classification work is still carried out manually or semi-automatically. This paper explores the advantages of combining the MASKS and MKL-based classifiers, which we have specifically designed for the fingerprint classification task. In particular, a combination at the ‘abstract level’ is proposed for exclusive classification, whereas a fusion at the ‘measurement level’ is introduced for continuous classification. The advantages of coupling these distinct techniques are well evident; in particular, in the case of exclusive classification, the FBI challenge, requiring a classification error ≤ 1% at 20% rejection, was met on NIST-DB14. Received: 06 November 2000, Received in revised form: 25 October 2001, Accepted: 03 January 2002
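The difference between the two combination levels can be shown in a few lines: ‘abstract level’ means combining the classifiers' final labels, whereas ‘measurement level’ means fusing their per-class scores before deciding. The scores below are hypothetical, not outputs of the actual MASKS or MKL classifiers.

```python
import numpy as np

classes = ['arch', 'left_loop', 'right_loop', 'whorl', 'tented_arch']

masks_scores = np.array([0.10, 0.55, 0.20, 0.10, 0.05])  # hypothetical outputs
mkl_scores   = np.array([0.05, 0.40, 0.35, 0.15, 0.05])

# Abstract level: each classifier commits to a label first.
labels = [classes[int(np.argmax(s))] for s in (masks_scores, mkl_scores)]

# Measurement level: average the score vectors, then decide once.
fused = (masks_scores + mkl_scores) / 2
print('individual decisions:', labels)                    # both 'left_loop'
print('fused decision:', classes[int(np.argmax(fused))])  # 'left_loop'
```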

11.
David Smith, AI & Society, 2007, 21(4): 421-428
This article examines the UNESCO Convention on Intangible Cultural Heritage. It accepts the general case made by UNESCO, but urges greater attention to the ‘real-world’ knowledge of ordinary people. The paper rejects taxonomies of knowledge based on metaphysical discussions of knowing. Instead, it argues for an approach to knowledge based on the social production of ‘knowledge acts’. It concludes by asserting that support for the diversity of social enactment of knowledge could have valuable outcomes in the form of new ways of understanding new and emerging technologies.

12.
Taiwan’s mold industry ranks sixth in the world in scale, but under global competitive pressure Taiwan has gradually lost its competitive advantage. The opportunity for Taiwan’s mold industry lies in improving its competitiveness in product research, development and design. In the mold manufacturing cycle, the mold tooling test plays a very important role in accelerating production. An experienced engineer can minimize the error rate of the mold tooling test thanks to rich experience in parameter adjustment. However, this experience is mostly implicit, without a theoretical basis, and such knowledge is difficult to transmit. Benefiting from the maturing of data mining technologies, this study constructs an intelligent classification knowledge discovery system for the mold tooling test based on a decision tree algorithm, so as to explore and accumulate experimental knowledge for the use of Taiwan’s mold industry. The study took the only high-alloy steel manufacturer in Taiwan as a case study, and performed system validation with 66 records. The results showed prediction accuracy rates of 97.6 and 86.9% on the training and testing data, respectively. In addition, the study extracted two classification knowledge rules and proposed concrete recommendations for tooling test parameter adjustment. Moreover, the study provided two ways to verify the model: rule verification and an effectiveness comparison of four mining algorithms. The experimental results showed the decision tree algorithm has excellent discriminatory power for classification and is able to provide clear and simple reference rules for decisions.
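The paper's 66-record data set is not public, but the general workflow — train a decision tree on tooling-test records, then read the tree back as explicit adjustment rules — can be sketched with scikit-learn. Feature names and values below are invented for illustration.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical tooling-test records: temperature, pressure, steel grade.
X = [[210, 1.5, 0], [230, 2.0, 1], [215, 1.2, 0], [240, 2.2, 1]]
y = ['pass', 'fail', 'pass', 'fail']   # tooling-test outcome

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# The tree doubles as explicit, human-readable adjustment rules:
print(export_text(clf, feature_names=['temperature', 'pressure', 'steel_grade']))
print('test accuracy:', clf.score(X_test, y_test))
```

This readability of the induced rules is presumably why a decision tree, rather than an opaque model, suits the parameter-adjustment setting the abstract describes.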

13.
Linguistic Problems with Requirements and Knowledge Elicitation
Human and conversational aspects of requirements and knowledge identification are employed to show that requirements ‘engineering’ is not the same as civil engineering or scientific problem solving. Not only can requirements not be made fully explicit at the start of a project, they cannot be made fully explicit at all. A need is identified to enhance computer-based information systems (CBIS) development methods to accommodate: a plurality of incommensurable perspectives, languages and agendas; dynamic representations of system features that can be experienced rather than abstracted and forced into an abstract paper-based representation; and recognition that CBIS development is in general a continuous process in which users changing their minds is a natural and necessary indication of organisational vitality. It is suggested that prototyping and rapid application development go some way to addressing these requirements, but that they require further development in the light of what this theoretical perspective reveals about the nature of the problem.

14.
The paper sets out the challenges facing the Police in respect of the detection and prevention of the volume crime of burglary. A discussion of data mining and decision support technologies that have the potential to address these issues is undertaken and illustrated with reference to the authors’ work with three Police Services. The focus is upon the use of “soft” forensic evidence, which refers to modus operandi and the temporal and geographical features of the crime, rather than “hard” evidence such as DNA or fingerprint evidence. Three objectives underpin this paper. First, given the continuing expansion of forensic computing and its role in the emergent discipline of Crime Science, it is timely to present a review of existing methodologies and research. Second, it is important to extract some practical lessons concerning the application of computer science within this forensic domain. Finally, from the lessons to date, a set of conclusions will be advanced, including the need for multidisciplinary input to guide further developments in the design of such systems. The objectives are achieved by first considering the task performed by the intended systems users. The discussion proceeds by identifying the portions of these tasks for which automation would be both beneficial and feasible. The knowledge discovery from databases process is then described, starting with an examination of the data that police collect and the reasons for storing it. The discussion progresses to the development of crime matching and predictive knowledge, which are operationalised in decision support software. The paper concludes by arguing that computer science technologies which can support criminal investigations are wide-ranging and include geographical information system displays, clustering and link analysis algorithms, and the more complex use of data mining technology for profiling crimes or offenders and matching and predicting crimes. We also argue that knowledge from disciplines such as forensic psychology, criminology and statistics is essential to the efficient design of operationally valid systems.

15.
Process mining is the research domain that is dedicated to the a posteriori analysis of business process executions. The techniques developed within this research area are specifically designed to provide profound insight by exploiting the untapped reservoir of knowledge that resides within event logs of information systems. Process discovery is one specific subdomain of process mining that entails the discovery of control-flow models from such event logs. Assessing the quality of discovered process models is an essential element, both for conducting process mining research as well as for the use of process mining in practice. In this paper, a multi-dimensional quality assessment is presented in order to comprehensively evaluate process discovery techniques. In contrast to previous studies, the major contribution of this paper is the use of eight real-life event logs. For instance, we show that evaluation based on real-life event logs significantly differs from the traditional approach to assess process discovery techniques using artificial event logs. In addition, we provide an extensive overview of available process discovery techniques and we describe how discovered process models can be assessed regarding both accuracy and comprehensibility. The results of our study indicate that the HeuristicsMiner algorithm is especially suited in a real-life setting. However, it is also shown that, particularly for highly complex event logs, knowledge discovery from such data sets can become a major problem for traditional process discovery techniques.

16.
Most current data mining methods discover patterns from a single relation, whereas multi-relational data mining (MRDM) can extract valid patterns directly from the multiple tables of a relational database. MRDM can solve problems that propositional data mining methods cannot: it not only has stronger representational power, allowing more complex patterns to be expressed and discovered, but can also make effective use of background knowledge during the mining process to improve efficiency and accuracy. In recent years, drawing on inductive logic programming (ILP) techniques, many multi-relational data mining methods have been developed, such as relational association rule mining and relational classification and clustering methods.

17.
A Reduction Algorithm Meeting Users' Requirements
Generally a database encompasses various kinds of knowledge and is shared by many users. Different users may prefer different kinds of knowledge, so it is important for a data mining algorithm to output specific knowledge according to users' current requirements (preferences). We call this kind of data mining requirement-oriented knowledge discovery (ROKD). When rough set theory is used in data mining, the ROKD problem is how to find a reduct and corresponding rules interesting to the user. Since reducts and rules are generated in the same way, this paper is only concerned with how to find a particular reduct. The user's requirement is described by an order over the attributes, called an attribute order, which expresses the importance of the attributes to the user. In the order, more important attributes are located before less important ones. The problem then becomes how to find a reduct including those attributes that appear early in the attribute order. An approach to this problem is proposed, and its completeness for reducts is proved. After that, three kinds of attribute order are developed to describe various user requirements.
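One simple way to honour an attribute order when computing a reduct — a sketch under our own assumptions, not necessarily the paper's algorithm — is backward elimination that tries to discard the least important attributes first, so the survivors are biased toward the front of the order:

```python
def consistent(rows, attrs, decision):
    """True if the attrs-values of each row determine the decision uniquely."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, row[decision]) != row[decision]:
            return False
    return True

def ordered_reduct(rows, attr_order, decision):
    """Drop attributes from the least-important end first while preserving
    consistency; the result is a reduct favouring attributes early in
    attr_order (most important first)."""
    reduct = list(attr_order)
    for a in reversed(attr_order):            # least important tried first
        trial = [x for x in reduct if x != a]
        if trial and consistent(rows, trial, decision):
            reduct = trial
    return reduct
```

Because inconsistency on an attribute set implies inconsistency on any of its subsets, every attribute surviving this pass is indispensable in the final set, so the output is a genuine reduct.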

18.
Learning Organisations: The Process of Innovation and Technological Change
In the present scenario of globalisation, knowledge has become the prime factor of production for competitive advantage. This calls for the acquisition and utilisation of knowledge for innovation and technical change on a constant basis, which is only possible in a ‘learning organisation’. The innovative activities of a learning organisation are influenced by three main factors: (1) internal learning; (2) external learning; and (3) the innovation strategies decided upon by the enterprise management. An assumption has been made that, particularly in developing countries, absorption and adaptation of technologies, i.e. indigenisation, take place through a process of ‘learning by doing’. Taking this into consideration, this paper focuses on a few case studies carried out at NISTADS, New Delhi, India, on small enterprises in the formal as well as traditional sectors, highlighting the learning process in an organisational context and how it brings about innovation and technological change at the enterprise level. The study demonstrates that a learning environment in an organisational context is indispensable for being innovative and for building up capabilities for technological change. This in turn also calls for strong networking of the enterprises with academia, R&D institutions and other enterprises, to create knowledge clusters. This builds up a strong case for a network approach to learning organisations, not only at the regional level but also at the cross-cultural level, for constant innovation and technical change.

19.
The process of knowledge discovery in databases consists of several steps that are iterative and interactive. In each application, to go through this process the user has to exploit different algorithms and their settings that usually yield multiple models. Model selection, that is, the selection of appropriate models or algorithms to achieve such models, requires meta-knowledge of algorithm/model and model performance metrics. Therefore, model selection is usually a difficult task for the user. We believe that simplifying the process of model selection for the user is crucial to the success of real-life knowledge discovery activities. As opposed to most related work that aims to automate model selection, in our view model selection is a semiautomatic process, requiring an effective collaboration between the user and the discovery system. For such a collaboration, our solution is to give the user the ability to try various alternatives and to compare competing models quantitatively by performance metrics, and qualitatively by effective visualization. This paper presents our research on model selection and visualization in the development of a knowledge discovery system called D2MS. The paper addresses the motivation of model selection in knowledge discovery and related work, gives an overview of D2MS, and describes its solution to model selection and visualization. It then presents the usefulness of D2MS model selection in two case studies of discovering medical knowledge in hospital data—on meningitis and stomach cancer—using three data mining methods of decision trees, conceptual clustering, and rule induction.

20.
In this paper, we deal with mining sequential patterns in multiple time sequences. Building on a state-of-the-art sequential pattern mining algorithm, PrefixSpan, for mining transaction databases, we propose MILE (MIning in muLtiple sEquences), an efficient algorithm to facilitate the mining process. MILE recursively utilizes the knowledge of existing patterns to avoid redundant data scanning, and can therefore effectively speed up the discovery of new patterns. Another unique feature of MILE is that it can incorporate prior knowledge of the data distribution in time sequences into the mining process to further improve the performance. Extensive empirical results show that MILE is significantly faster than PrefixSpan. As MILE consumes more memory than PrefixSpan, we also present a solution for trading time efficiency in memory-constrained environments.
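To ground the discussion, a minimal PrefixSpan-style miner is sketched below: frequent prefixes are grown by recursing into projected databases (the suffixes after each prefix item). MILE's reuse of existing patterns and its use of prior knowledge about the data distribution are omitted here.

```python
def prefixspan(sequences, min_support, prefix=None):
    """Minimal PrefixSpan: for each frequent item, extend the prefix and
    recurse on the database projected past that item's first occurrence."""
    prefix = prefix or []
    counts = {}
    for seq in sequences:
        for item in set(seq):               # count each item once per sequence
            counts[item] = counts.get(item, 0) + 1
    patterns = []
    for item, n in counts.items():
        if n < min_support:
            continue
        new_prefix = prefix + [item]
        patterns.append((new_prefix, n))
        # Project: keep each suffix after the first occurrence of item.
        projected = [seq[seq.index(item) + 1:] for seq in sequences if item in seq]
        patterns += prefixspan([s for s in projected if s], min_support, new_prefix)
    return patterns

db = [['a', 'b', 'c'], ['a', 'c'], ['b', 'c']]
print(prefixspan(db, min_support=2))
# e.g. (['a'], 2), (['a', 'c'], 2), (['b'], 2), (['b', 'c'], 2), (['c'], 3)
```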
