20 similar documents found; search took 78 ms
1.
Jana Samalikova Rob Kusters Jos Trienekens Ton Weijters Paul Siemons 《Software Quality Journal》2011,19(1):101-120
A critical problem in software development is the monitoring, control and improvement in the processes of software developers.
Software processes are often not explicitly modeled, and manuals to support the development work contain abstract guidelines
and procedures. Consequently, there are huge differences between ‘actual’ and ‘official’ processes: “the actual process is
what you do, with all its omissions, mistakes, and oversights. The official process is what the book, i.e., a quality manual,
says you are supposed to do” (Humphrey in A discipline for software engineering. Addison-Wesley, New York, 1995). Software developers lack support to identify, analyze and better understand their processes. Consequently, process improvements
are often not based on an in-depth understanding of the ‘actual’ processes, but on organization-wide improvement programs
or ad hoc initiatives of individual developers. In this paper, we show that, based on particular data from software development
projects, the underlying software development processes can be extracted and that automatically more realistic process models
can be constructed. This is called software process mining (Rubin et al. in Process mining framework for software processes.
Software process dynamics and agility. Springer Berlin, Heidelberg, 2007). The goal of process mining is to better understand the development processes, to compare constructed process models with
the ‘official’ guidelines and procedures in quality manuals and, subsequently, to improve development processes. This paper
reports on process mining case studies in a large industrial company in The Netherlands. The subject of the process mining
is a particular process: the change control board (CCB) process. The results of process mining are fed back to practice in
order to subsequently improve the CCB process.
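Process discovery of the kind described above typically starts from a directly-follows relation extracted from an event log. A minimal sketch (the log, case names and function are illustrative, not taken from the paper's CCB data):

```python
from collections import Counter

def directly_follows(event_log):
    """Count how often activity a is directly followed by activity b
    within a case; this relation is the starting point of many
    process discovery algorithms."""
    dfg = Counter()
    for trace in event_log.values():
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

# Toy change-control log: each case is one change request.
log = {
    "CR-1": ["submit", "analyze", "approve", "implement"],
    "CR-2": ["submit", "analyze", "reject"],
    "CR-3": ["submit", "analyze", "approve", "implement"],
}
dfg = directly_follows(log)
```

Comparing such a mined relation against the 'official' CCB procedure is what reveals the gap between actual and documented process.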
2.
Shichao Zhang 《Knowledge and Information Systems》2000,2(1):97-114
Handling data with gap-interval time is an important issue faced by the temporal database community. While a number of interval
logics have been developed, less work has been reported on gap-interval time. To represent and handle data with time, a 'when'
clause, which is really a temporal logical sentence, is generally added to each conventional operator so as to incorporate the
time dimension into temporal databases. Unfortunately, although several temporal database models have dealt with data with
gap-interval time, they still apply interval-calculus methods to gap-intervals. Interval-calculus methods are clearly inadequate
for handling gap-interval data in historical databases. Which temporal expressions, then, are valid in the 'when' clause for
data with gap-interval time? Further, which temporal operations and relations can be used in the 'when' clause? To answer these
questions, a formal tool for supporting data with gap-interval time must be developed. For this reason, this paper establishes
a gap-interval-based logic for historical databases. In particular, we discuss how to determine the temporal relationships
after an event explodes, which can be used to describe the temporal forms of tuple splitting in historical databases.
Received 2 February 1999 / Revised 9 May 1999 / Accepted in revised form 20 November 1999
3.
Computing LTS Regression for Large Data Sets (cited 9 times: 0 self-citations, 9 by others)
Data mining aims to extract previously unknown patterns or substructures from large databases. In statistics, this is what
methods of robust estimation and outlier detection were constructed for; see, e.g., Rousseeuw and Leroy (1987). Here we focus on least trimmed squares (LTS) regression, which is based on the subset of h cases (out of n) whose least squares fit possesses the smallest sum of squared residuals. The coverage h may be set between n/2 and n. The computation time of existing LTS algorithms grows too quickly with the size of the data set, precluding their use for data
mining. In this paper we develop a new algorithm called FAST-LTS. The basic ideas are an inequality involving order statistics
and sums of squared residuals, and techniques which we call ‘selective iteration’ and ‘nested extensions’. We also use an
intercept adjustment technique to improve the precision. For small data sets FAST-LTS typically finds the exact LTS, whereas
for larger data sets it gives more accurate results than existing algorithms for LTS and is faster by orders of magnitude.
This allows us to apply FAST-LTS to large databases.
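The core of FAST-LTS is the concentration step (C-step): fit least squares on a trial subset, then keep the h cases with the smallest squared residuals. The sketch below uses random elemental starts plus C-steps only; the paper's selective iteration, nested extensions and intercept adjustment are omitted, and all names and data are illustrative:

```python
import numpy as np

def c_step(X, y, subset, h):
    """One concentration step: fit least squares on the current subset,
    then keep the h cases with the smallest squared residuals."""
    beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
    residuals = (y - X @ beta) ** 2
    return np.argsort(residuals)[:h], beta

def toy_lts(X, y, h, n_starts=20, n_steps=10, seed=None):
    """Toy LTS: random elemental starts (p cases) refined by C-steps.
    Illustrative only -- FAST-LTS adds selective iteration, nested
    extensions and an intercept adjustment on top of this idea."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_score, best_beta = np.inf, None
    for _ in range(n_starts):
        subset = rng.choice(n, size=p, replace=False)   # elemental start
        for _ in range(n_steps):
            subset, beta = c_step(X, y, subset, h)
        score = np.sort((y - X @ beta) ** 2)[:h].sum()  # LTS objective
        if score < best_score:
            best_score, best_beta = score, beta
    return best_beta

# Line y = 2x + 1 with 12 of 40 responses shifted by +50 (gross outliers).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 40)
y = 2 * x + 1 + rng.normal(0, 0.1, 40)
y[:12] += 50
X = np.column_stack([np.ones_like(x), x])
beta = toy_lts(X, y, h=24, seed=1)   # recovers the clean line despite outliers
```

Because the objective only sums the h smallest squared residuals, the 30% of contaminated cases cannot pull the fit, which is exactly the robustness property ordinary least squares lacks.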
4.
With massive amounts of data stored in databases, mining information and knowledge from databases has become an important issue in recent research. Researchers in many different fields have shown great interest in data mining and knowledge discovery in databases. Several emerging applications in information-providing services, such as data warehousing and on-line services over the Internet, also call for various data mining and knowledge discovery techniques to understand user behavior better, to improve the service provided, and to increase business opportunities. In response to such a demand, this article provides a comprehensive survey of the data mining and knowledge discovery techniques developed recently, and introduces some real application systems as well. In conclusion, this article also lists some problems and challenges for further research.
5.
Fuminori Adachi Takashi Washio Atsushi Fujimoto Hiroshi Motoda Hidemitsu Hanafusa 《New Generation Computing》2005,23(4):291-313
The need for efficient and flexible information retrieval over multi-structured data stored in databases and networks is
growing significantly. Flexibility in particular plays a key role in acquiring the relevant information desired by users
during retrieval. However, most existing approaches are each dedicated to a single content type and data structure, e.g.,
relational databases or natural text. In this work, we propose a “Multi-Structure Information Retrieval” (MSIR) approach
applicable to various types of contents and data structures, in which only a small part of the approach is adapted to each
data structure. The power of this approach comes from the use of invariant feature information obtained from byte patterns
in the files through a mathematical transformation. Experimental evaluation of the proposed approach on both artificial and
real data indicates its high feasibility.
Fuminori Adachi: He received his Master of Engineering from Osaka University in ’03 and has been enrolled in the doctoral course at Osaka University
since ’03. His current research interests include scientific discovery, data mining and machine learning techniques.
Takashi Washio, Ph.D.: He received his Ph.D. from Tohoku University in ’88. In ’88, he became a visiting researcher at the Massachusetts Institute of
Technology. In ’90, he joined Mitsubishi Research Institute Inc., and he has been working at Osaka University since ’96. His current
research interests include scientific discovery, data mining and machine learning techniques.
Atsushi Fujimoto: He has been enrolled in the master’s course at Osaka University since ’03. His current research interests include correlation analysis,
data mining and machine learning techniques.
Hiroshi Motoda, Ph.D.: He received his Ph.D. from the University of Tokyo in ’72. In ’67, he joined Hitachi Ltd., and he has been working at Osaka University
since ’96. His current research interests include scientific discovery, data mining and machine learning.
Hidemitsu Hanafusa: He received his Master of Engineering from Keio University in ’83. In ’83, he joined The Kansai Electric Power Co., Inc. (KEPCO).
He researched maintenance support systems at INSS from ’97 to ’02. He now works at KEPCO.
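The paper does not spell out its mathematical transformation; as a hypothetical stand-in, a byte-frequency histogram compared by cosine similarity illustrates a structure-independent feature computed from raw byte patterns:

```python
import math
from collections import Counter

def byte_histogram(data: bytes):
    """Byte-frequency feature vector: invariant to where content sits in
    the file, in the spirit of structure-independent features."""
    n = len(data)
    counts = Counter(data)
    return [counts.get(b, 0) / n for b in range(256)]

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Illustrative files: two CSV-like texts and one binary-looking blob.
doc1 = b"name,age\nalice,30\nbob,25\n"
doc2 = b"name,age\ncarol,41\ndave,37\n"
blob = bytes(range(256)) * 4
sim_text = cosine(byte_histogram(doc1), byte_histogram(doc2))
sim_mixed = cosine(byte_histogram(doc1), byte_histogram(blob))
```

The two text files score far more similar to each other than to the binary blob, even though no parser for either structure was used.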
6.
Data Mining: A Key Contribution to E-business (cited 5 times: 0 self-citations, 5 by others)
Nordine Melab 《Information & Communications Technology Law》2001,10(3):309-318
Data mining consists of extracting knowledge from huge volumes of data, allowing better business decisions to be taken. In this paper, we show how data mining is integrated in the knowledge discovery process. We highlight its potential applications and the techniques that are often used to perform it. Association rule mining is presented as a case study. Furthermore, we show through an integrated architecture how data mining can contribute to e-business via the new technologies. Finally, we present some commercially-available architectures.
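Association rule mining, the case study mentioned above, rates rules A → B by support P(A ∧ B) and confidence P(B | A). A naive sketch restricted to single-item rules (the basket data and thresholds are illustrative):

```python
from itertools import combinations

def association_rules(transactions, min_support, min_confidence):
    """Naive miner for single-item rules A -> B:
    support = fraction of baskets containing A and B,
    confidence = support(A, B) / support(A)."""
    n = len(transactions)

    def support(items):
        return sum(items <= t for t in transactions) / n

    items = set().union(*transactions)
    rules = []
    for a, b in combinations(sorted(items), 2):
        for ante, cons in [({a}, {b}), ({b}, {a})]:
            s = support(ante | cons)
            if s >= min_support and s / support(ante) >= min_confidence:
                rules.append((tuple(ante), tuple(cons), s))
    return rules

baskets = [{"bread", "butter"}, {"bread", "butter", "milk"},
           {"bread", "milk"}, {"butter"}]
rules = association_rules(baskets, min_support=0.5, min_confidence=0.6)
```

Real miners such as Apriori prune the exponential itemset lattice instead of enumerating pairs, but the support/confidence semantics are the same.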
7.
LEARNING IN RELATIONAL DATABASES: A ROUGH SET APPROACH (cited 49 times: 0 self-citations, 49 by others)
Knowledge discovery in databases, or data mining, is an important direction in the development of data and knowledge-based systems. Because of the huge amount of data stored in large numbers of existing databases, and because the amount of data generated in electronic form is growing rapidly, it is necessary to develop efficient methods to extract knowledge from databases. An attribute-oriented rough set approach has been developed for knowledge discovery in databases. The method integrates the machine-learning paradigm, especially learning-from-examples techniques, with rough set techniques. An attribute-oriented concept tree ascension technique is first applied in generalization, which substantially reduces the computational complexity of database learning processes. Then the cause-effect relationship among the attributes in the database is analyzed using rough set techniques, and the unimportant or irrelevant attributes are eliminated. Thus concise and strong rules with little or no redundant information can be learned efficiently. Our study shows that attribute-oriented induction combined with rough set theory provides an efficient and effective mechanism for knowledge discovery in database systems.
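The rough-set machinery referred to above rests on lower and upper approximations of a target concept under an indiscernibility relation. A minimal sketch with hypothetical patient data (attribute values, names and the flu concept are illustrative):

```python
from collections import defaultdict

def approximations(universe, key, target):
    """Rough-set lower/upper approximations of `target` under the
    indiscernibility relation induced by the attribute function `key`:
    objects with the same attribute value are indistinguishable."""
    blocks = defaultdict(set)
    for x in universe:
        blocks[key(x)].add(x)
    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:     # block certainly inside the target
            lower |= block
        if block & target:      # block possibly inside the target
            upper |= block
    return lower, upper

# Hypothetical records: one symptom attribute and a target concept (flu).
symptom = {"p1": "high", "p2": "high", "p3": "low", "p4": "low", "p5": "none"}
flu = {"p1", "p2", "p3"}
lower, upper = approximations(symptom, symptom.get, flu)
```

The gap between upper and lower approximation (here p3 and p4) is exactly the region where the attribute cannot decide membership, which is what drives the elimination of uninformative attributes.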
8.
It is frequently the case that data mining is carried out in an environment which contains noisy and missing data. This is particularly likely to be true when the data were originally collected for different purposes, as is commonly the case in data warehousing. In this paper we discuss the use of domain knowledge, e.g., integrity constraints or a concept hierarchy, to re‐engineer the database and allocate sets to which missing or unacceptable outlying data may belong. Attribute‐oriented knowledge discovery has proved to be a powerful approach for mining multi‐level data in large databases. Such methods are set‐oriented in that attribute values are considered to belong to subsets of the domain. These subsets may be provided directly by the database or derived from a knowledge base using inductive logic programming to re‐engineer the database. In this paper we develop an algorithm which allows us to aggregate imprecise data and use it for multi‐level rule induction and knowledge discovery. ©2000 John Wiley & Sons, Inc.
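The concept-hierarchy generalization described above can be sketched as one attribute-oriented induction step: climb each value to its parent concept and aggregate duplicates. The hierarchy and data below are illustrative, not from the paper:

```python
from collections import Counter

# Hypothetical one-level concept hierarchy supplied by a knowledge base.
hierarchy = {"apple": "fruit", "banana": "fruit",
             "carrot": "vegetable", "leek": "vegetable"}

def generalize(values, hierarchy):
    """One attribute-oriented induction step: replace each value by its
    parent concept and aggregate with a count; values with no concept
    (e.g. noisy or unexpected data) are kept as-is."""
    return Counter(hierarchy.get(v, v) for v in values)

sales = ["apple", "banana", "banana", "carrot", "unknown-item"]
g = generalize(sales, hierarchy)
```

Values the hierarchy cannot place (here `unknown-item`) survive the step, which is where domain knowledge such as integrity constraints would be consulted to allocate them to a candidate set.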
9.
DSM as a knowledge capture tool in CODE environment (cited 1 time: 0 self-citations, 1 by others)
A design structure matrix (DSM) provides a simple, compact, and visual representation of a complex system/process. This paper
shows how DSM, a systems engineering tool, is applied as a knowledge capture (acquisition) tool in a generic new product
development (NPD) process. The acquired knowledge (identified in the DSM) is provided in the form of questionnaires, which are
organized into five performance indicators of the organization, namely ‘Marketing’, ‘Technical’, ‘Financial’, ‘Resource
Management’, and ‘Project Management’. An industrial application is carried out for knowledge validation. It is found from the
application that the acquired knowledge helps NPD teams, managers and stakeholders to benchmark their NPD endeavor and select
areas to focus their improvement efforts (up to 80% valid).
10.
Fingerprint classification is a challenging pattern recognition problem which plays a fundamental role in most of the large
fingerprint-based identification systems. Due to the intrinsic class ambiguity and the difficulty of processing very low quality
images (which constitute a significant proportion), automatic fingerprint classification performance is currently below operating
requirements, and most of the classification work is still carried out manually or semi-automatically. This paper explores
the advantages of combining the MASKS and MKL-based classifiers, which we have specifically designed for the fingerprint classification
task. In particular, a combination at the ‘abstract level’ is proposed for exclusive classification, whereas a fusion at the
‘measurement level’ is introduced for continuous classification. The advantages of coupling these distinct techniques are
well evident; in particular, in the case of exclusive classification, the FBI challenge, requiring a classification error
≤ 1% at 20% rejection, was met on NIST-DB14.
Received: 06 November 2000, Received in revised form: 25 October 2001, Accepted: 03 January 2002
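Fusion at the ‘measurement level’ combines the classifiers' per-class scores before deciding, rather than combining their final labels as ‘abstract level’ fusion does. A minimal sketch (the scores and equal weights are illustrative; the actual MASKS/MKL combination rule is not specified here):

```python
def fuse_measurement_level(score_lists, weights):
    """Measurement-level fusion: weighted sum of per-class scores from
    several classifiers; the fused decision is the argmax class."""
    classes = score_lists[0].keys()
    fused = {c: sum(w * s[c] for w, s in zip(weights, score_lists))
             for c in classes}
    return max(fused, key=fused.get), fused

# Hypothetical per-class scores from two fingerprint classifiers.
masks_scores = {"arch": 0.2, "loop": 0.5, "whorl": 0.3}
mkl_scores   = {"arch": 0.1, "loop": 0.3, "whorl": 0.6}
label, fused = fuse_measurement_level([masks_scores, mkl_scores],
                                      weights=[0.5, 0.5])
```

Note that the two classifiers disagree (loop vs. whorl) yet the fused scores resolve the conflict using the full score distributions, information an abstract-level vote would discard.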
11.
David Smith 《AI & Society》2007,21(4):421-428
This article examines the UNESCO Convention on Intangible Cultural Heritage. It accepts the general case made by UNESCO, but
urges greater attention to the ‘real-world’ knowledge of ordinary people. The paper rejects taxonomies of knowledge based
on metaphysical discussions of knowing. Instead, it argues for an approach to knowledge based on the social production of
‘knowledge acts’. It concludes by asserting that support for the diversity of social enactment of knowledge could have valuable
outcomes in the form of new ways of understanding new and emerging technologies.
12.
Duen-Yian Yeh Ching-Hsue Cheng Shih-Chuan Hsiao 《Journal of Intelligent Manufacturing》2011,22(4):585-595
Taiwan’s mold industry is the sixth largest in the world, but under global competitive pressure it has gradually lost its
competitive advantage. Its remaining chance lies in improving its competitiveness in product research, development and design.
In the mold manufacturing cycle, the mold tooling test plays a very important role in accelerating production. An experienced
engineer can minimize the error rate of the mold tooling test by drawing on rich experience in parameter adjustment; however,
this experience is mostly implicit, lacks a theoretical basis, and is difficult to transmit. Benefiting from well-developed
data mining technologies, this study constructed an intelligent classification knowledge discovery system for the mold tooling
test based on a decision tree algorithm, so as to explore and accumulate experimental knowledge for Taiwan’s mold industry.
The study took the only high-alloy steel manufacturer in Taiwan as a case study and validated the system with 66 data records.
The results showed that the prediction accuracy on training and testing data was 97.6% and 86.9%, respectively. In addition,
the study extracted two classification knowledge rules and made concrete proposals for tooling-test parameter adjustment.
Moreover, the model was verified in two ways: rule verification and an effectiveness comparison of four mining algorithms. The
experimental results showed that the decision tree algorithm has excellent discriminatory power for classification and
provides clear and simple reference rules for decisions.
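The heart of decision tree induction such as that used in this study is choosing the split attribute with the highest information gain. A depth-one sketch with hypothetical tooling-test records (the attribute names and pass/fail labels are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels):
    """ID3-style choice: the attribute whose split maximizes information
    gain, i.e. reduces label entropy the most."""
    base = entropy(labels)

    def gain(attr):
        grouped = {}
        for row, y in zip(rows, labels):
            grouped.setdefault(row[attr], []).append(y)
        return base - sum(len(g) / len(labels) * entropy(g)
                          for g in grouped.values())

    return max(rows[0], key=gain)

# Hypothetical tooling-test records: parameter settings -> pass/fail.
rows = [
    {"pressure": "high", "temp": "low"},
    {"pressure": "high", "temp": "high"},
    {"pressure": "low",  "temp": "low"},
    {"pressure": "low",  "temp": "high"},
]
labels = ["fail", "fail", "pass", "pass"]
split = best_attribute(rows, labels)
```

Here `pressure` perfectly separates pass from fail (gain 1 bit) while `temp` carries no information (gain 0), so the tree splits on pressure first; recursing on each branch yields the kind of clear, simple rules the study reports.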
13.
Linguistic Problems with Requirements and Knowledge Elicitation (cited 1 time: 0 self-citations, 1 by others)
David C. Sutton 《Requirements Engineering》2000,5(2):114-124
Human and conversational aspects of requirements and knowledge identification are employed to show that requirements ‘engineering’
is not the same as civil engineering or scientific problem solving. Not only can requirements not be made fully explicit at
the start of a project, they cannot be made fully explicit at all. A need is identified to enhance computer-based information
systems (CBIS) development methods to accommodate: plurality of incommensurable perspectives, languages and agendas; dynamic
representations of system features that can be experienced rather than abstracted and forced into an abstract paper-based
representation; recognition that CBIS development is in general a continuous process where users changing their minds is a
natural and necessary indication of organisational vitality.
It is suggested that prototyping and rapid application development go some way to addressing these requirements, but that
they require further development in the light of what the theory reveals about the nature of the problem.
14.
The paper sets out the challenges facing the Police in respect of the detection and prevention of the volume crime of burglary.
A discussion of data mining and decision support technologies that have the potential to address these issues is undertaken
and illustrated with reference to the authors’ work with three Police Services. The focus is upon the use of “soft” forensic
evidence which refers to modus operandi and the temporal and geographical features of the crime, rather than “hard” evidence
such as DNA or fingerprint evidence. Three objectives underpin this paper. First, given the continuing expansion of forensic
computing and its role in the emergent discipline of Crime Science, it is timely to present a review of existing methodologies
and research. Second, it is important to extract some practical lessons concerning the application of computer science within
this forensic domain. Finally, from the lessons to date, a set of conclusions will be advanced, including the need for multidisciplinary
input to guide further developments in the design of such systems. The objectives are achieved by first considering the task
performed by the intended systems users. The discussion proceeds by identifying the portions of these tasks for which automation
would be both beneficial and feasible. The knowledge discovery from databases process is then described, starting with an
examination of the data that police collect and the reasons for storing it. The discussion progresses to the development of
crime matching and predictive knowledge which are operationalised in decision support software. The paper concludes by arguing
that computer science technologies which can support criminal investigations are wide ranging and include geographical information
systems displays, clustering and link analysis algorithms and the more complex use of data mining technology for profiling
crimes or offenders and matching and predicting crimes. We also argue that knowledge from disciplines such as forensic psychology,
criminology and statistics is essential to the efficient design of operationally valid systems.
15.
Process mining is the research domain that is dedicated to the a posteriori analysis of business process executions. The techniques developed within this research area are specifically designed to provide profound insight by exploiting the untapped reservoir of knowledge that resides within event logs of information systems. Process discovery is one specific subdomain of process mining that entails the discovery of control-flow models from such event logs. Assessing the quality of discovered process models is an essential element, both for conducting process mining research as well as for the use of process mining in practice. In this paper, a multi-dimensional quality assessment is presented in order to comprehensively evaluate process discovery techniques. In contrast to previous studies, the major contribution of this paper is the use of eight real-life event logs. For instance, we show that evaluation based on real-life event logs significantly differs from the traditional approach to assess process discovery techniques using artificial event logs. In addition, we provide an extensive overview of available process discovery techniques and we describe how discovered process models can be assessed regarding both accuracy and comprehensibility. The results of our study indicate that the HeuristicsMiner algorithm is especially suited in a real-life setting. However, it is also shown that, particularly for highly complex event logs, knowledge discovery from such data sets can become a major problem for traditional process discovery techniques.
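Accuracy measures for discovered models compare the model against the traces of the log. As a deliberately crude sketch (real fitness measures use token replay; the model, encoded here as a set of allowed directly-follows pairs, and the log are illustrative):

```python
def trace_fitness(model_edges, traces):
    """Crude fitness: the fraction of traces whose every directly-follows
    step is allowed by the discovered model."""
    def fits(trace):
        return all((a, b) in model_edges for a, b in zip(trace, trace[1:]))
    return sum(fits(t) for t in traces) / len(traces)

# Hypothetical discovered model and real-life-style log.
model = {("start", "check"), ("check", "approve"), ("check", "reject")}
log = [["start", "check", "approve"],
       ["start", "check", "reject"],
       ["start", "approve"]]          # skips the check: not replayable
fitness = trace_fitness(model, log)
```

A comprehensive assessment would weigh such accuracy scores against precision, generalization and the comprehensibility of the model, which is the multi-dimensional view the paper advocates.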
16.
17.
Generally, a database encompasses various kinds of knowledge and is shared by many users. Different users may prefer different kinds of knowledge, so it is important for a data mining algorithm to output specific knowledge according to users’ current requirements (preferences). We call this kind of data mining requirement-oriented knowledge discovery (ROKD). When rough set theory is used in data mining, the ROKD problem is how to find a reduct and corresponding rules interesting to the user. Since reducts and rules are generated in the same way, this paper concerns only how to find a particular reduct. The user’s requirement is described by an ordering of attributes, called an attribute order, which expresses the importance of attributes to the user: more important attributes are located before less important ones. The problem then becomes how to find a reduct including those attributes earlier in the attribute order. An approach to this problem is proposed, and its completeness for reducts is proved. After that, three kinds of attribute order are developed to describe various user requirements.
18.
V. P. Kharbanda 《AI & Society》2002,16(1-2):89-99
In the present scenario of globalisation, knowledge has become the prime factor of production for competitive advantage.
This calls for acquisition and utilisation of knowledge for innovation and technical change on a constant basis, which is
only possible in a ‘learning organisation’. Innovative activities of a learning organisation are influenced by three main
factors: (1) internal learning; (2) external learning; and (3) the innovation strategies decided upon by the enterprise management.
An assumption has been made that, particularly in developing countries, absorption and adaptation of technologies, i.e. indigenisation,
take place through a process of ‘learning by doing’. Taking this into consideration, this paper focuses on a few case studies
carried out at NISTADS, New Delhi, India, on small enterprises in the formal as well as traditional sectors, highlighting
the learning process in an organisational context and how it brings in innovation and technological change at enterprise level.
The study demonstrates that the learning environment in an organisational context is an indispensable process to be innovative
and building up capabilities for technological change. This in turn also calls for strong networking of the enterprises with
academia, R&D institutions and other enterprises, to create knowledge clusters. This builds up a strong case for a network
approach of learning organisations not only at the regional level but also at the cross-cultural level for constant innovation
and technical change.
19.
Tu Bao Ho Trong Dung Nguyen Hiroshi Shimodaira Masayuki Kimura 《Applied Intelligence》2003,19(1-2):125-141
The process of knowledge discovery in databases consists of several steps that are iterative and interactive. In each application, to go through this process the user has to exploit different algorithms and their settings that usually yield multiple models. Model selection, that is, the selection of appropriate models or algorithms to achieve such models, requires meta-knowledge of algorithm/model and model performance metrics. Therefore, model selection is usually a difficult task for the user. We believe that simplifying the process of model selection for the user is crucial to the success of real-life knowledge discovery activities. As opposed to most related work that aims to automate model selection, in our view model selection is a semiautomatic process, requiring an effective collaboration between the user and the discovery system. For such a collaboration, our solution is to give the user the ability to try various alternatives and to compare competing models quantitatively by performance metrics, and qualitatively by effective visualization. This paper presents our research on model selection and visualization in the development of a knowledge discovery system called D2MS. The paper addresses the motivation of model selection in knowledge discovery and related work, gives an overview of D2MS, and describes its solution to model selection and visualization. It then presents the usefulness of D2MS model selection in two case studies of discovering medical knowledge in hospital data—on meningitis and stomach cancer—using three data mining methods of decision trees, conceptual clustering, and rule induction.
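The quantitative half of such model selection can be sketched as ranking candidate models by a performance metric on validation data (the models, data and accuracy metric below are illustrative; a D2MS-like system would pair this with visualization and user judgment):

```python
def evaluate(model, data):
    """Accuracy of a model (a callable x -> label) on labeled data."""
    return sum(model(x) == y for x, y in data) / len(data)

def select_model(candidates, validation):
    """Rank candidate models by a quantitative metric and return the
    winner plus the full scoreboard for the user to inspect."""
    scores = {name: evaluate(m, validation) for name, m in candidates.items()}
    return max(scores, key=scores.get), scores

# Hypothetical validation set and two competing models.
validation = [(1, "pos"), (2, "pos"), (-1, "neg"), (-3, "neg")]
candidates = {
    "always-pos": lambda x: "pos",
    "sign-rule":  lambda x: "pos" if x > 0 else "neg",
}
best, scores = select_model(candidates, validation)
```

Returning the whole scoreboard rather than only the winner mirrors the semiautomatic philosophy: the metric ranks the alternatives, but the user still sees and compares them.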
20.
In this paper, we deal with mining sequential patterns in multiple time sequences. Building on a state-of-the-art sequential
pattern mining algorithm PrefixSpan for mining transaction databases, we propose MILE (MIning in muLtiple sEquences), an efficient algorithm to facilitate the mining process. MILE recursively utilizes the knowledge of existing patterns
to avoid redundant data scanning, and therefore can effectively speed up the new patterns’ discovery process. Another unique
feature of MILE is that it can incorporate prior knowledge of the data distribution in time sequences into the mining process
to further improve the performance. Extensive empirical results show that MILE is significantly faster than PrefixSpan. As
MILE consumes more memory than PrefixSpan, we also present a solution that trades time efficiency for lower memory usage in memory-constrained environments.
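The support of a sequential pattern, the quantity PrefixSpan-style miners compute, is the number of sequences containing the pattern as an order-preserving, gap-tolerant subsequence. A minimal counting sketch with illustrative data:

```python
def is_subsequence(pattern, sequence):
    """True if pattern's items appear in sequence in order, gaps allowed."""
    it = iter(sequence)
    return all(item in it for item in pattern)  # `in` advances the iterator

def support(pattern, sequences):
    """Number of sequences in the database containing the pattern."""
    return sum(is_subsequence(pattern, s) for s in sequences)

# Toy sequence database of three time sequences.
db = [
    ["a", "b", "c", "d"],
    ["a", "c", "d"],
    ["b", "a", "d"],
]
```

A miner like PrefixSpan avoids recomputing this check over the whole database by growing patterns from projected databases; MILE goes further by reusing already-discovered patterns across the multiple sequences.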