Found 20 similar documents (search time: 15 ms)
1.
Evelina Lamma Fabrizio Riguzzi Sergio Storari Paola Mello Anna Nanetti 《New Generation Computing》2003,21(2):123-133
A huge amount of data is daily collected from clinical microbiology laboratories. These data concern the resistance or susceptibility
of bacteria to tested antibiotics. Almost all microbiology laboratories follow standard antibiotic testing guidelines which
suggest antibiotic test execution methods and result interpretation and validation (among them, those published annually by
NCCLS). Guidelines basically specify, for each species, the antibiotics to be tested, how to interpret the test results, and
a list of exceptions regarding particular antibiotic test results. Although these standards are well established, they do not
consider peculiar features of a given hospital laboratory, which possibly influence the antimicrobial test results, and the
further validation process.
In order to improve and better tailor the validation process, we have applied knowledge discovery techniques, and data mining
in particular, to microbiological data with the purpose of discovering new validation rules, not yet included in NCCLS guidelines,
but considered plausible and correct by interviewed experts. In particular, we applied the knowledge discovery process in
order to find (association) rules that relate to one another the susceptibility or resistance of a bacterium to different antibiotics.
This approach is not antithetic, but complementary to that based on NCCLS rules: it proved very effective in validating some
of them, and also in extending that compendium. In this respect, the newly discovered knowledge has led microbiologists to
become aware of new correlations among some antimicrobial test results, which had previously gone unnoticed. Last but not least, the
newly discovered rules, taking into account the history of the considered laboratory, are better tailored to the hospital situation,
and this is very important since some resistances to antibiotics are specific to particular, local hospital environments.
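The kind of rule the abstract describes can be illustrated with a minimal one-to-one association-rule sketch over susceptibility records. Everything below is invented for illustration (antibiotic names, data, thresholds); it is not the authors' system.

```python
from itertools import combinations

# Hypothetical antibiogram records: the set of antibiotics each isolate
# proved resistant to (names and data are invented for illustration).
records = [
    {"ampicillin", "penicillin"},
    {"ampicillin", "penicillin", "oxacillin"},
    {"ampicillin", "penicillin"},
    {"oxacillin"},
    {"ampicillin", "penicillin"},
]

def association_rules(records, min_support=0.5, min_confidence=0.8):
    """Return one-to-one rules (lhs, rhs, support, confidence), read as
    'resistance to lhs suggests resistance to rhs'."""
    n = len(records)
    items = set().union(*records)
    rules = []
    for a, b in combinations(sorted(items), 2):
        for lhs, rhs in ((a, b), (b, a)):
            lhs_count = sum(1 for r in records if lhs in r)
            both_count = sum(1 for r in records if lhs in r and rhs in r)
            support = both_count / n
            confidence = both_count / lhs_count if lhs_count else 0.0
            if support >= min_support and confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules
```

On the toy data this yields the two symmetric rules linking ampicillin and penicillin resistance; a real validation workflow would then submit such candidate rules to microbiologists, as the paper describes.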
Evelina Lamma, Ph.D.: She got her degree in Electrical Engineering at the University of Bologna in 1985, and her Ph.D. in Computer Science in 1990.
Her research activity centers on logic programming languages, artificial intelligence and agent-based programming. She was
co-organizer of the 3rd International Workshop on Extensions of Logic Programming ELP92, held in Bologna in February 1992,
and of the 6th Italian Congress on Artificial Intelligence, held in Bologna in September 1999. She is a member of the Italian
Association for Artificial Intelligence (AI*IA), associated with ECCAI. Currently, she is Full Professor at the University of Ferrara, where she teaches Artificial Intelligence
and Foundations of Computer Science.
Fabrizio Riguzzi, Ph.D.: He is Assistant Professor at the Department of Engineering of the University of Ferrara, Italy. He received his Laurea from
the University of Bologna in 1999. He joined the Department of Engineering of the University of Ferrara in 1999. He has been
a visiting researcher at the University of Cyprus and at the New University of Lisbon. His research interests include: data
mining (and in particular methods for learning from multirelational data), machine learning, belief revision, genetic algorithms
and software engineering.
Sergio Storari: He got his degree in Electrical Engineering at the University of Ferrara in 1998. His research activity centers on artificial
intelligence, knowledge-based systems, data mining and multi-agent systems. He is a member of the Italian Association for
Artificial Intelligence (AI*IA), associated with ECCAI. Currently, he is in the third year of his Ph.D. course on “Study and application of Artificial
Intelligence techniques for medical data analysis” at DEIS University of Bologna.
Paola Mello, Ph.D.: She got her degree in Electrical Engineering at the University of Bologna in 1982, and her Ph.D. in Computer Science in 1988.
Her research activity centers on knowledge representation, logic programming, artificial intelligence and knowledge-based
systems. She was co-organizer of the 3rd International Workshop on Extensions of Logic Programming ELP92, held in Bologna
in February 1992, and of the 6th Italian Congress on Artificial Intelligence, held in Bologna in September 1999. She is a
member of the Italian Association for Artificial Intelligence (AI*IA), associated with ECCAI. Currently, she is Full Professor at the University of Bologna, where she teaches Artificial Intelligence
and Foundations of Computer Science.
Anna Nanetti: She got a degree in biological sciences at the University of Bologna in 1974. Currently, she is an Academic Researcher in
the Microbiology section of the Clinical, Specialist and Experimental Medicine Department of the Faculty of Medicine and Surgery,
University of Bologna.
2.
The discovery of dependencies between attributes in databases is an important problem in data mining, and can be applied to facilitate future decision-making. In the present paper some properties of the branching dependencies are examined. We define a minimal branching dependency and we propose an algorithm for finding all minimal branching dependencies between a given set of attributes and a given attribute in a relation of a database. Our examination of the branching dependencies is motivated by their application in a database storing realized sales of products. For example, finding out that arbitrary p products have attracted at most q new users in total can prove to be crucial in supporting decision-making. In addition, we also consider the fractional and the fractional branching dependencies. Some properties of these dependencies are examined. An algorithm for finding all fractional dependencies between a given set of attributes and a given attribute in a database relation is proposed. We examine the general case of an arbitrary relation, as well as a particular case where the problem of discovering the fractional dependencies is considerably simplified.
3.
This paper studies the problem of mining frequent itemsets along with their temporal patterns from large transaction sets. A model is proposed in which users define a large set of temporal patterns that are interesting or meaningful to them. A temporal pattern defines the set of time points where the user expects a discovered itemset to be frequent. The model is general in that (i) no constraints are placed on the interesting patterns given by the users, and (ii) two measures—inclusiveness and exclusiveness—are used to capture how well the temporal patterns match the time points given by the discovered itemsets. Intuitively, these measures indicate to what extent a discovered itemset is frequent at time points included in a temporal pattern p, but not at time points not in p. Using these two measures, one is able to model many temporal data mining problems that have appeared in the literature, as well as those that have not yet been studied. By exploiting the relationship within and between the itemset space and the pattern space simultaneously, a series of pruning techniques are developed to speed up the mining process. Experiments show that these pruning techniques yield performance benefits of up to 100 times over a direct extension of non-temporal data mining algorithms.
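As a rough illustration (the paper's exact definitions may differ), the two measures can be read as set overlaps between the time points T at which an itemset is frequent and the time points p covered by a user's temporal pattern; T and p below are invented examples:

```python
def inclusiveness(frequent_times, pattern_times):
    """How much of the pattern is covered: fraction of the pattern's time
    points at which the itemset is actually frequent."""
    return len(frequent_times & pattern_times) / len(pattern_times)

def exclusiveness(frequent_times, pattern_times):
    """How little spills outside: fraction of the itemset's frequent time
    points that fall inside the pattern."""
    return len(frequent_times & pattern_times) / len(frequent_times)

T = {1, 2, 3, 5}   # time points where a discovered itemset is frequent
p = {1, 2, 3, 4}   # time points covered by a user-defined temporal pattern
```

A perfect match makes both measures equal 1; here each is 3/4, since one point of the pattern is missed and one frequent time point lies outside the pattern.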
4.
Fernando Alonso Juan P. Caraça-Valente Ángel L. González César Montes 《Expert systems with applications》2002,23(4)
The medical diagnosis system described here uses underlying knowledge in the isokinetic domain, obtained by combining the expertise of a physician specialised in isokinetic techniques and data mining techniques applied to a set of existing data. An isokinetic machine is basically a physical support on which patients exercise one of their joints, in this case the knee, according to different ranges of movement and at a constant speed. The data on muscle strength supplied by the machine are processed by an expert system that has built-in knowledge elicited from an expert in isokinetics. It cleans and pre-processes the data and conducts an intelligent analysis of the parameters and morphology of the isokinetic curves. Data mining methods based on the discovery of sequential patterns in time series and the fast Fourier transform, which identifies similarities and differences among exercises, were applied to the processed information to characterise injuries and discover reference patterns specific to populations. The results obtained were applied in two environments: one for the blind and another for elite athletes.
5.
Ignasi Paredes-Oliva Pere Barlet-Ros Xenofontas Dimitropoulos 《Computer Networks》2013,57(18):3897-3913
Extracting knowledge from big network traffic data is a matter of foremost importance for multiple purposes including trend analysis, network troubleshooting, capacity planning, network forensics, and traffic classification. An extremely useful approach to profile traffic is to extract and display to a network administrator the multi-dimensional hierarchical heavy hitters (HHHs) of a dataset. However, existing schemes for computing HHHs have several limitations: (1) they require significant computational resources; (2) they do not scale to high dimensional data; and (3) they are not easily extensible. In this paper, we introduce a fundamentally new approach for extracting HHHs based on generalized frequent item-set mining (FIM), which allows traffic data to be processed much more efficiently and scales to much higher dimensional data than present schemes. Based on generalized FIM, we build and thoroughly evaluate a traffic profiling system we call FaRNet. Our comparison with AutoFocus, the most closely related tool of a similar nature, shows that FaRNet is up to three orders of magnitude faster. Finally, we describe experiences of how generalized FIM is useful in practice after using FaRNet operationally for several months in the NOC of GÉANT, the European backbone network.
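To give a flavor of the itemset view of traffic, the sketch below treats each flow record as a set of field/value items and counts every wildcard generalization. This is a deliberate simplification with invented field names and data: real HHH schemes, including the paper's, also generalize along hierarchies such as IP prefixes, which this sketch omits.

```python
from collections import Counter
from itertools import combinations

def generalizations(flow):
    """All ways of keeping or wildcarding each field of a flow record;
    each result plays the role of a candidate (generalized) itemset."""
    keys = list(flow)
    for r in range(len(keys) + 1):
        for kept in combinations(keys, r):
            yield tuple((k, flow[k] if k in kept else "*") for k in keys)

def heavy_hitters(flows, threshold):
    """Count every generalization of every flow and keep those whose
    count reaches the threshold."""
    counts = Counter(g for f in flows for g in generalizations(f))
    return {g: c for g, c in counts.items() if c >= threshold}

flows = [
    {"src": "10.0.0.1", "dport": 80},
    {"src": "10.0.0.1", "dport": 443},
    {"src": "10.0.0.2", "dport": 80},
]
```

With threshold 2, the profile reports that host 10.0.0.1 (any port) and port 80 (any host) are heavy, without enumerating every exact combination.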
6.
Sequential pattern mining is essential in many applications, including computational biology, consumer behavior analysis, web log analysis, etc. Although sequential patterns can tell us what items are frequently purchased together and in what order, they cannot provide information about the time span between items for decision support. Previous studies dealing with this problem either set time constraints to restrict the patterns discovered or define time-intervals between two successive items to provide time information. Accordingly, the first approach falls short in providing clear time-interval information, while the second cannot discover time-interval information between two non-successive items in a sequential pattern. To provide more time-related knowledge, we define a new variant of time-interval sequential patterns, called multi-time-interval sequential patterns, which can reveal the time-intervals between all pairs of items in a pattern. Accordingly, we develop two efficient algorithms, called the MI-Apriori and MI-PrefixSpan algorithms, to solve this problem. The experimental results show that the MI-PrefixSpan algorithm is faster than the MI-Apriori algorithm, but the MI-Apriori algorithm has better scalability for long sequence data.
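The difference from successive-only time-intervals can be shown in a few lines (a sketch with invented data, not the MI-Apriori or MI-PrefixSpan algorithms themselves):

```python
from itertools import combinations

def pairwise_intervals(sequence):
    """Given a time-ordered sequence [(time, item), ...], return the time
    interval between every pair of items, not just successive ones."""
    return {(a_item, b_item): b_t - a_t
            for (a_t, a_item), (b_t, b_item) in combinations(sequence, 2)}

seq = [(0, "A"), (3, "B"), (7, "C")]
```

A successive-only scheme records A->B = 3 and B->C = 4; the multi-time-interval view additionally exposes the non-successive pair A->C = 7.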
7.
Kesheng Wang 《Journal of Intelligent Manufacturing》2007,18(4):487-495
Recent advances in computers and manufacturing techniques have made it easy to collect and store all kinds of data in manufacturing
enterprises. The problem of how to enable engineers and managers to understand large amounts of data remains. Traditional data
analysis methods are no longer the best alternative. Data Mining (DM) approaches have created new intelligent tools
for extracting useful information and knowledge automatically. All of this will have a profound impact on current practices
in manufacturing. In this paper the nature and implications of DM techniques in manufacturing, and their implementation in
product design and manufacturing, are discussed.
8.
A Comparative Analysis of Advanced Foreign Data Mining Tools
In recent years, a number of advanced data mining tools have been released abroad and are steadily being introduced in China. As these tools proliferate, choosing one suited to an enterprise's own specific needs has become a major difficulty for enterprises adopting data mining technology. After a brief overview of the background of data mining technology, this article presents a comprehensive and detailed comparative analysis of the current advanced foreign data mining tools from the perspective of enterprise application.
9.
As the total amount of traffic data in networks has been growing at an alarming rate, there is currently a substantial body of research that attempts to mine traffic data with the purpose of obtaining useful information. For instance, there are some investigations into the detection of Internet worms and intrusions by discovering abnormal traffic patterns. However, since network traffic data contain information about the Internet usage patterns of users, network users’ privacy may be compromised during the mining process. In this paper, we propose an efficient and practical method that preserves privacy during sequential pattern mining on network traffic data. In order to discover frequent sequential patterns without violating privacy, our method uses the N-repository server model, which operates as a single mining server, and the retention replacement technique, which changes the answer to a query probabilistically. In addition, our method accelerates the overall mining process by maintaining meta tables in each site so as to quickly determine whether candidate patterns have ever occurred in the site or not. Extensive experiments with real-world network traffic data revealed the correctness and the efficiency of the proposed method.
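The retention replacement idea (answer truthfully only with some probability, then correct for the noise in aggregate) can be sketched as follows. The parameters and data are invented, and this is only the randomization step, not the paper's full N-repository protocol:

```python
import random

def retention_replacement(answer, p_retain, rng=random):
    """With probability p_retain report the true boolean answer;
    otherwise report a fair coin flip."""
    if rng.random() < p_retain:
        return answer
    return rng.random() < 0.5

def estimate_true_fraction(perturbed, p_retain):
    """Invert E[observed] = p*f + (1-p)/2 to recover an unbiased
    estimate of the true fraction f of 'yes' answers."""
    observed = sum(perturbed) / len(perturbed)
    return (observed - (1 - p_retain) * 0.5) / p_retain

random.seed(42)
true_answers = [True] * 800 + [False] * 200          # true fraction = 0.8
perturbed = [retention_replacement(a, 0.7) for a in true_answers]
```

No individual perturbed answer can be trusted, yet `estimate_true_fraction(perturbed, 0.7)` recovers a value close to 0.8, which is all that aggregate pattern mining needs.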
10.
11.
12.
An Automatic Knowledge Evaluation System in KDD
How to automatically evaluate discovered knowledge is a very important problem in KDD, but research on it has been scarce. This paper presents an automatic system for evaluating discovered knowledge and provides several solutions: it first describes the relevant concepts, then gives the construction of the system, and finally uses a case study to validate it.
13.
Very little research in knowledge discovery has studied how to incorporate statistical methods to automate linear correlation discovery (LCD). We present an automatic LCD methodology that adopts statistical measurement functions to discover correlations among database attributes. Our methodology automatically pairs attribute groups having potential linear correlations, measures the linear correlation of each pair of attribute groups, and confirms the discovered correlation. The methodology is evaluated in two sets of experiments. The results demonstrate the methodology’s ability to facilitate linear correlation discovery for databases with large amounts of data.
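A minimal version of the measurement step is pairing attributes by Pearson correlation. The data and threshold below are invented, and the paper's full methodology also groups attributes and confirms the discovered correlations:

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def strong_linear_pairs(table, threshold=0.95):
    """Report attribute pairs whose absolute correlation reaches the threshold."""
    return [(a, b) for a, b in combinations(sorted(table), 2)
            if abs(pearson(table[a], table[b])) >= threshold]

data = {
    "celsius":    [0, 10, 20, 30, 40],
    "fahrenheit": [32, 50, 68, 86, 104],   # an exact linear function of celsius
    "noise":      [5, 1, 4, 2, 9],
}
```

Only the (celsius, fahrenheit) pair survives the threshold; a confirmation step, as in the paper, would then vet such a candidate on further data.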
14.
K. Selçuk Candan Jong Wook Kim Huan Liu Reshma Suvarna 《Knowledge and Information Systems》2006,10(2):185-210
Unprecedented amounts of media data are publicly accessible. However, it is increasingly difficult to integrate relevant media from multiple and diverse sources for effective applications. The functioning of a multimodal integration system requires metadata, such as ontologies, that describe media resources and media components. Such metadata are generally application-dependent and this can cause difficulties when media needs to be shared across application domains. There is a need for a mechanism that can relate the common and uncommon terms and media components. In this paper, we develop an algorithm to mine and automatically discover mappings in hierarchical media data, metadata, and ontologies, using the structural information inherent in these types of data. We evaluate the performance of this algorithm for various parameters using both synthetic and real-world data collections and show that the structure-based mining of relationships provides high degrees of precision.
15.
Future trends in data mining
Hans-Peter Kriegel Karsten M. Borgwardt Peer Kröger Alexey Pryakhin Matthias Schubert Arthur Zimek 《Data mining and knowledge discovery》2007,15(1):87-97
Over recent years data mining has been establishing itself as one of the major disciplines in computer science with growing
industrial impact. Undoubtedly, research in data mining will continue and even increase over the coming decades. In this article,
we sketch our vision of the future of data mining. Starting from the classic definition of “data mining”, we elaborate on
topics that — in our opinion — will set trends in data mining.
16.
High utility pattern (HUP) mining is one of the most important research issues in data mining. Although HUP mining extracts important knowledge from databases, it requires long calculations and multiple database scans. Therefore, HUP mining is often unsuitable for real-time data processing schemes such as data streams. Furthermore, many HUPs may be unimportant due to the poor correlations among the items inside them. Hence, the fast discovery of fewer but more important HUPs would be very useful in many practical domains. In this paper, we propose a novel framework to introduce a very useful measure, called frequency affinity, among the items in a HUP, and the concept of an interesting HUP with a strong frequency affinity, for the fast discovery of more applicable knowledge. Moreover, we propose a new tree structure, the utility tree based on frequency affinity (UTFA), and a novel algorithm, high utility interesting pattern mining (HUIPM), for single-pass mining of HUIPs from a database. Our approach mines fewer but more valuable HUPs, significantly reduces the overall runtime of existing HUP mining algorithms and is applicable to real-time data processing. Extensive performance analyses show that the proposed HUIPM algorithm is very efficient and scalable for interesting HUP mining with a strong frequency affinity.
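For readers unfamiliar with the utility notion, it combines external unit profits with internal purchase quantities. A minimal sketch with invented numbers (not the UTFA structure or HUIPM algorithm):

```python
profit = {"A": 5, "B": 1, "C": 3}   # external utility: profit per unit
transactions = [                    # internal utility: item -> quantity bought
    {"A": 2, "B": 4},
    {"A": 1, "C": 2},
    {"B": 3, "C": 1},
]

def utility(itemset, transactions, profit):
    """Total utility of an itemset: quantity x unit profit, summed over
    the transactions that contain the whole itemset."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * profit[i] for i in itemset)
    return total
```

Here utility({"A"}) is 15 while utility({"B"}) is only 7, even though both items occur in the same number of transactions; utility-based mining thus surfaces patterns that plain frequency counting would rank equally.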
17.
The build-up of Information Technology capital fueled by the Internet and the cost-effectiveness of new telecommunications technologies has led to a proliferation of information systems that are in dire need of exchanging information but incapable of doing so due to the lack of semantic interoperability. It is now evident that physical connectivity (the ability to exchange bits and bytes) is no longer adequate: the integration of data from autonomous and heterogeneous systems calls for the prior identification and resolution of any semantic conflicts that may be present. Unfortunately, this requires the system integrator to sift through the data from disparate systems in a painstaking manner. We suggest that this process can be partially automated by presenting a methodology and technique for the discovery of potential semantic conflicts as well as the underlying data transformations needed to resolve them. Our methodology begins by classifying data value conflicts into two categories: context independent and context dependent. While context independent conflicts are usually caused by unexpected errors, context dependent conflicts are primarily a result of the heterogeneity of the underlying data sources. To facilitate data integration, data value conversion rules are proposed to describe the quantitative relationships among data values involving context dependent conflicts. A general approach is proposed to discover data value conversion rules from the data. The approach consists of five major steps: relevant attribute analysis, candidate model selection, conversion function generation, conversion function selection and conversion rule formation. It is being implemented in a prototype system, DIRECT, for business data using statistics-based techniques. A preliminary study using both synthetic and real-world data indicated that the proposed approach is promising.
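The conversion-function generation step can be illustrated with an ordinary least-squares fit between two representations of the same quantity. The example is invented; the DIRECT system's actual statistical machinery is richer:

```python
def fit_linear_conversion(xs, ys):
    """Least-squares fit of y = a*x + b, a candidate data value
    conversion rule between two heterogeneous sources."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

# The same prices stored as dollars in one system and as cents in another:
dollars = [1.0, 2.5, 4.0, 10.0]
cents = [100, 250, 400, 1000]
```

The recovered rule y = 100x would then be proposed as the conversion resolving this context dependent conflict.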
18.
The integration of data mining techniques with data warehousing is gaining popularity due to the fact that both disciplines complement each other in extracting knowledge from large datasets. However, the majority of approaches focus on applying data mining as a front end technology to mine data warehouses. Surprisingly, little progress has been made in incorporating mining techniques in the design of data warehouses. While methods such as data clustering applied on multidimensional data have been shown to enhance the knowledge discovery process, a number of fundamental issues remain unresolved with respect to the design of multidimensional schema. These relate to automated support for the selection of informative dimension and fact variables in high dimensional and data intensive environments, an activity which may challenge the capabilities of human designers on account of the sheer scale of data volume and variables involved. In this research, we propose a methodology that selects a subset of informative dimension and fact variables from an initial set of candidates. Our experimental results conducted on three real world datasets taken from the UCI machine learning repository show that the knowledge discovered from the schema that we generated was more diverse and informative than the standard approach of mining the original data without the use of our multidimensional structure imposed on it.
19.
Metaquery (metapattern) is a data mining tool which is useful for learning rules involving more than one relation in the database. The notion of a metaquery has been proposed as a template or a second-order proposition in a language that describes the type of pattern to be discovered. This tool has already been successfully applied to several real-world applications. In this paper we advance the state of the art in metaquery research in several ways. First, we argue that the notion of a support value for metaqueries, where a support value is intuitively some indication of the relevance of the rules to be discovered, is not adequately defined in the literature, and, hence, propose our own definition. Second, we analyze some of the related computational problems, classify them as NP-hard and point out some tractable cases. Third, we propose some efficient algorithms for computing support and present preliminary experimental results that indicate the usefulness of our algorithms.
20.
Design and Implementation of an Intelligent Data Mining Tool
Data mining technology has broad application prospects, and it is necessary to develop data mining systems with independent intellectual property rights. SmartMiner is an intelligent data mining tool built on results obtained in research on data mining algorithms and expert systems. SmartMiner introduces the mining job description language MDL and a scripting language for mining task models; it provides a mining wizard, a visualization wizard, and mining task models; it integrates data warehouse management functions; its mining engine is intelligent; and its architecture is open and extensible. SmartMiner has been integrated into the business decision-making system of a large commercial retail chain.