Similar literature
 Found 20 similar documents; search took 15 ms
1.
2.
Recent research has shown that association rules are useful in gene expression data analysis. Interestingness measures play an important role in association rule mining on gene expression data, which is characterized by small sample sizes, high dimensionality, and noise. This work introduces two interestingness measures by exploring prior knowledge contained in open biological databases. They are Max-Pathway-Distance (MaxPD), which explores the relatedness of genes in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and Max-Chromosomal-Distance (MaxCD), which makes use of the distance among genes on the chromosome. The properties of the proposed interestingness measures are also explored in order to mine interesting rules efficiently. Experimental results on four real-life gene expression datasets show the effectiveness of MaxPD and MaxCD in both classification accuracy and biological interpretability.

3.
A number of studies, theoretical, empirical, or both, have been conducted to provide insight into the properties and behavior of interestingness measures for association rule mining. While each has value in its own right, most are either limited in scope or, more importantly, ignore the purpose for which interestingness measures are intended, namely the ultimate ranking of discovered association rules. This paper, therefore, focuses on an analysis of the rule-ranking behavior of 61 well-known interestingness measures tested on the rules generated from 110 different datasets. By clustering based on ranking behavior, we highlight, and formally prove, previously unreported equivalences among interestingness measures. We also show that there appear to be distinct clusters of interestingness measures, but that there remain differences among clusters, confirming that domain knowledge is essential to the selection of an appropriate interestingness measure for a particular task and business objective.
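To make the idea of rule-ranking behavior concrete, the following minimal sketch computes three textbook interestingness measures (support, confidence, and lift) for two rules and shows that they can rank the same rules differently. The toy transactions, the rule names, and the helper `rule_measures` are illustrative assumptions; they are not taken from the paper, which studies 61 measures on 110 datasets.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return support, confidence and lift of the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)          # transactions containing the antecedent
    n_c = sum(1 for t in transactions if c <= t)          # transactions containing the consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)   # transactions containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical market-basket data, invented for illustration only.
transactions = [
    {"bread", "milk"}, {"bread", "milk"}, {"bread", "milk"}, {"bread", "milk"},
    {"milk"}, {"milk"},
    {"bread"},
    {"beer", "chips"},
    {"beer", "chips", "milk"},
    {"beer"},
]

rules = {
    "bread -> milk": (["bread"], ["milk"]),
    "beer -> chips": (["beer"], ["chips"]),
}
scores = {name: rule_measures(transactions, a, c) for name, (a, c) in rules.items()}

# Rank the same rules under two different measures.
by_confidence = sorted(scores, key=lambda r: scores[r]["confidence"], reverse=True)
by_lift = sorted(scores, key=lambda r: scores[r]["lift"], reverse=True)
print(by_confidence)
print(by_lift)
```

In this toy data, confidence prefers `bread -> milk` because milk is common overall, while lift prefers `beer -> chips` because chips rarely appear without beer. This is the kind of disagreement that makes the choice of measure depend on the task.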

4.
5.
The exponential increase of subjective, user-generated content since the birth of the Social Web has led to the necessity of developing automatic text processing systems able to extract, process and present relevant knowledge. In this paper, we tackle the Opinion Retrieval, Mining and Summarization task by proposing a unified framework composed of three crucial components (information retrieval, opinion mining and text summarization) that allow the retrieval, classification and summarization of subjective information. An extensive analysis is conducted, in which different configurations of the framework are suggested and analyzed in order to determine which is the best one, and under which conditions. The evaluation carried out and the results obtained show the appropriateness of the individual components, as well as of the framework as a whole. By achieving an improvement of over 10% compared to the state-of-the-art approaches in the context of blogs, we can conclude that subjective text can be efficiently dealt with by means of our proposed framework.

6.
The paper derives a framework suitable for discussing the classical Koopmans-Levin (KL) and maximum likelihood (ML) algorithms for estimating the parameters of errors-in-variables linear models in a unified way. Using the capability of the unified approach, a new parameter estimation algorithm is presented that offers flexibility to ensure acceptable variance in the estimated parameters. The developed algorithm is based on the application of Hankel matrices of variable size and can equally be considered a generalized version of the KL method (GKL) or a reduced version of the ML estimation. The methodology applied to derive the GKL algorithm is used to present a straightforward derivation of the subspace identification algorithm.

7.
Because clinical research is carried out in complex environments, prior domain knowledge, constraints, and expert knowledge can enhance the capabilities and performance of data mining. In this paper we propose an unexpected pattern mining model that uses decision trees to compare recovery rates of two different treatments, and to find patterns that contrast with the prior knowledge of domain users. In the proposed model we define interestingness measures to determine whether the patterns found are interesting to the domain. By applying the concept of domain-driven data mining, we repeatedly utilize decision trees and interestingness measures in a closed-loop, in-depth mining process to find unexpected and interesting patterns. We use retrospective data from transvaginal ultrasound-guided aspirations to show that the proposed model can successfully compare different treatments using a decision tree, which is a new usage of that tool. We believe that unexpected, interesting patterns may provide clinical researchers with different perspectives for future research.

8.
9.
Measures of interestingness play a crucial role in association rule mining. An important methodological problem, on which several papers have appeared in the literature, is to provide a reasonable classification of the measures. In this paper, we explore Boolean factor analysis, which uses formal concepts corresponding to classes of measures as factors, for the purpose of clustering the measures. Unlike the existing studies, our method reveals overlapping clusters of interestingness measures. We argue that the overlap between clusters is a desired feature of natural groupings of measures and that, because formal concepts are used as factors in Boolean factor analysis, the resulting clusters have a clear meaning and are easy to interpret. We conduct three case studies on clustering of measures, provide interpretations of the resulting clusters, and compare the results to those of the previous approaches reported in the literature.

10.
A spatial co-location pattern represents relationships between spatial features that are frequently located in close proximity to one another. Such a pattern is one of the most important concepts for geographic context awareness of ubiquitous Geographic Information System (GIS). We constructed a framework for co-location pattern mining using the transaction-based approach, which employs maximal cliques as a transaction-type dataset; we first define transaction-type data and verify that the definition satisfies the requirements, and we also propose an efficient way to generate all transaction-type data. The constructed framework can play a role as a theoretical methodology of co-location pattern mining, which supports geographic context awareness of ubiquitous GIS.

11.
12.
Graph-based induction as a unified learning framework   Total citations: 6, self-citations: 0, citations by others: 6
We describe a graph-based induction algorithm that extracts typical patterns from colored digraphs. The method is shown to be capable of solving a variety of learning problems by mapping the different learning problems into colored digraphs. The generality and scope of this method can be attributed to the expressiveness of the colored digraph representation, which allows a number of different learning problems to be solved by a single algorithm. We demonstrate the application of our method to two seemingly different learning tasks: inductive learning of classification rules, and learning macro rules for speeding up inference. We also show that the uniform treatment of these two learning tasks enables our method to solve complex learning problems such as the construction of hierarchical knowledge bases.

13.
Data mining in soft computing framework: a survey   Total citations: 19, self-citations: 0, citations by others: 19
The present article provides a survey of the available literature on data mining using soft computing. A categorization has been provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally, fuzzy sets are suitable for handling the issues related to understandability of patterns, incomplete/noisy data, mixed media information and human interaction, and can provide approximate solutions faster. Neural networks are nonparametric, robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search algorithms to select a model, from mixed media data, based on some preference criterion/objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and the application of soft computing methodologies are indicated. An extensive bibliography is also included.

14.
To preserve client privacy in the data mining process, a variety of techniques based on random perturbation of individual data records have been proposed recently. In this paper, we present FRAPP, a generalized matrix-theoretic framework of random perturbation, which facilitates a systematic approach to the design of perturbation mechanisms for privacy-preserving mining. Specifically, FRAPP is used to demonstrate that (a) the prior techniques differ only in their choices for the perturbation matrix elements, and (b) a symmetric positive-definite perturbation matrix with minimal condition number can be identified, substantially enhancing the accuracy even under strict privacy requirements. We also propose a novel perturbation mechanism wherein the matrix elements are themselves characterized as random variables, and demonstrate that this feature provides significant improvements in privacy at only a marginal reduction in accuracy. The quantitative utility of FRAPP, which is a general-purpose random-perturbation-based privacy-preserving mining technique, is evaluated specifically with regard to association and classification rule mining on a variety of real datasets. Our experimental results indicate that, for a given privacy requirement, either substantially lower modeling errors are incurred as compared to the prior techniques, or the errors are comparable to those of direct mining on the true database. A partial and preliminary version of this paper appeared in the Proc. of the 21st IEEE Intl. Conf. on Data Engineering (ICDE), Tokyo, Japan, 2005, pp. 193–204.
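The matrix view of random perturbation can be illustrated with a small sketch. A column-stochastic matrix `A` gives the probability that a true value `u` is reported as value `v`; the expected perturbed histogram is `A` applied to the true histogram, and the miner recovers an estimate by inverting that map. The particular matrix below (diagonal entries `gamma * x`, off-diagonal entries `x`, with `x = 1 / (gamma + n - 1)`) and all function names are assumptions chosen for illustration; the abstract does not specify the matrix the authors use. The round trip is done on expected counts, so it is exact; with real sampled data the reconstruction would only be approximate.

```python
def gamma_diagonal(n, gamma):
    """Build an n x n column-stochastic matrix: gamma*x on the diagonal, x elsewhere."""
    x = 1.0 / (gamma + n - 1)
    return [[gamma * x if i == j else x for j in range(n)] for i in range(n)]


def expected_perturbed(matrix, true_counts):
    """Expected histogram after perturbation: Y = A X."""
    n = len(true_counts)
    return [sum(matrix[v][u] * true_counts[u] for u in range(n)) for v in range(n)]


def reconstruct(perturbed_counts, gamma):
    """Invert Y = A X in closed form for the matrix built by gamma_diagonal.

    Since A = x * ((gamma - 1) * I + J), with J the all-ones matrix, and the
    total count is preserved, each entry satisfies
    X_v = (Y_v / x - total) / (gamma - 1).
    """
    n = len(perturbed_counts)
    total = sum(perturbed_counts)
    x = 1.0 / (gamma + n - 1)
    return [(y / x - total) / (gamma - 1) for y in perturbed_counts]


true_counts = [50.0, 30.0, 20.0]   # hypothetical histogram over 3 categories
gamma = 19.0                       # hypothetical ratio of "keep" to "flip" probability
A = gamma_diagonal(len(true_counts), gamma)
perturbed = expected_perturbed(A, true_counts)
recovered = reconstruct(perturbed, gamma)
print(perturbed)
print(recovered)
```

A larger `gamma` keeps more records unchanged, which makes the matrix better conditioned and the reconstruction more accurate but offers less privacy; a `gamma` close to 1 does the opposite.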

15.
Structural and Multidisciplinary Optimization - Reliability-based design optimization (RBDO) is an active field of research with an ever-increasing number of contributions. Numerous methods have...

16.
Manifold elastic net: a unified framework for sparse dimension reduction   Total citations: 4, self-citations: 0, citations by others: 4
It is difficult to find the optimal sparse solution of a manifold learning based dimensionality reduction algorithm. The lasso or the elastic net penalized manifold learning based dimensionality reduction is not directly a lasso penalized least squares problem and thus the least angle regression (LARS) (Efron et al., Ann Stat 32(2):407–499, 2004), one of the most popular algorithms in sparse learning, cannot be applied. Therefore, most current approaches take indirect ways or have strict settings, which can be inconvenient for applications. In this paper, we propose the manifold elastic net, or MEN for short. MEN incorporates the merits of both the manifold learning based dimensionality reduction and the sparse learning based dimensionality reduction. By using a series of equivalent transformations, we show that MEN is equivalent to the lasso penalized least squares problem and thus LARS is adopted to obtain the optimal sparse solution of MEN. In particular, MEN has the following advantages for subsequent classification: (1) the local geometry of samples is well preserved for low dimensional data representation, (2) both the margin maximization and the classification error minimization are considered for sparse projection calculation, (3) the projection matrix of MEN improves the parsimony in computation, (4) the elastic net penalty reduces the over-fitting problem, and (5) the projection matrix of MEN can be interpreted psychologically and physiologically. Experimental evidence on face recognition over various popular datasets suggests that MEN is superior to top level dimensionality reduction algorithms.
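For orientation, the standard (non-manifold) elastic net already admits a reduction to a lasso problem, which is the same kind of equivalent transformation the abstract alludes to. The derivation below is the well-known augmentation for the plain elastic net with design matrix $X \in \mathbb{R}^{n \times p}$ and response $y$; it is shown as background only and is not the MEN derivation, whose objective is not given in this abstract.

```latex
% Naive elastic net objective
L(\lambda_1, \lambda_2, \beta)
  = \lVert y - X\beta \rVert_2^2
  + \lambda_2 \lVert \beta \rVert_2^2
  + \lambda_1 \lVert \beta \rVert_1

% Augmented data and rescaled variables
X^{*} = \frac{1}{\sqrt{1+\lambda_2}}
        \begin{pmatrix} X \\ \sqrt{\lambda_2}\, I_p \end{pmatrix},
\qquad
y^{*} = \begin{pmatrix} y \\ 0_p \end{pmatrix},
\qquad
\gamma = \frac{\lambda_1}{\sqrt{1+\lambda_2}},
\qquad
\beta^{*} = \sqrt{1+\lambda_2}\,\beta

% Equivalent lasso problem
L(\lambda_1, \lambda_2, \beta)
  = \lVert y^{*} - X^{*}\beta^{*} \rVert_2^2
  + \gamma \lVert \beta^{*} \rVert_1
```

The check is direct: $X^{*}\beta^{*}$ stacks $X\beta$ on top of $\sqrt{\lambda_2}\,\beta$, so the squared residual equals $\lVert y - X\beta \rVert_2^2 + \lambda_2 \lVert \beta \rVert_2^2$, and $\gamma \lVert \beta^{*} \rVert_1 = \lambda_1 \lVert \beta \rVert_1$. Once a problem is in this lasso form, LARS can trace its solution path.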

17.
This work proposes an object-oriented unified optimization framework (UOF) for general problem optimization. Based on biologically inspired techniques, numerical deterministic methods, and object-oriented design in C++, the UOF itself has significant potential to perform optimization operations on various problems. The UOF provides basic interfaces to define a general problem and a generic solver, enabling these two different research fields to be bridged. The components of the UOF can be separated into problem and solver components. These two parts work independently, allowing high-level code to be reused and rapidly adapted to new problems and solvers. The UOF is customized to deal with several optimization problems. The first experiment involves a well-known discrete combinatorial problem, while the second one studies the robustness for the reverse modeling problem, which is in high demand among device manufacturing companies. Additionally, experiments are undertaken to determine the capability of the proposed methods in both analog and digital circuit design automation. The final experiment designs an antenna for rapidly growing wireless communication. Most experiments are categorized as simulation-based optimization tasks in the microelectronics industry. The results confirm that the UOF has excellent flexibility and extensibility to solve these problems successfully. The developed open-source project is publicly available.

18.
Recent rapid advances in Information and Communication Technologies (ICTs) have highlighted the rising importance of the Business Model (BM) concept in the field of Information Systems (IS). Despite agreement on its importance to an organization's success, the concept is still fuzzy and vague, and there is little consensus regarding its compositional facets. Identifying the fundamental concepts, modeling principles, practical functions, and reach of the BM relevant to IS and other business concepts is by no means complete. This paper, following a comprehensive review of the literature, principally employs the content analysis method and utilizes a deductive reasoning approach to provide a hierarchical taxonomy of the BM concepts from which to develop a more comprehensive framework. This framework comprises four fundamental aspects. First, it identifies four primary BM dimensions along with their constituent elements forming a complete ontological structure of the concept. Second, it cohesively organizes the BM modeling principles, that is, guidelines and features. Third, it explains the reach of the concept showing its interactions and intersections with strategy, business processes, and IS so as to place the BM within the world of digital business. Finally, the framework explores three major functions of BMs within digital organizations to shed light on the practical significance of the concept. Hence, this paper links the BM facets in a novel manner offering an intact definition. In doing so, this paper provides a unified conceptual framework for the BM concept that we argue is comprehensive and appropriate to the complex nature of businesses today. This leads to fruitful implications for theory and practice and also enables us to suggest a research agenda using our conceptual framework.

19.
Sequential pattern mining is an important data mining problem with broad applications. However, it is also a challenging problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Recent studies have developed two major classes of sequential pattern mining methods: (1) a candidate generation-and-test approach, represented by (i) GSP, a horizontal format-based sequential pattern mining method, and (ii) SPADE, a vertical format-based method; and (2) a pattern-growth method, represented by PrefixSpan and its further extensions, such as gSpan for mining structured patterns. In this study, we perform a systematic introduction and presentation of the pattern-growth methodology and study its principles and extensions. We first introduce two interesting pattern-growth algorithms, FreeSpan and PrefixSpan, for efficient sequential pattern mining. Then we introduce gSpan for mining structured patterns using the same methodology. Their relative performance in l
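The pattern-growth idea named in this abstract can be sketched in a few lines. The version below is a simplified PrefixSpan-style miner that assumes each sequence is a flat list of single items (no itemsets within an element) and that support is counted as the number of sequences containing the pattern. The function name `prefixspan`, the pseudo-projection representation, and the toy data are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return (pattern, support) pairs for all frequent sequential patterns."""
    results = []

    def mine(prefix, projected):
        # projected holds (sequence index, start position) pairs: a pseudo-projection
        # that points into the original sequences instead of copying suffixes.
        counts = defaultdict(int)
        for sid, start in projected:
            for item in set(sequences[sid][start:]):   # count each item once per sequence
                counts[item] += 1

        for item in sorted(counts):
            support = counts[item]
            if support < min_support:
                continue
            pattern = prefix + [item]
            results.append((pattern, support))

            # Project on the first occurrence of item at or after the start position.
            next_projected = []
            for sid, start in projected:
                try:
                    pos = sequences[sid].index(item, start)
                except ValueError:
                    continue                           # item not in this suffix
                next_projected.append((sid, pos + 1))
            mine(pattern, next_projected)              # grow the pattern recursively

    mine([], [(sid, 0) for sid in range(len(sequences))])
    return results


# Hypothetical toy database of three sequences.
sequences = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
patterns = prefixspan(sequences, min_support=2)
for pattern, support in patterns:
    print(pattern, support)
```

Unlike a candidate generation-and-test method, this sketch never enumerates candidate subsequences that do not occur in the data: every extension comes from items actually present in the projected suffixes.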

20.
Distributed and Parallel Databases - While the problems of finding the shortest path and k-shortest paths have been extensively researched, the research community has been shifting its focus...
