Similar Documents
20 similar documents found
1.
We propose a novel framework for generating classification rules from relational data. It is a specialized version of a general framework intended for mining relational data, defined in granular computing theory. In the framework proposed in this paper we define a method for deriving information granules from relational data; such granules are the basis for generating relational classification rules. Our approach follows the granular computing idea of switching between different levels of granularity of the universe: a granule-based relational data representation can easily be replaced by another and thereby adjusted to a given data mining task, e.g. classification. A generalized relational data representation, as defined in the framework, can be treated as the search space for generating rules, so the size of the search space may be significantly limited. Furthermore, our framework, unlike others, unifies not only the way the data and the rules to be derived are expressed and specified, but also, in part, the process of generating rules from the data: the rules can be obtained directly from the information granules or constructed based on them.
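A minimal sketch of what an elementary information granule can look like in code, assuming (as an illustration only, not the paper's actual construction) that granules are equivalence classes of tuples agreeing on a chosen attribute subset; the `granulate` helper, the toy relation, and its attribute names are all hypothetical:

```python
from collections import defaultdict

def granulate(tuples, attributes):
    """Group tuples into elementary information granules: equivalence
    classes of tuples that agree on the chosen attributes."""
    granules = defaultdict(list)
    for t in tuples:
        key = tuple(t[a] for a in attributes)
        granules[key].append(t)
    return dict(granules)

# Toy relation: customers described by (segment, region, churned).
rows = [
    {"segment": "retail", "region": "EU", "churned": True},
    {"segment": "retail", "region": "EU", "churned": False},
    {"segment": "retail", "region": "US", "churned": False},
    {"segment": "corp",   "region": "US", "churned": False},
]

# Coarse granularity: granules over "segment" only.
coarse = granulate(rows, ["segment"])
# Finer granularity: switch the universe to ("segment", "region").
fine = granulate(rows, ["segment", "region"])
print(len(coarse), len(fine))  # 2 3
```

Switching between `coarse` and `fine` illustrates the "change of granularity" idea: the same universe of tuples yields different granule-based representations depending on the attribute set chosen.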

2.
We address the problem of deriving lower and upper bounds for the cardinality of the projections of a database relation, given a set of functional dependencies on the relation schema and measures of the cardinalities of the attributes in the schema. It is shown that deciding whether a given number is the least upper bound of a projection cardinality is an NP-complete problem, whereas determining whether the greatest lower bound and the least upper bound coincide can be easily solved in linear time.
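The trivial upper bound for a projection's cardinality, refinable via functional dependencies, can be sketched as follows; the helper names are hypothetical, and the bound shown is illustrative rather than the paper's least upper bound (which, per the abstract, is NP-complete to verify in general):

```python
from itertools import combinations
from math import prod

def closure(attrs, fds):
    """Attribute closure under a set of FDs given as (lhs, rhs) pairs."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closed and not set(rhs) <= closed:
                closed |= set(rhs)
                changed = True
    return closed

def projection_upper_bound(X, fds, card):
    """A (not necessarily tight) upper bound on |pi_X(r)|: the smallest
    product of attribute cardinalities over subsets K of X with K -> X."""
    X = set(X)
    best = prod(card[a] for a in X)
    for k in range(1, len(X)):
        for K in combinations(sorted(X), k):
            if X <= closure(K, fds):
                best = min(best, prod(card[a] for a in K))
    return best

fds = [(("A",), ("B",))]          # A -> B
card = {"A": 10, "B": 1000}
print(projection_upper_bound({"A", "B"}, fds, card))  # 10, not 10*1000
```

Because A functionally determines B, the projection on {A, B} can hold at most as many tuples as there are A-values, which is why the FD tightens the naive product bound.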

3.
Our aim is to design a pattern classifier using fuzzy relational calculus (FRC), which was initially proposed by Pedrycz (1990). In the course of doing so, we first consider a particular interpretation of the multidimensional fuzzy implication (MFI) to represent our knowledge about the training data set. Subsequently, we introduce the notion of a fuzzy pattern vector to represent a population of training patterns in the pattern space and to denote the antecedent part of the said interpretation of the MFI. We introduce a new approach to the computation of the derivative of the fuzzy max-function and min-function using the concept of a generalized function. During the construction of the classifier based on FRC, we use fuzzy linguistic statements (or fuzzy membership functions representing such statements) to represent the values of features (e.g., feature F1 is small and F2 is big) for a population of patterns. Note that the construction of the classifier essentially depends on the estimate of a fuzzy relation R between the input (fuzzy set) and output (fuzzy set) of the classifier. Once the classifier is constructed, the nonfuzzy features of a pattern can be classified. At classification time, we use the concept of fuzzy masking to fuzzify the nonfuzzy feature values of the test patterns. The performance of the proposed scheme is tested on synthetic data. Finally, we use the proposed scheme for the vowel classification problem of an Indian language.
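The core inference step of a fuzzy relational classifier, the max-min composition of an input fuzzy vector with a fuzzy relation R, can be sketched as follows; the memberships and the relation below are made-up illustrations, not the paper's estimates:

```python
def max_min_compose(x, R):
    """Compose a fuzzy input vector x with fuzzy relation R (rows: input
    linguistic terms, cols: classes): y_j = max_i min(x_i, R[i][j])."""
    n_cls = len(R[0])
    return [max(min(x[i], R[i][j]) for i in range(len(x)))
            for j in range(n_cls)]

# Hypothetical memberships of two linguistic statements about a pattern
# ("F1 is small", "F2 is big") and a relation R linking them to 2 classes.
x = [0.8, 0.3]          # degree to which F1 is small, F2 is big
R = [[0.9, 0.2],        # "F1 small" mostly supports class 1
     [0.1, 0.7]]        # "F2 big" mostly supports class 2
y = max_min_compose(x, R)
print(y)  # [0.8, 0.3] -> the pattern is assigned to class 1
```

The classifier then picks the class with the largest composed grade; fuzzy masking in the paper is what produces a vector like `x` from crisp (nonfuzzy) feature values.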

4.
Before implementing a pattern recognition algorithm, a rational step is to estimate its validity by bounding the probability of error. The ability to make such an estimate bears crucially on the adequacy of the particular features used, on the number of samples required to train and test the system, and on the overall paradigm. This study develops statistical upper and lower bounds for estimating the probability of error in the one-dimensional case. The bounds are distribution-free except for requiring the existence of the relevant statistics and can be evaluated easily by hand or by computer. Many of the results are also applicable to other problems involving the estimation of an arbitrary distribution of a random variable. Some multidimensional generalizations may be feasible.
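As a concrete instance of a distribution-free bound on the probability of error (a classical Hoeffding interval, used here as an illustration rather than the paper's own bounds), one can bracket the true error from the empirical test error in a few lines:

```python
import math

def hoeffding_error_bounds(errors, n, delta=0.05):
    """Distribution-free (Hoeffding) two-sided bound on the true error
    probability from `errors` mistakes in n i.i.d. test samples; holds
    with probability at least 1 - delta."""
    p_hat = errors / n
    eps = math.sqrt(math.log(2 / delta) / (2 * n))
    return max(0.0, p_hat - eps), min(1.0, p_hat + eps)

# 12 mistakes on 400 held-out samples:
lo, hi = hoeffding_error_bounds(errors=12, n=400)
print(f"95% interval: [{lo:.3f}, {hi:.3f}]")
```

Like the bounds in the abstract, this requires no assumption on the underlying data distribution, only i.i.d. sampling, and it makes explicit how the test-set size controls the width of the interval.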

5.
In this paper, we investigate the generalization performance of the multi-graph regularized semi-supervised classification algorithm associated with the hinge loss. We provide estimates for the excess misclassification error of multi-graph regularized classifiers and show the relations between the generalization performance and the structural invariants of the data graphs. Experiments performed on a real database demonstrate the effectiveness of our theoretical analysis.

6.
A number of earlier studies that have attempted a theoretical analysis of majority voting assume independence of the classifiers. We formulate the majority voting problem as an optimization problem with linear constraints; no assumptions on the independence of the classifiers are made. For a binary classification problem, given the accuracies of the classifiers in the team, the theoretical upper and lower bounds on the performance obtained by combining them through majority voting are shown to be solutions of the corresponding optimization problem. The objective function is nonlinear in the case of an even number of classifiers when rejection is allowed; in the other cases the objective function is linear and the problem is a linear program (LP). Using this framework we provide some insights and investigate the relationship between two candidate classifier diversity measures and majority voting performance.
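For the three-classifier binary case without rejection, the LP optima reduce to simple closed forms, sketched below. The reasoning is a counting argument: each majority-correct (resp. majority-wrong) point consumes at least two correct (resp. wrong) votes out of the total correct-vote mass p1+p2+p3. These formulas are an illustration of the idea, not the paper's general construction:

```python
def majority_bounds_3(p1, p2, p3):
    """Upper/lower bounds on 3-classifier majority-vote accuracy given
    individual accuracies, with no independence assumption (a sketch of
    the LP optima for the three-classifier case)."""
    s = p1 + p2 + p3
    upper = min(1.0, s / 2)        # each majority-correct point needs >= 2 correct votes
    lower = max(0.0, (s - 1) / 2)  # each majority-wrong point needs >= 2 wrong votes
    return lower, upper

# Three dependent classifiers, each 60% accurate:
lo, hi = majority_bounds_3(0.6, 0.6, 0.6)
print(lo, hi)  # majority vote can do anywhere from 0.4 to 0.9
```

Note the spread: without the independence assumption, majority voting over three 60%-accurate classifiers can be anywhere from worse than any single member to far better, which is exactly why the LP framing matters.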

7.
8.
The inherent uncertainty and incomplete information of the software development process present particular challenges for identifying fault-prone modules and for providing a preferred model early enough in the development cycle to guide software enhancement efforts effectively. Grey relational analysis (GRA), from grey system theory, is a well-known approach for making estimates under small-sample and uncertain conditions. This paper examines the potential benefits of an early software-quality classification based on an improved grey relational classifier. The particle swarm optimization (PSO) approach is adopted to explore the best-fitting weights on software metrics in the GRA approach, deriving a classifier with a preferred balance of misclassification rates. We demonstrate our approach using data from a medical information system dataset. Empirical results show that the proposed approach provides a better balance of misclassification rates than grey relational classifiers without PSO. It also outperforms the widely used classification and regression trees (CART) and C4.5 classifiers.
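A minimal particle swarm optimizer of the kind used to search the metric weights can be sketched as follows; the objective, hyperparameters, and names are hypothetical stand-ins for the paper's misclassification-balance criterion:

```python
import random

def pso(f, dim, n_particles=20, iters=200, seed=0):
    """Minimal PSO: a sketch of the search engine used to tune the GRA
    metric weights (the objective f here is a hypothetical stand-in)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]                       # inertia
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])  # cognitive pull
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))    # social pull
                pos[i][d] += vel[i][d]
            fi = f(pos[i])
            if fi < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], fi
                if fi < gbest_f:
                    gbest, gbest_f = pos[i][:], fi
    return gbest, gbest_f

# Toy objective standing in for the misclassification-balance criterion.
sphere = lambda w: sum(x * x for x in w)
best, best_f = pso(sphere, dim=3)
print(best_f)  # close to 0
```

In the paper's setting, `f` would evaluate a candidate weight vector by the balance of type-I/type-II misclassification rates of the resulting grey relational classifier.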

9.
Transduction is an inference mechanism, adopted in several classification algorithms, capable of exploiting both labeled and unlabeled data and making predictions for the given set of unlabeled data only. Several transductive learning methods have been proposed in the literature to learn transductive classifiers from examples represented as rows of a classical double-entry (relational) table. In this work we consider the case of examples represented as a set of multiple tables of a relational database and propose a new relational classification algorithm, named TRANSC, that works in a transductive setting and employs a probabilistic approach to classification. Knowledge of the data model, i.e., foreign keys, is used to guide the search process. The transductive learning strategy iterates a k-NN-based re-classification of labeled and unlabeled examples in order to identify borderline examples, and uses the relational probabilistic classifier Mr-SBC to bootstrap the transductive algorithm. Experimental results confirm that TRANSC outperforms its inductive counterpart (Mr-SBC).
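The iterative k-NN re-classification at the heart of the transductive strategy can be sketched on propositional (single-table) data; this simplification omits TRANSC's multi-table search and its Mr-SBC bootstrap, and all names below are hypothetical:

```python
import math
from collections import Counter

def knn_transduce(labeled, unlabeled, k=3, iters=5):
    """Transductive k-NN sketch: predict labels for the given unlabeled
    points only, feeding previous-round predictions back into the
    neighbour pool until the labelling stabilises."""
    preds = [None] * len(unlabeled)
    for _ in range(iters):
        pool = labeled + [(x, y) for x, y in zip(unlabeled, preds)
                          if y is not None]
        new = []
        for x in unlabeled:
            nn = sorted(pool, key=lambda p: math.dist(x, p[0]))[:k]
            new.append(Counter(y for _, y in nn).most_common(1)[0][0])
        if new == preds:   # fixed point reached
            break
        preds = new
    return preds

labeled = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"),
           ((5.0, 5.0), "b"), ((5.2, 4.9), "b")]
unlabeled = [(0.3, 0.1), (4.8, 5.1)]
print(knn_transduce(labeled, unlabeled))  # ['a', 'b']
```

The re-use of previous-round predictions is what makes points near the decision boundary ("borderline examples") visible: their predicted labels are the ones most likely to flip between iterations.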

10.
This paper proposes a fuzzy classifier with a one-class-in-one-network structure consisting of multiple novel single-layer perceptrons. Since the output value of each single-layer perceptron can be interpreted as the overall grade of the relationship between the input pattern and one class, the degree of relationship between an attribute of the input pattern and that of the class can be taken into account by establishing a representative pattern for each class. A feature of this paper is that grey relational analysis is employed to compute the grades of relationship for individual attributes. In particular, instead of the sigmoid function, a non-additive technique, the Choquet integral, is used as the activation function to synthesize the performance values, since an assumption of non-interaction among attributes may not be reasonable. Thus, a single-layer perceptron in the proposed structure performs the synthetic evaluation of Choquet-integral-based grey relational analysis for a pattern. Each connection weight is interpreted as the degree of importance of an attribute and can be determined by a genetic-algorithm-based method. The experimental results demonstrate that the test results of the proposed fuzzy classifier are better than or comparable to those of other fuzzy and non-fuzzy classification methods.
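The discrete Choquet integral used as the activation function can be sketched directly; the two-attribute fuzzy measure below is a made-up example of a non-additive measure, not one learned by the paper's genetic algorithm:

```python
def choquet(values, mu):
    """Discrete Choquet integral of attribute scores w.r.t. a fuzzy
    measure mu, given as a dict mapping frozensets of attribute indices
    to measures (mu must be monotone, with mu[empty]=0 and mu[all]=1)."""
    idx = sorted(range(len(values)), key=lambda i: values[i])
    total, prev = 0.0, 0.0
    remaining = set(range(len(values)))
    for i in idx:                       # walk scores in increasing order
        total += (values[i] - prev) * mu[frozenset(remaining)]
        prev = values[i]
        remaining.discard(i)            # drop the attribute just passed
    return total

# Hypothetical 2-attribute measure with interaction: mu({0})+mu({1}) != mu({0,1}),
# so the integral is genuinely non-additive.
mu = {frozenset(): 0.0, frozenset({0}): 0.3,
      frozenset({1}): 0.5, frozenset({0, 1}): 1.0}
print(choquet([0.4, 0.9], mu))  # 0.65
```

With an additive measure the Choquet integral collapses to a weighted average; the interaction encoded in `mu` is precisely what the sigmoid-free activation is meant to capture.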

11.
Csáji, Balázs Cs.; Kis, Krisztián B. Machine Learning 108(8-9):1677-1699, 2019.
We propose a data-driven approach to quantify the uncertainty of models constructed by kernel methods. Our approach minimizes the needed distributional assumptions, hence, …

12.
A divisive hierarchical clustering algorithm based on a grey relational measure is proposed for the blind classification of radar emitter signals. Many widely used clustering algorithms require the number of classes to be specified in advance; the proposed algorithm resolves this problem well. Wavelet coefficients of the radar emitter signals in the frequency domain are extracted as the sample space for clustering, the grey relational measure is used to quantify the similarity between data samples, and a top-down, density-expansion-based divisive hierarchical clustering strategy generates partitions at different levels; the number of classes is then estimated using a proposed cluster-validity index. Simulation results show that the algorithm achieves good classification results.
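Deng's grey relational grade, the similarity measure driving the clustering, can be sketched as follows; for simplicity the extreme differences are taken per pair rather than over the whole sample set, and the input sequences are hypothetical stand-ins for wavelet-coefficient signatures:

```python
def grey_relational_grade(reference, candidate, rho=0.5):
    """Deng-style grey relational grade between a reference sequence and
    a candidate sequence (both assumed normalised to [0, 1]); rho is the
    distinguishing coefficient, conventionally 0.5."""
    deltas = [abs(r - c) for r, c in zip(reference, candidate)]
    dmin, dmax = min(deltas), max(deltas)
    if dmax == 0:                      # identical sequences
        return 1.0
    coeffs = [(dmin + rho * dmax) / (d + rho * dmax) for d in deltas]
    return sum(coeffs) / len(coeffs)

ref = [0.2, 0.5, 0.9]
g_same = grey_relational_grade(ref, [0.2, 0.5, 0.9])
g_diff = grey_relational_grade(ref, [0.9, 0.1, 0.3])
print(g_same, g_diff)  # identical pair scores 1.0; dissimilar pair scores lower
```

In the clustering algorithm this grade plays the role a distance would play elsewhere: samples with a high mutual grade are merged into the same density-expanded cluster.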

13.
Estimating the global data distribution in large-scale networks is an important and still largely open issue. It can benefit many applications, especially in the cloud computing era, such as load-balancing analysis, query processing, and data mining. Inspired by the inversion method for random variate (number) generation, in this paper we present a novel model called distribution-free data density estimation for large ring-based networks, which achieves high estimation accuracy with low estimation cost regardless of the distribution model of the underlying data. This model generates random samples for any arbitrary distribution by sampling the global cumulative distribution function and is free from sampling bias. Armed with this estimation method, we can estimate data densities over both one-dimensional and multidimensional tuple sets, where each dimension may be either continuous or discrete in its domain. In large-scale networks, the key idea for distribution-free estimation is to sample a small subset of peers for estimating the global data distribution over the data domain. Algorithms for computing and sampling the global cumulative distribution function, on which the global data distribution estimate is based, are introduced with a detailed theoretical analysis. Our extensive performance study confirms the effectiveness and efficiency of our methods in large ring-based networks.
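The inversion method that inspires the model can be sketched in a few lines: draw uniforms on (0, 1) and push them through an inverse CDF. Here the exponential's inverse CDF stands in for the sampled global cumulative distribution function:

```python
import math
import random

def sample_by_inversion(inv_cdf, n, seed=42):
    """Inversion method: feed uniform(0,1) draws through the inverse CDF
    to obtain samples from an arbitrary target distribution."""
    rng = random.Random(seed)
    return [inv_cdf(rng.random()) for _ in range(n)]

# Example target: exponential(rate=1), whose inverse CDF is -ln(1 - u).
inv_exp = lambda u: -math.log(1.0 - u)
samples = sample_by_inversion(inv_exp, 10_000)
mean = sum(samples) / len(samples)
print(mean)  # near the true mean 1.0
```

Because any monotone CDF can be inverted (at least numerically), the same recipe works for an empirical global CDF assembled from peer samples, which is the bias-free property the abstract relies on.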

14.
Query languages for relational multidatabases
With the existence of many autonomous databases widely accessible through computer networks, users will require the capability to jointly manipulate data in different databases. A multidatabase system provides such a capability through a multidatabase manipulation language, such as MSQL. We propose a theoretical foundation for such languages by presenting a multirelational algebra and calculus based on the relational algebra and calculus. The proposal is illustrated by various queries on an example multidatabase. It is shown that properties of the multirelational algebra may be used for optimization and that every multirelational algebra query can be expressed as a multirelational calculus query. The connection between the multirelational languages and MSQL, the multidatabase version of SQL, is also investigated.

15.
Rights protection for relational data
We introduce a solution for relational database content rights protection through watermarking. Rights protection for relational data is of ever-increasing interest, especially in areas where sensitive, valuable content is to be outsourced. A good example is a data mining application, where data is sold in pieces to parties specialized in mining it. Different avenues are available, each with its own advantages and drawbacks. Enforcement by legal means is usually ineffective in preventing theft of copyrighted works unless augmented by a digital counterpart, for example watermarking. While being able to handle higher-level semantic constraints, such as classification preservation, our solution also addresses important attacks, such as subset selection and random and linear data changes. We introduce wmdb., a proof-of-concept implementation, and its application to real-life data, namely, watermarking the outsourced Wal-Mart sales data that we have available at our institute.
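A generic keyed-hash LSB marking scheme, in the spirit of relational watermarking but not the paper's wmdb. system, can be sketched as follows; the key, selection rate, and field names are hypothetical:

```python
import hashlib
import hmac

def mark_tuple(key: bytes, pk: str, value: int) -> int:
    """Sketch of keyed LSB watermarking for a numeric attribute: a keyed
    hash of the primary key decides whether to mark this tuple and which
    bit to embed, so only the key holder can locate the marks."""
    h = hmac.new(key, pk.encode(), hashlib.sha256).digest()
    if h[0] % 10 == 0:                 # mark roughly 1 in 10 tuples
        bit = h[1] & 1                 # pseudorandom mark bit
        return (value & ~1) | bit      # embed it in the least significant bit
    return value

def detect(key: bytes, rows):
    """Count selected tuples whose LSB matches the expected mark bit."""
    hits = trials = 0
    for pk, value in rows:
        h = hmac.new(key, pk.encode(), hashlib.sha256).digest()
        if h[0] % 10 == 0:
            trials += 1
            hits += (value & 1) == (h[1] & 1)
    return hits, trials

key = b"secret"
rows = [(f"id{i}", mark_tuple(key, f"id{i}", 1000 + i)) for i in range(1000)]
hits, trials = detect(key, rows)
print(hits, trials)  # every selected tuple carries the expected bit
```

Subset-selection attacks only shrink `trials` rather than break detection, which is why schemes of this family survive them; robustness to linear data changes needs the more elaborate machinery the paper describes.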

16.
Dependency networks approximate a joint probability distribution over multiple random variables as a product of conditional distributions. Relational Dependency Networks (RDNs) are graphical models that extend dependency networks to relational domains. This higher expressivity, however, comes at the expense of a more complex model-selection problem: an unbounded number of relational abstraction levels might need to be explored. Whereas current learning approaches for RDNs learn a single probability tree per random variable, we propose to turn the problem into a series of relational function-approximation problems using gradient-based boosting. In doing so, one can easily induce highly complex features over several iterations and in turn quickly estimate a very expressive model. Our experimental results on several data sets show that this boosting method results in efficient learning of RDNs compared to state-of-the-art statistical relational learning approaches.

17.
In this paper, we consider two kinds of sequential checkpoint placement problems with infinite/finite time horizon. For these problems, we apply approximation methods based on the variational principle and develop computation algorithms to derive the optimal checkpoint sequence approximately. Next, we focus on the situation where knowledge of system failure is incomplete, i.e., the system failure time distribution is unknown. We develop so-called min-max checkpoint placement methods to determine the optimal checkpoint sequence under uncertainty about the system failure time distribution. In numerical examples, we investigate the proposed distribution-free checkpoint placement methods quantitatively and discuss their potential applicability in practice.
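As a point of comparison for constant-interval placement, Young's classical first-order approximation (a baseline sketch, not the paper's variational or min-max methods) gives the optimal checkpoint interval in closed form:

```python
import math

def young_interval(checkpoint_cost, mtbf):
    """Young's first-order approximation of the optimal constant
    checkpoint interval: sqrt(2 * C * MTBF), where C is the time to take
    one checkpoint and MTBF the mean time between failures."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)

# Checkpoints cost 60 s; mean time between failures is 2 hours:
t = young_interval(60, 7200)
print(t / 60, "minutes")  # roughly 15.5 minutes between checkpoints
```

The paper's sequential methods generalize exactly this trade-off: when the failure-time distribution is non-exponential or unknown, a non-constant (or min-max) checkpoint sequence can beat any fixed interval.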

18.
Charles Bachman's great idea of navigation in a data base is applied to the relational data model. This idea, understood as a way of user thinking, is formulated in such a manner that no physical concepts are introduced to the user's awareness; thus the demanded level of data independence is not reduced. A concept of "navigational statement" is defined. Navigational statements may be used to improve other relational languages, for example SEQUEL. It is shown that navigational statements simplify the grammar structure of language expressions and make them shorter and more readable. Elliptic (incomplete) navigational statements are also defined; such statements may be regarded as a good tool for the casual user. This paper may be viewed as a (not necessarily complete) review of possibilities related to the idea of navigation in a relational data base.

19.
A robustification procedure for LQ state feedback design is presented. Such a procedure consists of choosing the state and input weighting matrices according to the kind of uncertainties on the system. Both structured and norm-bounded additive uncertainties are addressed, and upper bounds for the uncertainties that do not destabilize the closed-loop system are presented. Connections with the quadratic stabilizability problem are established.

20.
This paper uses theory and experiment to help explain why state assignment algorithms which use two-level-based cost measures often give good multi-level logic implementations. First, we develop theorems that give conditions under which an input encoding that results in multi-cube functions in the minimized Boolean network can be re-encoded to change the multi-cube functions into smaller functions to produce a smaller network. Second, we measure the properties of some typical finite-state machines to determine how well they fit the requirements of the theorems. The good fit between the requirements of the theorems and the properties of typical state machines helps explain why state assignment algorithms designed for two-level-logic implementations are relatively successful in designing state assignments for multi-level logic.
