Similar Documents
1.
This paper presents a variant of the Haar-like feature used in the Viola and Jones detection framework, called the scattered rectangle feature, based on a common-component analysis of local region features. Three common components, feature filter, feature structure and feature form, are extracted without concern for the details of the studied region features, which casts new light on region feature design for specific applications and requirements: modifying some component(s) of a feature to obtain an improved one, or combining different components of existing features into a new favorable one. The scattered rectangle feature follows the former way, extending the feature structure component of the Haar-like feature beyond the restriction of the geometry adjacency rule. This yields a richer representation that explores many more orientations than horizontal, vertical and diagonal, as well as misaligned, detached and non-rectangular shape information that is unreachable to Haar-like features. The training results of the two face detectors in the experiments illustrate the benefits of the scattered rectangle feature empirically; the comparison of the ROC curves under a rigid and objective detection criterion on the MIT+CMU upright face test set shows that the cascade based on scattered rectangle features outperforms the one based on Haar-like features.
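The abstract does not include an implementation, so the following is only a minimal sketch of the underlying mechanics: evaluating a two-rectangle feature from an integral image, where the two rectangles may be adjacent (an ordinary Haar-like feature) or detached and misaligned (the relaxation that scattered rectangle features allow). The window size and rectangle layouts are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def integral_image(img):
    """Cumulative sum over rows and columns; entry (y, x) holds the sum of img[:y+1, :x+1]."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, height, width):
    """Sum of pixels inside a rectangle, in O(1) time using the integral image."""
    def at(y, x):
        return 0.0 if y < 0 or x < 0 else ii[y, x]
    y0, x0 = top - 1, left - 1
    y1, x1 = top + height - 1, left + width - 1
    return at(y1, x1) - at(y0, x1) - at(y1, x0) + at(y0, x0)

def two_rect_feature(ii, rect_pos, rect_neg):
    """Difference of two rectangle sums.  For a Haar-like feature the rectangles are
    adjacent; for a 'scattered' variant they may be detached or misaligned."""
    return rect_sum(ii, *rect_pos) - rect_sum(ii, *rect_neg)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    window = rng.random((24, 24))          # a 24x24 detection window, as in Viola-Jones
    ii = integral_image(window)
    # Adjacent rectangles -> an ordinary horizontal Haar-like feature.
    haar = two_rect_feature(ii, (4, 4, 8, 6), (4, 10, 8, 6))
    # Detached, misaligned rectangles -> a scattered rectangle feature (illustrative layout).
    scattered = two_rect_feature(ii, (2, 3, 5, 5), (15, 14, 5, 5))
    print(haar, scattered)
```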

2.
Update management is very important for data integration systems, so update management in peer data management systems (PDMSs) is an active research area. This paper studies view maintenance in PDMSs. First, the definition of a view is extended, and the peer view, local view and global view are proposed according to the requirements of applications. Two main factors affect materialized views in PDMSs: schema mappings between peers may change, and peers may update their data. Based on these requirements, this paper proposes an algorithm called 2DCMA, which includes two sub-algorithms, a data consistency maintenance algorithm and a definition consistency maintenance algorithm, to effectively maintain views. For data consistency maintenance, Mork's rules governing the use of updategrams and boosters are extended; the new rule system can be used to optimize the execution plan, and the data consistency maintenance algorithm is based on it. Furthermore, an ECA rule is adopted for definition consistency maintenance. Finally, extensive simulation experiments are conducted in SPDMS. The simulation results show that 2DCMA outperforms Mork's algorithm when maintaining data consistency and outperforms a centralized view maintenance algorithm when maintaining definition consistency.

3.
In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known. SVMs (support vector machines) are supervised learning models with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. The basic SVM takes a set of input data and predicts, for each given input, which of two possible classes forms the output, making it a non-probabilistic binary linear classifier. In a pattern recognition problem, the selection of the features used to characterize an object to be classified is important. Kernel methods are algorithms that, by replacing the inner product with an appropriate positive definite function, implicitly perform a nonlinear mapping Φ of the input data in R^n into a high-dimensional feature space H. Cover's theorem states that if the transformation is nonlinear and the dimensionality of the feature space is high enough, then the input space may be transformed into a new feature space where the patterns are linearly separable with high probability.
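A small, hedged illustration of the point about kernels and Cover's theorem: XOR-style data that no linear classifier separates becomes easily separable once an RBF kernel implicitly maps it into a high-dimensional feature space. The data layout and the gamma/C values are arbitrary choices for the demonstration.

```python
import numpy as np
from sklearn.svm import SVC

# Four XOR-style clusters: not linearly separable in the input space R^2.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
labels = np.array([0, 0, 1, 1])
X = np.vstack([c + 0.08 * rng.standard_normal((50, 2)) for c in centers])
y = np.repeat(labels, 50)

# A linear SVM cannot do much better than chance on this layout.
linear = SVC(kernel="linear").fit(X, y)
# An RBF kernel implicitly maps the data into a high-dimensional feature space
# where, per Cover's theorem, the classes become linearly separable.
rbf = SVC(kernel="rbf", gamma=2.0, C=1.0).fit(X, y)

print("training accuracy, linear kernel:", linear.score(X, y))
print("training accuracy, RBF kernel:   ", rbf.score(X, y))
```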

4.
The information content of rules and rule sets and its application
The information content of rules is categorized into inner mutual information content and outer impartation information content. The conventional objective interestingness measures based on information theory are in fact all inner mutual information measures, which represent the confidence of rules and the mutual information between the antecedent and the consequent. Moreover, almost all of these measures lose sight of the outer impartation information, which is conveyed to the user and helps the user make decisions. We put forward the viewpoint that the outer impartation information content of rules and rule sets can be represented by relations from the input universe to the output universe. Using binary relations, the interaction of rules in a rule set can be easily represented by the union and intersection operators. Based on the entropy of relations, the outer impartation information content of rules and rule sets is then measured. On this basis, the conditional information content of rules and rule sets, the independence of rules and rule sets, and the inconsistent knowledge of rule sets are defined and measured. The properties of these new measures are discussed and some interesting results are proven, such as that the information content of a rule set may be larger than the sum of the information content of the rules in the rule set, and that the conditional information content of rules may be negative. Finally, the applications of these new measures are discussed. A new method for the appraisal of rule mining algorithms and two rule pruning algorithms, λ-choice and RPClC, are put forward. These new methods and algorithms are advantageous in meeting the need for more efficient decision information.
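The paper's outer impartation measure over relations is not reproduced here; the sketch below only computes the conventional "inner" quantities the abstract contrasts it with, confidence and the mutual information between antecedent and consequent, from an illustrative contingency table.

```python
import math

# Contingency counts for a rule A -> B over N transactions (illustrative numbers).
n_ab, n_a_notb, n_nota_b, n_nota_notb = 40, 10, 20, 30
N = n_ab + n_a_notb + n_nota_b + n_nota_notb

p_ab = n_ab / N
p_a = (n_ab + n_a_notb) / N
p_b = (n_ab + n_nota_b) / N

confidence = p_ab / p_a            # P(B | A)

def mi_term(p_xy, p_x, p_y):
    """One cell's contribution to the mutual information, in bits."""
    return 0.0 if p_xy == 0 else p_xy * math.log2(p_xy / (p_x * p_y))

# Mutual information I(A; B) summed over the four cells of the table.
mutual_info = (
    mi_term(p_ab, p_a, p_b)
    + mi_term(n_a_notb / N, p_a, 1 - p_b)
    + mi_term(n_nota_b / N, 1 - p_a, p_b)
    + mi_term(n_nota_notb / N, 1 - p_a, 1 - p_b)
)

print(f"confidence(A->B) = {confidence:.3f}")
print(f"I(A;B)           = {mutual_info:.4f} bits")
```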

5.
In this paper, we investigate a new problem, misleading classification, in which each test instance is associated with an original class and a misleading class. The goal of the data owner is to form the training set out of candidate instances such that the data miner will be misled into classifying those test instances into their misleading classes rather than their original classes. We discuss two cases of misleading classification. For the case where the classification algorithm is unknown to the data owner, a KNN-based Ranking Algorithm (KRA) is proposed to rank all candidate instances based on the similarities between candidate instances and test instances. For the case where the classification algorithm is known, we propose a Greedy Ranking Algorithm (GRA), which evaluates each candidate instance by building a classifier to predict the test set. In addition, we show how to accelerate GRA incrementally when naive Bayes is employed as the classification algorithm. Experiments on 16 UCI data sets indicate that the candidate instances ranked by KRA can achieve promising leaking and misleading rates. When the classification algorithm is known, GRA can dramatically outperform KRA in terms of leaking and misleading rates, though more running time is required.
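The exact scoring used by KRA is not given in the abstract, so the sketch below is a plausible stand-in under stated assumptions: each candidate instance is scored by its mean negative Euclidean distance to its k nearest test instances, and candidates are ranked most-similar first.

```python
import numpy as np

def kra_rank(candidates, test_set, k=3):
    """Rank candidate training instances by similarity to the test instances.

    A rough sketch of a KNN-style ranking: each candidate is scored by the mean
    negative Euclidean distance to its k nearest test instances, and candidates
    are returned most-similar first.  The actual KRA scoring may differ.
    """
    dists = np.linalg.norm(candidates[:, None, :] - test_set[None, :, :], axis=2)
    nearest = np.sort(dists, axis=1)[:, :k]          # k closest test instances per candidate
    scores = -nearest.mean(axis=1)                   # higher score = more similar
    order = np.argsort(-scores)
    return order, scores[order]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    candidates = rng.random((10, 4))
    test_set = rng.random((5, 4))
    order, scores = kra_rank(candidates, test_set, k=3)
    print("candidate ranking:", order)
```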

6.
Ji Rong Li, 《通讯和计算机》, 2013, (5): 720-723
Optimal fuzzy-valued feature subset selection is a technique for selecting subsets of fuzzy-valued features. By viewing imprecise feature values as fuzzy sets, the information they contain is not lost, in contrast to traditional methods. The performance of classification depends directly on the quality of the training corpus. In practical applications, noise examples are unavoidable in the training corpus and thus affect the effectiveness of the classification approach. This paper presents an algorithm for eliminating class noise based on an analysis of the representative class information of the examples. The representative class information can be acquired by mining the classification ambiguity of feature values. The proposed algorithm is applied to fuzzy decision tree induction. The experimental results show that the algorithm can effectively reduce the introduction of noise examples and raise the classification accuracy on data sets with a high noise ratio.

7.
A New Algorithm for Mining Classification Rules Based on Decision Tables
The mining of classification rules is an important field in data mining. The decision table of rough set theory is an efficient tool for mining classification rules. The elementary concepts related to decision tables in rough set theory are introduced in this paper. A new algorithm for mining classification rules based on decision tables is presented, along with a discernibility function for the reduction of attribute values and a new principle for the accuracy of rules. An example of its application to a car classification problem is included, and the accuracy of the rules discovered is analyzed. The potential fields for its application in data mining are also discussed.
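As a rough illustration of the rough-set machinery this abstract relies on (not the paper's attribute-value reduction algorithm itself), the sketch below groups the objects of a decision table by their condition-attribute values and emits a certain rule whenever an indiscernibility class agrees on the decision. The toy car table is purely illustrative.

```python
from collections import defaultdict

def certain_rules(table, condition_attrs, decision_attr):
    """Extract certain classification rules from a decision table.

    Objects that are indiscernible on the condition attributes and agree on the
    decision yield one rule; classes with conflicting decisions are skipped
    (they would only support possible rules).
    """
    groups = defaultdict(list)
    for obj in table:
        key = tuple(obj[a] for a in condition_attrs)
        groups[key].append(obj[decision_attr])
    rules = []
    for key, decisions in groups.items():
        if len(set(decisions)) == 1:                      # consistent class -> certain rule
            cond = " AND ".join(f"{a}={v}" for a, v in zip(condition_attrs, key))
            rules.append(f"IF {cond} THEN {decision_attr}={decisions[0]}")
    return rules

if __name__ == "__main__":
    cars = [
        {"price": "high", "mileage": "low",  "class": "good"},
        {"price": "high", "mileage": "low",  "class": "good"},
        {"price": "low",  "mileage": "high", "class": "poor"},
        {"price": "low",  "mileage": "low",  "class": "good"},
    ]
    for r in certain_rules(cars, ["price", "mileage"], "class"):
        print(r)
```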

8.
9.
In this paper, a channel selection rule for YASS (Yet Another Secure Steganography) is proposed. Embedding a secret message imposes distortion on the cover image; the larger the distortion, the less secure the steganographic algorithm. Our channel selection rule aims to minimize the distortion introduced by YASS. In our rule, the distortion caused by a unit change on each quantized DCT (Discrete Cosine Transform) component is computed, and the components with smaller unit-change distortion are selected with higher priority. This channel selection rule reduces the distortion to the intermediate spatial-domain image and to the final JPEG image. Experimental results show that our improved YASS scheme outperforms the original YASS scheme in both perceptual and statistical terms. This new channel selection rule can also be combined with other enhancements in the YASS framework to further boost performance.
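A minimal sketch of the unit-change-distortion idea, under the assumptions of an orthonormal 8x8 DCT and the standard JPEG luminance quantization table: for each DCT position, the spatial-domain L2 distortion caused by changing the quantized coefficient by one step is measured, and the positions with the smallest distortion are preferred as embedding channels. The rest of the YASS pipeline is not reproduced.

```python
import numpy as np
from scipy.fft import idctn

# Standard JPEG luminance quantization table.
Q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
], dtype=float)

def unit_change_distortion(q):
    """L2 spatial-domain distortion caused by changing one quantized DCT
    coefficient by +/-1, i.e. by one quantization step, for every 8x8 position."""
    dist = np.zeros((8, 8))
    for i in range(8):
        for j in range(8):
            delta = np.zeros((8, 8))
            delta[i, j] = q[i, j]                       # one quantization step
            block = idctn(delta, norm="ortho")          # back to the spatial domain
            dist[i, j] = np.linalg.norm(block)          # equals q[i, j] for an orthonormal DCT
    return dist

if __name__ == "__main__":
    d = unit_change_distortion(Q)
    # Select the embedding channels with the smallest unit-change distortion first.
    order = np.dstack(np.unravel_index(np.argsort(d, axis=None), d.shape))[0]
    print("five least-distorting DCT positions:", order[:5].tolist())
```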

10.
Relationship Between Support Vector Set and Kernel Functions in SVM
Based on a constructive learning approach, covering algorithms, we investigate the relationship between support vector sets and kernel functions in support vector machines (SVM). An interesting result is obtained: in the linearly non-separable case, any sample of a given sample set K can become a support vector under a certain kernel function. The result shows that when the sample set K is linearly non-separable, although the chosen kernel function satisfies Mercer's condition, its corresponding support vector set is not necessarily the subset of K that plays a crucial role in classifying K. For a given sample set, what is the subset that plays the crucial role in classification? In order to explore this problem, a new concept, the boundary or boundary points, is defined and its properties are discussed. Given a sample set K, we show that the decision functions for classifying the boundary points of K are the same as those for classifying K itself. Moreover, the boundary points of K depend only on K and the structure of the space in which K is located, and are independent of the approach chosen for finding the boundary. Therefore, the boundary point set may become the subset of K that plays a crucial role in classification. These results are important for understanding the principle of the support vector machine (SVM) and for developing new learning algorithms.

11.
In this paper, an improved algorithm is proposed for unconstrained global optimization to tackle non-convex nonlinear multivariate polynomial programming problems. The proposed algorithm is based on the Bernstein polynomial approach. Novel features of the proposed algorithm are that it uses a new rule for the selection of the subdivision point, modified rules for the selection of the subdivision direction, and a new acceleration device to avoid some unnecessary subdivisions. The performance of the proposed algorithm is numerically tested on a collection of 16 test problems. The results of the tests show the proposed algorithm to be superior to the existing Bernstein algorithm in terms of the chosen performance metrics.
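The paper's subdivision rules and acceleration device are not reproduced here; the sketch below only illustrates, in one variable, the Bernstein range-enclosure property that such algorithms build on: converting a polynomial from the power basis to the Bernstein basis on [0,1] and bounding its range by the smallest and largest Bernstein coefficients.

```python
import math

def bernstein_coefficients(a):
    """Bernstein coefficients on [0,1] of p(x) = a[0] + a[1]*x + ... + a[n]*x^n.

    b_i = sum_{j<=i} C(i,j)/C(n,j) * a[j]; min(b) and max(b) enclose the range
    of p over [0,1] (the Bernstein range-enclosure property).
    """
    n = len(a) - 1
    return [
        sum(math.comb(i, j) / math.comb(n, j) * a[j] for j in range(i + 1))
        for i in range(n + 1)
    ]

if __name__ == "__main__":
    a = [1.0, -4.0, 3.0]          # p(x) = 3x^2 - 4x + 1; true range on [0,1] is [-1/3, 1]
    b = bernstein_coefficients(a)
    print("Bernstein coefficients:", b)
    print("range enclosure: [%.3f, %.3f]" % (min(b), max(b)))
```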

12.
Reduction of attributes is one of the important topics in research on rough set theory. Wong S K M and Ziarko W have proved that finding the minimal attribute reduction of a decision table is an NP-hard problem. Algorithm A (the improved algorithm of Jelonek) chooses the optimal candidate attribute using the approximation quality of a single attribute; it improves the efficiency of attribute reduction, but still has the main drawback that the single attribute with the maximum approximation quality is not necessarily the optimal candidate attribute. Therefore, in this paper, we introduce the concept of the compatible decision rule and propose an attribute reduction algorithm based on rules (ARABR). Algorithm ARABR provides a new method that measures the relevance between the extending attribute and the set of present attributes; the method ensures that the optimal attribute is extended and obviously reduces the search space. Theoretical analysis shows that algorithm ARABR has lower computational complexity than Jelonek's algorithm and effectively overcomes the main drawback of algorithm A.

13.
Data analysis and automatic processing is often interpreted as knowledge acquisition. In many cases it is necessary to somehow classify data or find regularities in them. Results obtained in the search for regularities in intelligent data analysis applications are mostly represented with the help of IF-THEN rules. With the help of these rules the following tasks are solved: prediction, classification, pattern recognition and others. Using different approaches, such as clustering algorithms, neural network methods and fuzzy rule processing methods, we can extract rules that characterize the data in an understandable language. This allows interpreting the data, finding relationships in the data and extracting new rules that characterize them. Knowledge acquisition in this paper is defined as the process of extracting knowledge from numerical data in the form of rules. Extraction of rules in this context is based on the clustering methods K-means and fuzzy C-means. With the assistance of the K-means clustering algorithm, rules are derived from trained neural networks. Fuzzy C-means is used in a fuzzy rule-based design method. The rule extraction methodology is demonstrated on samples from Fisher's Iris flower data set. The effectiveness of the extracted rules is evaluated. The clustering and rule extraction methodology can be widely used in evaluating and analyzing various economic and financial processes.
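A hedged sketch of the K-means half of this kind of methodology on the Iris data: each cluster is turned into an interval IF-THEN rule from the feature ranges observed inside it, labelled with the cluster's majority species. The neural-network and fuzzy C-means parts of the paper are not reproduced.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
X, y, names = iris.data, iris.target, iris.feature_names

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

for c in range(3):
    members = X[km.labels_ == c]
    majority = iris.target_names[np.bincount(y[km.labels_ == c]).argmax()]
    # One interval condition per feature: the range observed inside the cluster.
    conditions = " AND ".join(
        f"{lo:.1f} <= {name} <= {hi:.1f}"
        for name, lo, hi in zip(names, members.min(axis=0), members.max(axis=0))
    )
    print(f"IF {conditions} THEN class = {majority}")
```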

14.
In this paper, a new medical image classification scheme is proposed using a self-organizing map (SOM) combined with a multiscale technique. It addresses the problem of handling edge pixels in traditional multiscale SOM classifiers. First, to overcome the difficulty of manually selecting edge pixels, a multiscale edge detection algorithm based on the wavelet transform is proposed. The detected edge pixels are then added to the training set as a new class, and a multiscale SOM classifier is trained using this training set. In this new scheme, the SOM classifier can perform both classification of the entire image and edge detection simultaneously. Moreover, the misclassification of the traditional multiscale SOM classifier in regions near edges is greatly reduced, and the correct classification rate is improved at the same time.

15.
We propose a systematic ECG quality classification method based on a kernel support vector machine (KSVM) and a genetic algorithm (GA) to determine whether ECGs collected via mobile phone are acceptable or not. This method includes mainly three modules, i.e., lead-fall detection, feature extraction, and intelligent classification. First, lead-fall detection is executed to make the initial classification. Then the power spectrum, baseline drifts, amplitude difference, and other time-domain features of the ECGs are analyzed and quantified to form the feature matrix. Finally, the feature matrix is assessed using the KSVM and GA to determine the ECG quality classification results. A Gaussian radial basis function (GRBF) is employed as the kernel function of the KSVM and its performance is compared with that of the Mexican hat wavelet function (MHWF). The GA is used to determine the optimal parameters of the KSVM classifier and its performance is compared with that of the grid search (GS) method. The performance of the proposed method was tested on a database from the PhysioNet/Computing in Cardiology Challenge 2011, which includes 1500 12-lead ECG recordings. True positive (TP) rate, false positive (FP) rate, and classification accuracy were used as the assessment indices. For training database set A (1000 recordings), the optimal results were obtained using the combination of the lead-fall, GA, and GRBF methods: TP 92.89%, FP 5.68%, and classification accuracy 94.00%. For test database set B (500 recordings), the optimal results were also obtained using the combination of the lead-fall, GA, and GRBF methods, and the classification accuracy was 91.80%.
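The PhysioNet recordings, the lead-fall module and the GA are not reproduced here; the sketch below shows only the classifier stage on synthetic feature vectors, with a Gaussian RBF kernel SVM tuned by a grid search over C and gamma (the GS baseline mentioned in the abstract) standing in for the GA.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-ins for the ECG quality features (power spectrum, baseline
# drift, amplitude difference, ...): acceptable recordings cluster lower.
rng = np.random.default_rng(0)
X_ok = rng.normal(0.0, 1.0, size=(300, 6))
X_bad = rng.normal(2.0, 1.2, size=(300, 6))
X = np.vstack([X_ok, X_bad])
y = np.array([1] * 300 + [0] * 300)          # 1 = acceptable, 0 = unacceptable

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Gaussian RBF kernel SVM; a grid search over (C, gamma) replaces the GA here.
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1.0]},
    cv=5,
)
grid.fit(X_tr, y_tr)
print("best parameters:", grid.best_params_)
print("test accuracy:  ", grid.score(X_te, y_te))
```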

16.
The aim of this paper is to propose a new algorithm for the multilevel stabilization of large-scale systems. In the two-level stabilization method, a set of local stabilizers for the individual subsystems is designed in a completely decentralized environment. The solution of the control problem then involves designing a global controller on a higher hierarchical level that provides corrective signals to account for interconnection effects. The principal feature of this paper is to reduce conservativeness in global controller design. Here, the key point is to reduce the effect of interactions instead of neutralizing them. In fact, unlike prior methods, our approach does not ignore the possible beneficial aspects of the interactions and does not try to neutralize them.

17.
One of the obstacles to efficient association rule mining is the explosive expansion of data sets, since it is costly or impossible to scan large databases, especially multiple times. A popular solution for improving the speed and scalability of association rule mining is to run the algorithm on a random sample instead of the entire database. But how to effectively define and efficiently estimate the degree of error with respect to the outcome of the algorithm, and how to determine the sample size needed, have remained open research problems until now. In this paper, an effective and efficient algorithm based on PAC (Probably Approximately Correct) learning theory is given to measure and estimate the sample error. Then a new adaptive, online, fast sampling strategy, multi-scaling sampling, is presented, inspired by MRA (Multi-Resolution Analysis) and the Shannon sampling theorem, for quickly obtaining acceptably approximate association rules at an appropriate sample size. Both theoretical analysis and empirical study have shown that the sampling strategy can achieve a very good speed-accuracy trade-off.
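A worked sketch of the PAC-style calculation behind such sampling schemes (not the multi-scaling strategy itself): by the Hoeffding bound, a sample of about ln(2m/δ)/(2ε²) transactions keeps every one of m estimated itemset supports within ε of its true value with probability at least 1-δ.

```python
import math

def sample_size(epsilon, delta, n_itemsets=1):
    """Hoeffding-bound sample size: with at least this many sampled transactions,
    every one of n_itemsets estimated supports deviates from its true support by
    less than epsilon with probability at least 1 - delta (union bound)."""
    return math.ceil(math.log(2 * n_itemsets / delta) / (2 * epsilon ** 2))

if __name__ == "__main__":
    # e.g. support error below 1% with 99% confidence, monitoring 1000 candidate itemsets
    print(sample_size(epsilon=0.01, delta=0.01, n_itemsets=1000))   # roughly 61,000 transactions
```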

18.
In this paper, a new covering algorithm called FCV1 is presented. FCV1 comprises two algorithms: one is able to quickly search for a partial rule and exclude a large portion of the negative examples, while the other incorporates the more optimized greedy set-covering algorithm and runs on a small portion of the training examples. Hence, the training process of FCV1 is much faster than that of AQ15.

19.
Given a hypergraph, this paper provides three algorithms for finding all its minimal cutsets, minimal link cutsets and least cutsets. The results not only open up a new field of study on cutsets of hypergraphs, but also lay a foundation for analyzing the performance of multibus systems. The algorithm for determining all the least cutsets in a hypergraph has polynomial complexity and is more efficient than that in [2].

20.
This paper presents some new algorithms to efficiently mine max frequent generalized itemsets (g-itemsets) and essential generalized association rules (g-rules). These are compact and general representations of all frequent patterns and all strong association rules in the generalized environment. Our results fill an important gap among algorithms for frequent patterns and association rules by combining two concepts. First, generalized itemsets employ a taxonomy of items, rather than a flat list of items. This produces more natural frequent itemsets and associations such as (meat, milk) instead of (beef, milk), (chicken, milk), etc. Second, compact representations of frequent itemsets and strong rules, whose result size is exponentially smaller, can resolve a standard dilemma in mining patterns: with small threshold values for support and confidence, the user is overwhelmed by the extraordinary number of identified patterns and associations; but with large threshold values, some interesting patterns and associations fail to be identified. Our algorithms can also expand those max frequent g-itemsets and essential g-rules into the much larger set of ordinary frequent g-itemsets and strong g-rules. While that expansion is not recommended in most practical cases, we do so in order to present a comparison with existing algorithms that only handle ordinary frequent g-itemsets. In this case, the new algorithm is shown to be thousands, and in some cases millions, of times faster than previous algorithms. Further, the new algorithm succeeds in analyzing deeper taxonomies, with depths of seven or more; experimental results for previous algorithms limited themselves to taxonomies with depth at most three or four. For each of the two problems, a straightforward lattice-based approach is briefly discussed and then a classification-based algorithm is developed. In particular, the two classification-based algorithms are MFGI_class for mining max frequent g-itemsets and EGR_class for mining essential g-rules. The classification-based algorithms feature conceptual classification trees and dynamic generation and pruning algorithms.
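A hedged sketch of the basic generalized-itemset counting this work builds on (the classic approach of extending each transaction with the ancestors of its items), rather than the MFGI_class or EGR_class algorithms themselves: with a tiny taxonomy, (meat, milk) reaches the support threshold even though neither (beef, milk) nor (chicken, milk) does. The taxonomy, transactions and threshold are illustrative.

```python
from collections import Counter
from itertools import combinations

# Child -> parent taxonomy (illustrative).
taxonomy = {"beef": "meat", "chicken": "meat", "milk": "dairy", "cheese": "dairy"}

def extend(transaction):
    """Add every ancestor of every item, so generalized itemsets can be counted
    exactly like ordinary ones."""
    items = set(transaction)
    for item in transaction:
        parent = taxonomy.get(item)
        while parent is not None:
            items.add(parent)
            parent = taxonomy.get(parent)
    return frozenset(items)

transactions = [
    ["beef", "milk"], ["chicken", "milk"], ["beef", "cheese"], ["chicken", "cheese"],
]
extended = [extend(t) for t in transactions]

# Count all 1- and 2-itemsets over the extended transactions.
counts = Counter()
for t in extended:
    for size in (1, 2):
        counts.update(frozenset(c) for c in combinations(sorted(t), size))

min_support = 2
frequent = sorted(tuple(sorted(s)) for s, n in counts.items() if n >= min_support)
print("frequent generalized itemsets:", frequent)
print("support of (meat, milk):", counts[frozenset({"meat", "milk"})])   # 2 -> frequent
print("support of (beef, milk):", counts[frozenset({"beef", "milk"})])   # 1 -> not frequent
```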
