期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Concept learning by fuzzy k-NN classification and relevance feedback for efficient image retrieval

Hossein Nezamabadi-pour Ehsanollah Kabir 《Expert systems with applications》2009,36(3):5948-5954

A new method for combining visual and semantic features in image retrieval is presented. A fuzzy k-NN classifier assigns initial semantic labels to database images. These labels are gradually modified by relevance feedbacks from the users. Experimental results on a database of 1000 images from 10 semantic groups are reported. 相似文献

2.

An ensemble of cluster-based classifiers for semi-supervised classification of non-stationary data streams

Mohammad Javad Hosseini Ameneh Gholipour Hamid Beigy 《Knowledge and Information Systems》2016,46(3):567-597

相似文献

3.

Complexity reduction in efficient prototype-based classification

Francesc J. Ferri Author Vitae Author Vitae Filiberto Pla Author Vitae 《Pattern recognition》2006,39(2):161-163

相似文献

4.

The k-NN classifier and self-adaptive Hotelling data reduction technique in handwritten signatures recognition

Piotr Porwik Rafal Doroz Tomasz Orczyk 《Pattern Analysis & Applications》2015,18(4):983-1001

相似文献

5.

HSVNN: an efficient medical data classification using dimensionality reduction combined with hybrid support vector neural network

Prakash P. N. Senthil Rajkumar N. 《The Journal of supercomputing》2022,78(13):15439-15462

Early diagnosis and therapy are the most essential strategies to prevent deaths from diseases, such as cancer, brain tumors, and heart diseases. In this regard, information mining and artificial intelligence approaches have been valuable tools for providing useful data for early diagnosis. However, high-dimensional data can be challenging to examine, practically difficult to visualize, and costly to measure and store. Transferring a high-dimensional portrayal of the data to a lower-dimensional one without losing important information is the focal issue of dimensionality reduction. Therefore, in this study, dimensionality reduction-based medical data classification is presented. The proposed methodology consists of three modules: pre-processing, dimension reduction using an adaptive artificial flora (AAF) algorithm, and classification. The important features are selected using the AAF algorithm to reduce the dimension of the input data. From the results, a dimension-reduced dataset is obtained. The reduced data are then fed as input to the hybrid classifier. A hybrid support vector neural network is proposed for classification. Finally, the effectiveness of the proposed method is analyzed in terms of different metrics, namely accuracy, sensitivity, and specificity. The proposed method is implemented in MATLAB.

相似文献

6.

Scalable and efficient multi-label classification for evolving data streams

Jesse Read Albert Bifet Geoff Holmes Bernhard Pfahringer 《Machine Learning》2012,88(1-2):243-272

Many challenging real world problems involve multi-label data streams. Efficient methods exist for multi-label classification in non-streaming scenarios. However, learning in evolving streaming scenarios is more challenging, as classifiers must be able to deal with huge numbers of examples and to adapt to change using limited time and memory while being ready to predict at any point. This paper proposes a new experimental framework for learning and evaluating on multi-label data streams, and uses it to study the performance of various methods. From this study, we develop a multi-label Hoeffding tree with multi-label classifiers at the leaves. We show empirically that this method is well suited to this challenging task. Using our new framework, which allows us to generate realistic multi-label data streams with concept drift (as well as real data), we compare with a selection of baseline methods, as well as new learning methods from the literature, and show that our Hoeffding tree method achieves fast and more accurate performance. 相似文献

7.

Pivot-based approximate k-NN similarity joins for big high-dimensional data

《Information Systems》2020

相似文献

8.

A feature reduction and unsupervised classification algorithm for multispectral data

K. Chidananda Gowda 《Pattern recognition》1984,17(6):667-676

A new scheme, incorporating dimensionality reduction and clustering, suitable for classification of a large volume of remotely sensed data using a small amount of memory is proposed. The scheme involves transforming the data from multidimensional n-space to a 3-dimensional primary color space of blue, green and red coordinates. The dimensionality reduction is followed by data reduction, which involves assigning 3-dimensional samples to a 2-dimensional array. Finally, a multi-stage ISODATA technique incorporating a novel seedpoint picking method is used to obtain the desired number of clusters.

The storage requirements are reduced to a low value by making five passes through the data and storing necessary information during each pass. The first three passes are used to find the minimum and maximum values of some of the variables. The data reduction is done and a classification table is formed during the fourth pass. The classification map is obtained during the fifth pass. The computer memory required is about 2K machine words.

The efficacy of the algorithm is justified by simulation studies using multispectral LANDSAT data. 相似文献

9.

STHist-C: a highly accurate cluster-based histogram for two and three dimensional geographic data points

Hai Thanh Mai Jaeho Kim Yohan J. Roh Myoung Ho Kim 《GeoInformatica》2013,17(2):325-352

Histograms have been widely used for estimating selectivity in query optimization. In this paper, we propose a new histogram construction method for geographic data objects that are used in many real-world applications. The proposed method is based on analyses and utilization of clusters of objects that exist in a given data set, to build histograms with significantly enhanced accuracy. Our philosophy in allocating the histogram buckets is to allocate them to the subspaces that properly capture object clusters. Therefore, we first propose a procedure to find the centers of object clusters. Then, we propose an algorithm to construct the histogram buckets from these centers. The buckets are initialized from the clusters’ centers, then expanded to cover the clusters. Best expansion plans are chosen based on a notion of skewness gain. Results from extensive experiments using real-life data sets demonstrate that the proposed method can really improve the accuracy of the histograms further, when compared with the current state of the art histogram construction method for geographic data objects. 相似文献

10.

An efficient cluster-based tribes optimization algorithm for functional-link-based neurofuzzy inference systems

Cheng-Hung Chen Yen-Yun Liao 《Applied Soft Computing》2013,13(5):2261-2271

This study presents an efficient cluster-based tribes optimization algorithm (CTOA) for designing a functional-link-based neurofuzzy inference system (FLNIS) for prediction applications. The proposed CTOA learning algorithm was used to optimize the parameters of the FLNIS model. The proposed CTOA adopts a self-clustering algorithm to divide the swarm into multiple tribes, and uses different displacement strategies to update each particle. The CTOA also uses a tribal adaptation mechanism to generate or remove particles and reconstruct tribal links. The tribal adaptation mechanism can improve the quality of the tribe and the tribe adaptation. In CTOA, the displacement strategy and the tribal adaptation mechanism depend on the tribal leaders to strengthen the local search ability. Finally, the proposed FLNIS-CTOA method was applied to several prediction problems. The results of this study demonstrate the effectiveness of the proposed CTOA learning algorithm. 相似文献

11.

Effective and efficient dimensionality reduction for large-scale and streaming data preprocessing 总被引：3，自引：0，他引：3

Jun Yan Benyu Zhang Ning Liu Shuicheng Yan Qiansheng Cheng Fan W. Qiang Yang Xi W. Zheng Chen 《Knowledge and Data Engineering, IEEE Transactions on》2006,18(3):320-333

Dimensionality reduction is an essential data preprocessing technique for large-scale and streaming data classification tasks. It can be used to improve both the efficiency and the effectiveness of classifiers. Traditional dimensionality reduction approaches fall into two categories: feature extraction and feature selection. Techniques in the feature extraction category are typically more effective than those in feature selection category. However, they may break down when processing large-scale data sets or data streams due to their high computational complexities. Similarly, the solutions provided by the feature selection approaches are mostly solved by greedy strategies and, hence, are not ensured to be optimal according to optimized criteria. In this paper, we give an overview of the popularly used feature extraction and selection algorithms under a unified framework. Moreover, we propose two novel dimensionality reduction algorithms based on the orthogonal centroid algorithm (OC). The first is an incremental OC (IOC) algorithm for feature extraction. The second algorithm is an orthogonal centroid feature selection (OCFS) method which can provide optimal solutions according to the OC criterion. Both are designed under the same optimization criterion. Experiments on Reuters Corpus Volume-1 data set and some public large-scale text data sets indicate that the two algorithms are favorable in terms of their effectiveness and efficiency when compared with other state-of-the-art algorithms. 相似文献

12.

分类数据集的一致化特征选择约简

下载免费PDF全文

吴新玲《计算机工程与应用》2007,43(18):174-176

样本数据集的不一致性和冗余特征会降低分类的质量和效率。提出了一种一致化特征选择约简方法,该方法基于贝叶斯公式,采用阈值,将非一致数据归为最可能的一类,使数据集一致化。并在一致数据集上,运用类别区分矩阵选择可准确区分各类数据的最小特征变量集。给出的启发式搜索策略和应用实例表明:一致化特征选择约简方法能有效消除分类数据集的不一致性,选择最优的特征变量、降低数据的维数、减少数据集中的冗余信息。相似文献

13.

An efficient weighted Lagrangian twin support vector machine for imbalanced data classification

Yuan-Hai Shao Wei-Jie Chen Jing-Jing Zhang Zhen Wang Nai-Yang Deng 《Pattern recognition》2014

In this paper, we propose an efficient weighted Lagrangian twin support vector machine (WLTSVM) for the imbalanced data classification based on using different training points for constructing the two proximal hyperplanes. The main contributions of our WLTSVM are: (1) a graph based under-sampling strategy is introduced to keep the proximity information, which is robustness to outliers, (2) the weight biases are embedded in the Lagrangian TWSVM formulations, which overcomes the bias phenomenon in the original TWSVM for the imbalanced data classification, (3) the convergence of the training procedure of Lagrangian functions is proven and (4) it is tested and compared with some other TWSVMs on synthetic and real datasets to show its feasibility and efficiency for the imbalanced data classification. 相似文献

14.

Combining parametric and non-parametric algorithms for a partially unsupervised classification of multitemporal remote-sensing images

《Information Fusion》2002,3(4):289-297

In this paper, we propose a classification system based on a multiple-classifier architecture, which is aimed at updating land-cover maps by using multisensor and/or multisource remote-sensing images. The proposed system is composed of an ensemble of classifiers that, once trained in a supervised way on a specific image of a given area, can be retrained in an unsupervised way to classify a new image of the considered site. In this context, two techniques are presented for the unsupervised updating of the parameters of a maximum-likelihood classifier and a radial basis function neural-network classifier, on the basis of the distribution of the new image to be classified. Experimental results carried out on a multitemporal and multisource remote-sensing data set confirm the effectiveness of the proposed system. 相似文献

15.

Distributed processing of continuous sliding-window k-NN queries for data stream filtering

Kre?imir Pripu?i? Ivana Podnar ?arko Karl Aberer 《World Wide Web》2011,14(5-6):465-494

A sliding-window k-NN query (k-NN/w query) continuously monitors incoming data stream objects within a sliding window to identify k closest objects to a query. It enables effective filtering of data objects streaming in at high rates from potentially distributed sources, and offers means to control the rate of object insertions into result streams. Therefore k-NN/w processing systems may be regarded as one of the prospective solutions for the information overload problem in applications that require processing of structured data in real-time, such as the Sensor Web. Existing k-NN/w processing systems are mainly centralized and cannot cope with multiple data streams, where data sources are scattered over the Internet. In this paper, we propose a solution for distributed continuous k-NN/w processing of structured data from distributed streams. We define a k-NN/w processing model for such setting, and design a distributed k-NN/w processing system on top of the Content-Addressable Network (CAN) overlay. An extensive evaluation using both real and synthetic data sets demonstrates the feasibility of the proposed solution because it balances the load among the peers, while the messaging overhead within the P2P network remains reasonable. Moreover, our results clearly show the solution is scalable for an increasing number of queries and peers. 相似文献

16.

A software package for interactive motor unit potential classification using fuzzy k-NN classifier

Rasheed S Stashuk D Kamel M 《Computer methods and programs in biomedicine》2008,89(1):56-71

We present an interactive software package for implementing the supervised classification task during electromyographic (EMG) signal decomposition process using a fuzzy k-NN classifier and utilizing the MATLAB high-level programming language and its interactive environment. The method employs an assertion-based classification that takes into account a combination of motor unit potential (MUP) shapes and two modes of use of motor unit firing pattern information: the passive and the active modes. The developed package consists of several graphical user interfaces used to detect individual MUP waveforms from a raw EMG signal, extract relevant features, and classify the MUPs into motor unit potential trains (MUPTs) using assertion-based classifiers. 相似文献

17.

Noise-free principal component analysis: An efficient dimension reduction technique for high dimensional molecular data

《Expert systems with applications》2014,41(17):7797-7804

Principal component analysis (PCA) is one of the powerful dimension reduction techniques widely used in data mining field. PCA tries to project the data into lower dimensional space while preserving the intrinsic information hidden in the data as much as possible. Disadvantage of PCA is that, extracted principal components (PCs) are linear combination of all features, hence PCs are may still contaminated with noise in the data. To address this problem we propose a modified version of PCA called noise free PCA (NFPCA), in which regularization is introduced during the PCs extraction step to mitigate the effect of noise. Potentials of the proposed method is assessed in two important application of high-dimensional molecular data: classification and survival prediction. Multiple publicly available real-world data sets are used for this illustration. Experimental results show that, the NFPCA produce highly informative than the ordinary PCA method. This is largely due to the fact that the NFPCA suppress the effect of noise in the PCs more efficiently with minimum information lost. The NFPCA is a promising alternative to existing PCA approaches not only in terms of highly informative PCs, but also its relatively cheap computational cost. 相似文献

18.

Decision tree approach for classification and dimensionality reduction of electronic nose data 总被引：2，自引：0，他引：2

Jung Hwan Cho Pradeep U. KurupAuthor vitae 《Sensors and actuators. B, Chemical》2011,160(1):542

This paper presents a decision tree approach using two different tree models, C4.5 and CART, for use in the classification and dimensionality reduction of electronic nose (EN) data. The decision tree is a tree structure consisting of internal and terminal nodes which process the data to ultimately yield a classification. The decision tree is proficient at both maintaining the role of dimensionality reduction and at organizing optimally sized classification trees, and therefore it could be a promising approach to analyze EN data. In the experiments conducted, six sensor response parameters were extracted from the dynamic sensor responses of each of the four metal oxide gas sensors. The six parameters observed were the rising time (T_r), falling time (T_f), total response time (T_t), normalized peak voltage change (y_p,n), normalized curve integral (C_I), and triangle area (T_A). One sensor parameter from each metal oxide sensor was used for the classification trees, and the best classification accuracy of 97.78% was achieved by CART using the C_I parameter. However, the accuracy of CART was improved using all of the sensor parameters as inputs to the classification tree. The improved results of CART, having an accuracy of 98.89%, was comparable to that of two popular classifiers, the multilayer perceptron (MLP) neural network and the fuzzy ARTMAP network (accuracy of 98.89%, and 100%, respectively). Furthermore, as a dimensionality reduction method the decision tree has shown a better discrimination accuracy of 100% for the MLP classifier and 98.89% for the fuzzy ARTMAP classifier as compared to those achieved with principle component analysis (PCA) giving 81.11% and 97.78%, and a variable selection method giving 92.22% and 93.33% (for the same MLP and fuzzy ARTMAP classifiers). Therefore, a decision tree could be a promising technique for a pattern recognition system for EN data in terms of two functions; as classifier which is an optimally organized classification tree, and as dimensionality reduction method for other pattern recognition techniques. 相似文献

19.

适用于IDS中数据分类的数值归约算法*

李玲娟梁玉龙王汝传《计算机应用研究》2007,24(12):146-148

以提高IDS中数据分类效率为目标,分析了IDS中被检测数据的特点,设计了一种适用于IDS中数据分类的数值归约算法（NRAADCI）。该算法一方面用值域来减少特征值数目,另一方面将孤立的点放大为一个区域以预测类似行为。最后以决策树分类算法为例,通过实验验证了该数值归约算法的有效性。实验结果表明,该算法在降低已有分类算法的时间复杂度的同时使分类准确率有所提升。相似文献

20.

A non-parametric method to determine basic probability assignment for classification problems

Peida Xu Xiaoyan Su Sankaran Mahadevan Chenzhao Li Yong Deng 《Applied Intelligence》2014,41(3):681-693

As an important tool for knowledge representation and decision-making under uncertainty, Dempster-Shafer evidence theory (D-S theory) has been used in many fields. The application of D-S theory is critically dependent on the availability of the basic probability assignment (BPA). The determination of BPA is still an open issue. A non-parametric method to obtain BPA is proposed in this paper. This method can handle multi-attribute datasets in classification problems. Each attribute value of the dataset sample is treated as a stochastic quantity. Its non-parametric probability density function (PDF) is calculated using the training data, which can be regarded as the probability model for the corresponding attribute. The BPA function is then constructed based on the relationship between the test sample and the probability models. The missing attribute values in datasets are treated as ignorance in the framework of the evidence theory. This method does not have the assumption of any particular distribution. As a result, it can be flexibly used in many engineering applications. The obtained BPA can avoid high conflict between evidence, which is desired in data fusion. Several benchmark classification problems are used to demonstrate the proposed method and to compare against existing methods. The constructed classifier based on the proposed method compares well to the state-of-the-art algorithms. 相似文献