20 similar documents found.
1.
Clustering aims to partition a data set into homogeneous groups that gather similar objects. Object similarity, or more often object dissimilarity, is usually expressed in terms of some distance function. This approach, however, is not viable when dissimilarity is conceptual rather than metric. In this paper, we propose to extract the dissimilarity relation directly from the available data. To this aim, we train a feedforward neural network with some pairs of points with known dissimilarity. Then, we use the dissimilarity measure generated by the network to guide a new unsupervised fuzzy relational clustering algorithm. An artificial data set and a real data set are used to show how the clustering algorithm based on the neural dissimilarity outperforms some widely used (possibly partially supervised) clustering algorithms based on spatial dissimilarity.
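The abstract does not describe the proposed fuzzy relational clustering algorithm itself. As a point of reference only, the sketch below implements the classic relational fuzzy c-means (RFCM) update, a different, well-known algorithm that, like the one in the paper, needs nothing but a dissimilarity matrix `R`. The function name `rfcm` and its parameters are hypothetical; `R` is assumed symmetric with a zero diagonal and close to squared-Euclidean (clamping negative distances to zero is a crude guard, not a proper fix for non-Euclidean data). In the paper's setting, `R[j][k]` would be produced by the trained network rather than by a metric.

```python
import random


def rfcm(R, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Relational fuzzy c-means on an n x n dissimilarity matrix R (sketch).

    Returns a c x n membership matrix U whose columns sum to 1.
    """
    n = len(R)
    rng = random.Random(seed)
    # Random initial memberships, normalized so each object's memberships sum to 1.
    U = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        s = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= s

    for _ in range(max_iter):
        # Prototype weight vectors v_i = u_i^m / sum(u_i^m).
        V = []
        for i in range(c):
            w = [u ** m for u in U[i]]
            total = sum(w)
            V.append([x / total for x in w])

        # Relational distance of object k to prototype i: (R v_i)_k - v_i^T R v_i / 2.
        D = []
        for i in range(c):
            Rv = [sum(R[k][j] * V[i][j] for j in range(n)) for k in range(n)]
            vRv = sum(V[i][k] * Rv[k] for k in range(n))
            # Clamp small negatives caused by rounding or a non-Euclidean R.
            D.append([max(Rv[k] - vRv / 2.0, 0.0) for k in range(n)])

        # Fuzzy membership update; split crisply if an object sits on a prototype.
        new_U = [[0.0] * n for _ in range(c)]
        for k in range(n):
            zero = [i for i in range(c) if D[i][k] <= 1e-12]
            if zero:
                for i in zero:
                    new_U[i][k] = 1.0 / len(zero)
            else:
                for i in range(c):
                    new_U[i][k] = 1.0 / sum(
                        (D[i][k] / D[j][k]) ** (1.0 / (m - 1.0)) for j in range(c)
                    )

        delta = max(abs(new_U[i][k] - U[i][k]) for i in range(c) for k in range(n))
        U = new_U
        if delta < tol:
            break
    return U
```

For example, with six 1-D points forming two tight groups and `R` built from squared differences, each group ends up with its highest membership in a different cluster. When `R` is exactly squared Euclidean, this update reduces to ordinary fuzzy c-means, which is why a learned, non-metric dissimilarity is the interesting case the paper targets.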
2.
Neurocontroller design via supervised and unsupervised learning (citations: 1 total, 0 self-citations, 1 by others)
In this paper we study the role of supervised and unsupervised neural learning schemes in the adaptive control of nonlinear dynamic systems. We suggest and demonstrate that the teacher's knowledge in the supervised learning mode includes a priori structural knowledge of the plant, which may be employed to design exploratory schedules during learning, resulting in an unsupervised learning scheme. We further demonstrate that neurocontrollers may realize both linear and nonlinear control laws, given either explicitly by an automated teacher or implicitly through a human operator, and that their robustness may be superior to that of a model-based controller. Examples of both learning schemes are provided for the adaptive control of robot manipulators and a cart-pole system.
3.
《Applied Artificial Intelligence》2013,27(5-6):519-533
One relevant problem in data quality is missing data. Despite the frequent occurrence and relevance of the missing data problem, many machine learning algorithms handle missing data in a rather naive way. However, missing data should be treated carefully; otherwise, bias might be introduced into the induced knowledge. In this work, we analyze the use of the k-nearest neighbor algorithm as an imputation method. Imputation denotes a procedure that replaces the missing values in a data set with plausible values. One advantage of this approach is that the missing data treatment is independent of the learning algorithm used, which allows the user to select the most suitable imputation method for each situation. Our analysis indicates that k-nearest neighbor imputation can outperform the internal methods used by C4.5 and CN2 to treat missing data, and can also outperform mean or mode imputation, a widely used method for treating missing values.
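A minimal sketch of k-nearest neighbor imputation as the abstract describes it: for each record with missing values, find the k closest complete records using only the attributes the record actually has, then fill numeric gaps with the neighbors' mean and nominal gaps with their mode. The function name `knn_impute`, the `categorical` parameter, and the mixed distance (squared difference for numeric attributes, 0/1 mismatch for nominal ones) are assumptions for illustration; numeric attributes are assumed to be pre-scaled to comparable ranges, and this is not presented as the paper's exact distance or procedure.

```python
import math
from collections import Counter


def knn_impute(rows, k=3, categorical=()):
    """Fill None entries in rows using the k nearest complete rows (sketch)."""
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row to impute from")
    result = []
    for r in rows:
        if None not in r:
            result.append(list(r))
            continue
        observed = [j for j, v in enumerate(r) if v is not None]

        def dist(other):
            # Distance over observed attributes only: squared diff or 0/1 mismatch.
            total = 0.0
            for j in observed:
                if j in categorical:
                    total += 0.0 if r[j] == other[j] else 1.0
                else:
                    total += (r[j] - other[j]) ** 2
            return math.sqrt(total)

        neighbours = sorted(complete, key=dist)[:k]
        filled = list(r)
        for j, v in enumerate(r):
            if v is None:
                values = [nb[j] for nb in neighbours]
                if j in categorical:
                    filled[j] = Counter(values).most_common(1)[0][0]  # mode
                else:
                    filled[j] = sum(values) / len(values)  # mean
        result.append(filled)
    return result
```

Because the imputed table is produced before any learner sees it, the same output can be fed to C4.5, CN2, or any other algorithm, which is the independence property the abstract highlights.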
4.
《Engineering Applications of Artificial Intelligence》2005,18(6):673-683
This paper proposes a new methodology that combines supervised and unsupervised learning for evaluating power system dynamic security. Based on the concept of stability margin, pre-fault power system conditions are assigned to the output neurons on the two-dimensional grid with the growing hierarchical self-organizing map technique (GHSOM) via supervised artificial neural networks (ANNs), which estimate the post-fault power system state. The technique estimates the dynamic stability index that corresponds to the most critical value of the synchronizing and damping torques of multimachine power systems. ANN-based pattern recognition is carried out with the growing hierarchical self-organizing feature map in order to provide an adaptive neural network architecture during its unsupervised training process. Numerical tests carried out on an IEEE 9-bus power system are presented and discussed. The analysis using this method provides accurate results and improves the effectiveness of system security evaluation.
5.
Context-Aware Recommender Systems (CARS) have attracted significant research attention in recent years, owing to the interest in considering the user's context in order to offer more appropriate recommendations. However, evaluating CARS is a challenge due to the scarcity of appropriate datasets that incorporate context information related to the ratings provided by users. In this paper, we present DataGenCARS, a complete Java-based synthetic dataset generator that can be used to obtain the datasets required for any desired scenario, allowing high flexibility in generating appropriate data for evaluating CARS. The generator offers features such as: a flexible definition of user schemas, user profiles, types of items, and types of contexts; realistic generation of ratings and item attributes; the possibility to mix real and synthetic datasets; functionality to analyze existing datasets as a basis for synthetic data generation; and support for automatic mapping between item schemas and Java classes. Moreover, an experimental evaluation illustrates the interest and benefits of DataGenCARS.
6.
Giorgio Gnecco 《Information Processing Letters》2010,110(23):1031-1036
For Tikhonov regularization in supervised learning from data, the effect on the regularized solution of a joint perturbation of the regression function and the data is investigated. Spectral windows in the finite-sample and population cases are compared via probabilistic estimates of the differences between regularized solutions.
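For orientation, the standard finite-sample and population forms of Tikhonov regularization with the square loss in a reproducing kernel Hilbert space are summarized below. The notation (kernel K, Gram matrix **K**, m samples, integral operator L_K, regression function f_ρ) is assumed for illustration and is not taken from the paper; under this reading, the filter g_λ is what the abstract calls a spectral window.

```latex
% Finite-sample Tikhonov regularization (square loss, RKHS \mathcal H_K, m samples)
f_{\mathbf z}^{\lambda} = \arg\min_{f \in \mathcal H_K}
  \frac{1}{m}\sum_{i=1}^{m} \bigl(f(x_i) - y_i\bigr)^2 + \lambda \|f\|_K^2,
\qquad
f_{\mathbf z}^{\lambda} = \sum_{i=1}^{m} c_i\, K(\cdot, x_i),
\qquad
\mathbf c = (\mathbf K + m\lambda I)^{-1}\mathbf y .

% Spectral view: if \tfrac{1}{m}\mathbf K = \sum_j \sigma_j u_j u_j^{\top}, the fitted values are
\bigl(f_{\mathbf z}^{\lambda}(x_i)\bigr)_{i=1}^{m}
  = \sum_j \sigma_j\, g_\lambda(\sigma_j)\, u_j u_j^{\top}\mathbf y,
\qquad
g_\lambda(\sigma) = \frac{1}{\sigma + \lambda}.

% Population counterpart, with L_K f = \int K(\cdot, x) f(x)\, d\rho_X(x):
f^{\lambda} = (L_K + \lambda I)^{-1} L_K f_\rho = g_\lambda(L_K)\, L_K f_\rho .
```

Each eigencomponent is scaled by σ/(σ + λ), so directions with σ much larger than λ are kept almost unchanged while directions with σ much smaller than λ are suppressed; comparing the finite-sample and population solutions amounts to comparing the same filter applied to the empirical and the true operator.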
7.
8.
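With the spread of big data and growing computing power, deep learning has become a popular research area, but its strong performance depends heavily on network architecture and parameter settings. Therefore, the key to improving model performance while reducing model complexity lies in model optimization. To describe the optimization problem more concisely, this paper takes supervised deep learning as its starting point and summarizes and analyzes optimization methods for improving fitting ability and generalization ability. It first gives the basic formulation of optimization and explains its core. Next, from the perspective of fitting ability, it decomposes the optimization problem into three directions, namely convergence, convergence speed, and global quality, and summarizes and analyzes the specific methods and research results in each. From the perspective of improving generalization ability, it reviews the state of research on regularization methods in two categories: data preprocessing and model-parameter constraints. Building on this theoretical foundation and taking the development of generative adversarial network (GAN) variants as the main thread, it reviews the application of the various optimization methods in this field, compares and analyzes their effects based on experimental results, and further identifies several optimization strategies that work well for GANs. At present, various optimization methods are widely applied to deep learning models; they can effectively improve a model's fitting ability and, by using regularization to mitigate overfitting, improve its robustness. Although optimization for deep learning has been studied extensively, there is still no mature, systematic theory to guide the use of optimization methods, ...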
9.
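Ensemble classification of data streams under supervised or semi-supervised learning is a meaningful research direction. This paper introduces the topic from three aspects: base classifiers, key techniques, and ensemble strategies. The base classifiers covered mainly include decision trees, neural networks, and support vector machines; the key techniques are introduced in terms of incremental and online learning; and the ensemble strategies mainly include boosting and stacking. The advantages and disadvantages of the different ensemble methods, the algorithms they are compared against, and the experimental datasets used are summarized and analyzed. Finally, directions for further research are given, including handling concept drift under supervised and semi-supervised learning, research on homogeneous and heterogeneous ensembles, and ensemble classification of data streams under unsupervised learning.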
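The survey names online learning and ensemble strategies such as boosting and stacking without giving details. As an illustration of how an ensemble can be trained one example at a time, the sketch below implements online bagging (Oza and Russell), which is not one of the methods named in the abstract: each base model trains on each incoming example k times with k drawn from Poisson(1), imitating bootstrap resampling without storing the stream. The class names `IncrementalCentroid` and `OnlineBagging` and the nearest-centroid base learner are hypothetical choices made to keep the example short.

```python
import math
import random
from collections import Counter


def poisson(rng, lam=1.0):
    """Draw from Poisson(lam) with Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


class IncrementalCentroid:
    """Hypothetical base learner: nearest class centroid, updated incrementally."""

    def __init__(self):
        self.sums, self.counts = {}, {}

    def learn(self, x, y):
        if y not in self.sums:
            self.sums[y], self.counts[y] = [0.0] * len(x), 0
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1

    def predict(self, x):
        if not self.counts:
            return None

        def sq_dist(label):
            c = self.counts[label]
            return sum((s / c - v) ** 2 for s, v in zip(self.sums[label], x))

        return min(self.counts, key=sq_dist)


class OnlineBagging:
    def __init__(self, make_model, n_models=10, seed=0):
        self.models = [make_model() for _ in range(n_models)]
        self.rng = random.Random(seed)

    def learn(self, x, y):
        # Each model sees the example Poisson(1) times, like a streaming bootstrap.
        for model in self.models:
            for _ in range(poisson(self.rng)):
                model.learn(x, y)

    def predict(self, x):
        # Majority vote over models that have learned something.
        votes = Counter(p for p in (m.predict(x) for m in self.models) if p is not None)
        return votes.most_common(1)[0][0] if votes else None
```

Any base learner with incremental `learn` and `predict` methods can be plugged in; replacing the fixed Poisson(1) rate with one that depends on earlier errors is the idea behind the online boosting variant from the same authors.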
10.
A social stream is a data stream that records a series of social entities and the dynamic interactions between pairs of entities. It can be used to model changes in entity states in numerous applications. Social streams, which combine graph and streaming data, pose great challenges to efficient analytical query processing and are key to better understanding user behavior. Because of privacy and other related concerns, a social stream generator is of great significance. This paper proposes a framework for a synthetic social stream generator (SSG). The social streams generated by SSG can be tuned to capture several kinds of fundamental social stream properties, including patterns of user behavior and graph patterns. Extensive empirical studies with several real-life social stream data sets show that SSG produces data that better fits real data. They also confirm that SSG can generate social stream data continuously with stable throughput and memory consumption. Furthermore, we propose a parallel implementation of SSG based on an asynchronous parallel processing model and a delayed update strategy. Our experiments verify that the throughput of the parallel implementation increases linearly with the number of nodes.
11.
This paper describes in full detail a model of a hierarchical classifier (HC). The original classification problem is broken down into several subproblems, and a weak classifier is built for each of them. Subproblems consist of examples from a subset of the whole set of output classes. It is essential to this classification framework that the generated subproblems overlap, i.e., some individual classes may belong to more than one subproblem. This approach reduces the overall risk. The individual classifiers built for the subproblems are weak, i.e., their accuracy is only slightly better than that of a random classifier. The paper extends the notion of weakness to multiclass models in a way that is more intuitive than previously proposed approaches. In the HC model described, after a single node is trained, its problem is split into several subproblems using a clustering algorithm, which is responsible for grouping classes that are classified similarly. The main focus of this paper is finding the most appropriate clustering method; several algorithms are defined and compared. Finally, we compare a whole HC with other machine learning approaches.
12.
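A general-purpose tool for automatically generating structural test data based on control flow and data flow is designed. The tool selects test paths according to the coverage criteria used in control-flow and data-flow testing and, with an improved iterative relaxation method at its core, generates test data for the selected paths. The tool also uses the Fibonacci method to optimize path selection, handles infeasible paths, and collects statistics on the test data such as branch coverage and DCP coverage. Experimental results show that the tool is feasible.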
13.
K. Chidananda Gowda 《Pattern recognition》1984,17(6):667-676
A new scheme, incorporating dimensionality reduction and clustering, suitable for classification of a large volume of remotely sensed data using a small amount of memory is proposed. The scheme involves transforming the data from multidimensional n-space to a 3-dimensional primary color space of blue, green and red coordinates. The dimensionality reduction is followed by data reduction, which involves assigning 3-dimensional samples to a 2-dimensional array. Finally, a multi-stage ISODATA technique incorporating a novel seedpoint picking method is used to obtain the desired number of clusters.
The storage requirements are reduced to a low value by making five passes through the data and storing necessary information during each pass. The first three passes are used to find the minimum and maximum values of some of the variables. The data reduction is done and a classification table is formed during the fourth pass. The classification map is obtained during the fifth pass. The computer memory required is about 2K machine words.
The efficacy of the algorithm is justified by simulation studies using multispectral LANDSAT data.
14.
A neural network classifier called supervised extended ART (SEART), which incorporates a supervised mechanism into the extended unsupervised ART, is presented here. It uses a learning theory called Nested Generalized Exemplar (NGE) theory. At any time, the training instances may or may not have desired outputs; that is, this model can handle supervised and unsupervised learning simultaneously. The unsupervised component finds the cluster relations of instances, and the supervised component learns the desired associations between clusters and classes. In addition, the model is capable of incremental learning. It works equally well when instances in a cluster belong to different classes, and it can also handle multi-category and nonconvex classifications. The experimental results are very encouraging.
15.
Cells efficiently carry out organic synthesis, energy transduction, and signal processing across a range of environmental conditions and at nanometer scales, rivaling any engineered system. In the cell, these processes are orchestrated by gene networks, which we define broadly as networks of interacting genes, proteins, and metabolites. Understanding how the dynamics of gene networks give rise to cellular functions is a principal challenge in biology, and identifying their structure is the first step towards controlling them. This knowledge has applications ranging from improving antibiotics and engineering microbes for environmental remediation to creating biologically derived energy sources. In this review, we discuss several methods for identifying gene networks.
16.
Thorsten Twellmann Anke Meyer-Baese Oliver Lange Simon Foo Tim W. Nattkemper 《Engineering Applications of Artificial Intelligence》2008,21(2):129-140
Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) has become an important tool in breast cancer diagnosis, but evaluating multitemporal 3D image data poses new challenges for human observers. To aid the image analysis process, we apply supervised and unsupervised pattern recognition techniques to compute enhanced visualizations of suspicious lesions in breast MRI data. These techniques represent an important component of future sophisticated computer-aided diagnosis (CAD) systems and support the visual exploration of spatial and temporal features of DCE-MRI data from patients with a confirmed lesion diagnosis. By taking into account the heterogeneity of cancerous tissue, these techniques reveal signals with malignant, benign, and normal kinetics. They also provide a regional subclassification of pathological breast tissue, which is the basis for pseudo-color presentations of the image data. Intelligent medical systems are expected to have substantial implications for healthcare policy by contributing to the diagnosis of indeterminate breast lesions through non-invasive imaging.
17.
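Digital meridian instruments, TCM health scales, and four-diagnostic-methods instruments are auxiliary diagnostic tools commonly used in traditional Chinese medicine (TCM) clinical practice, and they provide a large amount of TCM clinical data. Imbalanced data distributions and multiple diagnostic labels for the same case are common in clinical data. Using sub-health data as an example, we explore machine learning classification methods for imbalanced data; using kidney disease as an example, we study a hybrid classification model that integrates the three auxiliary diagnostic tools; and using cardiovascular disease, dyslipidemia, and diseases involving elevated uric acid as examples, we explore classification methods for multi-label data. All experiments achieved good classification performance, and the selected features are consistent with medical theory and have clinical guiding significance.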
18.
19.
Pirbonyeh Amin Rezaie Vahideh Parvin Hamid Nejatian Samad Mehrabi Mehdi 《Pattern Analysis & Applications》2019,22(3):1149-1160
The paper proposes a linear unsupervised transfer learning (LUTL) method. To this end, a cost function is introduced. In the cost function of the proposed LUTL,...
20.
Juan I. González Hidalgo Bruno I. F. Maciel Roberto S. M. Barros 《Computational Intelligence》2019,35(4):670-692
Processing data streams imposes demands that do not exist in static environments. In online learning, the probability distribution of the data can often change over time (concept drift). The prequential assessment methodology is commonly used to evaluate the performance of classifiers on data streams with stationary and non-stationary distributions. It is based on the premise that the purpose of statistical inference is to make sequential probability forecasts for future observations, rather than to express information about past accuracy. This article empirically evaluates the prequential methodology, considering its three common strategies for updating the prediction model, namely Basic Window, Sliding Window, and Fading Factors. Specifically, it aims to identify which of these variations is the most accurate for the experimental evaluation of past results in scenarios where concept drift occurs, with greater interest in the accuracy observed over the entire data flow. The prequential accuracy of the three variations and the real accuracy obtained in the learning process on each dataset form the basis of this evaluation. The experimental results suggest that Prequential with the Sliding Window variation is the best alternative.
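A minimal sketch of how the three prequential variants turn a stream of test-then-train outcomes into an accuracy curve. Here `correct` is assumed to hold 1 when the model predicted an example correctly before training on it and 0 otherwise; "basic" is read as a cumulative (landmark) window over all examples so far, "sliding" keeps only the last `window` outcomes, and "fading" uses the recursive form S_i = c_i + α·S_{i-1}, N_i = 1 + α·N_{i-1}. The function name and parameter defaults are hypothetical and not taken from the article.

```python
from collections import deque


def prequential_accuracy(correct, mode="basic", window=100, alpha=0.99):
    """Return the prequential accuracy after each example (sketch)."""
    curve = []
    if mode == "basic":
        # Cumulative accuracy over every example seen so far.
        hits = 0
        for i, c in enumerate(correct, 1):
            hits += c
            curve.append(hits / i)
    elif mode == "sliding":
        # Accuracy over the most recent `window` examples only.
        buf, total = deque(), 0
        for c in correct:
            buf.append(c)
            total += c
            if len(buf) > window:
                total -= buf.popleft()
            curve.append(total / len(buf))
    elif mode == "fading":
        # Exponentially down-weight older outcomes with factor alpha.
        s = n = 0.0
        for c in correct:
            s = c + alpha * s
            n = 1.0 + alpha * n
            curve.append(s / n)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return curve
```

With 50 correct predictions followed by 50 errors (an abrupt drift), the basic curve ends at 0.5, while a sliding window of 10 ends at 0.0 and a fading factor of 0.9 ends close to 0. This shows why the cumulative variant is slow to reflect drift, whereas the other two track the current error rate.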