Similar literature
 Found 20 similar articles; search took 31 ms
1.
Traditional semi‐supervised clustering uses only limited user supervision in the form of instance seeds for clusters and pairwise instance constraints to aid unsupervised clustering. However, user supervision can also be provided in alternative forms for document clustering, such as labeling a feature by indicating whether it discriminates among clusters. This article fills this gap by enhancing traditional semi‐supervised clustering with feature supervision, which asks the user to label discriminating features while defining (labeling) the instance seeds or pairwise instance constraints. Various types of semi‐supervised clustering algorithms were explored with feature supervision. Our experimental results on several real‐world data sets demonstrate that augmenting the instance‐level supervision with feature‐level supervision can significantly improve document clustering performance.

2.
Software defect prediction is an important decision support activity in software quality assurance. The limited number of labelled modules usually makes the prediction difficult, and the class‐imbalance characteristic of software defect data negatively influences classifier decisions. Semi‐supervised learning can build high‐performance classifiers by using a large number of unlabelled modules together with the labelled modules. Ensemble learning achieves a better prediction capability for class‐imbalance data by using a series of weak classifiers to reduce the bias generated by the majority class. In this paper, we propose a new semi‐supervised software defect prediction approach, non‐negative sparse‐based SemiBoost learning. The approach is capable of exploiting both labelled and unlabelled data and is formulated in a boosting framework. To enhance the prediction ability, we design a flexible non‐negative sparse similarity matrix, which can fully exploit the similarity of historical data by incorporating the non‐negativity constraint into sparse learning to better learn the latent clustering relationship among software modules. The widely used datasets from NASA projects are employed as test data to evaluate the performance of all compared methods. Experimental results show that non‐negative sparse‐based SemiBoost learning outperforms several representative state‐of‐the‐art semi‐supervised software defect prediction methods. Copyright © 2016 John Wiley & Sons, Ltd.

3.
Semi-supervised model-based document clustering: A comparative study   Total citations: 4, self-citations: 0, citations by others: 4
Semi-supervised learning has become an attractive methodology for improving classification models and is often viewed as using unlabeled data to aid supervised learning. However, it can also be viewed as using labeled data to help clustering, namely, semi-supervised clustering. Viewing semi-supervised learning from a clustering angle is useful in practical situations when the set of labels available in labeled data is not complete, i.e., unlabeled data contain new classes that are not present in labeled data. This paper analyzes several multinomial model-based semi-supervised document clustering methods under a principled model-based clustering framework. The framework naturally leads to a deterministic annealing extension of existing semi-supervised clustering approaches. We compare three (slightly) different semi-supervised approaches for clustering documents: Seeded damnl, Constrained damnl, and Feedback-based damnl, where damnl stands for multinomial model-based deterministic annealing algorithm. The first two are extensions of the seeded k-means and constrained k-means algorithms studied by Basu et al. (2002); the last one is motivated by Cohn et al. (2003). Through empirical experiments on text datasets, we show that: (a) deterministic annealing can often significantly improve the performance of semi-supervised clustering; (b) the constrained approach is the best when available labels are complete whereas the feedback-based approach excels when available labels are incomplete. Editor: Andrew Moore
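The difference between the seeded and constrained k-means variants mentioned in this abstract can be illustrated with a minimal sketch. It assumes plain Euclidean k-means rather than the multinomial deterministic annealing (damnl) models the paper studies, and the function name `semi_supervised_kmeans` and its parameters are hypothetical. In the seeded variant the labeled seeds only initialize the centroids; in the constrained variant the seed labels also stay fixed during every assignment step.

```python
import math


def _mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def _nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def semi_supervised_kmeans(points, seeds, k, constrained=False, iters=20):
    """Illustrative sketch only, not the paper's damnl algorithms.

    seeds maps point index -> cluster id in range(k); each cluster is
    assumed to have at least one seed.
    """
    # Both variants initialize each centroid from the seeds of that cluster.
    centroids = [
        _mean([points[i] for i, c in seeds.items() if c == j]) for j in range(k)
    ]
    labels = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            if constrained and i in seeds:
                labels[i] = seeds[i]  # constrained: seed labels never change
            else:
                labels[i] = _nearest(p, centroids)
        for j in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = _mean(members)
    return labels
```

With a misleading seed (for example, point 1 labeled as cluster 1 although it lies next to point 0), the seeded variant is free to reassign that point, while the constrained variant keeps the user-provided label.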

4.
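付治, 王红军, 李天瑞, 滕飞, 张继. 《软件学报》 (Journal of Software), 2020, 31(4): 981-990
Clustering is a research hotspot in machine learning, and weakly supervised learning is an important research direction within semi-supervised learning with a wide range of applications. In studying clustering and weakly supervised learning, this paper proposes a weakly supervised learning framework based on k labeled samples. The framework first uses clustering and clustering confidence to expand the set of labeled samples. Second, the energy function of the restricted Boltzmann machine is modified, yielding a restricted Boltzmann machine learning model based on k labeled samples. Finally, inference for the model is derived and the corresponding algorithms are designed. To evaluate the framework and the model, comparative experiments were run on public datasets; the results show that the weakly supervised learning framework based on k labeled samples performs well.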

5.
6.
Composite kernels for semi-supervised clustering   Total citations: 3, self-citations: 2, citations by others: 1
A critical problem related to kernel-based methods is how to select optimal kernels. A kernel function must conform to the learning target in order to obtain meaningful results. While solutions to the problem of estimating optimal kernel functions and corresponding parameters have been proposed in a supervised setting, it remains a challenge when no labeled data are available, and all we have is a set of pairwise must-link and cannot-link constraints. In this paper, we address the problem of optimizing the kernel function using pairwise constraints for semi-supervised clustering. We propose a new optimization criterion for automatically estimating the optimal parameters of composite Gaussian kernels, directly from the data and given constraints. We combine our proposal with a semi-supervised kernel-based algorithm to demonstrate experimentally the effectiveness of our approach. The results show that our method is very effective for kernel-based semi-supervised clustering.

7.
Learning to rank is a supervised learning problem that aims to construct a ranking model for the given data. The most common application of learning to rank is to rank a set of documents against a query. In this work, we focus on point‐wise learning to rank, where the model learns the ranking values. Multivariate adaptive regression splines (MARS) and conic multivariate adaptive regression splines (CMARS) are supervised learning techniques that have been proven to provide successful results on various prediction problems. In this article, we investigate the effectiveness of MARS and CMARS for the point‐wise learning to rank problem. The prediction performance is analyzed in comparison to three well‐known supervised learning methods, artificial neural network (ANN), support vector machine, and random forest, on two datasets under a variety of metrics including accuracy, stability, and robustness. The experimental results show that MARS and ANN are effective methods for the learning to rank problem and provide promising results.

8.
This paper is concerned with the variance‐constrained state estimation problem for a class of networked multi‐rate systems (NMSs) with network‐induced probabilistic sensor failures and measurement quantization. The stochastic characteristics of the sensor failures are governed by mutually independent random variables over the interval [0,1]. By applying the lifting technique, an augmented system model is established to facilitate the state estimation of the underlying NMSs. With the aid of the stochastic analysis approach, sufficient conditions are derived under which the exponential mean‐square stability of the augmented system is guaranteed, the prescribed H∞ performance constraint is achieved, and the individual variance constraint on the steady‐state estimation error is satisfied. Based on the derived conditions, the addressed variance‐constrained state estimation problem of NMSs is recast as a convex optimization one that can be solved via the semi‐definite program method. Furthermore, the explicit expression of the desired estimator gains is obtained by means of the feasibility of certain matrix inequalities. Two additional optimization problems are considered with respect to the H∞ performance index and the weighted error variances. Finally, a simulation example is utilized to illustrate the effectiveness of the proposed state estimation method. Copyright © 2016 John Wiley & Sons, Ltd.

9.
A bipartite graph is a powerful abstraction for modeling relationships between two collections. Visualizations of bipartite graphs allow users to understand the mutual relationships between the elements in the two collections, e.g., by identifying clusters of similarly connected elements. However, commonly‐used visual representations do not scale for the analysis of large bipartite graphs containing tens of millions of vertices, often resorting to an a‐priori clustering of the sets. To address this issue, we present the Who's‐Active‐On‐What‐Visualization (WAOW‐Vis) that allows for multiscale exploration of a bipartite social‐network without imposing an a‐priori clustering. To this end, we propose to treat a bipartite graph as a high‐dimensional space and we create the WAOW‐Vis adapting the multiscale dimensionality‐reduction technique HSNE. The application of HSNE to bipartite graphs requires several modifications that form the contributions of this work. Given the nature of the problem, a set‐based similarity is proposed. For efficient and scalable computations, we use compressed bitmaps to represent sets and we present a novel space partitioning tree to efficiently compute similarities: the Sets Intersection Tree. Finally, we validate WAOW‐Vis on several datasets connecting Twitter‐users and ‐streams in different domains: news, computer science and politics. We show how WAOW‐Vis is particularly effective in identifying hierarchies of communities among social‐media users.
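To make the idea of a set-based similarity over bitmaps concrete, here is a minimal sketch. The abstract does not say which similarity is used, so Jaccard similarity is an assumption, and ordinary Python integers stand in for the compressed bitmaps; the helper names `to_bitmap` and `jaccard` are hypothetical. The Sets Intersection Tree itself is not covered.

```python
def to_bitmap(members):
    """Encode a set of non-negative integer ids as a Python int bitmap.

    Assumption: an uncompressed bitmap is used here for brevity; the paper
    describes compressed bitmaps.
    """
    bits = 0
    for m in members:
        bits |= 1 << m
    return bits


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two bitmap-encoded sets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # convention chosen here: two empty sets are identical
    return bin(a & b).count("1") / union
```

Representing each element by the bitmap of its neighbours in the other collection turns the similarity computation into bitwise AND/OR plus a population count, which is why bitmap encodings suit this kind of workload.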

10.
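In open network environments, software is vulnerable to attacks that cause software faults, so security testing is required. To address the high overhead and complexity of unsupervised testing methods, a software security testing method based on a semi-supervised adaptive learning algorithm is proposed. First, the fuzzy measure principle is used to build a semi-supervised learning mathematical model for software security testing, and the array features of security faults in the software are analyzed. Then, software reliability is measured using the entropy feature distribution of software faults, and a software reliability cloud decision model is built for the open network environment to perform security testing and fault localization. Finally, performance is verified through simulation experiments. The results show that the method localizes software faults with high accuracy and good real-time testing performance, helping ensure safe and reliable software operation.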

11.
Separating text lines in unconstrained handwritten documents remains a challenge because the handwritten text lines are often non-uniformly skewed and curved, and the space between lines is not obvious. In this paper, we propose a novel text line segmentation algorithm based on minimal spanning tree (MST) clustering with distance metric learning. Given a distance metric, the connected components (CCs) of the document image are grouped into a tree structure, from which text lines are extracted by dynamically cutting the edges using a new hypervolume reduction criterion and a straightness measure. By learning the distance metric in a supervised manner on a dataset of pairs of CCs, the proposed algorithm is made robust enough to handle various documents with multi-skewed and curved text lines. In experiments on a database of 803 unconstrained handwritten Chinese document images containing a total of 8,169 lines, the proposed algorithm achieved a correct line detection rate of 98.02% and compared favorably to other competitive algorithms.
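The general shape of MST clustering, building a spanning tree over the components and then cutting edges to obtain groups, can be sketched as follows. This is only an illustration under simplifying assumptions: it uses plain Euclidean distance instead of a learned metric, and a fixed length threshold `max_edge` instead of the paper's hypervolume reduction criterion and straightness measure. The function names are hypothetical.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the MST as a list of (weight, u, v) edges.
    """
    n = len(points)
    in_tree = [False] * n
    # best[v] = (distance from v to the tree so far, tree vertex achieving it)
    best = [(math.inf, -1)] * n
    if n:
        best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i][0])
        in_tree[u] = True
        if best[u][1] >= 0:
            edges.append((best[u][0], best[u][1], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v][0]:
                    best[v] = (d, u)
    return edges


def mst_clusters(points, max_edge):
    """Cut MST edges longer than max_edge and label the remaining components.

    Assumption: a fixed threshold replaces the paper's cutting criterion.
    """
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for w, u, v in mst_edges(points):
        if w <= max_edge:
            parent[find(u)] = find(v)
    ids = {}
    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]
```

In a text line setting, each point would stand for one connected component (for example its centroid), and each resulting cluster for one candidate text line.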

12.
The understanding of core concepts and processes of science in solving problems is important to successful learning in biology. We have designed and developed a Web‐based, self‐directed tutorial program, SOLVEIT, that provides various scaffolds (e.g., prompts, expert models, visual guidance) to help college students enhance their skills and abilities in solving problems in science. An initial version of SOLVEIT was used in this study. This paper details the features of SOLVEIT that are contextualized within the biological domains of evolution and ecology. A qualitative case study was conducted to evaluate the usability of the program. Selected students were recruited from an introductory biology course at a large public university in the south‐eastern United States. Data for this study were collected through the SOLVEIT database and semi‐structured interviews. The findings of this study demonstrate the potential of the program for improving students' problem solving in biology. Suggestions for the use of SOLVEIT and its further improvement and development are discussed, along with suggestions for future research. This study also provides more general guidance for researchers and practitioners who are interested in the design, development and evaluation of Web‐based tutorial programs in science education.

13.
Recently, automatic 3D caricature generation has attracted much attention from both the research community and the game industry. Machine learning has been proven effective in the automatic generation of caricatures. However, the lack of 3D caricature samples makes it challenging to train a good model. This paper addresses this problem in two steps. First, the training set is enlarged by reconstructing 3D caricatures. We reconstruct 3D caricatures based on some 2D caricature samples with a Principal Component Analysis (PCA)‐based method. Second, between the 2D real faces and the enlarged 3D caricatures, a regressive model is learnt by the semi‐supervised manifold regularization (MR) method. We then predict 3D caricatures for 2D real faces with the learnt model. The experiments show that our novel approach synthesizes the 3D caricature more effectively than traditional methods. Moreover, our system has been applied successfully in a massive multi‐user educational game to provide human‐like avatars.

14.
Most of the methods that generate decision trees for a specific problem use the examples of data instances in the decision tree generation process. This article proposes a method called RBDT‐1 (rule‐based decision tree) for learning a decision tree from a set of decision rules that cover the data instances rather than from the data instances themselves. The goal is to create on demand a short and accurate decision tree from a stable or dynamically changing set of rules. The rules could be generated by an expert, by an inductive rule learning program that induces decision rules from the examples of decision instances such as AQ‐type rule induction programs, or extracted from a tree generated by another method, such as ID3 or C4.5. In terms of tree complexity (number of nodes and leaves in the decision tree), RBDT‐1 compares favorably with AQDT‐1 and AQDT‐2, which are methods that create decision trees from rules. RBDT‐1 also compares favorably with ID3 and is as effective as C4.5; both ID3 and C4.5 are well‐known methods that generate decision trees from data examples. Experiments show that the classification accuracies of the decision trees produced by all methods under comparison are indistinguishable.

15.
In this work, we propose an iterative learning control scheme with a novel barrier composite energy function approach to deal with position constrained robotic manipulators with uncertainties under alignment condition. The classical assumption of initial resetting condition is removed. Through rigorous analysis, we show that uniform convergence is guaranteed for joint position and velocity tracking error. By introducing a novel tan‐type barrier Lyapunov function into barrier composite energy function and keeping it bounded in closed‐loop analysis, the constraint on joint position vector will not be violated. A simulation study has further demonstrated the efficacy of the proposed scheme. Copyright © 2013 John Wiley & Sons, Ltd.

16.
In this paper, we propose automatic image segmentation using constraint learning and propagation. Recently, kernel learning is receiving much attention because a learned kernel can fit the given data better than a predefined kernel. To effectively learn the constraints generated by initial seeds for image segmentation, we employ kernel propagation (KP) based on kernel learning. The key idea of KP is first to learn a small-sized seed-kernel matrix and then propagate it into a large-sized full-kernel matrix. By applying KP to automatic image segmentation, we design a novel segmentation method to achieve high performance. First, we generate pairwise constraints, i.e., must-link and cannot-link, from initially selected seeds to make the seed-kernel matrix. To select the optimal initial seeds, we utilize global k-means clustering (GKM) and self-tuning spectral clustering (SSC). Next, we propagate the seed-kernel matrix into the full-kernel matrix of the entire image, and thus image segmentation results are obtained. We test our method on the Berkeley segmentation database, and the experimental results demonstrate that the proposed method is very effective in automatic image segmentation.
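The first step described here, turning labeled seeds into must-link and cannot-link pairs, can be sketched in a few lines. This assumes the usual convention that two seeds with the same label are must-linked and two seeds with different labels are cannot-linked; the function name `constraints_from_seeds` is hypothetical, and how the paper then builds and propagates the seed-kernel matrix is not shown.

```python
from itertools import combinations


def constraints_from_seeds(seed_labels):
    """Derive pairwise constraints from labeled seeds.

    seed_labels maps a seed index to its label. Returns two lists of
    index pairs (i, j) with i < j: must-link and cannot-link.
    """
    must_link, cannot_link = [], []
    for (i, li), (j, lj) in combinations(sorted(seed_labels.items()), 2):
        if li == lj:
            must_link.append((i, j))  # same label: keep in one segment
        else:
            cannot_link.append((i, j))  # different labels: keep apart
    return must_link, cannot_link
```

Note that s seeds yield s(s-1)/2 constraints, which is why a small seed set already determines a dense seed-level constraint structure.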

17.
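WHISK is a semi-automatic information extraction (IE) system that can extract information from both structured and semi-structured Web text using the extraction rules it generates. However, during rule learning it cannot guarantee that rules are extended in an optimal way, and generating the rule set takes a long time. To address these problems, this paper proposes using a genetic algorithm to improve WHISK's supervised learning algorithm and generates the rule set with a removal method. Experimental results show that the method improves both efficiency and recall.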

18.
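To address the excessive computational complexity of traditional Mashup service recommendation algorithms that rely on keyword aggregation search and network construction, a seeded, guided hierarchical clustering algorithm for Mashup service recommendation based on semantic tags is proposed. First, to improve the convergence accuracy and running efficiency of the clustering algorithm, and so meet the need for simpler algorithms in large-scale data search, data preprocessing is applied and easily obtained, representative sample data are seeded to guide the clustering; this prevents the whole clustering from failing because the top-level split of the hierarchical clustering fails. Second, the improved clustering algorithm is combined with a real Mashup service database to design the seeded, guided hierarchical clustering Mashup service recommendation algorithm. Finally, simulation comparisons show that the semantics-based, seeded semi-supervised hierarchical clustering Mashup service recommendation algorithm is more accurate than the algorithms it is compared with, which verifies the effectiveness of the proposed algorithm.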

19.
The problem of identifying the conditions under which semantic or behavioural dependences arise between different program statements has interesting applications in various areas such as program understanding, software maintenance, software audits and software testing. We present an extension to the program dependence graph (PDG), called the dependence condition graph (DCG), that enables identifying the conditions for dependence between program points. We show that these conditions are not only correct with respect to the program's semantics, but also more precise than identified by other known techniques. We also present evidence that the DCG is a practical representation and can be built for large programs, and sketch many different applications of the DCG.

20.
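周丽娜, 吕萌. 《计算机应用》 (Journal of Computer Applications), 2011, 31(2): 416-419
Magnetoencephalography (MEG) signals, a new type of input signal for brain-computer interfaces (BCI), carry pattern information about the direction of hand movement. Because semi-supervised clustering has the advantage of incorporating prior knowledge from training data, a semi-supervised fuzzy clustering algorithm based on training centers is proposed. The algorithm consists of dimensionality reduction followed by an improved semi-supervised clustering step. Principal component analysis and linear discriminant analysis reduce the high-dimensional data to a low dimension. The improved semi-supervised clustering first applies fuzzy clustering to the training data and then weights the resulting cluster centers into the clustering of the test data, making the test-data cluster centers more robust. The results show that the algorithm achieves a relatively high recognition rate: the average recognition rate reaches 55.1%, better than the best result of 46.9% in BCI Competition IV.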
