Similar literature
 Found 20 similar documents; search took 31 ms
1.
The analysis of high-dimensional data in modern applications such as neuroscience, text mining, spectral analysis, and chemometrics naturally calls for tensor decomposition methods. Tucker decompositions make it possible to extract hidden factors (component matrices) with a different dimension in each mode and to investigate interactions among the modalities. Alternating least squares (ALS) algorithms have proved effective and efficient for most tensor decompositions, especially Tucker with orthogonality constraints. For nonnegative Tucker decomposition (NTD), however, standard ALS algorithms converge unstably, are computationally expensive on large-scale problems because of matrix inversion, and often return suboptimal solutions. They are also quite sensitive to noise and can be relatively slow in the special case when the data are nearly collinear. In this paper, we propose a new algorithm for nonnegative Tucker decomposition based on constrained minimization of a set of local cost functions and hierarchical alternating least squares (HALS). The resulting NTD-HALS algorithm updates components sequentially, which avoids matrix inversion and makes it suitable for large-scale problems. The proposed algorithm is also regularized with additional constraint terms such as sparseness, orthogonality, smoothness, and, in particular, discriminant constraints. Extensive experiments confirm the validity and higher performance of the developed algorithm in comparison with other existing algorithms.
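The idea of updating components one at a time, so that each update divides by a scalar instead of inverting a matrix, can be illustrated on the simpler matrix case. The sketch below is a HALS-style update for plain NMF (V ≈ WH), not the NTD-HALS algorithm of the abstract; the function names, the random initialization, and the `eps` floor are assumptions made for illustration only.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def squared_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def hals_sweep_rows(target, gram, factor, eps):
    """Update the rows of `factor` one after another, in place.

    Each row update is an exact least-squares step clipped at `eps`;
    it divides by the scalar gram[k][k], so no matrix inverse is needed.
    """
    rank = len(factor)
    for k in range(rank):
        for j in range(len(factor[0])):
            correction = target[k][j] - sum(gram[k][l] * factor[l][j] for l in range(rank))
            factor[k][j] = max(eps, factor[k][j] + correction / gram[k][k])


def nmf_hals(v, rank, iterations=100, seed=0, eps=1e-12):
    """Hypothetical helper: HALS-style NMF of a nonnegative matrix v."""
    rng = random.Random(seed)
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in v]
    h = [[rng.random() + 0.1 for _ in v[0]] for _ in range(rank)]
    for _ in range(iterations):
        # Rows of H: target W^T V, Gram matrix W^T W.
        wt = transpose(w)
        hals_sweep_rows(matmul(wt, v), matmul(wt, w), h, eps)
        # Columns of W, handled as rows of W^T: target H V^T, Gram matrix H H^T.
        wt = transpose(w)
        hals_sweep_rows(matmul(h, transpose(v)), matmul(h, transpose(h)), wt, eps)
        w = transpose(wt)
    return w, h
```

Calling `nmf_hals(v, 2)` on a small nonnegative matrix returns nonnegative factors whose reconstruction error does not exceed that of the random starting point, since every row update is an exact constrained minimization of the same objective.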

2.
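Existing nonnegative matrix factorization methods both ignore the non-local structure of the data and struggle to handle noise and outliers effectively. To address these problems, a new robust structure-regularized nonnegative matrix factorization algorithm for clustering is proposed. The proposed algorithm constructs a nearest-neighbor graph and a maximum-entropy graph to describe the local and non-local structure of the data, respectively, and uses an L2,1-norm cost function to deal with noise, thereby learning a robust and discriminative representation. An iterative optimization algorithm is given to solve for the two nonnegative factors, and its convergence is established both theoretically and experimentally. Clustering experiments on seven image datasets show that the proposed algorithm outperforms other mainstream methods in both noise-free and noisy settings.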
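Why an L2,1-norm cost is less sensitive to outliers than a squared Frobenius cost can be seen with a few lines of arithmetic. The sketch below assumes the convention in which each column of the residual matrix is one sample, so the L2,1 norm is the sum of the Euclidean norms of the columns; the function names are illustrative and are not taken from the paper.

```python
import math


def l21_norm(matrix):
    """Sum of the Euclidean norms of the columns (assumed: one sample per column)."""
    return sum(math.sqrt(sum(x * x for x in column)) for column in zip(*matrix))


def squared_frobenius(matrix):
    """Sum of squared entries, the cost used by standard NMF."""
    return sum(x * x for row in matrix for x in row)


# Residual with one non-zero sample, and the same residual blown up by an outlier.
residual = [[3.0, 0.0], [4.0, 0.0]]
outlier = [[30.0, 0.0], [40.0, 0.0]]

# A tenfold larger residual costs ten times more under L2,1
# but a hundred times more under the squared Frobenius norm.
l21_growth = l21_norm(outlier) / l21_norm(residual)
frobenius_growth = squared_frobenius(outlier) / squared_frobenius(residual)
```

Because the per-sample penalty grows linearly rather than quadratically, a single corrupted sample pulls the factors less strongly under the L2,1 cost.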

3.
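胡学考, 孙福明, 李豪杰. 《计算机科学》 (Computer Science), 2015, 42(7): 280-284, 304
Matrix factorization is widely used because it makes large-scale data processing feasible. Nonnegative matrix factorization (NMF) is a factorization carried out under the constraint that the matrix elements are nonnegative. Using the label information of a small number of known samples together with a large number of unlabeled samples, and imposing a sparsity constraint, the authors construct a new algorithm: semi-supervised nonnegative matrix factorization with sparsity constraints. Effective update rules are derived and the convergence of the algorithm is proved. The algorithm is validated on common face databases, and the experimental results show that CNMFS achieves better sparsity and clustering accuracy than NMF, CNMF, and other algorithms.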

4.
In this paper, we propose a new semi-supervised co-clustering algorithm, Orthogonal Semi-Supervised Nonnegative Matrix Factorization (OSS-NMF), for document clustering. In this approach, clustering is carried out by incorporating both prior domain knowledge of data points (documents), in the form of pairwise constraints, and category knowledge of features (words) into the NMF co-clustering framework. Under this framework, the clustering problem is formulated as finding a local minimizer of the objective function, taking the dual prior knowledge into account. The update rules are derived, and an iterative algorithm is designed for the co-clustering process. Theoretically, we prove the correctness and convergence of our algorithm and demonstrate its mathematical rigor. Our experimental evaluations show that the proposed document clustering model achieves remarkable performance improvements with those constraints.

5.
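Within the framework of alternating nonnegative least squares, a nonmonotone adaptive BB (Barzilai-Borwein) step-size algorithm for nonnegative matrix factorization is proposed. Although the step size is not obtained by a line search, it satisfies a nonmonotone line search condition, which guarantees the global convergence of the algorithm. The algorithm also uses an adaptive BB step size and the Lipschitz constant of the gradient to speed up convergence. Finally, convergence is proved theoretically, and numerical experiments and face recognition experiments show that the algorithm is effective and outperforms other algorithms.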
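For readers unfamiliar with Barzilai-Borwein steps, the sketch below shows the basic BB step size, step = (s·s)/(s·y) with s the change in the iterate and y the change in the gradient, inside a projected gradient loop on a tiny nonnegativity-constrained quadratic. It is only an illustration under stated assumptions: the test problem, the names `A_DIAG`, `B`, and `projected_bb`, and the initial step are invented here, and the nonmonotone safeguard and adaptive step selection described in the abstract are deliberately left out.

```python
# Toy problem (assumed for illustration): minimize 0.5 * x^T A x - b^T x subject to x >= 0,
# with A diagonal. The constrained minimizer is x = [1, 0].
A_DIAG = [2.0, 10.0]
B = [2.0, -5.0]


def gradient(x):
    return [a * xi - bi for a, xi, bi in zip(A_DIAG, x, B)]


def project(x):
    """Projection onto the nonnegative orthant."""
    return [max(0.0, xi) for xi in x]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def projected_bb(x, iterations=20, step=0.1):
    """Projected gradient with a Barzilai-Borwein step; no line search safeguard."""
    g = gradient(x)
    for _ in range(iterations):
        x_new = project([xi - step * gi for xi, gi in zip(x, g)])
        g_new = gradient(x_new)
        s = [a - b for a, b in zip(x_new, x)]   # change in the iterate
        y = [a - b for a, b in zip(g_new, g)]   # change in the gradient
        curvature = dot(s, y)
        if curvature > 1e-12:                   # keep the old step if curvature is unusable
            step = dot(s, s) / curvature
        x, g = x_new, g_new
    return x


solution = projected_bb([0.0, 0.0])
```

On this toy problem the BB step quickly settles at 0.5, the inverse of the curvature along the only free coordinate, which is the behavior that makes BB steps attractive compared with a fixed step.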

6.
Many of the real-world clustering problems arising in data mining applications are heterogeneous in nature. Heterogeneous co-clustering involves simultaneous clustering of objects of two or more data types. While pairwise co-clustering of two data types has been well studied in the literature, research on high-order heterogeneous co-clustering is still limited. In this paper, we propose a graph-theoretical framework for addressing star-structured co-clustering problems, in which a central data type is connected to all the other data types. Partitioning this graph leads to co-clustering of all the data types under the constraints of the star structure. Although the graph partitioning approach has been adopted before to address star-structured heterogeneous complex problems, the main contribution of this work lies in an efficient algorithm that we propose for partitioning the star-structured graph. Computationally, our algorithm is very quick, as it requires only a simple solution to a sparse system of overdetermined linear equations. Theoretical analysis and extensive experiments performed on toy and real datasets demonstrate the quality, efficiency, and stability of the proposed algorithm.

7.
Non-negative matrix factorization for semi-supervised data clustering   Cited by: 9 (self-citations: 6, citations by others: 3)
Traditional clustering algorithms are inapplicable to many real-world problems where limited knowledge from domain experts is available. Incorporating the domain knowledge can guide a clustering algorithm, consequently improving the quality of clustering. In this paper, we propose SS-NMF: a semi-supervised non-negative matrix factorization framework for data clustering. In SS-NMF, users are able to provide supervision for clustering in terms of pairwise constraints on a few data objects specifying whether they “must” or “cannot” be clustered together. Through an iterative algorithm, we perform symmetric tri-factorization of the data similarity matrix to infer the clusters. Theoretically, we show the correctness and convergence of SS-NMF. Moreover, we show that SS-NMF provides a general framework for semi-supervised clustering. Existing approaches can be considered as special cases of it. Through extensive experiments conducted on publicly available datasets, we demonstrate the superior performance of SS-NMF for clustering.
Ming Dong

8.
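Co-clustering is a class of algorithms that cluster the rows and the columns of a data matrix simultaneously. This paper introduces two-level weighting into co-clustering and proposes a two-level subspace weighting co-clustering algorithm (TLWCC). TLWCC assigns one layer of weights to the co-clusters and another layer of weights to the rows and columns, and computes the three sets of weights (co-cluster, row, and column) automatically during the iterations. TLWCC considers the distance of each co-cluster, row, and column from its corresponding center: the larger the distance, the noisier the item is taken to be and the smaller the weight it receives; conversely, the weaker the noise, the larger the weight. By giving noisy information small weights, TLWCC effectively reduces the interference it causes and improves clustering quality. Four groups of experiments show TLWCC's ability to identify noisy information, the influence of parameter choice on the clustering results, and the algorithm's clustering performance and running time.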

9.
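Collaborative filtering has been used successfully to provide users with personalized products and services, but it suffers from data sparsity and cold-start problems. One solution is to incorporate auxiliary information; another is to learn knowledge from related domains. Taking both into account, a cross-domain recommendation algorithm with deep fusion of auxiliary information, CICDR, is proposed, which integrates collective matrix factorization and deep transfer learning. The algorithm models the source and target domains with Semi-SDAE and matrix factorization (MF), learns effective feature vectors from the rating information and the auxiliary information, and uses users' implicit feedback to make more accurate recommendations. In this way, the user and item latent factors learned in the two domains retain more semantic information for recommendation. Incomplete orthogonal nonnegative matrix tri-factorization (IONMTF) produces the common latent factors that bridge the two related domains, which alleviates the cold-start and data sparsity problems in the target domain. Comparisons with four classical algorithms on three real datasets verify the effectiveness of the proposed algorithm, which further improves recommendation accuracy and user satisfaction.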

10.
Mixing matrix estimation in instantaneous blind source separation (BSS) can be performed by exploiting the sparsity and disjoint orthogonality of source signals. As a result, approaches for estimating the unknown mixing process typically employ clustering algorithms on the mixtures in a parametric domain, where the signals can be sparsely represented. In this paper, we propose two algorithms to perform discriminative clustering of the mixture signals for estimating the mixing matrix. For the case of overdetermined BSS, we develop an algorithm to perform linear discriminant analysis based on similarity measures and combine it with K-hyperline clustering. Furthermore, we propose to perform discriminative clustering in a high-dimensional feature space obtained by an implicit mapping, using the kernel trick, for the case of underdetermined source separation. Using simulations on synthetic data, we demonstrate the improvements in mixing matrix estimation performance obtained using the proposed algorithms in comparison to other clustering methods. Finally, we perform mixing matrix estimation from speech mixtures by clustering single-source points in the time-frequency domain, and show that the proposed algorithms achieve a higher signal-to-interference ratio than other baseline algorithms.

11.
Nonnegative matrix factorization (NMF) is a data analysis technique used in a great variety of applications such as text mining, image processing, hyperspectral data analysis, computational biology, and clustering. In this letter, we consider two well-known algorithms designed to solve NMF problems: the multiplicative updates of Lee and Seung and the hierarchical alternating least squares of Cichocki et al. We propose a simple way to significantly accelerate these schemes, based on a careful analysis of the computational cost needed at each iteration, while preserving their convergence properties. This acceleration technique can also be applied to other algorithms, which we illustrate on the projected gradient method of Lin. The efficiency of the accelerated algorithms is empirically demonstrated on image and text data sets and compares favorably with a state-of-the-art alternating nonnegative least squares algorithm.
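As a reference point for the first of the two baseline schemes, here is a minimal sketch of the Lee and Seung multiplicative updates for the squared Frobenius cost, H ← H ∘ (WᵀV) / (WᵀWH) and W ← W ∘ (VHᵀ) / (WHHᵀ). It shows the plain, unaccelerated updates only; the acceleration proposed in the letter is not reproduced. The helper names, the random initialization, and the small `eps` added to the denominators are assumptions for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def nmf_multiplicative(v, rank, iterations=200, seed=0, eps=1e-9):
    """Hypothetical helper: plain multiplicative updates for V ≈ WH."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)] for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)] for i in range(n)]
    return w, h
```

Because every factor is only ever multiplied by a nonnegative ratio, nonnegativity is preserved automatically, which is why this scheme needs no projection step.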

12.
Semi-supervised fuzzy co-clustering algorithm for document categorization   Cited by: 1 (self-citations: 1, citations by others: 0)
In this paper, we propose a new semi-supervised fuzzy co-clustering algorithm called SS-FCC for categorization of large web documents. In this new approach, the clustering process is carried out by incorporating some prior domain knowledge of a dataset, in the form of pairwise constraints provided by users, into the fuzzy co-clustering framework. With the help of those constraints, the clustering problem is formulated as the problem of maximizing a competitive agglomeration cost function with fuzzy terms, taking into account the provided domain knowledge. Each constraint specifies whether a pair of objects “must” or “cannot” be clustered together. The update rules for fuzzy memberships are derived, and an iterative algorithm is designed for the soft co-clustering process. Our experimental studies show that the quality of clustering results can be improved significantly with the proposed approach. Simulations on 10 large benchmark datasets demonstrate the strength and potential of SS-FCC in terms of performance evaluation criteria, stability, and operating time, compared with some of the existing semi-supervised algorithms.

13.
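陈露, 张晓霞, 于洪. 《计算机应用》 (Journal of Computer Applications), 2022, 42(3): 671-675
Nonnegative matrix tri-factorization is an important component of latent factor models. Because it decomposes the original data matrix into three mutually constrained latent factor matrices, it is widely used in recommender systems, transfer learning, and other research areas, but so far there has been no work on the interpretability of nonnegative matrix tri-factorization. In view of this, user review text is treated as prior knowledge, and a prior-knowledge-based partially explainable nonnegative matrix tri-factorization (PE-NMTF) algorithm is designed. First, ...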

14.
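Hyperspectral unmixing (HU) methods based on nonnegative matrix factorization (NMF) have attracted attention because a nonnegative hyperspectral imagery (HSI) data matrix can be decomposed into the product of two nonnegative matrices, which correspond to the endmember matrix and the abundance coefficient matrix. Graph-constrained NMF algorithms have been shown to be effective for hyperspectral unmixing because they can capture the geometric properties of HSI. To exploit the geometric structure and the sparsity of the data in the mixing process, a sparse Hessian graph-regularized NMF (SHGNMF) algorithm is proposed. SHGNMF adds both an L1/2 regularizer on the abundance matrix and a Hessian graph regularization term to the NMF model, and uses multiplicative update rules. Experiments on simulated and real data verify the superiority of the proposed SHGNMF algorithm over other NMF algorithms.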

15.
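Most existing network clustering methods target only undirected networks, and the existing clustering methods for directed networks are built on traditional clustering algorithms and therefore have certain limitations. To address this, a directed-network clustering algorithm based on affinity propagation is proposed. The algorithm first uses SimRank as the similarity between nodes and converts the computed values into negative values suited to the affinity propagation algorithm; it then takes the similarity matrix as input and clusters the directed network with affinity propagation, which offers better performance. Experimental results show that the proposed algorithm clusters better than several representative directed-network clustering algorithms.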
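The first stage of this pipeline can be sketched with the standard SimRank recursion, in which two distinct nodes are similar to the extent that their in-neighbors are similar. The abstract does not say how the scores are converted to negative values, so the `to_negative_similarity` function below, which simply shifts each score by -1, is a placeholder assumption and not the authors' transformation; the function names and the `decay` default are likewise illustrative.

```python
def simrank(in_neighbors, decay=0.8, iterations=10):
    """Iterative SimRank on a directed graph.

    `in_neighbors` maps every node to the list of nodes that point to it.
    """
    nodes = sorted(in_neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        updated = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    updated[(a, b)] = 1.0
                    continue
                ia, ib = in_neighbors[a], in_neighbors[b]
                if not ia or not ib:
                    # A node without in-neighbors is similar only to itself.
                    updated[(a, b)] = 0.0
                    continue
                total = sum(sim[(i, j)] for i in ia for j in ib)
                updated[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = updated
    return sim


def to_negative_similarity(sim):
    """Assumed conversion (not from the paper): shift scores from [0, 1] to [-1, 0]."""
    return {pair: value - 1.0 for pair, value in sim.items()}


# Tiny directed graph: c -> a and c -> b, so a and b share their only in-neighbor.
graph = {"a": ["c"], "b": ["c"], "c": []}
scores = simrank(graph)
negative_scores = to_negative_similarity(scores)
```

The resulting dictionary can be arranged into a matrix and handed to any affinity propagation implementation that accepts precomputed similarities.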

16.
Multi-view clustering has attracted much attention recently. Among all clustering approaches, spectral ones have gained much popularity thanks to an elaborated and solid theoretical foundation. A major limitation of spectral clustering based methods is that these methods only provide a non-linear projection of the data, to which an additional step of clustering is required. This can degrade the quality of the final clustering due to various factors such as the initialization process or outliers. To overcome these challenges, this paper presents a constrained version of a recent method called Multiview Spectral Clustering via integrating Nonnegative Embedding and Spectral Embedding. Besides retaining the advantages of this method, our proposed model integrates two types of constraints: (i) a consistent smoothness of the nonnegative embedding over all views and (ii) an orthogonality constraint over the columns of the nonnegative embedding. Experimental results on several real datasets show the superiority of the proposed approach.

17.
In recent years, nonnegative matrix factorization (NMF) has attracted significant attention in image processing, text mining, speech processing, and related fields. Although NMF has been applied successfully in several applications, its direct application to image processing has a few caveats. For example, NMF consumes considerable computational resources when run on large databases. In this paper, we propose two enhanced NMF algorithms for image processing to reduce the computational cost. One is the modified rank-one residue iteration (MRRI) algorithm; the other is the element-wisely residue iteration (ERI) algorithm. We then combine CAPG (an NMF algorithm proposed by Lin), MRRI, and ERI with two-dimensional nonnegative matrix factorization (2DNMF) for image processing. The main difference between NMF and 2DNMF is that the former first aligns images into one-dimensional (1D) vectors and then represents them with a set of 1D bases, while the latter regards images as 2D matrices and represents them with a set of 2D bases. The three combined algorithms are named CAPG-2DNMF, MRRI-2DNMF, and ERI-2DNMF. The computational complexity and convergence analyses of the proposed algorithms are also presented in this paper. Three public databases are used to test the three NMF algorithms and the three combinations; the results show that our proposed MRRI and ERI algorithms improve on the CAPG algorithm, and that MRRI and ERI perform similarly. The three combined algorithms give better image reconstruction quality and shorter running time than their corresponding 1DNMF algorithms under the same compression ratio. Experiments on a real-captured image database lead to similar conclusions.

18.
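The main task of a recommender system is to provide users, accurately and proactively, with the information or services they may be interested in. Collaborative filtering is one of the most widely used recommendation algorithms, but data sparsity often seriously degrades recommendation quality. To address this problem, a collaborative filtering recommendation algorithm based on bipartite graph partitioning co-clustering is proposed. First, users and items are built into a bipartite graph and co-clustered, which maps them into a low-dimensional latent feature space. Second, two similarity measures are improved on the basis of the clustering results, cluster-preference similarity and rating similarity, and the two are combined. Based on the combined similarity, a user-based method and an item-based method are each used to predict the unknown target ratings. Finally, these predictions are fused. Experimental results show that the proposed algorithm performs better than the latest co-clustering collaborative filtering recommendation algorithms.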

19.
Co-clustering treats a data matrix symmetrically, in that a partitioning of rows can induce a partitioning of columns, and vice versa. It has been shown to be advantageous over traditional clustering. However, the computational complexity of most co-clustering algorithms is high, which limits their effectiveness on large datasets. A recently proposed sampling-based matrix decomposition method can achieve linear computational complexity, but the selected rows and columns cannot effectively represent a large sparse dataset, and many unselected rows and columns cannot be mapped to the selected ones because they share no features in common, so its performance is impaired. To address this problem, we propose a fast co-clustering framework based on ranking and sampling, in which only representative samples are selected for co-clustering and the remaining samples can be easily labeled by their neighbors among the clustered samples. Extensive experiments on large text datasets show that our approach is able to use very few samples to achieve, in linear time, results comparable to those of state-of-the-art co-clustering algorithms with nonlinear computational complexity.

20.
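Starting from an analysis of the basic steps of word-document spectral clustering that identifies the root cause of its sensitivity to initial values, a word-document spectral clustering method based on fuzzy harmonic means is proposed. First, the Laplacian matrix used in spectral clustering is processed from the perspective of matrix similarity so that it satisfies the condition for insensitivity to initial values. Then, by introducing the notion of fuzziness, the fuzzy K-harmonic means algorithm replaces the K-means algorithm, so that the clustering result becomes insensitive to initial values. Experimental results show that the proposed method not only makes the clustering result insensitive to initial values but also improves robustness on the data to some extent.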
