Similar Literature
20 similar documents found (search time: 0 ms)
1.
Nonnegative matrix factorization has been widely used in co-clustering tasks, which group data points and features simultaneously. In recent years, several proposed co-clustering algorithms have shown their superiority over traditional one-sided clustering, especially in text clustering and gene expression analysis. Due to the NP-completeness of co-clustering problems, most existing methods relax the orthogonality constraint to nonnegativity, which often deteriorates performance and robustness. In this paper, penalized nonnegative matrix tri-factorization is proposed for co-clustering problems, where three penalty terms are introduced to guarantee the near-orthogonality of the clustering indicator matrices. An iterative updating algorithm is proposed and its convergence is proved. Furthermore, a high-order nonnegative matrix tri-factorization technique is provided for symmetric co-clustering tasks, and a corresponding algorithm with proved convergence is also developed. Finally, extensive experiments on six real-world datasets demonstrate that the proposed algorithms outperform state-of-the-art co-clustering methods.
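As a rough illustration of the factorization underlying this line of work, the following is a minimal sketch of plain (unpenalized) nonnegative matrix tri-factorization X ≈ F S Gᵀ with standard multiplicative updates; the paper's three penalty terms and convergence machinery are omitted, and all names here are illustrative.

```python
import numpy as np

def nmtf(X, k, l, n_iter=300, eps=1e-9, seed=0):
    """Plain nonnegative matrix tri-factorization X ~= F @ S @ G.T via
    multiplicative updates (the penalty terms of the paper are omitted)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k))
    S = rng.random((k, l))
    G = rng.random((m, l))
    for _ in range(n_iter):
        # each update is the standard ratio of the gradient's parts
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G

# toy block-structured matrix: two row clusters x two column clusters
X = np.kron(np.array([[5.0, 0.1], [0.1, 5.0]]), np.ones((4, 6)))
F, S, G = nmtf(X, k=2, l=2)
err = np.linalg.norm(X - F @ S @ G.T) / np.linalg.norm(X)
print(round(err, 3))  # small relative reconstruction error
```

On co-clustering data, the rows of F and G then act as (soft) cluster indicators for rows and columns of X, which is why the paper pushes them toward orthogonality.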

2.
The availability of data represented by multiple features from heterogeneous domains is increasingly common in real-world applications. Such data represent objects of a certain type connected to other types of data, the features, so that the overall data schema forms a star structure of inter-relationships. Co-clustering these data involves specifying many parameters, such as the number of clusters for the object dimension and for each feature domain. In this paper we present a novel parameter-less co-clustering algorithm for heterogeneous star-structured data: it requires neither the number of row clusters nor the number of column clusters for the given feature spaces. Our approach optimizes the Goodman–Kruskal τ, a measure of cross-association in contingency tables that evaluates the strength of the relationship between two categorical variables. We extend τ to evaluate co-clustering solutions, and in particular apply it in a higher-dimensional setting. We propose the algorithm CoStar, which optimizes τ by local search. We assess the performance of CoStar on publicly available datasets from the textual and image domains using objective external criteria. The results show that our approach outperforms state-of-the-art methods for the co-clustering of heterogeneous data while remaining computationally efficient.
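The Goodman–Kruskal τ that CoStar optimizes has a simple closed form on a contingency table: the proportional reduction in the error of predicting the column variable once the row variable is known. A minimal sketch (the extension to co-clustering solutions described in the paper is not reproduced here):

```python
import numpy as np

def goodman_kruskal_tau(T):
    """Goodman-Kruskal tau: proportional reduction in the error of
    predicting the column variable given the row variable, computed
    from a contingency table T of counts."""
    P = T / T.sum()
    pr = P.sum(axis=1)                       # row marginals
    pc = P.sum(axis=0)                       # column marginals
    e_col = 1.0 - np.sum(pc ** 2)            # prediction error ignoring rows
    e_col_given_row = 1.0 - np.sum((P ** 2).sum(axis=1) / np.where(pr > 0, pr, 1.0))
    return (e_col - e_col_given_row) / e_col

# perfect association: each row concentrates on a single column
perfect = np.array([[10, 0], [0, 10]], float)
none = np.array([[5, 5], [5, 5]], float)     # independence
print(goodman_kruskal_tau(perfect), goodman_kruskal_tau(none))  # -> 1.0 0.0
```

τ = 1 for a perfectly associated table and 0 under independence, which is what makes it usable as an objective for judging how well row groups predict column groups.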

3.
Analytical workloads in data warehouses often include heavy joins, where queries involve multiple fact tables in addition to the typical star patterns, dimensional grouping, and selections. In this paper we propose a new processing and storage framework called bitwise dimensional co-clustering (BDCC) that avoids replication, and thus keeps updates fast, yet is able to accelerate all these foreign-key joins, efficiently support grouping, and push down most dimensional selections. The core idea of BDCC is to cluster each table on a mix of dimensions, each possibly derived from attributes imported over an incoming foreign key, thereby creating foreign-key-connected tables with partially shared clusterings. These are later used to accelerate any join between two tables that have some dimension in common, and additionally permit pushing down and propagating selections (reducing I/O) and accelerating aggregation and ordering operations. Besides the general framework, we describe an algorithm to derive such a physically co-clustered database automatically, and describe query processing and query optimization techniques that can easily be fitted into existing relational engines. We present an experimental evaluation on the TPC-H benchmark in the Vectorwise system, showing that co-clustering can significantly enhance its already high performance and at the same time significantly reduce the memory consumption of the system.

4.
In this paper, we propose a new semi-supervised fuzzy co-clustering algorithm, SS-FCC, for the categorization of large web documents. In this approach, the clustering process incorporates prior domain knowledge of a dataset, in the form of pairwise constraints provided by users, into the fuzzy co-clustering framework. With the help of those constraints, the clustering problem is formulated as maximizing a competitive agglomeration cost function with fuzzy terms that takes the provided domain knowledge into account. Each constraint specifies whether a pair of objects "must" or "cannot" be clustered together. The update rules for fuzzy memberships are derived, and an iterative algorithm is designed for the soft co-clustering process. Our experimental studies show that the quality of clustering results can be improved significantly with the proposed approach. Simulations on 10 large benchmark datasets demonstrate the strengths and potential of SS-FCC in terms of performance evaluation criteria, stability, and operating time, compared with existing semi-supervised algorithms.

5.
International Journal on Software Tools for Technology Transfer - Deep Neural Networks (DNNs) are rapidly gaining popularity in a variety of important domains. Unfortunately, modern DNNs have been...

6.
It is essential to take into account the service quality assessment made by the passengers of a public transportation system, as well as the weight or relative importance assigned to each of the attributes considered, in order to know its strengths and weaknesses. This paper proposes using Artificial Neural Networks (ANNs) to analyze the service quality perceived by the passengers of a public transportation system. This technique is characterized by its high capability for prediction and for capturing highly nonlinear intrinsic relations between the study variables without requiring a pre-defined model. First, an ANN model was developed using data gathered in a Customer Satisfaction Survey conducted on the Granada metropolitan bus transit system in 2007. Next, three different methods were used to determine the relative contribution of the attributes. Finally, a statistical analysis was applied to the outcomes of each method to identify groups of attributes with significant differences in their relative importance. The results show that statistically significant differences exist among several categories of attributes that have a greater or lesser impact on service quality and satisfaction. All the methods agree that Frequency is the most influential attribute in service quality, and that other attributes such as Speed, Information and Proximity are also important.

7.
A general framework for assessing the future impacts of technology on society and the environment is presented. The dynamics between human activity and technological systems affect many processes in society and nature. This involves non-linear dynamics, requiring an understanding of how technology and human behaviour influence each other and co-evolve. Conventionally, technological and behavioural systems are analyzed as separate entities. We develop an integrated theoretical and methodological approach, termed techno-behavioural dynamics, focusing on networked interactions between technology and behaviour across multiple system states. We find that positive feedback between technology learning, evolving preferences and network effects can lead to tipping points in complex sociotechnical systems. We also demonstrate how mean-field and agent-based models are complementary for capturing a hierarchy of analytical resolutions in a common problem domain. Assessing and predicting the co-evolutionary dynamics between technology and human behaviour can help avoid systems lock-in and inform a range of adaptive responses to environmental and societal risk.

8.
Cluster analysis, or clustering, refers to the analysis of the structural organization of a data set. This analysis is performed by grouping together objects of the data that are more similar among themselves than to objects of different groups. The sampled data may be described by numerical features or by a symbolic representation, known as categorical features. These features often require a transformation into numerical data in order to be properly handled by clustering algorithms. The transformation usually assigns a weight to each feature, calculated by a measure of importance (e.g., frequency, mutual information). A problem with this weight assignment is that the values are calculated with respect to the whole set of objects and features. This may pose a problem when a subset of the features has a higher degree of importance for one subset of objects but a lower degree for another. One way to deal with this problem is to measure the importance of each subset of features only with respect to a subset of objects. This is known as co-clustering, which, similarly to clustering, is the task of finding subsets of objects and features that present a higher similarity among themselves than to other subsets of objects and features. As one might notice, this task has a higher complexity than traditional clustering and, if not properly dealt with, may present a scalability issue. In this paper we propose a novel co-clustering technique, called HBLCoClust, with the objective of extracting a set of co-clusters from a categorical data set, without the guarantees of an enumerative algorithm but with the benefit of scalability. This is done by using a probabilistic clustering algorithm, Locality Sensitive Hashing, together with the enumerative algorithm InClose. The experimental results are competitive when applied to labeled categorical data sets and text corpora. Additionally, it is shown that the extracted co-clusters can be of practical use to expert systems such as recommender systems and topic extraction.

9.
Data co-clustering refers to the problem of simultaneously clustering two data types. Typically, the data are stored in a contingency or co-occurrence matrix C, where the rows and columns of the matrix represent the data types to be co-clustered. An entry C_ij of the matrix signifies the relation between the data type represented by row i and column j. Co-clustering is the problem of deriving sub-matrices from the larger data matrix by simultaneously clustering rows and columns of the data matrix. In this paper, we present a novel graph-theoretic approach to data co-clustering. The two data types are modeled as the two sets of vertices of a weighted bipartite graph. We then propose the Isoperimetric Co-clustering Algorithm (ICA), a new method for partitioning the bipartite graph. ICA requires only the solution of a sparse system of linear equations, instead of the eigenvalue or SVD problem used in the popular spectral co-clustering approach. Our theoretical analysis and extensive experiments performed on publicly available datasets demonstrate the advantages of ICA over other approaches in terms of quality, efficiency and stability in partitioning the bipartite graph.
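The linear-system idea can be sketched as follows: build the Laplacian of the bipartite graph, ground one vertex, solve L x = d for the remaining vertices, and threshold the solution to bipartition rows and columns jointly. This is a simplified reading of the isoperimetric approach (dense algebra for brevity, where ICA would use a sparse solver), not the authors' exact algorithm.

```python
import numpy as np

def isoperimetric_bipartition(C, ground=0):
    """Bipartition the bipartite graph with biadjacency matrix C by
    grounding one vertex, solving the reduced linear system L x = d,
    and thresholding x at its median (isoperimetric-style sketch)."""
    n, m = C.shape
    A = np.zeros((n + m, n + m))     # joint adjacency: rows then columns
    A[:n, n:] = C
    A[n:, :n] = C.T
    d = A.sum(axis=1)
    L = np.diag(d) - A               # graph Laplacian
    keep = [i for i in range(n + m) if i != ground]   # remove grounded vertex
    x = np.zeros(n + m)
    x[keep] = np.linalg.solve(L[np.ix_(keep, keep)], d[keep])
    labels = (x > np.median(x)).astype(int)
    return labels[:n], labels[n:]    # row labels, column labels

# two-block co-occurrence matrix; weak cross-links keep the graph
# connected so the reduced system is nonsingular
C = np.array([[3, 3, 3, .1, .1, .1],
              [3, 3, 3, .1, .1, .1],
              [.1, .1, .1, 3, 3, 3],
              [.1, .1, .1, 3, 3, 3]])
rows, cols = isoperimetric_bipartition(C)
print(rows, cols)
```

Vertices far from the grounded vertex accumulate a large "potential" x, so the median threshold recovers the two blocks of rows together with their associated columns.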

10.
Low-rank matrix factorization is one of the most useful tools in scientific computing, data mining and computer vision. Among its techniques, non-negative matrix factorization (NMF) has received considerable attention for producing a parts-based representation of the data. Recent research has shown that not only do the observed data lie on a nonlinear low-dimensional manifold, the data manifold, but the features also lie on a manifold, the feature manifold. In this paper, we propose a novel algorithm, called graph dual regularization non-negative matrix factorization (DNMF), which simultaneously considers the geometric structures of both the data manifold and the feature manifold. We also present a graph dual regularization non-negative matrix tri-factorization algorithm (DNMTF) as an extension of DNMF. Moreover, we develop two iterative updating optimization schemes for DNMF and DNMTF, respectively, and provide convergence proofs for both optimization schemes. Experimental results on UCI benchmark data sets, several image data sets and a radar HRRP data set demonstrate the effectiveness of both DNMF and DNMTF.
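The dual regularization can be sketched with graph-regularized multiplicative updates: each factor's update gains a Laplacian smoothing term from its own graph. This follows the general GNMF-style update pattern under the objective ||X − U Vᵀ||² + μ·Tr(Uᵀ L_u U) + λ·Tr(Vᵀ L_v V); it is an illustrative sketch in the spirit of DNMF, not the paper's exact scheme.

```python
import numpy as np

def dnmf_sketch(X, k, Wu, Wv, mu=0.1, lam=0.1, n_iter=300, eps=1e-9, seed=0):
    """Dual graph-regularized NMF sketch: X ~= U @ V.T, where Wu is an
    affinity graph over the rows (features) of X and Wv over its columns
    (data points). Multiplicative updates keep all factors nonnegative."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, k))
    V = rng.random((m, k))
    Du = np.diag(Wu.sum(axis=1))     # degree matrices; L = D - W
    Dv = np.diag(Wv.sum(axis=1))
    for _ in range(n_iter):
        U *= (X @ V + mu * Wu @ U) / (U @ V.T @ V + mu * Du @ U + eps)
        V *= (X.T @ U + lam * Wv @ V) / (V @ U.T @ U + lam * Dv @ V + eps)
    return U, V

# toy 6x6 block data; each graph links items within the same block
X = np.kron(np.array([[4.0, 0.0], [0.0, 4.0]]), np.ones((3, 3)))
W = np.kron(np.eye(2), np.ones((3, 3))) - np.eye(6)
U, V = dnmf_sketch(X, k=2, Wu=W, Wv=W)
err = np.linalg.norm(X - U @ V.T) / np.linalg.norm(X)
print(round(err, 3))  # low reconstruction error
```

The W·U term in each numerator pulls neighboring rows (per the graph) toward similar factor values, which is how the manifold structure enters the factorization.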

11.
A chaotic neural network (CNN) proposed by Aihara et al. is able to dynamically recollect stored patterns. However, it has drawbacks, such as long association times and difficulty in recalling a specific stored pattern during the dynamical associations. In our previous works, we proposed finding optimal parameters using meta-heuristic methods to improve association performance, for example, to shorten recall time and raise the recollection rates of stored patterns. However, the relationship between different parameter values of the chaotic neurons and the association performance of the CNN had not been clearly investigated. In this paper, we propose a method to analyze the spatiotemporal changes of internal states in a CNN and, using this method, analyze how changes in the internal parameter values of chaotic neurons affect their characteristics when multiple patterns are stored in the CNN. Quantile-quantile plots, least-squares approximation, hierarchical clustering, and the Hilbert transform are used to investigate the similarity of the internal states of chaotic neurons and to classify the neurons. Simulation results show how different values of an internal parameter yield different behaviors of chaotic neurons, suggesting that the optimal parameters producing higher association performance may depend on the patterns stored in the CNN.

12.
Applied Intelligence - Multi-view clustering has gained importance in recent times due to the large-scale generation of data, often from multiple sources. Multi-view clustering refers to clustering...

13.
The paper focuses on mining clusters that are characterized by a lagged relationship between the data objects. We call such clusters lagged co-clusters. A lagged co-cluster of a matrix is a submatrix determined by a subset of rows and their corresponding lags over a subset of columns. Extracting such subsets may reveal an underlying governing regulatory mechanism. Such regulatory mechanisms are quite common in real-life settings and appear in a variety of fields: meteorology, seismic activity, stock market behavior, neuronal brain activity, river flow, and navigation, to name a few. Mining such lagged co-clusters not only helps in understanding the relationships between objects in the domain, but also assists in forecasting their future behavior. For the most interesting variants of this problem, finding an optimal lagged co-cluster is an NP-complete problem. We present a polynomial-time Monte-Carlo algorithm for mining lagged co-clusters. We prove that, with fixed probability, the algorithm mines a lagged co-cluster that encompasses the optimal lagged co-cluster with at most a factor-2 overhead in columns and no overhead in rows. Moreover, the algorithm handles noise, anti-correlations, missing values, and overlapping patterns. The algorithm is extensively evaluated in both artificial and real-world test environments. The former enables the evaluation of specific, isolated properties of the algorithm; the latter (river flow and topographic data) demonstrates the algorithm's ability to efficiently mine relevant and coherent lagged co-clusters in both temporal (i.e., time-series) and non-temporal environments.

14.
In traditional co-clustering, the only basis for the clustering task is a given relationship matrix describing the strengths of the relationships between pairs of elements in the different domains. Relying on this single input matrix, co-clustering discovers relationships holding among groups of elements from the two input domains. In many real-life applications, however, other background knowledge or metadata about one or both of the input domain dimensions may be available and, if leveraged properly, such metadata might play a significant role in the effectiveness of the co-clustering process. How additional metadata affects co-clustering, however, depends on how the process is modified to be context-aware. In this paper, we propose, compare, and evaluate three alternative strategies (metadata-driven, metadata-constrained, and metadata-injected co-clustering) for embedding available contextual knowledge into the co-clustering process. Experimental results show that it is possible to leverage the available metadata in discovering contextually-relevant co-clusters, without significant overheads in terms of information-theoretical co-cluster quality or execution cost.

15.
Current theories about the dynamics of neural networks with nonlinear characteristics, parameterized by a set of parameters, are mostly based on approximations of one kind or another. In this paper we first introduce a rigorous approach that allows us to check in which parameter region a given saturated state is an attractor of the dynamics: a saturated state w = (w_i, i = 1, ..., N) ∈ {-1, 1}^N is an attractor of the dynamics if and only if there is a local-field gap between the neurons in J^+(w) = {i : w_i = 1} and those in J^-(w) = {i : w_i = -1}. We then apply this result to analyze several neural network models. In particular, for the Hopfield model we calculate the capacity and give an exact relation between the capacity and the threshold.
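The local-field condition is easy to state concretely: for each neuron, the field h_i = Σ_j J_ij w_j must agree in sign with w_i (with a gap above the threshold). A minimal check for a Hebbian-weight Hopfield network (a textbook setup used here for illustration, not the paper's full analysis):

```python
import numpy as np

def is_attractor(J, w, theta=0.0):
    """Check the local-field condition for a saturated state w in {-1,1}^N:
    w is stable under the sign dynamics iff every neuron's local field
    h_i = sum_j J_ij w_j has the same sign as w_i, with gap above theta."""
    h = J @ w
    return bool(np.all(w * h > theta))

# Hebbian weights storing one pattern p: J_ij = p_i p_j / N, zero diagonal
p = np.array([1, -1, 1, 1, -1], float)
N = len(p)
J = np.outer(p, p) / N
np.fill_diagonal(J, 0.0)

print(is_attractor(J, p))   # -> True: the stored pattern is stable
print(is_attractor(J, -p))  # -> True: so is its mirror image
```

Flipping a single bit of p breaks the sign agreement at that neuron, so the perturbed state fails the check, which is exactly the fixed-point characterization the paper builds on.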

16.
17.
Collaborative filtering is one of the most popular recommendation techniques, providing personalised recommendations based on users' tastes. In spite of its huge success, it suffers from a range of problems, the most fundamental being data sparsity. Sparsity in ratings leads to the formation of inaccurate neighbourhoods, thereby resulting in poor recommendations. To address this issue, in this article we propose a novel collaborative filtering approach based on information-theoretic co-clustering. The proposed approach computes two types of similarities, cluster preference and rating, and combines them. Based on the combined similarity, user-based and item-based approaches are adopted, respectively, to obtain individual predictions for an unknown target rating. Finally, the proposed approach fuses these resultant predictions. Experimental results show that the proposed approach is superior to existing alternatives.
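One plausible reading of the two-similarity idea is sketched below: a rating-based cosine similarity is blended with a "cluster preference" similarity computed over each user's mean rating per item co-cluster. The blending weight `alpha` and all function names are illustrative assumptions, not the article's actual formulation.

```python
import numpy as np

def combined_similarity(R, clusters, alpha=0.5):
    """Blend rating-based cosine similarity with a cluster-preference
    similarity (cosine over per-co-cluster mean ratings).
    R: user x item rating matrix with 0 meaning 'unrated';
    clusters: array mapping each item to a co-cluster id."""
    def cosine(M):
        norm = np.linalg.norm(M, axis=1, keepdims=True)
        norm[norm == 0] = 1.0
        Mn = M / norm
        return Mn @ Mn.T

    n_users = R.shape[0]
    n_cl = clusters.max() + 1
    pref = np.zeros((n_users, n_cl))
    for c in range(n_cl):
        cols = R[:, clusters == c]
        cnt = (cols > 0).sum(axis=1)                 # ratings per user in cluster c
        pref[:, c] = np.where(cnt > 0, cols.sum(axis=1) / np.maximum(cnt, 1), 0.0)
    return alpha * cosine(R) + (1 - alpha) * cosine(pref)

R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4]], float)
clusters = np.array([0, 0, 1, 1])  # items 0-1 vs items 2-3 form two co-clusters
S = combined_similarity(R, clusters)
print(S[0, 1] > S[0, 2])  # -> True: users 0 and 1 share tastes
```

The cluster-preference channel is what helps under sparsity: two users with few co-rated items can still look similar through their per-cluster averages.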

18.
As networks become larger, scalability and QoS-awareness become important issues that must be resolved. A large network can be effectively organized as a hierarchical structure, such as the inter/intra-domain routing hierarchy in the Internet and the Private Network-to-Network Interface (PNNI) standard, to address these critical issues. Modeling and analyzing the performance of QoS-capable hierarchical networks remains an open problem. Although the reduced load approximation technique has been extensively applied to flat networks, the feasibility of applying it to hierarchical network models has seldom been investigated. Furthermore, most research in this area has focused on performance evaluation with fixed routing. This work proposes an analytical model for evaluating the performance of adaptive hierarchical networks with multiple classes of traffic. We first study the reduced load approximation model for multirate loss networks, and then propose a novel performance evaluation model for networks with hierarchical routing. This model is based on a decomposition of a hierarchical route into several analytic hierarchical segments; the blocking probability of the hierarchical path can therefore be determined from the blocking probabilities of these segments. Numerical results demonstrate that the proposed model for adaptive hierarchical routing yields accurate blocking probabilities. We also investigate the convergence of the analysis model for both the originating-destination (O-D) pair and the alternative hierarchical path. Finally, the blocking probability of an adaptive hierarchical O-D pair is shown to depend on the blocking of all hierarchical paths, but not on the order of the hierarchical paths of the same O-D pair.
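For readers unfamiliar with the reduced load approximation: on a single path it amounts to a fixed-point iteration where each link sees the external load thinned by the blocking on the other links, with per-link blocking given by the Erlang B formula. A minimal single-class, single-path sketch (the paper's multirate, hierarchical decomposition is far richer):

```python
import math

def erlang_b(load, capacity):
    """Erlang B blocking probability via the stable recurrence
    B(0) = 1; B(c) = a*B(c-1) / (c + a*B(c-1))."""
    b = 1.0
    for c in range(1, capacity + 1):
        b = load * b / (c + load * b)
    return b

def path_blocking(load, caps, n_iter=50):
    """Reduced load approximation for one path over several links:
    iterate link blockings to a fixed point, then combine them under
    the usual link-independence assumption."""
    B = [0.0] * len(caps)
    for _ in range(n_iter):
        for i, cap in enumerate(caps):
            # load offered to link i, thinned by blocking elsewhere
            thinned = load * math.prod(1 - B[j] for j in range(len(caps)) if j != i)
            B[i] = erlang_b(thinned, cap)
    return 1 - math.prod(1 - b for b in B)

print(round(path_blocking(8.0, [10, 10]), 4))
```

With a single link the fixed point reduces to the plain Erlang B value, which is a handy sanity check on the iteration.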

19.
Cell behavior is stochastic, and studying the stochasticity in cells helps in understanding cellular organization, design, and evolution. Building, validating, and analyzing stochastic models of biochemical networks is an important research topic in computational systems biology. Standard Petri net models have become a powerful tool for the simulation and qualitative analysis of biochemical networks. This paper attempts to model and analyze biochemical networks using stochastic Petri nets. It briefly describes how stochastic Petri net theory extends standard Petri nets and, by modeling and simulating two typical examples (dimerization and actin), introduces and demonstrates new applications of stochastic Petri net theory.

20.
The preferences adopted by individuals are constantly modified, as they are driven by new experiences, natural life evolution and, mainly, influence from friends. Studying these temporal dynamics of user preferences has become increasingly important for personalization tasks in the information retrieval and recommender systems domains. However, existing models are too constrained to capture the complexity of the underlying phenomenon. Online social networks contain rich information about social interactions and relations, making them an essential source of knowledge for understanding the evolution of user preferences. In this work, we investigate the interplay between user preferences and social networks over time. First, we propose a temporal preference model able to detect the preference change events of a given user. Then, we use temporal network concepts to analyze the evolution of social relationships and propose strategies to detect changes in the network structure based on node centrality. Finally, we look for a correlation between preference change events and node centrality change events on the Twitter and Jam social music datasets. Our findings show that there is a strong correlation between the two kinds of change events, especially when social interactions are modeled by means of a temporal network.
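A toy version of the centrality-change detection step can be sketched on snapshots of an evolving network: compute a node centrality per snapshot and flag nodes whose value jumps between consecutive snapshots. Degree centrality and the jump threshold are simplifying assumptions standing in for the strategies described above.

```python
def degree_centrality(edges, nodes):
    """Normalized degree centrality for one snapshot given an edge list."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    n = max(len(nodes) - 1, 1)
    return {v: d / n for v, d in deg.items()}

def centrality_change_events(snapshots, nodes, threshold=0.5):
    """Flag (time, node) pairs whose degree centrality jumps by more than
    `threshold` between consecutive snapshots -- a simple stand-in for
    node-centrality change-event detection on a temporal network."""
    cents = [degree_centrality(es, nodes) for es in snapshots]
    events = []
    for t in range(1, len(cents)):
        for v in nodes:
            if abs(cents[t][v] - cents[t - 1][v]) > threshold:
                events.append((t, v))
    return events

nodes = ["a", "b", "c", "d"]
snapshots = [
    [("a", "b")],                                  # t=0: sparse interactions
    [("a", "b"), ("b", "c"), ("b", "d")],          # t=1: b becomes central
]
print(centrality_change_events(snapshots, nodes))  # -> [(1, 'b')]
```

In the study's setting, such centrality change events would then be aligned in time with the detected preference change events to measure their correlation.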
