1.

Wang Rong, Li Jinhong, Song Wei 《Computer Engineering and Design》2012,33(9):3553-3557,3568

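To obtain accurate and effective user clusters, a keyword-based user clustering algorithm is proposed. The algorithm improves on the traditional ROCK algorithm by introducing the concepts of similarity weight and average neighbors, and it uses the average number of neighbors of a user's keyword transaction set as the criterion for the similarity of user access patterns. Without producing outlier user points, it narrows the scope of each user cluster, dividing a large cluster more precisely into several smaller ones. A similarity threshold between users is used to filter the data, which reduces the computational cost of clustering. Experiments verify that the algorithm effectively improves both the accuracy and the running efficiency of similar-user clustering.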
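
ROCK, the algorithm this item builds on, calls two transactions neighbors when their similarity reaches a threshold θ, and scores a pair of points by their number of common neighbors ("links"). The sketch below computes Jaccard neighbors and link counts for keyword transaction sets. It is a minimal illustration of the standard ROCK link idea only: the keyword sets and θ are hypothetical, a point is not counted as its own neighbor here, and the authors' similarity weights and average-neighbor criterion are not reproduced.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two keyword sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def neighbors(transactions: dict, theta: float) -> dict:
    """Map each user to the other users whose keyword sets have similarity >= theta."""
    nbrs = {u: set() for u in transactions}
    for u, v in combinations(transactions, 2):
        if jaccard(transactions[u], transactions[v]) >= theta:
            nbrs[u].add(v)
            nbrs[v].add(u)
    return nbrs


def links(nbrs: dict) -> dict:
    """ROCK link count: number of common neighbors for every pair of users."""
    return {(u, v): len(nbrs[u] & nbrs[v]) for u, v in combinations(sorted(nbrs), 2)}


# Hypothetical keyword transactions, one set per user.
users = {
    "u1": {"python", "pandas", "numpy"},
    "u2": {"python", "numpy", "scipy"},
    "u3": {"python", "pandas", "scipy"},
    "u4": {"football", "tennis"},
}
nb = neighbors(users, theta=0.5)
lk = links(nb)
```

Here u1, u2 and u3 are mutual neighbors (Jaccard 0.5 each), so each of their pairs has one link, while u4 has no neighbors and no links.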

2.

Two-Layer Web User Clustering Method Based on Multiple Features

Wang Zhao, Fan Zhao 《Application Research of Computers》2018,35(1)

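Clustering analysis of Web logs can reveal the group characteristics of users and can even predict their future access patterns, so that personalized services can be offered to different user groups. Existing methods have two general shortcomings: relying on a single feature cannot fully capture users' interest preferences, and the traditional hierarchical algorithm converges slowly when clustering users and is easily affected by the diversity of user access. To address these problems, a two-layer user clustering method based on multiple features is proposed. The method measures user similarity with multiple features and then clusters in two layers. First, the density-based DBSCAN algorithm removes outlier objects from user sessions and discovers irregularly shaped clusters; then a bottom-up hierarchical method clusters the first-layer results. Experimental results show that the method has good stability and clustering quality.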
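
The first layer of this method uses DBSCAN to drop outlier sessions before hierarchical clustering. A minimal pure-Python DBSCAN is sketched below, assuming sessions have already been embedded as numeric feature vectors; the multi-feature similarity used by the authors is not reproduced, and the sample vectors, `eps`, and `min_pts` are hypothetical.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: reachable, but not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # j is a core point: keep expanding
                queue.extend(neighbours)
    return labels


# Hypothetical session feature vectors: two dense groups and one outlier.
sessions = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(sessions, eps=1.5, min_pts=3)
# First layer: drop noise before handing the rest to the hierarchical second layer.
kept = [s for s, lab in zip(sessions, labels) if lab != -1]
```

The region query is O(n) per point, which keeps the sketch short; a spatial index would be the usual choice for large logs.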

3.

CrossClus: user-guided multi-relational clustering

Xiaoxin Yin Jiawei Han Philip S. Yu 《Data mining and knowledge discovery》2007,15(3):321-348

Most structured data in real-life applications are stored in relational databases containing multiple semantically linked relations. Unlike clustering in a single table, when clustering objects in relational databases there are usually a large number of features conveying very different semantic information, and using all features indiscriminately is unlikely to generate meaningful results. Because the user knows her goal of clustering, we propose a new approach called CrossClus, which performs multi-relational clustering under user’s guidance. Unlike semi-supervised clustering which requires the user to provide a training set, we minimize the user’s effort by using a very simple form of user guidance. The user is only required to select one or a small set of features that are pertinent to the clustering goal, and CrossClus searches for other pertinent features in multiple relations. Each feature is evaluated by whether it clusters objects in a similar way with the user specified features. We design efficient and accurate approaches for both feature selection and object clustering. Our comprehensive experiments demonstrate the effectiveness and scalability of CrossClus. The work was supported in part by the U.S. National Science Foundation NSF IIS-03-13678 and NSF BDI-05-15813, and an IBM Faculty Award. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect views of the funding agencies.

4.

Multi-Relational Clustering with User Feature Constraints

Wang Zhichao, Zhang Lei 《Computer Engineering and Applications》2011,47(23):124-129

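Most clustering algorithms focus on the data alone and often ignore the user's clustering goal and the user's participation and guidance during clustering, so results derived purely from the data are often not very accurate. To address this, a multi-relational clustering algorithm with user feature constraints is proposed. The user takes part in feature selection on multi-relational linked data: a Must feature set and a Can't feature set describe the user's clustering goal, the feature sets are expanded through a domain ontology, and the resulting clustering feature set is used for clustering. Experiments show that the algorithm describes the user's clustering goal well, supports user-guided clustering, and achieves good clustering results.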

5.

Clustering of Uncertain Mobile User Data Based on an Improved CURE Algorithm

Gao Changyuan, Wang Haijing, Wang Jing 《Computer Engineering & Science》2016,38(4):768-774

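With the development of cloud computing, big data, and the mobile Internet, mobile terminal user data are trending toward large volume, heavy noise, and increasing dynamism and uncertainty, which degrade the accuracy and efficiency of mobile user data clustering. To address these problems, an improved version of the hierarchical clustering algorithm CURE is proposed. The algorithm replaces the original algorithm's sampling-based data processing with parallel processing implemented as MapReduce functions. In addition, using the concept of interval numbers, each mobile user record is represented as an interval, and interval distances are computed to accommodate the uncertainty of mobile user data, improving both clustering efficiency and accuracy. Finally, simulations on the MIT Reality project dataset demonstrate the effectiveness and feasibility of the method, which supports further use of mobile user data and personalized recommendation for users.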
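
Two ingredients named here can be shown in a few lines: CURE's shrunken representative points and a distance between interval-valued records. The sketch below assumes each attribute is an interval (lower, upper) and uses the Euclidean distance over interval endpoints, which is one common choice and not necessarily the authors'. Representatives are chosen CURE-style (farthest from the centroid first, then farthest from those already chosen) and shrunk toward the centroid by `alpha`; the MapReduce parallelization is not shown, and the sample records, `c`, and `alpha` are hypothetical.

```python
import math

# One attribute of an uncertain record, stored as (lower, upper).
Interval = tuple[float, float]


def interval_distance(x: list[Interval], y: list[Interval]) -> float:
    """Euclidean distance over the interval endpoints (an assumed, common choice)."""
    return math.sqrt(sum((xl - yl) ** 2 + (xu - yu) ** 2
                         for (xl, xu), (yl, yu) in zip(x, y)))


def centroid(cluster: list[list[Interval]]) -> list[Interval]:
    """Endpoint-wise mean of all records in a cluster."""
    n = len(cluster)
    return [(sum(r[d][0] for r in cluster) / n, sum(r[d][1] for r in cluster) / n)
            for d in range(len(cluster[0]))]


def representatives(cluster, c=2, alpha=0.5):
    """Pick up to c well-scattered records, then shrink them toward the centroid."""
    cen = centroid(cluster)
    chosen = [max(cluster, key=lambda r: interval_distance(r, cen))]
    while len(chosen) < min(c, len(cluster)):
        chosen.append(max(cluster,
                          key=lambda r: min(interval_distance(r, q) for q in chosen)))
    return [[(lo + alpha * (clo - lo), hi + alpha * (chi - hi))
             for (lo, hi), (clo, chi) in zip(r, cen)]
            for r in chosen]


def cluster_distance(a, b, c=2, alpha=0.5):
    """CURE-style cluster distance: the closest pair of shrunken representatives."""
    reps_a, reps_b = representatives(a, c, alpha), representatives(b, c, alpha)
    return min(interval_distance(p, q) for p in reps_a for q in reps_b)


# Hypothetical records with one interval-valued attribute each.
a = [[(0.0, 1.0)], [(2.0, 3.0)]]
b = [[(10.0, 12.0)], [(12.0, 14.0)]]
d = cluster_distance(a, b)
```

In CURE, the pair of clusters with the smallest such distance is merged next; shrinking the representatives dampens the effect of outliers on that choice.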

6.

A Visual Analytics Method for Clustering Spatial Multivariate Data

Wu Feiran, Chen Haidong, Huang Jin, Chen Wei 《Journal of Software》2014,25(S2):111-118

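Clustering is one of the key tools for studying spatial multivariate data. However, automatic clustering algorithms usually require the user to preset parameters before generating results, and they lack an effective interaction mechanism that brings the user into the clustering process to change parameters dynamically and to adjust and evaluate the results. To this end, a visual analytics pipeline for clustering spatial multivariate data is proposed. First, an automatic clustering algorithm clusters the original 3D space. Because 3D space is hard to interact with, the data points are projected onto a 2D plane for interactive selection and visual encoding. Multiple views let the user understand data distributions and patterns in real time and comprehensively, correct the clustering results interactively, and judge their reasonableness and correctness from encoded statistics. The whole pipeline is progressive: the user refines the results iteratively and finally extracts regions of interest. Case studies show that the new pipeline effectively improves the accuracy of automatic spatial clustering algorithms and greatly shortens user interaction time.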

7.

Web User Session Clustering Combining the Bacterial Foraging Algorithm with K-means

Ling Haifeng, Wang Hao 《Computer Engineering and Applications》2012,48(36):121-124,176

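Web user session clustering is an NP-hard problem in e-commerce whose goal is to discover similar user access behavior patterns. The difficulty lies in clustering large numbers of Web sessions, each represented as a high-dimensional vector. An optimization algorithm combining bacterial foraging with K-means is proposed, and its effectiveness is tested on well-known datasets. Web sessions are clustered and the results compared with popular clustering algorithms; the experiments show that the algorithm is efficient and performs better.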
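
The bacterial foraging component is not sketched here. The block below shows only the plain K-means step (Lloyd's algorithm) that such hybrids refine, applied to hypothetical session vectors (for example, per-page visit counts). Random initialization is exactly the weakness that metaheuristic hybrids like this one aim to address.

```python
import math
import random


def kmeans(vectors, k, iters=100, seed=0):
    """Plain Lloyd's K-means. Returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(v, centers[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical 2-D session vectors forming two well-separated groups.
sessions = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centers = kmeans(sessions, k=2)
```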

8.

Web Clustering Based on a Hybrid Probabilistic Latent Semantic Analysis Model

Wang Zhihe, Wang Lingyun, Dang Hui, Pan Lina 《Journal of Computer Applications》2012,32(11):3018-3022

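In e-commerce applications, to better understand users' intrinsic characteristics and to formulate effective marketing strategies, a Web clustering algorithm based on a hybrid probabilistic latent semantic analysis (H-PLSA) model is proposed. Probabilistic latent semantic analysis (PLSA) is used to build separate PLSA models for user browsing data, page content information, and content-enhanced user transaction data. The three PLSA models are merged through the log-likelihood function to obtain one H-PLSA model for user clustering and another for page clustering. In the cluster analysis, the conditional probabilities between latent topics and users, pages, and sites serve as the basis for computing similarity, and the distance-based k-medoids algorithm performs the clustering. The H-PLSA model was designed and built, and the Web clustering algorithm was validated on it, showing that the algorithm is feasible.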
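
The clustering step runs distance-based k-medoids over topic-conditional probabilities. A minimal sketch follows, assuming each user is described by a vector P(z | user) over latent topics and that the distance is total-variation distance; both the distance and the sample vectors are assumptions, since the abstract does not give the exact similarity formula. It uses the simple alternating (Voronoi-iteration) variant of k-medoids with a naive start at the first k objects, not PAM's BUILD and SWAP phases.

```python
def tv_distance(p, q):
    """Total-variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def k_medoids(dist, k, max_iter=100):
    """Alternating (Voronoi-iteration) k-medoids on a precomputed distance matrix."""
    n = len(dist)
    medoids = list(range(k))  # naive start: the first k objects
    for _ in range(max_iter):
        # Assign every object to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            clusters[min(medoids, key=lambda m: dist[i][m])].append(i)
        # New medoid = member with the smallest total distance to its cluster.
        new_medoids = [min(members, key=lambda c: sum(dist[c][j] for j in members))
                       for members in clusters.values()]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return medoids, labels


# Hypothetical P(z | user) vectors over three latent topics.
topics = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.7, 0.2, 0.1], [0.2, 0.1, 0.7]]
dist = [[tv_distance(p, q) for q in topics] for p in topics]
medoids, labels = k_medoids(dist, k=2)
```

Each label is the index of the user chosen as medoid, so users 0 and 2 (topic 1 heavy) share a cluster, as do users 1 and 3.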

9.

Research on Collaborative Filtering Recommendation Based on Fuzzy Clustering of Users

Li Hua, Zhang Yu, Sun Junhua 《Computer Science》2012,39(12):83-86

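Traditional collaborative filtering algorithms do not consider how users' own information affects ratings, and their drawbacks, such as data sparsity and poor scalability, directly degrade recommendation quality. To address this, a collaborative filtering recommendation algorithm based on fuzzy clustering of user context is proposed. First, a fuzzy clustering algorithm uses user context information to group users with similar contexts; then, before collaborative filtering, the Slope One algorithm pre-fills the user-item rating matrix, which effectively alleviates data sparsity and improves real-time performance. Experimental results show that the improved algorithm considerably raises recommendation accuracy.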
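
The pre-filling step relies on Slope One. A minimal sketch of the weighted Slope One predictor follows: it averages rating differences between item pairs over users who rated both, then estimates a missing rating from the user's other ratings, weighting each by its co-rating count. The ratings dictionary is hypothetical, and the fuzzy context clustering that precedes this step is not shown.

```python
from collections import defaultdict


def slope_one_deviations(ratings):
    """ratings: {user: {item: rating}} -> (average deviation, co-rating count) per item pair."""
    diff = defaultdict(float)
    count = defaultdict(int)
    for user_ratings in ratings.values():
        for i, ri in user_ratings.items():
            for j, rj in user_ratings.items():
                if i != j:
                    diff[(i, j)] += ri - rj
                    count[(i, j)] += 1
    dev = {pair: diff[pair] / count[pair] for pair in diff}
    return dev, count


def predict(user_ratings, item, dev, count):
    """Weighted Slope One estimate of `item` for one user, or None if no co-ratings."""
    num = den = 0.0
    for j, rj in user_ratings.items():
        if (item, j) in dev:
            weight = count[(item, j)]
            num += (dev[(item, j)] + rj) * weight
            den += weight
    return num / den if den else None


# Hypothetical ratings; carol has not rated item "A".
ratings = {
    "alice": {"A": 5, "B": 3, "C": 2},
    "bob": {"A": 3, "B": 4},
    "carol": {"B": 2, "C": 5},
}
dev, count = slope_one_deviations(ratings)
estimate = predict(ratings["carol"], "A", dev, count)  # fills the empty (carol, A) cell
```

Here dev(A, B) = 0.5 from two users and dev(A, C) = 3 from one, so the estimate is ((0.5 + 2)·2 + (3 + 5)·1) / 3 = 13/3.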

10.

Document Clustering With Dual Supervision Through Feature Reweighting

Yeming Hu Evangelos E. Milios James Blustein 《Computational Intelligence》2016,32(3):480-513

Traditional semi‐supervised clustering uses only limited user supervision in the form of instance seeds for clusters and pairwise instance constraints to aid unsupervised clustering. However, user supervision can also be provided in alternative forms for document clustering, such as labeling a feature by indicating whether it discriminates among clusters. This article thus fills this void by enhancing traditional semi‐supervised clustering with feature supervision, which asks the user to label discriminating features while defining (labeling) the instance seeds or pairwise instance constraints. Various types of semi‐supervised clustering algorithms were explored with feature supervision. Our experimental results on several real‐world data sets demonstrate that augmenting the instance‐level supervision with feature‐level supervision can significantly improve document clustering performance.

11.

Design and Implementation of User Clustering Based on the Chameleon Algorithm

Chen Xiaosong, Cui Zhiming 《Microcomputer Development》2005,15(4):48-50

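User clustering is an important part of Web mining, and Chameleon is a general-purpose clustering algorithm. This paper applies the Chameleon algorithm to Web mining, designs a Web user clustering scheme, implements the algorithm on the J2EE architecture, and improves it on that basis. Experimental results show that the algorithm performs well.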

12.

Clustering techniques: The user's dilemma

Richard Dubes Anil K. Jain 《Pattern recognition》1976,8(4):247-260

Numerous papers on clustering techniques and their applications in engineering, medical, and biological areas have appeared in pattern recognition literature during the past decade. This paper attempts to set some guidelines for a potential user of a clustering technique. We examine eight clustering programs which are representative of the various available techniques and compare their performances from several points of view. A formal comparative analysis is also performed with a portion of Munson's handprinted character data set. We believe that an understanding of the intrinsic characteristics of a clustering technique is essential to the intelligent application of the technique. Further, the output of a clustering program, along with whatever information a user may have about the data set, should be used together to form hypotheses about the structure of the data set.

13.

Collaborative Filtering Recommendation Algorithm Based on Improved Canopy Clustering

Tang Zekun 《Application Research of Computers》2020,37(9):2615-2619,2639

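Recommender systems build a binary relation between users and information products and mine the data generated by user behavior to find and recommend the objects each user is interested in. User-based collaborative filtering has been the mainstream approach in recent years, but it has a limitation: recommendation must consider all users, whereas an individual user is usually similar to only a small fraction of them. To solve this problem, a collaborative filtering recommendation algorithm based on improved Canopy clustering is proposed. It combines the data density and distance of the user model with user activity to compute a weight for each user's data, and then clusters the user model data. Because it adopts the Canopy clustering idea, the same user can belong to several clusters, which matches users who are interested in multiple domains. Finally, recommendations are made to the users in each canopy, predicting the objects a user may be interested in from the clustering results and user ratings. Comparisons with baseline algorithms on the MovieLens and Million Song datasets, using the MAE, RMSE, and NDCG metrics, show that the algorithm significantly improves the prediction and recommendation accuracy of the recommender system.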
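
The overlapping membership described above comes from Canopy clustering's two thresholds, a loose T1 and a tight T2 with T1 > T2. A sketch of the classic procedure follows: a point within T1 of a center joins that canopy, and a point within T2 is also removed from the candidate set, so points between T2 and T1 can join later canopies too. The authors' weighting by density, distance, and user activity is not reproduced; the sketch takes the first remaining point as the next center (the classic method picks randomly), and the user vectors and thresholds are hypothetical.

```python
import math


def canopy(points, t1, t2):
    """Classic Canopy clustering with loose threshold t1 > tight threshold t2.

    Returns (center_index, member_indices) pairs; canopies may overlap.
    """
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining[0]  # deterministic pick for the sketch (classically random)
        members = [i for i in remaining
                   if math.dist(points[center], points[i]) < t1]
        canopies.append((center, members))
        # Points inside the tight threshold leave the candidate set.
        remaining = [i for i in remaining
                     if i != center and math.dist(points[center], points[i]) >= t2]
    return canopies


# Hypothetical 2-D user feature vectors.
users = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (10.0, 0.0)]
result = canopy(users, t1=3.0, t2=1.5)
```

User 2 lies between T2 and T1 of the first center, so it belongs to the first canopy and also starts a canopy of its own.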

14.

A Hierarchical Clustering Model and Its Application in the Telecommunications Industry

Su Jin, Zhang Yousheng 《Computer Engineering》2005,31(22):110-112

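A hierarchical clustering algorithm is proposed that can identify clusters of arbitrary shape and size; it produced good results in customer analysis at a telecommunications company. The algorithm first clusters or classifies telecom customers from several different perspectives, then uses these groups as the starting point for bottom-up hierarchical clustering that yields the final result. The algorithm runs efficiently and is suitable for clustering large-scale data.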
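
The second stage, bottom-up merging of preliminary groups, can be sketched as agglomerative clustering with average linkage. The linkage rule, the stop-at-k criterion, and the sample customer vectors are assumptions for illustration; the abstract does not state which merge criterion the authors use.

```python
import math
from itertools import combinations


def average_link(a, b):
    """Mean pairwise Euclidean distance between two groups of points."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


def agglomerate(groups, k):
    """Merge the closest pair of groups (average linkage) until k groups remain."""
    clusters = [list(g) for g in groups]
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average_link(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so position i is unaffected
    return clusters


# Hypothetical first-stage customer groups (e.g. from separate per-view clusterings).
first_stage = [[(0, 0)], [(0, 1)], [(5, 5)], [(5, 6)], [(20, 20)]]
final = agglomerate(first_stage, k=3)
```

Each pass rescans all pairs, which is fine for a modest number of first-stage groups; that is the point of clustering from several perspectives first.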

15.

Research on a Collaborative Filtering User Recommendation Algorithm Based on Improved Hierarchical Clustering

Zhang Junwei, Yang Zhou 《Computer Science》2014,41(12):176-178

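To reduce the computation time of group recommendation, an improved hierarchical-clustering collaborative filtering user recommendation algorithm is proposed. Because of data sparsity, traditional clustering methods perform poorly when partitioning users into groups. Considering problems of traditional clustering algorithms such as fixed cluster centers and low correlation among users within a group, the method clusters users, computes a recommendation result for each user according to the clustering, exploits information propagation among users during clustering to strengthen information sharing within groups, and finally aggregates the recommendation results of all users in a group. Simulation experiments show that the method effectively improves recommendation accuracy and runs more efficiently than traditional collaborative filtering.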

16.

SignatureClust: a tool for landmark gene-guided clustering

Pankaj Chopra Hanjun Shin Jaewoo Kang Sunwon Lee 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2012,16(3):411-418

Over the last several years, many clustering algorithms have been applied to gene expression data. However, most clustering algorithms force the user into having one set of clusters, resulting in a restrictive biological interpretation of gene function. It would be difficult to interpret the complex biological regulatory mechanisms and genetic interactions from this restrictive interpretation of microarray expression data. The software package SignatureClust allows users to select a group of functionally related genes (called ‘Landmark Genes’), and to project the gene expression data onto these genes. Compared to existing algorithms and software in this domain, our software package offers two unique benefits. First, by selecting different sets of landmark genes, it enables the user to cluster the microarray data from multiple biological perspectives. This encourages data exploration and discovery of new gene associations. Second, most packages associated with clustering provide internal validation measures, whereas our package validates the biological significance of the new clusters by retrieving significant ontology and pathway terms associated with the new clusters. SignatureClust is a free software tool that enables biologists to get multiple views of the microarray data. It highlights new gene associations that were not found using a traditional clustering algorithm. The software package ‘SignatureClust’ and the user manual can be downloaded from .

17.

User Behavior Analysis Based on Segmentation, Clustering, and Temporal Association Analysis

Chang Huijun, Shan Hong, Man Yi 《Application Research of Computers》2014,31(2):526-531

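Analyzing user behavior is important for managing and controlling network users. User behavior is essentially a series of data exchanges that ultimately appear as service flows, and these flows show certain regularities over time. A user behavior analysis method is proposed that studies the temporal relationships among service flows to uncover the regularities of user behavior. The method has three stages: segmentation based on a fractal model, clustering based on an improved maximum-distance clustering method, and temporal analysis based on the Apriori algorithm, which together recover users' behavior patterns from their data exchanges. Experiments show that, even without access to the content of user messages, the method still distinguishes the various types of packet sequences fairly accurately and effectively discovers regularities in users' message-sending behavior.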
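
The clustering stage builds on maximum-distance clustering. The classic max-min distance (maximin) procedure it presumably starts from can be sketched as follows: take the first point as a center, add the point farthest from it, then keep adding the point whose distance to its nearest center is largest, as long as that distance exceeds `theta` times the first center-to-center distance. The authors' improvement is not described in the abstract and is not reproduced; `theta` and the segment feature vectors are hypothetical.

```python
import math


def maximin_centers(points, theta=0.5):
    """Classic max-min distance center selection; returns indices of the centers."""
    centers = [0]  # first point as the initial center
    far = max(range(len(points)), key=lambda i: math.dist(points[0], points[i]))
    base = math.dist(points[0], points[far])
    if base == 0:
        return centers  # all points coincide
    centers.append(far)
    while True:
        # Distance from every point to its nearest existing center.
        nearest = [min(math.dist(p, points[c]) for c in centers) for p in points]
        cand = max(range(len(points)), key=nearest.__getitem__)
        if nearest[cand] <= theta * base:
            break
        centers.append(cand)
    return centers


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(centers, key=lambda c: math.dist(p, points[c])) for p in points]


# Hypothetical feature vectors for flow segments.
segments = [(0, 0), (1, 0), (10, 0), (11, 0), (5, 8)]
centers = maximin_centers(segments, theta=0.5)
labels = assign(segments, centers)
```

The number of clusters is not fixed in advance; it follows from `theta`, which is why this family of methods suits traffic whose behavior classes are unknown.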

18.

Application of Server Log Mining to Function Recommendation in Electric Power Business Systems

Hu Yangbo, Chen Yongqiu, Zhou Honglin 《Computer Systems & Applications》2015,24(3):256-259

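A function recommendation service for electric power business systems based on server log mining is proposed. First, user log data are extracted from the server logs of the power business system; next, the log data, which contain "dirty" records, are preprocessed to make them suitable for mining; then users' access interest degrees are computed from the processed data, and an improved K-means clustering algorithm partitions the interest-degree dataset into several sets of users with similar interests; finally, personalized function recommendations are provided to users. Experimental results show that the method performs well for information recommendation in power business systems.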

19.

Research on a New Algorithm for Mining Web User Behavior Patterns

He Yao, Zhao Yuelong 《Computer Measurement & Control》2005,13(6):600-602

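Mining user behavior patterns from Web log files is a pressing need for every Web site administrator. However, Web log data are voluminous and contain much noisy and incomplete data, so user behavior patterns cannot be extracted accurately. The unsupervised niche clustering (UNC) algorithm is suited to mining large datasets with noisy and incomplete data, but it is based on a two-dimensional model in Euclidean space, and its data representation is not intuitive. We improve UNC and propose a layered UNC (LUNC) with a hierarchical structure. Performance tests show that the model has good overall performance.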

20.

Cover geometry design using multiple convex hulls

Yuki Igarashi Hiromasa Suzuki 《Computer aided design》2011,(9):1154-1162

We present a design method to create close-fitting customized covers for given three-dimensional (3D) objects such as cameras, toys and figurines. The system first computes clustering of the input vertices using multiple convex hulls, then generates multiple convex hulls using the results. It then outputs a cover geometry by applying a set union operation to these hulls, and the resulting intersection curves are set as seam lines. However, as some of the regions created are not necessarily suitable for flattening, the user can design seam lines by drawing and erasing. The system flattens the patches of the target cover geometry after segmentation, allowing the user to obtain a corresponding 2D pattern and sew the shapes in actual fabric. This paper’s contribution lies in its proposal of a clustering method to generate multiple convex hulls, i.e., a set of convex hulls that individually cover part of the input mesh and together cover all of it. The method is based on vertex clustering to allow handling of mesh models with poor vertex connectivity such as those obtained by 3D scanning, and accommodates conventional meshes with multiple connected components and point-based models with no connectivity information. Use of the system to design actual covers confirmed that it functions as intended.
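
The primitive behind "multiple convex hulls" is one convex hull per vertex cluster. The sketch below is a 2D stand-in using Andrew's monotone chain algorithm, assuming the cluster labels are already given; the paper works on 3D meshes, and neither its clustering method nor the 3D hull and union steps are reproduced here. The sample points and labels are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop endpoints shared by both chains


def hulls_per_cluster(points, labels):
    """One convex hull per vertex cluster (a 2D stand-in for 3D hulls)."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: convex_hull(pts) for lab, pts in groups.items()}


# Hypothetical vertices with precomputed cluster labels.
verts = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (5, 5), (6, 5), (5, 6)]
labels = [0, 0, 0, 0, 0, 1, 1, 1]
hulls = hulls_per_cluster(verts, labels)
```

The interior vertex (1, 1) is dropped from the first hull, and together the two hulls cover every input vertex, which is the covering property the abstract describes.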