1.
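Collaborative filtering is the most widely used algorithm in personalized recommender systems. To address its data sparsity and poor scalability, a clustering recommendation algorithm based on user interest and social trust is proposed. The algorithm first applies clustering to user rating information to group users with the same interests into one class, and builds a neighbor set of users with similar interests. To improve the accuracy of the interest similarity computation, the adjusted cosine formula is used to eliminate differences in rating scales. A trust mechanism is then introduced: direct trust, indirect trust, propagation paths, and their computation are defined to measure the implicit trust values between social network users, converting the social network into a trust network, and a neighbor set based on social trust is created according to the degree of trust. The predictions from the two neighbor sets are fused by weighting to produce item recommendations for the user. Simulation experiments on the Douban dataset determined the optimal coordination factor and number of clusters, and compared the algorithm with user-based collaborative filtering and a trust-based recommendation algorithm. The results show that the proposed algorithm reduces the mean absolute error (MAE) by 6.7% and increases precision, recall, and F1 by 25%, 40%, and 37%, respectively, effectively improving recommendation quality.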
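The two computations named in the abstract above, adjusted cosine similarity and weighted fusion of two predictions, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary-based rating format, and the use of each user's mean over all rated items are assumptions, and `alpha` stands in for the "coordination factor" whose exact definition the abstract does not give.

```python
from math import sqrt


def mean_rating(ratings):
    """Average of one user's ratings (ratings: dict item -> score)."""
    return sum(ratings.values()) / len(ratings)


def adjusted_cosine(ratings_u, ratings_v):
    """Mean-centred cosine similarity over the items both users rated.

    Subtracting each user's own mean removes differences in rating scale
    (a harsh rater and a generous rater with the same tastes score 1.0).
    """
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    mu_u, mu_v = mean_rating(ratings_u), mean_rating(ratings_v)
    num = sum((ratings_u[i] - mu_u) * (ratings_v[i] - mu_v) for i in common)
    den_u = sqrt(sum((ratings_u[i] - mu_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings_v[i] - mu_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0
    return num / (den_u * den_v)


def fuse_predictions(pred_interest, pred_trust, alpha):
    """Weighted fusion of the interest-based and trust-based predictions.

    alpha is a hypothetical weight in [0, 1]; alpha = 1 uses only the
    interest-based neighbor set, alpha = 0 only the trust-based one.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * pred_interest + (1 - alpha) * pred_trust
```

For example, `adjusted_cosine({"a": 5, "b": 3, "c": 4}, {"a": 3, "b": 1, "c": 2})` is close to 1 because the two users differ only by a constant offset in their scores.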
2.
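A user-clustering-based recommendation algorithm for heterogeneous social networks. Cited by: 11 (self-citations: 0; citations by others: 11)
Compared with traditional social networks, microblog-style social networks built on weak ties are markedly heterogeneous. By their characteristics, nodes can be divided into two types: users (message subscribers) and topics (message publishers). Recommending topics of interest to users has become one of the main goals of recommender systems in such networks, while the data sparsity and cold-start phenomena common in them have become the main problems these systems face. This paper proposes GCCR, a recommendation algorithm based on two-stage clustering that combines a graph summarization method with a content-similarity-based algorithm to recommend topics according to user interest. Compared with previous methods, it gives better recommendations under sparse data and cold start; in addition, extensive offline processing of the dataset gives it better online recommendation efficiency than previous methods. Finally, the method is validated on data from a real social network, and the effect of each parameter on recommendation quality is analyzed.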
3.
4.
6.
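To handle the directedness and diversity of social relations in social networks, a social network clustering algorithm based on graph clustering and the ant colony algorithm is proposed. First, a directed, non-fully-connected two-dimensional graph model of the social network is built under a network coverage constraint. Then the K-medoids algorithm is used to search for the center user of each user group, and an artificial ant colony algorithm searches the 2D graph for the similarity between each user and the center users; users that meet the similarity threshold are assigned to the same group. A prediction mechanism for low-activity users is designed to address the network sparsity and cold-start problems. In addition, the network coverage constraint is used to trade off clustering accuracy against coverage. Simulation results show that the algorithm achieves good social network clustering performance and effectively alleviates the sparsity and cold-start problems.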
7.
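To address problems such as overly complex query results in large social groups, linking personalized customization with visualization can help developers analyze the useful information in massive data. Taking the Thai-language version of Facebook as the object of study, this paper discusses methods for visualizing its user behavior, drawing on the principles of OAuth authentication and Graph Search social graph search used by current social networks. Given the firewall's restrictions on Facebook, the Thai text processing techniques needed to build visualization models freely are not yet mature. Using the JIT (JavaScript InfoVis Toolkit), this paper queries and customizes the parameters for building the RGraph visualization model and packages the program scripts into a customizable visualization template through the Visual.ly data visualization platform, producing a visual graph of behavior on a Thai-language social network.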
8.
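To address the excessive noise and unbalanced privacy protection of differential privacy algorithms for weighted social networks, SCDP, a privacy protection algorithm that combines spectral clustering with the differential privacy model, is proposed. Traditional differential privacy algorithms add noise directly to the edge weights of the social network, which introduces too much noise; to address this, spectral clustering is first used to partition the weighted social network into different clusters, and noise is added randomly to the different clusters, reducing the noise added...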
9.
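Based on the characteristics of gene expression data, a spring-model-based visual clustering method for gene expression data is proposed. It maps gene expression data from multidimensional space into two-dimensional space while largely preserving the spatio-temporal similarity among the original multidimensional data. Experimental results show that the method can uncover the cluster structure hidden in gene expression datasets as well as co-expressed gene patterns.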
10.
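In recent years, text clustering, an unsupervised learning method from machine learning, has become one of the techniques attracting the most attention in data mining. Clustering a small set of text data into a few classes is, to some extent, fairly easy. With large amounts of high-dimensional Chinese text, however, the data to be clustered are both high-dimensional and sparse, so improving clustering speed and visualization while maintaining clustering quality has become a research topic in its own right. This paper proposes combining term frequency-inverse document frequency (TFIDF) with latent semantic analysis (LSA) to improve the speed and visualization of k-means clustering of Chinese text. Using 11,456 news items collected from web pages as the experimental data, clustering based on TFIDF is compared with clustering based on TFIDF+LSA. The values of the silhouette coefficient (SC), the Calinski-Harabasz index (CHI), and the Davies-Bouldin index (DBI) show that the method not only maintains text clustering...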
11.
Clustering is one of the most important unsupervised learning problems; it consists of finding a common structure in a collection of unlabeled data. However, due to the ill-posed nature of the problem, different runs of the same clustering algorithm applied to the same data set usually produce different solutions. In this scenario, choosing a single solution is quite arbitrary. On the other hand, in many applications the problem of multiple solutions becomes intractable, hence it is often more desirable to provide a limited group of "good" clusterings rather than a single solution. In the present paper we propose least squares consensus clustering. This technique allows a small number of different clustering solutions to be extrapolated from an initial (large) ensemble obtained by applying any clustering algorithm to a given data set. We also define a measure of quality and present a graphical visualization of each consensus clustering that makes the strength of the consensus immediately interpretable. We have carried out several numerical experiments on both synthetic and real data sets to illustrate the proposed methodology.
12.
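With the rapid development of social networks and the continual growth of their user bases, recommending information of interest to users has become increasingly difficult. Traditional recommendation methods use historical data on user interests to predict the items a user will be interested in, ignoring the trust relationships in social networks, which leads to poor recommendation quality. To address this, a recommendation method based on a social trust latent factor model is proposed. The method introduces social trust to measure the implicit trust relationships between friends in a social network, selects the friends a user trusts according to the degree of social trust, performs latent factor analysis on the interests shared by those trusted friends and the target user, and builds a social-trust-based latent factor model to produce top-k item recommendations for the target user. Comparative experiments on real datasets show that the method outperforms existing recommendation methods in recommendation quality.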
13.
We develop a new algorithm for clustering search results. Unlike many other clustering systems that have recently been proposed as a post-processing step for Web search engines, our system is not based on phrase analysis inside snippets, but instead uses latent semantic indexing on the whole document content. A main contribution of the paper is a novel strategy, called dynamic SVD clustering, to discover the optimal number of singular values to be used for clustering purposes. Moreover, the algorithm is such that the SVD computation step has good performance in practice, which makes it feasible to perform clustering when term vectors are available. We show that the algorithm has very good classification performance, and that it can be effectively used to cluster the results of a search engine to make them easier for users to browse. The algorithm has been integrated into the Noodles search engine, a tool for searching and clustering Web and desktop documents.
14.
15.
Social learning analytics introduces tools and methods that help improve the learning process by providing useful information about the actors and their activity in the learning system. This study examines the relation between SNA parameters and student outcomes, and between network parameters and global course performance, and it shows how visualizations of social learning analytics can help in observing the visible and invisible interactions occurring in online distance education. The findings from our empirical study show that future research should further investigate whether there are conditions under which social network parameters are reliable predictors of academic performance, but they also advise against relying exclusively on social network parameters for predictive purposes. The findings also show that data visualization is a useful tool for social learning analytics, and how it may provide additional information about actors and their behaviors for decision making in online distance learning.
16.
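With the rapid spread of social networking sites such as Facebook, Twitter, and Weibo, friend recommendation systems have become an important component of the major sites. By proactively recommending new potential friends, such systems effectively enlarge users' social circles and improve their social experience, and they have therefore attracted wide attention. However, recommending genuinely suitable friends that meet each user's individual needs has long been one of the difficulties of personalized friend recommendation. To this end, a social network friend recommendation method based on users' latent features (SNFRLF) is proposed. First, a latent factor model is used to mine users' latent attribute features; then the similarity between users is computed from these latent features; finally, the computed similarity is introduced into a random walk model to obtain the friend recommendation list. Experimental results show that the proposed method performs significantly better than existing friend recommendation methods.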
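The last two steps described above (similarity from latent features, then a similarity-guided random walk) can be sketched as below. The abstract does not specify the similarity measure or the walk, so everything here is an assumption made for illustration: cosine similarity over given latent vectors, a random walk with restart whose transition probabilities are proportional to similarity, and the hypothetical names `recommend_friends`, `restart`, and `iters`. The latent vectors are taken as input rather than learned.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two latent feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recommend_friends(graph, latent, source, restart=0.15, iters=50):
    """Rank non-friends of `source` by random-walk-with-restart score.

    graph:  dict user -> set of current friends (undirected)
    latent: dict user -> latent feature vector (assumed precomputed)
    """
    users = list(graph)
    score = {u: 0.0 for u in users}
    score[source] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in users}
        nxt[source] = restart  # restart mass always returns to the source
        for u in users:
            # Edge weights: non-negative similarity to each friend.
            weights = {v: max(cosine(latent[u], latent[v]), 0.0)
                       for v in graph[u]}
            total = sum(weights.values())
            if total == 0:
                # No usable outgoing edge: send the mass back to the source.
                nxt[source] += (1 - restart) * score[u]
                continue
            for v, w in weights.items():
                nxt[v] += (1 - restart) * score[u] * w / total
        score = nxt
    candidates = [u for u in users if u != source and u not in graph[source]]
    return sorted(candidates, key=lambda u: score[u], reverse=True)
```

In a small graph where `a` is friends with `b` and `c`, and `b` resembles `a` much more closely than `c` does, the walk sends most of its mass through `b`, so friends of `b` rank ahead of friends of `c`.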
17.
Machine learning techniques for business blog search and mining. Cited by: 3 (self-citations: 1; citations by others: 3)
Weblogs, or blogs, have rapidly gained in popularity over the past few years. In particular, the growth of business blogs that are written by or provide commentary on businesses and companies opens up new opportunities for developing blog-specific search and mining techniques. In this paper, we propose probabilistic models for blog search and mining using two machine learning techniques, latent semantic analysis (LSA) and probabilistic latent semantic analysis (PLSA). We implement the models in our database of business blogs, BizBlogs07, with the aim of achieving higher precision and recall. The probabilistic model is able to segment the business blogs into separate topic areas, which is useful for keyword detection in the blogosphere. Various term-weighting schemes and factor values were also studied in detail, revealing interesting patterns in our database of business blogs. Our multi-functional business blog system is indeed found to be very different from existing blog search engines, as it aims to provide better relevance and precision of the search.
18.
Richard J. Hathaway, Jacalyn M. Huband. Pattern Recognition, 2006, 39(7): 1315-1324
The problem of determining whether clusters are present in a data set (i.e., assessment of cluster tendency) is an important first step in cluster analysis. The visual assessment of cluster tendency (VAT) tool has been successful in determining the potential cluster structure of various data sets, but it can be computationally expensive for large data sets. In this article, we present a new scalable, sample-based version of VAT, which is feasible for large data sets. We include analysis and numerical examples that demonstrate the new scalable VAT algorithm.
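For orientation, the basic (non-scalable) VAT reordering that the article builds on can be sketched as follows. This is an illustrative sketch of the standard ordering step only, not the sample-based scalable version the abstract announces; the function names and the list-of-lists matrix format are assumptions. The idea is to reorder a pairwise dissimilarity matrix so that similar objects sit next to each other, which makes clusters show up as dark diagonal blocks when the matrix is displayed as an image.

```python
def vat_order(dissim):
    """Return the VAT ordering of objects for a symmetric dissimilarity matrix.

    Starts at one endpoint of the largest dissimilarity, then repeatedly
    appends the unordered object closest to any already-ordered object
    (the same greedy growth as Prim's minimum spanning tree algorithm).
    """
    n = len(dissim)
    start = max(range(n), key=lambda i: max(dissim[i]))
    order = [start]
    remaining = [j for j in range(n) if j != start]
    while remaining:
        nxt = min(remaining,
                  key=lambda j: min(dissim[i][j] for i in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def reorder(dissim, order):
    """Permute rows and columns of the matrix according to `order`."""
    return [[dissim[i][j] for j in order] for i in order]


# Four points on a line, listed so that the two clusters are interleaved.
points = [0, 10, 1, 11]
dissim = [[abs(p - q) for q in points] for p in points]
order = vat_order(dissim)
reordered = reorder(dissim, order)
```

After reordering, the small dissimilarities within each cluster sit in two blocks along the diagonal. The greedy step compares every remaining object with every ordered one, which is the cost that motivates a sample-based variant for large data sets.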
19.
An information granulation based data mining approach for classifying imbalanced data. Cited by: 2 (self-citations: 0; citations by others: 2)
Recently, the class imbalance problem has attracted much attention from researchers in the field of data mining. When learning from imbalanced data, in which most examples are labeled as one class and only a few belong to another class, traditional data mining approaches are poor at predicting the crucial minority instances. Unfortunately, many real-world data sets, such as those from health examination, inspection, credit fraud detection, spam identification, and text mining, are all faced with this situation. In this study, we present a novel model called the "Information Granulation Based Data Mining Approach" to tackle this problem. The proposed methodology, which imitates the human ability to process information, acquires knowledge from information granules rather than from numerical data. This method also introduces a Latent Semantic Indexing based feature extraction tool that uses Singular Value Decomposition to dramatically reduce the data dimensions. In addition, several data sets from the UCI Machine Learning Repository are employed to demonstrate the effectiveness of our method. Experimental results show that our method can significantly increase the ability to classify imbalanced data.
20.
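Given a known node in a target gang and the characteristics of that gang, this work studies how to find gangs in social networks from communication trace features. The study introduces concepts such as social circles, node centrality, and event-set association matrices, and focuses on combining cluster analysis with social gang discovery, with the aim of obtaining a communication-trace-based model for analyzing social network gangs.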