Similar documents
20 similar documents found.
1.
李改  陈强  李磊  潘进财 《计算机科学》2017,44(2):88-92, 116
The core idea of one-class personalized collaborative ranking is to treat the one-class collaborative filtering problem as a ranking problem. Previous studies ranked recommendation candidates using implicit feedback data alone, which limits recommendation accuracy. With the emergence of online social networks, a new one-class personalized collaborative ranking algorithm that fuses social network information is proposed to further improve accuracy. Experiments on two real-world datasets containing social networks show that the algorithm outperforms several classic one-class collaborative filtering algorithms on every evaluation metric, demonstrating that social network information plays an important role in improving the performance of one-class personalized collaborative ranking.
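The abstract does not spell out the model, but a common way to realize social-aware one-class collaborative ranking is a BPR-style pairwise update with an extra term pulling a user's latent factors toward those of their social connections. The sketch below is a minimal, assumed illustration of that idea; the function name, hyperparameters, and the social regularizer are illustrative, not the authors' formulation.

```python
import numpy as np

def social_bpr_step(P, Q, u, i, j, friends_of_u, lr=0.01, reg=0.01, beta=0.1):
    """One SGD step: user u prefers observed item i over unobserved item j,
    while u's factors are also pulled toward the mean of u's friends' factors
    (illustrative social regularizer)."""
    p_u = P[u].copy()
    x_uij = p_u @ (Q[i] - Q[j])                 # pairwise score difference
    g = 1.0 / (1.0 + np.exp(x_uij))             # sigmoid weight of the BPR gradient
    if len(friends_of_u):
        social_pull = np.mean(P[friends_of_u], axis=0) - p_u
    else:
        social_pull = np.zeros_like(p_u)
    P[u] += lr * (g * (Q[i] - Q[j]) - reg * p_u + beta * social_pull)
    Q[i] += lr * (g * p_u - reg * Q[i])
    Q[j] += lr * (-g * p_u - reg * Q[j])

# usage sketch: P = np.random.normal(scale=0.1, size=(n_users, k))
#               Q = np.random.normal(scale=0.1, size=(n_items, k))
```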

2.
何海江  龙跃进 《计算机应用》2011,31(11):3108-3111
To address the shortage of labeled training data, a co-training-based multi-sample (listwise) learning-to-rank algorithm is proposed that mines latent ranking information from unlabeled data. The algorithm uses two kinds of multi-sample ranking learners, each constructing a different ranking function from the currently labeled data. Accordingly, every unlabeled query receives two different document permutations; a likelihood loss measures the similarity of the two permutations, and the queries whose permutations agree least are labeled, adding new training data to both ranking learners. Experiments on the public learning-to-rank dataset LETOR confirm that the co-trained ranking algorithm is effective. The influence of the labeling ratio on the algorithm is also discussed.
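A minimal sketch of the co-training selection step described above, assuming Kendall's tau as a stand-in for the permutation-similarity measure (the paper uses a likelihood loss); the ranker objects and variable names are illustrative.

```python
from scipy.stats import kendalltau

def select_queries_for_labeling(ranker_a, ranker_b, unlabeled_queries, k=10):
    """Score each unlabeled query by how much the two rankers' document
    orderings agree and return the k least-similar queries, which are then
    labeled and added to both training sets."""
    agreement = []
    for qid, features in unlabeled_queries.items():   # features: (n_docs, n_feats)
        scores_a = ranker_a.predict(features)
        scores_b = ranker_b.predict(features)
        tau, _ = kendalltau(scores_a, scores_b)        # rank correlation of the two orderings
        agreement.append((qid, tau))
    agreement.sort(key=lambda pair: pair[1])           # lowest agreement first
    return [qid for qid, _ in agreement[:k]]
```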

3.
Collaborative filtering is the most widely used approach in recommender systems. A class of collaborative filtering algorithms is proposed that combines one-mode projection of the bipartite graph with node ranking: the bipartite graph is projected using structural similarity, and nodes are ranked by random walk. The method mitigates the cold-start problem, achieves high accuracy, and scales well; it also avoids the inaccurate recommendations caused by low coverage. The algorithm admits two implementations, an item-ranking algorithm based on item-based collaborative filtering and a user-ranking algorithm based on user-based collaborative filtering. Tests on the standard MovieLens dataset demonstrate the algorithm's effectiveness.
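A minimal sketch of the item-ranking variant under simplifying assumptions: co-consumption counts stand in for the structural-similarity weights of the one-mode projection, and a personalized-PageRank power iteration stands in for the random-walk ranking.

```python
import numpy as np

def item_rank(R, user, alpha=0.85, iters=50):
    """Rank items for `user` by a random walk on the item projection of the
    user-item bipartite graph. R is a binary user-item matrix (n_users, n_items)."""
    W = R.T @ R                                   # item-item projection (co-consumption counts)
    np.fill_diagonal(W, 0)
    col_sums = W.sum(axis=0, keepdims=True)
    P = np.divide(W, col_sums, out=np.zeros_like(W, dtype=float), where=col_sums > 0)
    seed = R[user] / max(R[user].sum(), 1)        # restart at items the user liked
    r = seed.copy()
    for _ in range(iters):
        r = alpha * (P @ r) + (1 - alpha) * seed  # personalized PageRank iteration
    r[R[user] > 0] = -np.inf                      # do not re-recommend seen items
    return np.argsort(-r)
```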

4.
To address the poor interpretability and heavy data noise of one-class collaborative filtering (OCCF) in recommender systems, a confidence-weighted OCCF recommendation algorithm is proposed. The algorithm maps users' implicit feedback to confidence probabilities via a confidence function and integrates this function into the implicit feedback recommendation model (IFRM) framework, yielding the confidence-weighted implicit feedback recommendation model (CWIFRM). On this basis, a heterogeneous-confidence optimization algorithm based on stochastic gradient descent is proposed for CWIFRM. Experimental results show that the model achieves better recommendation quality on multiple datasets and that the heterogeneous-confidence optimization improves it further, confirming that CWIFRM has strong applicability, interpretability, and noise resistance.
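A minimal sketch of what a confidence-weighted SGD update on an implicit-feedback factorization could look like; the confidence function below (a log-count weighting) and all hyperparameters are assumptions for illustration, not the CWIFRM specification.

```python
import numpy as np

def confidence(count, alpha=40.0):
    """Map an implicit-feedback count to a confidence weight (assumed form)."""
    return 1.0 + alpha * np.log1p(count)

def confidence_weighted_sgd_step(P, Q, u, i, r_ui, c_ui, lr=0.005, reg=0.02):
    """One SGD step where the squared error on (u, i) is scaled by the
    confidence c_ui derived from the user's implicit feedback."""
    p_u = P[u].copy()
    err = r_ui - p_u @ Q[i]
    P[u] += lr * (c_ui * err * Q[i] - reg * p_u)
    Q[i] += lr * (c_ui * err * p_u - reg * Q[i])
```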

5.
李改 《计算机应用》2015,35(5):1328-1332
Previous research on collaborative ranking did not fully exploit the information in the dataset, focusing either on explicit rating data or on implicit feedback data alone; learning-to-rank ideas had not yet been used to combine the two. To remedy this, building on the recent extended CLiMF (xCLiMF) model and the classic SVD++ algorithm, a collaborative ranking algorithm named MERR_SVD++ is proposed that fuses explicit and implicit feedback and directly optimizes the learning-to-rank metric Expected Reciprocal Rank (ERR). Experiments on real datasets show that, compared with the classic xCLiMF, CofiRank, PopRec, and Random algorithms, MERR_SVD++ improves both Normalized Discounted Cumulative Gain (NDCG) and ERR by more than 25.9%, while its running time grows linearly with the number of ratings. Thanks to its high recommendation accuracy and good scalability, MERR_SVD++ is suitable for big data and has broad application prospects in Internet information recommendation.
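For reference, the Expected Reciprocal Rank metric that MERR_SVD++ optimizes can be computed from a ranked list of graded relevance labels as follows (standard definition, not the paper's implementation).

```python
def expected_reciprocal_rank(grades, g_max=None):
    """ERR for a ranked list of relevance grades g_1, g_2, ...
    R_i = (2**g_i - 1) / 2**g_max is the stopping probability at rank i;
    ERR = sum_r (1/r) * R_r * prod_{i<r} (1 - R_i)."""
    if g_max is None:
        g_max = max(grades)
    err, p_continue = 0.0, 1.0
    for rank, g in enumerate(grades, start=1):
        r_i = (2 ** g - 1) / (2 ** g_max)
        err += p_continue * r_i / rank
        p_continue *= (1.0 - r_i)
    return err

# e.g. expected_reciprocal_rank([3, 2, 0, 1], g_max=3)
```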

6.
刘海洋  王志海  黄丹  孙艳歌 《软件学报》2015,26(11):2981-2993
Collaborative filtering lies at the core of most of today's recommender systems. Traditional collaborative filtering focuses on the accuracy of rating prediction, whereas real recommender systems ultimately output a ranking of items. To address this, knowledge from learning to rank is brought into the recommendation algorithm, and a listwise collaborative ranking algorithm based on a local low-rank assumption on the rating matrix is designed. A listwise loss function with low computational complexity is used directly to optimize the matrix factorization model, and experiments confirm a significant speedup. Comparative experiments against mainstream recommendation algorithms on three real recommender-system datasets show that the algorithm performs well.
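The abstract does not state which listwise loss is used; the sketch below uses a ListNet-style top-one softmax cross-entropy over the predicted matrix-factorization scores purely as an illustrative stand-in.

```python
import numpy as np

def listwise_loss(P_u, Q_items, ratings):
    """Cross-entropy between the top-one probability distribution induced by the
    true ratings and the one induced by the predicted scores for one user's list
    (an illustrative stand-in for the paper's listwise loss)."""
    scores = Q_items @ P_u                        # predicted scores for the user's item list

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    p_true, p_pred = softmax(ratings), softmax(scores)
    return -np.sum(p_true * np.log(p_pred + 1e-12))
```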

7.
Traditional collaborative filtering suffers from low classification accuracy when data are sparse. To address this, a collaborative filtering recommendation algorithm based on an improved cosine similarity is proposed: the data are mapped through an embedding layer into a feature matrix, and the mean squared error between the improved cosine-similarity matrix computed from it and the identity matrix is used as the loss function, which improves classification accuracy under data sparsity. Experimental results show that the algorithm outperforms the baseline FM, FFM, and DeepFM models on both AUC and log loss.
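A minimal sketch of the described loss, the mean squared error between the cosine-similarity matrix of the embedded feature matrix and the identity matrix (shapes and variable names are assumptions for illustration).

```python
import numpy as np

def cosine_identity_mse(E, eps=1e-12):
    """MSE between the cosine-similarity matrix of embedded features E
    (n_fields, d) and the identity matrix, i.e. a loss that pushes distinct
    feature embeddings toward orthogonality."""
    norms = np.linalg.norm(E, axis=1, keepdims=True) + eps
    S = (E / norms) @ (E / norms).T               # cosine-similarity matrix
    I = np.eye(E.shape[0])
    return np.mean((S - I) ** 2)
```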

8.
Traditional collaborative filtering is slow and inaccurate on sparse item-rating matrices and treats ratings from different time periods equally, both of which hurt recommendation accuracy. To address this, a time-weighted collaborative filtering recommendation algorithm based on item clustering and ratings (TCF) is proposed. The algorithm clusters items whose combined rating and attribute-feature similarity is high into the same class, which effectively alleviates data sparsity and reduces the time needed to build the nearest-neighbor set. A time-weighting function assigns item ratings weights that decay over time, and the target user's nearest-neighbor set is found from the weighted ratings. Experiments on mean absolute error, average ranking score, and hit rate show that the improved algorithm effectively increases recommendation accuracy.
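A minimal sketch of the time-weighting idea, assuming an exponential decay with an illustrative half-life; the paper only states that rating weights decrease with the age of the rating.

```python
def time_weight(t_rating, t_now, half_life_days=90.0):
    """Exponentially decaying weight for a rating made at time t_rating (in days);
    the half-life is an assumed parameter, not the paper's choice."""
    age = t_now - t_rating
    return 0.5 ** (age / half_life_days)

def weighted_rating(r, t_rating, t_now):
    """Rating value used when searching the target user's nearest neighbors."""
    return time_weight(t_rating, t_now) * r
```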

9.
Known ranking-oriented collaborative filtering algorithms have two main drawbacks: when computing user similarity, they only consider whether two users agree on a given item pair, ignoring the strength of that preference and how popular it is among users; and when fusing preferences and ranking, they need an intermediate step that builds a value function before a greedy algorithm can produce the recommendation list. To solve these problems, we use a TF-IDF-like weighting strategy to jointly account for preference strength and preference popularity and compute user similarity with a weighted Kendall Tau correlation coefficient; for preference fusion and ranking, the vote-based Schulze method produces the recommendation list directly. On two movie datasets, the proposed algorithm clearly outperforms other popular collaborative filtering algorithms on the NDCG metric.
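A minimal sketch of a weighted Kendall Tau between two users, where pair_weight is a placeholder for the TF-IDF-like weight combining preference strength and popularity (the exact weighting scheme of the paper is not reproduced).

```python
import numpy as np
from itertools import combinations

def weighted_kendall_tau(ratings_u, ratings_v, pair_weight):
    """Weighted Kendall Tau between two users over the items both rated.
    `ratings_u` / `ratings_v` map item -> rating; `pair_weight(i, j)` returns the
    weight of item pair (i, j)."""
    common = sorted(set(ratings_u) & set(ratings_v))
    num, den = 0.0, 0.0
    for i, j in combinations(common, 2):
        du = np.sign(ratings_u[i] - ratings_u[j])
        dv = np.sign(ratings_v[i] - ratings_v[j])
        if du == 0 or dv == 0:
            continue                      # ties carry no preference on this pair
        w = pair_weight(i, j)
        num += w * du * dv                # +w if the users agree, -w if they disagree
        den += w
    return num / den if den > 0 else 0.0
```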

11.
Learning to rank is one of the current research hotspots in information retrieval. To avoid the influence of noise in the training set, current learning-to-rank algorithms pay considerable attention to robustness. Previous work has found that the same learning-to-rank method can show drastically different noise sensitivity on different datasets. A change in the learned model is the direct cause of the performance drop, and since the model is learned from the training set, the root cause lies in certain properties of the training data. Through an analysis of concrete learning-to-rank scenarios, this paper concludes that the fundamental factor affecting noise sensitivity is the distribution of document pairs in the training set, and experiments on LETOR3.0 verify this conclusion.

12.
Learning from imperfect (noisy) information sources is a challenging and real-world issue for many data mining applications. Common practices include enhancing data quality by applying data preprocessing techniques or employing robust learning algorithms that avoid developing overly complicated structures that overfit the noise. The essential goal is to reduce the noise impact and eventually enhance the learners built from noise-corrupted data. In this paper, we propose a novel corrective classification (C2) design, which incorporates data cleansing, error correction, Bootstrap sampling and classifier ensembling for effective learning from noisy data sources. C2 differs from existing classifier ensembling or robust learning algorithms in two aspects. On one hand, the set of diverse base learners constituting the C2 ensemble is constructed via a Bootstrap sampling process; on the other hand, C2 further improves each base learner by unifying error detection, correction and data cleansing to reduce the noise impact. Being corrective, the classifier ensemble is built from data preprocessed/corrected by the data cleansing and correcting modules. Experimental comparisons demonstrate that C2 is not only more accurate than the learner built from the original noisy sources, but also more reliable than Bagging [4] or the aggressive classifier ensemble (ACE) [56], which are two degenerated components/variants of C2. The comparisons also indicate that C2 is more stable than Boosting and DECORATE, which are two state-of-the-art ensembling methods. For real-world imperfect information sources (i.e. noisy training and/or test data), C2 is able to deliver more accurate and reliable prediction models than its other peers can offer.
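A minimal sketch of a C2-style pipeline under simplifying assumptions: each base learner is trained on a Bootstrap sample that has been cleansed by dropping instances a cross-validated filter mislabels, and predictions are combined by majority vote. The error-correction module is reduced to cleansing here, and the choice of decision trees as base learners is illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.utils import resample

def train_corrective_ensemble(X, y, n_learners=15, random_state=0):
    """Bootstrap-sample, cleanse each sample with a cross-validated filter,
    then train one base learner per cleansed sample (X, y: numpy arrays)."""
    rng = np.random.RandomState(random_state)
    learners = []
    for _ in range(n_learners):
        Xb, yb = resample(X, y, random_state=rng)                   # Bootstrap sample
        filt_pred = cross_val_predict(DecisionTreeClassifier(), Xb, yb, cv=5)
        keep = filt_pred == yb                                      # drop suspected noisy instances
        learners.append(DecisionTreeClassifier().fit(Xb[keep], yb[keep]))
    return learners

def predict_majority(learners, X):
    """Majority vote; assumes integer class labels 0..K-1."""
    votes = np.stack([clf.predict(X) for clf in learners])
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
```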

13.
In temporal data analysis, noisy data are inevitable in both testing and training, and this noise can seriously degrade analysis performance. To address this problem, we propose a novel method, termed Selective Temporal Filtering, that builds a noise-free model for classification during training and, during testing, identifies key-feature vectors, i.e. noise-filtered data from the input sequence. The use of these key-feature vectors makes the classifier robust to noise within the input space. The proposed method is validated on a synthetic dataset and a database of American Sign Language. Using key-feature vectors results in robust performance with respect to the noise content. Furthermore, we show that the proposed method outperforms Conditional Random Fields and Hidden Markov Models not only in noisy environments, but also in a well-controlled environment where we assume no significant noise vectors exist.

14.
In this paper, we make an effort to overcome the sensitivity of traditional clustering algorithms to noisy data points (noise and outliers). A novel pruning method, grounded in information theory, is proposed to phase out noisy points for robust data clustering. This approach identifies and prunes the noisy points by maximizing mutual information against the input data distribution, such that the resulting clusters are least affected by noise and outliers; the degree of robustness is controlled through a separate parameter that trades off rejection of noisy points against optimally clustered data. The pruning approach is general and can improve the robustness of many existing traditional clustering methods. In particular, we apply it to improve the robustness of fuzzy c-means clustering and its extensions, e.g. fuzzy c-spherical shells clustering and kernel-based fuzzy c-means clustering. As a result, we obtain three clustering algorithms that are robust versions of the existing ones. The effectiveness of the proposed pruning approach is supported by experimental results.

15.
A robust convex optimization approach is proposed for support vector regression (SVR) with noisy input data. The data points are assumed to be uncertain but bounded within given hyper-spheres of radius η. The proposed robust SVR model is equivalent to a Second Order Cone Programming (SOCP) problem, and an SOCP formulation under a Gaussian noise model assumption is also discussed. Computational results are presented on both real-world and synthetic data sets. The robust SOCP approach is compared with several other regression algorithms, such as SVR, least-squares SVR, and artificial neural networks, by injecting Gaussian noise into each of the data points. The proposed approach outperforms the other regression algorithms on some data sets. Moreover, the generalization behavior of the SOCP method is better than that of traditional SVR as the uncertainty level η increases, up to a threshold value.
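A sketch of the kind of robust formulation described, assuming each input may be perturbed within a ball of radius η: the η‖w‖₂ terms are the standard worst-case tightening of the ε-insensitive constraints and give rise to second-order cone constraints. This is illustrative, not necessarily the paper's exact model.

```latex
\min_{w,\,b,\,\xi,\,\xi^{*}} \ \tfrac{1}{2}\lVert w\rVert_{2}^{2}
  + C \sum_{i=1}^{n} \bigl(\xi_{i} + \xi_{i}^{*}\bigr)
\quad \text{s.t.} \quad
\begin{aligned}
  y_{i} - w^{\top} x_{i} - b + \eta\,\lVert w\rVert_{2} &\le \varepsilon + \xi_{i},\\
  w^{\top} x_{i} + b - y_{i} + \eta\,\lVert w\rVert_{2} &\le \varepsilon + \xi_{i}^{*},\\
  \xi_{i},\ \xi_{i}^{*} &\ge 0, \qquad i = 1,\dots,n.
\end{aligned}
```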

16.
In machine learning, class noise occurs frequently and degrades the classifier derived from the noisy data set. This paper presents two promising classifiers for this problem based on a probabilistic model proposed by Lawrence and Schölkopf (2001). The proposed algorithms are able to tolerate class noise and extend the earlier work of Lawrence and Schölkopf in two ways. First, we present a novel incorporation of their probabilistic noise model in the Kernel Fisher discriminant; second, the distribution assumption previously made is relaxed in our work. The methods were investigated on simulated noisy data sets and a real-world comparative genomic hybridization (CGH) data set. The results show that the proposed approaches substantially improve standard classifiers on noisy data sets and achieve larger performance gains on non-Gaussian data sets and small data sets.

17.
To reduce the noisy-data and data-sparsity problems of collaborative filtering and improve accuracy, this paper proposes a collaborative filtering algorithm based on information entropy and an improved similarity measure. A user information-entropy model is used to identify noisy data and exclude its interference with the results, and an improved similarity computation designed for sparse data uses all rating data rather than relying only on co-rated items, which greatly helps mitigate the impact of sparsity on recommendation accuracy. Experimental results show that the algorithm can, to some extent, remove the influence of noisy data, alleviate the effect of data sparsity on the precision of the recommendations, and ease several problems common to traditional recommender algorithms; compared with traditional collaborative filtering, it achieves higher accuracy.
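A minimal sketch of the user information-entropy computation used to flag noisy users; the concrete decision rule applied to the entropy values is not reproduced here.

```python
import numpy as np

def user_rating_entropy(ratings):
    """Shannon entropy of a user's rating-value distribution; users whose
    entropy is abnormally low or high can be flagged as noisy."""
    _, counts = np.unique(ratings, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# e.g. user_rating_entropy([5, 5, 5, 5])    -> 0.0  (suspiciously uniform behaviour)
#      user_rating_entropy([1, 5, 1, 5, 3]) -> higher entropy
```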

18.
Previous work on the one-class collaborative filtering (OCCF) problem can be roughly categorized into pointwise methods, pairwise methods, and content-based methods. A fundamental assumption of these approaches is that all missing values in the user-item rating matrix are considered negative. However, this assumption may not hold, because the missing values may contain both negative and positive examples. For example, a user who fails to give positive feedback about an item may not necessarily dislike it; he may simply be unfamiliar with it. Meanwhile, content-based methods, e.g. collaborative topic regression (CTR), usually require textual content information of the items, and thus their applicability is largely limited when the text information is not available. In this paper, we propose to apply the latent Dirichlet allocation (LDA) model to OCCF to address the above-mentioned problems. The basic idea of this approach is that items are regarded as words, users are considered as documents, and the user-item feedback matrix constitutes the corpus. Our model drops the strong assumption that missing values are all negative and only utilizes the observed data to predict a user's interest. Additionally, the proposed model does not need content information of the items. Experimental results indicate that the proposed method outperforms previous methods on various ranking-oriented evaluation metrics. We further combine this method with a matrix factorization-based method to tackle the multi-class collaborative filtering (MCCF) problem, which also achieves better performance on predicting user ratings.
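A minimal sketch of the users-as-documents, items-as-words idea using scikit-learn's LDA implementation; the topic count and scoring rule below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

def lda_occf_scores(R, n_topics=50, random_state=0):
    """Treat each user as a 'document' and each item as a 'word': fit LDA on the
    binary user-item feedback matrix R (n_users, n_items) and score unobserved
    items by user-topic x topic-item affinity."""
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=random_state)
    user_topic = lda.fit_transform(R)                        # (n_users, n_topics)
    topic_item = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
    scores = user_topic @ topic_item                         # (n_users, n_items)
    scores[R > 0] = -np.inf                                  # rank only unobserved items
    return scores
```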

19.
For keyword-based image retrieval, learning a binary classifier from the visual similarity of the retrieved results is expected to be one of the most effective ways to improve the results. To improve search-engine results, this paper proposes an algorithmic framework and, within it, focuses on the key problem of training data selection. The training data selection process consists of two stages: 1) initialization of the training data to start classifier learning, and 2) dynamic data selection during iterative classifier learning. For initial training data selection, we investigate a clustering-based and a ranking-based method and compare automatic training data selection with manual annotation. For dynamic data selection, we compare the classification performance of support vector machines and a Bayesian classifier based on the max-min posterior pseudo-probability. Combining the different methods of the two stages yields eight algorithms, which we apply to keyword-based image retrieval with the Google search engine. Experimental results show that how to select training data from noisy search results is the key problem in improving the results. The experiments also show that our method effectively improves Google search results, especially the top-ranked ones; presenting more relevant results earlier greatly reduces the user's effort of paging through results one by one. How to make automatic training data selection comparable to manual annotation remains a question for further research.

20.
Objective: Non-switching random-valued impulse noise (RVIN) denoising models built with deep convolutional neural networks (DCNN) outperform mainstream switching RVIN denoising algorithms in both denoising quality and execution efficiency. In practice, however, the performance of such training-based (data-driven) denoising models is constrained by whether the severity of noise corruption in the image to be denoised can be measured accurately (a data-dependence problem). To this end, a fast RVIN noise ratio estimation (NRE) model based on a shallow convolutional neural network is proposed. Method: The main task of the prediction model is to estimate the noise ratio of the image to be denoised and use it as an indicator of how severely the image is corrupted; based on the NRE model's estimate, a pre-trained DCNN denoising model for the corresponding noise-ratio interval can be invoked adaptively, completing the denoising task quickly and with high quality. Results: Tests were run on two test sets, 10 commonly used images and 50 texture images, and compared against the detection modules of existing mainstream RVIN denoising algorithms. On the commonly used images, the proposed NRE model achieved the highest prediction accuracy; compared with the second most accurate algorithm, its root mean square error on the predicted noise ratio was 0.6% to 2.4% lower. On the 50 texture images, the NRE model showed the smallest fluctuation in root mean square error, indicating the best stability. Comparing execution efficiency by the overall average running time on a 512×512-pixel image, the NRE model took only 0.02 s. The experimental data show that the proposed NRE model can quickly and stably measure the severity of RVIN corruption in natural images at various noise ratios, and that a non-blind DCNN denoising model combined with it is seamlessly turned into a blind denoising algorithm. Conclusion: The proposed RVIN noise ratio estimation model has robust prediction accuracy at every noise ratio and, used together with a DCNN-based non-switching RVIN denoising model, properly resolves the data-dependence problem inherent in DCNN models.
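A minimal sketch of what a shallow CNN noise-ratio estimator could look like in PyTorch; the layer sizes and depth are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ShallowNRE(nn.Module):
    """Shallow CNN that regresses a noise ratio in [0, 1] from a grayscale image:
    a few conv layers, global average pooling, and a sigmoid output."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                 # works for any input size
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, x):                            # x: (batch, 1, H, W)
        return self.head(self.features(x))

# usage sketch: ratio = ShallowNRE()(torch.rand(1, 1, 512, 512))  # predicted noise ratio in [0, 1]
```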
