Similar literature
Found 20 similar documents; search took 31 ms.
1.
Collaborative filtering (CF) is a technique commonly used for personalized recommendation and Web service quality-of-service (QoS) prediction. However, CF is vulnerable to shilling attackers who inject fake user profiles into the system. In this paper, we first present the shilling attack problem on CF-based QoS recommender systems for Web services. Then, a robust CF recommendation approach is proposed from a user similarity perspective to enhance the resistance of the recommender systems to the shilling attack. In the approach, the generally used similarity measures are analyzed, and the DegSim (the degree of similarities with top k neighbors) with those measures is selected for grouping and weighting the users. Then, the weights are used to calculate the service similarities/differences and predictions. We analyzed and evaluated our algorithms using the WS-DREAM and MovieLens datasets. The experimental results demonstrate that shilling attacks influence the prediction of QoS values, and our proposed features and algorithms achieve a higher degree of robustness against shilling attacks than the typical CF algorithms.
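The DegSim feature mentioned in the abstract above (the degree of similarity with the top k neighbors) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the choice of Pearson correlation as the similarity measure, the dictionary layout of `ratings`, and the function names `pearson` and `deg_sim` are all assumptions made for the example.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over co-rated items; 0.0 when it is undefined.

    `a` and `b` map item ids to ratings (hypothetical data layout).
    """
    common = [i for i in a if i in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def deg_sim(ratings, user, k):
    """Average similarity between `user` and its k most similar other users."""
    sims = sorted(
        (pearson(ratings[user], ratings[v]) for v in ratings if v != user),
        reverse=True,
    )
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0
```

A profile whose DegSim value is unusually high or low relative to the rest of the population is a candidate for different grouping or weighting; how the paper turns the value into weights is not stated in the abstract and is not modeled here.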

2.
Accurate classification of microarray data plays a vital role in cancer prediction and diagnosis. Previous studies have demonstrated the usefulness of the naïve Bayes classifier in solving various classification problems. In microarray data analysis, however, the conditional independence assumption embedded in the classifier itself and the characteristics of microarray data, e.g. the extremely high dimensionality, may severely affect the classification performance of the naïve Bayes classifier. This paper presents a sequential feature extraction approach for naïve Bayes classification of microarray data. The proposed approach consists of feature selection by stepwise regression and feature transformation by class-conditional independent component analysis. Experimental results on five microarray datasets demonstrate the effectiveness of the proposed approach in improving the performance of the naïve Bayes classifier in microarray data analysis.

3.
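To address the inability of support vector machine methods to detect shilling attacks effectively when labeled user data are insufficient, a semi-supervised shilling attack detection method based on SVM-KNN is proposed. An initial SVM classifier is trained on a small amount of labeled user data and used to classify a large amount of unlabeled user data. Sample points near the classification boundary that are likely to become support vectors are selected, and a KNN classifier is used to improve the label quality of these boundary vectors. The relabeled boundary vectors are then merged into the training set, and iterative training gradually improves the SVM classification boundary until the final decision function of the system is obtained. Experimental results show that, when labeled user data are scarce, the method effectively improves the accuracy and efficiency of shilling attack detection and generalizes well.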

4.
Previous studies have shown that the classification accuracy of a Naïve Bayes classifier in the domain of text-classification can often be improved using binary decompositions such as error-correcting output codes (ECOC). The key contribution of this short note is the realization that ECOC and, in fact, all class-based decomposition schemes, can be efficiently implemented in a Naïve Bayes classifier, so that, because of the additive nature of the classifier, all binary classifiers can be trained in a single pass through the data. In contrast to the straightforward implementation, which has a complexity of O(n·t·g), the proposed approach improves the complexity to O((n+t)·g). Large-scale learning of ensemble approaches with Naïve Bayes can benefit from this approach, as the experimental results shown in this paper demonstrate.
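The additive property that the abstract above relies on can be illustrated with a small sketch. It assumes a bag-of-words count model and uses hypothetical helper names (`count_per_class`, `binary_counts`); it shows only the general idea of deriving every binary problem from per-class counts gathered once, not the paper's actual algorithm or its complexity analysis.

```python
from collections import Counter, defaultdict


def count_per_class(docs):
    """Single pass over (tokens, label) pairs.

    Returns per-class token counts and per-class document counts.
    """
    token_counts = defaultdict(Counter)
    doc_counts = Counter()
    for tokens, label in docs:
        token_counts[label].update(tokens)
        doc_counts[label] += 1
    return token_counts, doc_counts


def binary_counts(token_counts, doc_counts, positive):
    """Build the counts for one binary problem without rereading the data.

    `positive` is the set of original classes mapped to the positive side
    (for example, one column of an ECOC code matrix); all other classes
    are treated as negative.
    """
    pos_tokens, neg_tokens = Counter(), Counter()
    pos_docs = neg_docs = 0
    for label, counts in token_counts.items():
        if label in positive:
            pos_tokens.update(counts)  # counts simply add up across classes
            pos_docs += doc_counts[label]
        else:
            neg_tokens.update(counts)
            neg_docs += doc_counts[label]
    return (pos_tokens, pos_docs), (neg_tokens, neg_docs)
```

Because each binary classifier needs only sums of the per-class statistics, the training data are scanned once regardless of how many binary problems the decomposition defines.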

5.
Naïve Bayes learners are widely used, efficient, and effective supervised learning methods for labeled datasets in noisy environments. It has been shown that naïve Bayes learners produce reasonable performance compared with other machine learning algorithms. However, the conditional independence assumption of naïve Bayes learning imposes restrictions on the handling of real-world data. To relax the independence assumption, we propose a smooth kernel to augment weights for the likelihood estimation. We then select an attribute weighting method that uses the mutual information metric to cooperate with the proposed framework. A series of experiments are conducted on 17 UCI benchmark datasets to compare the accuracy of the proposed learner against that of other methods that employ a relaxed conditional independence assumption. The results demonstrate the effectiveness and efficiency of our proposed learning algorithm. The overall results also indicate the superiority of attribute-weighting methods over those that attempt to determine the structure of the network.

6.
Comparison of generative and discriminative classifiers is an ever-lasting topic. As an important contribution to this topic, based on their theoretical and empirical comparisons between the naïve Bayes classifier and linear logistic regression, Ng and Jordan (NIPS 841–848, 2001) claimed that there exist two distinct regimes of performance between the generative and discriminative classifiers with regard to the training-set size. In this paper, our empirical and simulation studies, as a complement of their work, however, suggest that the existence of the two distinct regimes may not be so reliable. In addition, for real world datasets, so far there is no theoretically correct, general criterion for choosing between the discriminative and the generative approaches to classification of an observation x into a class y; the choice depends on the relative confidence we have in the correctness of the specification of either p(y|x) or p(x, y) for the data. This can be to some extent a demonstration of why Efron (J Am Stat Assoc 70(352):892–898, 1975) and O’Neill (J Am Stat Assoc 75(369):154–160, 1980) prefer normal-based linear discriminant analysis (LDA) when no model mis-specification occurs but other empirical studies may prefer linear logistic regression instead. Furthermore, we suggest that pairing of either LDA assuming a common diagonal covariance matrix (LDA-Λ) or the naïve Bayes classifier and linear logistic regression may not be perfect, and hence it may not be reliable for any claim that was derived from the comparison between LDA-Λ or the naïve Bayes classifier and linear logistic regression to be generalised to all generative and discriminative classifiers.

7.
Collaborative and content-based filtering are the recommendation techniques most widely adopted to date. Traditional collaborative approaches compute a similarity value between the current user and each other user by taking into account their rating style, that is, the set of ratings given on the same items. Based on the ratings of the most similar users, commonly referred to as neighbors, collaborative algorithms compute recommendations for the current user. The problem with this approach is that the similarity value is only computable if users have common rated items. The main contribution of this work is a possible solution to overcome this limitation. We propose a new content-collaborative hybrid recommender which computes similarities between users relying on their content-based profiles, in which user preferences are stored, instead of comparing their rating styles. In more detail, user profiles are clustered to discover current user neighbors. Content-based user profiles play a key role in the proposed hybrid recommender. Traditional keyword-based approaches to user profiling are unable to capture the semantics of user interests. A distinctive feature of our work is the integration of linguistic knowledge in the process of learning semantic user profiles representing user interests in a more effective way, compared to classical keyword-based profiles, due to a sense-based indexing. Semantic profiles are obtained by integrating machine learning algorithms for text categorization, namely a naïve Bayes approach and a relevance feedback method, with a word sense disambiguation strategy based exclusively on the lexical knowledge stored in the WordNet lexical database. Experiments carried out on a content-based extension of the EachMovie dataset show an improvement of the accuracy of sense-based profiles with respect to keyword-based ones, when coping with the task of classifying movies as interesting (or not) for the current user.
An experimental session has also been performed in order to evaluate the proposed hybrid recommender system. The results highlight the improvement in the predictive accuracy of collaborative recommendations obtained by selecting like-minded users according to user profiles.

8.
An increasing number of computational and statistical approaches have been used for text classification, including nearest-neighbor classification, naïve Bayes classification, support vector machines, decision tree induction, rule induction, and artificial neural networks. Among these approaches, naïve Bayes classifiers have been widely used because of their simplicity. Due to the simplicity of the Bayes formula, the naïve Bayes classification algorithm requires a relatively small amount of training data and less time in both the training and classification stages than other classifiers. However, a major shortcoming of this technique is that the classifier simply picks the highest-probability category as the one to which the document is assigned. Doing this is tantamount to classifying using only one dimension of a multi-dimensional data set. The main aim of this work is to utilize the strengths of the self-organizing map (SOM) to overcome the inadvertent dimensionality reduction resulting from using only the Bayes formula to classify. Combining the hybrid system with new ranking techniques further improves the performance of the proposed document classification approach. This work describes the implementation of an enhanced hybrid classification approach that affords better classification accuracy through the utilization of two familiar algorithms: the naïve Bayes classification algorithm, which is used to vectorize the document using a probability distribution, and the self-organizing map (SOM) clustering algorithm, which is used as the multi-dimensional unsupervised classifier.

9.
The abundance of unlabelled data alongside limited labelled data has provoked significant interest in semi-supervised learning methods. “Naïve labelling” refers to the following simple strategy for using unlabelled data in on-line classification. A new data point is first labelled by the current classifier and then added to the training set together with the assigned label. The classifier is updated before seeing the subsequent data point. Although the danger of a run-away classifier is obvious, versions of naïve labelling pervade in on-line adaptive learning. We study the asymptotic behaviour of naïve labelling in the case of two Gaussian classes and one variable. The analysis shows that if the classifier model assumes correctly the underlying distribution of the problem, naïve labelling will drive the parameters of the classifier towards their optimal values. However, if the model is not guessed correctly, the benefits are outweighed by the instability of the labelling strategy (run-away behaviour of the classifier). The results are based on exact calculations of the point of convergence, simulations, and experiments with 25 real data sets. The findings in our study are consistent with concerns about general use of unlabelled data, flagged up in the recent literature.
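The naïve labelling loop described in the abstract above (label with the current classifier, add the point with that label, update, repeat) can be sketched for one variable and two classes. The nearest-mean decision rule used here corresponds to assuming equal variances and equal priors; that rule, the function name `naive_labelling`, and the pseudo-count parameters `n0` and `n1` are assumptions for illustration and are not taken from the paper.

```python
def naive_labelling(stream, mean0, mean1, n0=1, n1=1):
    """Online self-labelling with a two-class nearest-mean classifier.

    `mean0` and `mean1` are the initial class means (e.g. estimated from
    a few labelled points); `n0` and `n1` are the number of points each
    mean is assumed to summarise. Returns the final pair of means.
    """
    for x in stream:
        # Step 1: label the new point with the current classifier.
        label = 0 if abs(x - mean0) <= abs(x - mean1) else 1
        # Step 2: add it to the "training set" under that label by
        # updating the running mean of the chosen class.
        if label == 0:
            n0 += 1
            mean0 += (x - mean0) / n0
        else:
            n1 += 1
            mean1 += (x - mean1) / n1
    return mean0, mean1
```

Since no true labels are ever consulted, an early mislabelling shifts a class mean and can bias later labels in the same direction, which is the run-away behaviour the abstract warns about.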

10.
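Shilling attacks are one of the most serious challenges facing recommender systems today. Because recommender systems are open, malicious users can easily inject carefully crafted ratings to influence recommendation results and degrade the user experience. Based on an attribute-optimized structured-noise matrix completion technique, a robust shilling-attack-resistant personalized recommendation algorithm (SATPR) is proposed. It treats attack ratings as structured row noise in the rating matrix and models the noise with the L2,1 norm, while introducing attribute features of users and items to improve shilling attack detection accuracy. Experiments show that, under shilling attacks, SATPR achieves more accurate personalized rating prediction than traditional recommendation algorithms.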
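For readers unfamiliar with the L2,1 norm named in the abstract above, the sketch below computes it under the row-wise convention (the sum of the Euclidean norms of the rows), which fits noise concentrated in whole rows of a user-by-item matrix. Some papers define the norm over columns instead; which convention SATPR uses is not stated in the abstract, so the row-wise choice and the function name `l21_norm_rows` are assumptions.

```python
from math import sqrt


def l21_norm_rows(matrix):
    """Sum of the Euclidean (L2) norms of the rows of a list-of-lists matrix."""
    return sum(sqrt(sum(v * v for v in row)) for row in matrix)
```

Penalizing this quantity favors noise matrices in which most rows are entirely zero and a few rows are dense, matching the picture of a few injected attack profiles among many genuine users.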

11.
In a large number of experimental problems, high dimensionality of the search area and economic constraints can severely limit the number of experimental points that can be tested. Within these constraints, classical optimization techniques perform poorly, in particular when little a priori knowledge is available. In this work we investigate the possibility of combining approaches from statistical modeling and bio-inspired algorithms to effectively explore a huge search space, sampling only a limited number of experimental points. For this purpose, we introduce a novel approach combining ant colony optimization (ACO) and the naïve Bayes classifier (NBC): the naïve Bayes ant colony optimization (NACO) procedure. We compare NACO with other similar approaches in a simulation study. We then derive the NACO procedure with the goal of designing artificial enzymes with no sequence homology to the extant one. Our final aim is to mimic the natural fold of the 200-amino-acid 1AGY serine esterase from Fusarium solani.

12.
It is well known that call centers suffer from high levels of employee turnover; however, call centers are services that have excellent operational records of telemarketing activities performed by each employee. With this information, we propose to use the Random Forest and the naïve Bayes algorithms to build classifiers and predict turnover of the sales agents. The results of 2407 sales agents’ operational performance records showed that, although the naïve Bayes is much simpler than Random Forest, both classifiers performed similarly, achieving interesting accuracy rates in turnover prediction. Moreover, evidence was found that incorporating performance differences over time significantly increases the accuracy of the predictive models, up to 85%, with the naïve Bayes being quite competitive with the Random Forest classifier when the amount of information is increased. The results obtained in this study could be useful for management decision-making to monitor and identify potential turnover due to poor performance, and therefore to take preventive action.

13.
Collaborative filtering (CF), the most successful and widely used technique, recommends items based on the preferences of similar users. The main strengths of CF are its cross-genre recommendation ability and its complete independence from the representation of the items being recommended. However, CF suffers from sparsity and cold start problems. On the other hand, a highly effective variant of content-based filtering (CBF), reclusive methods (RMs), based on the preferences of the single individual for whom recommendations are to be made, provides a methodology that considers uncertainty and the multivalued nature of item features as well as user preferences in a content-based framework using fuzzy logic approaches. The RM paradigm has several advantages over CF, such as handling sparsity and the new-item problem, but it suffers from overspecialization and limited content analysis. In view of the complementary nature of CF and RM, we develop a hybrid recommender system (RS) that helps in alleviating the aforementioned problems in each approach. First, we propose fuzzy naïve Bayesian classifier based CF (FNB-CF) and RM (FNB-RM) for handling correlation-based similarity problems. To overcome the individual weaknesses of FNB-CF and FNB-RM, we develop a hybrid RS, FNB-CF-RM. The effectiveness of our proposed hybrid RS is demonstrated through experimental results using the MovieLens and IMDb data sets.

14.
Recommender systems (RS) have been found supportive and practical in e-commerce and have been established as useful aiding services. Despite their wide adoption in user communities, RS are still vulnerable to unscrupulous producers who try to promote their products by shilling the systems. With the advent of social networks, new sources of information have been made available which can potentially render RS more resistant to attacks. In this paper we explore the information provided in the form of social links with clustering for diminishing the impact of attacks. We propose two algorithms, CluTr and WCluTr, to combine clustering with "trust" among users. We demonstrate that CluTr and WCluTr enhance the robustness of RS by experimentally evaluating them on data from a public consumer recommender system, Epinions.com.

15.
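Shilling attack detection in recommender systems based on the not-missing-at-random data mechanism   Total citations: 1, self-citations: 0, citations by others: 1
Li Cong, Luo Zhigang. Acta Automatica Sinica, 2013, 39(10): 1681-1690
Collaborative filtering recommender systems are highly vulnerable to shilling attacks. Developing shilling attack detection techniques has become key to ensuring the reliability and robustness of recommender systems. Building on the not-missing-at-random mechanism of data, this paper analyzes the latent factors that cause ratings to be missing and, within a probabilistic generative model framework, combines these latent factors with the Dirichlet process to propose the latent factor analysis for missing ratings (LFAMR) model for shilling attack detection. Experiments show that, compared with existing detection techniques, LFAMR is more general and more fully unsupervised; even without prior knowledge of the system, it can effectively detect a variety of common shilling attacks.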

16.
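Huang Guangqiu, Liu Jiafei. Computer Engineering, 2012, 38(5): 25-29, 34
A shilling attack detection model for recommender systems based on memory principles is proposed. Shilling attacks are detected using the memory strengthening and decay rules described by short-term and long-term memory elements, together with the relationship between these two kinds of memory element and the integrated memory element. The model's feature memory bank can be updated promptly, which reduces system overhead. Experimental results show that a recommender system based on this model achieves high shilling attack detection accuracy.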

17.
We present a simple and yet effective approach for document classification to incorporate rationales elicited from annotators into the training of any off-the-shelf classifier. We empirically show on several document classification datasets that our classifier-agnostic approach, which makes no assumptions about the underlying classifier, can effectively incorporate rationales into the training of multinomial naïve Bayes, logistic regression, and support vector machines. In addition to being classifier-agnostic, we show that our method has comparable performance to previous classifier-specific approaches developed for incorporating rationales and feature annotations. Additionally, we propose and evaluate an active learning method tailored specifically for the learning with rationales framework.

18.
This paper proposes an approach that detects surface defects with three-dimensional characteristics on scale-covered steel blocks. The surface reflection properties of the flawless surface change strongly. Light sectioning is used to acquire the surface range data of the steel block. These sections are arbitrarily located within a range of a few millimeters due to vibrations of the steel block on the conveyor. After the recovery of the depth map, segments of the surface are classified according to a set of extracted features by means of Bayesian network classifiers. For establishing the structure of the Bayesian network, a floating search algorithm is applied, which achieves a good tradeoff between classification performance and computational efficiency for structure learning. This search algorithm enables conditional exclusions of previously added attributes and/or arcs from the network. The experiments show that the selective unrestricted Bayesian network classifier outperforms the naïve Bayes and the tree-augmented naïve Bayes decision rules concerning the classification rate. More than 98% of the surface segments have been classified correctly.

19.
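Robust collaborative recommendation algorithm based on user reputation   Total citations: 2, self-citations: 0, citations by others: 2
With the rapid development of recommender systems in e-commerce and the large economic returns they generate, deliberate shilling attacks have become a major security threat to collaborative filtering systems, and robust recommendation techniques that can withstand such attacks have become an important topic in the recommender systems field. This paper derives user reputation from historical records, builds a reputation-based recommender system, and combines it with the latent factor model from collaborative filtering to propose a robust collaborative algorithm based on a user-reputation latent factor model. The proposed algorithm improves system robustness against both deliberate attacks and natural noise. Experiments on the real-world MovieLens 1M dataset show that, compared with existing robust recommendation algorithms, the algorithm is simple in form, highly interpretable, and stable, and that it greatly strengthens the system's resistance to attacks while also improving accuracy to some extent.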

20.
Socially important locations are places that are frequently visited by social media users in their social media life. Discovering socially interesting, popular, or important locations from a location-based social network has recently become important for recommender systems, targeted advertisement applications, urban planning, and similar uses. However, discovering socially important locations from a social network is challenging due to the data size and variety, the spatial and temporal dimensions of the datasets, the need for computationally efficient approaches, and the difficulty of modeling human behavior. In the literature, several studies have been conducted on discovering socially important locations. However, the majority of these studies focused on discovering locations without considering the historical data of social media users. They focused on analyzing the data of social groups without considering each user’s preferences in these groups. In this study, we propose a method and interest measures to discover socially important locations that consider historical user data and each individual user’s preferences. The proposed algorithm was compared with a naïve alternative using a real-life Twitter dataset. The results showed that the proposed algorithm outperforms the naïve alternative.
