Similar literature
1.
We describe an information filtering system using independent component analysis (ICA). A document–word matrix is generally sparse and suffers from synonym ambiguity. To solve this problem, we propose a method that uses document vectors represented by independent components. An independent component generated by ICA is considered as a topic. In practice, we map the document vectors into a topic space. Since some independent components are useless for recommendation, we select the necessary components from all independent components by a maximum distance algorithm (MDA). Although Euclidean distance is usually used by MDA, we propose topic selection by cosine-distance-based MDA to solve the mismatch of similarities in information filtering. We create a user profile from the transformed data with a genetic algorithm (GA). Finally, we recommend documents with the user profile and evaluate the accuracy by imputation precision. We have carried out an evaluation experiment to confirm the practicality of the proposed method. This work was presented, in part, at the 9th International Symposium on Artificial Life and Robotics, Oita, Japan, January 28–30, 2004.
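
The abstract does not spell out the maximum distance algorithm, so the following is only a minimal sketch of one plausible reading: a greedy farthest-point selection that uses cosine distance instead of Euclidean distance. The function names, the choice of the first vector as the seed, and the toy component vectors are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def select_by_max_distance(vectors, k):
    """Greedily pick up to k indices; each pick maximizes its distance
    to the nearest already-chosen vector (farthest-point selection)."""
    if not vectors or k <= 0:
        return []
    chosen = [0]  # assumption: seed the selection with the first vector
    while len(chosen) < min(k, len(vectors)):
        best_idx, best_score = None, -1.0
        for i, vec in enumerate(vectors):
            if i in chosen:
                continue
            # Distance from this candidate to its closest chosen vector.
            score = min(cosine_distance(vec, vectors[j]) for j in chosen)
            if score > best_score:
                best_idx, best_score = i, score
        chosen.append(best_idx)
    return chosen


# Hypothetical component vectors: the second is nearly parallel to the first.
components = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
selected = select_by_max_distance(components, 3)
print(selected)
```

Because cosine distance ignores vector length, the near-duplicate second vector is skipped in favor of components that point in different directions.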

2.
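A Bilingual Information Filtering Algorithm Based on Singular Value Decomposition   Total citations: 1, self-citations: 0, citations by others: 1
This paper proposes a bilingual information filtering algorithm based on SVD (singular value decomposition). It gives bilingual documents a unified representation, so that algorithms designed for monolingual filtering can readily be applied to bilingual filtering; it also compresses the document vectors and filters out noise. As an application, the bilingual filtering algorithm is used for personalized, proactive information filtering on the Internet.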

3.
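With the rapid development of social networks, users generate large amounts of valuable data when using social applications. Mining social network data reveals the preference relations between users and items hidden in the data. Modeling users on that basis and choosing a suitable recommendation engine for personalized item recommendation is a highly valuable research direction. This paper focuses on how well matrix factorization algorithms perform on large-scale user-item rating matrices. To improve recommendation accuracy, it studies users' social relations and implicit feedback, adding latent-factor terms for social relations, demographic profile information, and users' consumption records to the combined prediction model. Experiments show that the extended hybrid prediction model lowers RMSE by 0.259475 relative to the SVD algorithm, a substantial improvement in recommendation performance.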
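
To make the RMSE comparison concrete, here is a minimal sketch of plain regularized matrix factorization trained by stochastic gradient descent, evaluated against a global-mean baseline. It deliberately omits the social, demographic, and consumption terms the paper adds; the toy ratings, hyperparameters, and function names are assumptions chosen for illustration.

```python
import math
import random


def rmse(ratings, predict):
    """Root-mean-square error of predict(u, i) over (user, item, rating) triples."""
    squared_error = sum((r - predict(u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(squared_error / len(ratings))


def train_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Fit r_ui ~ mu + p_u . q_i by SGD; returns the prediction function and mu."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating

    def predict(u, i):
        return mu + sum(pu * qi for pu, qi in zip(P[u], Q[i]))

    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return predict, mu


# Hypothetical ratings: users 0-1 prefer items 0-1, users 2-3 prefer items 2-3.
ratings = [
    (0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 5),
    (2, 2, 5), (2, 3, 4), (3, 2, 4), (3, 3, 5),
    (0, 2, 1), (1, 3, 2), (2, 0, 1), (3, 1, 2),
]
predict, mu = train_mf(ratings, n_users=4, n_items=4)
baseline = rmse(ratings, lambda u, i: mu)  # always predict the global mean
trained = rmse(ratings, predict)
print(f"baseline RMSE={baseline:.3f}, factorized RMSE={trained:.3f}")
```

The errors here are measured on the training ratings only; a real evaluation such as the one the abstract reports would use held-out data.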

4.
Ontology-based user profile learning   Total citations: 4, self-citations: 4, citations by others: 0
Personal agents gather information about users in a user profile. In this work, we propose a novel ontology-based approach to user profile learning. In particular, we aim to learn context-enriched user profiles using data mining techniques and ontologies. We are interested in knowing to what extent data mining techniques can be used for user profile generation, and how ontologies can be used to improve user profiles. The objective is to semantically enrich a user profile with contextual information by using association rules, Bayesian networks, and ontologies in order to improve agent performance. At runtime, we learn which contexts are relevant to the user by observing the user's behavior. We then represent the learned relevant contexts as ontology segments. The encouraging experimental results show the usefulness of including semantics in a user profile as well as the advantages of integrating agents and data mining using ontologies.

5.
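An Algorithm for Adjusting User Preferences in Information Filtering in an E-Commerce Environment   Total citations: 5, self-citations: 0, citations by others: 5
Xu Boyi (徐博艺), Jiang Lihong (姜丽红). Computer Engineering (《计算机工程》), 2001, 27(10): 102-104
The information filtering process is analyzed, including defining user preferences, receiving the incoming information stream, filtering, and user feedback. On this basis, the characteristics of collecting and filtering group decision information in a networked environment are analyzed, and algorithms for generating and adaptively adjusting user preferences in decision information filtering are proposed.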

6.
In this paper we explore the benefits of latent variable modelling of clickthrough data in the domain of image retrieval. Clicks in image search logs are regarded as implicit relevance judgements that express both user intent and important relations between selected documents. We posit that clickthrough data contains hidden topics and can be used to infer a lower dimensional latent space that can be subsequently employed to improve various aspects of the retrieval system. We use a subset of a clickthrough corpus from the image search portal of a news agency to evaluate several popular latent variable models in terms of their ability to model topics underlying queries. We demonstrate that latent variable modelling reveals underlying structure in clickthrough data and our results show that computing document similarities in the latent space improves retrieval effectiveness compared to computing similarities in the original query space. These results are compared with baselines using visual and textual features. We show performance substantially better than the visual baseline, which indicates that content-based image retrieval systems that do not exploit query logs could improve recall and precision by taking this historical data into account.

7.
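A Feedback Evaluation Mechanism for Personalized Web Information Filtering Agents   Total citations: 3, self-citations: 1, citations by others: 3
The paper describes the role of information filtering and introduces an agent-based system for filtering World Wide Web documents. It proposes the architecture of a personalized web information filtering agent and its implementation, discusses how to update the user interest model with a relevance feedback evaluation mechanism, and suggests using decision trees to learn the user's information interests from the document set classified by the user.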

8.
This paper proposes an improved back propagation neural network and uses an efficient method to reduce dimensionality and improve performance. The traditional back propagation neural network (BPNN) learns slowly and is easily trapped in a local minimum, which leads to poor performance and efficiency. We propose the learning phase evaluation back propagation neural network (LPEBP) to improve on the traditional BPNN. We adopt a singular value decomposition (SVD) technique to reduce the dimension and construct the latent semantics between terms. Experimental results show that the LPEBP is much faster than the traditional BPNN and also enhances its performance. The SVD technique can not only greatly reduce the high dimensionality but also enhance performance, so SVD further improves the precision and efficiency of document classification systems.

9.
Engineers create engineering documents with their own terminologies, and want to search existing engineering documents quickly and accurately during a product development process. Keyword-based search methods have been widely used due to their ease of use, but their search accuracy has often been problematic because of the semantic ambiguity of terminologies in engineering documents and queries. The semantic ambiguity can be alleviated by using a domain ontology. Also, if queries are expanded to incorporate the engineer's personalized information needs, the accuracy of the search result would be improved. Therefore, we propose a framework to search engineering documents with less semantic ambiguity and more focus on each engineer's personalized information needs. The framework includes four processes: (1) developing a domain ontology, (2) indexing engineering documents, (3) learning user profiles, and (4) performing personalized query expansion and retrieval. A domain ontology is developed based on product structure information and engineering documents. Using the domain ontology, terminologies in documents are disambiguated and indexed. Also, a user profile is generated from the domain ontology. Through user profile learning, the user's interests are captured from the relevant documents. During the personalized query expansion process, the learned user profile is used to reflect the user's interests. Simultaneously, the user's search intent, which is implicitly inferred from the user's task context, is also considered. To retrieve relevant documents, an expanded query in which both the user's interests and intents are reflected is then matched against the document collection. The experimental results show that the proposed approach can substantially outperform both the keyword-based approach and the existing query expansion method in retrieving engineering documents. Reflecting a user's information needs precisely has been identified as the most important factor underlying this notable improvement.

10.
Information & Management, 2016, 53(8): 978-986
With the rapid proliferation of Web 2.0, the identification of emotions embedded in user-contributed comments on the social web is both valuable and essential. By exploiting large volumes of sentimental text, we can extract user preferences to enhance sales, develop marketing strategies, and optimize the supply chain for electronic commerce. Pieces of information on the social web are usually short, such as tweets, questions, instant messages, and news headlines. Short text differs from normal text because of its sparse word co-occurrence patterns, which hampers efforts to apply social emotion classification models. Most existing methods focus on either exploiting the social emotions of individual words or the association of social emotions with latent topics learned from normal documents. In this paper, we propose a topic-level maximum entropy (TME) model for social emotion classification over short text. TME generates topic-level features by jointly modeling latent topics, multiple emotion labels, and valence scored by numerous readers. The overfitting problem in the maximum entropy principle is also alleviated by mapping the features to the concept space. An experiment on real-world short documents validates the effectiveness of TME on social emotion classification over sparse words.

11.
We tackle the problem of new users or documents in collaborative filtering. Generalization over users by grouping them into user groups is beneficial when a rating is to be predicted for a relatively new document having only a few observed ratings. Analogously, generalization over documents improves predictions in the case of new users. We show that if either users or documents, or both, are new, two-way generalization becomes necessary. We demonstrate the benefits of grouping of users, grouping of documents, and two-way grouping, with artificial data and in two case studies with real data. We introduce a probabilistic latent grouping model for predicting the relevance of a document to a user. The model assumes a latent group structure for both users and items. We compare the model against a state-of-the-art method, the User Rating Profile model, where only the users have a latent group structure. We compute the posterior of both models by Gibbs sampling. The Two-Way Model predicts relevance more accurately when the target consists of both new documents and new users. The reason is that generalization over documents becomes beneficial for new documents and at the same time generalization over users is needed for new users.

12.
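Product review mining has produced a growing body of results in product recommendation. Traditional review mining methods concentrate only on the shallow semantics implicit in reviews, and their semantic representations are unsatisfactory. A major challenge in product recommendation today is therefore how to mine the deep semantics of product reviews, improve semantic expressiveness, and make the fullest use of reviews to improve recommendation quality. This paper uses the Skip-Thought Vectors (STV) model from deep learning to learn the latent semantic features of reviews. To strengthen the semantic representation of reviews, it applies the Long Short-Term Memory (LSTM) model to STV and, combining a bidirectional information-flow mining method, a user sentiment preference mining method, and a deep hierarchical model, introduces a deep semantic feature mining model. The model mines not only the deep semantic features of reviews but also the sentiment preferences of the users who wrote them. The deep semantic feature mining model is then combined with a matrix factorization model (Singular Value Decomposition, SVD) to produce product recommendations. Experiments on two Amazon datasets show that the proposed model outperforms traditional review mining models in deep semantic mining and improves recommendation quality over recommender systems built on traditional review mining models.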

13.
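User Preference Mining Based on a Cluster Analysis Strategy   Total citations: 5, self-citations: 0, citations by others: 5
Accurately and efficiently mining hidden user text preferences and concept vectors from a training document set is one of the key techniques for natural language processing applications such as text information filtering and multi-document automatic summarization. Because a training text set often contains several topic categories, a text preference mining method based on a cluster analysis strategy is proposed. The basic idea is to cluster the training documents, analyze what documents on the same topic have in common, and then, after feature weight adjustment and feature reduction, obtain concept vectors that represent the user's preferences on different topics. Experimental results show that the method characterizes users' text preferences more precisely and is insensitive to changes in the relevant thresholds, and that it can be combined with algorithms such as Rocchio for user interest modeling.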

14.
Adaptive Bayesian Latent Semantic Analysis   Total citations: 1, self-citations: 0, citations by others: 1
Due to the vast growth of data collections, statistical document modeling has become increasingly important in language processing areas. Probabilistic latent semantic analysis (PLSA) is a popular approach whereby the semantics and statistics can be effectively captured for modeling. However, PLSA is highly sensitive to the task domain, which is continuously changing in real-world documents. In this paper, a novel Bayesian PLSA framework is presented. We focus on exploiting the incremental learning algorithm for solving the updating problem of new domain articles. This algorithm is developed to improve document modeling by incrementally extracting up-to-date latent semantic information to match the changing domains at run time. By adequately representing the priors of PLSA parameters using Dirichlet densities, the posterior densities belong to the same distribution so that a reproducible prior/posterior mechanism is activated for incremental learning from constantly accumulated documents. An incremental PLSA algorithm is constructed to accomplish the parameter estimation as well as the hyperparameter updating. Compared to standard PLSA using maximum likelihood estimate, the proposed approach is capable of performing dynamic document indexing and modeling. We also present the maximum a posteriori PLSA for corrective training. Experiments on information retrieval and document categorization demonstrate the superiority of using Bayesian PLSA methods.

15.
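With the rapid development of the Internet and mobile application platforms, the massive user data generated around mobile applications has become an important data source for accurately analyzing users' requirement preferences. Although many researchers have analyzed and mined user requirements from such data, existing methods usually examine features along only a few dimensions and fail to mine multi-dimensional mobile application information and the associations among its dimensions effectively. This paper proposes a meta-path-embedding-based method for analyzing mobile application requirement preferences that can make personalized mobile application recommendations. Specifically, it first analyzes the semantic topics in the textual information of mobile applications to identify the dimensions along which user requirement preferences are analyzed. Second, it builds, from the semantic features of mobile application information, a conceptual model that integrates multi-dimensional application information and covers the multi-dimensional data that characterize user requirement preferences. Based on the semantics of the conceptual model, a set of meaningful meta-paths is designed to capture the semantics of user requirement preferences precisely. Finally, meta-path embedding is used to profile user behavior and thereby deliver personalized mobile application recommendations. The method is evaluated on a real dataset from the Apple App Store containing 1507 mobile applications and 153501 user reviews. The results show that the proposed method outperforms existing models on every metric, improving average F1 by 0.02 and average normalized discounted cumulative gain (NDCG) by 0.1.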
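
For readers unfamiliar with NDCG, the metric can be sketched as follows. This version uses the linear-gain form with a log2 rank discount; the abstract does not say which variant the authors used, so treat the formula choice, the function names, and the example relevance grades as assumptions.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: sum of rel / log2(rank + 1), ranks from 1."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances, k=None):
    """DCG of the list as ranked, divided by the DCG of its ideal ordering."""
    ranked = relevances[:k] if k is not None else relevances
    ideal = sorted(relevances, reverse=True)
    ideal = ideal[:k] if k is not None else ideal
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical relevance grades of recommended apps, in recommended order.
perfect_order = [3, 2, 1]
reversed_order = [1, 2, 3]
print(ndcg(perfect_order), ndcg(reversed_order))
```

A ranking that already lists items from most to least relevant scores 1.0, and any other ordering of the same items scores lower.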

16.
PersonalTV     
This paper presents an approach to building a TV recommendation system called PersonalTV that enables the use of multiple classifiers, each one specialized on selected attributes of detailed program information. For generating adequate recommendations, the system makes use of content filtering and the preferences directly specified by the user within an MPEG-7 profile. By tracking user actions and interpreting their semantics, the system is able to individually weight these actions and dynamically adjust the process to the user's evolving preferences. We show how specialized spam fighting methods can successfully be transferred to the area of recommendation systems and adapted accordingly. Being lightweight, these methods are especially applicable in resource-constrained environments such as TV set-top boxes or mobile devices. Moreover, the use of the variance of the beta-distribution as a confidence value for each recommendation is presented.
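
The idea of using the beta distribution's variance as a confidence value can be illustrated with a small sketch. The variance formula for Beta(α, β) is standard; how PersonalTV maps user actions to α and β is not stated in the abstract, so the like/dislike counting and the uniform Beta(1, 1) prior below are assumptions.

```python
def beta_mean_and_variance(alpha, beta):
    """Mean and variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = (alpha * beta) / (total ** 2 * (total + 1))
    return mean, variance


def recommendation_confidence(likes, dislikes, prior_alpha=1.0, prior_beta=1.0):
    """Posterior mean and variance after observing likes and dislikes.

    Assumption: each positive action adds one to alpha and each negative
    action adds one to beta, starting from a uniform Beta(1, 1) prior.
    """
    return beta_mean_and_variance(prior_alpha + likes, prior_beta + dislikes)


no_evidence = recommendation_confidence(0, 0)
some_evidence = recommendation_confidence(8, 2)
much_evidence = recommendation_confidence(80, 20)
print(no_evidence, some_evidence, much_evidence)
```

A smaller variance means the estimated preference is backed by more observations, so a recommendation with the same mean but lower variance can be treated as more trustworthy.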

17.
The aim in information filtering is to provide users with a personalised selection of information, based on their interest profile. In adaptive information filtering, this profile is partially or completely acquired by automatic means. This paper investigates whether a profile can be generated partially by automatic methods and partially by direct user involvement. The issue is explored through an empirical study of a simulated filtering system that mixes automatic and manual profile generation. The study covers several issues involved in mixed control. The first issue concerns whether a machine-learned profile can provide better filtering performance if generated from an initial explicit user profile. The second issue concerns whether user involvement can improve on a system-generated or adapted profile. Finally, the relationship between filtering performance and user ratings is investigated. In this particular study the initial setup of a personal profile was effective and yielded performance improvements that persisted after substantial training. However, the study showed no correlation between users' ratings of profiles and profile filtering performance, and only weak indications that users could improve profiles that had already been trained on feedback.

18.
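A Text Filtering Model Based on Concept Expansion   Total citations: 8, self-citations: 1, citations by others: 7
After introducing the background of text filtering and the vector space model, this paper proposes a text filtering model that expands the user template (profile) with a semantic dictionary. The model first analyzes each text and represents it as a vector in the vector space. Once the initial user template has been formed, it is expanded with synonyms, and the expanded template is used for text filtering. The template is then adaptively modified on the basis of user feedback to track the user's changing needs and improve filtering performance. Experiments show that this does increase the system's coverage and improve its efficiency.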
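
The synonym-expansion step can be sketched as follows, assuming a term-weight dictionary as the user template and cosine similarity as the filtering score. The synonym dictionary, the damping weight for added synonyms, the threshold, and all function names are hypothetical; the abstract does not give the paper's actual weighting scheme.

```python
import math
from collections import Counter

# Hypothetical semantic dictionary mapping a term to its synonyms.
SYNONYMS = {"car": ["automobile"], "film": ["movie"]}


def expand_profile(profile, synonyms, weight=0.5):
    """Add each profile term's synonyms at a damped weight (an assumption)."""
    expanded = dict(profile)
    for term, w in profile.items():
        for syn in synonyms.get(term, []):
            expanded[syn] = max(expanded.get(syn, 0.0), weight * w)
    return expanded


def cosine(u, v):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def passes_filter(text, profile, threshold=0.2):
    """Represent the text as term counts and compare it with the template."""
    doc_vector = Counter(text.lower().split())
    return cosine(profile, doc_vector) >= threshold


profile = {"car": 1.0}
expanded = expand_profile(profile, SYNONYMS)
text = "new automobile review"
print(passes_filter(text, profile), passes_filter(text, expanded))
```

The document shares no term with the original template and is rejected, but it matches the expanded template through the synonym, which is the coverage gain the abstract describes.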

19.
20.
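A Sentiment Analysis Model Based on a Collaborative-Filtering Attention Mechanism   Total citations: 1, self-citations: 0, citations by others: 1
This paper studies how user personality and product information affect the sentiment category of review data. Among the many factors that influence the sentiment type of such data, the paper argues that the subject of the evaluation (the user) and the object being evaluated are crucial to the sentiment of a review. It proposes a sentiment analysis method based on a collaborative-filtering attention mechanism (LSTM-CFA): a collaborative filtering (CF) algorithm computes a user interest distribution matrix, which is decomposed with SVD and fed into a hierarchical LSTM model, where it serves as the model's attention mechanism for extracting document features and performing sentiment classification. Experiments show that LSTM-CFA extracts user personality and product attribute information efficiently and significantly improves sentiment classification accuracy.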
