Similar Documents (20 results)
1.

A user of a recommender system is more likely to be satisfied by one or more of the recommendations if each individual recommendation is relevant to her, but additionally if the set of recommendations is diverse. The most common approach to recommendation diversification uses re-ranking: the recommender system scores a set of candidate items for relevance to the user; it then re-ranks the candidates so that the subset that it will recommend achieves a balance between relevance and diversity. Ordinarily, we expect a trade-off between relevance and diversity: the diversity of the set of recommendations increases by including items that have lower relevance scores but which are different from the items already in the set. In early work, the diversity of a set of recommendations was given by the average of their distances from one another, according to some semantic distance metric defined on item features such as movie genres. More recent intent-aware diversification methods formulate diversity in terms of coverage and relevance of aspects. The aspects are most commonly defined in terms of item features. By trying to ensure that the aspects of a set of recommended items cover the aspects of the items in the user’s profile, the level of diversity is more personalized. In offline experiments on pre-collected datasets, intent-aware diversification using item features as aspects sometimes defies the relevance/diversity trade-off: there are configurations in which the recommendations exhibit increases in both relevance and diversity. In this paper, we present a new form of intent-aware diversification, which we call SPAD (Subprofile-Aware Diversification), and a variant called RSPAD (Relevance-based SPAD). In SPAD, the aspects are not item features; they are subprofiles of the user’s profile. We present and compare a number of different ways to extract subprofiles from a user’s profile. None of them is defined in terms of item features. Therefore, SPAD is useful even in domains where item features are not available or are of low quality. On three pre-collected datasets from three different domains (movies, music artists and books), we compare SPAD and RSPAD to intent-aware methods in which aspects are item features. We find on these datasets that SPAD and RSPAD suffer even less from the relevance/diversity trade-off: across all three datasets, they increase both relevance and diversity for even more configurations than other approaches to diversification. Moreover, we find that SPAD and RSPAD are the most accurate systems across all three datasets.
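To illustrate the re-ranking idea, here is a minimal Python sketch of a greedy intent-aware diversifier in the spirit of SPAD (not the authors' implementation). It assumes each candidate item comes with a relevance score and a set of aspects it covers; under SPAD these aspects would be subprofiles mined from the user's profile, and the trade-off weight lambda_ is an illustrative parameter.

```python
# Illustrative sketch of greedy intent-aware re-ranking; item names, scores and
# the lambda_ trade-off weight are invented for this example.

def diversify(candidates, relevance, aspects, k, lambda_=0.5):
    """Greedily pick k items, balancing relevance against coverage of new aspects.

    candidates : list of item ids
    relevance  : dict item -> relevance score in [0, 1]
    aspects    : dict item -> set of aspects (e.g. subprofile ids) the item covers
    """
    selected, covered = [], set()
    pool = set(candidates)
    while pool and len(selected) < k:
        def gain(item):
            novelty = len(aspects.get(item, set()) - covered)  # aspects not yet covered
            return (1 - lambda_) * relevance[item] + lambda_ * novelty
        best = max(pool, key=gain)
        selected.append(best)
        covered |= aspects.get(best, set())
        pool.remove(best)
    return selected

# toy usage: items "a" and "b" cover the same subprofile, so the diversifier
# prefers "c" and "d" after picking "a"
rel = {"a": 0.9, "b": 0.85, "c": 0.7, "d": 0.4}
asp = {"a": {"s1"}, "b": {"s1"}, "c": {"s2"}, "d": {"s3"}}
print(diversify(["a", "b", "c", "d"], rel, asp, k=3))  # ['a', 'c', 'd']
```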


2.
Current recommender systems attempt to identify appealing items for a user by applying syntactic matching techniques, which suffer from significant limitations that reduce the quality of the offered suggestions. To overcome this drawback, we have developed a domain-independent personalization strategy that borrows reasoning techniques from the Semantic Web, elaborating recommendations based on the semantic relationships inferred between the user’s preferences and the available items. Our reasoning-based approach improves the quality of the suggestions offered by current personalization approaches and greatly reduces their most severe limitations. To validate these claims, we have carried out a case study in the Digital TV field, in which our strategy selects TV programs of interest to viewers from among the myriad contents available in the digital streams. Our experimental evaluation compares the traditional approaches with our proposal in terms of both the number of TV programs suggested and the users’ perception of the recommendations. Finally, we discuss concerns related to the computational feasibility and scalability of our approach.

3.
Information and Computation, 2006, 204(8): 1264–1294
The paper deals with the following problem: is returning to wrong conjectures necessary to achieve the full power of algorithmic learning? Returning to wrong conjectures complements the paradigm of U-shaped learning, in which a learner returns to old correct conjectures. We explore this problem for classical models of learning in the limit from positive data: explanatory learning (where a learner stabilizes in the limit on a correct grammar) and behaviourally correct learning (where a learner stabilizes in the limit on a sequence of correct grammars representing the target concept). In both cases we show that returning to wrong conjectures is necessary to achieve full learning power. In contrast, one can modify learners (without losing learning power) such that they never show inverted U-shaped learning behaviour, that is, never return to an old wrong conjecture with a correct conjecture in between. Furthermore, one can also modify a learner (without losing learning power) such that it does not return to old “overinclusive” conjectures containing non-elements of the target language. We also consider our problem in the context of vacillatory learning (where a learner stabilizes on a finite number of correct grammars) and show that each of the following four constraints is restrictive (that is, reduces learning power): the learner does not return to old wrong conjectures; the learner is not inverted U-shaped; the learner does not return to old overinclusive conjectures; the learner does not return to old overgeneralizing conjectures. We also show that learners that are consistent with the input seen so far can be made decisive: on any text, they do not return to any old conjectures, wrong or right.

4.
Knowledge, 2000, 13(5): 285–296
Machine-learning techniques play important roles in information filtering. The main objective of machine learning here is to obtain users' profiles. To decrease the burden of on-line learning, it is important to seek suitable structures to represent user information needs. This paper proposes a model for information filtering on the Web. In this model, the user information need is described at two levels: profiles at the category level and Boolean queries at the document level. To efficiently estimate the relevance between the user information need and documents, the user information need is treated as a rough set on the space of documents. Rough set decision theory is used to classify new documents according to the user information need. As a result, the new documents are divided into three parts: positive region, boundary region, and negative region. An experimental system, JobAgent, is also presented to verify this model, and it shows that the rough-set-based model provides an efficient approach to solving the information overload problem.
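The three-way split described above can be sketched as a simple thresholding rule. The following illustrative Python snippet (not the paper's exact rough-set formulation) assumes each document has an estimated probability of being relevant to the user information need, and two hypothetical thresholds alpha and beta delimit the regions.

```python
# Illustrative three-way classification into positive, boundary and negative regions.
# The thresholds alpha/beta and the probability estimates are invented for the example.

def classify(doc_scores, alpha=0.7, beta=0.3):
    """doc_scores: dict doc_id -> estimated relevance probability in [0, 1]."""
    regions = {"positive": [], "boundary": [], "negative": []}
    for doc, p in doc_scores.items():
        if p >= alpha:
            regions["positive"].append(doc)    # accept: clearly relevant
        elif p <= beta:
            regions["negative"].append(doc)    # reject: clearly irrelevant
        else:
            regions["boundary"].append(doc)    # defer: needs further inspection
    return regions

print(classify({"d1": 0.9, "d2": 0.5, "d3": 0.1}))
# {'positive': ['d1'], 'boundary': ['d2'], 'negative': ['d3']}
```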

5.
In the era of big data, the vast majority of the data are not from the surface Web, the Web that is interconnected by hyperlinks and indexed by most general-purpose search engines. Instead, the trove of valuable data often resides in the deep Web, the Web that is hidden behind query interfaces. Since numerous applications, like data integration and vertical portals, require deep Web data, various crawling methods have been developed for exhaustively harvesting a deep Web data source at minimal (or near-minimal) cost. Most existing crawling methods assume that all the documents matched by queries are returned. In practice, data sources often return only the top k matches. This makes exhaustive data harvesting difficult: highly ranked documents will be returned multiple times, while documents ranked low have a small chance of being returned. In this paper, we decompose this problem into two orthogonal sub-problems, i.e., the query bias and ranking bias problems, and propose a document-frequency-based crawling method to overcome the ranking bias problem. The rationale of our method is to use the queries whose document frequencies are within a specified range, thereby avoiding the combined effect of search ranking and the return limit and significantly reducing the difficulty of crawling a ranked data source. The method is extensively tested on a variety of datasets and compared with two existing methods. The experimental results demonstrate that our method outperforms the two algorithms by 58% and 90% on average, respectively.
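The core of the document-frequency idea can be illustrated with a short Python sketch. It assumes the crawler already has document-frequency estimates for candidate query terms and knows the source's return limit k; the term names, bounds and estimates below are purely illustrative.

```python
# Illustrative query selection for crawling a ranked (top-k) deep Web source:
# keep only terms whose estimated document frequency fits inside a usable range.

def select_queries(term_df, k_limit, min_df=10):
    """term_df : dict term -> estimated document frequency in the source.
    k_limit : maximum number of matches the source returns per query.
    Returns terms expected to be fully harvestable (df <= k_limit) but not too rare."""
    return [t for t, df in term_df.items() if min_df <= df <= k_limit]

# toy usage: "python" would overflow the return limit, "hapax" is too rare to be useful
estimated_df = {"python": 50_000, "crawler": 800, "zipf": 45, "hapax": 2}
print(select_queries(estimated_df, k_limit=1000))  # ['crawler', 'zipf']
```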

6.
The presence of a large number of degrees of freedom enables redundant manipulators to have desirable features such as reaching difficult areas and avoiding obstacles. When inserted through the existing tool channel of an endoscope to take a biopsy from the stomach, such manipulators, in the form of in-vivo robots, enhance a surgeon's dexterity and capacity to explore the internal cavity. This paper presents a simple geometric approach to the problem of the multiple inverse kinematic solutions of redundant manipulators: it finds a single optimum solution and can easily switch from one solution to another depending upon the path and the environment. A simulation model of this approach has been developed, and experiments have been conducted on the in-vivo robot to judge its effectiveness.

7.
In the last few years, social media systems have experienced rapid growth. The amount of content shared in these systems increases quickly, leading users to face the well-known “interaction overload” problem: they are overwhelmed by content, so it becomes difficult to come across interesting items. To overcome this problem, social recommender systems have recently been designed and developed to filter content and recommend only interesting items to users. This type of filtering is usually affected by the “over-specialization” problem, which refers to recommendations that are too similar to the items already considered by the users. This paper proposes a friend recommender system that operates in the social bookmarking application domain and is based on behavioral data mining, i.e., on the exploitation of users' activity in a social bookmarking system. Experimental results show how this type of mining is able to produce accurate friend recommendations, allowing users to discover bookmarked resources that are both novel and serendipitous. Using this approach, the impact of the “interaction overload” and “over-specialization” problems is strongly reduced.

8.
Recommender systems usually employ techniques like collaborative filtering for providing recommendations on items/services. Maximum Margin Matrix Factorization (MMMF) is an effective collaborative filtering approach. MMMF suffers from the data sparsity problem, i.e., the number of items rated by the users is very small compared to the very large item space. Recently, techniques like cross-domain collaborative filtering (transfer learning) have been suggested for addressing the data sparsity problem. In this paper, we propose a model for transfer learning in collaborative filtering through MMMF to address the data sparsity issue. The latent feature matrices involved in MMMF are clustered and combined to generate a cluster-level rating pattern called a codebook, and a codebook transfer is used to transfer information. Transferring the codebook and finding the predicted rating matrix is done in a novel way by introducing a softness constraint into the optimization function. We have evaluated our method at different levels of sparsity using benchmark datasets. Results from experiments show that our model approximates the target matrix well.
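The following Python sketch shows, under simplifying assumptions, how a cluster-level rating pattern (codebook) can be built: it clusters the rows and columns of a small dense rating matrix with scikit-learn's KMeans and averages each block. The paper instead clusters the MMMF latent feature matrices and introduces a softness constraint when transferring the codebook; neither step is reproduced here.

```python
# Illustrative codebook construction from a toy dense source rating matrix,
# assuming numpy and scikit-learn are available. Cluster counts are arbitrary.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(R, n_user_clusters=2, n_item_clusters=2):
    """R: dense (users x items) source rating matrix."""
    user_labels = KMeans(n_clusters=n_user_clusters, n_init=10).fit_predict(R)
    item_labels = KMeans(n_clusters=n_item_clusters, n_init=10).fit_predict(R.T)
    B = np.zeros((n_user_clusters, n_item_clusters))
    for u in range(n_user_clusters):
        for i in range(n_item_clusters):
            # average rating inside the (user cluster, item cluster) block
            block = R[np.ix_(user_labels == u, item_labels == i)]
            B[u, i] = block.mean() if block.size else 0.0
    return B, user_labels, item_labels

R = np.array([[5, 4, 1, 1], [4, 5, 2, 1], [1, 1, 5, 4], [2, 1, 4, 5]], dtype=float)
codebook, _, _ = build_codebook(R)
print(codebook)  # 2x2 cluster-level rating pattern
```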

9.
Many studies on developing technologies have been published as articles, papers, or patents. We use and analyze these documents to find scientific and technological trends. In this paper, we consider document clustering as a method of document data analysis. In general, documents are difficult to analyze directly because document data are not suitable for statistical and machine learning methods of analysis. Therefore, we have to transform document data into structured data for analytical purposes. For this process, we use text mining techniques. The resulting structured data are very sparse and hence difficult to analyze. This study proposes a new method to overcome the sparsity problem of document clustering. We build a combined clustering method using dimension reduction and K-means clustering based on support vector clustering and the Silhouette measure. In particular, we attempt to overcome the sparseness in patent document clustering. To verify the efficacy of our work, we first conduct an experiment using news data from the machine learning repository of the University of California at Irvine. Second, using patent documents retrieved from the United States Patent and Trademark Office, we carry out patent clustering for technology forecasting.
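A minimal sketch of such a combined pipeline, assuming scikit-learn is available: documents become a sparse TF-IDF matrix, dimension reduction (here TruncatedSVD, as an illustrative stand-in) reduces the sparsity, and the number of K-means clusters is chosen by the Silhouette measure. The support-vector-clustering step used in the paper is omitted.

```python
# Illustrative pipeline: text -> sparse TF-IDF -> reduced dense features -> K-means,
# with the number of clusters picked by the Silhouette score. Documents are toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

docs = ["neural networks for images", "deep learning vision", "patent law filing",
        "trademark office patent", "image recognition model", "intellectual property claims"]

X = TfidfVectorizer().fit_transform(docs)               # sparse, high-dimensional
X_red = TruncatedSVD(n_components=3).fit_transform(X)   # reduced, dense representation

best_k, best_score = None, -1.0
for k in range(2, 5):
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(X_red)
    score = silhouette_score(X_red, labels)              # higher is better
    if score > best_score:
        best_k, best_score = k, score
print(best_k, round(best_score, 3))
```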

10.
11.
The technology of automatic document summarization is maturing and may provide a solution to the information overload problem. Nowadays, document summarization plays an important role in information retrieval. With a large volume of documents, presenting the user with a summary of each document greatly facilitates the task of finding the desired documents. Document summarization is a process of automatically creating a compressed version of a given document that provides useful information to users, and multi-document summarization aims to produce a summary delivering the majority of the information content from a set of documents about an explicit or implicit main topic. In our study we focus on sentence-based extractive document summarization. We propose a generic document summarization method based on sentence clustering. The proposed approach is a continuation of the sentence-clustering-based extractive summarization methods proposed in Alguliev [Alguliev, R. M., Aliguliyev, R. M., & Bagirov, A. M. (2005). Global optimization in the summarization of text documents. Automatic Control and Computer Sciences, 39, 42–47], Aliguliyev [Aliguliyev, R. M. (2006). A novel partitioning-based clustering method and generic document summarization. In Proceedings of the 2006 IEEE/WIC/ACM international conference on web intelligence and intelligent agent technology (WI–IAT 2006 Workshops) (WI–IATW’06), 18–22 December (pp. 626–629), Hong Kong, China], Alguliev and Alyguliev [Alguliev, R. M., & Alyguliev, R. M. (2007). Summarization of text-based documents with a determination of latent topical sections and information-rich sentences. Automatic Control and Computer Sciences, 41, 132–140] and Aliguliyev [Aliguliyev, R. M. (2007). Automatic document summarization by sentence extraction. Journal of Computational Technologies, 12, 5–15]. The purpose of the present paper is to show that the summarization result depends not only on the optimized function but also on the similarity measure. The experimental results on the open benchmark datasets DUC01 and DUC02 show that our proposed approach can improve performance compared to state-of-the-art summarization approaches.
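As an illustration of sentence-clustering-based extraction, the following Python sketch represents sentences with TF-IDF, clusters them with KMeans, and takes the sentence closest to each centroid as the summary. The similarity measure and clustering algorithm are illustrative choices, not those of the cited papers.

```python
# Illustrative sentence-clustering extractive summarizer, assuming scikit-learn.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def summarize(sentences, n_clusters=2):
    X = TfidfVectorizer().fit_transform(sentences)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
    summary = []
    for c in range(n_clusters):
        idx = np.where(km.labels_ == c)[0]
        # pick the sentence nearest to the cluster centroid as its representative
        dists = np.linalg.norm(X[idx].toarray() - km.cluster_centers_[c], axis=1)
        summary.append(sentences[idx[np.argmin(dists)]])
    return summary

sents = ["The cat sat on the mat.", "A cat rested on a rug.",
         "Stock prices fell sharply today.", "Markets dropped amid selling."]
print(summarize(sents))  # one representative sentence per topic cluster
```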

12.
Traditional recommendation algorithms mostly aim at optimizing the accuracy of the recommendation list while neglecting another important metric of recommendation algorithms: diversity. This paper proposes a new method for improving the diversity of the recommendation list. The method converts the list generation step into N rounds of probabilistic selection, each completed in two steps: type selection and item selection. In type selection, item type information is introduced, a probability matrix is computed from the user's preferences for the different item types, and a type is chosen according to this matrix. In item selection, the final score of each item is recomputed from three factors: the item's predicted rating, its historical popularity, and its recommendation popularity; the item with the highest score is recommended to the user. A threshold TR adjusts the trade-off between diversity and accuracy. Finally, comparative experiments demonstrate the effectiveness of the method.
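The two-step generation loop can be sketched as follows in Python. Each item is assumed to carry a genre, a predicted rating and a popularity value; the genre-sampling probabilities and the scoring formula (including the popularity penalty) are illustrative stand-ins for the paper's probability matrix, three scoring factors and TR threshold.

```python
# Illustrative two-step generation: sample a type (genre), then pick the best-scoring
# item of that type. Weights and data are toy values, not the paper's formulas.
import random

def recommend(items, genre_prefs, n=3, seed=0):
    """items: dict item -> {"genre": g, "rating": predicted score, "pop": popularity}.
    genre_prefs: dict genre -> user preference weight (used as sampling probabilities)."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(n):
        # step 1: sample a genre according to the user's genre preference distribution
        genres = list(genre_prefs)
        genre = rng.choices(genres, weights=[genre_prefs[g] for g in genres])[0]
        # step 2: among unchosen items of that genre, pick the highest-scoring one,
        # penalising popular items to spread recommendations out
        pool = [i for i, v in items.items() if v["genre"] == genre and i not in chosen]
        if not pool:
            continue
        chosen.append(max(pool, key=lambda i: items[i]["rating"] - 0.1 * items[i]["pop"]))
    return chosen

items = {"m1": {"genre": "action", "rating": 4.5, "pop": 9},
         "m2": {"genre": "action", "rating": 4.0, "pop": 2},
         "m3": {"genre": "drama",  "rating": 4.2, "pop": 5},
         "m4": {"genre": "comedy", "rating": 3.8, "pop": 1}}
print(recommend(items, {"action": 0.5, "drama": 0.3, "comedy": 0.2}))
```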

13.
Clustering XML documents is extensively used to organize large collections of XML documents into groups that are coherent according to structure and/or content features. The growing availability of distributed XML sources and the variety of high-demand environments raise the need for clustering approaches that can exploit distributed processing techniques. Nevertheless, existing methods for clustering XML documents are designed to work in a centralized way. In this paper, we address the problem of clustering XML documents in a collaborative distributed framework. XML documents are first decomposed based on semantically cohesive subtrees, then modeled as transactional data that embed both XML structure and content information. The proposed clustering framework employs a centroid-based partitional clustering method that has been developed for a peer-to-peer network. Each peer in the network is allowed to compute a local clustering solution over its own data, and to exchange its cluster representatives with other peers. The exchanged representatives are used to compute representatives for the global clustering solution in a collaborative way. We evaluated the effectiveness and efficiency of our approach on real XML document collections, varying the number of peers. Results show that major advantages over the corresponding centralized clustering setting are obtained in terms of runtime behavior, while clustering solutions can still be accurate with a moderately low number of nodes in the network. Moreover, the collaborative nature of our approach has proved to be a convenient feature in distributed clustering, as found in a comparative evaluation with a distributed non-collaborative clustering method.

14.
Social online communities and platforms play a significant role in the activities of software developers, either as an integral part of the main activities or through complementary knowledge and information sharing. As such techniques become more prevalent, resulting in a wealth of shared information, the need to effectively organize and sift through the information becomes more important. Top-down approaches such as formal hierarchical directories have been shown to lack the scalability needed in these circumstances. Lightweight bottom-up techniques such as community tagging have shown promise for better organizing the available content. However, in more focused communities of practice, such as software engineering and development, community tagging can face challenges such as tag explosion, locality of tags and differences of interpretation, to name a few. To address these challenges, we propose a semantic tagging approach that benefits from the information available in Wikipedia to semantically ground the tagging process and provide a methodical approach for tagging social software engineering content. We show that our approach is able to provide high-quality tags for social software engineering content that can be used not only for organizing such content but also for making meaningful and relevant content recommendations to users, both within a local community and across multiple social online communities. We have empirically validated our approach through four main research questions. The results of our observations show that the proposed approach is quite effective in organizing social software engineering content and making relevant, helpful and novel content recommendations to software developers and users of social software engineering communities.

15.
We are currently developing a vision-based system aiming to perform a fully automatic pipeline for in situ photorealistic three-dimensional (3D) modeling of previously unknown, complex and unstructured underground environments. Since navigation sensors are not reliable in such environments, our system embeds only passive (camera) and active (laser) 3D vision sensors. Laser range finders are particularly well suited for generating dense 3D maps by aligning multiple scans acquired from different viewpoints. Nevertheless, today's Iterative Closest Point (ICP)-based scan matching techniques rely on heavy human operator intervention during a post-processing step. Since a human operator cannot access the site, these techniques are not suitable in high-risk underground environments. This paper presents an automatic on-line scan matcher able to cope with the architecture of today's 3D laser scanners and to process either intensity or depth data to align scans, providing robustness with respect to the capture device. The proposed implementation emphasizes the portability of our algorithm on either single- or multi-core embedded platforms for on-line mosaicing onboard 3D scanning devices. The proposed approach addresses key issues for in situ 3D modeling in difficult-to-access and unstructured environments and solves the 3D scan matching problem with an environment-independent solution. Several tests performed in two prehistoric caves illustrate the reliability of the proposed method.

16.
The simplicity of the hypertext model behind the World Wide Web is a factor in its success, but this simplicity brings limitations. One of these limitations is the embedding of links in documents. Open Hypermedia addresses this by instead storing links in separate link databases. Meanwhile, the Adaptive Hypermedia approach seeks to enhance a user's experience by inserting personalised additional content and links on the web page. However, these techniques do not offer the user any control over the adaptation. In this paper, we propose the concept of a multi-dimensional linkbase for adaptive link presentation. Links are created and stored in a single, multi-dimensional linkbase that provides presentation links based on the user's preferences and profile. We present a web-based system, the Inquiry-led Personalised Navigation System, that implements this multi-dimensional concept for controlling its personalisation of hyperlinks. We give the results of our evaluation, which confirm that user-controlled adaptation is a satisfactory approach to providing users with control over personalisation, and can alleviate the link overload problem.

17.
We propose a user model to support personalized learning paths through online material. Our approach is a variant of student modeling using the computer tutoring concept of knowledge tracing. Knowledge tracing involves representing the knowledge required to master a domain and, from traces of online user behavior, diagnosing user knowledge states as a profile over those elements. The user model is induced from documents tagged by an expert in a social tagging system. Tags identified with “expertise” in a domain can be used to identify a corpus of domain documents. That corpus can be fed to an automated process that distills a topic model representation characteristic of the domain. As a learner navigates and reads online material, inferences can be made about the degree to which topics in the target domain have been learned. We validate this knowledge-tracing approach against data from a social tagging study. As part of this evaluation, we match the predictions of the knowledge-tracing model to participants' responses to individual question items used to test domain knowledge.

18.
We present new primal–dual algorithms for several network design problems. The problems considered are the generalized Steiner tree problem (GST), the directed Steiner tree problem (DST), and the set cover problem (SC), which is a subcase of DST. All of these problems are NP-hard, so we are interested in approximation algorithms for them. First, we give an algorithm for DST based on the traditional approach to designing primal–dual approximation algorithms. We show that the approximation factor of the algorithm is k, where k is the number of terminals, when the problem is restricted to quasi-bipartite graphs. We also give pathologically bad examples of the algorithm's performance. To overcome the problems exposed by the bad examples, we design a new framework for primal–dual algorithms that can be applied to all of our problems. The main feature of the new approach is that, unlike traditional primal–dual algorithms, it keeps the dual solution in the interior of the dual feasible region. The new approach allows us to avoid including too many arcs in the solution, and thus achieves a smaller-cost solution. Our computational results show that the interior-point version of the primal–dual algorithm performs better than the original primal–dual method most of the time.
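To make the primal–dual mechanism concrete, here is a compact Python sketch of the classical primal–dual approximation for the set cover subcase: the dual variable of an uncovered element is raised until some set containing it becomes tight, and that set is added to the solution. This illustrates the traditional approach only; the paper's interior-point variant, which keeps the dual solution strictly inside the feasible region, is not reproduced here.

```python
# Illustrative classical primal-dual approximation for (weighted) set cover.
# Instance data below is a toy example.

def primal_dual_set_cover(universe, sets, costs):
    """universe: iterable of elements; sets: dict name -> set of elements;
    costs: dict name -> nonnegative cost."""
    y = {e: 0.0 for e in universe}     # dual variable per element
    slack = dict(costs)                # remaining slack of each set's dual constraint
    solution, covered = [], set()
    for e in universe:
        if e in covered:
            continue
        # raise y[e] until the tightest set containing e becomes tight
        containing = [s for s in sets if e in sets[s]]
        tight = min(containing, key=lambda s: slack[s])
        delta = slack[tight]
        y[e] += delta
        for s in containing:
            slack[s] -= delta          # dual feasibility is preserved by choice of delta
        solution.append(tight)
        covered |= sets[tight]
    return solution

sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}}
costs = {"A": 3.0, "B": 1.0, "C": 2.0}
print(primal_dual_set_cover([1, 2, 3, 4, 5], sets, costs))  # ['A', 'B', 'C']
```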

19.
Many organizations use business policies to govern their business processes, often resulting in huge amounts of policy documents. As new regulations such as Sarbanes-Oxley arise, these business policies must be modified to ensure their correctness and consistency. Given the large amounts of business policies, manually analyzing policy documents to discover process information is very time-consuming and imposes an excessive workload. In order to provide a solution to this information overload problem, we propose a novel approach named Policy-based Process Mining (PBPM) for automatically extracting process information from policy documents. Several text mining algorithms are applied to business policy texts in order to discover process-related policies and extract process components such as tasks, data items, and resources. Experiments are conducted to validate the extracted components, and the results are found to be very promising. To the best of our knowledge, PBPM is the first approach that applies text mining to discovering business process components from unstructured policy documents. The initial research results presented in this paper will require more research effort to make PBPM a practical solution.

20.
The key issue in top-k retrieval, finding a set of k documents (from a large document collection) that can best answer a user’s query, is to strike the optimal balance between relevance and diversity. In this paper, we study the top-k retrieval problem in the framework of facility location analysis and prove the submodularity of the objective function, which provides a theoretical approximation guarantee of factor \(1-\frac{1}{e}\) for the (best-first) greedy search algorithm. Furthermore, we propose a two-stage hybrid search strategy which first obtains a high-quality initial set of top-k documents via greedy search, and then refines that result set iteratively via local search. Experiments on two large TREC benchmark datasets show that our two-stage hybrid search strategy can outperform existing approaches effectively and efficiently.
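A toy Python sketch of the two-stage strategy: best-first greedy selection under a facility-location style coverage objective, followed by local swap refinement. The documents, aspects and similarity values are invented for illustration; the \(1-\frac{1}{e}\) guarantee for the greedy stage relies on the objective being monotone submodular, which the paper proves for its formulation.

```python
# Illustrative two-stage top-k selection: greedy coverage maximization, then local swaps.
import itertools

def coverage(selected, sim, aspects):
    """Facility-location style objective: each aspect is served by the most similar selected doc."""
    return sum(max((sim[(d, a)] for d in selected), default=0.0) for a in aspects)

aspects = ["a1", "a2", "a3"]
docs = ["d1", "d2", "d3", "d4"]
sim = {("d1", "a1"): 0.9, ("d1", "a2"): 0.1, ("d1", "a3"): 0.0,
       ("d2", "a1"): 0.8, ("d2", "a2"): 0.2, ("d2", "a3"): 0.1,
       ("d3", "a1"): 0.0, ("d3", "a2"): 0.7, ("d3", "a3"): 0.6,
       ("d4", "a1"): 0.1, ("d4", "a2"): 0.0, ("d4", "a3"): 0.8}
k = 2

# stage 1: best-first greedy selection
S = []
while len(S) < k:
    S.append(max((d for d in docs if d not in S),
                 key=lambda d: coverage(S + [d], sim, aspects)))

# stage 2: local search -- keep swapping a selected doc for an unselected one while it helps
improved = True
while improved:
    improved = False
    for out_doc, in_doc in itertools.product(list(S), [d for d in docs if d not in S]):
        T = [d for d in S if d != out_doc] + [in_doc]
        if coverage(T, sim, aspects) > coverage(S, sim, aspects):
            S, improved = T, True
            break
print(S)  # ['d3', 'd1'] for this toy data
```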
