Similar Documents
 20 similar documents found (search time: 404 ms)
1.
《Information Systems》2006,31(4-5):247-265
As more information becomes available on the Web, there has been growing interest in effective personalization techniques. Personal agents that provide assistance based on the content of Web documents and on user interests have emerged as a viable approach to this problem. Since these agents rely on knowledge about users contained in user profiles, i.e., models of user preferences and interests gathered by observing user behavior, the capacity to acquire and model user interest categories has become a critical component of personal agent design. User profiles have to summarize categories corresponding to diverse user information interests at different levels of abstraction in order to allow agents to decide on the relevance of new pieces of information. In accomplishing this goal, document clustering offers the advantage that a priori knowledge of categories is not needed, so the categorization is completely unsupervised. In this paper we present a document clustering algorithm, named WebDCC (Web Document Conceptual Clustering), that carries out incremental, unsupervised concept learning over Web documents in order to acquire user profiles. Unlike most user-profiling approaches, this algorithm offers comprehensible clustering solutions that can be easily interpreted and explored by both users and other agents. By extracting semantics from Web pages, the algorithm also produces intermediate results that can ultimately be integrated in a machine-understandable format such as an ontology. Empirical results of using this algorithm in the context of an intelligent Web search agent show that it can reach high levels of accuracy in suggesting Web pages.
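WebDCC builds a hierarchy of concepts rather than a flat partition, but the incremental, unsupervised character of such clustering can be illustrated with a minimal flat sketch (the similarity threshold, the running-mean centroid update, and all names below are our own illustration, not the WebDCC algorithm itself):

```python
import math

def cosine(a, b):
    # cosine similarity between two sparse term-frequency dicts
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def incremental_cluster(docs, threshold=0.3):
    """Assign each incoming document to the most similar existing
    cluster, or start a new cluster if nothing is similar enough."""
    clusters = []
    for doc in docs:
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(doc, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": dict(doc), "members": [doc]})
        else:
            best["members"].append(doc)
            n = len(best["members"])
            # running-mean update of the centroid
            for t, v in doc.items():
                best["centroid"][t] = best["centroid"].get(t, 0.0) * (n - 1) / n + v / n
            for t in list(best["centroid"]):
                if t not in doc:
                    best["centroid"][t] *= (n - 1) / n
    return clusters
```

Each document either joins the closest cluster, whose centroid is updated incrementally, or seeds a new one; no category labels are needed at any point, which is the unsupervised property the abstract emphasizes.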

2.
We focus on information access tasks characterized by a large volume of hypermedia-connected technical documents, a need for rapid and effective access to familiar information, and long-term interaction with evolving information. The problem for technical users is to build and maintain a personalized, task-oriented model of the information so as to quickly access relevant information. We propose a solution that provides user-centered adaptive information retrieval and navigation. This solution supports users in customizing information access over time. It is complementary to information discovery methods, which provide access to new information, since it lets users customize future access to previously found information. It relies on a technique, called the Adaptive Relevance Network, which creates and maintains a complex indexing structure to represent a user's personal information access maps, organized by concepts. This technique is integrated within the Adaptive HyperMan system, which helps NASA Space Shuttle flight controllers organize and access large amounts of information. It allows users to select and mark any part of a document as interesting, and to index that part with user-defined concepts. Users can then retrieve marked portions of documents. This functionality allows users to define and access personal collections of information, which are dynamically computed. The system also supports collaborative review by letting users share group access maps. The Adaptive Relevance Network provides long-term adaptation based both on usage and on explicit user input. The indexing structure is dynamic and evolves over time. Learning and generalization support flexible retrieval of information under similar concepts. The network is geared towards more recent information access, and automatically manages its size in order to maintain rapid access when scaling up to large hypermedia spaces. We present results of simulated learning experiments. (Dr. Mathé and Dr. Chen are contractors with Recom Technologies, Inc.)

3.
Many text searches are meant to identify one particular fact or one particular section of a document. Unfortunately, predominant search paradigms focus mostly on identifying relevant documents and leave the burden of within-document searching to the user. This research explores term distribution visualizations as a means to more clearly identify both the relevance of documents and the location of specific information within them. We present a set of term distribution visualizations, introduce a Focus+Context model for within-document search and navigation, and describe the design and results of a 34-subject user study. This user study shows that these visualizations—with the exception of the grey-scale histogram variant—are comparable in usability to our Grep interface. This is impressive given the substantial experience of our users with Grep functionality. Overall, we conclude that users do not find this visualization model difficult to use and understand.

4.
Rigorous analysis of user interest in Web documents is essential for the development of recommender systems. This paper investigates the relationship between implicit parameters and users' explicit ratings during search and reading tasks. The objective of this paper is therefore three-fold: firstly, the paper identifies, through user study 1, the implicit parameters that are statistically correlated with users' explicit ratings. These parameters are used to develop a predictive model that can represent users' perceived relevance of documents. Secondly, it investigates the reliability and validity of the predictive model by comparing it with eye gaze during a reading task in user study 2. Our findings suggest that there is no significant difference between the predictive model based on implicit indicators and eye gaze within the context examined. Thirdly, we measured the consistency of users' explicit ratings in both studies and found significant consistency in explicit ratings of document relevance and interest level, which further validates the predictive model. We envisage that the results presented in this paper can help to develop recommender and personalised systems that recommend documents to users based on their previous interaction with the system.

5.
6.
Collaborative editing enables a group of people to edit documents collaboratively over a computer network. Customisation of the collaborative environment to different subcommunities of users at different points in time is an important issue, and the document model is an important factor in achieving it. We have chosen a tree representation encompassing a large class of documents, such as text, XML and graphical documents, and we propose a multi-level editing approach for maintaining consistency over hierarchically structured documents. The multi-level editing approach involves logging the edit operations that refer to each node. Keeping operations associated with the tree nodes to which they refer offers support for tracking user activity performed on various units of the document. This facilitates the computation of awareness information and the handling of conflicting changes referring to units of the document. Moreover, increased efficiency is obtained compared to existing approaches that use a linear structure for representing documents. The multi-level editing approach involves the recursive application of any linear merging algorithm over the document structure, and we show how the approach was applied to both real-time and asynchronous modes of collaboration.

7.
With the rapid growth of the Web, automatic tagging that detects informative terms in a document has become an important problem for information aggregation and sharing services. In particular, automatic tagging of short documents is increasingly relevant, as many users publish information through social media services that encourage documents of short length. In this paper, we propose a novel automatic tagging model for short text documents from social media services, following the framework of supervised learning. We redefine traditional frequency-based term features so that they address the properties of documents created under length limitations, and we model sequential dependencies between successive terms in a document with a structural support vector machine. In addition, our approach incorporates the composition patterns by which users put informative terms into their documents. Extensive experiments validate the presented approach: the proposed term features were effective for extracting tags, and the tag extractor trained with sequential dependencies and composition patterns outperformed the existing alternative methods.

8.
A new customized document categorization scheme using rough membership
One of the problems that plague document ranking is inherent ambiguity, which arises from the nuances of natural language. Though two documents may contain the same set of words, their relevance may be very different to a single user, since the context of the words usually determines the relevance of a document. The context of a document is very difficult to model mathematically other than through user preferences. Since it is difficult to anticipate all possible user interests a priori and install corresponding filters at the server side, we propose a rough-set-based document filtering scheme that can be used to build customized filters at the user end. The documents retrieved by a traditional search engine can then be filtered automatically by this agent, so the user is not flooded with irrelevant material. A rough-set-based classificatory analysis is used to learn the user's bias for a category of documents, which is then used to filter out irrelevant documents. To do this we propose novel rough membership functions for computing the membership of a document in various categories.
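The rough membership function at the heart of such a scheme measures, for an object x and a target set X, the fraction of objects indiscernible from x that fall in X: μ_X(x) = |[x] ∩ X| / |[x]|. A toy sketch (the universe, partition, and relevant set below are invented for illustration, not taken from the paper):

```python
def rough_membership(x, X, partition):
    """mu_X(x) = |[x] ∩ X| / |[x]|, where [x] is the indiscernibility
    (equivalence) class of x in the given partition of the universe."""
    block = next(b for b in partition if x in b)
    return len(block & X) / len(block)

# toy universe of six documents, partitioned into indiscernibility
# classes (e.g. documents with identical term vectors fall together)
partition = [{1, 2, 3}, {4, 5}, {6}]
relevant = {1, 2, 4}  # documents the user marked as relevant
```

A document whose class lies entirely inside (or outside) the relevant set gets membership 1 (or 0); fractional values signal the ambiguous cases a filter must handle.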

9.
A web-based search engine responds to a user’s query with a list of documents. This list can be viewed as the engine’s model of the user’s idea of relevance—the engine ‘believes’ that the first document is the most likely to be relevant, the second is slightly less likely, and so on. We extend this idea to an interactive setting where the system accepts the user’s feedback and adjusts its relevance model. We develop three specific models that are integrated as part of a system we call Lighthouse. The models incorporate document clustering and a spring-embedding visualization of inter-document similarity. We show that if a searcher were to use Lighthouse in ways consistent with the model, the expected effectiveness improves—i.e., the relevant documents are found more quickly in comparison to existing methods.

10.
In this paper we explore the benefits of latent variable modelling of clickthrough data in the domain of image retrieval. Clicks in image search logs are regarded as implicit relevance judgements that express both user intent and important relations between selected documents. We posit that clickthrough data contains hidden topics and can be used to infer a lower dimensional latent space that can be subsequently employed to improve various aspects of the retrieval system. We use a subset of a clickthrough corpus from the image search portal of a news agency to evaluate several popular latent variable models in terms of their ability to model topics underlying queries. We demonstrate that latent variable modelling reveals underlying structure in clickthrough data and our results show that computing document similarities in the latent space improves retrieval effectiveness compared to computing similarities in the original query space. These results are compared with baselines using visual and textual features. We show performance substantially better than the visual baseline, which indicates that content-based image retrieval systems that do not exploit query logs could improve recall and precision by taking this historical data into account.
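One popular way to infer such a lower-dimensional latent space is a truncated SVD of the query–document click matrix (the paper evaluates several latent variable models; the matrix, rank, and similarity function below are purely illustrative):

```python
import numpy as np

# rows = queries, cols = documents; each entry is a click count
C = np.array([[3, 2, 0, 0],
              [2, 3, 0, 0],
              [0, 0, 4, 1],
              [0, 0, 1, 3]], dtype=float)

k = 2  # number of latent dimensions to keep
U, s, Vt = np.linalg.svd(C, full_matrices=False)
# embed each document as a k-dimensional vector in the latent space
doc_latent = (np.diag(s[:k]) @ Vt[:k]).T

def latent_sim(i, j):
    """Cosine similarity between documents i and j in the latent space."""
    a, b = doc_latent[i], doc_latent[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

In this toy matrix, documents 0 and 1 are co-clicked under the same queries and so end up close in the latent space, while document 2 belongs to the other click block and is nearly orthogonal to them, mirroring the "similarity in latent space" idea in the abstract.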

11.
For search over encrypted cloud data, traditional fuzzy keyword search schemes can locate relevant documents, but the results are unsatisfactory: when the user's input is spelled correctly, approximate search is not supported, and when the user makes a spelling mistake, the returned results contain many documents with irrelevant keywords, seriously wasting bandwidth. To address these shortcomings of fuzzy keyword search over encrypted cloud data, a new fuzzy keyword search scheme is proposed. Relevance scores are computed for keywords and documents are ranked by score, so that only the top-k documents (the k with highest relevance) are returned to the searching user, reducing unnecessary bandwidth consumption and the time users spend sifting for useful documents, and yielding more effective search results. In addition, a set of fake trapdoors is introduced, which makes it harder for the cloud server to analyse document keywords and strengthens the privacy protection of the system.
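The ranking step in such schemes typically uses a TF-IDF-style relevance score computed over the (encrypted) index, then returns only the k best documents. A plaintext sketch of the scoring and top-k selection (the score formula is a common choice in ranked searchable-encryption work, not necessarily this paper's exact one; the corpus is invented):

```python
import heapq
import math

def relevance(query_terms, doc_tf, doc_len, df, n_docs):
    """TF-IDF style score often used in ranked keyword search:
    sum over query terms of (1/|doc|) * (1 + ln tf) * ln(1 + N/df)."""
    score = 0.0
    for w in query_terms:
        tf = doc_tf.get(w, 0)
        if tf:
            score += (1.0 / doc_len) * (1.0 + math.log(tf)) * math.log(1.0 + n_docs / df[w])
    return score

def top_k(query_terms, corpus, df, n_docs, k):
    """Score every document and return the k highest-scoring ones."""
    scored = [(doc_id, relevance(query_terms, tf, sum(tf.values()), df, n_docs))
              for doc_id, tf in corpus.items()]
    return heapq.nlargest(k, scored, key=lambda p: p[1])

# toy corpus: doc_id -> term-frequency map, plus document frequencies
corpus = {"d1": {"cloud": 3, "search": 1}, "d2": {"cloud": 1}, "d3": {"music": 5}}
df = {"cloud": 2, "search": 1, "music": 1}
```

Returning only the top-k scored documents, rather than every fuzzy match, is what cuts the bandwidth and user effort described in the abstract.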

12.

Given an information need and the corresponding set of retrieved documents, it is known that user assessments of those documents differ from one user to another. One reason frequently put forward is the mismatch between text complexity and user reading fluency. We explore this relationship along three dimensions: quantitative features, subjectively assessed difficulty, and reader/text factors. To evaluate quantitative features, we asked whether documents that users evaluate can be distinguished, on the basis of document complexity, from those they ignore. Secondly, we propose a task involving relevance assessment of short texts: users evaluated the relevance of short texts in answering 20 queries, where document complexity and relevance assessments had previously been provided by human experts. We then study the relationship between participants' assessments, experts' assessments and document complexity. Finally, a third experiment was performed through the prism of neuro-information retrieval: while participants were monitored with an electroencephalogram (EEG) headset, we looked for correlations among the EEG signal, text difficulty and the level of comprehension of the texts being read during the recording. In light of the results obtained, we found weak evidence that users responded to queries according to text complexity and their own reading fluency. For the second and third groups of experiments, we administered a sub-test from the Woodcock Reading Mastery Test to ensure that participants had roughly average reading fluency. Nevertheless, we think that additional variables should be studied in the future in order to achieve a sound explanation of the interaction between text complexity and user profile.


13.

Document filtering is increasingly deployed in Web environments to reduce users' information overload. We formulate online information filtering as a reinforcement learning problem, i.e., TD(0). The goal is to learn user profiles that best represent information needs and thus maximize the expected value of user relevance feedback. A method is then presented that acquires reinforcement signals automatically by estimating users' implicit feedback from direct observations of browsing behaviors. This "learning by observation" approach is contrasted with conventional relevance feedback methods, which require explicit user feedback. Field tests were performed involving 10 users reading a total of 18,750 HTML documents over 45 days. Compared to existing document filtering techniques, the proposed learning method showed superior performance in information quality and in the speed of adaptation to user preferences during online filtering.
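The TD(0) formulation boils down to the classic update V(s) ← V(s) + α[r + γV(s′) − V(s)], with the reward r estimated from observed browsing behavior rather than from explicit ratings. A minimal sketch (the reward heuristic and all names below are our own illustration, not the paper's exact estimator):

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: move V(s) toward the bootstrapped target
    r + gamma * V(s'). V is a dict mapping states to value estimates."""
    v = V.get(s, 0.0)
    V[s] = v + alpha * (r + gamma * V.get(s_next, 0.0) - v)
    return V[s]

def implicit_reward(seconds_read, doc_length_words):
    # crude observable proxy for relevance feedback: reading time per
    # word, capped at 1.0 (our own heuristic, not the paper's estimator)
    return min(1.0, 10.0 * seconds_read / max(doc_length_words, 1))
```

The point of "learning by observation" is that `implicit_reward`-style signals come for free from logged behavior, so the value function over profile states can be trained without interrupting the user.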

14.
Collaborative filtering (CF) recommender systems have emerged in various applications to support item recommendation, addressing the information-overload problem by suggesting items of interest to users. Recently, trust-based recommender systems have incorporated the trustworthiness of users into CF techniques to improve the quality of recommendation. They propose trust computation models to derive trust values based on users' past ratings of items: a user is more trustworthy if s/he has contributed more accurate predictions than other users. Nevertheless, conventional trust-based CF methods do not address the issue of deriving trust values from users' varying information needs for items over time. In knowledge-intensive environments, users usually have various information needs when accessing required documents over time, forming a sequence of documents ordered by access time. We propose a sequence-based trust model that derives trust values from users' sequences of ratings on documents. The model considers two factors, time and document similarity, in computing the trustworthiness of users. The proposed model, enhanced with the similarity of user profiles, is incorporated into a standard collaborative filtering method to discover trustworthy neighbors for making predictions. Experimental results show that the proposed model improves the prediction accuracy of the CF method in comparison with other trust-based recommender systems.
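A simplified way to combine the two factors is to weight each past prediction's accuracy by a time-decay term, so recent behavior dominates the trust estimate; the sketch below uses exponential decay with a half-life (the decay form and parameters are our illustration, not the paper's exact model, which also factors in document similarity):

```python
import math

def trust_score(rating_events, now, half_life=30.0):
    """Time-weighted accuracy of a user's past predictions.
    rating_events: list of (timestamp_days, accuracy in [0, 1]).
    Events lose half their weight every `half_life` days."""
    num = den = 0.0
    for t, acc in rating_events:
        w = math.exp(-math.log(2) * (now - t) / half_life)
        num += w * acc
        den += w
    return num / den if den else 0.0
```

With a 30-day half-life, an accurate rating made today counts four times as much as one made 60 days ago, so a user whose predictions have recently improved regains trust quickly.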

15.
The text recommendation task involves delivering sets of documents to users on the basis of user models. These models are improved over time, given feedback on the delivered documents. When selecting documents to recommend, a system faces an instance of the exploration/exploitation tradeoff: whether to deliver documents about which there is little certainty, or those which are known to match the user model learned so far. In this paper, a simulation is constructed to investigate the effects of this tradeoff on the rate of learning user models, and the resulting compositions of the sets of recommended documents, in particular World-Wide Web pages. Document selection strategies are developed which correspond to different points along the tradeoff. Using an exploitative strategy, our results show that simple preference functions can successfully be learned using a vector-space representation of a user model in conjunction with a gradient descent algorithm, but that increasingly complex preference functions lead to a slowing down of the learning process. Exploratory strategies are shown to increase the rate of user model acquisition at the expense of presenting users with suboptimal recommendations; in addition they adapt to user preference changes more rapidly than exploitative strategies. These simulated tests suggest an implementation for a simple control that is exposed to users, allowing them to vary a system's document selection behavior depending on individual circumstances.
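A common concrete instance of a selection strategy sitting between the two extremes is epsilon-greedy: exploit the current vector-space user model most of the time, but occasionally pick a document at random to keep learning. A minimal sketch (the document representation, scoring, and parameter names are illustrative, not the paper's strategies):

```python
import random

def select_documents(docs, user_model, k, epsilon, rng=random):
    """Pick k documents: with probability epsilon choose one at random
    (explore), otherwise take the best match to the user model (exploit)."""
    def score(d):
        # dot product of the document's term weights with the user model
        return sum(user_model.get(t, 0.0) * w for t, w in d["terms"].items())
    remaining = list(docs)
    chosen = []
    for _ in range(min(k, len(remaining))):
        if rng.random() < epsilon:
            pick = rng.choice(remaining)
        else:
            pick = max(remaining, key=score)
        chosen.append(pick)
        remaining.remove(pick)
    return chosen
```

Exposing `epsilon` as a user-facing knob corresponds to the "simple control" the abstract suggests: epsilon = 0 is purely exploitative, higher values trade recommendation quality for faster model acquisition.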

16.
A Web portal providing access to over 250,000 scanned and OCRed cultural heritage documents is analyzed. The collection consists of the complete Dutch Hansard from 1917 to 1995. Each document consists of facsimile images of the original pages plus hidden OCRed text. The inclusion of images yields large file sizes, of which less than 2% is actual text. The search interface of the portal provides poor ranking and uninformative document summaries (snippets), so users are instrumental in weeding out non-relevant results. For that, they must assess the complete documents. This is a time-consuming and frustrating process because of the long download and processing times of the large files. Instead of using the scanned images for relevance assessment, we propose to use a reconstruction of the original document from a purely semantic representation. Evaluation on the Dutch dataset shows that these reconstructions are two orders of magnitude smaller and still resemble the original to a high degree. In addition, they are easier to speed-read and evaluate for relevance, thanks to added hyperlinks and a presentation optimized for reading from a terminal. We describe the reconstruction process and evaluate the costs, the benefits, and the quality.

17.
Current document retrieval methods use a vector space similarity measure to score the relevance of documents to a specific query. The central problem with these methods is that they neglect any spatial information within the documents in question. We present a new method, called Fourier Domain Scoring (FDS), which takes advantage of this spatial information, via the Fourier transform, to give a more accurate relevance ordering of a document set. We show that FDS improves precision over vector space similarity measures for the common case of Web-like queries, and gives results similar to the vector space measures for longer queries.
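The core of FDS is to turn each query term's positions within a document into a spatial signal and compare documents in the frequency domain. A rough sketch of the signal-construction step (the fixed binning, and summing FFT magnitudes as a score, are simplifications of the published method, which also uses phase information):

```python
import numpy as np

def term_signal(doc_terms, query_term, n_bins=8):
    """Spatial signal for one term: occurrence counts in n_bins
    equal-width position bins spanning the document."""
    sig = np.zeros(n_bins)
    n = len(doc_terms)
    for pos, t in enumerate(doc_terms):
        if t == query_term:
            sig[pos * n_bins // n] += 1
    return sig

def fds_score(doc_terms, query_terms, n_bins=8):
    # simplified score: sum of the FFT magnitude spectra of each
    # query term's spatial signal
    total = np.zeros(n_bins)
    for q in query_terms:
        total += np.abs(np.fft.fft(term_signal(doc_terms, q, n_bins)))
    return float(total.sum())
```

Unlike a bag-of-words score, two documents with the same term frequency but different term spread produce different spectra, which is the spatial information the abstract says vector-space measures discard.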

18.
Kwong, Linus W.; Ng, Yiu-Kai 《World Wide Web》2003,6(3):281-303
To retrieve Web documents of interest, most Web users rely on search engines. All existing search engines provide a query facility for users to search for desired documents using keywords. However, when a search engine retrieves a long list of Web documents, the user may need to browse through each retrieved document to determine which are of interest. We observe two kinds of problems in the retrieval of Web documents: (1) an inappropriate selection of keywords by the user; and (2) poor precision in the retrieved documents. To address these problems, we propose an automatic binary categorization method for recognizing multiple-record Web documents of interest, which appear often in advertisement Web pages. Our categorization method uses application ontologies and is based on two information retrieval models, the Vector Space Model (VSM) and the Clustering Model (CM). We analyze and cull Web documents down to just those applicable to a particular application ontology. The culling analysis (i) uses CM to find a virtual centroid for the records in a Web document, (ii) computes a vector in a multi-dimensional space for this centroid, and (iii) compares this vector with the predefined ontology vector in the same multi-dimensional space using VSM, in which we consider the magnitudes of the vectors as well as the angle between them. Our experimental results show an average of 90% recall and 97% precision in recognizing Web documents belonging to the same category (i.e., domain of interest). Thus our categorization discards very few documents it should have kept and keeps very few it should have discarded.

19.
We present a virtual reality application called VR-VIBE which is intended to support the co-operative browsing and filtering of large document stores. VR-VIBE extends a visualisation approach proposed in a previous two-dimensional system called VIBE into three dimensions, allowing more information to be visualised at one time and supporting more powerful styles of interaction. The essence of VR-VIBE is that multiple users can explore the results of applying several simultaneous queries to a corpus of documents. By arranging the queries into a spatial framework, the system shows the relative attraction of each document to each query by its spatial position, and also shows the absolute relevance of each document to all of the queries. Users may then navigate the space, select individual documents, control the display according to a dynamic relevance threshold, and dynamically drag the queries to new positions to see the effect on the document space. Co-operative browsing is supported by directly embodying users and providing them with the ability to interact over live audio connections and to attach brief textual annotations to individual documents. Finally, we conclude with some initial observations gleaned from our experience of constructing VR-VIBE and using it in a laboratory setting.

20.
《Knowledge》2000,13(5):285-296
Machine-learning techniques play an important role in information filtering; their main objective is to obtain user profiles. To decrease the burden of online learning, it is important to find suitable structures for representing user information needs. This paper proposes a model for information filtering on the Web in which the user information need is described at two levels: profiles at the category level, and Boolean queries at the document level. To efficiently estimate the relevance between the user information need and documents, the user information need is treated as a rough set on the space of documents, and rough set decision theory is used to classify new documents with respect to it. As a result, new documents are divided into three parts: a positive region, a boundary region, and a negative region. An experimental system, JobAgent, is presented to verify this model, showing that the rough-set-based model provides an efficient approach to the information overload problem.
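The three-region split follows the three-way decision pattern of rough set decision theory: accept documents whose estimated relevance is high (positive region), reject those where it is low (negative region), and defer the rest to the boundary region for further examination. A minimal sketch (the thresholds and the probability-based formulation are our illustration, not the paper's exact decision rules):

```python
def three_way_classify(docs, relevance_prob, alpha=0.7, beta=0.3):
    """Three-way decision over documents: positive if the estimated
    relevance probability is >= alpha, negative if <= beta, boundary
    otherwise."""
    regions = {"positive": [], "boundary": [], "negative": []}
    for d in docs:
        p = relevance_prob(d)
        if p >= alpha:
            regions["positive"].append(d)
        elif p > beta:
            regions["boundary"].append(d)
        else:
            regions["negative"].append(d)
    return regions
```

Only boundary-region documents need a second, more expensive check (or a user decision), which is where the efficiency gain over a binary relevant/irrelevant split comes from.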
