Similar literature
 20 similar documents found; search took 31 ms
1.
Although Web search engines index and provide access to huge numbers of documents, user queries typically return only a linear list of hits. While this is often satisfactory for focused search, it does not support exploration or deeper analysis of the results. One way to achieve advanced exploration facilities that exploit the availability of structured (and semantic) data in Web search is to enrich it with entity mining over the full contents of the search results. Such services provide users with an initial overview of the information space, allowing them to gradually restrict it until they locate the desired hits, even if these are low ranked. This is especially important in areas of professional search such as medical search and patent search. In this paper we consider a general scenario of providing such services as meta-services (that is, layered over systems that support keyword search) without a priori indexing of the underlying document collection(s). To make such services feasible for large amounts of data we use the MapReduce distributed computation model on a Cloud infrastructure (Amazon EC2). Specifically, we show how the required computational tasks can be factorized and expressed as MapReduce functions. A key contribution of our work is a thorough evaluation of platform configuration and tuning, an aspect that is often disregarded or inadequately addressed in prior work but is crucial for the efficient utilization of resources. Finally, we report experimental results on the speedup achieved in various settings.
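
To make the idea of expressing entity mining as MapReduce functions concrete, here is a minimal single-process sketch. It is not the paper's implementation: the gazetteer `KNOWN_ENTITIES`, the function names, and the sample hits are all assumptions made for illustration, and the shuffle phase is simulated with an in-memory dictionary rather than a cluster.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical gazetteer standing in for a real entity miner.
KNOWN_ENTITIES = {"aspirin": "Drug", "ibuprofen": "Drug", "athens": "Location"}

def map_doc(doc_id, text):
    """Map step: emit ((category, entity), doc_id) for each entity found in one hit."""
    for token in text.lower().split():
        word = token.strip(".,;:()")
        if word in KNOWN_ENTITIES:
            yield (KNOWN_ENTITIES[word], word), doc_id

def reduce_entity(key, doc_ids):
    """Reduce step: collapse the doc ids of one entity into a sorted, de-duplicated list."""
    return key, sorted(set(doc_ids))

def run(hits):
    """Sequential stand-in for the shuffle phase: group mapped pairs by key, then reduce."""
    groups = defaultdict(list)
    for key, doc_id in chain.from_iterable(map_doc(d, t) for d, t in hits.items()):
        groups[key].append(doc_id)
    return dict(reduce_entity(k, v) for k, v in groups.items())

# Illustrative search hits (made up for this sketch).
hits = {1: "Aspirin and ibuprofen trials in Athens.", 2: "Aspirin dosage."}
index = run(hits)
```

The resulting `index` maps each (category, entity) pair to the hits that mention it, which is the kind of overview a user could click through to narrow the result list.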

2.
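Lin Ziyu (林子雨), Zou Quan (邹权), Lai Yongxuan (赖永炫), Lin Chen (林琛). Journal of Software (《软件学报》), 2014, 25(3): 528-546
Keyword queries help users quickly retrieve content of interest from a database without requiring them to master a specialized structured query language, which lowers the barrier to use. For keyword-based database queries, data-graph-based methods are a common approach: they convert the database into a data graph and then compute minimal Steiner trees over it. However, existing methods cannot dynamically optimize query results as users' query interests change. This paper proposes using an ant colony optimization algorithm to solve the keyword query problem in databases, and proposes a method based on concept drift theory for detecting abrupt changes in user query interest, so that such changes can be discovered promptly. Building on this, it proposes ACOKS*, a dynamic query-result optimization algorithm based on concept drift theory and ant colony optimization, which dynamically optimizes query results according to the changed user interest so that they better match the user's expectations. Extensive experimental results on a prototype system show that the method scales well and achieves better performance than existing methods.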

3.
Understanding the search behaviour of online users is among the long-tail practices of Interactive Information Retrieval that helps identify the user information needs. The Interactive Social Book Search (SBS), under the umbrella of Interactive Information Retrieval (IIR), aims to understand the user interactions with book collections and the associated professionally-curated and socially-constructed metadata on the baseline and multistage user interfaces (UIs). This paper reports on the book search behaviour of users by reviewing research publications related to the Interactive SBS published during the last two decades. It presents a holistic view of the overall progress of Interactive SBS by summarising and visualising the experimental structure, search systems, datasets, demographics of participants, and findings to identify the research trends and possible future directions. Based on the collected evidence, it attempts to answer how the search system, user interface (UI), and the nature of tasks affect the book search behaviour of users. The article is the first of its kind that attempts to understand the book search behaviour of users in the context of Social Book Search with implications for usability experts and others working in UI design, web search engines, book search engines, digital libraries, collaborative social cataloguing websites, and e-Commerce applications.

4.
Efficient discovery of interesting statements in databases   (cited 3 times in total: 0 self-citations, 3 citations by others)
The Explora system supports Discovery in Databases by large scale search for interesting instances of statistical patterns. In this paper we describe how Explora assesses interestingness and achieves computational efficiency. These problems arise because of the variety of patterns and the immense combinatorial possibilities of generating instances when studying relations between variables in subsets of data. First, the user must be saved from getting overwhelmed with a deluge of findings. To restrict the search with respect to the analysis goals, the user can focus each discovery task performed during an interactive and iterative exploration process. Some basic organization principles of search can further limit the search effort. One principle is to organize search hierarchically and to evaluate first the statistical or information theoretic evidence of the general hypotheses. Then more special hypotheses can be eliminated from further search, if a more general hypothesis was already verified. But this approach alone has some drawbacks and even in moderately sized data does not prevent large sets of findings. Therefore, in a second evaluation phase, further aspects of interestingness are assessed. A refinement strategy selects the most interesting of the statistically significant statements. A second problem for discovery systems is efficiency. Each hypothesis evaluation requires many data accesses. We describe strategies that reduce data accesses and speed up computation.

5.
Experienced users who query search engines exhibit complex behavior. They explore many topics in parallel, experiment with query variations, consult multiple search engines, and gather information over many sessions. In the process they need to keep track of search context, namely useful queries and promising result links, which can be hard. We present an extension to search engines called SearchPad that makes it possible to keep track of ‘search context’ explicitly. We describe an efficient implementation of this idea deployed on four search engines: AltaVista, Excite, Google and Hotbot. Our design of SearchPad has several desirable properties: (i) portability across all major platforms and browsers; (ii) instant start requiring no code download or special actions on the part of the user; (iii) no server side storage; and (iv) no added client–server communication overhead. An added benefit is that it allows search services to collect valuable relevance information about the results shown to the user. In the context of each query SearchPad can log the actions taken by the user, and in particular record the links that were considered relevant by the user in the context of the query. The service was tested in a multi-platform environment with over 150 users for 4 months and found to be usable and helpful. We discovered that the ability to maintain search context explicitly seems to affect the way people search. Repeat SearchPad users looked at more search results than is typical on the Web, suggesting that availability of search context may partially compensate for non-relevant pages in the ranking.

6.

Few studies address the challenges that visually impaired (VI) users face when viewing search results on a search engine interface with a screen reader. This study investigates the effect of providing an overview of search results to VI users. We present a novel interactive search engine interface called InteractSE to support VI users during the results exploration stage in order to improve their interactive experience and web search efficiency. An overview of the search results is generated using an unsupervised machine learning approach to present the discovered concepts via a formal concept analysis that is domain-independent. These concepts are arranged in a multi-level tree following a hierarchical order and covering all retrieved documents that share maximal features. The InteractSE interface was evaluated by 16 legally blind users and compared with the Google search engine interface for complex search tasks. The evaluation results were obtained based on both quantitative (such as task completion time) and qualitative (such as participants’ feedback) measures. These results are promising and indicate that InteractSE enhances the search efficiency and consequently advances user experience. Our observations and analysis of the user interactions and feedback yielded design suggestions to support VI users when exploring and interacting with search results.


7.
8.
Searching a digital library is typically a tedious task. A system can improve information access by building on knowledge about a user, acquired in a user profile, in order to customize both the information returned in response to a query (query personalization) and the presentation of the results (presentation personalization). In this paper, we focus on query personalization in digital libraries; in particular, we address structured queries involving metadata stored in relational databases. We describe the specification of user preferences at the level of a user profile and the process of query personalization with the use of query-rewriting rules.
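
As a rough illustration of query personalization by rewriting, the sketch below appends predicates derived from a user profile to a structured query over a metadata table. The profile format, the `personalize` function, the `min_interest` threshold, and the `docs` table are hypothetical and are not taken from the paper; the one rewrite rule shown simply adds an equality predicate for each sufficiently strong preference.

```python
import sqlite3

# Hypothetical profile: (metadata column, preferred value, degree of interest).
PROFILE = [("language", "en", 0.9), ("year", 2005, 0.4)]

def personalize(base_sql, params, profile, min_interest=0.5):
    """Rewrite rule: append one equality predicate per sufficiently strong preference.

    Assumes base_sql already ends in a WHERE clause and that column names come
    from a trusted profile; values are always passed as bound parameters.
    """
    clauses, extra = [], []
    for column, value, interest in profile:
        if interest >= min_interest:
            clauses.append(f"{column} = ?")
            extra.append(value)
    if not clauses:
        return base_sql, list(params)
    return f"{base_sql} AND {' AND '.join(clauses)}", list(params) + extra

# Tiny in-memory metadata table to run the rewritten query against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (title TEXT, subject TEXT, language TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?, ?)",
    [("A", "logic", "en", 2005), ("B", "logic", "fr", 2005), ("C", "logic", "en", 1999)],
)

sql, args = personalize("SELECT title FROM docs WHERE subject = ?", ["logic"], PROFILE)
titles = [row[0] for row in conn.execute(sql + " ORDER BY title", args)]
```

Only the `language` preference clears the threshold here, so the rewritten query keeps the English-language records and ignores the weaker `year` preference.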

9.
Novice users often do not have enough domain knowledge to create good queries for searching information online. To help alleviate the situation, exploration techniques have been used to increase the diversity of the search results, so that potentially relevant items are returned in addition to those explicitly asked for. Most existing approaches, such as collaborative filtering, do not allow the level of exploration to be controlled. Consequently, the search results can be very different from what is expected. We propose an exploration strategy that performs intelligent query processing by first searching for usable old queries and then utilising them to adapt the current query, with the hope that the adapted query will be more relevant to the user’s areas of interest. We applied the proposed strategy to the implementation of a personal information assistant (PIA) set up for user evaluation for 3 months. The experimental results showed that the proposed exploration method outperformed collaborative filtering, and mutation and crossover methods, by around 25% in terms of the elimination of off-topic results.

10.
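The Internet hosts a large number of "homogeneous" websites and services, and the Mashup components that wrap them often provide identical or similar functionality. To obtain the best service, users have to browse each Mashup component one by one, which costs considerable time and effort. Taking the user's perspective, this paper proposes a dynamic aggregation mechanism for homogeneous Mashup components. Based on metadata search and form matching techniques, the mechanism groups multiple homogeneous Mashup components into a component pool and then dynamically discovers the best service according to the user's current data request, thereby integrating and optimizing service resources and improving overall service quality. Finally, the mechanism is implemented on iMashup, a rich-client component composition and verification framework, and validated experimentally.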

11.

This paper contributes to the efficient visualization and management of 3D content for e-commerce purposes. The main objective of this research is to improve the multimedia management of complex 3D models, such as CAD or BIM models, by simply dragging a CAD/BIM file into a web application. Our developments and tests show that it is possible to convert these models into web compatible formats. The platform we present performs this task requiring no extra intervention from the user. This process makes sharing 3D content on the web immediate and simple, offering users an easy way to create rich accessible multiplatform catalogues. Furthermore, the platform enables users to view and interact with the uploaded models on any WebGL compatible browser favouring collaborative environments. Despite not being the main objective of this work, an interface with search engines has also been designed and tested. It shows that users can easily search for 3D products in a catalogue. The platform stores metadata of the models and uses it to narrow the search queries. Therefore, more precise results are obtained.


12.
When users want to continue an analysis performed in the past, done by themselves or by a collaborator, they need an overview of what has been done and found so far. Such an overview helps them to gain a shared knowledge about each other's analysis strategies and continue the analysis. We aim to support users in this process, and thereby support their exploration awareness. We present an information visualization framework with three linked processes for this purpose: overview, search, and retrieve. First, we present a user's information interest model that captures key aspects of the exploration process. Exploration overview, and keyword and similarity based search mechanisms are designed based on these key aspects. A metadata view is used to visualize the search results and help users to retrieve specific visualizations from past analysis. Finally, we present three case studies and discuss the support offered by the framework for developing exploration awareness.

13.
It is now well-established that the so-called focalization property plays a central role in the design of programming languages based on proof search, and more generally in the proof theory of linear logic. We present here a sequent calculus for non-commutative logic (NL) which enjoys the focalization property. In the multiplicative case, we give a focalized sequentialization theorem, and in the general case, we show that our focalized sequent calculus is equivalent to the original one by studying the permutabilities of rules for NL and showing that all permutabilities of linear logic involved in focalization can be lifted to NL permutabilities. These results are based on a study of the partitions of partially ordered sets modulo entropy.

14.
Recommender systems combine ideas from information retrieval, user modelling, and artificial intelligence to focus on the provision of more intelligent and proactive information services. As such, recommender systems play an important role when it comes to assisting the user during both routine and specialised information retrieval tasks. Like any good assistant it is important that users can trust in the ability of a recommender system to respond with timely and relevant suggestions. In this paper, we will look at a collaborative recommendation system operating in the domain of Web search. We will show how explicit models of trust can help to inform more reliable recommendations that translate into more relevant search results. Moreover, we demonstrate how the availability of this trust-model facilitates important interface enhancements that provide a means to declare the provenance of result recommendations in a way that will allow searchers to evaluate their likely relevance based on the reputation and trustworthiness of the recommendation partners behind these suggestions.

15.
The popularity of Web Search Engines (WSEs) enables them to generate a lot of data in the form of query logs. These files contain all search queries submitted by users. Economic benefits could be earned by selling or releasing those logs to third parties. Nevertheless, these data potentially expose sensitive user information. Removing direct identifiers is not sufficient to preserve the privacy of the users. Some existing privacy-preserving approaches use log batch processing but, as logs are generated and consumed in a real-time environment, a continuous anonymization process would be more convenient. To that end, in this paper we propose: (i) a new method to anonymize query logs, based on k-anonymity; and (ii) some de-anonymization tools to determine possible privacy problems in case an attacker gains access to the anonymized query logs. This approach preserves the original user interests, but spreads possible semi-identifier information over many users, preventing linkage attacks. To assess its performance, all the proposed algorithms are implemented and an extensive set of experiments is conducted using real data.
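
The abstract does not spell out its anonymization method, so the following is only a simple suppression rule in the spirit of k-anonymity, swapped in for illustration: a query is released only if at least k distinct users issued it, and user identifiers are dropped. The function name `k_anonymous_view` and the sample log are assumptions.

```python
from collections import defaultdict

def k_anonymous_view(log, k):
    """Release only queries issued by at least k distinct users, without user ids.

    This is plain suppression, not the method proposed in the paper.
    """
    users_per_query = defaultdict(set)
    for user, query in log:
        users_per_query[query].add(user)
    return sorted(q for q, users in users_per_query.items() if len(users) >= k)

# Made-up query log of (user id, query) pairs.
log = [
    ("u1", "flu symptoms"), ("u2", "flu symptoms"), ("u3", "flu symptoms"),
    ("u1", "john doe 12 elm street"),  # rare query that could identify a user
    ("u2", "cheap flights"), ("u3", "cheap flights"),
]
released = k_anonymous_view(log, k=2)
```

With `k=2` the rare, potentially identifying query is suppressed while the two common queries are released. Unlike the approach the abstract describes, this rule discards rare queries rather than spreading them over many users.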

16.
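This paper studies the use of deep learning techniques to mine content of interest related to users' search topics. User search records, query histories, and documents of interest to the user are treated as the sources of search-topic data, which a deep mining algorithm analyzes in order to mine topics of interest. The mining model is mainly based on the vector space model, representing the user search topic model as a user search topic vector, and a network relating topics to user interests is formed. The user search topic vector is constructed as follows: a set of user query terms is selected and classified by deep mining, and the classified terms are used to build the user search topic feature vector, from which the user's points of interest are analyzed. Because users' interests change over time, search terms that are no longer used and irrelevant noisy search terms are removed and interest degrees are adjusted; the user search topic therefore needs an update-and-learning mechanism, which dynamically tracks trends in user interest. The approach overcomes shortcomings such as data sparsity, category bias, and poor scalability. Experimental results show that the model identifies user search topics with good accuracy.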
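
The vector space representation mentioned above can be sketched as follows. This is a bare term-frequency version with cosine similarity and leaves out the deep-learning classification and the update mechanism described in the abstract; `topic_vector`, `cosine`, and the sample queries are illustrative assumptions.

```python
import math
from collections import Counter

def topic_vector(queries):
    """Build a sparse term-frequency vector from a list of query strings."""
    return Counter(term for q in queries for term in q.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Made-up search history and two candidate documents.
profile = topic_vector(["deep learning tutorial", "deep learning gpu"])
on_topic = topic_vector(["deep learning"])
off_topic = topic_vector(["cooking pasta"])

score_on = cosine(profile, on_topic)
score_off = cosine(profile, off_topic)
```

A document sharing terms with the search history scores higher than an unrelated one, which is the basic signal a topic model of this kind would rank by.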

17.
Novice users often do not have enough domain knowledge to create good queries for searching information online. To help alleviate the situation, exploration techniques have been used to increase the diversity of the search results, so that potentially relevant items are returned in addition to those explicitly asked for. Most existing approaches, such as collaborative filtering, do not allow the level of exploration to be controlled. Consequently, the search results can be very different from what is expected. We propose an exploration strategy that performs intelligent query processing by first searching for usable old queries and then utilising them to adapt the current query, with the hope that the adapted query will be more relevant to the user’s areas of interest. We applied the proposed strategy to the implementation of a personal information assistant (PIA) set up for user evaluation for 3 months. The experimental results showed that the proposed exploration method outperformed collaborative filtering, and mutation and crossover methods, by around 25% in terms of the elimination of off-topic results.

18.
A critical reality in integration is that knowledge obtained from different sources may often be conflicting. Conflict-resolution, whether performed during the design phase or during run-time, can be costly and, if done without a proper understanding of the usage context, can be ineffective. In this paper, we propose a novel exploration and feedback-based approach, FICSR (pronounced “fixer”), to conflict-resolution when integrating metadata from different sources. Rather than relying on purely automated conflict-resolution mechanisms, FICSR brings the domain expert into the conflict-resolution process and informs the integration based on the expert’s feedback. In particular, instead of relying on traditional model based definition of consistency (which, whenever there are conflicts, picks a possible world among many), we introduce a ranked interpretation of the metadata and statements about the metadata. This not only enables FICSR to avoid committing to an interpretation too early, but also helps in achieving a more direct correspondence between the experts’ (subjective) interpretation of the data and the system’s (objective) treatment of the available alternatives. Consequently, the ranked interpretation leads to new opportunities for exploratory feedback for conflict-resolution: within the context of a given statement of interest, (a) a preliminary ranking of candidate matches, representing different resolutions of the conflicts, informs the user about the alternative interpretations of the metadata, while (b) user feedback regarding the preferences among alternatives is exploited to inform the system about the expert’s relevant domain knowledge. The expert’s feedback, then, is used for resolving not only the conflicts among different sources, but also possible mis-alignments due to the initial matching phase. To enable this feedback process, we develop data structures and algorithms for efficient off-line conflict/agreement analysis of the integrated metadata.
We also develop algorithms for efficient on-line query processing, candidate result enumeration, validity analysis, and system feedback. The results are brought together and evaluated in the Feedback-based InConSistency Resolution (FICSR) system. This research has been funded with NSF Grant, AOC: Archaeological Data Integration for the Study of Long-Term Human and Social Dynamics, 2007–2009. This work was done while M. L. Sapino was at ASU on sabbatical.

19.
This paper investigates user interpretation of search result displays on small screen devices. Such devices present interesting design challenges given their limited display capabilities, particularly in relation to screen size. Our aim is to provide users with succinct yet useful representations of search results that allow rapid and accurate decisions to be made about the utility of result documents, yet minimize user actions (such as scrolling), the use of device resources, and the volume of data to be downloaded. Our hypothesis is that keyphrases that are automatically extracted from documents can support this aim. We report on a user study that compared how accurately users categorized result documents on small screens when the document surrogates consisted of either keyphrases only, or document titles. We found no significant performance differences between the two conditions. In addition to these encouraging results, keyphrases have the benefit that they can be extracted and presented when no other document metadata can be identified.

20.
Several large-scale Grid infrastructures are currently in operation around the world, federating an impressive collection of computational resources, a wide variety of application software, and hundreds of user communities. To better serve the current and prospective users of Grid infrastructures, it is important to develop advanced software retrieval services that could help users locate software components suitable to their needs. In this paper, we present the design and implementation of Minersoft, a distributed, multi-threaded harvester for application software located in large-scale Grid infrastructures. Minersoft crawls the sites of a Grid infrastructure, discovers installed software resources, annotates them with keyword-rich metadata, and creates inverted indexes that can be used to support full-text software retrieval. We present insights derived from the implementation and deployment of Minersoft on EGEE, one of the largest Grid production services currently in operation. Experimental results show that Minersoft achieves high performance in crawling EGEE sites and discovering software-related files, and high efficiency in supporting software retrieval.
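
For readers unfamiliar with inverted indexes, the sketch below shows the basic structure: each keyword maps to the set of software files annotated with it, and a multi-term query intersects those sets. It is not Minersoft's code; the file paths, keyword annotations, and function names are hypothetical.

```python
from collections import defaultdict

def build_index(files):
    """Map each lower-cased keyword to the set of file paths annotated with it."""
    index = defaultdict(set)
    for path, keywords in files.items():
        for kw in keywords.lower().split():
            index[kw].add(path)
    return index

def search(index, query):
    """Return the sorted paths whose annotations contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))

# Hypothetical software files and their keyword annotations.
files = {
    "/opt/blast/bin/blastn": "blast sequence alignment",
    "/opt/gromacs/bin/mdrun": "molecular dynamics simulation",
    "/opt/clustal/bin/clustalw": "multiple sequence alignment",
}
index = build_index(files)
matches = search(index, "sequence alignment")
```

Because lookups touch only the posting sets of the query terms, retrieval cost depends on how many files share those keywords rather than on the total number of crawled files.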
