Similar documents
Found 20 similar documents; search took 31 ms
1.
Spink A., Jansen B.J., Wolfram D., Saracevic T. 《Computer》 2002, 35(3): 107-109
The Web has become a worldwide source of information and a mainstream business tool. Are human information needs and searching behaviors evolving along with Web content? As part of a body of research studying this question, we have analyzed three data sets culled from more than one million queries submitted by more than 200,000 users of the Excite Web search engine, collected in September 1997, December 1999, and May 2001. This longitudinal benchmark study shows that public Web searching is evolving in certain directions. Specifically, search topics have shifted from entertainment and sex to commerce and people, but there is little change in query lengths or frequency per user.

2.
We present an effective technique for automatic extraction, representation, and classification of digital video, and a visual language for formulation of queries to access the semantic information contained in digital video. We have devised an algorithm that extracts motion information from a video sequence. This algorithm provides a low-cost extension to the motion compensation component of the MPEG compression algorithm. In this paper, we present a visual language called VEVA for querying multimedia information in general, and video semantic information in particular. Unlike many other proposals that concentrate on browsing the data, VEVA offers a complete set of capabilities for specifying relationships between the image components and formulating queries that search for objects, their motions and their other associated characteristics. VEVA has been shown to be very expressive in this context mainly due to the fact that many types of multimedia information are inherently visual in nature.

3.
Web search is increasingly entity-centric; as a large fraction of common queries target specific entities, search results get progressively augmented with semi-structured and multimedia information about those entities. However, search over personal web browsing history still mostly revolves around keyword search. In this paper, we present a novel approach to answer queries over web browsing logs that takes into account entities appearing in the web pages, user activities, as well as temporal information. Our system, B-hist, aims at providing web users with an effective tool for searching and accessing information they previously looked up on the web by supporting multiple ways of filtering results using clustering and entity-centric search. In the following, we present our system and motivate our User Interface (UI) design choices by detailing the results of a survey on web browsing and history search. In addition, we present an empirical evaluation of our entity-based approach used to cluster web pages.

4.
Databases deepen the Web   Cited by: 2 (self-citations: 0, citations by others: 2)
Ghanem T.M., Aref W.G. 《Computer》 2004, 37(1): 116-117
The Web has become the preferred medium for many database applications, such as e-commerce and digital libraries. These applications store information in huge databases that users access, query, and update through the Web. Database-driven Web sites have their own interfaces and access forms for creating HTML pages on the fly. Web database technologies define the way that these forms can connect to and retrieve data from database servers. The number of database-driven Web sites is increasing exponentially, and each site is creating pages dynamically; these pages are hard for traditional search engines to reach. Such search engines crawl and index static HTML pages; they do not send queries to Web databases. The information hidden inside Web databases is called the "deep Web" in contrast to the "surface Web" that traditional search engines access easily. We expect deep Web search engines and technologies to improve rapidly and to dramatically affect how the Web is used by providing easy access to many more information resources.

5.
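Gao Ming, Huang Zhexue 《集成技术》 (Journal of Integration Technology) 2012, 1(3): 47-54
As the number and scale of Deep Web sources grow rapidly, issuing queries against them to obtain relevant information stored in their back-end databases is increasingly becoming the main way users obtain information. To help users make effective use of the information in the Deep Web, more and more researchers are working in this area, and one focus is the integration of data from Deep Web back-end databases. Because these back-end databases mainly store textual information, research on querying and retrieving Deep Web content from a text-processing perspective has very broad application prospects. This paper analyzes the current state of Deep Web research in some detail and discusses directions for its future development.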

6.
《Computer Networks》 1999, 31(11-16): 1467-1479
When using traditional search engines, users have to formulate queries to describe their information need. This paper discusses a different approach to Web searching where the input to the search process is not a set of query terms, but instead is the URL of a page, and the output is a set of related Web pages. A related Web page is one that addresses the same topic as the original page. For example, www.washingtonpost.com is a page related to www.nytimes.com, since both are online newspapers. We describe two algorithms to identify related Web pages. These algorithms use only the connectivity information in the Web (i.e., the links between pages) and not the content of pages or usage information. We have implemented both algorithms and measured their runtime performance. To evaluate the effectiveness of our algorithms, we performed a user study comparing our algorithms with Netscape's `What's Related' service (http://home.netscape.com/escapes/related/). Our study showed that the precision at 10 for our two algorithms is 73% better and 51% better than that of Netscape, despite the fact that Netscape uses both content and usage pattern information in addition to connectivity information.
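
As an illustration of the "precision at 10" measure cited in the abstract above, here is a minimal sketch. The function name, the sample URLs, and the relevance judgments are hypothetical and are not taken from the paper; the sketch only shows how the metric is commonly computed, dividing by k even when fewer than k results are returned.

```python
def precision_at_k(ranked_pages, relevant_pages, k=10):
    """Fraction of the top-k ranked pages that were judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_pages[:k]
    hits = sum(1 for page in top_k if page in relevant_pages)
    return hits / k


# Hypothetical ranking returned for one input URL, and hypothetical judgments.
ranked = ["a.com", "b.com", "c.com", "d.com", "e.com",
          "f.com", "g.com", "h.com", "i.com", "j.com"]
relevant = {"a.com", "c.com", "d.com", "h.com"}

score = precision_at_k(ranked, relevant)
print(score)
```

Averaging this value over all test queries gives a single figure per system, which is the kind of number two systems can then be compared on.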

7.
The World Wide Web, with its paradigms of surfing and searching for information, has become the predominant system for computer-based information retrieval. Media resources, however information-rich, only play a minor role in providing information to Web users. While bandwidth (or the lack thereof) may be an excuse for this situation, the lack of surfing and searching capabilities on media resources is the real issue. We present an architecture that extends the Web to media, enabling existing Web infrastructures to provide seamless search and hyperlink capabilities for time-continuous Web resources, with only minor extensions. This makes the Web a true distributed information system for multimedia data. The article provides an overview of the specifications that have been developed and submitted to the IETF for standardization. It also presents experimental results with prototype applications.

8.
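Web search engines are very useful information retrieval tools on the Internet. However, current search engines return an enormous volume of results. How to retrieve the information a user needs quickly and precisely from this sea of information has become an important research topic. This paper proposes a specialized course information search system based on meta-search engine theory, focusing on techniques for extracting information from result pages and on the vector space model algorithm.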
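
The vector space model named in this abstract can be sketched as follows. This is a generic textbook version using raw term counts and cosine similarity, not the paper's own implementation; the function names and the sample course titles are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Represent a text as a bag of lowercase terms with raw counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, document) pairs sorted by descending similarity."""
    q = tf_vector(query)
    scored = [(cosine_similarity(q, tf_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical result snippets merged from several search engines.
docs = ["database systems course outline",
        "cooking recipes",
        "operating systems course"]
ranking = rank("database systems course", docs)
print(ranking[0][1])
```

A production system would normally weight terms (for example with TF-IDF) before computing the cosine; raw counts are used here only to keep the sketch short.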

9.
Encryption ensures confidentiality of the data outsourced to cloud storage services. Searching the encrypted data enables subscribers of a cloud storage service to access only relevant data, by defining trapdoors or evaluating search queries on locally stored indexes. However, these approaches do not consider access privileges while executing search queries. Furthermore, these approaches restrict the searching capability of a subscriber to a limited number of trapdoors defined during data encryption. To address the issue of privacy-aware data search, we propose Oblivious Term Matching (OTM). Unlike existing systems, OTM enables authorized subscribers to define their own search queries comprising an arbitrary number of selection criteria. OTM ensures that the cloud service provider obliviously evaluates encrypted search queries without learning any information about the outsourced data. Our performance analysis has demonstrated that search queries comprising 2 to 14 distinct search criteria cost only $0.03 to $1.09 per 1000 requests.

10.
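Tang Dibin, Wang Jinlin, Ni Hong 《计算机应用》 (Journal of Computer Applications) 2008, 28(8): 1991-1993
CDNs usually accelerate dynamic Web applications through data caching or replication. For websites such as forum and blog service providers, which give registered users a platform for publishing personal information, this paper proposes a user-based data partitioning method: data is partitioned by the registered user it belongs to and placed in the database system nearest to that user. Spreading the database UID operations across multiple database systems removes the I/O bottleneck of a single database system.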

11.
The impetus behind Semantic Web research remains the vision of supplementing availability with utility; that is, the World Wide Web provides availability of digital media, but the Semantic Web will allow presently available digital media to be used in unseen ways. An example of such an application is multimedia retrieval. At present, there are vast amounts of digital media available on the web. Once this media gets associated with machine-understandable metadata, the web can serve as a potentially unlimited supplier for multimedia web services, which could populate themselves by searching for keywords and subsequently retrieving images or articles, which is precisely the type of system that is proposed in this paper. Such a system requires solid interoperability, a central ontology, semantic agent search capabilities, and standards. Specifically, this paper explores this cross-section of image annotation and Semantic Web services, models the web service components that constitute such a system, discusses the sequential, cooperative execution of these Semantic Web services, and introduces intelligent storage of image semantics as part of a semantic link space.

12.
We investigate the possibility of using Semantic Web data to improve hypertext Web search. In particular, we use relevance feedback to create a ‘virtuous cycle’ between data gathered from the Semantic Web of Linked Data and web-pages gathered from the hypertext Web. Previous approaches have generally considered searching over the Semantic Web and the hypertext Web to be entirely disparate, indexing and searching over different domains. While relevance feedback has traditionally improved information retrieval performance, relevance feedback is normally used to improve rankings over a single data-set. Our novel approach is to use relevance feedback from hypertext Web results to improve Semantic Web search, and results from the Semantic Web to improve the retrieval of hypertext Web data. In both cases, an evaluation is performed based on certain kinds of informational queries (abstract concepts, people, and places) selected from a real-life query log and checked by human judges. We evaluate our work over a wide range of algorithms and options, and show it improves baseline performance on these queries for deployed systems as well, such as the Semantic Web Search engine FALCON-S and Yahoo! Web search. We further show that the use of Semantic Web inference seems to hurt performance, while the pseudo-relevance feedback increases performance in both cases, although not as much as actual relevance feedback. Lastly, our evaluation is the first rigorous ‘Cranfield’ evaluation of Semantic Web search.

13.
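With the rapid development and growing adoption of Web service applications, how to find the Web services a user needs quickly and accurately has become one of the key problems constraining the development of Web services. Current Web service search techniques fall into four approaches: searching UDDI registries, going through Web service websites, using dedicated search engines, and using general-purpose search engines. This paper reviews the main existing Web service search techniques in detail. Based on an analysis and comparison of typical techniques, it points out the need for a dedicated Web service search engine and the problems and challenges such an engine faces.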

14.
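A survey of Web service search techniques   Cited by: 1 (self-citations: 0, citations by others: 1)
With the rapid development and growing adoption of Web service applications, how to find the Web services a user needs quickly and accurately has become one of the key problems constraining the development of Web services. Current Web service search techniques fall into four approaches: searching UDDI registries, going through Web service websites, using dedicated search engines, and using general-purpose search engines. This paper reviews the main existing Web service search techniques in detail. Based on an analysis and comparison of typical techniques, it points out the need for a dedicated Web service search engine and the problems and challenges such an engine faces.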

15.
The World-Wide Web can be viewed as a collection of semi-structured multimedia documents in the form of Web pages connected through hyperlinks. Unlike most web search engines, which primarily focus on information retrieval functionality, WebDB aims at supporting a comprehensive database-like query functionality, including selection, aggregation, sorting, summary, grouping, and projection. WebDB allows users to access (1) document level information, such as title, URL, length, keywords types and last modified date; (2) intra-document structures, such as tables, forms and images and (3) inter-document linkage information, such as destination URLs and anchors. With these three types of information, comprehensive queries for complex Web-based applications, such as Web mining and Web site management, can be answered. WebDB is based on object-relational concepts: Object-oriented modeling and relational query language. In this paper, we present the data model, language and implementation of WebDB. We also present the novel visual query/browsing interface for semi-structured Web and Web documents. Our system provides high usability compared with other existing systems.

16.
Time plays an important role in Web search, because most Web pages contain temporal information and many Web queries are time-related. How to integrate temporal information in Web search engines has been a research focus in recent years. However, traditional search engines have little support for processing temporal-textual Web queries. To address this problem, in this paper we concentrate on the extraction of the focused time for Web pages, which refers to the most appropriate time associated with Web pages, and then use the focused time to improve the search efficiency for time-sensitive queries. In particular, three critical issues are studied in depth. The first issue is to extract implicit temporal expressions from Web pages. The second is to determine the focused time among all the extracted temporal information, and the last is to integrate focused time into a search engine. For the first issue, we propose a new dynamic approach to resolve the implicit temporal expressions in Web pages. For the second issue, we present a score model to determine the focused time for Web pages. Our score model takes into account both the frequency of temporal information in Web pages and the containment relationship among temporal information. For the third issue, we combine the textual similarity and the temporal similarity between queries and documents in the ranking process. To evaluate the effectiveness and efficiency of the proposed approaches, we build a prototype system called Time-Aware Search Engine (TASE). TASE is able to extract both the explicit and implicit temporal expressions for Web pages, calculate the relevance score between Web pages and each temporal expression, and re-rank search results based on the temporal-textual relevance between Web pages and queries. Finally, we conduct experiments on real data sets. The results show that our approach has high accuracy in resolving implicit temporal expressions and extracting focused time, and has better ranking effectiveness for time-sensitive Web queries than its competitor algorithms.
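
The abstract above says that textual and temporal similarity are combined during ranking but does not give the formula. One common way to do this is a linear interpolation, sketched below purely as an assumption; the weight `alpha`, the function names, and the sample scores are hypothetical and are not TASE's actual model.

```python
def combined_score(text_sim, time_sim, alpha=0.5):
    """Linear interpolation of textual and temporal similarity (assumed form)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * text_sim + (1.0 - alpha) * time_sim


def rerank(results, alpha=0.5):
    """Sort (page, text_sim, time_sim) tuples by descending combined score."""
    return sorted(results,
                  key=lambda r: combined_score(r[1], r[2], alpha),
                  reverse=True)


# Hypothetical similarity values for two pages and one time-sensitive query.
results = [("page_a", 0.9, 0.1), ("page_b", 0.6, 0.8)]
balanced = [page for page, _, _ in rerank(results, alpha=0.5)]
text_only = [page for page, _, _ in rerank(results, alpha=1.0)]
print(balanced, text_only)
```

With `alpha=1.0` the ranking falls back to text-only ordering, which makes the effect of the temporal component easy to isolate in an experiment.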

17.
Marchionini G., Haas S.W., Zhang J., Elsas J. 《Computer》 2005, 38(12): 52-61
As government agencies provide increasing amounts of information through their Web sites, more people are attempting to make sense of it. The result is a significant volume of e-mail queries - many of which boil down to "Where can I find X?" or "What does X mean exactly?" Such queries underline a major stumbling block to widespread digital access: how best to provide highly codified, statistical data to a large, diverse population with varying levels of numerical literacy. The responsibility for this challenge falls to government statistical services, which must somehow package staggering amounts of data on everything from the gross national product to basic animal care in a way that a diversity of potential data users find palatable. The GovStat project aims to make the vast resources of government statistical data more broadly accessible to both agencies and the general population. Some first steps are creating layers of online help to address different browsing needs and developing prototype interfaces for exploring data.

18.
OBJECTIVE: The purpose of this study was to investigate the relationship between strategy use and search success on the World Wide Web (i.e., the Web) for experienced Web users. An additional goal was to extend understanding of how the age of the searcher may influence strategy use. BACKGROUND: Current investigations of information search and retrieval on the Web have provided an incomplete picture of Web strategy use because participants have not been given the opportunity to demonstrate their knowledge of Web strategies while also searching for information on the Web. METHODS: Using both behavioral and knowledge-engineering methods, we investigated searching behavior and system knowledge for 16 younger adults (M = 20.88 years of age) and 16 older adults (M = 67.88 years). RESULTS: Older adults were less successful than younger adults in finding correct answers to the search tasks. Knowledge engineering revealed that the age-related effect resulted from ineffective search strategies and amount of Web experience rather than age per se. Our analysis led to the development of a decision-action diagram representing search behavior for both age groups. CONCLUSION: Older adults had more difficulty than younger adults when searching for information on the Web. However, this difficulty was related to the selection of inefficient search strategies, which may have been attributable to a lack of knowledge about available Web search strategies. APPLICATION: Actual or potential applications of this research include training Web users to search more effectively and suggestions to improve the design of search engines.

19.
Keyword-based Web search is a widely used approach for locating information on the Web. However, Web users often have difficulty organizing and formulating appropriate input queries because they lack sufficient domain knowledge, which greatly affects the search performance. An effective tool to meet the information needs of a search engine user is to suggest Web queries that are topically related to their initial inquiry. Accurately computing query-to-query similarity scores is key to improving the quality of these suggestions. Because of the short lengths of queries, traditional pseudo-relevance or implicit-relevance based approaches expand the expression of the queries for the similarity computation. They explicitly use a search engine as a complementary source and directly extract additional features (such as terms or URLs) from the top-listed or clicked search results. In this paper, we propose a novel approach by utilizing the hidden topic as an expandable feature. This has two steps. In the offline model-learning step, a hidden topic model is trained, and for each candidate query, its posterior distribution over the hidden topic space is determined to re-express the query instead of the lexical expression. In the online query suggestion step, after inferring the topic distribution for an input query in a similar way, we calculate the similarity between candidate queries and the input query in terms of their corresponding topic distributions, and produce a suggestion list of candidate queries based on the similarity scores. Our experimental results on two real data sets show that the hidden topic based suggestion is much more efficient than the traditional term or URL based approach, and is effective in finding topically related queries for suggestion.

20.
As databases increasingly integrate non-textual multimedia information, it is becoming necessary to support efficient similarity searching in addition to range searching. Range and nearest-neighbor (similarity) queries are the most important class of queries for multimedia and multi-dimensional databases. Due to the large sizes of the datasets involved, I/O is a critical factor limiting performance. The use of parallel I/O through declustering of the data is a promising approach to improve performance. Consequently several research efforts have addressed the problem of declustering multidimensional data for optimizing range and partial match queries. Very limited work has been done for similarity queries, and the problem of declustering for combined range and similarity queries has not been addressed in the literature. Consider a dataset of images where the following metadata for each image is also stored: date on which the picture was taken, longitude and latitude of the site of the picture. An example of a combined query is: Given a target image, find the 5 most similar images taken within 3 months of the target image and located within 2 degrees of longitude and latitude of the target image. In order to answer this query, it is necessary to conduct a range search on the date, longitude and latitude values and a similarity search on the image content. In this paper, we develop new declustering schemes that provide good declustering for similarity searching. In addition, we show that the new schemes have very good performance for range queries as well as combination queries. The new schemes are based upon the Cyclic declustering schemes which were developed for range and partial match queries. The Cyclic schemes not only provide superior performance to earlier schemes, but are also very robust and consistent with respect to query types and variations in system parameters.
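
The combined query used as an example in this abstract (the 5 most similar images within 3 months and 2 degrees of the target) can be sketched in memory as a range filter followed by a nearest-neighbor ranking. This only illustrates the query semantics and says nothing about declustering across disks. The record layout, the Euclidean distance on a feature tuple, and the reading of "3 months" as 90 days are all assumptions.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class ImageRecord:
    name: str
    taken: date
    lon: float
    lat: float
    features: tuple  # hypothetical content descriptor


def combined_query(target, records, k=5, max_days=90, max_deg=2.0):
    """Range search on date/longitude/latitude, then k-NN on image features."""
    in_range = [
        r for r in records
        if r is not target
        and abs((r.taken - target.taken).days) <= max_days
        and abs(r.lon - target.lon) <= max_deg
        and abs(r.lat - target.lat) <= max_deg
    ]
    in_range.sort(key=lambda r: math.dist(r.features, target.features))
    return in_range[:k]


# Hypothetical sample data.
target = ImageRecord("target", date(2000, 6, 1), 10.0, 50.0, (0.0, 0.0))
sample = [
    target,
    ImageRecord("near", date(2000, 6, 15), 10.5, 50.5, (0.1, 0.0)),
    ImageRecord("far_feat", date(2000, 5, 1), 11.0, 49.0, (5.0, 5.0)),
    ImageRecord("too_old", date(1999, 1, 1), 10.0, 50.0, (0.0, 0.0)),
    ImageRecord("too_far", date(2000, 6, 2), 20.0, 50.0, (0.0, 0.0)),
]
matches = [r.name for r in combined_query(target, sample)]
print(matches)
```

Filtering first and ranking second keeps the similarity computation restricted to records that satisfy the range predicate, which is the order the abstract's example query implies.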
