Similar literature
1.
Abstract

This column describes and reviews two 'Net serials news sources, emphasizing how the sources are accessed and how easy that access is, the usefulness and versatility of their respective search engines, and the extent of their current files and archives.

2.
Databases deepen the Web   Cited by: 2 (self-citations: 0, citations by others: 2)
Ghanem T.M., Aref W.G. Computer, 2004, 37(1): 116-117
The Web has become the preferred medium for many database applications, such as e-commerce and digital libraries. These applications store information in huge databases that users access, query, and update through the Web. Database-driven Web sites have their own interfaces and access forms for creating HTML pages on the fly. Web database technologies define the way that these forms can connect to and retrieve data from database servers. The number of database-driven Web sites is increasing exponentially, and each site creates its pages dynamically, which makes them hard for traditional search engines to reach. Such search engines crawl and index static HTML pages; they do not send queries to Web databases. The information hidden inside Web databases is called the "deep Web", in contrast to the "surface Web" that traditional search engines access easily. We expect deep Web search engines and technologies to improve rapidly and to dramatically affect how the Web is used by providing easy access to many more information resources.

3.
Will the World Wide Web and search engines foster access to more diverse sources of information, or have a centralizing influence through a ‘winner‐take‐all’ process? To address this question, we examined how search engines are used to access information about six global issues (climate change, poverty, HIV/AIDS, terrorism, trade reform, and Internet and society). The study used a combination of webmetric analyses and interviews with experts. From interviews we were able to explore how experts on these topics use search engines within their specialist fields. Using webmetric analysis, we were able to compare the results from a number of search engines and show how the top ranked sites are clustered as well as the distribution of their connectivity. Results suggest that the Web tends to reduce the significance of offline hierarchies in accessing information – thereby “democratizing” access to worldwide resources. It also seems, however, that centers of expertise progressively refine their specializations, gaining a ‘winner‐take‐all’ status within a narrower area. Some limitations of the winner‐take‐all thesis for access to research are discussed.

4.
Having indexed much of the "surface" Web, search engines are now using various approaches to index the "deep" Web. At the same time, institutional repositories and digital libraries are adopting the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to expose their holdings. The authors harvested nearly 10 million records from OAI-PMH repositories. From these records, they extracted 3.3 million unique resource URLs and then conducted searches on samples from this collection to determine how much of the OAI-PMH corpus the three major search engines have indexed.
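The URL-extraction step described in this abstract can be illustrated with a minimal sketch. It assumes records arrive as an OAI-PMH `ListRecords` response in the standard `oai_dc` format and that resource URLs appear in `dc:identifier` elements; the function name `extract_resource_urls` and the sample record are hypothetical, and this is not the authors' actual pipeline.

```python
import xml.etree.ElementTree as ET
from typing import Set

# Dublin Core element namespace used by the oai_dc metadata format.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def extract_resource_urls(oai_xml: str) -> Set[str]:
    """Collect unique http(s) URLs found in dc:identifier elements.

    Assumption: resource URLs are carried in dc:identifier; identifiers
    that are not URLs (DOIs, local IDs) are ignored.
    """
    root = ET.fromstring(oai_xml)
    urls = set()
    for ident in root.iter(DC_NS + "identifier"):
        text = (ident.text or "").strip()
        if text.startswith(("http://", "https://")):
            urls.add(text)  # the set removes duplicates across records
    return urls


# Hypothetical two-record response; both records share one URL.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata><oai_dc:dc>
        <dc:identifier>http://example.org/paper/1</dc:identifier>
        <dc:identifier>doi:10.0000/example</dc:identifier>
      </oai_dc:dc></metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata><oai_dc:dc>
        <dc:identifier>http://example.org/paper/1</dc:identifier>
        <dc:identifier>http://example.org/paper/2</dc:identifier>
      </oai_dc:dc></metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    for url in sorted(extract_resource_urls(SAMPLE)):
        print(url)
```

The record-level `identifier` in each `header` lives in the OAI-PMH namespace rather than the Dublin Core one, so it is not mistaken for a resource URL.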

5.
As the popularity and complexity of Internet search engines increase, the design, development and maintenance of large, complex web-based Information Retrieval (WIR) systems become a challenge. The difficulty of designing a WIR system is compounded by information overload arising from many different information sources. From the standpoint of search engine users, a WIR system is more usable when it provides a single search point to multiple databases. To tackle this issue, we present the design and implementation of a cross-search component for the CS-Engine (Cross-Search Engine). The CS-Engine allows the user to search multiple heterogeneous databases with one command. The CS-Engine is also distinguished from meta-search engines in that it does not need to trigger other search engines or translate a query for them. Our performance benchmark tests show that the CS-Engine is scalable and usable. We also compare the CS-Engine with other search engines such as Google and AltaVista. The CS-Engine was developed with UML and design patterns including: (1) use case diagram, (2) class diagram, (3) package diagram, (4) interaction diagram, (5) Factory pattern, and (6) Strategy pattern. We conclude our paper with technical lessons learned as well as organizational issues encountered during the development phase.

6.
7.
Recently, search engines have enabled us to access immense quantities of useful information in an instant. In this paper, we propose a procedure for analyzing social relationships and structures using Web search engines, which includes novel ways to create a search query and to use the number of hits. This allows us to construct various networks that reflect directed and undirected relationships among actors under arbitrary contexts. As a case study for evaluating the proposed procedure, we focus on 50 companies belonging to the automotive industry in Japan. We constructed several directed and undirected networks under different temporal and geographical contexts. It is shown that we can acquire a general understanding of the industrial community through the analyses of these created networks and their centrality measures.

8.
Users’ click-through data is a valuable source of information about the performance of Web search engines, but it is included in few datasets for learning to rank. In this paper, inspired by the click-through data model, a novel approach is proposed for extracting the implicit user feedback from evidence embedded in benchmarking datasets. This process outputs a set of new features, named click-through features. Generated click-through features are used in a layered multi-population genetic programming framework to find the best possible ranking functions. The layered multi-population genetic programming framework is fast and provides more extensive search capability compared to the traditional genetic programming approaches. The performance of the proposed ranking generation framework is investigated both in the presence and in the absence of explicit click-through data in the utilized benchmark datasets. The experimental results show that click-through features can be efficiently extracted in both cases but that more effective ranking functions result when click-through features are generated from benchmark datasets with explicit click-through data. In either case, the most noticeable ranking improvements are achieved at the tops of the provided ranked lists of results, which are highly targeted by the Web users.

9.
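Open access (OA) journals are deep-Web resources scattered across the Internet. Traditional search engines cannot index them and therefore cannot meet users' needs for obtaining OA journal resources, which leaves open resources wasted. To address the problem of centrally collecting OA journal resources dispersed across the World Wide Web, this paper proposes a distributed focused-crawler architecture for OA journals. The architecture uses a master-slave distributed design and proposes a method, based on user-predefined rules, for extracting scholarly information from OA journal pages. One master control node controls multiple crawling nodes that can be added or removed dynamically, and a plugin mechanism based on the Chrome browser provides extensibility and flexible deployment of the distributed crawling nodes.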

10.
Blogs are increasingly accepted as a useful means of disseminating a variety of information on the web. As the popularity of blogs grows rapidly, a number of blog search engines have appeared recently to help users access and discover blog posts efficiently. Nevertheless, existing approaches tend to focus on ranking the blog posts according to their recency or popularity only, leaving the problem of retrieving posts that are more topically relevant to a user’s query largely unexplored. In this paper, we present a novel blog ranking framework, called PTRank, that improves search quality by taking account of relevance feedback from users as well as various information available from RSS feeds. A neural network method is employed to learn ranking functions that provide a relevance score between a keyword and a blog post. Extensive experiments on real blog data have been conducted to validate the proposed ranking framework for blog post search, and the results indicate that PTRank performs significantly better than the existing popular approach.

11.
The current proliferation of on-line information resources underscores the need to index collections of information and to search and retrieve them conveniently. This study develops criteria for analytically comparing index and search engines and presents results for a number of freely available search engines. A product of this research is a tool-kit capable of automatically indexing, searching, and extracting performance statistics from each of the search engines studied. This tool-kit is highly configurable and can run the same benchmark tests against other engines as well. Results demonstrate that the tested search engines can be grouped into two levels. Level one engines are efficient on small- to medium-sized data collections, but show weaknesses when used for collections of 100MB or larger. Level two search engines are recommended for data collections up to and beyond 100MB.

12.
Knowledge, 2007, 20(4): 321-328
The information that users need is distributed across the databases of various search engines. It is inconvenient and inefficient for an ordinary user to invoke multiple search engines and identify useful documents from the returned results. Meta-search engines can provide unified access for their users. In this paper, a novel meta-search engine, named WebFusion, is introduced. WebFusion learns the expertness of the underlying search engines in a certain category based on the users’ preferences. It also uses the “click-through data concept” to give a content-oriented ranking score to each result page. Click-through data is implicit feedback on the users’ preferences; it is also used as a reinforcement signal in the learning process, to predict the users’ preferences and reduce the time spent searching the returned results list. The decision lists of the underlying search engines are fused using the ordered weighted averaging (OWA) approach, and the application of the optimistic operator as the weighting function is investigated. Moreover, the results of this approach are compared with those achieved by popular meta-search engines such as ProFusion and MetaCrawler. Experimental results demonstrate a significant improvement in average click rate, in the variance of clicks, and in the average relevancy criterion.
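The OWA fusion step mentioned in this abstract can be sketched as follows. The sketch assumes the "optimistic operator" refers to the exponential optimistic weights commonly attributed to Yager, w_1 = alpha, w_k = alpha(1 - alpha)^(k-1), w_n = (1 - alpha)^(n-1); the abstract does not state the exact formula, so that choice, the function names, and the sample scores are all assumptions for illustration.

```python
from typing import List, Sequence


def optimistic_weights(n: int, alpha: float) -> List[float]:
    """Exponential 'optimistic' OWA weights (assumed form, sums to 1).

    Larger alpha puts more weight on the highest-ranked score.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    weights = [alpha * (1 - alpha) ** k for k in range(n - 1)]
    weights.append((1 - alpha) ** (n - 1))  # last weight absorbs the remainder
    return weights


def owa(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: weights apply to scores sorted descending."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


if __name__ == "__main__":
    # Hypothetical relevance scores that three engines gave one result page.
    engine_scores = [0.2, 0.9, 0.5]
    w = optimistic_weights(len(engine_scores), alpha=0.5)
    print(w, owa(engine_scores, w))
```

Because the weights attach to rank positions rather than to particular engines, a page that any one engine scores highly is pulled upward when alpha is large, which is what makes the operator "optimistic".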

13.
Since the advent of the World Wide Web, online media archives have changed their audience from a restricted number of professionals and amateurs to the general public. This shift is not without consequences: if, on the one side, it represents an important opportunity for archives to engage in a dialogue with a larger audience, on the other side, it advocates novel forms of access that go beyond the highly specialized models underlying traditional access tools. In this paper, we propose to use 3-D graphics for designing novel tools of exploratory search in cultural heritage archives. Our approach has been deployed as an online virtual environment where the user can navigate the meaning relations over the items in the archive. Targeted at cultural heritage, the application, called Labyrinth 3D, relies on the use of cultural archetypes to inform the conceptualization of the archive and the creation of the virtual environment, with the goal of engaging the user in the exploration of the archive through the creation of personal paths.

14.
Easy access to large information collections is of great importance in many aspects of everyday life. However, limitations in information and communication technologies have so far prevented the average person from taking much advantage of existing resources. Historical documentaries held by national archives constitute some of the most precious yet least accessible cultural information. The ECHO project has facilitated accessibility to this type of precious information by developing a digital library (DL) service for historical films belonging to large national audiovisual archives.

15.
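Cheating in search engine optimization and its prevention   Cited by: 1 (self-citations: 0, citations by others: 1)
With the rapid development of information technology, people increasingly obtain information through search engines. The fast-growing volume of information on the Internet provides abundant resources, but it has also given rise to a large amount of search engine cheating. This paper first analyzes the cheating techniques used in search engine optimization and then proposes several methods for preventing them.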

16.
This article analyzes what presently constitutes one of the most problematic aspects of search engines' and Internet archives' practices: the protection of personal data and copyrighted material. These issues are often sidelined in favour of the quickest possible development of ever more fashionable e-services. Yet the lack of consideration for these issues may hinder the development of various e-services as all kinds of legal claims and suits may be brought against service providers, particularly in the European Union, which is fond of drafting complex legislation in the field. This paper shows how typical search engines and Internet archives work and the legal issues to which their practices give rise. The examples used are Google and Internet Archive, and their policies. It is argued that many of the legal checks and balances they offer with respect to the protection of personal data and copyrighted material leave a great deal to be desired. The article concludes by providing a short list of possible remedies that could, at least provisionally, alleviate some of the most urgent problems.

17.
In recent years, the historical data generated during the search process of evolutionary algorithms has received increasing attention from researchers, and some hybrid evolutionary algorithms with machine learning have been proposed. However, the majority of the literature is centered on continuous problems with a single optimization objective. Many issues remain open for multi-objective combinatorial optimization problems. Therefore, this paper proposes a machine-learning based multi-objective memetic algorithm (ML-MOMA) for the discrete permutation flowshop scheduling problem. The proposed ML-MOMA has two main features. First, each solution is assigned an individual archive that stores the non-dominated solutions it has found, and a new population update method is presented based on these individual archives. Second, an adaptive multi-objective local search is developed, in which the analysis of historical data accumulated during the search process is used to adaptively determine which non-dominated solutions should be selected for local search and how the local search should be applied. Computational results on benchmark problems show that the cooperation of these two features helps to achieve a balance between evolutionary global search and local search. In addition, many of the best known Pareto fronts for these benchmark problems in the literature can be improved by the proposed ML-MOMA.
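The notion of an archive of non-dominated solutions used in this abstract can be illustrated with a minimal sketch. It assumes every objective is minimized and represents a solution only by its tuple of objective values; the helper names `dominates` and `update_archive` and the sample values are hypothetical, and the sketch shows only the generic Pareto-archive update, not the ML-MOMA population update itself.

```python
from typing import List, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere.

    Assumption: all objectives are minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def update_archive(archive: List[Objectives], candidate: Objectives) -> List[Objectives]:
    """Return the archive after offering it one candidate solution."""
    # Reject the candidate if an archived member dominates or duplicates it.
    if any(dominates(m, candidate) or m == candidate for m in archive):
        return archive
    # Otherwise drop every member the candidate dominates, then add it.
    return [m for m in archive if not dominates(candidate, m)] + [candidate]


if __name__ == "__main__":
    archive: List[Objectives] = []
    # Hypothetical two-objective values for four schedules.
    for point in [(3, 4), (2, 5), (2, 3), (4, 1)]:
        archive = update_archive(archive, point)
    print(archive)
```

Keeping one such archive per solution, as the abstract describes, would mean calling the update with that solution's own archive each time its local search produces a new candidate.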

18.
The Internet is estimated to grow significantly as access to Web content in some non-English languages continues to increase. However, prior research in human–computer interaction (HCI) has implicitly assumed the primary language used on the Web to be English. This assumption is not true for many non-English-speaking regions where rapidly growing on-line populations access the Web in their native languages. For example, Latin America, where the majority of people speak Spanish, will have the fastest growing population in coming decades. However, existing Spanish search engines lack search, browse, and analysis capabilities. The research reported here studied human information seeking on the non-English Web. In it we developed a Spanish business Web portal that supports searching, browsing, summarization, categorization, and visualization of Spanish business Web pages. Using 42 Spanish speakers as subjects we conducted a two-phase experiment to evaluate this portal and found that, compared with a Spanish search engine and a Spanish Web directory, it achieved significantly better user ratings on information quality, cross-regional search capability, system performance attributes, and overall satisfaction. Subjects’ verbal comments strongly favored the search and browse functionality and user interface of our portal. As the Web becomes more international, this research makes three contributions: (1) an empirical evaluation of the performance level of a Spanish search portal; (2) an examination of the information quality, cross-regional search capability and usability of search engines for the non-English Web; and (3) a better understanding of non-English Web searching.

19.
The enormous growth in information technology has revolutionized the way people access information sources. Web search engines have played an important role in delivering what the user wants precisely and efficiently from the vast web database. Unlike conventional search, searching the structure of the web, where an answer comprises more than a single page connected by hyperlinks, still needs effective methods. We propose Linear Programming (LP) models to generate optimal structured web objects when searching for relevant web graphs. In the model, web objects carry node and edge weights that represent the ranking measures for Webpages and hyperlinks, and these weights are used to rank relevance in terms of keyword vectors. We also developed a tree-filtering algorithm and a top-k Steiner tree algorithm that are used to provide search recommendations in practical applications. On real web databases, the experimental study shows that the LP approach outperforms conventional search engines with respect to execution time and quality of results.

20.