首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 78 毫秒
1.
BusSEngine: a business search engine   总被引:1,自引:1,他引:0  
With the emergence of World Wide Web, business’ databases are increasingly being queried directly by customers. The customers may not be aware of the underlying data and its structure, and might have never learned a query language that enables them to issue structured queries. Some of the business’ employees who query the databases may also not be aware of the structure of the data, but they are likely to be aware of some labels of elements containing data. We propose in this article: (1) an XML Keyword-Based search engine for answering business’ customers called BusSEngine-K, and (2) an XML loosely Structured-Based search engine for answering business’ employees called BusSEngine-L. The two engines employ novel context-driven search techniques and are built on top of XQuery search engine. The two engines were evaluated experimentally and compared with three recently proposed XML search engines. The results showed marked improvement.  相似文献   

2.
Keyword‐based search engines such as Google? index Web pages for human consumption. Sophisticated as such engines have become, surveys indicate almost 25% of Web searchers are unable to find useful results in the first set of URLs returned (Technology Review, March 2004). The lack of machine‐interpretable information on the Web limits software agents from matching human searches to desirable results. Tim Berners‐Lee, inventor of the Web, has architected the Semantic Web in which machine‐interpretable information provides an automated means to traversing the Web. A necessary cornerstone application is the search engine capable of bringing the Semantic Web together into a searchable landscape. We implemented a Semantic Web Search Engine (SWSE) that performs semantic search, providing predictable and accurate results to queries. To compare keyword search to semantic search, we constructed the Google CruciVerbalist (GCV), which solves crossword puzzles by reformulating clues into Google queries processed via the Google API. Candidate answers are extracted from query results. Integrating GCV with SWSE, we quantitatively show how semantic search improves upon keyword search. Mimicking the human brain's ability to create and traverse relationships between facts, our techniques enable Web applications to ‘think’ using semantic reasoning, opening the door to intelligent search applications that utilize the Semantic Web. Copyright © 2007 John Wiley & Sons, Ltd.  相似文献   

3.
传统搜索引擎是基于关键字的检索,然而文档的关键字未必和文档有关,而相关的文档也未必显式地包含此关键字。基于语义Web的搜索引擎利用本体技术,可以很好地对关键字进行语义描述。当收到用户提交的搜索请求时,先在已经建立好的本体库的基础上对该请求进行概念推理,然后将推理结果提交给传统的搜索引擎,最终将搜索结果返回给用户。相对于传统的搜索引擎,基于语义Web的搜索引擎有效地提高了搜索的查全率和查准率。  相似文献   

4.
A common task of Web users is querying structured information from Web pages. For realizing this interesting scenario we propose a novel query processor for systematically discovering instances of semantic relations in Web search results and joining these relation instances into complex result tuples with conjunctive queries. Our query processor transforms a structured user query into keyword queries that are submitted to a search engine, forwards search results to a relation extractor, and then combines relations into complex result tuples. The processor automatically learns discriminative and effective keywords for different types of semantic relations. Thereby, our query processor leverages the index of a search engine to query potentially billions of pages. Unfortunately, relation extractors may fail to return a relation for a result tuple. Moreover, user defined data sources may not return at least k complete result tuples. Therefore we propose an adaptive routing model based on information theory for retrieving missing attributes of incomplete result tuples. The model determines the most promising next incomplete tuple and attribute type for returning any-k complete result tuples at any point during the query execution process. We report a thorough experimental evaluation over multiple relation extractors. Our query processor returns complete result tuples while processing only very few Web pages.  相似文献   

5.
网络上的专业搜索引擎数量众多,普通用户在选择时往往无所适从。文章提出了一个自动的查询导向系统,可以将用户查询自动导向到合适的专业搜索引擎,解决了这个矛盾。  相似文献   

6.
Domain-specific Web search with keyword spices   总被引:4,自引:0,他引:4  
Domain-specific Web search engines are effective tools for reducing the difficulty experienced when acquiring information from the Web. Existing methods for building domain-specific Web search engines require human expertise or specific facilities. However, we can build a domain-specific search engine simply by adding domain-specific keywords, called "keyword spices," to the user's input query and forwarding it to a general-purpose Web search engine. Keyword spices can be effectively discovered from Web documents using machine learning technologies. The paper describes domain-specific Web search engines that use keyword spices for locating recipes, restaurants, and used cars.  相似文献   

7.
Traditional search engines have become the most useful tools to search the World Wide Web. Even though they are good for certain search tasks, they may be less effective for others, such as satisfying ambiguous or synonym queries. In this paper, we propose an algorithm that, with the help of Wikipedia and collaborative semantic annotations, improves the quality of web search engines in the ranking of returned results. Our work is supported by (1) the logs generated after query searching, (2) semantic annotations of queries and (3) semantic annotations of web pages. The algorithm makes use of this information to elaborate an appropriate ranking. To validate our approach we have implemented a system that can apply the algorithm to a particular search engine. Evaluation results show that the number of relevant web resources obtained after executing a query with the algorithm is higher than the one obtained without it.  相似文献   

8.
针对搜索引擎查询结果缓存与预取问题,该文提出了一种基于查询特性的搜索引擎查询结果缓存与预取方法,该方法包括用来指导预取的查询结果页码预测模型和缓存与预取算法框架,用于提高搜索引擎系统性能。通过对国内某著名中文商业搜索引擎的某段时间的用户查询日志分析得出,用户对不同查询返回的查询结果所浏览的页数具有显著的非均衡性,结合该特性设计查询结果页码预测模型来进行预取和分区缓存。在该搜索引擎两个月的大规模真实用户查询日志上的实验结果表明,与传统的方法相比,该方法可以获得3.5%~8.45%的缓存命中率提升。  相似文献   

9.
Thousands of users issue keyword queries to the Web search engines to find information on a number of topics. Since the users may have diverse backgrounds and may have different expectations for a given query, some search engines try to personalize their results to better match the overall interests of an individual user. This task involves two great challenges. First the search engines need to be able to effectively identify the user interests and build a profile for every individual user. Second, once such a profile is available, the search engines need to rank the results in a way that matches the interests of a given user. In this article, we present our work towards a personalized Web search engine and we discuss how we addressed each of these challenges. Since users are typically not willing to provide information on their personal preferences, for the first challenge, we attempt to determine such preferences by examining the click history of each user. In particular, we leverage a topical ontology for estimating a user’s topic preferences based on her past searches, i.e. previously issued queries and pages visited for those queries. We then explore the semantic similarity between the user’s current query and the query-matching pages, in order to identify the user’s current topic preference. For the second challenge, we have developed a ranking function that uses the learned past and current topic preferences in order to rank the search results to better match the preferences of a given user. Our experimental evaluation on the Google query-stream of human subjects over a period of 1 month shows that user preferences can be learned accurately through the use of our topical ontology and that our ranking function which takes into account the learned user preferences yields significant improvements in the quality of the search results.  相似文献   

10.
针对传统Web教育主体难以获得高可用教育资源的问题,提出了一种面向语义主题相似度的Web教育资源查询方法。该方法建立了本体概念语义网络(Ontology Concept Semantic Network,OCSN),在此基础上,设计了基于语义主题相似度匹配的概念检索方法:在检索前主动将教育资源根据其语义和主题组织到本体概念语义网络中,然后建立一个基于语义特性的Web教育资源发现的垂直搜索引擎,并通过构造满足条件的相似度函数,将对应的语义距离映射为相似度,有效地提高了查询效率。实验结果表明此方法能够提高Web教育资源的查准率和查全率。  相似文献   

11.
Hundreds of millions of users each day submit queries to the Web search engine. The user queries are typically very short which makes query understanding a challenging problem. In this paper, we propose a novel approach for query representation and classification. By submitting the query to a web search engine, the query can be represented as a set of terms found on the web pages returned by search engine. In this way, each query can be considered as a point in high-dimensional space and standard classification algorithms such as regression can be applied. However, traditional regression is too flexible in situations with large numbers of highly correlated predictor variables. It may suffer from the overfitting problem. By using search click information, the semantic relationship between queries can be incorporated into the learning system as a regularizer. Specifically, from all the functions which minimize the empirical loss on the labeled queries, we select the one which best preserves the semantic relationship between queries. We present experimental evidence suggesting that the regularized regression algorithm is able to use search click information effectively for query classification.  相似文献   

12.
Most Web pages contain location information, which are usually neglected by traditional search engines. Queries combining location and textual terms are called as spatial textual Web queries. Based on the fact that traditional search engines pay little attention in the location information in Web pages, in this paper we study a framework to utilize location information for Web search. The proposed framework consists of an offline stage to extract focused locations for crawled Web pages, as well as an online ranking stage to perform location-aware ranking for search results. The focused locations of a Web page refer to the most appropriate locations associated with the Web page. In the offline stage, we extract the focused locations and keywords from Web pages and map each keyword with specific focused locations, which forms a set of <keyword, location> pairs. In the second online query processing stage, we extract keywords from the query, and computer the ranking scores based on location relevance and the location-constrained scores for each querying keyword. The experiments on various real datasets crawled from nj.gov, BBC and New York Time show that the performance of our algorithm on focused location extraction is superior to previous methods and the proposed ranking algorithm has the best performance w.r.t different spatial textual queries.  相似文献   

13.
查询推荐是搜索引擎系统中的一项重要技术,其通过推荐更合适的查询以提高用户的搜索体验。现有方法能够找到直接通过某种属性关联的相似查询,却忽略了具有间接关联的语义相关查询。该文将用户查询及查询间直接联系建模为查询关系图,并在图结构相似度算法SimRank的基础上提出了加权SimRank (简称WSimRank)用于查询推荐。WSimRank综合考虑了查询关系图的全局信息,因而能挖掘出查询间的间接关联和语义关系。然而,WSimRank复杂度太高而难以实用,该文将WSimRank转换为一个状态层次图的遍历和计算过程,进而采用动态规划、剪枝等策略对其进行优化从而可以实际应用。在大规模真实Web搜索日志上的实验表明, WSimRank在各项评价指标上均优于SimRank和传统查询推荐方法,其MAP指标接近0.9。  相似文献   

14.
One of the useful tools offered by existing web search engines is query suggestion (QS), which assists users in formulating keyword queries by suggesting keywords that are unfamiliar to users, offering alternative queries that deviate from the original ones, and even correcting spelling errors. The design goal of QS is to enrich the web search experience of users and avoid the frustrating process of choosing controlled keywords to specify their special information needs, which releases their burden on creating web queries. Unfortunately, the algorithms or design methodologies of the QS module developed by Google, the most popular web search engine these days, is not made publicly available, which means that they cannot be duplicated by software developers to build the tool for specifically-design software systems for enterprise search, desktop search, or vertical search, to name a few. Keyword suggested by Yahoo! and Bing, another two well-known web search engines, however, are mostly popular currently-searched words, which might not meet the specific information needs of the users. These problems can be solved by WebQS, our proposed web QS approach, which provides the same mechanism offered by Google, Yahoo!, and Bing to support users in formulating keyword queries that improve the precision and recall of search results. WebQS relies on frequency of occurrence, keyword similarity measures, and modification patterns of queries in user query logs, which capture information on millions of searches conducted by millions of users, to suggest useful queries/query keywords during the user query construction process and achieve the design goal of QS. Experimental results show that WebQS performs as well as Yahoo! and Bing in terms of effectiveness and efficiency and is comparable to Google in terms of query suggestion time.  相似文献   

15.
一种基于语义理解的元搜索引擎的研究   总被引:5,自引:0,他引:5  
通过对查询短语的结构分析,发现查询短语通常由关键词和特征词构成。特征词是对网页内容的概括,它预示着网页中包含一组特定的特征词条。基于该思想建立了面向Web网页内容的特征库。以元搜索引擎为研究对象,研究了以Web网页内容特征库为基础实现对查询短语进行语义理解的方法,提出了相关度级别的算法,对库中已收入的特征词进行了查询测试,查准率为86.7%。实验表明,该模型基本实现了对查询短语的理解,对提高搜索引擎的查准率有显著的效果。  相似文献   

16.
Providing top-k typical relevant keyword queries would benefit the users who cannot formulate appropriate queries to express their imprecise query intentions. By extracting the semantic relationships both between keywords and keyword queries, this paper proposes a new keyword query suggestion approach which can provide typical and semantically related queries to the given query. Firstly, a keyword coupling relationship measure, which considers both intra- and inter-couplings between each pair of keywords, is proposed. Then, the semantic similarity of different keyword queries can be measured by using a semantic matrix, in which the coupling relationships between keywords in queries are reserved. Based on the query semantic similarities, we next propose an approximation algorithm to find the most typical queries from query history by using the probability density estimation method. Lastly, a threshold-based top-k query selection method is proposed to expeditiously evaluate the top-k typical relevant queries. We demonstrate that our keyword coupling relationship and query semantic similarity measures can capture the coupling relationships between keywords and semantic similarities between keyword queries accurately. The efficiency of query typicality analysis and top-k query selection algorithm is also demonstrated.  相似文献   

17.
The fast development of GPS equipped devices has aroused widespread use of spatial keyword querying in location based services nowadays. Existing spatial keyword query methodologies mainly focus on the spatial and textual similarities, while leaving the semantic understanding of keywords in spatial Web objects and queries to be ignored. To address this issue, this paper studies the problem of semantic based spatial keyword querying. It seeks to return the k objects most similar to the query, subject to not only their spatial and textual properties, but also the coherence of their semantic meanings. To achieve that, we propose novel indexing structures, which integrate spatial, textual and semantic information in a hierarchical manner, so as to prune the search space effectively in query processing. Extensive experiments are carried out to evaluate and compare them with other baseline algorithms.  相似文献   

18.
This study presents an analysis of users' queries directed at different search engines to investigate trends and suggest better search engine capabilities. The query distribution among search engines that includes spawning of queries, number of terms per query and query lengths is discussed to highlight the principal factors affecting a user's choice of search engines and evaluate the reasons of varying the length of queries. The results could be used to develop long to short term business plans for search engine service providers to determine whether or not to opt for more focused topic specific search offerings to gain better market share.  相似文献   

19.
Search engines retrieve and rank Web pages which are not only relevant to a query but also important or popular for the users. This popularity has been studied by analysis of the links between Web resources. Link-based page ranking models such as PageRank and HITS assign a global weight to each page regardless of its location. This popularity measurement has shown successful on general search engines. However unlike general search engines, location-based search engines should retrieve and rank higher the pages which are more popular locally. The best results for a location-based query are those which are not only relevant to the topic but also popular with or cited by local users. Current ranking models are often less effective for these queries since they are unable to estimate the local popularity. We offer a model for calculating the local popularity of Web resources using back link locations. Our model automatically assigns correct locations to the links and content and uses them to calculate new geo-rank scores for each page. The experiments show more accurate geo-ranking of search engine results when this model is used for processing location-based queries.  相似文献   

20.
一种基于用户标记的搜索结果排序算法   总被引:1,自引:0,他引:1  
随着计算机网络的快速发展,网络上的信息量也日益纷繁复杂.如何准确、快速地帮助人们从海量网络数据中获取所需信息,这是目前搜索引擎首要解决的问题,为此,各种搜索排序算法应运而生.但是目前,网页信息的表达形式都十分简单,用户描述查询的形式更是十分简单,这就造成了在判断网页内容与用户查询相关性时十分困难.首先对现有的搜索引擎排序算法进行了分类总结,分析它们的优缺点.然后提出了一种基于用户反馈的语义标记的新方法,最后采用多种评估方法与Google搜索结果进行对比分析.实验结果表明,利用该方法所得到的排序结果比Google的排序结果更接近用户需求.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号