首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Mapping the semantics of Web text and links   总被引:1,自引:0,他引:1  
Search engines use content and links to search, rank, cluster, and classify Web pages. These information discovery applications use similarity measures derived from this data to estimate relatedness between pages. However, little research exists on the relationships between similarity measures or between such measures and semantic similarity. The author analyzes and visualizes similarity relationships in massive Web data sets to identify how to integrate content and link analysis for approximating relevance. He uses human-generated metadata from Web directories to estimate semantic similarity and semantic maps to visualize relationships between content and link cues and what these cues suggest about page meaning. Highly heterogeneous topical maps point to a critical dependence on search context.  相似文献   

2.
张祥  葛唯益  瞿裕忠 《软件学报》2009,20(10):2834-3843
随着语义网中RDF数据的大量涌现,语义搜索引擎为用户搜索RDF数据带来了便利.但是,如何自动地发现包含语义网信息资源的站点,并高效地在语义网站点中收集语义网信息资源,一直是语义搜索引擎所面临的问题.首先介绍了语义网站点的链接模型.该模型刻画了语义网站点、语义网信息资源、RDF模型和语义网实体之间的关系.基于该模型讨论了语义网实体的归属问题,并进一步定义了语义网站点的发现规则;另外,从站点链接模型出发,定义了语义网站点依赖图,并给出了对语义网站点进行排序的算法.将相关算法在一个真实的语义搜索引擎中进行了初步测试.实验结果表明,所提出的方法可以有效地发现语义网站点并对站点进行排序.  相似文献   

3.
Using keywords as inputs to search engines and receiving documents as responses remains the prevalent way to access information on the Web. Although a shift toward entity awareness is a fairly recent trend in information access, such methods remain devoid of semantics, which are increasingly recognized as the lynchpin of search, integration, and analysis. We argue that relationships are at the heart of semantics, and, as such, we envision a Web of relationships to relate content across Web resources. Under this powerful new paradigm, information access over the Web would switch from a mere document-retrieval operation to an information framework that supports insight elicitation and semantic analytics over Web resources. In this column, we outline our vision and discuss how recent Improvements in content extraction and semantic annotation will ultimately help us realize this relationship Web.  相似文献   

4.
5.
This paper presents WebOWL, an experiment in using the latest technologies to develop a Semantic Web search engine. WebOWL consists of a community of intelligent agents, acting as crawlers, that are able to discover and learn the locations of Semantic Web neighborhoods on the Web, a semantic database to store data from different ontologies, a query mechanism that supports semantic queries in OWL, and a ranking algorithm that determines the order of the returned results based on the semantic relationships of classes and individuals. The system has been implemented using Jade, Jena and the db4o object database engine and has successfully stored over one million OWL classes, individuals and properties.  相似文献   

6.
Traditional search engines have become the most useful tools to search the World Wide Web. Even though they are good for certain search tasks, they may be less effective for others, such as satisfying ambiguous or synonym queries. In this paper, we propose an algorithm that, with the help of Wikipedia and collaborative semantic annotations, improves the quality of web search engines in the ranking of returned results. Our work is supported by (1) the logs generated after query searching, (2) semantic annotations of queries and (3) semantic annotations of web pages. The algorithm makes use of this information to elaborate an appropriate ranking. To validate our approach we have implemented a system that can apply the algorithm to a particular search engine. Evaluation results show that the number of relevant web resources obtained after executing a query with the algorithm is higher than the one obtained without it.  相似文献   

7.
Folksonomy, considered a core component for Web 2.0 user-participation architecture, is a classification system made by user’s tags on the web resources. Recently, various approaches for image retrieval exploiting folksonomy have been proposed to improve the result of image search. However, the characteristics of the tags such as semantic ambiguity and non-controlledness limit the effectiveness of tags on image retrieval. Especially, tags associated with images in a random order do not provide any information about the relevance between a tag and an image. In this paper, we propose a novel image tag ranking system called i-TagRanker which exploits the semantic relationships between tags for re-ordering the tags according to the relevance with an image. The proposed system consists of two phases: 1) tag propagation phase, 2) tag ranking phase. In tag propagation phase, we first collect the most relevant tags from similar images, and then propagate them to an untagged image. In tag ranking phase, tags are ranked according to their semantic relevance to the image. From the experimental results on a Flickr photo collection about over 30,000 images, we show the effectiveness of the proposed system.  相似文献   

8.
Queries to Web search engines are usually short and ambiguous, which provides insufficient information needs of users for effectively retrieving relevant Web pages. To address this problem, query suggestion is implemented by most search engines. However, existing methods do not leverage the contradiction between accuracy and computation complexity appropriately (e.g. Google's ‘Search related to’ and Yahoo's ‘Also Try’). In this paper, the recommended words are extracted from the search results of the query, which guarantees the real time of query suggestion properly. A scheme for ranking words based on semantic similarity presents a list of words as the query suggestion results, which ensures the accuracy of query suggestion. Moreover, the experimental results show that the proposed method significantly improves the quality of query suggestion over some popular Web search engines (e.g. Google and Yahoo). Finally, an offline experiment that compares the accuracy of snippets in capturing the number of words in a document is performed, which increases the confidence of the method proposed by the paper. Copyright © 2010 John Wiley & Sons, Ltd.  相似文献   

9.
Metadata about information sources (e.g., databases and repositories) can be collected by Query Sampling (QS). Such metadata can include topics and statistics (e.g., term frequencies) about the information sources. This provides important evidence for determining which sources in the distributed information space should be selected for a given user query. The aim of this paper is to find out the semantic relationships between the information sources in order to distribute user queries to a large number of sources. Thereby, we propose an evolutionary approach for automatically conducting QS using multiple crawlers and obtaining the optimized semantic network from the sources. The aim of combining QS and evolutionary methods is to collaboratively extract metadata about target sources and optimally integrate the metadata, respectively. For evaluating the performance of contextualized QS on 122 information sources, we have compared the ranking lists recommended by the proposed method with user feedback (i.e., ideal ranks), and also computed the precision of the discovered subsumptions in terms of the semantic relationships between the target sources.  相似文献   

10.
语义元数据是有关Web内容语义信息的数据描述,它的有效表示及生成是构建语义Web的关键性技术。本文在讨论各种语义元数据的表示方法后,研究语义元数据的生成技术,在分析现有技术的特点和不足后,评述语义元数据生成技术的发展趋势。  相似文献   

11.
Web Semantics in the Clouds   总被引:3,自引:0,他引:3  
Cloud computing refers to the use of large-scale computer clusters often built from low-cost hardware and network equipment, where resources are allocated dynamically among users of the cluster. While the paradigm is not entirely novel, recent developments in software frameworks for cloud computing are making it increasingly easy for programmers to parallelize and thereby scale-up complex data-processing tasks. This article investigates how this trend is impacting the semantic Web field and shows how cloud computing can be used to analyze, query, and reason with the massive amounts of metadata handled by semantic search engines.  相似文献   

12.
Search engines result pages (SERPs) for a specific query are constructed according to several mechanisms. One of them consists in ranking Web pages regarding their importance, regardless of their semantic. Indeed, relevance to a query is not enough to provide high quality results, and popularity is used to arbitrate between equally relevant Web pages. The most well-known algorithm that ranks Web pages according to their popularity is the PageRank.The term Webspam was coined to denotes Web pages created with the only purpose of fooling ranking algorithms such as the PageRank. Indeed, the goal of Webspam is to promote a target page by increasing its rank. It is an important issue for Web search engines to spot and discard Webspam to provide their users with a nonbiased list of results. Webspam techniques are evolving constantly to remain efficient but most of the time they still consist in creating a specific linking architecture around the target page to increase its rank.In this paper we propose to study the effects of node aggregation on the well-known ranking algorithm of Google (the PageRank) in the presence of Webspam. Our node aggregation methods have the purpose to construct clusters of nodes that are considered as a sole node in the PageRank computation. Since the Web graph is way to big to apply classic clustering techniques, we present four lightweight aggregation techniques suitable for its size. Experimental results on the WEBSPAM-UK2007 dataset show the interest of the approach, which is moreover confirmed by statistical evidence.  相似文献   

13.
Although research on integrating semantics with the Web started almost as soon as the Web was in place, a concrete Semantic Web that is, a large-scale collection of distributed semantic metadata emerged only over the past four to five years. The Semantic Web's embryonic nature is reflected in its existing applications. Most of these applications tend to produce and consume their own data, much like traditional knowledge- based applications, rather than actually exploiting the Semantic Web as a large-scale information source. These first-generation semantic Web applications typically use a single ontology that supports integration of resources selected at design time.  相似文献   

14.
This paper proposes a non-domain-specific metadata ontology as a core component in a semantic model-based document management system (DMS), a potential contender towards the enterprise information systems of the next generation. What we developed is the core semantic component of an ontology-driven DMS, providing a robust semantic base for describing documents’ metadata. We also enabled semantic services such as automated semantic translation of metadata from one domain to another. The core semantic base consists of three semantic layers, each one serving a different view of documents’ metadata. The core semantic component’s base layer represents a non-domain-specific metadata ontology founded on ebRIM specification. The main purpose of this ontology is to serve as a meta-metadata ontology for other domain-specific metadata ontologies. The base semantic layer provides a generic metadata view. For the sake of enabling domain-specific views of documents’ metadata, we implemented two domain-specific metadata ontologies, semantically layered on top of ebRIM, serving domain-specific views of the metadata. In order to enable semantic translation of metadata from one domain to another, we established model-to-model mappings between these semantic layers by introducing SWRL rules. Having the semantic translation of metadata automated not only allows for effortless switching between different metadata views, but also opens the door for automating the process of documents long-term archiving. For the case study, we chose judicial domain as a promising ground for improving the efficiency of the judiciary by introducing the semantics in this field.  相似文献   

15.
语义标注元数据及其抽取技术*   总被引:4,自引:0,他引:4  
讨论了语义Web上用XML或RDF/XML标注元数据的方法以及元数据标注在语义Web上的两种存在形式:单一文件或XML包。在此基础上,介绍了从这些单独文件或XML包宿主文件中抽取元数据的方法,包括XML解析器SAX和DOM以及XML包扫描器的构造。  相似文献   

16.
Automatic extraction of semantic information from text and links in Web pages is key to improving the quality of search results. However, the assessment of automatic semantic measures is limited by the coverage of user studies, which do not scale with the size, heterogeneity, and growth of the Web. Here we propose to leverage human-generated metadata—namely topical directories—to measure semantic relationships among massive numbers of pairs of Web pages or topics. The Open Directory Project classifies millions of URLs in a topical ontology, providing a rich source from which semantic relationships between Web pages can be derived. While semantic similarity measures based on taxonomies (trees) are well studied, the design of well-founded similarity measures for objects stored in the nodes of arbitrary ontologies (graphs) is an open problem. This paper defines an information-theoretic measure of semantic similarity that exploits both the hierarchical and non-hierarchical structure of an ontology. An experimental study shows that this measure improves significantly on the traditional taxonomy-based approach. This novel measure allows us to address the general question of how text and link analyses can be combined to derive measures of relevance that are in good agreement with semantic similarity. Surprisingly, the traditional use of text similarity turns out to be ineffective for relevance ranking.  相似文献   

17.
Graph visualization techniques for web clustering engines   总被引:1,自引:0,他引:1  
One of the most challenging issues in mining information from the World Wide Web is the design of systems that present the data to the end user by clustering them into meaningful semantic categories. We show that the analysis of the results of a clustering engine can significantly take advantage of enhanced graph drawing and visualization techniques. We propose a graph-based user interface for Web clustering engines that makes it possible for the user to explore and visualize the different semantic categories and their relationships at the desired level of detail  相似文献   

18.
19.
Thousands of users issue keyword queries to the Web search engines to find information on a number of topics. Since the users may have diverse backgrounds and may have different expectations for a given query, some search engines try to personalize their results to better match the overall interests of an individual user. This task involves two great challenges. First the search engines need to be able to effectively identify the user interests and build a profile for every individual user. Second, once such a profile is available, the search engines need to rank the results in a way that matches the interests of a given user. In this article, we present our work towards a personalized Web search engine and we discuss how we addressed each of these challenges. Since users are typically not willing to provide information on their personal preferences, for the first challenge, we attempt to determine such preferences by examining the click history of each user. In particular, we leverage a topical ontology for estimating a user’s topic preferences based on her past searches, i.e. previously issued queries and pages visited for those queries. We then explore the semantic similarity between the user’s current query and the query-matching pages, in order to identify the user’s current topic preference. For the second challenge, we have developed a ranking function that uses the learned past and current topic preferences in order to rank the search results to better match the preferences of a given user. Our experimental evaluation on the Google query-stream of human subjects over a period of 1 month shows that user preferences can be learned accurately through the use of our topical ontology and that our ranking function which takes into account the learned user preferences yields significant improvements in the quality of the search results.  相似文献   

20.
Keyword‐based search engines such as Google? index Web pages for human consumption. Sophisticated as such engines have become, surveys indicate almost 25% of Web searchers are unable to find useful results in the first set of URLs returned (Technology Review, March 2004). The lack of machine‐interpretable information on the Web limits software agents from matching human searches to desirable results. Tim Berners‐Lee, inventor of the Web, has architected the Semantic Web in which machine‐interpretable information provides an automated means to traversing the Web. A necessary cornerstone application is the search engine capable of bringing the Semantic Web together into a searchable landscape. We implemented a Semantic Web Search Engine (SWSE) that performs semantic search, providing predictable and accurate results to queries. To compare keyword search to semantic search, we constructed the Google CruciVerbalist (GCV), which solves crossword puzzles by reformulating clues into Google queries processed via the Google API. Candidate answers are extracted from query results. Integrating GCV with SWSE, we quantitatively show how semantic search improves upon keyword search. Mimicking the human brain's ability to create and traverse relationships between facts, our techniques enable Web applications to ‘think’ using semantic reasoning, opening the door to intelligent search applications that utilize the Semantic Web. Copyright © 2007 John Wiley & Sons, Ltd.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号