首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
互联网上大部分的数字化信息都与地球上的地点和位置关联,信息检索查询中大量地包含地理信息,传统的基于关键字匹配方法没有考虑检索中的空间关系,无法满足此类检索需求。地理信息检索根据地理范围从文档中获取空间语义匹配的地理知识文档,成为国内外信息检索和GIS领域的热点研究方向。提出了一个地理信息检索的基本系统框架,依据该框架对地理信息知识库、地理信息抽取、地理信息检索模型、混合索引和检索可视化等关键性技术进行了分类概括总结。在对已有技术进行深入对比分析的基础上,指出了该领域未来的研究工作和面临的挑战,并提供了大量的参考文献。  相似文献   

2.
GML文档是XML技术在GIS方面的应用,成为空间数据在Internet上的实际表示、传输和交换的标准。目前,GML文档的查询是GIS领域的研究热点。对这一问题,研究了GML文档的数据特点和结构特点,设计了一种新的索引结构--GB树,GB树是专门针对GML文档中空间数据节点的索引结构。将XML Twig模式查询思想引入GML文档查询,借助GB树的索引特点,提出了GML文档的Twig模式查询算法--GMLTwigStackGB。GMLTwigStackGB算法保留了XML文档Twig模式查询算法的优势和特点,具有完整的空间查询功能。测试实验表明,该算法能够高效地满足GML文档上的各种数据查询。  相似文献   

3.
Geographic Information Retrieval is concerned with retrieving documents in response to a spatially related query. This paper addresses the ranking of documents by both textual and spatial relevance. To this end, we introduce multi-dimensional scattered ranking, where textually and spatially similar documents are ranked spread in the list, instead of consecutively. The effect of this is that documents close together in the ranked list have less redundant information. We present various ranking methods of this type, efficient algorithms to implement them, and experiments to show the outcome of the methods.*This research is supported by the EU-IST Project No. IST-2001-35047 (SPIRIT).  相似文献   

4.
With the rocket development of the Internet, WWW(World Wide Web), mobile computing and GPS (Global Positioning System) services, location-based services like Web GIS (Geographical Information System) portals are becoming more and more popular. Spatial keyword queries over GIS spatial data receive much more attention from both academic and industry communities than ever before. In general, a spatial keyword query containing spatial location information and keywords is to locate a set of spatial objects that satisfy the location condition and keyword query semantics. Researchers have proposed many solutions to various spatial keyword queries such as top-K keyword query, reversed kNN keyword query, moving object keyword query, collective keyword query, etc. In this paper, we propose a density-based spatial keyword query which is to locate a set of spatial objects that not only satisfies the query’s textual and distance condition, but also has a high density in their area. We use the collective keyword query semantics to find in a dense area, a group of spatial objects whose keywords collectively match the query keywords. To efficiently process the density based spatial keyword query, we use an IR-tree index as the base data structure to index spatial objects and their text contents and define a cost function over the IR-tree indexing nodes to approximately compute the density information of areas. We design a heuristic algorithm that can efficiently prune the region according to both the distance and region density in processing a query over the IR-tree index. Experimental results on datasets show that our method achieves desired results with high performance.  相似文献   

5.
空间数据索引与查询技术研究及其应用   总被引:3,自引:3,他引:3  
由于空间数据本身的复杂性,以及目前对海量空间数据快速查询的要求日益提高,当前地理信息系统正面临着大数据量空间数据存储及管理的挑战。因此,该文在对当今空间存储方法及空间查询的一些主要技术进行比较和分析之后,提出了基于R树的优化的空间查询系统框架设计,并在一个地理信息系统的应用实例中实现了该设计。  相似文献   

6.
An increasing number of emerging web database applications deal with large georeferenced data sets. However, exploring these large data sets through spatial queries can be very time and resource intensive. The need for interactive spatial queries has arisen in many applications such as Geographic Information Systems (GIS) for efficient decision-support. In this paper, we propose a new interactive spatial query processing technique for GIS. We present a family of the Incremental Refining Spatial Join (IRSJ) algorithms that can be used to report incrementally refined running estimates for aggregate queries while simultaneously displaying the actual query result tuples of the data sets sampled so far. Our goal is to minimize the time until an acceptably accurate estimate of the query result is available (to users) measured by a confidence interval. Our approach enables more interactive data exploration and analysis. While similar work has been done in relational databases, to the best of our knowledge, this is the first work using this approach in GIS. We investigate and evaluate different sampling methodologies through extensive experimental performance comparisons. Experiments on both real and synthetic data show an order of magnitude response time improvement relative to the final answer obtained when using a full R-tree join. We also show the impact of different index structures on the performance of our algorithms using three known sampling methods.  相似文献   

7.
Recently, Reverse k Nearest Neighbors (RkNN) queries, returning every answer for which the query is one of its k nearest neighbors, have been extensively studied on the database research community. But the RkNN query cannot retrieve spatio-textual objects which are described by their spatial location and a set of keywords. Therefore, researchers proposed a RSTkNN query to find these objects, taking both spatial and textual similarity into consideration. However, the RSTkNN query cannot control the size of answer set and to be sorted according to the degree of influence on the query. In this paper, we propose a new problem Ranked Reverse Boolean Spatial Keyword Nearest Neighbors query called Ranked-RBSKNN query, which considers both spatial similarity and textual relevance, and returns t answers with most degree of influence. We propose a separate index and a hybrid index to process such queries efficiently. Experimental results on different real-world and synthetic datasets show that our approaches achieve better performance.  相似文献   

8.
There is a significant commercial and research interest in location-based web search engines. Given a number of search keywords and one or more locations (geographical points) that a user is interested in, a location-based web search retrieves and ranks the most textually and spatially relevant web pages. In this type of search, both the spatial and textual information should be indexed. Currently, no efficient index structure exists that can handle both the spatial and textual aspects of data simultaneously and accurately. Existing approaches either index space and text separately or use inefficient hybrid index structures with poor performance and inaccurate results. Moreover, most of these approaches cannot accurately rank web-pages based on a combination of space and text and are not easy to integrate into existing search engines. In this paper, we propose a new index structure called Spatial-Keyword Inverted File for Points to handle point-based indexing of web documents in an integrated/efficient manner. To seamlessly find and rank relevant documents, we develop a new distance measure called spatial tf-idf. We propose four variants of spatial-keyword relevance scores and two algorithms to perform top-k searches. As verified by experiments, our proposed techniques outperform existing index structures in terms of search performance and accuracy.  相似文献   

9.
Technology in the field of digital media generates huge amounts of nontextual information, audio, video, and images, along with more familiar textual information. The potential for exchange and retrieval of information is vast and daunting. The key problem in achieving efficient and user-friendly retrieval is the development of a search mechanism to guarantee delivery of minimal irrelevant information (high precision) while insuring relevant information is not overlooked (high recall). The traditional solution employs keyword-based search. The only documents retrieved are those containing user-specified keywords. But many documents convey desired semantic information without containing these keywords. This limitation is frequently addressed through query expansion mechanisms based on the statistical co-occurrence of terms. Recall is increased, but at the expense of deteriorating precision. One can overcome this problem by indexing documents according to context and meaning rather than keywords, although this requires a method of converting words to meanings and the creation of a meaning-based index structure. We have solved the problem of an index structure through the design and implementation of a concept-based model using domain-dependent ontologies. An ontology is a collection of concepts and their interrelationships that provide an abstract view of an application domain. With regard to converting words to meaning, the key issue is to identify appropriate concepts that both describe and identify documents as well as language employed in user requests. This paper describes an automatic mechanism for selecting these concepts. An important novelty is a scalable disambiguation algorithm that prunes irrelevant concepts and allows relevant ones to associate with documents and participate in query generation. We also propose an automatic query expansion mechanism that deals with user requests expressed in natural language. This mechanism generates database queries with appropriate and relevant expansion through knowledge encoded in ontology form. Focusing on audio data, we have constructed a demonstration prototype. We have experimentally and analytically shown that our model, compared to keyword search, achieves a significantly higher degree of precision and recall. The techniques employed can be applied to the problem of information selection in all media types.Received: 7 October 2002, Accepted: 20 May 2003, Published online: 30 September 2003Edited by: E. LochovskyThis research has been funded [or funded in part] by the Integrated Media Systems Center, a National Science Foundation Engineering Research Center, Cooperative Agreement No. EEC-9529152.  相似文献   

10.
基于固定网格划分和面向类对象的四分树空间索引机制   总被引:11,自引:0,他引:11  
本文针对地理信息系统中的空间对象形态的不同规则性和空间查询区域的不规则性。提出了一种基于固定网格划分的四分树空间索引机制。  相似文献   

11.
在信息检索领域的排序任务中, 神经网络排序模型已经得到广泛使用. 神经网络排序模型对于数据的质量要求极高, 但是, 信息检索数据集通常含有较多噪音, 不能精确得到与查询不相关的文档. 为了训练一个高性能的神经网络排序模型, 获得高质量的负样本, 则至关重要. 借鉴现有方法doc2query的思想, 本文提出了深度、端到端的模型AQGM, 通过学习不匹配查询文档对, 生成与文档不相关、原始查询相似的对抗查询, 增加了查询的多样性,增强了负样本的质量. 本文利用真实样本和AQGM模型生成的样本, 训练基于BERT的深度排序模型, 实验表明,与基线模型BERT-base对比, 本文的方法在MSMARCO和TrecQA数据集上, MRR指标分别提升了0.3%和3.2%.  相似文献   

12.
Yun  Tae-Seob  Whang  Kyu-Young  Kwon  Hyuk-Yoon  Kim  Jun-Sung  Song  Il-Yeol 《World Wide Web》2019,22(6):2437-2467

We propose two-dimensional indexing—a novel in-memory indexing architecture that operates over distributed memory of a massively-parallel search engine. The goal of two-dimensional indexing is to provide a one-integrated-memory view as in a single node system using one large integrated memory. In two-dimensional indexing, we partition the entire index into n× m fragments and distribute them over the memories of multiple nodes in such a way that each fragment is entirely stored in main memory of one node. The proposed architecture is not only scalable as it uses a scaled-out shared-nothing architecture but also is capable of achieving low query response time as it processes queries in main memory. We also propose the concept of the one-memory point, which is the amount of the memory space required to completely store the entire index in main memory providing a one-integrated-memory view. We first prove the effectiveness of two-dimensional indexing with single-keyword queries, and then, extend the notion so as to be able to handle multiple-keyword queries. To handle multiple-keyword queries, we adopt pre-join that materializes a multiple-keyword query a priori as well as a new notion of semi-memory join that obviates extensive communication overhead to perform join across multiple nodes. In experiments using the real-life search query set over a database consisting of 100 million Web documents crawled, we show that two-dimensional indexing can effectively provide a one-integrated-memory view without too much of additional memory compared with the single node system using one large integrated memory. We also show that, with a six-node prototype, in an ideal case, it significantly improves the query processing performance over a disk-based search engine with an equivalent amount of in-memory buffer but without two-dimensional indexing — by up to 535.54 times. This improvement is expected to get larger as the system is scaled-out with a larger number of machines.

  相似文献   

13.
Extensible Markup Language (XML) documents consist of text data plus structured data (markup). XPath allows to query both text and structure. Evaluating such hybrid queries is challenging. We present a system for in‐memory evaluation of XPath search queries, that is, queries with text and structure predicates, yet without advanced features such as backward axes, arithmetics, and joins. We show that for this query fragment, which contains Forward Core XPath, our system, dubbed Succinct XML Self‐Index (‘SXSI’), outperforms existing systems by 1–3 orders of magnitude. SXSI is based on state‐of‐the‐art indexes for text and structure data. It combines two novelties. On one hand, it represents the XML data in a compact indexed form, which allows it to handle larger collections in main memory while supporting powerful search and navigation operations over the text and the structure. On the other hand, it features an execution engine that uses tree automata and cleverly chooses evaluation orders that leverage the speeds of the respective indexes. SXSI is modular and allows seamless replacement of its indexes. This is demonstrated through experiments with (1) a text index specialized for search of bio sequences, and (2) a word‐based text index specialized for natural language search. Copyright © 2013 John Wiley & Sons, Ltd.  相似文献   

14.
提出了一种用于搜索XML文档的新的索引方法即RIST。通过采用代码化的结构序列(SES)来表示XML文档和XML查询,得出查询XML数据等同于查找子序列匹配。RIST采用树结构作为查询的基本单元,从而避免了代价高昂的连接操作。另外,RIST还在XML文档的内容和结构上提供了一个统一的索引,所以它的一个很明显的优势就是克服了仅仅根据内容或结构建立索引的弊端。实验表明RIST在支持结构查询上是一种高效的方法。  相似文献   

15.
一种基于HBase的高效空间关键字查询策略   总被引:2,自引:0,他引:2  
随着移动定位技术的发展以及智能手机的普及,互联网中空间文本对象的数量正在急速增长,如何在规模庞大且动态增长的空间文本对象中进行高效的空间关键字查询成为了许多空间关键字查询应用所关心的问题.现有的方法通常利用基于R树和倒排索引的混合索引结构来处理空间关键字查询,然而,面对数量巨大而且不断增长的空间文本对象,这些方法往往难以为空间关键字查询的高效性和扩展性提供支持.对此,提出一种基于HBase的空间文本数据索引结构SK-HBase.SK-HBase以HBase作为数据存储,通过有效的数据分配策略对空间文本对象的空间信息和文本信息同时进行索引.在SK-HBase的基础上,本文提出了两种空间关键字查询算法,以保证不同空间范围下的空间关键字查询的高效性和可扩展性.实验证明,我们的方法能够在海量数据下进行高效的空间关键字查询并具有良好的可扩展性.  相似文献   

16.
In this paper, we propose an efficient solution for processing continuous range spatial keyword queries over moving spatio-textual objects (namely, CRSK-mo queries). Major challenges in efficient processing of CRSK-mo queries are as follows: (i) the query range is determined based on both spatial proximity and textual similarity; thus a straightforward spatial proximity based pruning of the search space is not applicable as any object far from a query location with a high textual similarity score can still be the answer (and vice versa), (ii) frequent location updates may invalidate a query result, and thus require frequent re-computing of the result set for any object updates. To address these challenges, the key idea of our approach is to exploit the spatial and textual upper bounds between queries and objects to form safe zones (at the client-side) and buffer regions (at the server-side), and then use these bounds to quickly prune objects and queries through smart in-memory data structures. We conduct extensive experiments with a synthetic dataset that verify the effectiveness and efficiency of our proposed algorithm.  相似文献   

17.
摘要:本文提出了一种用于搜索XML文档的新的索引方法即RIST。通过采用代码化的结构序列(SES)来表示XML文档和XML查询,我们得出查询XML数据等同于查找子序列匹配。RIST采用树结构作为查询的基本单元,从而避免了代价高昂的连接操作。另外,RIST还在XML文档的内容和结构上提供了一个统一的索引,所以它的一个很明显的优势就是克服了仅仅根据内容或结构建立索引的弊端。实验表明RIST在支持结构查询上是一种高效的方法。  相似文献   

18.
针对XML数据特有的树型结构模式,提出了一种将树型结构的XML数据和查询语句转化为特定格式的字符串,基于串匹配原理对结构复杂的XML数据进行查询的方法,避免了传统的基于路径的查询方式所必需的路径之间的连接(join)操作,从而提高查询效率。利用本文提出的编码方式,可以建立关于XML数据结构和数据内容舍为一体的索引。实验显示,本文使用的针对XML数据查询的方法比传统的基于连接操作的数据查询方式高效,且本方法具有良好的扩展性。  相似文献   

19.
The inverted index is widely used in the existing information retrieval field. In order to support containment queries for structured documents such as XML, it needs to be extended. Previous work suggested an extension in storing the inverted index for XML documents and processing containment queries, and compared two implementation options: using an RDBMS and using an Information Retrieval (IR) engine. However, the previous work has two drawbacks in extending the inverted index. One is that the RDBMS implementation is generally much worse in the performance than the IR engine implementation. The other is that when a containment query is processed in an RDBMS, the number of join operations increases in proportion to the number of containment relationships in the query and a join operation always occurs between large relations. In order to solve these problems, we propose in this paper a novel approach to extend the inverted index for containment query processing, and show its effectiveness through experimental results. In particular, our performance study shows that (1) our RDBMS approach almost always outperforms the previous RDBMS and IR approaches, (2) our RDBMS approach is not far behind our IR approach with respect to performance, and (3) our approach is scalable to the number of containment relationships in queries. Therefore, our results suggest that, without having to make any modifications on the RDBMS engine, a native implementation using an RDBMS can support containment queries as efficiently as an IR implementation.  相似文献   

20.
dentifying ambiguous queries is crucial to research on personalized Web search and search result diversity. Intuitively, query logs contain valuable information on how many intentions users have when issuing a query. However, previous work showed user clicks alone are misleading in judging a query as being ambiguous or not. In this paper, we address the problem of learning a query ambiguity model by using search logs. First, we propose enriching a query by mining the documents clicked by users and the relevant follow up queries in a session. Second, we use a text classifier to map the documents and the queries into predefined categories. Third, we propose extracting features from the processed data. Finally, we apply a state-of-the-art algorithm, Support Vector Machine (SVM), to learn a query ambiguity classifier. Experimental results verify that the sole use of click based features or session based features perform worse than the previous work based on top retrieved documents. When we combine the two sets of features, our proposed approach achieves the best effectiveness, specifically 86% in terms of accuracy. It significantly improves the click based method by 5.6% and the session based method by 4.6%.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号