1.

Coding-based Join Algorithms for Structural Queries on Graph-Structured XML Document

Hongzhi Wang, Jianzhong Li, Wei Wang, Xuemin Lin 《World Wide Web》2008,11(4):485-510

In many applications, XML documents need to be modelled as graphs, and query processing over graph-structured XML documents raises new challenges. In this paper, we design a labelling-scheme-based method for processing structural queries on graph-structured XML documents, in which each node is given labels under a reachability labelling scheme. By extending the interval-based reachability labelling scheme for DAGs by Rakesh et al., we design labelling schemes that support reachability tests on general graphs. Based on these labelling schemes, we design graph structural join algorithms that efficiently answer structural queries containing only ancestor-descendant relationships. For subgraph queries, we design a subgraph join algorithm which, using efficient data structures, processes subgraph queries of various structures efficiently. Experimental results show that our algorithms have good performance and scalability. Supported by the Key Program of the National Natural Science Foundation of China under Grant No. 60533110; the National Grand Fundamental Research 973 Program of China under Grant No. 2006CB303000; the National Natural Science Foundation of China under Grant No. 60773068 and No. 60773063.
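The abstract builds on an interval-based reachability labelling for DAGs. As a rough illustration of that underlying idea only (not the authors' extension to general graphs), the sketch below assumes an acyclic graph given as an adjacency dict: a DFS spanning forest assigns post-order numbers, and each node keeps a merged set of intervals inherited from its successors, so reachability becomes an interval-membership test. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent integer intervals into a sorted list."""
    out: List[Interval] = []
    for lo, hi in sorted(intervals):
        if out and lo <= out[-1][1] + 1:
            out[-1] = (out[-1][0], max(out[-1][1], hi))
        else:
            out.append((lo, hi))
    return out


def label_dag(graph: Dict[str, List[str]]):
    """Label a DAG in which every node appears as a key of `graph`.

    1. A DFS spanning forest assigns post-order numbers; the tree descendants
       of u occupy the contiguous range [first, post[u]].
    2. Nodes are processed in increasing post order (a reverse topological
       order for a DAG), and each inherits its successors' intervals, which
       covers non-tree edges as well.
    """
    post: Dict[str, int] = {}
    tree: Dict[str, Interval] = {}
    visited = set()
    counter = 0

    def dfs(u: str) -> None:
        nonlocal counter
        visited.add(u)
        first = counter  # smallest post number used inside u's DFS subtree
        for v in graph[u]:
            if v not in visited:
                dfs(v)
        post[u] = counter
        tree[u] = (first, counter)
        counter += 1

    for node in graph:
        if node not in visited:
            dfs(node)

    intervals: Dict[str, List[Interval]] = {}
    for u in sorted(graph, key=post.__getitem__):
        collected = [tree[u]]
        for v in graph[u]:
            collected.extend(intervals[v])  # successors were labelled earlier
        intervals[u] = merge(collected)
    return post, intervals


def reaches(labels, u: str, v: str) -> bool:
    """True if v is reachable from u (every node reaches itself)."""
    post, intervals = labels
    return any(lo <= post[v] <= hi for lo, hi in intervals[u])
```

Handling cycles, as the paper does for general graphs, is outside the scope of this sketch.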

2.

Algorithmic Design Using Object-Z for Twig XML Queries Evaluation

Yang Liu, Jun Sun 《Electronic Notes in Theoretical Computer Science》2006,151(2):107-124

3.

Efficient Optimization of Multiple Subspace Skyline Queries

黄震华, 郭建奎, 孙圣力, 汪卫 《Journal of Computer Science and Technology》2008,23(1):103-111

In this paper we present the first efficient, sound and complete algorithm (AOMSSQ) for optimizing multiple subspace skyline queries simultaneously. We first identify three performance problems of the naïve approach (SUBSKY), which can process an arbitrary single-subspace skyline query. We then propose a cell-dominance computation algorithm (CDCA) that efficiently overcomes the drawbacks of SUBSKY; in particular, CDCA uses a novel pruning technique to dramatically reduce query time. Finally, based on CDCA and a mechanism for sharing work between subspaces, we present and discuss the AOMSSQ algorithm and prove it sound and complete. We also present detailed theoretical analyses and extensive experiments demonstrating that our algorithms are both efficient and effective.
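For reference, the subspace skyline semantics that AOMSSQ optimizes can be stated as a small block-nested-loop sketch, assuming smaller values are better on every dimension. It answers each subspace query independently, with no work shared between subspaces; it is neither SUBSKY, CDCA nor AOMSSQ, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point, dims: Sequence[int]) -> bool:
    """p dominates q on `dims`: no worse everywhere, strictly better somewhere."""
    return all(p[d] <= q[d] for d in dims) and any(p[d] < q[d] for d in dims)


def subspace_skyline(points: List[Point], dims: Sequence[int]) -> List[Point]:
    """Block-nested-loop skyline restricted to the dimensions in `dims`."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p, dims) for w in window):
            continue  # p is dominated by a current candidate
        # p survives: evict candidates it dominates, then keep it
        window = [w for w in window if not dominates(p, w, dims)]
        window.append(p)
    return window


def all_subspace_skylines(
    points: List[Point], subspaces: List[Tuple[int, ...]]
) -> Dict[Tuple[int, ...], List[Point]]:
    """Answer each subspace query independently (no sharing between subspaces)."""
    return {dims: subspace_skyline(points, dims) for dims in subspaces}
```

Because dominance is transitive, a point discarded earlier can never be needed to reject a later one, which is what makes the single window sufficient.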

4.

A syntactic approach to twig-query matching on XML streams

Chien-Ping Chou, Kuen-Fang Jea, Heng-Hsun Liao 《Journal of Systems and Software》2011,84(6):993-1007

Efficient query matching on XML streams is challenging when the volume of queried stream data is huge and data may be streamed in continuously. In this paper, we propose Syntactic Twig-Query Matching (STQM) to process queries on an XML stream and return query results continuously and immediately. STQM matches twig queries on the XML stream in a syntactic manner by using a lexical analyzer and a parser, which are built by our lexical-rules and grammar-rules generators according to the user's queries and the document schema, respectively. For query matching, the lexical analyzer scans the incoming XML stream and the parser recognizes XML structures to retrieve every twig-query result from the stream. Moreover, STQM obtains query results without a post-processing phase to exclude false positives, which are common in many streaming query methods. Experimental results show that STQM matches twig queries efficiently and scales well in both the queried data size and the branch degree of the twig query. It also takes less execution time than a sequence-based approach, which is widely accepted as a proper solution to XML stream queries.

5.

PS+Pre/Post: A novel structure and access mechanism for wireless XML stream supporting twig pattern queries

《Pervasive and Mobile Computing》2014

XML data broadcast is an efficient way to disseminate XML data to a large number of mobile clients in mobile wireless networks. Recently, several indexing methods have been proposed to improve the access time and tuning time of XML query processing over XML streams. However, existing indexing methods cannot process twig pattern XML queries. In this paper, we propose a novel structure for streaming XML data, called PS+Pre/Post, which integrates the path summary technique and the pre/post labeling scheme. The proposed XML stream structure exploits the benefits of both to efficiently process different types of XML queries over the broadcast stream. Experimental results show that it improves access time and tuning time when processing different types of XML queries.
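The pre/post half of PS+Pre/Post rests on a standard property: element a is a proper ancestor of element d exactly when a precedes d in preorder and follows it in postorder. A minimal sketch using Python's `xml.etree.ElementTree`, with hypothetical function names, is below; the path summary component and the broadcast layout are not modelled.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

Labels = Tuple[Dict[ET.Element, int], Dict[ET.Element, int]]


def prepost_labels(root: ET.Element) -> Labels:
    """Number elements in preorder and postorder with one depth-first traversal."""
    pre: Dict[ET.Element, int] = {}
    post: Dict[ET.Element, int] = {}

    def visit(elem: ET.Element) -> None:
        pre[elem] = len(pre)  # next preorder number
        for child in elem:
            visit(child)
        post[elem] = len(post)  # next postorder number

    visit(root)
    return pre, post


def is_ancestor(labels: Labels, a: ET.Element, d: ET.Element) -> bool:
    """a is a proper ancestor of d iff a starts before d and finishes after it."""
    pre, post = labels
    return pre[a] < pre[d] and post[a] > post[d]
```

Each test is two integer comparisons, so ancestor checks need no access to the document tree once the labels are broadcast.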

6.

A Twig Pattern Matching Algorithm Based on Ordered Pairs

王瑞, 陶世群 《Journal of Computer Research and Development》2009,46(Z2)

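As semi-structured data become increasingly important in information exchange, researchers have in recent years proposed many algorithms for matching twig queries in XML databases. These algorithms are efficient for queries containing only ancestor-descendant edges, but when a query contains both ancestor-descendant and parent-child edges, earlier algorithms may still produce large numbers of intermediate results, especially when the input and output are large. To avoid producing intermediate results, a new algorithm, OPTwig, is proposed. It is based on ordered pairs: it evaluates a query by matching ordered pairs of nodes in the query tree and the document tree, and it requires no merge operation. The results show that the algorithm outperforms previous algorithms.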

7.

Query Processing on XML Data with Poor-Quality Labels

姜国华, 姜守旭, 王宏志, 李建中, 高宏 《Journal of Frontiers of Computer Science and Technology》2011,5(8):673-685

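Poor-quality data in XML, such as incorrect, inconsistent and imprecise data, make effective query processing on XML data challenging. This paper focuses on processing twig queries over XML data with poor-quality labels. It gives methods for obtaining, for each label, its similar labels (labels with similar spelling, relaxed labels and synonymous labels), together with an efficient algorithm for finding in an XML document all query results similar to the original query. Experiments demonstrate the effectiveness and efficiency of the proposed methods.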
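The abstract does not say how labels with similar spelling are found. One common choice, assumed here purely for illustration, is Levenshtein edit distance with a small threshold over the set of tags occurring in the document; relaxed and synonymous labels would need other sources (for example a thesaurus) and are not shown. Function names are hypothetical.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def spelling_similar(label: str, vocabulary: Iterable[str], max_dist: int = 1) -> List[str]:
    """Tags from `vocabulary` within `max_dist` edits of `label` (excluding itself)."""
    return [t for t in vocabulary if t != label and edit_distance(label, t) <= max_dist]
```

For example, with the hypothetical tag set `["author", "title", "autor", "actor"]`, the misspelled label `autor` would be matched against `author` and `actor`.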

8.

Compressed Data Cube for Approximate OLAP Query Processing (total citations: 4; self-citations: 0; citations by others: 4)

冯玉, 王珊 《Journal of Computer Science and Technology》2002,17(5):0-0

Approximate query processing has emerged as an approach to dealing with huge data volumes and complex queries in data warehouse environments. In this paper, we present a novel method that provides approximate answers to OLAP queries. Our method builds a compressed (approximate) data cube with a clustering technique and answers queries directly from this compressed cube, which improves query performance. We also provide the algorithm for answering OLAP queries and the confidence intervals of query results. An extensive experimental study with the OLAP Council benchmark shows the effectiveness and scalability of our cluster-based approach compared with sampling.

9.

Processing Complex Twig Pattern Queries over XML Stream Data (total citations: 2; self-citations: 0; citations by others: 2)

杨卫东, 王清明, 施伯乐 《Journal of Software》2007,18(4):893-904

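XML stream processing has attracted broad interest in the research community. This paper proposes a new method for processing complex twig pattern queries with nested AND/OR predicates over XML stream data. To improve query-processing performance, all twig patterns are merged into a single prefix-sharing query tree in which the AND/OR predicates are represented as separate abstract syntax trees. Complex twig patterns can therefore be matched in document order in a single pass, avoiding the intermediate results that YFilter produces when it post-processes nested predicates. Experimental results show that the method effectively improves twig pattern processing performance, especially for large documents. Based on …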
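The prefix-sharing idea can be illustrated with a deliberately simplified sketch: linear, child-axis-only paths from the document root are merged into one trie, and a single pass over start/end events keeps a stack of active trie nodes. Nested AND/OR predicates, descendant edges and the abstract syntax trees described in the abstract are not modelled, and all class and function names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, List


class TrieNode:
    """One node of the shared query tree; query_ids lists queries ending here."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.query_ids: List[int] = []


def build_shared_tree(queries: List[List[str]]) -> TrieNode:
    """Merge root-to-leaf tag paths so that common prefixes are stored once."""
    root = TrieNode()
    for qid, steps in enumerate(queries):
        node = root
        for tag in steps:
            node = node.children.setdefault(tag, TrieNode())
        node.query_ids.append(qid)
    return root


def count_nodes(node: TrieNode) -> int:
    return 1 + sum(count_nodes(c) for c in node.children.values())


def match_stream(root: TrieNode, xml_bytes: bytes) -> Counter:
    """Single pass: count how many elements satisfy each child-axis path query."""
    hits: Counter = Counter()
    stack: List[List[TrieNode]] = [[root]]  # active trie states per open element
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
        if event == "start":
            active = [n.children[elem.tag] for n in stack[-1] if elem.tag in n.children]
            for n in active:
                hits.update(n.query_ids)
            stack.append(active)
        else:
            stack.pop()
    return hits
```

With the three queries `a/b`, `a/b/c` and `a/d`, the shared tree stores 5 nodes instead of 7 separate steps plus roots, and each element of the stream is examined exactly once.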

10.

Continually Answering Constraint k-NN Queries in Unstructured P2P Systems

Bin Wang, Xiao-Chun Yang, Guo-Ren Wang, Ge Yu, Lei Chen, X. Sean Wang and Xue-Min Lin 《Journal of Computer Science and Technology》2008,23(4):538-556

We consider the problem of efficiently computing distributed geographical k-NN queries in an unstructured peer-to-peer (P2P) system, in which each peer is managed by an individual organization and can communicate only with its logical neighboring peers. Such queries rely on local filter query statistics and must keep communication cost as low as possible, which makes them harder to answer than existing distributed k-NN queries. In particular, we aim to reduce both the number of candidate peers and the communication cost. In this paper, we propose an efficient pruning technique that minimizes the number of candidate peers processed to answer k-NN queries. Our approach is especially suitable for continuous k-NN queries under peer updates, including changing peer ranges, peers dynamically leaving or joining, and data updates within a peer. Simulation results show that the proposed approach outperforms existing Minimum Bounding Rectangle (MBR)-based query approaches, especially for continuous queries.
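A building block shared by many MBR-based k-NN schemes is the MINDIST test: a region whose minimum possible distance to the query point already exceeds the current k-th best distance cannot contribute an answer, so the peer owning it can be skipped. The sketch assumes each peer is summarized by one axis-aligned rectangle; it is a generic illustration, not the pruning technique proposed in the paper, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mindist(q: Point, r: Rect) -> float:
    """Smallest possible distance from q to any point inside rectangle r."""
    dx = max(r[0] - q[0], 0.0, q[0] - r[2])
    dy = max(r[1] - q[1], 0.0, q[1] - r[3])
    return math.hypot(dx, dy)


def candidate_peers(q: Point, regions: Dict[str, Rect], kth_best: float) -> List[str]:
    """Peers whose region might still hold something closer than the current k-th neighbour."""
    return sorted(peer for peer, r in regions.items() if mindist(q, r) <= kth_best)
```

As the k-th best distance shrinks during query processing, re-running this filter removes more peers from the candidate set.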

11.

An Adaptive Structural Index for Efficient XML Path Queries (total citations: 1; self-citations: 0; citations by others: 1)

张博, 耿志华, 周傲英 《Journal of Software》2009,20(7):1812-1824

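This paper proposes a new adaptive structural index, AS-Index (adaptive structural index), which overcomes the shortcomings of existing static and adaptive indexes and offers efficient query and adjustment performance. AS-Index is built on the F&B-Index, and its structure consists of the F&B-Index, a Query-Table and a Part-Table. The Query-Table records frequent queries, avoiding redundant operations during query processing; on top of it, a bottom-up query-processing procedure is proposed that makes full use of existing frequent queries to answer infrequent queries efficiently. The Part-Table is used to optimize queries containing ancestor-descendant edges, further improving query performance. Existing adaptive structural indexes adjust at the granularity of XML element nodes, and the adjustment often requires traversing the whole document. AS-Index instead adjusts incrementally at the level of F&B-Index nodes; the process is local and efficient, and it supports adjustment for complex branching queries. Experimental results show that AS-Index outperforms existing XML structural indexes in both query and adjustment performance. Moreover, compared with existing adaptive structural indexes, AS-Index scales better on large documents.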

12.

Indexing and querying XML using extended Dewey labeling scheme (total citations: 1; self-citations: 0; citations by others: 1)

Jiaheng Lu, Xiaofeng Meng, Tok Wang Ling 《Data & Knowledge Engineering》2011,70(1):35-59

Finding all occurrences of a tree pattern in an XML database is a core operation for efficient evaluation of XML queries. The Dewey labeling scheme is commonly used to label an XML document to facilitate XML query processing by recording information on the path of an element. To improve the efficiency of XML tree pattern matching, we introduce a novel labeling scheme, called extended Dewey, which extends the existing Dewey labeling scheme to combine the types and identifiers of elements in a label and to avoid scanning labels for internal query nodes, thereby reducing the I/O cost of query processing. Based on extended Dewey, we propose a series of holistic XML tree pattern matching algorithms. We first present TJFast to answer an XML twig pattern query. To efficiently answer a generalized XML tree pattern, we then propose GTJFast, an optimization that exploits non-output nodes. In addition, we propose TJFastTL and GTJFastTL, based on the tag + level data partition scheme, to further reduce I/O costs by level pruning. Finally, we report comprehensive experimental results showing that our XML tree pattern matching algorithms are superior to existing approaches in terms of the number of elements scanned, the size of intermediate results and query performance.
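The core trick of extended Dewey, as we understand it, is that each label component x of a child satisfies x mod n = k, where n is the number of distinct child tags its parent's tag may have (taken from the schema) and k is the position of the child's tag in that list. Tags along the whole root-to-node path can then be decoded from the label alone, while ancestor tests remain prefix tests. The sketch assumes a hand-written schema dict and ignores text nodes; all function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

# Hypothetical schema: each tag maps to the ordered list of distinct child tags
# it may contain (the paper derives this from the DTD; here it is hand-written).
Schema = Dict[str, List[str]]


def next_component(prev: Optional[int], k: int, n: int) -> int:
    """Smallest non-negative x with x % n == k that is greater than prev."""
    if prev is None:
        return k
    x = (prev // n) * n + k
    return x if x > prev else x + n


def assign_labels(root: ET.Element, schema: Schema) -> Dict[ET.Element, Tuple[int, ...]]:
    labels: Dict[ET.Element, Tuple[int, ...]] = {}

    def visit(elem: ET.Element, label: Tuple[int, ...]) -> None:
        labels[elem] = label
        child_tags = schema[elem.tag]
        prev: Optional[int] = None
        for child in elem:
            # The component encodes the child's tag: x % n is its index in child_tags.
            x = next_component(prev, child_tags.index(child.tag), len(child_tags))
            visit(child, label + (x,))
            prev = x

    visit(root, ())
    return labels


def decode_tags(label: Tuple[int, ...], root_tag: str, schema: Schema) -> List[str]:
    """Recover the tag of every element on the root-to-node path from the label alone."""
    tags = [root_tag]
    for x in label:
        child_tags = schema[tags[-1]]
        tags.append(child_tags[x % len(child_tags)])
    return tags


def is_ancestor(a: Tuple[int, ...], d: Tuple[int, ...]) -> bool:
    """As in plain Dewey, ancestors are proper prefixes."""
    return len(a) < len(d) and d[: len(a)] == a
```

For instance, with `book` having child tags `["title", "author"]`, a second `author` child after label component 1 receives component 3, since 3 mod 2 = 1 still identifies `author`.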

13.

Optimizing Twig Queries with Path-Partition Encoding (total citations: 1; self-citations: 1; citations by others: 0)

徐小双, 冯玉才, 王锋, 周英飚, 张俊 《Computer Science》2010,37(3):182-187,204

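Storing and querying XML documents effectively has become a research focus in the database field. Starting from path statistics of XML documents, this paper proposes a path-partition storage encoding scheme and uses it to eliminate descendant edges and wildcards from twig queries. For the resulting twig queries without // and *, it exploits properties of the path-partition encoding to give a twig query algorithm based on structurally constrained nodes, which greatly reduces the number of structural joins. Experiments show that the algorithm effectively filters out irrelevant elements and improves twig query efficiency.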

14.

Efficient Aggregation Algorithms over XML Data Streams

王宏志, 李建中, 骆吉洲 《Journal of Software》2008,19(8):2032-2042

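A defining characteristic of XML data streams is that every element and value may be scanned only once. To address aggregation over XML data streams, efficient XML stream aggregation algorithms are proposed. The algorithms efficiently support not only aggregate queries with complex structure over XML data streams but also aggregate queries over XML data streams with recursive structure. Theoretical analysis and experimental results show that the algorithms process aggregate queries over XML data streams efficiently and scale well.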
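The one-scan constraint and the recursive-structure case can be illustrated with a small sketch built on `xml.etree.ElementTree.iterparse`. It computes a SUM over the numeric text of value elements lying anywhere below a scope element; a counter of currently open scope elements ensures that values under nested (recursive) scopes are counted once. The tag names, the choice of SUM and the function name are assumptions; this is not the paper's algorithm.

```python
import io
import xml.etree.ElementTree as ET


def stream_sum(xml_bytes: bytes, scope_tag: str, value_tag: str) -> float:
    """One pass over the document: sum the numeric text of every <value_tag>
    element that has at least one <scope_tag> ancestor."""
    open_scopes = 0  # how many scope elements are currently open (handles recursion)
    total = 0.0
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
        if event == "start":
            if elem.tag == scope_tag:
                open_scopes += 1
            continue
        # "end" event: the element's own text is complete at this point
        if elem.tag == value_tag and open_scopes > 0 and elem.text and elem.text.strip():
            total += float(elem.text)
        if elem.tag == scope_tag:
            open_scopes -= 1
        elem.clear()  # release the finished element's content
    return total
```

In a document where one `order` is nested inside another, a `price` under the inner `order` is still added once, because only the open-scope count (not the nesting depth) is consulted.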

15.

Efficient Processing of Distributed Twig Queries Based on Node Distribution

Xin Bi, Xiang-Guo Zhao, Guo-Ren Wang 《Journal of Computer Science and Technology》2017,32(1):78-92

Massive XML data are increasingly generated for the representation, storage and exchange of web information, and twig query processing over massive XML data has become a research focus. However, most traditional algorithms cannot be directly implemented in a distributed manner. Some existing distributed algorithms generate many useless intermediate results and, in most cases, execute many join operations on partial results; others require a priori knowledge of the query pattern before XML partitioning, storage and query processing, which is impractical for large-scale data or frequently arriving new queries. To improve efficiency and scalability, in this paper we propose DisT3, a 3-phase distributed algorithm based on a node distribution mechanism that avoids unnecessary intermediate results. Furthermore, we propose ReP, a lightweight local index, together with an enhanced XML partitioning approach that works with arbitrary partitioning strategies, and based on ReP we propose an improved 2-phase distributed algorithm, DisT2ReP, that further reduces communication cost. We analyze the performance guarantees and conduct extensive experiments to verify the efficiency and scalability of the proposed algorithms in distributed twig query applications.

16.

Querying and ranking incomplete twigs in probabilistic XML

Jian Liu, Z. M. Ma, Li Yan 《World Wide Web》2013,16(3):325-353

As the next-generation language of the Internet, XML has become the de facto standard for information exchange over the web. A core operation in XML query processing is finding all occurrences of a twig pattern in an XML database. In addition, probabilistic data has become an emerging topic for various web applications, so combining XML twig patterns with probabilistic data is of considerable significance. In prior work on probabilistic XML, the answers to a given twig query are always complete. However, complete answers with low probabilities may be deemed irrelevant, while incomplete answers with high probabilities can be highly significant because they may be potential answers that interest users. Unlike complete evaluation, evaluating incomplete twigs in probabilistic XML introduces new challenges. On the one hand, incomplete queries return not only complete matches but also answers containing many incomplete matches. On the other hand, incomplete evaluation is more complicated to process, so a ranking approach must be adopted alongside the evaluation of incomplete answers. In this paper, we propose an efficient algorithm for querying incomplete twigs over a probabilistic XML database. We also present a novel algorithm for ranking the incomplete answers. The experimental results show that our proposed algorithms significantly improve the performance of querying and ranking incomplete twigs.

17.

A query index for continuous queries on RFID streaming data

Jaekwan Park, Bonghee Hong, Chaehoon Ban 《Science in China Series F: Information Sciences》2008,51(12):2047-2061

RFID middleware collects and filters RFID streaming data to process applications' requests, called continuous queries because they are executed continuously during tag movement. Several approaches that build an index on queries rather than on data records, called a query index, have been proposed to evaluate continuous queries over streaming data. EPCglobal proposed the Event Cycle Specification (ECSpec) model, which is a de facto standard query interface for RFID applications. Continuous queries based on ECSpec consist of a large number of segments that represent the query conditions. The problem with using any existing query index for these continuous queries is that building the index takes a long time, because a large number of segments must be inserted into it. To solve this problem, we propose a transform method that converts a group of segments into compressed data, together with an efficient query index scheme for the transformed space. Compared with existing query indexes, the proposed index performs better on various datasets.
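The abstract does not describe its transform. As a hypothetical stand-in, assume each ECSpec segment is an inclusive range of integer tag identifiers: merging a query's overlapping or adjacent segments before indexing shrinks the number of insertions, and matching a tag becomes one binary search per query. The integer-range model, the class and the function names are assumptions, not the paper's index.

```python
import bisect
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # inclusive range of integer tag IDs (assumed model)


def compress(segments: List[Segment]) -> List[Segment]:
    """Merge overlapping or adjacent segments into sorted, disjoint ones."""
    merged: List[Segment] = []
    for lo, hi in sorted(segments):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


class SegmentQueryIndex:
    """Hypothetical query index: per query, disjoint segments plus their start points."""

    def __init__(self, queries: Dict[str, List[Segment]]) -> None:
        self.segments = {q: compress(segs) for q, segs in queries.items()}
        self.starts = {q: [lo for lo, _ in segs] for q, segs in self.segments.items()}

    def matching_queries(self, tag_id: int) -> List[str]:
        """Queries with a segment containing tag_id (binary search per query)."""
        hits = []
        for q, segs in self.segments.items():
            i = bisect.bisect_right(self.starts[q], tag_id) - 1  # last segment starting <= tag_id
            if i >= 0 and tag_id <= segs[i][1]:
                hits.append(q)
        return sorted(hits)
```

Because segments are disjoint after compression, the only segment that can contain a tag is the last one starting at or before it, which is why a single `bisect_right` per query suffices.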

18.

An Efficient Processing Algorithm for PSTP Queries

周军锋, 李义国, 郭景峰 《Journal of Frontiers of Computer Science and Technology》2010,4(11):1039-1048

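When users retrieve information from XML documents with "queries with incomplete structural constraints" (PSTP queries), they can flexibly embed structural constraints in the query expression according to how familiar they are with the document structure, which meets the needs of users who know none, all or only part of the structure. This paper proposes a query-processing algorithm, EDPS, based on extended Dewey encoding, which can process PSTP queries of any form while scanning the elements only once. Experimental results on different datasets show that, overall, EDPS clearly outperforms existing methods on twig queries, on PSTP queries without "*" nodes and on PSTP queries with "*" nodes.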

19.

Ways to sparse representation: An overview (total citations: 1; self-citations: 0; citations by others: 1)

JingYu Yang, YiGang Peng, WenLi Xu, QiongHai Dai 《Science in China Series F: Information Sciences》2009,52(4):695-703

Many algorithms have been proposed to find sparse representations over redundant dictionaries or transforms. This paper gives an overview of these algorithms by classifying them into three categories: greedy pursuit algorithms, l_p-norm regularization-based algorithms, and iterative shrinkage algorithms. We summarize their pros and cons as well as their connections. Based on recent evidence, we conclude that the algorithms of the three categories share the same root: the l_p-norm regularized inverse problem. Finally, several topics that deserve further investigation are discussed. Supported by the Joint Research Fund for Overseas Chinese Young Scholars of the National Natural Science Foundation of China (Grant No. 60528004) and the Key Project of the National Natural Science Foundation of China (Grant No. 60528004)
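Of the three families the overview names, iterative shrinkage is the most compact to write down. Below is a minimal pure-Python sketch of iterative shrinkage-thresholding (ISTA) for the l_1 case (p = 1), minimizing 0.5*||Ax - b||^2 + lam*||x||_1. It assumes the step size satisfies step <= 1/L, where L is the largest eigenvalue of A^T A; the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|x|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(A: Matrix, b: List[float], lam: float, step: float, iters: int = 500) -> List[float]:
    """Iterative shrinkage-thresholding for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]  # Ax - b
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]     # A^T(Ax - b)
        # gradient step on the smooth term, then shrinkage for the l_1 term
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(n)]
    return x
```

With A equal to the identity, one iteration already yields the soft-thresholded vector soft(b, lam), which makes the "shrinkage" in the family's name concrete: small coefficients are set exactly to zero.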

20.

Study and application of temporal index technology

XiaoPing Ye, Yong Tang, LuoWu Chen, Huan Guo, Jun Zhu, KaiYuan Chen 《Science in China Series F: Information Sciences》2009,52(6):899-913

This paper studies mathematical relations on a set of periods, the construction of temporal indexes, and their applications. First, we introduce two concepts, temporal connection and temporal inclusion, which are an equivalence relation and a preorder, respectively. Second, by studying basic topics such as the division of “large” equivalence classes and the overlaps of preorder relational sets, we propose a temporal data index model (TDIM) with a tree structure consisting of a root node, equivalence class nodes and linearly ordered branch nodes. Third, we study algorithms for temporal querying, incremental updating and dynamic management within the TDIM framework. With this mathematical foundation, TDIM can be applied to significant practical cases such as temporal relational data and temporal XML data. Supported by the National Natural Science Foundation of China (Grant Nos. 60373081, 60673135), the Natural Science Foundation of Guangdong Province (Grant No. 05003348), and the Program of New Century Excellent Person Supporting of the Ministry of Education of China (Grant No. NCET-04-0805)
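The abstract does not define temporal connection precisely. Assuming, for illustration only, that two closed periods are connected when they overlap and that the relation is closed under transitivity (which is what would make it an equivalence relation), its equivalence classes can be computed with one sort-and-sweep pass, and temporal inclusion reduces to a containment test. This is a hypothetical sketch, not the TDIM construction, and the function names are invented.

```python
from typing import List, Optional, Tuple

Period = Tuple[int, int]  # closed period [start, end] with start <= end


def connection_classes(periods: List[Period]) -> List[List[Period]]:
    """Group periods into classes of the transitive closure of 'overlaps'
    (assumed definition of temporal connection), by sorting and sweeping."""
    classes: List[List[Period]] = []
    reach: Optional[int] = None  # latest end point seen in the current class
    for p in sorted(periods):
        if reach is not None and p[0] <= reach:
            classes[-1].append(p)
            reach = max(reach, p[1])
        else:
            classes.append([p])
            reach = p[1]
    return classes


def included_in(p: Period, q: Period) -> bool:
    """Temporal inclusion (assumed definition): p lies within q."""
    return q[0] <= p[0] and p[1] <= q[1]
```

Note that [1, 3] and [4, 6] do not overlap, yet they fall into the same class through [2, 5]; this transitive chaining is why a large class may need to be divided, as the abstract discusses.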