Similar Documents
20 similar documents found.
1.
2.
Fast detection of XML structural similarity
Because of the widespread diffusion of semistructured data in XML format, much research effort is currently devoted to supporting the storage and retrieval of large collections of such documents. XML documents can be compared by structural similarity in order to group them into clusters, so that different storage, retrieval, and processing techniques can be exploited effectively. In this scenario, an efficient and effective similarity function is the key to a successful data management process. We present an approach for detecting structural similarity between XML documents that differs significantly from standard methods based on graph-matching algorithms and allows a substantial reduction of the required computation costs. Our proposal consists of linearizing the structure of each XML document by representing it as a numerical sequence and then comparing such sequences through the analysis of their frequencies. First, some basic strategies for encoding a document are proposed, which can focus on diverse structural facets. Then, the theory of the discrete Fourier transform is exploited to compare the encoded documents (i.e., signals) effectively and efficiently in the frequency domain. Experimental results demonstrate the effectiveness of the approach, also in comparison with standard methods.
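The paper's encoding schemes are only summarized above, so the sketch below is a minimal, assumption-laden illustration of the overall pipeline rather than the authors' method: each document is linearized by mapping tags to numbers (with an invented depth weighting), and two documents are compared through the magnitudes of their discrete Fourier transforms.

```python
# Sketch of frequency-domain structural comparison for XML documents.
# The actual encoding in the paper may differ; the tag numbering and the
# depth weighting below are illustrative assumptions.
import numpy as np
import xml.etree.ElementTree as ET

def encode(doc_xml, tag_ids):
    """Linearize an XML document into a numeric sequence (one value per element)."""
    root = ET.fromstring(doc_xml)
    seq = []
    def visit(node, depth):
        tag_ids.setdefault(node.tag, len(tag_ids) + 1)
        seq.append(tag_ids[node.tag] * (depth + 1))  # assumed depth weighting
        for child in node:
            visit(child, depth + 1)
    visit(root, 0)
    return np.array(seq, dtype=float)

def structural_distance(xml_a, xml_b):
    """Compare two encoded documents via the magnitudes of their DFTs."""
    tag_ids = {}
    a, b = encode(xml_a, tag_ids), encode(xml_b, tag_ids)
    n = max(len(a), len(b))                  # zero-pad to a common length
    fa = np.abs(np.fft.fft(a, n))
    fb = np.abs(np.fft.fft(b, n))
    return np.linalg.norm(fa - fb) / n

print(structural_distance("<a><b/><c/></a>", "<a><b/><b/><c/></a>"))
```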

3.
The gap between storing data in relational databases and transferring data in the form of XML has been closed, e.g., by SQL/XML queries that generate XML data from relational data sources. However, only a few relational database systems support the evaluation of SQL/XML queries, and even in those systems the evaluation of such queries is quite slow compared to the evaluation of plain SQL queries. In this paper, we present S2CX, an approach that efficiently evaluates SQL/XML queries on any relational database system, whether or not it supports SQL/XML. As the result of an SQL/XML query, S2CX supports different output formats ranging from plain XML to various compressed XML representations, including a succinct encoding of XML data, schema-aware compressed XML, and grammar-compressed XML. In many cases, S2CX produces compressed XML as the result of an SQL/XML query even faster than the evaluation of SQL/XML queries into non-compressed XML as provided by Oracle 11g and DB2. Furthermore, our approach to query evaluation scales better: the larger the dataset, the faster our approach is compared to SQL/XML query evaluation in Oracle 11g and DB2.
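As background, the hedged snippet below shows the kind of SQL/XML query S2CX is meant to evaluate: standard publishing functions (XMLELEMENT, XMLAGG, XMLFOREST) that build nested XML from relational rows. The table and column names are invented for illustration, and the exact dialect accepted by a given system may differ in details.

```python
# Hypothetical SQL/XML query of the kind S2CX evaluates: it publishes
# relational rows as nested XML using the standard XMLELEMENT/XMLAGG/XMLFOREST
# publishing functions. Table and column names are invented for illustration.
SQLXML_QUERY = """
SELECT XMLELEMENT(NAME "department",
         XMLELEMENT(NAME "name", d.dname),
         XMLAGG(XMLELEMENT(NAME "employee",
                  XMLFOREST(e.ename AS "name", e.sal AS "salary"))))
FROM dept d JOIN emp e ON e.deptno = d.deptno
GROUP BY d.dname
"""

# With a DB-API driver (the `db` connection below is hypothetical), the query
# would be evaluated like any other SQL statement:
# cur = db.cursor(); cur.execute(SQLXML_QUERY); xml_rows = cur.fetchall()
```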

4.
Traditional High-Performance Computing (HPC) based big-data applications are usually constrained by having to move large amounts of data to compute facilities for real-time processing. Modern HPC systems, represented by High-Throughput Computing (HTC) and Many-Task Computing (MTC) platforms, on the other hand, aim to achieve the long-held dream of moving compute to data instead. This kind of data-aware scheduling, typically represented by Hadoop MapReduce, has been successfully implemented in the Map phase, whereby each Map task is sent to the compute node where the corresponding input data chunk is located. However, Hadoop MapReduce limits itself to a one-map-to-one-reduce framework, which makes it difficult to handle complex logic such as pipelines or workflows. Meanwhile, it lacks built-in support and optimization when the input datasets are shared among multiple applications and/or jobs. Performance can be improved significantly when knowledge of the shared and frequently accessed data is taken into scheduling decisions. To enhance the capability of managing workflows on modern HPC systems, this paper presents CloudFlow, a Hadoop MapReduce based programming model for cloud workflow applications. CloudFlow is built on top of MapReduce and is not only data-aware but also shared-data-aware. It identifies the most frequently shared data, at both the task level and the job level, and replicates them to each compute node for data locality. It also supports user-defined multiple Map and Reduce functions, allowing users to orchestrate the required data-flow logic. We prove the correctness of the whole scheduling framework through theoretical analysis. Furthermore, experimental evaluation shows that execution achieves a speedup of more than 4X over a traditional MapReduce implementation, with a manageable time overhead.
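CloudFlow's actual API is not shown in the abstract; the toy Python sketch below only illustrates the underlying ideas of chaining several user-defined map/reduce functions into a data-flow pipeline and of caching a frequently shared dataset so it is loaded once. All names are illustrative.

```python
# Toy multi-stage map/reduce pipeline with a crude cache for shared data.
# This mimics the idea behind CloudFlow (shared-data-aware, multiple
# user-defined Map/Reduce functions); it is not CloudFlow's API.
from collections import defaultdict

SHARED_CACHE = {}  # stands in for shared data replicated to every compute node

def load_shared(name, loader):
    """Load a frequently shared dataset once and reuse it across stages/jobs."""
    if name not in SHARED_CACHE:
        SHARED_CACHE[name] = loader()
    return SHARED_CACHE[name]

def run_stage(records, map_fn, reduce_fn):
    groups = defaultdict(list)
    for rec in records:
        for key, value in map_fn(rec):                     # user-defined map
            groups[key].append(value)
    return [reduce_fn(k, vs) for k, vs in groups.items()]  # user-defined reduce

docs = ["a rose is a rose", "is a rose a rose"]
stop = load_shared("stopwords", lambda: {"is"})            # shared input, loaded once

# Stage 1: word count (minus shared stopwords); Stage 2: histogram of counts.
counts = run_stage(docs,
                   lambda doc: [(w, 1) for w in doc.split() if w not in stop],
                   lambda w, ones: (w, sum(ones)))
histogram = run_stage(counts,
                      lambda wc: [(wc[1], 1)],
                      lambda c, ones: (c, sum(ones)))
print(counts)      # [('a', 4), ('rose', 4)]
print(histogram)   # [(4, 2)]
```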

5.
An important challenge for the adoption of cloud computing in the scientific community remains the efficient allocation and execution of data-intensive scientific workflows to reduce execution time and the size of transferred data. The data transfer overhead is becoming significant for emerging scientific workflows whose input/output files and intermediate data products range in the hundreds of gigabytes. The allocation of scientific workflows on public clouds can be described through a variety of perspectives and parameters, and has been proven to be NP-complete. This paper proposes an evolutionary approach for task allocation on public clouds that considers data transfer and execution time. In our framework, a solution is represented by an allocation chromosome that encodes the allocation of tasks to nodes, and an ordering chromosome that defines the execution order according to the scientific workflow representation. We propose a multi-objective optimization that relies on a cloud cost model and employs tailored evolution operators. Starting from a population of possible solutions, we apply crossover and mutation operators on both chromosomes, aiming to optimize the data transferred between nodes as well as the total workflow runtime. The crossover operators combine parts of solutions to reduce data overhead, whereas the mutation operators swap parts of the same chromosome according to pre-defined rules. Our experimental study compares the proposed approach with current state-of-the-art approaches using synthetic and real-life workflows. Our algorithm performs similarly to existing heuristics for small workflows and shows up to 80% improvement for larger synthetic workflows. To further validate our approach, we compare the allocation and scheduling obtained by our approach with those obtained by popular scientific workflow managers when real workflows with hundreds of tasks are executed on a public cloud. The results show a 10% improvement in runtime over existing schedulers, caused by an 80% reduction in transferred data and by optimized allocation and ordering of tasks. This improved data locality has greater impact, as it can be employed to improve and study data provenance and to facilitate data persistence for scientific workflows.
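The paper's tailored operators and cost model are only summarized above; the miniature sketch below shows the two-chromosome representation with an invented cost function mixing runtime and cross-node transfer, and uses a mutation-only evolution loop. The paper additionally applies crossover on both chromosomes and respects workflow dependency constraints, both of which are omitted here for brevity.

```python
# Minimal two-chromosome GA for workflow task allocation. The cost model,
# parameters, and mutation-only loop are illustrative assumptions, not the
# paper's tailored evolution operators; dependency constraints are ignored.
import random

TASKS, NODES = 6, 3
runtime = [[random.uniform(1, 5) for _ in range(NODES)] for _ in range(TASKS)]
data_out = [random.uniform(0, 2) for _ in range(TASKS)]   # GB produced per task

def cost(alloc, order):
    # assumed objective: execution time plus a penalty for cross-node transfers
    time = sum(runtime[t][alloc[t]] for t in order)
    transfer = sum(data_out[a] for a, b in zip(order, order[1:]) if alloc[a] != alloc[b])
    return time + 2.0 * transfer

def individual():
    alloc = [random.randrange(NODES) for _ in range(TASKS)]   # allocation chromosome
    order = random.sample(range(TASKS), TASKS)                # ordering chromosome
    return alloc, order

def mutate(alloc, order):
    alloc, order = alloc[:], order[:]
    alloc[random.randrange(TASKS)] = random.randrange(NODES)  # reassign one task
    i, j = random.sample(range(TASKS), 2)
    order[i], order[j] = order[j], order[i]                   # swap two positions
    return alloc, order

population = [individual() for _ in range(30)]
for _ in range(100):
    population.sort(key=lambda ind: cost(*ind))
    elite = population[:10]
    population = elite + [mutate(*random.choice(elite)) for _ in range(20)]

best = min(population, key=lambda ind: cost(*ind))
print("best cost:", round(cost(*best), 2))
```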

6.
The distributed nature of the Web, as a decentralized system exchanging information between heterogeneous sources, has underlined the need to manage interoperability, i.e., the ability to automatically interpret information in Web documents exchanged between different sources, which is necessary for efficient information management and search applications. In this context, XML was introduced as a data representation standard that simplifies the tasks of interoperation and integration among heterogeneous data sources, allowing data to be represented in (semi-)structured documents consisting of hierarchically nested elements and atomic attributes. However, while XML has proven most effective for exchanging data, i.e., for syntactic interoperability, it is limited when it comes to handling semantics, i.e., semantic interoperability, since it only specifies the syntactic and structural properties of the data without any further semantic meaning. As a result, XML semantic-aware processing has become a motivating challenge in Web data management, requiring dedicated semantic analysis and disambiguation methods to assign well-defined meaning to XML elements and attributes. In this context, most existing approaches: (i) ignore the problem of identifying ambiguous XML elements/nodes, (ii) only partially consider their structural relationships/context, (iii) use syntactic information in processing XML data regardless of the semantics involved, and (iv) are static in adopting fixed disambiguation constraints, thus limiting user involvement. In this paper, we provide a new XML Semantic Disambiguation Framework, titled XSDF, designed to address each of the above limitations, taking an XML document as input and producing as output a semantically augmented XML tree made of unambiguous semantic concepts extracted from a reference machine-readable semantic network. XSDF consists of four main modules for: (i) linguistic pre-processing of simple/compound XML node labels and values, (ii) selecting ambiguous XML nodes as targets for disambiguation, (iii) representing target nodes as special sphere neighborhood vectors including all XML structural relationships within a (user-chosen) range, and (iv) running context vectors through a hybrid disambiguation process combining two approaches, concept-based and context-based disambiguation, allowing the user to tune disambiguation parameters to her needs. Conducted experiments demonstrate the effectiveness and efficiency of our approach in comparison with alternative methods. We also discuss some practical applications of our method, ranging over semantic-aware query rewriting, semantic document clustering and classification, mobile and Web services search and discovery, as well as blog analysis and event detection in social networks and tweets.
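To make the disambiguation idea concrete, the toy sketch below performs context-based disambiguation only: a context vector is built from the labels in a node's structural neighborhood and compared, via cosine similarity, against gloss vectors of candidate senses. The tiny sense inventory stands in for the reference machine-readable semantic network and is purely an assumption, not XSDF's actual resources.

```python
# Toy context-based disambiguation for an ambiguous XML node label, in the
# spirit of XSDF's context vectors. The miniature "sense inventory" below is
# an illustrative stand-in for a real machine-readable semantic network.
import math
from collections import Counter

SENSES = {  # ambiguous label -> candidate senses with gloss words
    "mouse": {
        "mouse#animal": "rodent small mammal tail cheese",
        "mouse#device": "computer pointing device click button usb",
    }
}

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def disambiguate(label, context_labels):
    """Pick the sense whose gloss is closest to the node's structural context."""
    ctx = Counter(w for lbl in context_labels for w in lbl.lower().split())
    best_sense, _ = max(SENSES[label].items(),
                        key=lambda kv: cosine(ctx, Counter(kv[1].split())))
    return best_sense

# Context gathered from ancestor/sibling/descendant labels within some range.
print(disambiguate("mouse", ["computer", "peripherals", "usb port", "click"]))
# -> 'mouse#device'
```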

7.
Cloud computing offers the massive scalability and elasticity required by many scientific and commercial applications. Combining the computational and data-handling capabilities of clouds with parallel processing also has the potential to tackle Big Data problems efficiently. Science gateway frameworks and workflow systems enable application developers to implement complex applications and make them available to end-users via simple graphical user interfaces. The integration of such frameworks with Big Data processing tools on the cloud opens new opportunities for application developers. This paper investigates how workflow systems and science gateways can be extended with Big Data processing capabilities. A generic approach based on infrastructure-aware workflows is suggested, and a proof of concept is implemented based on the WS-PGRADE/gUSE science gateway framework and its integration with the Hadoop parallel data processing solution, based on the MapReduce paradigm, in the cloud. The analysis demonstrates that the described methods for integrating Big Data processing with workflows and science gateways work well in different cloud infrastructures and application scenarios, and can be used to create massively parallel applications for scientific analysis of Big Data.

8.
End-to-end scientific application workflows that integrate high-end experiments and instruments with large-scale simulations and end-user displays are becoming increasingly important. These workflows require complex couplings and data sharing between distributed components, involve large data volumes, and present varying hard (in-time data delivery) and soft (in-transit processing) quality of service (QoS) requirements. As a result, supporting efficient data transport is critical for such workflows. In this paper, we leverage software-defined networking (SDN) to address issues of data transport service control and resource provisioning to meet varying QoS requirements from multiple coupled workflows sharing the same service medium. Specifically, we present a flexible control and disciplined resource scheduling approach for data transport services in science networks. Furthermore, we emulate an SDN testbed on top of the FutureGrid virtualized testbed and use it to evaluate our approach on a realistic scientific workflow. Our results show that SDN-based control and resource scheduling based on simple, intuitive models can meet the requirements of the targeted workflows with high resource utilization.

9.
Scientific workflow systems have been introduced in response to the demand of researchers from several domains of science who need to process and analyze increasingly large datasets. The design of these systems is largely based on the observation that data analysis applications can be composed as pipelines or networks of computations on data. In this work, we present a run-time support system designed to facilitate this type of computation in distributed computing environments. Our system is optimized for data-intensive workflows, in which efficient management and retrieval of data, coordination of data processing and data movement, and checkpointing of intermediate results are critical and challenging issues. Experimental evaluation of our system shows that linear speedups can be achieved for sophisticated applications, which are implemented as networks of multiple data processing components.

10.
Many scientific workflows are data intensive: large volumes of intermediate datasets are generated during their execution. Some valuable intermediate datasets need to be stored for sharing or reuse. Traditionally, they are selectively stored according to the system storage capacity, with the selection made manually. As doing science on the cloud has become popular, more intermediate datasets in scientific cloud workflows can be stored under different storage strategies based on a pay-as-you-go model. In this paper, we build an intermediate data dependency graph (IDG) from the data provenance in scientific workflows. With the IDG, deleted intermediate datasets can be regenerated, and on this basis we develop a novel algorithm that finds a minimum cost storage strategy for the intermediate datasets in scientific cloud workflow systems. The strategy achieves the best trade-off between computation cost and storage cost by automatically storing the most appropriate intermediate datasets in cloud storage. This strategy can be utilised on demand as a minimum cost benchmark for all other intermediate dataset storage strategies in the cloud. We utilise Amazon's cloud cost model and apply the algorithm to both general random workflows and a specific astrophysics pulsar searching workflow for evaluation. The results show that the benchmark effectively demonstrates cost-effectiveness relative to other representative storage strategies.
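The IDG-based algorithm itself is not reproduced here; the sketch below only illustrates the per-dataset trade-off it optimizes, comparing the monthly cost of storing an intermediate dataset against the amortized cost of regenerating it on demand. Prices, usage rates, and dataset names are invented placeholders, not Amazon's actual cost model.

```python
# Illustration of the store-vs-regenerate trade-off behind a minimum-cost
# storage strategy. Prices and usage figures are invented placeholders.
STORAGE_PRICE = 0.023     # $/GB-month (assumed)
COMPUTE_PRICE = 0.10      # $/CPU-hour (assumed)

def monthly_cost_if_stored(size_gb):
    return size_gb * STORAGE_PRICE

def monthly_cost_if_deleted(regen_cpu_hours, uses_per_month):
    # regeneration re-runs the computation from stored ancestors on each use
    return regen_cpu_hours * COMPUTE_PRICE * uses_per_month

def decide(name, size_gb, regen_cpu_hours, uses_per_month):
    store = monthly_cost_if_stored(size_gb)
    regen = monthly_cost_if_deleted(regen_cpu_hours, uses_per_month)
    choice = "store" if store <= regen else "delete and regenerate on demand"
    print(f"{name}: store=${store:.2f}/mo, regenerate=${regen:.2f}/mo -> {choice}")

decide("de-dispersed signal", size_gb=800, regen_cpu_hours=20, uses_per_month=4)
decide("candidate list",      size_gb=2,   regen_cpu_hours=50, uses_per_month=10)
```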

11.
High-performance computers have mainly been applied to traditional scientific computing, while in the cloud-computing era data-intensive applications have emerged as a major new class of applications and are becoming increasingly important. This work explores how to perform massive data processing efficiently on high-performance computers, so that they can support data-intensive applications well while carrying out scientific computing, thereby extending the application domains of high-performance computers. After analyzing the feasibility of implementing and deploying the MapReduce model on high-performance computers, experiments were conducted in a high-performance computing environment. The results show that the parallel I/O capability of the storage system cannot be fully exploited, which is the main bottleneck preventing the system from running efficiently. The cause of this performance bottleneck is the contention for and conflict over cluster file system resources brought about by high concurrency. Finally, several schemes for resolving cluster file system resource conflicts are proposed, which constitute directions for future research.

12.
The volume of XML data has become enormous and still grows very quickly, as much data is now encoded in XML by virtue of its simplicity and extensibility. While tree labeling algorithms play a crucial role in XML query processing, conventional algorithms are all sequential and therefore fail to label a large volume of XML data in a timely manner. To address this issue, we devise parallel tree labeling algorithms for massive XML data. Specifically, we focus on how to efficiently label a single large XML file in parallel. We first propose parallel versions of two prominent tree labeling schemes based on the MapReduce framework. We then present techniques for runtime workload balancing and data repartitioning to solve performance issues caused by data skew and MapReduce's inherent limitations. Through extensive experiments with synthetic and real-world datasets on 15 nodes, we show that our parallel labeling algorithms are up to 17 times faster than conventional algorithms and remain robust against data skew.
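The two labeling schemes parallelized in the paper are not named in the abstract; as an assumed example, the sketch below shows sequential Dewey-style prefix labeling, which is roughly the per-partition work a mapper would perform before a later phase fixes up label prefixes across partitions.

```python
# Sequential Dewey-style labeling of XML elements. In a MapReduce setting,
# each mapper would label its partition of the file and a later phase would
# adjust label prefixes across partitions; that coordination is omitted here.
import xml.etree.ElementTree as ET

def dewey_labels(xml_text):
    root = ET.fromstring(xml_text)
    labels = []
    def visit(node, label):
        labels.append((".".join(map(str, label)), node.tag))
        for i, child in enumerate(node, start=1):
            visit(child, label + [i])
    visit(root, [1])
    return labels

for label, tag in dewey_labels("<lib><book><title/></book><book/></lib>"):
    print(label, tag)
# 1 lib / 1.1 book / 1.1.1 title / 1.2 book
```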

13.
Optimizing different layers of the MapReduce stack has its own advantages and disadvantages. Addressing the optimization of MapReduce workloads, this paper first introduces the relevant concepts. It then describes cost-based optimization of MapReduce jobs and the techniques it relies on, contrasting it with rule-of-thumb (RoT) tuning, and evaluates cost-based optimization of MapReduce jobs. Based on the data-flow and resource dependencies within workflows, three workflow optimizers are proposed; cost-based workflow optimization is evaluated, and the workflow optimizers are evaluated end to end. Finally, experiments assess the optimization overhead of the workflow optimizers, and the strengths and weaknesses of the three optimizers are compared and analyzed.

14.
A Web service composition method supporting heterogeneous data integration
全立新  岳昆  刘惟一 《计算机应用》2007,27(6):1438-1441
Based on the "collaborator" data integration architecture, this work takes data queries in a network environment as the basic Web services and relational databases and XML documents as representative heterogeneous data sources. Building on existing query processing and XML data binding techniques, it presents a data integration model for the Web service environment. By defining the basic operations (services) of this model and describing the service composition process with a directed graph, it proposes a Web service composition method supporting heterogeneous data integration together with corresponding optimization strategies.

15.
16.
The problem of decentralized data sharing, which is relevant to a wide range of applications, is still a source of major theoretical and practical challenges, in spite of many years of sustained research. In this paper we focus on the challenge of efficient query evaluation in information integration systems that use the global-as-view approach, with the objective of developing query-processing strategies that are widely applicable and easy to implement in real-life applications. Our algorithms take into account important features of today's data sharing applications: XML as a likely interface or representation for data sources; the potential for information overlap across data sources; and the need for inter-source processing, as in joins of data across sources. The focus of this paper is on the performance-related characteristics of several alternative approaches that we propose for efficient query processing in information integration, including an approach that uses materialized restructured views. We use synthetic and real-life datasets in our implementation of an information integration system shell to provide experimental results demonstrating that our algorithms are efficient and competitive in the information integration setting. In addition, our experimental results allow us to make context-specific recommendations on selecting query-processing approaches from our proposed alternatives. As such, our approaches could form a basis for scalable query processing in information integration and interoperability in many practical settings.

17.
There has been a lot of research on MapReduce for big data analytics. This new class of systems sacrifices DBMS functionality such as query languages, schemas, or indexes in order to maximize scalability and parallelism. However, as high DBMS functionality is considered important for big data analytics as well, there have been many efforts to support DBMS functionality in MapReduce. HadoopDB is the only work that directly utilizes the DBMS for big data analytics in the MapReduce framework, taking advantage of both the DBMS and MapReduce. However, HadoopDB does not support sharability for the entire data, since it stores the data across multiple nodes in a shared-nothing manner, i.e., it partitions a job into multiple tasks where each task is assigned to a fragment of data. Due to this limitation, HadoopDB cannot effectively process queries that require inter-node communication. That is, HadoopDB needs to re-load the entire data to process some queries (e.g., 2-way joins) or cannot support some complex queries (e.g., 3-way joins). In this paper, we propose the new notion of a DFS-integrated DBMS, where a DBMS is tightly integrated with the distributed file system (DFS). By using the DFS-integrated DBMS, we obtain sharability of the entire data: any DBMS process in the system can access any data, since multiple DBMSs run on an integrated storage system in the DFS. To process big data analytics in parallel, our approach uses the MapReduce framework on top of a DFS-integrated DBMS. We call this framework PARADISE. In PARADISE, we employ a job splitting method that logically splits a job based on the predicate in the integrated storage system, in contrast to the physical splitting in HadoopDB. We also propose the notion of locality mapping for further optimization of logical splitting. We show that PARADISE effectively overcomes the drawbacks of HadoopDB, with the following strengths: (1) it has significantly faster (by up to 6.41 times) amortized query processing performance, since it obviates the need to re-load data as required in HadoopDB; (2) it supports query types more complex than the ones supported by HadoopDB.
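PARADISE's splitting and locality mapping are only described at a high level above; the toy sketch below merely illustrates the idea of logical splitting, where a job is divided into tasks by key-range predicates over the shared store rather than by physical data fragments. The query text, column names, and ranges are invented for illustration.

```python
# Sketch of "logical splitting": instead of physically partitioning the data,
# a job is split into tasks by key-range predicates that each DBMS process
# evaluates against the shared, DFS-resident store. Names/ranges are invented.
def split_job(base_query, key_column, key_min, key_max, num_tasks):
    step = (key_max - key_min + num_tasks) // num_tasks
    tasks = []
    for i in range(num_tasks):
        lo = key_min + i * step
        hi = min(lo + step - 1, key_max)
        tasks.append(f"{base_query} WHERE {key_column} BETWEEN {lo} AND {hi}")
        if hi == key_max:
            break
    return tasks

for task_sql in split_job("SELECT * FROM lineitem", "l_orderkey", 1, 6_000_000, 4):
    print(task_sql)
```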

18.
Hadoop MapReduce has evolved into an important industry standard for massively parallel data processing and has become widely adopted for a variety of use cases. Recent works have shown that indexes can improve the performance of selective MapReduce jobs dramatically. However, one major weakness of existing approaches is the high index creation cost. We present HAIL (Hadoop Aggressive Indexing Library), a novel indexing approach for HDFS and Hadoop MapReduce. HAIL creates different clustered indexes over terabytes of data with minimal, often invisible costs, and it dramatically improves the runtimes of several classes of MapReduce jobs. HAIL features two different indexing pipelines, static indexing and adaptive indexing. HAIL static indexing efficiently indexes datasets while uploading them to HDFS. To do so, HAIL leverages the default replication of Hadoop and enhances it with logical replication. This allows HAIL to create multiple clustered indexes for a dataset, e.g., one for each physical replica. Still, in terms of upload time, HAIL matches or even improves over the performance of standard HDFS. Additionally, HAIL adaptive indexing allows for automatic, incremental indexing at job runtime with minimal runtime overhead. For example, HAIL adaptive indexing can completely index a dataset as a byproduct of only four MapReduce jobs while incurring an overhead as low as 11% for the very first of those jobs only. In our experiments, we show that HAIL improves job runtimes by up to 68× over Hadoop. This article is an extended version of the VLDB 2012 paper (Dittrich et al. in PVLDB 5(11):1591–1602, 2012).
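As a rough intuition for what a clustered index buys here, the sketch below keeps the records of one data block sorted on a chosen attribute so that a selective job can binary-search instead of scanning. HAIL creates such clustered indexes per physical replica during upload; this in-memory toy does not attempt to reproduce the HDFS integration.

```python
# Toy per-block "clustered index": records in a block are kept sorted on one
# attribute so selections can binary-search instead of scanning the block.
# The attribute choice per replica and the HDFS machinery are not modeled.
import bisect

class IndexedBlock:
    def __init__(self, records, attr):
        self.attr = attr
        self.records = sorted(records, key=lambda r: r[attr])   # clustered on attr
        self.keys = [r[attr] for r in self.records]

    def select(self, lo, hi):
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_right(self.keys, hi)
        return self.records[i:j]

block = IndexedBlock([{"id": k, "price": (k * 37) % 100} for k in range(1000)], "price")
print(len(block.select(10, 20)), "records with price in [10, 20]")
```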

19.
XML (eXtensible Markup Language) has been developed and deployed by domain-specific standardization bodies and commercial companies, and studies have been conducted on a wide variety of issues encompassing XML. In the use of XML for wireless computing, the focus has been on investigating ways to efficiently represent XML data for transmission over a wireless environment. We propose a middleware, Xstream (XML Streaming), for efficiently streaming XML content over a wireless environment by leveraging the rich semantics and structural characteristics of XML documents and by flexibly packaging fragments of data into autonomous units, known as XDU (Xstream Data Unit) fragments. The concept of an XDU is fundamental to the operation of Xstream: it provides for the efficient transfer of documents across a wireless link and allows other issues and challenges pertaining to wireless transmission to be addressed. By fragmenting and organizing an XML document into XDU fragments, we are able to send fragments incrementally across a wireless link, while the receiver is able to perform look-ahead processing of the document without having to wait for the entire document to be downloaded. We propose a fragmenting strategy based on the value of the wireless link's Maximum Transfer Unit (MTU). In addition, we present and evaluate several packetizing strategies, i.e., strategies wherein a collection of XDUs are grouped into a packet to optimize packet delivery and processing. At the receiving end, a reassembly strategy incrementally reconstructs the XML document as XDU fragments are received, thereby facilitating client applications' implementation of look-ahead processing.
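The XDU format itself is not specified in the abstract; the sketch below only shows the framing idea, with byte-level fragments sized to the MTU and an incremental reassembly loop. Real XDUs follow XML structural boundaries and carry structural metadata, so this is an illustrative simplification.

```python
# Toy MTU-based fragmentation of a serialized XML document into numbered
# fragments, plus incremental reassembly at the receiver. The real XDU
# format is structure-aware; this sketch only shows the framing idea.
def fragment(xml_bytes, mtu, header_size=8):
    payload = mtu - header_size                 # room left after a fixed header
    chunks = [xml_bytes[i:i + payload] for i in range(0, len(xml_bytes), payload)]
    return [(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]

def reassemble(fragments):
    buf, expected = {}, None
    for seq, total, chunk in fragments:         # fragments may arrive over time
        buf[seq], expected = chunk, total
        # look-ahead processing could start here on the prefix received so far
    if expected is not None and len(buf) == expected:
        return b"".join(buf[i] for i in range(expected))
    return None                                 # still incomplete

doc = b"<catalog>" + b"<item id='1'>widget</item>" * 40 + b"</catalog>"
frags = fragment(doc, mtu=576)
print(len(frags), "fragments;", "ok" if reassemble(frags) == doc else "mismatch")
```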

20.
This paper introduces FireWorks, workflow software for running high-throughput calculation workflows at supercomputing centers. FireWorks has been used to complete over 50 million CPU-hours' worth of computational chemistry and materials science calculations at the National Energy Research Supercomputing Center. It has been designed to serve the demanding high-throughput computing needs of these applications, with extensive support for (i) concurrent execution through job packing, (ii) failure detection and correction, (iii) provenance and reporting for long-running projects, (iv) automated duplicate detection, and (v) dynamic workflows (i.e., modifying the workflow graph during runtime). We have found that these features are highly relevant to enabling modern data-driven and high-throughput science applications, and we discuss our implementation strategy, which rests on Python and NoSQL databases (MongoDB). Finally, we present performance data and limitations of our approach along with planned future work. Copyright © 2015 John Wiley & Sons, Ltd.
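For readers new to FireWorks, the sketch below follows its tutorial-style Python API to define and run a tiny two-step workflow. It assumes a MongoDB instance reachable with default settings, and exact signatures may vary across FireWorks versions.

```python
# Minimal two-step FireWorks workflow, following the library's tutorial-style
# API. Assumes a local MongoDB with default settings; adjust LaunchPad() args
# for a real deployment.
from fireworks import Firework, FWorker, LaunchPad, ScriptTask, Workflow
from fireworks.core.rocket_launcher import rapidfire

launchpad = LaunchPad()                      # connection to the MongoDB "LaunchPad"

fw1 = Firework(ScriptTask.from_str('echo "stage 1: prepare input"'), name="prepare")
fw2 = Firework(ScriptTask.from_str('echo "stage 2: run calculation"'), name="run")
wf = Workflow([fw1, fw2], {fw1: [fw2]}, name="demo_workflow")  # fw1 -> fw2

launchpad.add_wf(wf)                         # register the workflow
rapidfire(launchpad, FWorker())              # pull and execute jobs locally
```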
