Similar literature
1.
Yang Cheng, Lu Jiamin, Feng Jun. Journal of Computer Applications (《计算机应用》), 2020, 40(11): 3184-3191
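With the continued development of knowledge graphs and their wide application in vertical domains, efficient processing of Resource Description Framework (RDF) data has become a new topic in modern big-data management. RDF is a data model proposed by the W3C for describing knowledge-graph entities and the relationships between them. To store and query large-scale RDF data effectively, many researchers have considered managing RDF data in a distributed environment. The key problem in distributed RDF storage is data partitioning, and the partitioning result largely determines SPARQL query performance. From the perspective of data partitioning, this survey discusses two classes of methods in depth: RDF partitioning based on graph structure and RDF partitioning based on semantics. The former includes multi-granularity hierarchical partitioning, template partitioning, and clustering partitioning, and suits general-domain scenarios where the semantic scope of queries is broad; the latter includes hash partitioning, vertical partitioning, and pattern partitioning, and is better suited to vertical-domain environments where the semantic scope of queries is relatively fixed. In addition, several typical partitioning methods are compared and analyzed to provide a reference for future research on RDF data partitioning. Finally, future directions for RDF data partitioning methods are summarized.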
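
Two of the partitioning schemes named in this abstract, hash partitioning and vertical partitioning, can be illustrated with a minimal Python sketch. The triples, the function names, and the choice of hashing on the subject are assumptions made for illustration; the surveyed paper may define the schemes differently.

```python
from collections import defaultdict
from zlib import crc32

# Hypothetical toy triples (subject, predicate, object); not taken from the paper.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "worksAt", "acme"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "acme"),
]


def hash_partition(triples, num_nodes):
    """Assign each triple to a node by hashing its subject (an assumed key choice)."""
    parts = defaultdict(list)
    for s, p, o in triples:
        # crc32 is deterministic across runs, unlike Python's built-in hash() on str.
        parts[crc32(s.encode("utf-8")) % num_nodes].append((s, p, o))
    return dict(parts)


def vertical_partition(triples):
    """Group triples into one two-column (subject, object) table per predicate."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)
```

Hashing on the subject keeps all triples about one entity on the same node, which tends to favor star-shaped SPARQL patterns, while the per-predicate tables favor queries that touch only a few predicates.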

2.
In many distributed databases locality of reference is crucial to achieve acceptable performance. However, the purpose of data distribution is to spread the data among several remote sites. One way to solve this contradiction is to use partitioned data techniques. Instead of accessing the entire data, a site works on a fraction that is made locally available, thereby increasing the site's autonomy. We present a theory of partitioned data that formalizes the concept and establishes the basis to develop a correctness criterion and a concurrency control protocol for partitioned databases. Set-serializability is proposed as a correctness criterion and we suggest an implementation that integrates partitioned and non-partitioned data. To complete this study, the policies required in a real implementation are also analyzed. Recommended by: Hector Garcia-Molina

3.
This paper discusses the relationship between two optimization methods in deductive databases: the distribution of selections and the magic sets method. The former is a direct generalization of pushing selections in relational databases, and the latter realizes a more general view of selection propagation. The characteristics of the generalized form of the distribution of selections are discussed and compared to other methods. It is shown that the distribution of selections corresponds to one of the least effective variations of the magic sets method. It is also shown that both methods have essentially the same power for non-recursive queries. Hence, the magic sets method can be regarded as a natural generalization of pushing selections in relational databases.

4.
This paper describes the design and implementation of a high-level query language called Generalized Query-By-Rule (GQBR) which supports retrieval, insertion, deletion and update operations. This language, based on the formalism of database logic, enables the users to access each database in a distributed heterogeneous environment, without having to learn all the different data manipulation languages. The compiler has been implemented on a DEC 1090 system in Pascal.

5.
Specialized processing units such as GPUs or FPGAs provide great opportunities to speed up database operations by exploiting parallelism and relieving the CPU. However, distributing a workload on suitable (co-)processors is a challenging task, because of the heterogeneous nature of a hybrid processor/co-processor system. In this paper, we present a framework that automatically learns and adapts execution models for arbitrary algorithms on any (co-)processor. Our physical optimizer uses the execution models to distribute a workload of database operators on available (co-)processing devices. We demonstrate its applicability for two common use cases in modern database systems. Additionally, we contribute an overview of GPU-co-processing approaches, an in-depth discussion of our framework's operator model, the required steps for deploying our framework in practice and the support of complex operators requiring multi-dimensional learning strategies.

6.
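Research on the Architecture of a Distributed Comprehensive Knowledge Discovery System   Cited by: 2 (self-citations: 0, citations by others: 2)
Using multi-agent technology and a multi-level structure, an overall architectural model is built for a distributed comprehensive knowledge discovery system (DKD(D&K)), grounded in research on its internal mechanisms. The model designs a knowledge discovery pipeline for distributed KDD* based on a double-base cooperating mechanism, so that the preprocessing of distributed data, the mining algorithms, and the evaluation and navigation of mining results are integrated into one complete system. The model not only inherits the main features of the original comprehensive knowledge discovery system KD(D&K) but also closely integrates mature techniques from distributed databases, and it shows certain advantages over typical distributed knowledge discovery systems currently in use internationally.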

7.
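Building distributed heterogeneous application platforms with .NET Web Services is a current focus of software design. This article introduces the architecture of a Web Service-based distributed heterogeneous application platform in the .NET Framework and its main back-end working protocol, XML (eXtensible Markup Language), and explains a mechanism for transparent conversion of distributed heterogeneous data that uses XML as the intermediary for heterogeneous data conversion and Web Services as the platform for distributed applications.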

8.
In intelligent networking telecommunication services such as free-phone and various personal communications services, the dialed number corresponds to the identity of the called party rather than to the called party's physical location. The dialed number must therefore be converted to a routable telephone number during call setup. This is accomplished by querying a database. If the query is successful, its result is a routable number which is passed to the switch at which the call originates so that the call may be completed. A database maintained solely at a single node may not be sufficient to support the large capacity, reliability, and rapid processing requirements of this service. However, these requirements may be met by replicating and distributing the database instead. Replication and distribution induce problems of consistency, concurrency, and load balancing. We describe and present a performance model of a scheme for replicating customer profiles efficiently. To spread the query transaction load, the database and its replicates could be distributed over a set of geographically distinct nodes. The database would be partitioned into as many disjoint fragments as there are nodes. Each fragment would be stored at two of the nodes, subject to the constraint that no pair of subsets would be stored at more than one node. This constraint ensures that the load which would have been carried by a failed node is spread to two other nodes instead of one, thus reducing the risk of overload in case of failure. We also use a performance analysis to arrive at heuristics for routing transactions within the database which attempt to minimize the query and update response times. For a particular implementation, the analysis suggests that READ or query transactions should be routed to the least loaded node of those holding the fragment of interest. By contrast, when strict locking of all copies is required while performing an update, WRITE or update transactions which are initiated at one node and repeated at the other should be routed to the more heavily loaded of the nodes to minimize overall response time. Recommended by: Amit Sheth. A shorter version of this paper was presented at the 8th ITC Specialist Seminar on Universal Personal Telecommunications held at Santa Margherita Ligure, Italy, in October 1992.
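
The placement constraint and the READ routing heuristic described in this abstract can be sketched in Python. The ring layout below (fragment i on nodes i and i+1) is only one hypothetical placement that happens to satisfy the constraint for three or more nodes; it is not claimed to be the scheme used in the paper, and all function names are illustrative.

```python
from itertools import combinations


def ring_placement(num_nodes):
    """Store fragment i on nodes i and (i + 1) % num_nodes (a hypothetical layout)."""
    if num_nodes < 3:
        raise ValueError("the ring layout needs at least 3 nodes")
    return {f: (f, (f + 1) % num_nodes) for f in range(num_nodes)}


def satisfies_constraint(placement):
    """True if no pair of fragments is stored together on more than one node."""
    for a, b in combinations(placement, 2):
        if len(set(placement[a]) & set(placement[b])) > 1:
            return False
    return True


def route_read(fragment, placement, load):
    """Send a READ to the less loaded of the two nodes holding the fragment."""
    return min(placement[fragment], key=lambda node: load[node])


def failover_targets(failed, placement):
    """Nodes that hold the other copy of each fragment stored on the failed node."""
    return {n for nodes in placement.values() if failed in nodes
            for n in nodes if n != failed}
```

With four nodes, `failover_targets(0, ring_placement(4))` yields two distinct nodes, which matches the abstract's point that a failed node's load is spread over two other nodes rather than one.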

9.
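Design and Implementation of a Distributed Database Management Experiment System for Universities   Cited by: 3 (self-citations: 1, citations by others: 3)
Zhang Wendong, Xia Weiwei. Computer Engineering and Design (《计算机工程与设计》), 2007, 28(5): 1211-1212, 1228
At present, distributed database lab courses in universities generally use the Oracle database management system. However, Oracle requires complex data management and configuration before the techniques and methods of a distributed database management system can be realized, which also hinders students' understanding of some basic concepts. To address this, a homogeneous, client/server (C/S)-based distributed database management experiment system for universities was designed and implemented. The paper presents the system architecture, defines the system's syntax, and describes in detail the design of key components such as the database dictionary, the syntax analysis module, and the location module. The system meets the requirements of university distributed database lab work and is of some value for university teaching.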

10.
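Weka4WS uses WSRF technology to execute remote data mining and manage distributed computation, supporting distributed data mining tasks. Based on Weka4WS and a grid environment, a new distributed clustering method was tried and successfully embedded into the Weka4WS framework, with the distributed data mining algorithm implemented using the Weka library; the concepts of distance cost and mixture probability were also introduced. By combining grid and Web service technologies, with a distributed problem-solving environment and the open-source data mining library Weka as the underlying support, a service-oriented distributed data mining architecture for grid environments was built. A Weka4WS-based distributed clustering algorithm was used to verify the effectiveness of the algorithm and the feasibility of the architecture.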

11.
Wireless sensor networks (WSN) are composed of several sensors having limited memory, processing power, communication bandwidth, and energy, which cooperate in performing a given task. The use of the database paradigm has emerged in the last few years as a viable solution to manage data in such a context. In this paper we present the MaD‐WiSe system, a distributed query processing framework that moves the processing of the query into the network. MaD‐WiSe reconsiders various aspects related to database system design and it reinterprets them according to the WSN constraints and requirements. In particular it considers the aspects related to the definition of a query language to formalize the queries, a stream model to manage data acquired by the sensors, a query algebra to define the operators that actually perform the query, and energy efficiency and query optimization strategies for saving energy. Copyright © 2010 John Wiley & Sons, Ltd.

12.
This paper considers the implications of Distributed Virtual Reality in terms of user interaction and networking. It is argued that the allocation strategy of data in such a system is one of the fundamental problems faced by designers of Distributed Virtual Reality applications. The need to use experience and techniques drawn from distributed databases is highlighted.

13.
An adaptive probe-based optimization technique is developed and demonstrated in the context of an Internet-based distributed database environment. More and more common are database systems which are distributed across servers communicating via the Internet where a query at a given site might require data from remote sites. Optimizing the response time of such queries is a challenging task due to the unpredictability of server performance and network traffic at the time of data shipment; this may result in the selection of an expensive query plan using a static query optimizer. We constructed an experimental setup consisting of two servers running the same database management system connected via the Internet. Concentrating on join queries, we demonstrate how a static query optimizer might choose an expensive plan by mistake. This is due to the lack of a priori knowledge of the run-time environment, inaccurate statistical assumptions in size estimation, and neglecting the cost of remote method invocation. These shortcomings are addressed collectively by proposing a probing mechanism. An implementation of our run-time optimization technique for join queries was constructed in the Java language and incorporated into an experimental setup. The results demonstrate the superiority of our probe-based optimization over a static optimization. Received 6 February 1999 / Revised 15 February 2000 / Accepted 10 May 2000

14.
XML structural joins, which evaluate the containment (ancestor-descendant) relationships between XML elements, are important operations of XML query processing. Estimating structural join size accurately and quickly is crucial to the success of XML query plan selection and query optimization. XML structural joins are essentially complex θ-joins, which render well-known estimation techniques for relational equijoins, such as discrete cosine transform, wavelet transform, and sketch, not applicable. In this paper, we model structural joins from a relational point of view and convert the complex θ-joins to equijoins so that those well-known estimation techniques become applicable to structural join size estimation. Theoretical analyses and extensive experiments have been performed on these estimation methods. It is shown that discrete cosine transform requires the least memory and yields the best estimates among the three techniques. Compared with the state-of-the-art method IM-DA-Est, discrete cosine transform is much faster, requires less memory, and yields comparable estimates.
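
One common way to see how a containment θ-join can be turned into an equijoin is sketched below in Python. It assumes a region encoding in which each element carries integer `(start, end)` positions and regions are properly nested; the data, the function names, and the specific conversion (expanding each ancestor to the positions it covers) are illustrative assumptions and may differ from the conversion used in the paper.

```python
from collections import Counter

# Hypothetical region-encoded elements: (start, end) positions in document order.
ANCESTORS = [(1, 10), (2, 5), (11, 14)]
DESCENDANTS = [(3, 4), (6, 7), (12, 13)]


def theta_join_size(ancestors, descendants):
    """Count pairs where the ancestor interval strictly contains the descendant."""
    return sum(
        1
        for a_start, a_end in ancestors
        for d_start, d_end in descendants
        if a_start < d_start and d_end < a_end
    )


def equijoin_size(ancestors, descendants):
    """Same count, computed as an equijoin on a single integer position attribute."""
    # Expand each ancestor into one row per position strictly inside its region.
    covered = Counter(
        pos for a_start, a_end in ancestors for pos in range(a_start + 1, a_end)
    )
    # Each descendant contributes its start position; join on equality of positions.
    starts = Counter(d_start for d_start, _ in descendants)
    return sum(covered[pos] * count for pos, count in starts.items())
```

Because the join is now a match on equal position values, the two frequency tables (`covered` and `starts`) are the kind of input that equijoin size estimators operate on. The equivalence of the two counts relies on the proper nesting of XML regions.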

15.
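For the problem of optimally placing data replicas on a tree network, an improved K-node-center model is proposed on the basis of the existing K-subtree-center optimization model. Compared with the original model, the improved model reduces the execution cost of update operations in a distributed database. Two dynamic programming algorithms are given for the K-node-center problem on tree networks: one is a very simple dynamic program with higher complexity, and the other is a more complex but efficient dynamic program that uses divide and conquer. Finally, experiments verify the optimization effect of the model.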

16.
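Skyline computation is an important operation in multi-criteria decision making, data mining, and database visualization. As objects move, the uncertainty of their location information makes the dominance relationships among local data points unstable, which affects the global probabilistic skyline set. This work studies continuous probabilistic skyline query updates for uncertain moving objects in a distributed environment and proposes CDPS-UMO, an effective algorithm for continuous probabilistic skyline queries that reduces communication overhead. The algorithm tracks changes to local probabilistic skyline points at the local nodes, and an effective ordering method and feedback mechanism are proposed that greatly reduce communication overhead and computation cost. A baseline algorithm, naive, is also proposed and compared experimentally with CDPS-UMO; the results demonstrate the effectiveness of the algorithm.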

17.
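To address the performance problems caused by accessing data through non-primary keys on distributed storage systems, this work explores the key techniques for implementing indexes on such systems. Based on a thorough analysis of the characteristics of distributed storage, the key points of distributed index design and implementation are identified, and, drawing on those characteristics and related indexing techniques, issues such as index organization, index maintenance, and data consistency are discussed. Based on this analysis, a distributed indexing mechanism is designed and implemented on the open-source version of the distributed database system OceanBase, and its performance is tested with the YCSB benchmarking tool. The experimental results show that, although a secondary index does affect system performance, the index keeps the performance impact within 5% at different data scales because system and storage characteristics were fully taken into account. In addition, using redundant columns further improves the index's performance by 100%.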

18.
Most scientific databases consist of datasets (or sources) which in turn include samples (or files) with an identical structure (or schema). In many cases, samples are associated with rich metadata, describing the process that leads to building them (e.g., the experimental conditions used during sample generation). Metadata are typically used in scientific computations just for the initial data selection; at most, metadata about query results is recovered after executing the query, and associated with its results by post-processing. In this way, a large body of information that could be relevant for interpreting query results goes unused during query processing. In this paper, we present ScQL, a new algebraic relational language, whose operations apply to objects consisting of data–metadata pairs, by preserving such one-to-one correspondence throughout the computation. We formally define each operation and we describe an optimization, called meta-first, that may significantly reduce the query processing overhead by anticipating the use of metadata for selectively loading into the execution environment only those input samples that contribute to the result samples. In ScQL, metadata have the same relevance as data, and contribute to building query results; in this way, the resulting samples are systematically associated with metadata about either the specific input samples involved or about query processing, thereby yielding a new form of metadata provenance. We present many examples of use of ScQL, relative to several application domains, and we demonstrate the effectiveness of the meta-first optimization.
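
The idea behind the meta-first optimization, evaluating metadata predicates before any sample data is loaded, can be illustrated with a small Python sketch. This is not ScQL syntax; the sample structure, `make_sample`, and `select_meta_first` are hypothetical names introduced only for this example.

```python
def make_sample(meta, rows, load_log):
    """Build a hypothetical sample: a metadata dict paired with a lazy data loader."""
    def load():
        load_log.append(meta["id"])  # record which samples were actually loaded
        return rows
    return {"meta": meta, "load": load}


def select_meta_first(samples, meta_pred, row_pred):
    """Filter on metadata first, then load and filter data of surviving samples."""
    result = []
    for sample in samples:
        if not meta_pred(sample["meta"]):
            continue  # rejected samples are never loaded
        rows = [r for r in sample["load"]() if row_pred(r)]
        # Keep the data and metadata paired in the output, one pair per sample.
        result.append({"meta": sample["meta"], "rows": rows})
    return result
```

In this sketch each output sample still carries its own metadata, which mirrors the one-to-one data and metadata correspondence the abstract describes, and the load log shows that samples rejected by the metadata predicate incur no loading cost.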

19.
Replication in distributed computing systems provides improved availability and reliability in the event of site failure and network partitioning. However, if strict mutual consistency is required, transactions can be processed in at most one partition, thereby reducing availability. We present a consistency control algorithm that relaxes strict mutual consistency criteria, and allows concurrent processing in all partitions. Inconsistency of data objects in different partitions is resolved at the time of merging the partitions when recovery occurs. The basis of our algorithm is a new merge mechanism that utilizes available semantic information about the data objects and transaction types. We present a formal proof of correctness of the algorithm. Results from a simulation model show that our algorithm performs better than a previously proposed approach that uses compensating transactions to sacrifice serializability of replicated data.

20.
International Journal of Computer Mathematics (《国际计算机数学杂志》), 2012, 89(12): 1447-1454
A model is developed for allocating tables in a distributed database system. The model considers memory cost, transmission cost, table size and request rates, as well as updating rates of tables, the maximum allowable expected access times to tables at each computer and the memory capacity of each computer. The objective function is concerned with overall operating cost optimality. In this regard, the model is formulated as a non-linear integer zero–one programming problem, which can be converted into a linear zero–one programming model.
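
A heavily simplified version of such a zero-one allocation problem can be explored by exhaustive search, as in the Python sketch below. The cost function, the single-copy assumption (each table lives on exactly one computer), and all parameter names are assumptions for illustration; update rates and access-time limits from the abstract are left out, and the paper's actual formulation is not reproduced here.

```python
from itertools import product


def best_allocation(sizes, capacity, mem_cost, requests, trans_cost):
    """Exhaustively search single-copy allocations; return (cost, assignment).

    sizes[t]        size of table t
    capacity[c]     memory capacity of computer c
    mem_cost[c]     storage cost per unit of size at computer c
    requests[c][t]  request rate from computer c for table t
    trans_cost      transmission cost per unit of size per remote request
    """
    n_tables, n_sites = len(sizes), len(capacity)
    best = None
    for assign in product(range(n_sites), repeat=n_tables):
        used = [0] * n_sites
        for t, site in enumerate(assign):
            used[site] += sizes[t]
        if any(u > c for u, c in zip(used, capacity)):
            continue  # violates a memory-capacity constraint
        cost = 0
        for t, site in enumerate(assign):
            cost += mem_cost[site] * sizes[t]  # storage cost at the hosting site
            for src in range(n_sites):
                if src != site:
                    # remote requests ship the table's data across the network
                    cost += requests[src][t] * trans_cost * sizes[t]
        if best is None or cost < best[0]:
            best = (cost, assign)
    return best
```

Enumeration grows exponentially with the number of tables, which is why a formulation that a linear zero-one solver can handle, as the abstract describes, matters for realistic problem sizes.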
