Similar Literature
20 similar documents found (search time: 15 ms)
1.
Location-Dependent Query Processing Based on Relational Databases
With the development of wireless communication and global positioning technologies, location-dependent query processing and location-based information services have become a hot research area. As a key technique for supporting location-dependent queries, the processing of location-dependent data is attracting increasingly broad attention. This paper analyzes the key techniques in location-dependent data processing, proposes a relational-database-based method for storing and representing location-dependent data, and presents a variable-granularity grid index for indexing the data regions of location-dependent data. On this basis, the corresponding query processing algorithms are studied. To test the performance of these algorithms, a prototype system was designed and implemented; experimental results show that the proposed methods offer flexible representation, fast query performance, and good extensibility.
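A variable-granularity grid index of the kind this abstract describes can be pictured as a grid whose cells split into finer sub-grids where data is dense, so sparse regions stay coarse. The following is a minimal, illustrative sketch; the class names, split policy, and capacity threshold are assumptions, not details from the paper:

```python
# Minimal sketch of a variable-granularity grid index: a cell that
# exceeds its capacity splits into a 2x2 sub-grid, refining only where
# data is dense. Illustrative only; not the paper's actual structure.

class GridCell:
    def __init__(self, x0, y0, x1, y1, capacity=8):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity
        self.points = []      # (x, y, payload) tuples
        self.children = None  # 2x2 sub-grid once split

    def insert(self, x, y, payload):
        if self.children is not None:
            self._child_for(x, y).insert(x, y, payload)
            return
        self.points.append((x, y, payload))
        if len(self.points) > self.capacity:
            self._split()

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [GridCell(x0, y0, mx, my, self.capacity),
                         GridCell(mx, y0, x1, my, self.capacity),
                         GridCell(x0, my, mx, y1, self.capacity),
                         GridCell(mx, my, x1, y1, self.capacity)]
        for x, y, p in self.points:
            self._child_for(x, y).insert(x, y, p)
        self.points = []

    def _child_for(self, x, y):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(1 if x >= mx else 0) + (2 if y >= my else 0)]

    def range_query(self, qx0, qy0, qx1, qy1, out):
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 > x1 or qy1 < y0 or qy0 > y1:
            return  # query window does not overlap this cell
        if self.children is not None:
            for c in self.children:
                c.range_query(qx0, qy0, qx1, qy1, out)
        else:
            out.extend(p for p in self.points
                       if qx0 <= p[0] <= qx1 and qy0 <= p[1] <= qy1)
```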

2.
Philipp Adler, Wolfram Amme. Software, 2014, 44(10):1223–1249
Most constrained systems use interpreters to run mobile programs written in Java. Such interpreters are designed to minimize resource usage and often do not allow mobile code in the devices to be changed. For this reason, runtime optimization is typically not supported, even though it is completely feasible. In this paper, we propose optimistic optimization as a concept for improving application performance in restricted interpreter environments. In an optimistic optimization, a mobile program is restructured speculatively during code generation. This requires that such optimizations can be undone at runtime if an incorrect use is detected or if the set of available classes has changed compared with compile time. Experimental results show that interpreted applications using optimistic optimizations tend to run faster than their conventionally optimized counterparts. Compared with standard load elimination, reductions in runtime of up to 9% for optimistic load elimination and up to 23% for the combined optimization were achieved, with average performance improvements of 1.87% for optimistic load elimination and 3.7% for the combined optimization. Copyright © 2013 John Wiley & Sons, Ltd.
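As a rough illustration of the load elimination this work builds on: repeated loads of the same field are replaced by a single load into a temporary, and the optimistic variant applies the transformation speculatively, undoing it at runtime if its assumptions break. The sketch below shows only the basic transformation in Python form; the guard/undo machinery is omitted and all names are illustrative:

```python
# Hand-written illustration of load elimination. The optimistic variant
# described in the paper applies this even when a call might invalidate
# the field, relying on a runtime undo mechanism (not shown here).

def sum_field_naive(obj, n):
    total = 0
    for _ in range(n):
        total += obj.field      # one field load per iteration
    return total

def sum_field_optimized(obj, n):
    t = obj.field               # single load, hoisted out of the loop
    total = 0
    for _ in range(n):
        total += t              # safe only if obj.field cannot change here
    return total
```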

3.
We identify the roles played by design versions and alternatives in an engineering database. The obvious way to implement versions is to maintain each in a separate collection of files. Because several versions must be kept online in a design environment, this approach leads to large disk requirements. We develop B-tree-based storage structures to encode versions as "negative" differential files, with the objective of keeping disk requirements small. We discuss the effect of enormous amounts of cheap archival storage (write-once optical digital disks) on the proposed structures. We have implemented versions in the Wisconsin Storage System (WiSS), an experimental database component developed at the University of Wisconsin-Madison.
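The idea behind a "negative" differential file — store the current version in full and reconstruct older versions by applying reverse deltas — can be sketched at page granularity as follows. This is an illustration under assumed data structures, not the paper's B-tree encoding:

```python
# Minimal sketch of reverse-delta ("negative" differential) versioning.
# The current version is stored whole; each older version is a delta
# recording what its pages looked like before the following version.

def make_reverse_delta(old_pages, new_pages):
    """Record, for each page changed in `new_pages`, its old content."""
    delta = {}
    for page_no, old_content in old_pages.items():
        if new_pages.get(page_no) != old_content:
            delta[page_no] = old_content
    for page_no in new_pages:
        if page_no not in old_pages:
            delta.setdefault(page_no, None)  # page absent in older version
    return delta

def reconstruct(current_pages, reverse_deltas, steps_back):
    """Walk backwards from the current version through the delta chain."""
    pages = dict(current_pages)
    for delta in reverse_deltas[:steps_back]:
        for page_no, old_content in delta.items():
            if old_content is None:
                pages.pop(page_no, None)
            else:
                pages[page_no] = old_content
    return pages
```

The usual rationale for reverse deltas is visible here: the newest version, which is the one accessed most, is always readable at full speed, while older versions pay a reconstruction cost proportional to their age.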

4.
Threads as contained in a thread algebra are used for the modeling of sequential program behavior. A thread that may use a counter to control its execution is called a 'one-counter thread'. In this paper the decidability of risk assessment (a certain form of action forecasting) for one-counter threads is proved. This relates to Cohen's impossibility result on virus detection (Comput. Secur. 6(1), 22–35, 1984). Our decidability result follows from a general property of the traces of one-counter threads: if a state is reachable from some initial state, then it is also reachable along a path in which all counter values stay below a fixed bound that depends only on the initial and final counter value. A further consequence is that the reachability of a state is decidable. These properties are based on a result for ω-one counter machines by Rosier and Yen (SIAM J. Comput. 16(5), 779–807, 1987).
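The boundedness property quoted above directly yields a decision procedure: since any reachable configuration is reachable along a path whose counter stays below some bound B, an exhaustive search over (state, counter) pairs with counter < B is complete. A minimal sketch, where the transition relation and the bound are assumed inputs:

```python
# Bounded breadth-first search over (state, counter) configurations.
# Completeness relies on the paper's property that reachability implies
# reachability along a path with counter values below `bound`.

from collections import deque

def reachable(initial, target, transitions, bound):
    """`transitions(state, counter)` yields successor (state, counter)
    pairs; configurations with counter outside [0, bound) are pruned."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        config = queue.popleft()
        if config == target:
            return True
        for nxt in transitions(*config):
            _, counter = nxt
            if 0 <= counter < bound and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```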

5.
The execution performance of an information gathering plan can suffer significantly due to remote I/O latencies. A streaming dataflow model of execution addresses the problem to some extent, exploiting all natural opportunities for parallel execution, as allowed by the data dependencies in a plan. Unfortunately, plans that integrate information from multiple sources often use the results of one operation as the basis for forming queries to a subsequent operation. Such cases require sequential execution, an inefficiency that can erase prior gains made through techniques like streaming dataflow. To address this problem, we present a technique called speculative plan execution, an out-of-order method that capitalizes on knowledge gained from prior executions as a means for overcoming remaining data dependencies between plan operators. Our approach inserts additional plan operators that generate and confirm speculative results, while preserving the safety and fairness of overall execution. To increase the utility of speculative execution, we propose a method of value prediction that combines caching with the more effective and space-efficient techniques of classification and transduction. We present experimental results that demonstrate how the performance of information gathering plans can benefit from speculative execution and how its overall utility can be increased through our hybrid method of value prediction.
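A speculate/confirm operator pair of the kind this abstract describes can be sketched as follows: a predictor (here a plain answer cache, standing in for the paper's classification and transduction predictors) guesses the result of a slow dependency so the downstream operator can start early, and the speculative work is kept only if the guess is confirmed. All names and the threading mechanism are illustrative:

```python
# Minimal sketch of speculative plan execution across one data
# dependency: predict the slow source's answer, run the downstream
# operator on the guess in parallel, confirm or redo at the end.

import threading

class Speculator:
    def __init__(self):
        self.cache = {}  # answers remembered from prior executions

    def predict(self, query):
        return self.cache.get(query)  # None means: do not speculate

    def learn(self, query, answer):
        self.cache[query] = answer

def execute_with_speculation(query, slow_source, downstream, spec):
    guess = spec.predict(query)
    speculative, worker = {}, None
    if guess is not None:
        worker = threading.Thread(
            target=lambda: speculative.update(out=downstream(guess)))
        worker.start()                 # downstream starts before the answer
    answer = slow_source(query)        # the real (slow) dependency
    spec.learn(query, answer)
    if worker is not None:
        worker.join()
        if guess == answer:            # confirm: speculation was correct
            return speculative['out']
    return downstream(answer)          # misprediction: redo safely
```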

6.
We are concerned with programs composed of cooperative threads whose execution proceeds in synchronous rounds called instants. We develop static analysis methods to guarantee that each instant terminates in time polynomial in the size of the parameters of the program at the beginning of the computation.

7.
方娟, 张红波. 《计算机科学》, 2012, (Z2):48–50, 64
Memory access latency has long been the bottleneck limiting overall computer system performance, and the arrival of multicore processors has made the "memory wall" problem even more severe. Prefetching can hide memory access latency, so prefetching techniques for multicore processors have recently become a hot research topic. This paper studies Future execution, a relatively novel multicore prefetching technique, and proposes an improvement that addresses its shortcomings: the FE-Runahead architecture, which reduces L2 cache misses and raises the L2 cache hit rate. Experimental results show that the improved prefetching architecture raises the L2 cache hit rate by about 9% and reduces relative execution time by 8%.

8.
9.
We revisit memory hierarchy design viewing memory as an inter-operation communication mechanism. We show how dynamically collected information about inter-operation memory communication can be used to improve memory latency. We propose two techniques: (1) Speculative Memory Cloaking, and (2) Speculative Memory Bypassing. In the first technique, we use memory dependence prediction to speculatively identify dependent loads and stores early in the pipeline. These instructions may then communicate prior to address calculation and disambiguation via a fast communication mechanism. In the second technique, we use memory dependence prediction to speculatively transform DEF-store-load-USE dependence chains within the instruction window into DEF-USE ones. As a result, dependent stores and loads are taken off the communication path, resulting in a further reduction in communication latency. Experimental analysis shows that our methods, on average, correctly handle 40% (integer) and 19% (floating point) of all memory loads. Moreover, our techniques result in performance improvements of 4.28% (integer) and 3.20% (floating point) over a highly aggressive, dynamically scheduled processor implementing naive memory dependence speculation. We also study the value and address locality characteristics of the values our methods correctly handle. We demonstrate that our methods are orthogonal to both address and value prediction.

10.
Martinez, J.F.; Torrellas, J. IEEE Micro, 2003, 23(6):126–134
Proper synchronization is vital to ensuring that parallel applications execute correctly. A common practice is to place synchronization conservatively so as to produce simpler code in less time. Unfortunately, this practice frequently results in suboptimal performance because it stalls threads unnecessarily. Speculative synchronization overcomes this problem by allowing threads to speculatively execute past active barriers, busy locks, and unset flags. The result is high performance.

11.
The running time and memory requirement of a parallel program with dynamic, lightweight threads depend heavily on the underlying thread scheduler. In this paper we present a new asynchronous, space-efficient scheduling algorithm for shared-memory machines that combines the low scheduling overheads and good locality of work stealing with the low space requirements of depth-first schedulers. For a nested-parallel program with depth D and serial space requirement S1, we show that the expected space requirement is S1 + O(K·p·D) on p processors, where K is a user-adjustable runtime parameter that provides a tradeoff between running time and space requirement. Our algorithm achieves good locality and low scheduling overheads by automatically increasing the granularity of the work scheduled on each processor. We have implemented the new scheduling algorithm in the context of a native, user-level implementation of POSIX standard threads (Pthreads), and evaluated its performance using a set of C-based benchmarks that have dynamic or irregular parallelism. We compare the performance of our scheduler with that of two previous schedulers: the thread library's original scheduler (which uses a FIFO queue) and a provably space-efficient depth-first scheduler. At a fine thread granularity, our scheduler outperforms both of these previous schedulers, but requires marginally more memory than the depth-first scheduler. We also present simulation results on synthetic benchmarks to compare our scheduler with space-efficient versions of both a work-stealing scheduler and a depth-first scheduler. The results indicate that, unlike these previous approaches, the new algorithm covers a range of scheduling granularities and space requirements, and allows the user to trade the space requirement of a program against the scheduling granularity.
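To make the locality argument concrete, here is a minimal sketch of the work-stealing half of such a scheduler: each worker works depth-first on its own deque (good locality, low overhead) and steals the oldest task from a victim when idle. The paper's space-bounding and granularity-control machinery (the role of the K parameter) is deliberately omitted, and all names are illustrative:

```python
# Minimal work-stealing skeleton: own work is pushed/popped LIFO at the
# bottom of the local deque; idle workers steal FIFO from a victim's top.

import collections
import random

class Worker:
    def __init__(self, wid, workers):
        self.deque = collections.deque()
        self.wid, self.workers = wid, workers  # `workers` includes self

    def push(self, task):
        self.deque.append(task)          # own work goes to the bottom

    def next_task(self):
        if self.deque:
            return self.deque.pop()      # depth-first on the local deque
        victims = [w for w in self.workers if w is not self and w.deque]
        if victims:
            return random.choice(victims).deque.popleft()  # steal oldest
        return None                      # no work anywhere right now
```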

12.
Communication latencies within critical sections constitute a major bottleneck in some classes of emerging parallel workloads. In this paper, we argue for the use of two mechanisms to reduce these communication latencies: Inferentially Queued locks (IQLs) and Speculative Push (SP). With IQLs, the processor infers the existence, and limits, of a critical section from the use of synchronization instructions and joins a queue of lock requestors, reducing synchronization delay. The SP mechanism extracts information about program structure by observing IQLs. SP allows the cache controller, responding to a request for a cache line that likely includes a lock variable, to predict the data sets the requestor will modify within the associated critical section. The controller then pushes these lines from its own cache to the target cache, as well as writing them to memory. Overlapping the protected data transfer with that of the lock can substantially reduce the communication latencies within critical sections. By pushing data in exclusive state, the mechanism can collapse a read-modify-write sequence within a critical section into a single local cache access. The write-back to memory allows the receiving cache to ignore the push. Neither mechanism requires programmer or compiler support, nor any instruction-set changes. Our experiments demonstrate that IQLs and SP can improve the performance of applications employing frequent synchronization.

13.
As technology advances, the number of cores in chip multiprocessor systems and multiprocessor systems-on-chip keeps increasing, and the network must provide sustained throughput and ultra-low latencies. In this paper we propose new pipelined switch designs focused on reducing switch latency. We identify the switch component that limits switch frequency: the arbiter. We first simplify the arbiter logic by using multiple smaller arbiters, but this greatly increases switch area. To solve this problem, a second design is presented in which the routing traversal and arbitration tasks are merged. Results demonstrate a switch latency reduction ranging from 10% to 21%, and a network latency reduction ranging from 11% to 15%.

14.
席琳, 赵东明. 《微计算机信息》, 2007, 23(24):32–33, 18
In recent years, the design and analysis of secure electronic commerce protocols has become a research focus. Properties such as confidentiality and fairness are key measures of whether an e-commerce protocol is secure, and important prerequisites for its practical use. Although confidentiality and fairness are basic properties of secure e-commerce protocols, the certified mail protocol CMP, itself an e-commerce protocol, does not satisfy them. This paper points out several flaws in the certified mail protocol CMP, analyzes them, and proposes an improved scheme; the improved protocol satisfies both confidentiality and fairness.

15.
Support vector machines are a current research focus in artificial intelligence, and large-sample regression based on support vector machines has long been a very challenging problem. Recently, building on the recursive least squares algorithm, Engel et al. proposed the kernel recursive least squares (KRLS) algorithm. Based on block-incremental learning and a reverse (decremental) learning process, this paper proposes an adaptive iterative regression algorithm. To characterize the two methods, they are compared in terms of training speed, accuracy, and number of support vectors. Simulation results show that KRLS yields fewer support vectors than the adaptive iterative regression algorithm, but takes longer to train and has worse training and test accuracy.
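For reference, the flavor of KRLS can be conveyed by a simplified online kernel least-squares routine that grows a dictionary one sample at a time and incrementally updates the inverse of the regularized kernel matrix via the block-matrix inverse formula. This sketch omits Engel et al.'s approximate-linear-dependence sparsification, and all parameter choices are illustrative:

```python
# Simplified online kernel least squares in the spirit of KRLS.
# Model: f(x) = sum_i alpha_i * k(x_i, x), alpha = (K + reg*I)^-1 y.

import numpy as np

def rbf(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2))

class SimpleKRLS:
    def __init__(self, reg=1e-3, gamma=1.0):
        self.reg, self.gamma = reg, gamma
        self.X, self.y = [], []
        self.Kinv, self.alpha = None, None

    def update(self, x, target):
        if self.Kinv is None:
            self.Kinv = np.array([[1.0 / (rbf(x, x, self.gamma) + self.reg)]])
        else:
            # Block inverse update for the regularized kernel matrix.
            k = np.array([rbf(xi, x, self.gamma) for xi in self.X])
            z = self.Kinv @ k
            r = rbf(x, x, self.gamma) + self.reg - k @ z  # Schur complement
            n = len(self.X)
            newKinv = np.empty((n + 1, n + 1))
            newKinv[:n, :n] = self.Kinv + np.outer(z, z) / r
            newKinv[:n, n] = -z / r
            newKinv[n, :n] = -z / r
            newKinv[n, n] = 1.0 / r
            self.Kinv = newKinv
        self.X.append(x)
        self.y.append(target)
        self.alpha = self.Kinv @ np.array(self.y)

    def predict(self, x):
        k = np.array([rbf(xi, x, self.gamma) for xi in self.X])
        return k @ self.alpha
```

Because every sample joins the dictionary here, the "support vector" count grows with the data; the point of ALD sparsification in real KRLS is precisely to keep that dictionary small.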

16.
Multicore processors raise computing capability by increasing the number of cores. Although their resources can be exploited by running multiple programs simultaneously, the real success of multicore processors depends on solving the hard problems of parallel application development, which requires co-design of processor architecture and programming models. As core counts grow, traditional software simulators perform ever worse because of their serial nature and cannot support this hardware/software co-design. The inherent parallelism of FPGAs gives them high simulation performance and scalability when simulating multicore processors, making them an ideal tool for processor architecture research. This paper presents RAMP-Pink, an FPGA-based multicore simulation system. Built on HASim, it supports both transactional memory and thread-level speculation and is intended for hardware/software co-design of these two techniques. The simulator can be configured for different FPGA development platforms and can also run in pure software simulation mode.

17.
A NUCA Substrate for Flexible CMP Cache Sharing
We propose an organization for the on-chip memory system of a chip multiprocessor in which 16 processors share a 16-Mbyte pool of 64 level-2 (L2) cache banks. The L2 cache is organized as a nonuniform cache architecture (NUCA) array with a switched network embedded in it for high performance. We show that this organization can support a spectrum of degrees of sharing: unshared, in which each processor owns a private portion of the cache, thus reducing hit latency; completely shared, in which every processor shares the entire cache, thus minimizing misses; and every point in between. We measure the optimal degree of sharing for different cache bank mapping policies and also evaluate a per-application cache partitioning strategy. We conclude that a static NUCA organization with sharing degrees of 2 or 4 works best across a suite of commercial and scientific parallel workloads. We demonstrate that migratory dynamic NUCA approaches improve performance significantly for a subset of the workloads at the cost of increased complexity, especially as per-application cache partitioning strategies are applied. We also evaluate the energy efficiency of each design point in terms of network traffic, bank accesses, and external memory accesses.
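The notion of a sharing degree can be made concrete with a toy bank-mapping function: with degree d, processors are grouped into clusters of d, and each cluster's cache lines are interleaved across the banks assigned to that cluster. The counts and the address hash below are illustrative assumptions, not the paper's actual mapping policy:

```python
# Toy bank mapping for a 16-processor, 64-bank NUCA L2 under a given
# sharing degree: degree=1 is fully private, degree=16 fully shared.

def banks_for_processor(proc_id, n_procs=16, n_banks=64, degree=2):
    cluster = proc_id // degree                  # which group of d cores
    banks_per_cluster = n_banks * degree // n_procs
    first = cluster * banks_per_cluster
    return list(range(first, first + banks_per_cluster))

def home_bank(address, proc_id, degree=2, line_bytes=64):
    banks = banks_for_processor(proc_id, degree=degree)
    return banks[(address // line_bytes) % len(banks)]  # interleave lines
```

Raising the degree gives each processor more banks (fewer capacity misses) at the cost of longer average hit distance, which is exactly the latency/miss tradeoff the paper measures.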

18.
Many-core processors built from hundreds or thousands of cores provide enormous computing capacity, but they also raise the challenge of using that capacity efficiently: applications with different degrees of parallelism need different amounts of core resources, and unreasonable allocation either wastes resources (allocating too many cores) or limits the parallelism that can be exploited (allocating too few). Targeting the core-allocation problem faced by thread-level speculative execution of sequential programs on many-core architectures, this paper proposes a hardware-based mechanism for monitoring and evaluating speculation capability and designs three thread-level speculation capability evaluators. The evaluators adjust the number of cores allocated to an application in real time, following dynamic changes in the sequential program's speculation capability. Experimental results show that an evaluator with very small hardware overhead suffices to guide core allocation for thread-level speculation of sequential programs on a many-core platform, achieving an effective balance between performance and resource utilization.

19.
GPUs and CPUs have fundamentally different architectures. It is conventional wisdom that GPUs can accelerate only those applications that exhibit very high parallelism, especially vector parallelism such as image processing. In this paper, we explore the possibility of using GPUs for value prediction and speculative execution: we implement software value prediction techniques to accelerate programs with limited parallelism, and software speculation techniques to accelerate programs that contain runtime parallelism, which are hard to parallelize statically. Our experimental results show that, due to the relatively high overhead, mapping software value prediction techniques onto existing GPUs may not bring any immediate performance gain. On the other hand, although software speculation techniques introduce some overhead as well, mapping them onto existing GPUs can already bring some performance gain over the CPU. Based on these observations, we explore hardware implementation of speculative execution operations on GPU architectures to reduce the software performance overheads. The results indicate that the hardware extensions yield an almost tenfold reduction in control-divergent sequential operations with only moderate hardware (5–8%) and power consumption (1–5%) overheads.

20.
This paper presents a helper thread prefetching scheme that is designed to work on loosely coupled processors, such as in a standard chip multiprocessor (CMP) system or an intelligent memory system. Loosely coupled processors have an advantage in that resources such as processor and L1 cache resources are not contended by the application and helper threads, hence preserving the speed of the application. However, interprocessor communication is expensive in such a system, and we present techniques to alleviate this. Our approach exploits large loop-based code regions and is based on a new synchronization mechanism between the application and helper threads. This mechanism precisely controls how far ahead the execution of the helper thread can be with respect to the application thread. We found that this is important in ensuring prefetching timeliness and avoiding cache pollution. To demonstrate that prefetching in a loosely coupled system can be done effectively, we evaluate our prefetching by simulating a standard unmodified CMP system and an intelligent memory system where a simple processor in memory executes the helper thread. Evaluated with nine memory-intensive applications, our scheme with the memory processor in DRAM achieves an average speedup of 1.25; moreover, it works well in combination with a conventional processor-side sequential L1 prefetcher, resulting in an average speedup of 1.31. In a standard CMP, the scheme achieves an average speedup of 1.33. On a real CMP system with a shared L2 cache between two cores, our helper thread prefetching plus hardware L2 prefetching achieves an average speedup of 1.15 over hardware L2 prefetching alone for the subset of applications with high L2 cache misses per cycle.
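The run-ahead control the abstract describes — the helper thread may lead the application thread, but only by a bounded number of iterations — can be sketched with a shared counter and a condition variable. The mechanism and all names below are illustrative stand-ins for the paper's synchronization design:

```python
# Minimal sketch of bounded run-ahead synchronization between an
# application thread and a prefetching helper thread: the helper blocks
# whenever it gets more than `distance` iterations ahead, which keeps
# prefetches timely and limits cache pollution.

import threading

class RunAheadWindow:
    def __init__(self, distance):
        self.distance = distance
        self.app_iter = 0
        self.cond = threading.Condition()

    def app_advance(self):
        with self.cond:
            self.app_iter += 1
            self.cond.notify_all()   # let the helper move ahead again

    def helper_wait(self, helper_iter):
        with self.cond:
            while helper_iter - self.app_iter > self.distance:
                self.cond.wait()     # too far ahead: stall the helper

def helper_thread(window, prefetch, n_iters):
    for i in range(n_iters):
        window.helper_wait(i)
        prefetch(i)                  # touch the data iteration i will need
```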
