首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 125 毫秒
1.
连接是数据查询处理中最耗时、使用最频繁的操作之一,对提高连接操作的速率具有重要意义。阵列众核处理器是一类重要的众核处理器,具有强大的并行能力,可用来加速并行计算。基于阵列众核处理器的结构,设计和优化了一种高效的多层分区Hash连接算法。该算法通过多层划分的策略大大降低了主存访问次数,通过分区重排方法有效消除了数据倾斜的影响,获得了很高的性能。在异构融合阵列众核处理器DFMC(Deeply-Fused Many Core)原型系统上的实验结果表明,DFMC上多层分区Hash连接算法的性能是CPU-GPU耦合结构上最快的连接算法的8.0倍,表明利用阵列众核处理器加速数据查询应用具有优势。  相似文献   

2.
网格连接的处理器阵列是一种应用广泛的高性能体系结构,而容错处理器阵列的重构技术是近年来的研究热点之一.现有的研究多数集中在串行重构算法上,忽视了该结构重构时内在的可并行性.本文根据阵列结构的特点设计了一种基于VHDL语言的重构算法,该算法从第一行的各个无故障处理器单元同时向下选路,具有潜在的并行性,.实验结果表明,与现有的串行算法相比,本文提出的并行算法同样能够生成最大规模的目标阵列并且当物理阵列大小为48×48,本文提出的并行算法加速重构将近20倍.  相似文献   

3.
随着web应用的不断普及,web应用软件的测试越来越重要。本文针对web应用的测试数据生成和组织结构问题,提出了一种元模型驱动的数据池技术,并实现了一个测试数据生成与组织框架。该框架依据web页面上的输入域类型、输入变量的数据类型以及实例化约束条件,支持手工或自动产生测试数据。  相似文献   

4.
一种面向异构众核处理器的并行编译框架   总被引:1,自引:0,他引:1  
异构众核处理器是面向高性能计算领域处理器发展的重要趋势,但其更为复杂的体系结构使得编程难的问题更加突出.针对这一问题,基于开源编译器Open64,提出了一种面向异构众核处理器的并行编译框架,将程序自动转换为异构并行程序.该框架主要包括4个模块:任务划分模块用来识别适合进行加速计算的程序段,实现了嵌套循环的多维并行识别方法;数据布局模块完成数据在主存和SPM之间的布局,实现了数组边界分析和指针范围分析;传输优化模块实现了数据传输合并、传输外提、打包传输、数组转置等多种数据传输优化方法;收益评估模块在构建代价模型的基础上实现了一种动静结合的收益评估方法.并且,基于SW26010处理器,对该编译框架进行了实现,测试结果表明,该编译框架能够实现一些程序以面向异构众核结构的并行变换,且获得较好的加速效果.  相似文献   

5.
Explicit Data Graph Execution(EDGE)ISA是一种专门为类数据流驱动的分片式众核处理器而设计的指令集体系结构.相较于传统的采用控制流驱动的处理器,EDGE结构以超块(Hyperblock)而不是单个指令作为其执行单位,在超块内部实现数据流执行,超块之间按照推测序保持控制流执行,有利于挖掘指令级并行性.但是,EDGE编译器按照程序的串行执行顺序组织超块,超块间和超块内部受限于数据依赖,削弱了整个程序运行时的潜在数据级并行性和线程级并行性,不利于发挥EDGE分片式结构的优势.本文通过分析EDGE编译器超块组织的特点,结合EDGE结构特有的执行模型,提出一种普适性的超块组织框架来模拟EDGE结构上多线程运行的效果,进一步挖掘EDGE结构运行串行单线程程序时的指令级并行性.本文选用TRIPS微处理器作为EDGE结构的实例处理器,利用矩阵乘法等三个实验验证了我们所提出的框架的可行性,实验结果表明这些应用在TRIPS上获得了较好的性能提升.  相似文献   

6.
基于脚本的构件测试自动化框架   总被引:1,自引:0,他引:1       下载免费PDF全文
倪铭  黄萍 《计算机工程》2010,36(6):94-96
针对传统的测试自动化技术已不适用于构件的问题,提出基于脚本的构件测试自动化框架。将面向对象单元测试自动化框架xUnit与数据驱动的测试框架结合并改进,实现构件测试脚本自动生成、测试脚本自动执行、测试结果自动验证与记录。实例表明,该框架能有效对构件进行自动化测试,自动化程度较高。  相似文献   

7.
提出了FFT处理器的蝶形单元和地址发生器优化方案。通过改进Wallace树型加法器阵列结构,提高了蝶形单元乘法器的工作频率。提出了地址快速生成算法,该算法在快速产生地址的同时降低了读取旋转因子ROM的功耗。在Xilinx的Vertex-II系列FPGA上实现,该处理器可以稳定工作在150 MHz时钟下,速度满足设计指标。  相似文献   

8.
本文在深入分析K-means算法计算特征的基础上,基于FPGA平台提出并实现了一种细粒度的并行浮点K-means算法。设计采用了阵列多PE并行处理的任务划分策略,实现了处理单元间的负载平衡,采用数据驱动的流水线隐藏片外存储访问,设计了一种基于脉动阵列结构的主从多PE并行计算阵列,并在单片FPGA(XC5VLX330)上成功集成了4个PE。实验结果表明,我们提出的K-means算法加速器结构具备良好的可扩展性。通过实验测试,我们的实现方案相对于Pentium 4 2.66 GHz单处理器程序达到了15倍的加速比。  相似文献   

9.
根据Web系统测试的特点,采用数据驱动和业务逻辑脚本模块化的测试方法,设计一种以数据驱动框架和关键字框架为基础的混合模式的Web自动化测试框架,借鉴这两种测试框架模块化的思想,采用分层设计。该测试框架具有良好的独立性和可扩展性。  相似文献   

10.
针对拥有复杂指令系统及处理过程的网络处理器(NP)仿真器的测试,提出了一个融入数据的关键字驱动NP自动化测试系统.该系统兼有数据和关键字驱动两种测试技术的优点,可将用例自动转化成系统测试脚本,并自动生成封装有指令的VPP协议报文,通过系统的自动处理、预期及比对等过程,实现对NP仿真器在集成状态的自动测试.在实际应用中验证了这种自动测试技术可大大简化用例设计,降低编程需求,从而大幅提高测试效率,降低测试成本.  相似文献   

11.
A fine-grained MIMD (multiple-instruction, multiple-data) array processor for video applications that combines submicron technology, parallel processing, and dataflow programming is presented. The Datawave processor is used as the building block of this cellular, data-driven system architecture. The processor executes statically scheduled dataflow programs, and self-timed hardware mechanisms handle the asynchronous dataflows automatically and transparently. The architecture is discussed first at the array level and then at the cell level. It is shown how Datawave implements a four-tap finite impulse response filer and a real-time image codec. Program development tools for Datawave are discussed, and the chip itself is briefly described  相似文献   

12.
13.
The design of specialized processing array architectures, capable of executing any given arbitrary algorithm, is proposed. An approach is adopted in which the algorithm is first represented in the form of a dataflow graph and then mapped onto the specialized processor array. The processors in this array execute the operations included in the corresponding nodes (or subsets of nodes) of the dataflow graph, while regular interconnections of these elements serve as edges of the graph. To speed up the execution, the proposed array allows the generation of computation fronts and their cancellation at a later time, depending on the arriving data operands; thus it is called a data-driven array. The structure of the basic cell and its programming are examined. Some design details are presented for two selected blocks, the instruction memory and the flag array. A scheme for mapping a dataflow graph (program) onto a hexagonally connected array is described and analyzed. Two distinct performance measures-mapping efficiency and array utilization-and some performance results are discussed  相似文献   

14.
15.
With the developing demands of massive-data services, the applications that rely on big geographic data play crucial roles in academic and industrial communities. Unmanned aerial vehicles (UAVs), combining with terrestrial wireless sensor networks (WSN), can provide sustainable solutions for data harvesting. The rising demands for efficient data collection in a larger open area have been posed in the literature, which requires efficient UAV trajectory planning with lower energy consumption methods. Currently, there are amounts of inextricable solutions of UAV planning for a larger open area, and one of the most practical techniques in previous studies is deep reinforcement learning (DRL). However, the overestimated problem in limited-experience DRL quickly throws the UAV path planning process into a locally optimized condition. Moreover, using the central nodes of the sub-WSNs as the sink nodes or navigation points for UAVs to visit may lead to extra collection costs. This paper develops a data-driven DRL-based game framework with two partners to fulfill the above demands. A cluster head processor (CHP) is employed to determine the sink nodes, and a navigation order processor (NOP) is established to plan the path. CHP and NOP receive information from each other and provide optimized solutions after the Nash equilibrium. The numerical results show that the proposed game framework could offer UAVs low-cost data collection trajectories, which can save at least 17.58% of energy consumption compared with the baseline methods.  相似文献   

16.
In many scientific applications, array redistribution is usually required to enhance data locality and reduce remote memory access in many parallel programs on distributed memory multicomputers. Since the redistribution is performed at runtime, there is a performance trade-off between the efficiency of the new data decomposition for a subsequent phase of an algorithm and the cost of redistributing data among processors. In this paper, we present a generalized processor mapping technique to minimize the amount of data exchange for BLOCK-CYCLIC(kr) to BLOCK-CYCLIC(r) array redistribution and vice versa. The main idea of the generalized processor mapping technique is first to develop mapping functions for computing a new rank of each destination processor. Based on the mapping functions, a new logical sequence of destination processors can be derived. The new logical processor sequence is then used to minimize the amount of data exchange in a redistribution. The generalized processor mapping technique can handle array redistribution with arbitrary source and destination processor sets and can be applied to multidimensional array redistribution. We present a theoretical model to analyze the performance improvement of the generalized processor mapping technique. To evaluate the performance of the proposed technique, we have implemented the generalized processor mapping technique on an IBM SP2 parallel machine. The experimental results show that the generalized processor mapping technique can provide performance improvement over a wide range of redistribution problems  相似文献   

17.
A longest common subsequence (LCS) of two strings is a common subsequence of two strings of maximal length. The LCS problem is to find an LCS of two given strings and the length of the LCS (LLCS). In this paper, we present a new linear processor array for solving the LCS problem. The array is based on parallelization of a recent LCS algorithm which consists of two phases, i.e. preprocessing and computation. The computation phase is based on bit-level dynamic programming approach. Implementations of the preprocessing and computation phases are discussed on the same processor array architecture for the LCS problem. Further, we propose a block processor array architecture which reduces the overall communication and time requirements. Finally, we develop a performance model for estimating the performance of the processor array architecture on Pentium processors.  相似文献   

18.
This paper proposes a high speed multi-level-parallel array processor for programmable vision chips.This processor includes 2-D pixel-parallel processing element(PE)array and 1-D row-parallel row processor(RP)array.The two arrays both operate in a single-instruction multiple-data(SIMD)fashion and share a common instruction decoder.The sizes of the arrays are scalable according to dedicated applications.In PE array,each PE can communicate not only with its nearest neighbor PEs,but also with the next near neighbor PEs in diagonal directions.This connection can help to speed up local operations in low-level image processing.On the other hand,global operations in mid-level processing are accelerated by the skipping chain and binary boosters in RP array.The array processor was implemented on an FPGA device,and was successfully tested for various algorithms,including real-time face detection based on PPED algorithm.The results show that the image processing speed of proposed processor is much higher than that of the state-of-the-arts digital vision chips.  相似文献   

19.
处理器阵列的容错重构技术是片上网络多核、众核高性能体系结构的可靠性技术之一。现有的最大逻辑阵列并行重构技术仅对单条逻辑列的构造实现了并行化,而对多条逻辑列的同步并行仍未见可行算法。依据处理器阵列的潜在并行性,在分治策略的基础上,提出了一种阵列分块的并行重构算法。算法对处理器阵列实施横向分块划分,对每个阵列块进行并行重构,并对所得逻辑子阵列进行归并,实现了多条逻辑列的同步并行重构。与现有的并行算法相比,新算法同样能够生成最大逻辑列,并且减少了通信开销与计算中的数据冗余,有效提高了运行速度。实验结果表明,在物理阵列大小为64×64的处理器阵列上,运行速度比现有并行算法提高39.55%,并且具有良好的可扩展性。  相似文献   

20.
Remaining useful life prediction is one of the key requirements in prognostics and health management. While a system or component exhibits degradation during its life cycle, there are various methods to predict its future performance and assess the time frame until it does no longer perform its desired functionality. The proposed data-driven and model-based hybrid/fusion prognostics framework interfaces a classical Bayesian model-based prognostics approach, namely particle filter, with two data-driven methods in purpose of improving the prediction accuracy. The first data-driven method establishes the measurement model (inferring the measurements from the internal system state) to account for situations where the internal system state is not accessible through direct measurements. The second data-driven method extrapolates the measurements beyond the range of actually available measurements to feed them back to the model-based method which further updates the particles and their weights during the long-term prediction phase. By leveraging the strengths of the data-driven and model-based methods, the proposed fusion prognostics framework can bridge the gap between data-driven prognostics and model-based prognostics when both abundant historical data and knowledge of the physical degradation process are available. The proposed framework was successfully applied on lithium-ion battery remaining useful life prediction and achieved a significantly better accuracy compared to the classical particle filter approach.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号