1.

A NoC-based hybrid message-passing/shared-memory approach to CMP design

Mario R. Casu, Massimo Ruo Roch, Sergio V. Tota, Maurizio Zamboni 《Microprocessors and Microsystems》2011,35(2):261-273

Future chip-multiprocessors (CMP) will integrate many cores interconnected with a high-bandwidth and low-latency scalable network-on-chip (NoC). However, the potential that this approach offers at the transport level needs to be paired with an analogous paradigm shift at the higher levels. In particular, the standard shared-memory programming model fails to address the scalability requirements of the many-core era. Fast data exchange among the cores and low-latency synchronization are desirable but hard to achieve in practice due to the memory hierarchy. The message-passing paradigm instead permits direct data communication and synchronization between the cores. The shared memory could still be used for the instruction fetch. Hence, we propose a hybrid approach that combines shared memory and message passing in a single general-purpose CMP architecture that allows efficient execution of applications developed with both parallel programming approaches. Cores fetch instructions from a hierarchical memory and exchange their data through the same memory, for compatibility with existing software, or directly through the fast NoC. We developed a fast SystemC-based cycle-accurate simulator for design space explorations that we used to evaluate the performance with real benchmarks. The various components have been RTL coded and mapped to a CMOS 45 nm technology to build a silicon area model that we used to select the best architectural configurations.

2.

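Research on a parallel compression algorithm based on shared memory and Gzip Total citations: 2, self-citations: 1, citations by others: 1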

宋刚 蒋孟奇 张云泉 刘胜飞 《计算机工程与设计》2009,30(4)

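Gzip lossless compression algorithm. Although the gzip algorithm achieves a good compression ratio, its analysis and compression-encoding stages require a large amount of computation. To shorten compression time, a parallel compression strategy based on shared memory is proposed, and a parallel version of gzip is implemented using the OpenMP standard and the "producer/consumer" model. Tests on one SMP node (dual CPU) of a Beowulf cluster and on a Dawning Tiankuo server (4-way dual-core) show that the parallelized gzip program achieves a large performance improvement, especially when compressing large files.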
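The producer/consumer strategy summarized in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper uses OpenMP, whereas this sketch assumes Python threads and the standard `gzip` and `queue` modules, and the names `parallel_gzip`, `chunk_size`, and `workers` are hypothetical. It relies on the fact that concatenated gzip members form a valid gzip stream.

```python
import gzip
import queue
import threading


def parallel_gzip(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> bytes:
    """Compress `data` chunk by chunk with a producer/consumer thread pool.

    Hypothetical helper for illustration: each chunk becomes one gzip member,
    and the members are concatenated in their original order.
    """
    tasks = queue.Queue(maxsize=2 * workers)  # bounded queue between producer and consumers
    results = {}                              # chunk index -> compressed member
    lock = threading.Lock()

    def consumer():
        while True:
            item = tasks.get()
            if item is None:                  # sentinel: no more work
                break
            index, chunk = item
            member = gzip.compress(chunk)
            with lock:
                results[index] = member

    threads = [threading.Thread(target=consumer) for _ in range(workers)]
    for t in threads:
        t.start()

    # Producer: cut the input into fixed-size chunks and enqueue them.
    count = 0
    for offset in range(0, len(data), chunk_size):
        tasks.put((count, data[offset:offset + chunk_size]))
        count += 1
    for _ in threads:
        tasks.put(None)                       # one sentinel per consumer
    for t in threads:
        t.join()

    # Reassemble the members in input order.
    return b"".join(results[i] for i in range(count))
```

Because every chunk is compressed independently, the output is usually somewhat larger than a single-stream gzip of the same data; smaller chunks expose more parallelism at the cost of compression ratio.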

3.

The distributed virtual shared-memory system based on the InfiniBand architecture

《Journal of Parallel and Distributed Computing》2005,65(10):1271-1280

Even though there has been strong research activity on distributed virtual shared-memory (DVSM) systems, their architectures have not been widely used in current high-performance computing markets. The reason is that the previously introduced DVSM systems use conventional interconnection technologies like Ethernet, which incur high execution overhead due to process interruption at data communication for memory consistency. In this paper, we present the DVSM architecture based on the next generation of an interconnection technique, the InfiniBand Architecture (IBA). Because the IBA supports shared-memory programming semantics by means of remote direct-memory access (RDMA) and atomic operations in hardware, we can minimize the communication overhead for memory consistency on the DVSM system. For characterizing multithreaded applications on our IBA-based DVSM system, we examined two different shared-memory programming models, i.e. SPMD and OpenMP benchmarks. We show that our DVSM, which uses the full features of the IBA, improves performance significantly over the IPoIB-based DVSM system in all benchmarks, and is comparable to the bus-based shared-memory multiprocessor system in some benchmarks.

4.

Optimization of sort-merge join algorithms for hybrid DRAM/NVM heterogeneous memory architectures

杨柳 金培权 《计算机工程与科学》2021,43(2):191-198

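With the rapid development of computer technology, the scale of data applications keeps growing, and every industry demands ever higher data access speeds. To meet this demand, the idea of the in-memory database was proposed; however, traditional DRAM main memory cannot be integrated and scaled up on a large scale because of its density and energy-consumption limits. Meanwhile, non-volatile memory (NVM), with its high performance, high density, and low energy consumption, compensates for the shortcomings of DRAM. DRAM and NVM combined together...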

5.

Parallel thinning algorithms on multicomputers: experimental study on load balancing

M. G. Montoya C. Gil I. Garcia 《Concurrency and Computation》2000,12(5):327-340

In this work, a practical implementation of two parallel thinning algorithms on a multicomputer system is described. The solution has been conceived for a multiprocessor using the SPMD (single program multiple data) programming model. Our main goal is to describe our experiences on data partition/distribution among processors for parallel thinning algorithms as a representative type of algorithm where communications take place between neighbor processors and the work load for each processor depends on the input data. It will be shown how the efficiency of the parallel implementation can be improved through the application of a preprocess. This preprocess is based on the analysis of the work load balance. An analysis of the communication cost is also made. Although the results shown here are concerned with the implementations of two parallel thinning algorithms, we think that our proposal about data distribution is general and useful for a wide set of algorithms in the field of image processing. Copyright © 2000 John Wiley & Sons, Ltd.

6.

Parallel polygon scan conversion algorithms: Performance evaluation on a shared bus architecture

《Computers & Graphics》1986,10(1):7-25

In this paper, three parallel polygon scan conversion algorithms have been proposed, and their performance when executed on a shared bus architecture has been compared. It has been shown that the parallel algorithm that does not use edge coherence performs better than those that use edge coherence. Further, a multiprocessing architecture has been proposed to execute the parallel polygon scan conversion algorithms more efficiently than a single shared bus architecture.

7.

Parallel algorithms for image enhancement and segmentation by region growing, with an experimental study

David A. Bader Joseph Jájá David Harwood Larry S. Davis 《The Journal of supercomputing》1996,10(2):141-168

This paper presents efficient and portable implementations of a powerful image enhancement process, the Symmetric Neighborhood Filter (SNF), and an image segmentation technique that makes use of the SNF and a variant of the conventional connected components algorithm which we call δ-Connected Components. We use efficient techniques for distributing and coalescing data as well as efficient combinations of task and data parallelism. The image segmentation algorithm makes use of an efficient connected components algorithm based on a novel approach for parallel merging. The algorithms have been coded in Split-C and run on a variety of platforms, including the Thinking Machines CM-5, IBM SP-1 and SP-2, Cray Research T3D, Meiko Scientific CS-2, Intel Paragon, and workstation clusters. Our experimental results are consistent with the theoretical analysis (and provide the best known execution times for segmentation, even when compared with machine-specific implementations). Our test data include difficult images from the Landsat Thematic Mapper (TM) satellite data. Also affiliated with the Department of Electrical Engineering. Also affiliated with the Department of Computer Science and the Center for Automation Research.

8.

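A drainage geographic information system based on a hybrid C/S and B/S architecture Total citations: 1, self-citations: 0, citations by others: 1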

吴建华 付仲良 王力 陈颖 《计算机工程与应用》2007,43(7):230-232,235

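Based on the specific characteristics of management and service work in the drainage industry, a drainage geographic information system for the main urban area of a city was designed and developed on a hybrid C/S and B/S architecture model. The system architecture, the interface design, and several key technologies of the system are analyzed and discussed in detail, and the system implementation is described. Common database technology and the DLODI model proposed in the paper are used to effectively solve the problem of integrating and sharing multi-source data.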

9.

The mesh with hybrid buses: an efficient parallel architecture for digital geometry

Lin R. Olariu S. Schwing J.L. Wang B.-F. 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(3):266-280

10.

Design of a DSP-based integrated display instrument for hybrid electric vehicles

窦志军 何培祥 李庆东 张霞 《电子技术应用》2009,35(8)

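An integrated display instrument for hybrid electric vehicles, based on the TMS320F2812 DSP and the CAN bus, with an intelligent color LCD as its terminal. Software combines preset screens with live field data for display, and the display interface can be modified and extended at any time as needed. Experiments show that the instrument not only guarantees real-time information processing but can also promptly display the data that characterize the working state of the hybrid electric vehicle through pointer-style screens, numeric screens, or power-transmission diagrams.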

11.

Parallel multigrid algorithms based on generic approximate sparse inverses: an SMP approach

Christos K. Filelis-Papadopoulos George A. Gravvanis 《The Journal of supercomputing》2014,67(2):384-407

New parallel computational techniques are introduced for the parallelization of Generic Approximate Sparse Inverse multigrid methods, based on Portable Operating System Interface for UniX (POSIX) threads, for multicore systems. Parallelization of the Generic Approximate Sparse Inverse Matrix (GenAspI) algorithm is achieved based on a new computational approach, namely “strip,” which utilizes the data independence of the rows assigned in each available processor. Additionally, new parallel computational techniques are proposed for the parallelization of a modified multigrid V-Cycle method, based on POSIX Threads, for multicore systems. The modified V-Cycle utilized a Parallel PGenAspI Preconditioned Bi-Conjugate Gradient STABilized (BiCGSTAB) as a coarse solver to ensure better parallel performance of the multigrid method. For parallelization purposes, a replication of the multigrid method function is executed on each processor with different index bands and with proper synchronization points to ensure less thread-creation overhead and to maximize parallel performance. Theoretical estimates on speedups and efficiency are also presented. Finally, numerical results for the performance of the PGenAspI algorithm and the PGenAspI–MGV method for solving classical two-dimensional boundary value problems on multicore computer systems are presented. The implementation issues of the proposed method are also discussed using POSIX threads on multicore systems.
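The idea of giving each worker its own contiguous band of independent rows can be sketched as follows. This is only an illustration under stated assumptions, not the GenAspI algorithm: the paper uses POSIX threads, while the sketch uses Python's `ThreadPoolExecutor`, a dense matrix-vector product stands in for the row-wise work, and `index_bands` and `banded_matvec` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def index_bands(n_rows, n_workers):
    """Split range(n_rows) into n_workers contiguous (start, stop) bands."""
    base, extra = divmod(n_rows, n_workers)
    bands = []
    start = 0
    for w in range(n_workers):
        # The first `extra` workers take one additional row each.
        stop = start + base + (1 if w < extra else 0)
        bands.append((start, stop))
        start = stop
    return bands


def banded_matvec(matrix, vector, n_workers=4):
    """Row-wise matrix-vector product, one band of rows per worker."""
    result = [0.0] * len(matrix)

    def work(band):
        start, stop = band
        # Rows are independent, so each worker writes only to its own slots.
        for i in range(start, stop):
            result[i] = sum(a * x for a, x in zip(matrix[i], vector))

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(work, index_bands(len(matrix), n_workers)))
    return result
```

In CPython the global interpreter lock prevents a real speedup for pure-Python arithmetic, so the sketch shows the partitioning scheme rather than the performance the abstract reports.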

12.

OpenCL acceleration of adaptive image demosaicing on heterogeneous platforms

《电子技术应用》2016,(4)

13.

BOAR: an advanced HW/SW coemulation environment for DSP system development

Jouni Isoaho Vesa Köppä Jarkko Oksala Pasi Ojala 《Microprocessors and Microsystems》1997,20(10):2330-615

The BOAR emulation system is targeted to hardware/software (HW/SW) codevelopment of advanced embedded DSP and telecom systems. The challenge of the BOAR system is efficient customization of programmable hardware, and a dedicated partitioning routine for target applications and structures, which allows quite high overall system performance. The system allows multiple configurations for communication between processors and field programmable gate arrays (FPGAs), making the BOAR system an efficient tool for real-time HW/SW coverification. The reprogrammable hardware of the emulation tool is based on four Xilinx 4000-series devices, two Texas TMS320C50 signal processors and one Motorola MC68302 microcontroller. With current devices the BOAR hardware provides approximately 40–70 kgates of logic capacity in DSP applications. The emulation capacity can be expanded by connecting several similar boards in a chain. The system also has a versatile internal reprogrammable test environment for test bench development, performance evaluations and design debugging. The logic development environment is based on the Synopsys synthesis tools and automatic design management software, which performs resource mapping and performance-driven design partitioning between FPGAs. The emulation hardware is currently connected to logic and software development environments via an RS-232C bus. The BOAR emulation system has been found to be a very efficient platform for real-life prototyping of different types of DSP algorithms and systems, and for validating correct functionality of a VHDL macro library.

14.

Design and implementation of a Maze-based hybrid super-node architecture

雷凯 林彦彦 刘振宇 《计算机工程与设计》2008,29(14)

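To address the load bottleneck of the single central server in Tianwang Maze (a P2P network file system) [1], the network topology parameters that can be optimized are identified through quantitative analysis of real network and system data. User behavior characteristics and node properties are summarized by statistical analysis of logs, and a user super-node election algorithm model is built using K-means clustering. Combining the two analyses, a hybrid super-node architecture is proposed in which server super-nodes cooperate with user super-nodes. Finally, the key workflows and modules of the implementation are described, and data collected from the new architecture are used for a simple evaluation, which shows that the expected improvement was achieved.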

15.

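An integration method for agricultural product quality and safety traceability systems based on a hybrid mode* Total citations: 1, self-citations: 0, citations by others: 1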

刘树 田东 张小栓 穆维松 《计算机应用研究》2009,26(10):3804-3806

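To improve the management of agricultural product quality and safety, an integration method for agricultural product quality and safety traceability systems based on a hybrid mode is proposed. The method structures the system with a hybrid C/S and B/S mode, uses radio-frequency identification (RFID) and barcode technology to label products and to collect and transmit information, and develops the key system modules with component technology. Finally, taking vegetables as an example and building on a field investigation of an enterprise, a vegetable quality and safety traceability system was developed with this method. Operating results show that the system organically links all parts of the supply chain, keeps the information flow accurate, unobstructed, and real-time as agricultural products move through the entire supply chain, and achieves quality and safety traceability for agricultural products.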

16.

The real-time supervisory control of an experimental manufacturing system based on a hybrid method Total citations: 1, self-citations: 0, citations by others: 1

Murat Uzam Gökhan Gelen 《Control Engineering Practice》2009,17(10):1174-1189

In this paper, the real-time supervisory control of an experimental manufacturing system is reported based on a recently proposed hybrid (mixed PN/automaton) approach. Assuming that an uncontrolled bounded Petri net (PN) model of a (plant) discrete event system (DES) and a set of forbidden state specifications are given, the proposed approach computes a maximally permissive and nonblocking closed-loop hybrid model. The method is straightforward logically, graphically and technologically. This paper particularly shows the applicability of a hybrid (mixed PN/automaton) approach to low-level real-time DES control. To do this, programmable logic controller (PLC) based real-time control of an experimental manufacturing system is considered.

17.

HIT-IIP: an information integrating platform for CIM system based on client/server architecture

Wang Gang Xu Xiaaw Gao Guoan 《Computers & Industrial Engineering》1996,31(3-4):603-607

This paper presents a plan of an information integrating platform based on client/server technology, named HIT-IIP, which was developed at Harbin Institute of Technology (HIT). The function structure, the system architecture and the detailed implementation plan of the HIT-IIP are proposed.

18.

Parallel computation on interval graphs: algorithms and experiments

A. Ferreira I. Guérin Lassous K. Marcus A. Rau‐Chaplin 《Concurrency and Computation》2002,14(11):885-910

This paper describes efficient coarse‐grained parallel algorithms and implementations for a suite of interval graph problems. Included are algorithms requiring only a constant number of communication rounds for connected components, maximum weighted clique, and breadth‐first‐search and depth‐first‐search trees, as well as O(log p) communication rounds algorithms for optimization problems such as minimum interval covering, maximum independent set and minimum dominating set, where p is the number of processors in the parallel system. This implies that the number of communication rounds is independent of the problem size. Implementations of these algorithms are evaluated on parallel clusters, using both Fast Ethernet and Myrinet interconnection networks, and on a CRAY T3E parallel multicomputer, with extensive experimental results being presented and analyzed. Copyright © 2002 John Wiley & Sons, Ltd.

19.

Effect of garbage collection in iterative algorithms on Spark: an experimental analysis

Kang Minseo Lee Jae-Gil 《The Journal of supercomputing》2020,76(9):7204-7218

Spark is one of the most widely used systems for the distributed processing of big data. Its performance bottlenecks are mainly due to the network I/O, disk I/O, and...

20.

Hybrid crossover operators for real-coded genetic algorithms: an experimental study Total citations: 2, self-citations: 0, citations by others: 2

F. Herrera M. Lozano A.M. Sánchez 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2005,9(4):280-298

Most real-coded genetic algorithm research has focused on developing effective crossover operators, and as a result, many different types have been proposed. Some forms of crossover operators are more suitable to tackle certain problems than others, even at the different stages of the genetic process in the same problem. For this reason, techniques which combine multiple crossovers have been suggested as alternative schemes to the common practice of applying only one crossover model to all the elements in the population. Therefore, the study of the synergy produced by combining the different styles of the traversal of solution space associated with the different crossover operators is an important one. The aim is to investigate whether or not the combination of crossovers performs better than the best single crossover amongst them. In this paper we have undertaken an extensive study in which we have examined the synergetic effects among real-parameter crossover operators with different search biases. This has been done by means of hybrid real-parameter crossover operators, which generate two offspring for every pair of parents, each one with a different crossover operator. Experimental results show that synergy is possible among real-parameter crossover operators, and in addition, that it is responsible for improving performance with respect to the use of a single crossover operator.
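A hybrid crossover of the kind this abstract describes, two offspring per parent pair with a different operator for each, might look like the sketch below. The abstract does not say which operators were combined, so the choice of BLX-α and arithmetic crossover is an assumption made purely for illustration, and the function names are hypothetical.

```python
import random


def blx_alpha(p1, p2, alpha=0.5, rng=random):
    """BLX-alpha: sample each gene from the parents' interval widened by alpha."""
    child = []
    for a, b in zip(p1, p2):
        lo, hi = min(a, b), max(a, b)
        span = hi - lo
        child.append(rng.uniform(lo - alpha * span, hi + alpha * span))
    return child


def arithmetic(p1, p2, rng=random):
    """Arithmetic crossover: one random convex combination of the parents."""
    w = rng.random()
    return [w * a + (1 - w) * b for a, b in zip(p1, p2)]


def hybrid_crossover(p1, p2, operators=(blx_alpha, arithmetic), rng=random):
    """Return one offspring per operator, so each child has a different search bias."""
    return [op(p1, p2, rng=rng) for op in operators]
```

For example, `hybrid_crossover([0.0, 1.0], [1.0, 3.0])` returns two children: the first may fall outside the parents' interval (exploration), while the second always lies between the parents (exploitation).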