1.

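A New Architectural Concept: Virtual Registers and a Parallel Instruction Processing Unit (total citations: 4; self-citations: 1; citations by others: 3)

李三立 廖恒《小型微型计算机系统》1995,16(6):6-11

As programs demanded ever larger address spaces, researchers introduced the concept of virtual memory, so that the address space a program can access is not limited by physical memory. With the development of register-oriented RISC technology and the growing importance of instruction scheduling in multiple-issue architectures, we propose the new concept of virtual registers, which frees the register space from the size of the physical register file and benefits instruction scheduling and register renaming, thereby increasing instruction-level parallelism (ILP). In addition, modern RISC processors all concentrate on increasing execution parallelism in the data processing unit and neglect the processing of instructions held in memory.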

2.

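Integrated Instruction Scheduling and Register Allocation Techniques

黄磊 冯晓兵《计算机应用研究》2008,25(4):979-982

Many integrated techniques have been proposed that let instruction scheduling and register allocation exchange information, increasing instruction-level parallelism, and hence performance, without introducing excessive spill code. This paper surveys several influential algorithms grouped by their characteristics, gives a brief evaluation and comparison of their effectiveness, and finally outlines some new directions in combining instruction scheduling with register allocation.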

3.

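A Single-Chip Multiprocessor with a VLIW Architecture

汤志忠 张赤红《计算机研究与发展》1993,30(10):1-8

This paper presents a high-performance single-chip multiprocessor based on a VLIW (very long instruction word) architecture. The architecture uses pipelined register files to eliminate data dependences within loops, so that programs can run in parallel at the instruction level with a very high degree of parallelism. Simulation results show that the architecture achieves high computation speed and a good performance-to-cost ratio.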

4.

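A Register Structure Supporting Software Pipelining of Nested Loops (total citations: 1; self-citations: 0; citations by others: 1)

容红波 汤志忠《软件学报》2000,11(3):401-409

The register structure and its allocation are among the keys to software pipelining algorithms. To support software pipelining of nested loops, this paper proposes a novel register structure: the semi-shared jumping pipelined register file. It effectively solves problems specific to software pipelining of nested loops, namely register renaming within and across loop levels and pipeline interruption, and it effectively eliminates inter-iteration read/write dependences of outer loops, raising the program's instruction-level parallelism. Three allocation modes can be used flexibly: single register, pipelined register, and register group. The pipelined-register mode gives simple and effective support for register renaming when lifetimes are known and confined to a single loop level. The register-group mode handles the case where variable lifetimes are undetermined under nested-loop software pipelining. The jump operation ...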

5.

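A Register-Pressure-Based Clustering Algorithm for VLIW DSPs (total citations: 1; self-citations: 0; citations by others: 1)

雷一鸣 洪一 徐云 姜海涛《计算机应用》2010,30(1):274-276

Registers are among the most precious resources at run time. While scheduling instructions for a VLIW DSP, software pipelining significantly increases register pressure, which can cause register spills and abort software pipelining. In earlier work, instruction clustering before software pipelining mostly considered instruction parallelism and left register pressure to the register allocation phase, so spills occurred when there were not enough physical registers. This paper clusters instructions by examining register pressure at run time, so that dynamic information about each cluster's register pressure can be used to reduce spills and improve execution efficiency.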
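
The abstract above does not give the clustering algorithm itself, but the quantity it is driven by, register pressure, can be sketched. The following is a minimal illustration only, assuming each value is described by a hypothetical half-open live range `[def_cycle, last_use_cycle)`; the function name `max_live` and this representation are assumptions, not taken from the paper.

```python
def max_live(intervals):
    """Peak number of simultaneously live values (register pressure).

    intervals: list of (start, end) cycles; a value is live in [start, end).
    This interval model is an assumption made for illustration.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # value becomes live
        events.append((end, -1))    # value dies
    # At equal cycles (c, -1) sorts before (c, +1), so a register freed
    # in a cycle can be reused by a value defined in that same cycle.
    events.sort()
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak
```

If the peak for a cluster exceeds the number of physical registers available to that cluster, some value would have to be spilled, which is the situation the abstract describes wanting to avoid.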

6.

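A Low-Port-Count, Low-Energy Register File Design Based on Active Periods

赵雨来 李险峰 佟冬 孙含欣 陈杰 程旭《计算机学报》2008,31(2):299-308

Multi-ported register files help exploit instruction-level and thread-level parallelism, but they put pressure on area, energy, and access time. Targeting superscalar and SMT processors, this paper presents an approach that adds a small Active Value File (AVF) to selectively hold the values of physical registers that are in their active period (from production to last use). The AVF offloads accesses from the main register file and reduces its port count; it is simple to implement and has a write-filtering property. It achieves a substantial energy reduction without affecting clock frequency and with only a small IPC loss.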

7.

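A Data-Flow-Graph-Based Clustering Method for Heterogeneous VLIW DSPs

邱鹏飞 洪一 耿锐 徐云《计算机应用》2011,31(4):935-937

Instruction-level parallelism (ILP) in very long instruction word digital signal processors (VLIW DSPs) is achieved mainly through instruction clustering and software pipelining. Earlier work on clustering mostly considered only ILP and reducing inter-cluster move instructions, paying little attention to heterogeneous architectures and to registers that can be assigned only to particular clusters. This paper proposes a clustering method for heterogeneous architectures based on the data flow graph (DFG): it uses instruction dependences to partition the DFG into as many subgraphs as there are clusters, then adjusts the subgraphs with a heuristic according to the cluster requirements of special registers. Experiments show that the method balances load better and improves speedup by about 8% over traditional methods.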

8.

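Virtual Register Architecture (total citations: 3; self-citations: 1; citations by others: 2)

廖恒 李三立《计算机学报》1996,19(11):801-809

The concept of virtual memory has been in use for nearly 30 years. Given the rapid development of register-oriented RISC architectures and the importance of registers to instruction-level parallelism, this paper is the first to propose the new concept of virtual registers. The virtual register architecture is a way of implementing, at the processor architecture level, the Trace Merging algorithm for instruction-level parallel scheduling and issue.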

9.

Pipelined Design of Register File Reads and Writes for VelociTI-Architecture Floating-Point DSPs

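胡正伟 仲顺安 陈禾《计算机工程》2007,33(21):237-239

This paper studies the principle of pipelined register file reads and writes in VelociTI-architecture floating-point digital signal processors and proposes a design method. For single-operand double-precision floating-point instructions, the method uses two 32-bit data paths to read the source operand in one pipeline cycle; for two-operand double-precision floating-point instructions, it locks the decode unit and reads the source operands over several pipeline cycles. Write operations spanning multiple pipeline cycles are implemented with write control vectors. The method correctly transfers IEEE 754 double-precision floating-point data over the 32-bit data paths between the register file and the functional units, and simulation results verify its correctness.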

10.

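Research on Compiler Optimization Techniques Based on Predicated Code

田祖伟 孙光《计算机科学》2010,37(5):130-133

The large number of branch instructions in programs severely limits the ability of architectures and compilers to exploit parallelism. A major challenge in effectively exploiting instruction-level parallelism is overcoming the restrictions imposed by branches. Predicated execution can effectively remove branches by converting them into predicated code, which widens the scope of instruction scheduling and removes the performance loss caused by branch misprediction. This paper describes compiler optimizations based on predicated code, including instruction scheduling, software pipelining, register allocation, and instruction merging, and designs and implements an instruction scheduling algorithm based on predicated code. Experiments show that optimizing predicated code effectively increases instruction-level parallelism, shortens execution time, and improves program performance.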

11.

Understanding the Thermal Implications of Multi-Core Architectures

Chaparro P. Gonzalez J. Magklis G. Cai Qiong Gonzalez A. 《Parallel and Distributed Systems, IEEE Transactions on》2007,18(8):1055-1065

Multicore architectures are becoming the main design paradigm for current and future processors. The main reason is that multicore designs provide an effective way of overcoming instruction-level parallelism (ILP) limitations by exploiting thread-level parallelism (TLP). In addition, it is a power and complexity-effective way of taking advantage of the huge number of transistors that can be integrated on a chip. On the other hand, today's higher than ever power densities have made temperature one of the main limitations of microprocessor evolution. Thermal management in multicore architectures is a fairly new area. Some works have addressed dynamic thermal management in bi/quad-core architectures. This work provides insight and explores different alternatives for thermal management in multicore architectures with 16 cores. Schemes employing both energy reduction and activity migration are explored and improvements for thread migration schemes are proposed.

12.

Improving the efficiency of inductive logic programming systems

Nuno A. Fonseca Vítor Santos Costa Ricardo Rocha Rui Camacho Fernando Silva 《Software》2009,39(2):189-219

Inductive logic programming (ILP) is a sub‐field of machine learning that provides an excellent framework for multi‐relational data mining applications. The advantages of ILP have been successfully demonstrated in complex and relevant industrial and scientific problems. However, to produce valuable models, ILP systems often require long running times and large amounts of memory. In this paper we address fundamental issues that have direct impact on the efficiency of ILP systems. Namely, we discuss how improvements in the indexing mechanisms of an underlying logic programming system benefit ILP performance. Furthermore, we propose novel data structures to reduce memory requirements and we suggest a new lazy evaluation technique to search the hypothesis space more efficiently. These proposals have been implemented in the April ILP system and evaluated using several well‐known data sets. The results observed show significant improvements in running time without compromising the accuracy of the models generated. Indeed, the combined techniques achieve a speedup of several orders of magnitude in some data sets. Moreover, memory requirements are reduced in nearly half of the data sets. Copyright © 2008 John Wiley & Sons, Ltd.

13.

Building a retargetable local instruction scheduler

Vicki Allan Steven J. Beaty Bogong Su Philip H. Sweany 《Software》1998,28(3):249-283

While high-performance architectures have included some Instruction-Level Parallelism (ILP) for at least 25 years, recent computer designs have exploited ILP to a significant degree. Although a local scheduler is not sufficient for generation of excellent ILP code, it is necessary as many global scheduling and software pipelining techniques rely on a local scheduler. Global scheduling techniques are well-documented, yet practical discussions of local schedulers are notable in their absence. This paper strives to remedy that disparity by describing a list scheduling framework and several important practical details that, taken together, allow implementation of an efficient local instruction scheduler that is easily retargetable for ILP architectures. The foundation of our machine-independent instruction scheduler is a timing model that allows easy retargetability to a wide range of architectures. In addition to describing how a general list-scheduler can be implemented within the framework of our timing model, experimental results indicate that lookahead scheduling can profoundly improve a scheduler's ability to produce a legal schedule. Further experimental data shows that deciding to schedule a data dependence DAG (DDD) in forward or reverse order depends significantly upon that target architecture, suggesting the possibility of scheduling in each direction and using the best of the two schedules. In contrast, experiments demonstrate little difference in code quality for schedules generated by either instruction-driven or operation-driven schedulers. Thus, the inherent flexibility of operation-driven methods suggests including that approach in a retargetable instruction scheduler. List scheduling is, of course, a heuristic scheduling method. A variety of scheduling heuristics are presented. 
In addition, the paper describes a method, using a genetic algorithm search, to ‘fine-tune’ the weights of twenty-four individual heuristics to form a DDD-node heuristic tuned to a specific architecture. © 1998 John Wiley & Sons, Ltd.
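
As a rough illustration of the list scheduling that the abstract above describes, the sketch below schedules a data dependence DAG in forward order. It is not the paper's framework or timing model: the function name `list_schedule`, the `width` issue limit, and the single critical-path priority heuristic are all assumptions chosen to keep the example small.

```python
from collections import defaultdict

def list_schedule(latency, deps, width=1):
    """Forward list scheduling of a dependence DAG (illustrative sketch).

    latency: dict op -> cycles until the op's result is available
    deps:    list of (producer, consumer) edges; must be acyclic
    width:   ops that may issue per cycle (hypothetical machine parameter)
    Returns a dict op -> issue cycle.
    """
    succs = defaultdict(list)
    preds_left = {op: 0 for op in latency}
    for producer, consumer in deps:
        succs[producer].append(consumer)
        preds_left[consumer] += 1

    # Priority heuristic: longest latency path from the op to the DAG's end.
    prio = {}
    def height(op):
        if op not in prio:
            prio[op] = latency[op] + max((height(s) for s in succs[op]), default=0)
        return prio[op]
    for op in latency:
        height(op)

    earliest = {op: 0 for op in latency}  # first cycle all operands are ready
    ready = [op for op in latency if preds_left[op] == 0]
    schedule, cycle = {}, 0
    while len(schedule) < len(latency):
        # Ops whose operands are available now, highest priority first.
        candidates = sorted((op for op in ready if earliest[op] <= cycle),
                            key=lambda op: (-prio[op], op))
        for op in candidates[:width]:
            schedule[op] = cycle
            ready.remove(op)
            for s in succs[op]:
                earliest[s] = max(earliest[s], cycle + latency[op])
                preds_left[s] -= 1
                if preds_left[s] == 0:
                    ready.append(s)
        cycle += 1
    return schedule
```

For example, with latencies `{"load": 2, "add": 1, "mul": 3, "store": 1}` and edges load→add, load→mul, add→store, mul→store on a single-issue machine, the longer-latency `mul` is issued before `add` because it lies on the critical path.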

14.

Using the BSP cost model to optimise parallel neural network training

R. O. Rogers D. B. Skillicorn 《Future Generation Computer Systems》1998,14(5-6):409-424

We derive cost formulae for three different parallelisation techniques for training both supervised and unsupervised networks. These formulae are parameterised by properties of the target computer architecture. It is therefore possible to decide both which technique is best for a given parallel computer, and which parallel computer best suits a given technique. One technique, exemplar parallelism, is far superior for almost all parallel computer architectures. Formulae also take into account optimal batch learning as the overall training approach. Cost predictions are made for several of today's popular parallel computers.

15.

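NIR Data Processing and Modeling Simulation for Gasoline Octane Number

王瑾 蒋书波《计算机与应用化学》2011,28(7):947-950

With the development and use of computers and all kinds of software, chemometrics has advanced steadily. Large amounts of data can be collected in the near-infrared region and, with various effective statistical methods, near-infrared spectroscopy can be applied to qualitative and quantitative analysis. Near-infrared spectra consist of overtone and combination bands of molecular vibrational spectra, arising mainly from absorption by hydrogen-containing groups, and they carry rich information about the composition and molecular structure of most types of organic compounds. The principle is based on different groups, or the same group in different chemical...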

16.

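A Brief Analysis of Database System Security

郑燕玲《数字社区&智能家居》2010,(2):271-272

With the rapid development of computer technology, databases are now used in every field, but the security of the large volumes of data held in application databases, and the protection of sensitive data against theft and tampering, are attracting growing attention. This paper presents several common methods for securing databases and discusses some aspects of database security.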

17.

Scalable multicore architectures for long DNA sequence comparison

Friman Sánchez Felipe Cabarcas Alex Ramirez Mateo Valero 《Concurrency and Computation》2011,23(17):2205-2219

Biological sequence comparison is one of the most important tasks in Bioinformatics. Owing to the fast growth of databases that contain biological information, sequence comparison represents an important challenge for high‐performance computing, especially when very long sequences are compared, i.e. the complete genome of several organisms. The Smith–Waterman (SW) algorithm is an exact method based on dynamic programming to quantify local similarity between sequences. The inherent large parallelism of the algorithm makes it ideal for architectures supporting multiple dimensions of parallelism (TLP, DLP and ILP). Concurrently, there is a paradigm shift towards chip multiprocessors in computer architecture, which offer a huge amount of potential performance that can only be exploited efficiently if applications are effectively mapped and parallelized. In this work, we analyze how large‐scale biology sequence comparison takes advantage of the current and future multicore architectures. Our starting point is the performance analysis of the current multicore IBM Cell B.E. processor; we analyze two different SW implementations on the Cell B.E. Then, using simulation tools, we study the performance scalability when a many‐core architecture is used for performing long DNA sequence comparison. We investigate the efficient memory organization that delivers the maximum bandwidth with the minimum cost. Our results show that a heterogeneous architecture can be an efficient alternative to execute challenging bioinformatic workloads. Copyright © 2011 John Wiley & Sons, Ltd.
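
To make the dynamic programming in the abstract above concrete, here is a minimal, sequential sketch of Smith–Waterman scoring. It is not either of the Cell B.E. implementations the paper analyzes; the linear gap penalty and the default scores (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b (linear gap penalty).

    Keeps only two rows of the DP matrix, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0: a local alignment may start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Each cell depends only on its left, upper, and upper-left neighbours, so all cells on one anti-diagonal can be computed independently; that independence is one source of the parallelism the abstract refers to.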

18.

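A Classification Model of Computer Architecture (total citations: 6; self-citations: 1; citations by others: 5)

沈绪榜 张发存 冯国臣 车得亮 王光《计算机学报》2005,28(11):1759-1766

Following the development of computer architecture, this paper proposes a new classification model of computer architecture based on the concepts of instruction stream computing, data stream computing, and configuration stream computing.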

19.

Planning for the Solid-State Millennium

Gerald W. Houston 《Information Systems Management》1988,5(2):39-44

The most common performance bottleneck in today's large-scale commercial computer systems is disk I/O. And the problem is worsening with each new generation of magnetic disk. This article identifies the trends that made this situation inevitable and explains why the next performance revolution will not be the introduction of faster CPUs or exotic new parallel architectures but the widespread use of technology that is already available - semiconductor-based mass storage.

20.

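Research on the PageRank Algorithm in Data Mining

刘学超《计算机光盘软件与应用》2012,(2):24+34

The rapid development of computer technology and networks has radically changed how people obtain and publish information, and the open, global Internet has brought society into an era of information explosion, making it essential to turn large volumes of varied data into useful information and knowledge. This paper studies the application of the PageRank algorithm, which is widely used in data mining, and proposes strategies for optimizing and improving it.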
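
The abstract above does not describe its proposed improvements, so the sketch below shows only the standard PageRank power iteration as a point of reference. The function name `pagerank`, the damping factor of 0.85, and the even redistribution of rank from pages without out-links are conventional choices assumed here, not details from the paper.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Standard PageRank by power iteration (illustrative sketch).

    links: dict page -> list of pages it links to.
    Returns a dict page -> rank; ranks sum to 1.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # stop once the ranks have stabilized
            break
    return rank
```

For a small graph such as `{"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}`, page `c` receives the highest rank because three pages link to it, and `d`, which has no incoming links, receives the lowest.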