Similar literature
1.
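This work analyzes the pointer-chasing pattern of processor memory accesses and points out that pointer-chasing operations in linked-data applications suffer from low data prefetch accuracy and high memory access latency. To improve the processor's pointer-chasing memory access performance, an instruction-label-assisted data prefetching (ILAMP) technique is proposed. ILAMP is an instruction-label-hinted prefetch mechanism: a new memory access instruction is added to the instruction set architecture, and at the decode stage this instruction generates a special memory access label indicating that the loaded content is a pointer. On a cache miss, the label is propagated all the way to the memory controller. When the loaded pointer returns to the memory controller, the controller extracts the pointer and issues a prefetch request. Experimental results show that, compared with a system without ILAMP, ILAMP reduces the average access latency of DRAM read requests by about 15% on average, achieves a prefetch accuracy above 77%, and increases memory access bandwidth by about 10%, at a hardware cost of about 1 KB.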

2.
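To improve the performance and resource utilization of heterogeneous multicore processors, program scheduling methods for such processors are studied. Based on the characteristics of heterogeneous multicore processors, a scheduling model built on low-overhead, neural-network-based program performance prediction is proposed. The model predicts the performance of each program on the different processor cores from the program's inherent characteristics, then uses these predictions to find the best matching between programs and cores for scheduling. Experiments show that the model substantially improves both the performance and the energy efficiency of heterogeneous multicore processors, outperforming existing round-robin scheduling, sampling scheduling, and performance impact estimation (PIE) scheduling. Compared with round-robin scheduling, it improves processor performance and energy efficiency by 13.64% and 10.78%, respectively.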

3.
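To improve the memory access performance of virtual machine interpreters on non-uniform memory access (NUMA) architectures, memory access optimization techniques for interpreters under NUMA are studied. A memory access optimization scheme for interpreters under NUMA is proposed, and static and dynamic instruction dispatch optimization methods for the interpreter are designed and implemented. Under this scheme, the virtual machine first obtains NUMA node information at startup and automatically generates all data structures required by the interpreter in each NUMA node; at run time, the interpreter uses static or dynamic instruction dispatch to localize the memory accesses of its execution threads on their NUMA nodes. Experimental results show that these methods significantly improve interpreter performance on NUMA systems: overall performance on the DaCapo benchmark suite improves by 8%, with a maximum improvement of 23%. The implementation cost is low, and the approach applies to the vast majority of NUMA server systems.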

4.
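To address the difficulty of parallel debugging caused by the nondeterministic execution of parallel programs on multicore processors, a fast hardware-based deterministic replay method named Time Cutter (时间切割者) is proposed. The method uses a parallelism-oriented recording mechanism to distinguish blocks of memory access instructions that executed in parallel in the original run from instruction blocks that did not. During replay it avoids serializing the memory access instruction blocks that executed in parallel in the original run, which keeps the performance overhead of replay low. Simulation experiments on the multicore simulator Sim-Godson show that replay is fast, with a performance overhead of only about 2%. The method also requires only simple hardware support and may be applied in the future development of domestic (Chinese) multicore processors.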

5.
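As the volume of data accessed by embedded graphics processing units (GPUs) keeps growing, the memory access subsystem has become an increasingly prominent bottleneck in performance, area, and power consumption. Considering the data characteristics and memory access requirements of graphics processors and the area and power constraints of embedded GPUs, a memory access subsystem architecture for embedded GPUs is proposed on the Godson GPU architecture platform. The design targets the memory access characteristics of the graphics pipeline: it optimizes the cache structure and proposes a linked-list-based structure, which improves memory access efficiency while reducing area and power consumption. To adapt the memory access subsystem to parallel graphics pipelines, a screen partitioning method is proposed that eliminates cache coherence problems and balances the load on the memory access subsystem. The design provides a reference for the memory access subsystem design of embedded GPUs.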

6.
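To solve the real-time task scheduling problem in multicore processor systems, in particular the mixed scheduling of real-time and non-real-time tasks, a mixed-task scheduling algorithm for multicore processors, EDF-segment, is proposed as an improvement on the earliest deadline first (EDF) algorithm. EDF-segment consolidates the fragments that arise when scheduling mixed tasks and raises processor utilization by migrating and merging those fragments, thereby improving the system's performance on mixed workloads. EDF-segment not only solves the mixed-task scheduling problem but also avoids the drop in multicore utilization caused by EDF, raising multicore utilization while guaranteeing the processing latency of real-time tasks. Theoretical derivation and experimental analysis show that EDF-segment can be applied effectively in multicore processor systems.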
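
For orientation, the baseline that this abstract builds on, plain earliest deadline first (EDF), can be sketched as follows. This is a minimal single-core, preemptive, unit-time-step illustration and not the EDF-segment algorithm itself, whose fragment migration and merging steps the abstract does not specify. The function name `edf_schedule` and the `(name, release, exec_time, deadline)` task format are assumptions made for this example.

```python
import heapq


def edf_schedule(tasks):
    """Preemptive EDF on one core, advancing in integer time units.

    tasks: list of (name, release, exec_time, deadline) tuples
           (hypothetical format chosen for this sketch; exec_time >= 1).
    Returns (timeline, missed): the task name run in each time unit
    (None when idle) and the names of tasks that finished after their deadline.
    """
    pending = sorted(tasks, key=lambda t: t[1])  # order by release time
    ready = []  # min-heap of (deadline, name, remaining_time)
    timeline, missed = [], []
    time, i = 0, 0
    while i < len(pending) or ready:
        # Move every task released by now into the ready queue.
        while i < len(pending) and pending[i][1] <= time:
            name, _, exec_time, deadline = pending[i]
            heapq.heappush(ready, (deadline, name, exec_time))
            i += 1
        if not ready:
            timeline.append(None)  # idle slot
            time += 1
            continue
        # Run the ready task with the earliest deadline for one time unit.
        deadline, name, remaining = heapq.heappop(ready)
        timeline.append(name)
        time += 1
        if remaining > 1:
            heapq.heappush(ready, (deadline, name, remaining - 1))
        elif time > deadline:
            missed.append(name)
    return timeline, missed


timeline, missed = edf_schedule([("A", 0, 2, 5), ("B", 1, 1, 2), ("C", 0, 1, 10)])
print(timeline, missed)
```

In this run, task B arrives at time 1 with the tightest deadline and preempts A, which illustrates the priority rule that EDF-segment starts from.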

7.
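The cache pollution problem that arises when multiple cores share an L2 cache is studied. Cache pollution caused by memory access instructions that a core executes speculatively along a predicted path is found to seriously degrade processor performance. A confidence-estimation-based cache pollution filtering technique, FCPC, is proposed. Its confidence estimation mechanism dynamically evaluates conditional branches, and two flag bits are added to each cache data line: a confidence estimation tag (CET) and an access indication ...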

8.
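To address the low efficiency of existing task scheduling algorithms for heterogeneous multicore processors, a comprehensive and efficient static task scheduling algorithm, clustering and duplication list optimization scheduling (CDLOS), is proposed. The algorithm first applies clustering optimization to the task graph to reduce the communication overhead of certain special tasks. It then computes task priority weights from the topology of the whole task graph, raising the priority of critical tasks. Next it schedules tasks using interval insertion and task duplication to reduce wasted processor resources. Finally it optimizes the schedule by eliminating redundant tasks, shortening the overall schedule length. Case analysis and simulation experiments show that, compared with previous algorithms, the new algorithm markedly improves the efficiency of task scheduling on multicore processors and has better application prospects.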

9.
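To improve the performance of network intrusion detection systems (NIDS) as Internet traffic and the number of network attacks grow, the use of parallel structures on multicore processors to raise NIDS processing capacity is studied. Two parallelization approaches, data parallel (RTC) and task parallel (SPL), are first implemented for a NIDS on the TILERA-GX36 many-core processor. The experimental results show that the abundant computing resources of a many-core processor can support a large number of parallel NIDS instances, but also bring severe resource contention and conflicts, so the parallelization overhead of the system rises sharply. To address this, a sharing-based RTC method, SRTC, is proposed. Compared with existing methods, SRTC solves the linear growth of memory usage in the RTC model and avoids the inter-thread communication overhead of the SPL model. SRTC is implemented and validated on the TILERA-GX36 many-core processor using the open-source NIDS software Snort. The experimental results show that the parallel system using SRTC achieves near-linear speedup; with more than 7000 real NIDS rules loaded, the system can process 10 Gbps of network traffic with a packet length of 1 KB.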

10.
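Desktop computer architectures that use discrete graphics cards commonly suffer from a bottleneck: high data transfer latency when the CPU (central processing unit) accesses GPU (graphics processing unit) memory space. To address this, the UA (uncache acceleration) mechanism implemented in the Loongson GS464 processor core is used to optimize the GPU memory space access interface in the GPU driver, greatly increasing the speed at which the processor writes contiguous data to IO memory spaces such as that of the GPU. The principle of the Loongson UA mechanism and the performance gain it offers over uncached IO writes are analyzed in detail. The UA mechanism is used to optimize GPU driver performance on the Loongson 3A+2H platform; x11perf results show that, after optimizing the GPU driver with UA, the performance of some Xserver interfaces improves by 5% to 230%. The Loongson UA mechanism is also wrapped into the standard MMAP system call, and the extended system call is used to optimize the Xvideo extension interface of Xserver. Experimental results show that this interface achieves a 6- to 12-fold performance improvement when playing common higher-resolution videos.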

11.
Effective control of batch processors is essential to improving on-time delivery of wafers in semiconductor manufacturing. In this paper, the focus is on the mean tardiness performance of a batch processor in a two-stage processor system that includes an upstream serial processor. Two new control strategies are proposed for this problem. The first strategy incorporates the product information at the upstream serial station in batching decisions. The second strategy further applies a re-sequencing approach in the serial processor's queue when there is a benefit in shortening the arrival time of an urgent product. Discrete event simulation is used to test the performance of the strategies. Results are very promising compared with benchmark control strategies.

12.
Although the scheduling problem of uplink transmission in the IEEE 802.16 broadband wireless access network is extensively discussed, most of the results are limited to quality of service (QoS) in terms of throughput and delay requirements. But because in practice only limited wireless resources are available, fairness-based scheduling on each connection's QoS provides better outcomes. In this study, the authors propose a new fair uplink scheduling for real-time polling service and non-real-time polling service with proportional sharing of the excess bandwidth of the network. To implement the proposed fair scheduling so that it satisfies the delay requirement and achieves full bandwidth utilisation, the authors introduce a rate control algorithm. The proposed scheduling guarantees fairness, the delay requirement and full bandwidth utilisation, which are not fully achieved in the existing results.

13.
Seo YH, Lee YH, Yoo JS, Kim DW. Applied Optics, 2012, 51(18): 4003-4012
In this paper we propose a hardware architecture for high-speed computer-generated hologram generation that significantly reduces the number of memory accesses to avoid the bottleneck in the memory access operation. For this, we use three main schemes. The first is pixel-by-pixel calculation, rather than light source-by-source calculation. The second is a parallel calculation scheme extracted by modifying the previous recursive calculation scheme. The last is a fully pipelined calculation scheme with exactly structured timing scheduling, achieved by adjusting the hardware. The proposed hardware is structured to calculate a row of a computer-generated hologram in parallel, and each hologram pixel in a row is calculated independently. It consists of an input interface, an initial parameter calculator, hologram pixel calculators, a line buffer, and a memory controller. The implemented hardware, which calculates a row of a 1920×1080 computer-generated hologram in parallel, uses 168,960 lookup tables, 153,944 registers, and 19,212 digital signal processing blocks in an Altera field programmable gate array environment. It can stably operate at 198 MHz. Because of the three schemes, external memory bandwidth is reduced to approximately 1/20,000 of that of previous designs at the same calculation speed.

14.
The problem of scheduling batch processors is important in some industries and, at a more fundamental level, captures an element of complexity common to many practical scheduling problems. We describe a branch and bound procedure applicable to a batch processor model with arbitrary job processing times, job weights and job sizes. The scheduling objective is to minimize total weighted completion time. We find that the procedure returns optimal solutions to problems of up to 25 jobs in reasonable CPU time, and can be adapted for use as a heuristic for larger problems.
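
The objective named in this abstract, total weighted completion time, can be evaluated for a given sequence of batches roughly as follows. This is only a sketch of the objective, not the branch and bound procedure. It assumes a common batch-processor model that the abstract does not spell out: a batch takes as long as its longest job, every job in a batch completes when the batch completes, and the summed job sizes in a batch may not exceed the machine capacity. The function name and the `(processing_time, weight, size)` job format are hypothetical.

```python
def total_weighted_completion_time(batches, capacity):
    """Evaluate sum(weight * completion_time) for batches run in the given order.

    batches:  list of non-empty batches; each batch is a list of
              (processing_time, weight, size) tuples (assumed format).
    capacity: maximum total job size per batch (assumed constraint).
    """
    clock = 0
    objective = 0
    for batch in batches:
        if sum(size for _, _, size in batch) > capacity:
            raise ValueError("batch exceeds machine capacity")
        # Assumption: the batch runs as long as its longest job.
        clock += max(p for p, _, _ in batch)
        # Every job in the batch completes at the batch completion time.
        objective += clock * sum(w for _, w, _ in batch)
    return objective


schedule = [[(3, 2, 4), (2, 1, 5)], [(4, 3, 6)]]
print(total_weighted_completion_time(schedule, capacity=10))
```

A search procedure such as branch and bound would call an evaluator like this on candidate batchings and keep the one with the smallest value.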

15.
Kim T., Lim J.T. Communications, IET, 2010, 4(1): 32-38
IEEE 802.16 is a standard for broadband wireless access in metropolitan area networks (MAN). Since the IEEE 802.16 standard defines concrete quality of service (QoS) requirements, a scheduling scheme is necessary to achieve them. Many scheduling schemes have been proposed with the purpose of throughput optimisation and fairness enhancement; however, few scheduling schemes support the delay requirement. In this study, the authors propose a new scheduling scheme reflecting the delay requirement. Specifically, the authors add a delay requirement term to the proportional fair scheduling scheme, and the scheduling parameters are optimised with respect to the QoS requirement. The QoS requirement is therefore achieved without excessive resource consumption.

16.
Trucks are the most popular transport equipment in most mega-terminals, and scheduling them to minimize makespan is a challenge that this article addresses and attempts to resolve. Specifically, the problem of scheduling a fleet of trucks to perform a set of transportation jobs with sequence-dependent processing times and different ready times is investigated, and the use of a genetic algorithm (GA) to address the scheduling problem is proposed. The scheduling problem is formulated as a mixed integer program. It is noted that the scheduling problem is NP-hard and the computational effort required to solve even small-scale test problems is prohibitively large. A crossover scheme has been developed for the proposed GA. Computational experiments are carried out to compare the performance of the proposed GA with that of GAs using six popular crossover schemes. Computational results show that the proposed GA performs best, with its solutions on average 4.05% better than the best solutions found by the other six GAs.

17.
The improvements in request throughput that result from the use of the shortest seek time first (SSTF) request scheduling algorithm for major/minor loop organized magnetic bubble memories are considered. For the satisfaction of read requests, the bubble memory can be considered equivalent to a file drum with fixed block size. Bubble memories are considered with 64 kbits per chip running at a 5.6 μs stepping rate and serving 750 to 830 read requests per second (as opposed to ≤ 419 requests per second without queueing) with both uniform and Poisson arrival rates. A priority interrupt algorithm is implemented that assures that all requests are served in ≤ 60 ms, while average service times are 10 to 20 ms. Results of simulation runs, corresponding to the various cases of interest, are presented. It is concluded that request queueing with the appropriate scheduling algorithm is a practical way of improving bubble memory performance.
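
The shortest seek time first (SSTF) policy mentioned in this abstract can be illustrated with a generic sketch: among the queued requests, always serve the one closest to the current position. The function name `sstf_order`, the integer positions, and the default absolute-difference distance are assumptions for illustration; the paper's bubble-memory timing model and its priority interrupt mechanism are not reproduced here.

```python
def sstf_order(start, requests, distance=lambda a, b: abs(a - b)):
    """Return (service_order, total_distance) under shortest seek time first.

    start:    current position (hypothetical integer address).
    requests: iterable of requested positions.
    distance: cost of moving from one position to another; the default is a
              linear |a - b| model, chosen here only for illustration.
    """
    pending = list(requests)
    position = start
    order = []
    total = 0
    while pending:
        # Greedily pick the pending request that is cheapest to reach next.
        nxt = min(pending, key=lambda r: distance(position, r))
        pending.remove(nxt)
        total += distance(position, nxt)
        order.append(nxt)
        position = nxt
    return order, total


order, total = sstf_order(50, [82, 17, 43, 140, 24, 16, 190])
print(order, total)
```

For a medium that only shifts in one direction around a loop, a rotational cost such as `distance=lambda a, b: (b - a) % loop_length` could be passed in instead, with `loop_length` being whatever loop size applies. Pure SSTF can starve distant requests, which is presumably why the abstract pairs it with a mechanism that bounds the maximum service time.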

18.
This paper addresses bi-objective cyclic scheduling in a robotic cell with processing time windows. In particular, we consider a more general non-Euclidean travel time metric where robot’s travel times are not required to satisfy the well-known triangular inequality. We develop a tight bi-objective mixed integer programming (MIP) model with valid inequalities for the cyclic robotic cell scheduling problem with processing time windows and non-Euclidean travel times. The objective is to minimise the cycle time and the total robot travel distance simultaneously. We propose an iterative ε-constraint method to solve the bi-objective MIP model, which can find the complete Pareto front. Computational results both on benchmark instances and randomly generated instances indicate that the proposed approach is efficient in solving the cyclic robotic cell scheduling problems.

19.
Li C., Wang X. Communications, IET, 2008, 2(4): 573-586
The authors treat the multiuser scheduling problem for practical power-controlled code division multiple access (CDMA) systems under the opportunistic fair scheduling (OFS) framework. OFS is an important technique in wireless networks to achieve fair and efficient resource allocation. Power control is an effective resource management technique in CDMA systems. Given a certain user subset, the optimal power control scheme can be derived. The multiuser scheduling problem then refers to the optimal user subset selection at each scheduling interval to maximise a certain metric subject to some specific physical-layer constraints. The authors propose discrete stochastic approximation algorithms to adaptively select the user subset to maximise the instantaneous total throughput or a general utility. Both uplink and downlink scenarios are considered. They also consider time-varying channels, where the algorithm can track the time-varying optimal user subset. Simulation results are presented that show the performance of the proposed algorithms in terms of throughput/utility maximisation, fairness, fast convergence and tracking capability in time-varying environments.

20.
Non-orthogonal multiple access (NOMA) is one of the key 5G technologies; it can improve spectrum efficiency and increase the number of user connections by utilizing resources in a non-orthogonal manner. NOMA allows multiple terminals to share the same resource unit at the same time. The receiver usually needs to be configured with successive interference cancellation (SIC), which eliminates co-channel interference (CCI) between users and can significantly improve the system throughput. In order to meet the demands of users and improve fairness among them, this paper proposes a new power allocation scheme. The objective is to maximize user fairness by deploying the least fairness in multiplexed users. However, the objective function obtained is non-convex; it is converted into convex form by utilizing the optimal Karush-Kuhn-Tucker (KKT) constraints. Simulation results show that the proposed power allocation scheme performs better than the existing schemes, which indicates its effectiveness.
