1.

Abid M. Elabdalla Riyad A. Husein 《Computers & Education》1991,17(4)

A procedure is described for preparing an exam schedule that meets the following goals: (1) No student should have two exams at the same time. (2) No student should take more than two exams in a day. (3) The number of exam days should be a minimum. (4) The total number of students having two exams in a day should be a minimum. (5) The number of students having two consecutive exams in a day should be a minimum.
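Goals (1) and (2) above are hard constraints that can be checked mechanically for any candidate timetable. The following is a minimal sketch of such a check, not the procedure from the paper: the function name `check_schedule`, the input format, and the idea of numbering slots consecutively across days are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def check_schedule(enrollments, slot_of, slots_per_day):
    """Report violations of goals (1) and (2) for a candidate timetable.

    enrollments: dict mapping student -> set of exams (hypothetical format)
    slot_of: dict mapping exam -> slot index, numbered consecutively across days
    slots_per_day: number of exam slots in one day (assumed constant)
    """
    clashes = []     # goal (1): one student, two exams in the same slot
    overloaded = []  # goal (2): one student, more than two exams in a day
    for student, exams in enrollments.items():
        for a, b in combinations(sorted(exams), 2):
            if slot_of[a] == slot_of[b]:
                clashes.append((student, a, b))
        # Integer division maps a slot index to the day it falls on.
        per_day = Counter(slot_of[e] // slots_per_day for e in exams)
        for day, count in per_day.items():
            if count > 2:
                overloaded.append((student, day))
    return clashes, overloaded
```

A timetable is feasible with respect to these two goals when both returned lists are empty; goals (3) to (5) are optimization objectives and are not covered by this check.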

2.

An optimal resource scheduler for continuous display of structured video objects

Escobar-Molano M.L. Ghandeharizadeh S. Ierardi D. 《Knowledge and Data Engineering, IEEE Transactions on》1996,8(3):508-511

A structured video consists of a collection of background objects, characters, spatial and temporal constructs, and rendering features. Assuming a platform consisting of a fixed amount of memory and a magnetic disk drive, this study presents a resource scheduler for the continuous display of structured video that minimizes both the latency observed by a display and its required amount of memory.

3.

Performance analysis of introducing SMT technology into a superscalar processor

史莉雯 樊晓桠 黄小平 《计算机工程与应用》2009,45(5):13-15

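Simultaneous multithreading (SMT) is a technique that allows multiple independent threads to issue multiple instructions per cycle. It exploits the available instruction-level and thread-level parallelism and improves the utilization of limited resources. This paper takes the 32-bit superscalar processor "Longteng R2" (龙腾R2), developed in-house by the Aviation Microelectronics Center of Northwestern Polytechnical University, as its baseline and introduces SMT under the constraints of essentially not changing the size of the internal structures, not adding execution functional units, and making only the necessary modifications. Performance is analyzed by simulating different thread counts and various thread combinations. Although several factors limit the performance gain, introducing SMT still yields a performance increase of up to about 50%.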

4.

Analysing superscalar processor architectures with coloured Petri nets

F.P. Burns A.M. Koelmans A.V. Yakovlev 《International Journal on Software Tools for Technology Transfer (STTT)》1998,2(2):182-191

5.

Accurately modeling superscalar processor performance with reduced trace

Kiyeon Lee Sangyeun Cho 《Journal of Parallel and Distributed Computing》2013

Trace-driven simulation of out-of-order superscalar processors is far from straightforward. The dynamic nature of out-of-order superscalar processors combined with the static nature of traces can lead to large inaccuracies in the results when the traces contain only a subset of executed instructions for trace reduction. In this paper, we describe and comprehensively evaluate the pairwise dependent cache miss model (PDCM), a framework for fast and accurate trace-driven simulation of out-of-order superscalar processors. The model determines how to treat a cache miss with respect to other cache misses recorded in the trace by dynamically reconstructing the reorder buffer state during simulation and honoring the dependencies between the trace items. Our experimental results demonstrate that a PDCM-based simulator produces highly accurate simulation results (less than 3% error) with fast simulation speeds (62.5× on average) compared with an execution-driven simulator. Moreover, we observed that the proposed simulation method is capable of preserving a processor’s dynamic off-core memory access behavior and accurately predicting the relative performance change when a processor’s low-level memory hierarchy parameters are changed.

6.

V830R/AV: embedded multimedia superscalar RISC processor

Suzuki K. Arai T. Nadehara K. Kuroda I. 《Micro, IEEE》1998,18(2):36-47

The V830R/AV's real-time decoding of MPEG-2 video and audio data enables practical embedded-processor-based multimedia systems.

7.

Scalable processor instruction set extension

Becker J. Thomas A. 《Design & Test of Computers, IEEE》2005,22(2):136-148

Coarse-grained reconfigurable platforms are good for parallel data-intensive applications but inefficient for sequential control-dominated code. This article explores the integration of the general purpose Sparc-compliant Leon processor with the Extreme Processing Platform reconfigurable data path. The integration's goal is to optimize the execution of complex multimedia applications such as MPEG-4.

8.

An abductive-based scheduler for air-crew assignment

A. C. Kakas A. Michael 《Applied Artificial Intelligence》2013,27(3):333-360

This article presents the design and implementation of an air-crew assignment system, for producing and refining a solution to this problem, based on the artificial intelligence principles and techniques of abductive reasoning as captured by the framework of abductive logic programming (ALP). The system offers a high level of flexibility in addressing both the tasks of crew scheduling and rescheduling. It can be used to generate a valid and good quality initial solution and then help the human operators further adjust and refine this solution in order to meet extra requirements of the problem. These additional needs can arise either due to new foreseen requirements that the company wants to have or experiment with for a particular period in time, or due to unexpected events that have occurred while the solution (crew-roster) is in operation. This work shows the ability and flexibility of abduction, and, more specifically, of ALP, in tackling problems of this type with complex and changing requirements.

9.

Exploiting selective instruction reuse and value prediction in a superscalar architecture

Arpad Gellert Adrian Florea Lucian Vintan 《Journal of Systems Architecture》2009,55(3):188-195

In our previously published research we discovered some very difficult to predict branches, called unbiased branches. Since the overall performance of modern processors is seriously affected by misprediction recovery, especially these difficult branches represent a source of important performance penalties. Our statistics show that about 28% of branches are dependent on critical Load instructions. Moreover, 5.61% of branches are unbiased and depend on critical Loads, too. In the same way, about 21% of branches depend on MUL/DIV instructions whereas 3.76% are unbiased and depend on MUL/DIV instructions. These dependences involve high-penalty mispredictions becoming serious performance obstacles and causing significant performance degradation in executing instructions from wrong paths. Therefore, the negative impact of (unbiased) branches over global performance should be seriously attenuated by anticipating the results of long-latency instructions, including critical Loads. On the other hand, hiding instructions’ long latencies in a pipelined superscalar processor represents an important challenge itself. We developed a superscalar architecture that selectively anticipates the values produced by high-latency instructions. In this work we are focusing on multiply, division and loads with miss in L1 data cache, implementing a dynamic instruction reuse scheme for the MUL/DIV instructions and a simple last value predictor for the critical Load instructions. Our improved superscalar architecture achieves an average IPC speedup of 3.5% on the integer SPEC 2000 benchmarks, of 23.6% on the floating-point benchmarks, and an improvement in energy-delay product (EDP) of 6.2% and 34.5%, respectively. We also quantified the impact of our developed selective instruction reuse and value prediction techniques in a simultaneous multithreaded architecture (SMT) that implies per thread reuse buffers and load value prediction tables. 
Our simulation results showed that the best improvements on the SPEC integer applications have been obtained with 2 threads: an IPC speedup of 5.95% and an EDP gain of 10.44%. Although, on the SPEC floating-point programs, we obtained the highest improvements with the enhanced superscalar architecture, the SMT with 3 threads also provides an important IPC speedup of 16.51% and an EDP gain of 25.94%.

10.

Design of a thread scheduler for a multi-core simultaneous multithreading processor

《电子技术应用》2016,(1):19-21

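The multi-core simultaneous multithreading processor (SMT_PAAG) is a multi-core processor for graphics, image, and digital signal processing. A hardware thread scheduler is proposed for this processor. The scheduler uses simultaneous multithreading, can execute up to four threads at the same time, and supports fast context switching among eight threads in blocking mode. This avoids the waiting caused by blocking and effectively improves the processor's efficiency and resource utilization. Performance was evaluated by running graphics processing algorithms on the processor. The results show that, by exploiting instruction-level and thread-level parallelism, the SMT-PAAG processor improves performance by 69.25%.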

11.

An implementable parallel scheduler for input-queued switches

Giaccone P. Shah D. Prabhakar B. 《Micro, IEEE》2002,22(1):19-25

The Apsara algorithm is an input-queued switch scheduler that uses limited parallelism to find a matching in a single iteration, as compared to the O(N³) iterations of the more common maximum-weight matching algorithm. The Apsara algorithm also achieves a throughput of up to 100 percent and has very good delay properties.

12.

Variable assignment and instruction scheduling for processor with multi-module memory

Lei Zhang Meikang Qiu Edwin H.-M. Sha Qingfeng Zhuge 《Microprocessors and Microsystems》2011,35(3):308-317

Multi-module memory has been employed in high-end digital signal processing (DSP) systems. It provides high memory bandwidth and a low-power operating mode for energy savings. However, making full use of these architectural features is a challenging problem for code optimization. In this paper, we propose an integer linear programming (ILP) model to optimize the performance and energy consumption of multi-module memories by solving variable assignment, instruction scheduling and operating mode setting problems simultaneously. The combined effect of performance and energy saving requirements has been considered as well. Specifically, we develop two optimization techniques to improve the computation efficiency of our ILP model. The experimental results show that the optimal performance and energy solution can be achieved within a reasonable amount of time.

13.

An optimal scheduling procedure for matrix inversion on linear array at a processor level

M. K. Stojčev E. I. Milovanović I. Ž. Milovanović 《International journal of parallel programming》1994,22(4):435-448

This paper presents a parallel algorithm for computing the inverse of a dense matrix based on a modified Jordan's elimination which requires fewer calculation steps than the standard one. The algorithm is proposed for implementation on a linear array with a small to moderate number of processors which operate in a parallel-pipeline fashion. Communication between neighboring processors is achieved by a common memory module implemented as a FIFO memory module. For the proposed algorithm we define a task scheduling procedure and prove that it is time optimal. In order to compute the speedup and efficiency of the system, two definitions (Amdahl's and Gustafson's) were used. For the proposed architecture, involving two to 16 processors, estimated Gustafson's (Amdahl's) speedups are in the range 1.99 to 13.76 (1.99 to 9.69).
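The two speedup definitions named in this abstract are commonly written in the textbook forms sketched below. This is only an illustration of why the two figures diverge as the processor count grows; the parameter `serial_fraction` is an assumption of the sketch, the abstract does not state the value used in the paper, and the paper's exact formulation may differ.

```python
def amdahl_speedup(serial_fraction, processors):
    """Fixed-size speedup: S = 1 / (s + (1 - s) / p)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def gustafson_speedup(serial_fraction, processors):
    """Scaled-size speedup: S = p - s * (p - 1)."""
    return processors - serial_fraction * (processors - 1)
```

For any serial fraction strictly between 0 and 1 and more than one processor, the Gustafson value is the larger of the two, which matches the ordering of the ranges quoted in the abstract.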

14.

A two-stage hardware scheduler combining greedy and optimal scheduling

Raymond R. Hoare Zhu Ding Alex K. Jones 《Journal of Parallel and Distributed Computing》2008

Greedy scheduling heuristics provide a low complexity and scalable albeit particularly sub-optimal strategy for hardware-based crossbar schedulers. In contrast, the maximum matching algorithm for Bipartite graphs can be used to provide optimal scheduling for crossbar-based interconnection networks with a significant complexity and scalability cost. In this paper, we show how maximum matching can be reformulated in terms of Boolean operations rather than the more traditional formulations. By leveraging the inherent parallelism available in custom hardware design, we reformulate maximum matching in terms of Boolean operations rather than matrix computations and introduce three maximum matching implementations in hardware. Specifically, we examine a Pure Logic Scheduler with three dimensions of parallelism, a Matrix Scheduler with two dimensions of parallelism and a Vector Scheduler with one dimension of parallelism. These designs reduce the algorithmic complexity for an N×N network from O(N³) to O(1), O(K), and O(KN), respectively, where K is the number of optimization steps. While an optimal scheduling algorithm requires K=2N−1 steps, by starting with our hardware-based greedy strategy to generate an initial schedule, our simulation results show that the maximum matching scheduler can achieve 99% of the optimal schedule when K=9. We examine hardware and time complexity of these architectures for crossbar sizes of up to N=1024. Using FPGA synthesis results, we show that a greedy schedule for crossbars, ranging from 8×8 to 256×256, can be optimized in less than 20 ns per optimization step. For crossbars reaching 1024×1024 the scheduling can be completed in approximately 10 μs with current technology and could reach under 90 ns with future technologies.
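To make the contrast between greedy and maximum matching concrete, here is a small software sketch of a generic greedy crossbar heuristic. It is not the hardware design from the paper: the function name `greedy_match`, the Boolean request-matrix representation, and the fixed input-scan order are assumptions chosen for illustration.

```python
def greedy_match(requests):
    """Greedily pair crossbar inputs with outputs.

    requests[i][j] is True when input i has traffic for output j
    (a hypothetical N x N request matrix). Each input takes the first
    free output it requests, so the result is a valid matching but not
    necessarily a maximum one.
    """
    n = len(requests)
    output_taken = [False] * n
    match = {}  # input index -> output index
    for i in range(n):
        for j in range(n):
            if requests[i][j] and not output_taken[j]:
                match[i] = j
                output_taken[j] = True
                break
    return match
```

With `requests = [[True, True], [True, False]]` the sketch matches only input 0 to output 0, whereas a maximum matching would serve both inputs (0 to 1 and 1 to 0). Closing that kind of gap is what further optimization steps on top of a greedy initial schedule are for.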

15.

Research on evaluation methods for application-specific instruction set processors (ASIP)

余洁 刘方方 周学海 《计算机工程与设计》2010,31(22)

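Because the evaluation of application-specific instruction set processors (ASIPs) involves many attribute dimensions, multiple objective types, and multiple data types, a comparison-based evaluation method is proposed. At different stages of the evaluation, different reference indicators are selected and the data are processed and aggregated to obtain a ranking vector of the candidate designs. Specific reference data are chosen according to the type of evaluation objective and the type of data, and the data are compared against these references. A fuzzy matrix is used to compare the importance of the evaluation indicators pairwise, with consistency checking and correction, to obtain the indicator weight vector. Finally, the ordered weighted averaging method is used to obtain the overall attribute value of each candidate, and the concept of an interval comparison degree is introduced to provide a quantitative basis for the evaluation. A worked example demonstrates the effectiveness of the method.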

16.

Multi-core processor synchronization based on instruction cache invalidation

郭建军 戴葵 王志英 《计算机工程与应用》2009,45(4):1-3

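In shared-memory multi-core processors, busy-waiting is commonly used to implement synchronization operations such as locks and barriers. These typical mechanisms are limited by long synchronization latency and resource contention, which leads to poor scalability. They also require repeated memory accesses, which interfere with normal memory traffic and increase the bandwidth demand on the memory system. A synchronization technique based on instruction cache invalidation is proposed for synchronizing multi-core processors built on a data-triggered architecture. At a synchronization point, the instruction cache line about to be executed is invalidated, which causes an instruction fetch miss and sends a fetch request to the L2 cache. A filtering mechanism in the L2 cache declines to serve fetch requests from processor cores that do not satisfy the synchronization condition, stalling those cores and thereby achieving synchronization. Tests show that the method has advantages in both scalability and synchronization performance.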

17.

An optimal parallel processor bound in strong orientation of an undirected graph

Yung Hyang Tsin 《Information Processing Letters》1985,20(3):143-146

18.

Application specific instruction set processor for sensor conditioning in automotive applications

《Microprocessors and Microsystems》2016

19.

IS: An intelligent scheduler for batch manufacturing systems

Jui-chin Jiang 《Computers & Industrial Engineering》1991,21(1-4):319-323

This paper describes the design and development of IS (for Intelligent Scheduler), a true multiple criteria knowledge-based scheduler which can be used for operational level scheduling of batch manufacturing systems, sometimes called job shops. IS incorporates a heuristic algorithm coupled with two knowledge bases, one for job scheduling and the other for selecting a suitable schedule based on the user provided criterion or criteria. With fourteen dispatching rules, it can generate both static and dynamic schedules. IS is a far more realistic and sophisticated model, accounting for many important factors, such as multiple machines, multiple fixtures, multiple tools, alternate processing routes, machine setup time, machine processing time, due date, job arrival time, initial shop loading, hot jobs, and considering either one criterion or multiple criteria simultaneously. In addition, IS is coded in C and has all the features of a modern professional quality interactive program. It has moving bar and pull down menus and an on-line help function, a friendly human-computer interface, and an intuitive and easy to understand representation of the schedules.

20.

Self-stabilizing leader election in optimal space under an arbitrary scheduler

Ajoy K. Datta Lawrence L. Larmore Priyanka Vemula 《Theoretical computer science》2011,412(40):5541-5561

A silent self-stabilizing asynchronous distributed algorithm, SSLE, is given for the leader election problem in a connected unoriented (bidirectional) network with unique IDs. SSLE also constructs a BFS tree on the network rooted at that leader. SSLE uses O(log n) space per process and stabilizes in O(n) rounds, against the unfair daemon, where n is the number of processes in the network.