Found 20 similar articles (search time: 62 ms)
1.
A procedure is described for preparing an exam schedule that meets the following goals: (1) No student should have two exams at the same time. (2) No student should take more than two exams in a day. (3) The number of exam days should be a minimum. (4) The total number of students having two exams in a day is a minimum. (5) The number of students having two consecutive exams in a day is a minimum.
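The five goals can be read as two hard constraints (1, 2) and three quantities to minimize (3, 4, 5). The following is a minimal sketch of how a candidate schedule might be scored against them. It is not the procedure from the abstract: the function name `evaluate_schedule`, the `(day, slot)` encoding, and the choice to count student-day pairs are all assumptions made for illustration.

```python
from collections import defaultdict

def evaluate_schedule(schedule, enrollment):
    """Score a schedule against the five goals (hypothetical encoding).

    schedule maps exam -> (day, slot); enrollment maps student -> list of exams.
    Counts are per student-day pair.
    """
    clashes = 0      # goal 1 violations: two exams in the same day and slot
    over_two = 0     # goal 2 violations: more than two exams in one day
    two_in_day = 0   # goal 4: exactly two exams in one day
    consecutive = 0  # goal 5: two exams in adjacent slots of one day
    for exams in enrollment.values():
        by_day = defaultdict(list)
        for exam in exams:
            day, slot = schedule[exam]
            by_day[day].append(slot)
        for slots in by_day.values():
            slots.sort()
            if len(set(slots)) < len(slots):
                clashes += 1
            if len(slots) > 2:
                over_two += 1
            if len(slots) == 2:
                two_in_day += 1
                if slots[1] - slots[0] == 1:
                    consecutive += 1
    # goal 3: number of distinct exam days used
    days = len({day for day, _ in schedule.values()})
    return {
        "clashes": clashes,
        "over_two": over_two,
        "days": days,
        "two_in_day": two_in_day,
        "consecutive": consecutive,
    }
```

A scheduler could reject any candidate with nonzero `clashes` or `over_two` and then compare the remaining candidates on `days`, `two_in_day`, and `consecutive`, in that order.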
2.
Escobar-Molano M.L. Ghandeharizadeh S. Ierardi D. 《Knowledge and Data Engineering, IEEE Transactions on》1996,8(3):508-511
A structured video consists of a collection of background objects, characters, spatial and temporal constructs, and rendering features. Assuming a platform consisting of a fixed amount of memory and a magnetic disk drive, this study presents a resource scheduler for the continuous display of structured video that minimizes both the latency observed by a display and its required amount of memory.
3.
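Simultaneous multithreading (SMT) is a technique that allows multiple independent threads to issue multiple instructions per cycle. It exploits the available instruction-level and thread-level parallelism and improves the utilization of limited resources. This paper takes "龙腾R2" (Longteng R2), a 32-bit superscalar processor developed in-house by the Aviation Microelectronics Center of Northwestern Polytechnical University, as its baseline and introduces SMT under the constraints that the sizes of the internal structures stay essentially unchanged, no execution units are added, and only the necessary modifications are made. Performance is analyzed by simulating different thread counts and various thread combinations. Although some factors limit the performance gain, introducing SMT still yields a performance increase of up to about 50%.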
4.
5.
Trace-driven simulation of out-of-order superscalar processors is far from straightforward. The dynamic nature of out-of-order superscalar processors combined with the static nature of traces can lead to large inaccuracies in the results when the traces contain only a subset of executed instructions for trace reduction. In this paper, we describe and comprehensively evaluate the pairwise dependent cache miss model (PDCM), a framework for fast and accurate trace-driven simulation of out-of-order superscalar processors. The model determines how to treat a cache miss with respect to other cache misses recorded in the trace by dynamically reconstructing the reorder buffer state during simulation and honoring the dependencies between the trace items. Our experimental results demonstrate that a PDCM-based simulator produces highly accurate simulation results (less than 3% error) with fast simulation speeds (62.5× on average) compared with an execution-driven simulator. Moreover, we observed that the proposed simulation method is capable of preserving a processor’s dynamic off-core memory access behavior and accurately predicting the relative performance change when a processor’s low-level memory hierarchy parameters are changed.
6.
The V830R/AV's real-time decoding of MPEG-2 video and audio data enables practical embedded-processor-based multimedia systems.
7.
Coarse-grained reconfigurable platforms are good for parallel data-intensive applications but inefficient for sequential control-dominated code. This article explores the integration of the general purpose Sparc-compliant Leon processor with the Extreme Processing Platform reconfigurable data path. The integration's goal is to optimize the execution of complex multimedia applications such as MPEG-4.
8.
This article presents the design and implementation of an air-crew assignment system, for producing and refining a solution to this problem, based on the artificial intelligence principles and techniques of abductive reasoning as captured by the framework of abductive logic programming (ALP). The system offers a high level of flexibility in addressing both the tasks of crew scheduling and rescheduling. It can be used to generate a valid and good quality initial solution and then help the human operators further adjust and refine this solution in order to meet extra requirements of the problem. These additional needs can arise either due to new foreseen requirements that the company wants to have or experiment with for a particular period in time, or due to unexpected events that have occurred while the solution (crew-roster) is in operation. This work shows the ability and flexibility of abduction, and, more specifically, of ALP, in tackling problems of this type with complex and changing requirements.
9.
In our previously published research we discovered some very difficult to predict branches, called unbiased branches. Since the overall performance of modern processors is seriously affected by misprediction recovery, especially these difficult branches represent a source of important performance penalties. Our statistics show that about 28% of branches are dependent on critical Load instructions. Moreover, 5.61% of branches are unbiased and depend on critical Loads, too. In the same way, about 21% of branches depend on MUL/DIV instructions whereas 3.76% are unbiased and depend on MUL/DIV instructions. These dependences involve high-penalty mispredictions becoming serious performance obstacles and causing significant performance degradation in executing instructions from wrong paths. Therefore, the negative impact of (unbiased) branches over global performance should be seriously attenuated by anticipating the results of long-latency instructions, including critical Loads. On the other hand, hiding instructions’ long latencies in a pipelined superscalar processor represents an important challenge itself. We developed a superscalar architecture that selectively anticipates the values produced by high-latency instructions. In this work we are focusing on multiply, division and loads with miss in L1 data cache, implementing a dynamic instruction reuse scheme for the MUL/DIV instructions and a simple last value predictor for the critical Load instructions. Our improved superscalar architecture achieves an average IPC speedup of 3.5% on the integer SPEC 2000 benchmarks, of 23.6% on the floating-point benchmarks, and an improvement in energy-delay product (EDP) of 6.2% and 34.5%, respectively. We also quantified the impact of our developed selective instruction reuse and value prediction techniques in a simultaneous multithreaded architecture (SMT) that implies per thread reuse buffers and load value prediction tables. 
Our simulation results showed that the best improvements on the SPEC integer applications were obtained with 2 threads: an IPC speedup of 5.95% and an EDP gain of 10.44%. Although on the SPEC floating-point programs we obtained the highest improvements with the enhanced superscalar architecture, the SMT with 3 threads also provides an important IPC speedup of 16.51% and an EDP gain of 25.94%.
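The "simple last value predictor" mentioned in this abstract can be illustrated in software. The sketch below is only an illustration of the general technique and makes no claim about the authors' hardware design: the class name `LastValuePredictor`, the unbounded dictionary standing in for a finite prediction table, and the absence of confidence counters are all assumptions.

```python
class LastValuePredictor:
    """Predicts that a load returns the same value it returned last time.

    Hypothetical software model: a real table would be finite, indexed by a
    hash of the program counter, and usually guarded by confidence counters.
    """

    def __init__(self):
        self.table = {}   # load PC -> last value produced by that load
        self.lookups = 0  # number of resolved loads seen
        self.hits = 0     # number of correct predictions

    def predict(self, pc):
        # Returns None when this load has not been seen before.
        return self.table.get(pc)

    def update(self, pc, actual):
        # Called when the load resolves: score the prediction, then train.
        self.lookups += 1
        if self.table.get(pc) == actual:
            self.hits += 1
        self.table[pc] = actual


predictor = LastValuePredictor()
for value in [5, 5, 5, 7, 7]:
    predictor.update(0x40, value)
print(predictor.hits, predictor.lookups)
```

In this toy trace the predictor is wrong on the first access (no history) and on the access where the value changes from 5 to 7, which is the characteristic weakness of last value prediction.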
10.
11.
An implementable parallel scheduler for input-queued switches (Cited by: 1 total; self-citations: 0; citations by others: 1)
The Apsara algorithm is an input-queued switch scheduler that uses limited parallelism to find a matching in a single iteration, as compared to the O(N^3) iterations of the more common maximum-weight matching algorithm. The Apsara algorithm also achieves a throughput of up to 100 percent and has very good delay properties.
12.
Variable assignment and instruction scheduling for processor with multi-module memory (Cited by: 1 total; self-citations: 0; citations by others: 1)
Lei Zhang Meikang Qiu Edwin H.-M. Sha Qingfeng Zhuge 《Microprocessors and Microsystems》2011,35(3):308-317
Multi-module memory has been employed in high-end digital signal processing (DSP) systems. It provides high memory bandwidth and a low-power operating mode for energy savings. However, making full use of these architectural features is a challenging problem for code optimization. In this paper, we propose an integer linear programming (ILP) model to optimize the performance and energy consumption of multi-module memories by solving the variable assignment, instruction scheduling, and operating mode setting problems simultaneously. The combined effect of performance and energy saving requirements has been considered as well. Specifically, we develop two optimization techniques to improve the computation efficiency of our ILP model. The experimental results show that the optimal performance and energy solution can be achieved within a reasonable amount of time.
13.
M. K. Stojčev E. I. Milovanović I. Ž. Milovanović 《International journal of parallel programming》1994,22(4):435-448
This paper presents a parallel algorithm for computing the inversion of a dense matrix based on modified Jordan's elimination which requires fewer calculation steps than the standard one. The algorithm is proposed for the implementation on the linear array with a small to moderate number of processors which operate in a parallel-pipeline fashion. A communication between neighboring processors is achieved by a common memory module implemented as a FIFO memory module. For the proposed algorithm we define a task scheduling procedure and prove that it is time optimal. In order to compute the speedup and efficiency of the system, two definitions (Amdahl's and Gustafson's) were used. For the proposed architecture, involving two to 16 processors, estimated Gustafson's (Amdahl's) speedups are in the range 1.99 to 13.76 (1.99 to 9.69).
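For reference, the two speedup definitions named in this abstract have standard textbook forms, sketched below. The parallel fraction `p` and the value 0.9 used in the example are hypothetical and are not taken from the paper, so the printed numbers are not expected to match the ranges quoted above. Note also that `p` means different things in the two laws: in Amdahl's law it is the parallelizable share of the single-processor run time, while in Gustafson's law it is the parallel share of the run time on `n` processors.

```python
def amdahl_speedup(p, n):
    """Fixed-size speedup: p is the parallelizable share of the serial run."""
    return 1.0 / ((1.0 - p) + p / n)

def gustafson_speedup(p, n):
    """Scaled speedup: p is the parallel share of the run on n processors."""
    return (1.0 - p) + p * n

def efficiency(speedup, n):
    """Speedup per processor."""
    return speedup / n


# Hypothetical parallel fraction, chosen only to show the trend.
for n in (2, 4, 8, 16):
    a = amdahl_speedup(0.9, n)
    g = gustafson_speedup(0.9, n)
    print(n, round(a, 2), round(g, 2), round(efficiency(a, n), 2))
```

For any fixed `p` below 1 and more than one processor, the Gustafson figure exceeds the Amdahl figure, which is consistent with the abstract reporting a wider range for Gustafson's speedup.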
14.
Greedy scheduling heuristics provide a low complexity and scalable albeit particularly sub-optimal strategy for hardware-based crossbar schedulers. In contrast, the maximum matching algorithm for bipartite graphs can be used to provide optimal scheduling for crossbar-based interconnection networks with a significant complexity and scalability cost. In this paper, we show how maximum matching can be reformulated in terms of Boolean operations rather than the more traditional formulations. By leveraging the inherent parallelism available in custom hardware design, we reformulate maximum matching in terms of Boolean operations rather than matrix computations and introduce three maximum matching implementations in hardware. Specifically, we examine a Pure Logic Scheduler with three dimensions of parallelism, a Matrix Scheduler with two dimensions of parallelism and a Vector Scheduler with one dimension of parallelism. These designs reduce the algorithmic complexity for an N×N network from O(N^3) to O(1), O(K), and O(KN), respectively, where K is the number of optimization steps. While an optimal scheduling algorithm requires K=2N−1 steps, by starting with our hardware-based greedy strategy to generate an initial schedule, our simulation results show that the maximum matching scheduler can achieve 99% of the optimal schedule when K=9. We examine hardware and time complexity of these architectures for crossbar sizes of up to N=1024. Using FPGA synthesis results, we show that a greedy schedule for crossbars, ranging from 8×8 to 256×256, can be optimized in less than 20 ns per optimization step. For crossbars reaching 1024×1024 the scheduling can be completed in approximately 10 μs with current technology and could reach under 90 ns with future technologies.
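The idea of starting from a greedy schedule and then spending a bounded number of optimization steps on it can be sketched in software. The code below is a sequential illustration using the classical augmenting-path method, not the Boolean hardware formulation the abstract describes. The function names, the request-matrix encoding, and the interpretation of one "step" as one augmentation attempt are assumptions.

```python
def greedy_match(requests):
    """Greedy maximal matching for an N x N crossbar.

    requests[i][j] is True when input i has traffic for output j.
    Returns match_of_output, where entry j is the input granted output j.
    """
    n = len(requests)
    match_of_output = [None] * n
    for i in range(n):
        for j in range(n):
            if requests[i][j] and match_of_output[j] is None:
                match_of_output[j] = i
                break
    return match_of_output

def augment(requests, match_of_output, i, seen):
    """Try to match input i, re-routing already matched inputs if needed."""
    for j in range(len(requests)):
        if requests[i][j] and j not in seen:
            seen.add(j)
            owner = match_of_output[j]
            if owner is None or augment(requests, match_of_output, owner, seen):
                match_of_output[j] = i
                return True
    return False

def improve(requests, match_of_output, max_steps):
    """Run at most max_steps augmentation attempts on unmatched inputs."""
    matched = {i for i in match_of_output if i is not None}
    steps = 0
    for i in range(len(requests)):
        if steps >= max_steps:
            break
        if i in matched:
            continue
        steps += 1
        if augment(requests, match_of_output, i, set()):
            matched.add(i)
    return match_of_output


requests = [
    [True, True, False],
    [True, False, False],
    [False, True, True],
]
schedule = greedy_match(requests)
print(schedule)
print(improve(requests, schedule, max_steps=2))
```

In this example the greedy pass leaves input 1 unmatched because input 0 has already taken output 0; a single augmentation re-routes inputs 0 and 2 so that all three inputs are served.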
15.
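Because the evaluation of application-specific instruction-set processors (ASIPs) involves multiple attribute dimensions, multiple objective types, and multiple data types, a comparison-based evaluation method is proposed. At different stages of the evaluation, different reference indicators are selected, and the data are processed and aggregated to obtain a ranking vector for the candidate designs. Specific reference data are chosen according to the type of evaluation objective and the type of data, and the data are compared against the reference data. A fuzzy matrix is used for pairwise comparison of the importance of the evaluation indicators, followed by a consistency check and correction, to obtain the indicator weight vector. Finally, the ordered weighted averaging method is used to obtain the overall attribute value of each candidate, and the concept of an interval comparison degree is introduced to provide a quantitative basis for the evaluation. A worked example demonstrates the effectiveness of the method.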
16.
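In shared-memory multicore processors, busy-waiting is commonly used to implement synchronization operations such as locks and barriers. These typical synchronization mechanisms usually suffer from long synchronization latency and resource contention, which leads to poor scalability. They also require repeated memory accesses, which interfere with normal memory operations and increase the bandwidth demand on the memory system. This paper proposes a synchronization technique based on instruction-cache invalidation for multicore processors with a synchronous data-triggered architecture. At a synchronization point, the instruction cache line about to be executed is invalidated, which causes an instruction fetch miss and sends a fetch request to the L2 cache. A filtering mechanism in the L2 cache declines to serve fetch requests from cores that do not yet satisfy the synchronization condition, so those cores stall and synchronization is achieved. Tests show that the method has certain advantages in scalability and synchronization performance.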
17.
18.
19.
Jui-chin Jiang 《Computers & Industrial Engineering》1991,21(1-4):319-323
This paper describes the design and development of IS (for Intelligent Scheduler), a true multiple criteria knowledge-based scheduler which can be used for operational level scheduling of batch manufacturing systems, sometimes called job shops. IS incorporates a heuristic algorithm coupled with two knowledge bases, one for job scheduling and the other for selecting a suitable schedule based on the user provided criterion or criteria. With fourteen dispatching rules, it can generate both static and dynamic schedules. IS is a far more realistic and sophisticated model, accounting for many important factors, such as multiple machines, multiple fixtures, multiple tools, alternate processing routes, machine setup time, machine processing time, due date, job arrival time, initial shop loading, hot jobs, and considering either one criterion or multiple criteria simultaneously. In addition, IS is coded in C and has all the features of a modern professional quality interactive program. It has moving bar and pull down menus and an on-line help function, a friendly human-computer interface, and an intuitive and easy to understand representation of the schedules.
20.
Ajoy K. Datta Lawrence L. Larmore Priyanka Vemula 《Theoretical computer science》2011,412(40):5541-5561
A silent self-stabilizing asynchronous distributed algorithm, SSLE, is given for the leader election problem in a connected unoriented (bidirectional) network with unique IDs. SSLE also constructs a BFS tree on the network rooted at that leader. SSLE uses O(log n) space per process and stabilizes in O(n) rounds, against the unfair daemon, where n is the number of processes in the network.