首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Runahead执行技术能够显著地提高计算机系统的存储级并行,而无需对处理器结构做出较大改动。但Runahead执行处理器要比传统处理器多执行很多指令,最多是正常执行指令数的三倍以上,大大增加了处理器的功耗。本文通过分析发现Runahead执行在预执行阶段会执行大量的无效指令,据此提出一种减少无效指令的方法来提高Runa-head执行处理器的效率。通过实验分析,在性能影响较小的情况下,该方法最多可以减少50%的Runahead执行处理器在预执行阶段执行的无效指令。  相似文献   

2.
Rod Adams  Sue Gray 《Software》1995,25(9):1003-1020
Multiple-instruction-issue processors seek to improve performance over scalar RISC processors by providing multiple pipelined functional units in order to fetch, decode and execute several instructions per cycle. The process of identifying instructions which can be executed in parallel and distributing them between the available functional units is referred to as instruction scheduling. This paper describes a simple compile-time scheduling technique, called conditional compaction, which uses the concept of conditional execution to move instructions across basic block boundaries. It then presents the results of an investigation into the performance of the scheduling technique using C benchmark programs scheduled for machines with different functional unit configurations. This paper represents the culmination of our investigation into how much performance improvement can be obtained using conditional execution as the sole scheduling technique.  相似文献   

3.
《Computers & Education》2005,44(3):217-235
We developed a hybrid course format (part online, part face-to-face) to deliver a high-enrollment, introductory environmental biology course to resident (living on or near campus), non-science majors at a large, public university. The hybrid course was structured to include bi-weekly online assignments and weekly meetings in the lecture hall focused on active-learning exercises. To evaluate the effectiveness of the web-based component of the hybrid course, we taught the hybrid course simultaneously with a traditional course in which we used passive lectures to cover material in the online assignments. Both courses received the same active-learning activities in class. Students in the hybrid course reported that the quality of interaction with the instructor was high, that they read the text more often and studied in groups more frequently. Performance on a post-course assessment test indicated that the hybrid course format was better or equivalent to the traditional course. Specifically, online assignments were equivalent to or better than passive lectures, and that active-learning exercises were more effective when coupled with online activities. Performance gains were greater for upperclassmen than for freshmen, indicating that hybrid course formats might be a superior option for upperclassmen when satisfying general science requirements.  相似文献   

4.
Programming languages generally provide a ‘string’ or ‘text’ type to allow manipulation of sequences of characters. This type is usually of crucial importance, since it is normally mentioned in most interfaces between system components. We claim that the traditional implementations of strings, and often the supported functionality, are not well suited to such general-purpose use. They should be confined to applications with specific, and unusual, performance requirements. We present ‘ropes’ or ‘heavyweight’ strings as an alternative that, in our experience leads to systems that are more robust, both in functionality and in performance. Ropes have been in use in the Cedar environment almost since its inception, but this appears to be neither well-known, nor discussed in the literature. The algorithms have been gradually refined. We have also recently built a second similar, but somewhat lighter weight, C-language implementation, which is included in our publically released garbage collector distribution. We describe the algorithms used in both, and give some performance measurements for the C version.  相似文献   

5.
In this paper we present a new environment called MERPSYS that allows simulation of parallel application execution time on cluster-based systems. The environment offers a modeling application using the Java language extended with methods representing message passing type communication routines. It also offers a graphical interface for building a system model that incorporates various hardware components such as CPUs, GPUs, interconnects and easily allows various formulas to model execution and communication times of particular blocks of code. A simulator engine within the MERPSYS environment simulates execution of the application that consists of processes with various codes, to which distinct labels are assigned. The simulator runs one Java thread per label and scales computations and communication times adequately. This approach allows fast coarse-grained simulation of large applications on large-scale systems. We have performed tests and verification of results from the simulator for three real parallel applications implemented with C/MPI and run on real HPC clusters: a master-slave code computing similarity measures of points in a multidimensional space, a geometric single program multiple data parallel application with heat distribution and a divide-and-conquer application performing merge sort. In all cases the simulator gave results very similar to the real ones on configurations tested up to 1000 processes. Furthermore, it allowed us to make predictions of execution times on configurations beyond the hardware resources available to us.  相似文献   

6.
This article is an experience report about the application of a top-down strategy to use and embed an architecture reconstruction approach in the incremental software development process of the Philips MRI scanner, a representative large and complex software-intensive system. The approach is an iterative process to construct execution views without being overwhelmed by the system size and complexity. An execution view contains architectural information that describes what the software of a software-intensive system does at runtime and how it does this. The application of the strategy is illustrated with a case study, the construction of an up-to-date execution view for the start-up process of the Philips MRI scanner. The construction of this view helped the development organization to quickly reduce about 30% the start-up time of the scanner, and set up a new system benchmark for assuring the system performance through future evolution steps. The report provides detailed information about the application of the top-down strategy, including how it supports top-down analysis, communication within the development organization, and the aspects that influence the use of the top-down strategy in other contexts.  相似文献   

7.
This article is an experience report about the application of a top-down strategy to use and embed an architecture reconstruction approach in the incremental software development process of the Philips MRI scanner, a representative large and complex software-intensive system. The approach is an iterative process to construct execution views without being overwhelmed by the system size and complexity. An execution view contains architectural information that describes what the software of a software-intensive system does at runtime and how it does this. The application of the strategy is illustrated with a case study, the construction of an up-to-date execution view for the start-up process of the Philips MRI scanner. The construction of this view helped the development organization to quickly reduce about 30% the start-up time of the scanner, and set up a new system benchmark for assuring the system performance through future evolution steps. The report provides detailed information about the application of the top-down strategy, including how it supports top-down analysis, communication within the development organization, and the aspects that influence the use of the top-down strategy in other contexts.  相似文献   

8.
在现代微处理器中,指令缓存的Tag读取、比较消耗了指令缓存较大比例的能耗。提出一种基于推断的低能耗指令缓存:不对称指令缓存。根据跳转指令比例低的特点,在该结构中区别处理跳转指令和顺序指令,使用和数据不完全对应的简化标记管理位。该结构采用了命中推断和变长指令取指两种创新技术,其中基于命中推断技术解决了指令缓存命中时Tag比较过多的问题;使用变长指令取指技术提高了顺序指令块的命中率。实验结果表明,对于选取的SPEC2006测试程序,不对称指令缓存结构较常规L1指令Cache取指能耗下降了40%~60%,比无标记指令缓存结构TH IC能耗降低了9%;取指ED2P方面,较常规L1指令Cache优化约50%,比TH IC结构优化约17%。  相似文献   

9.
本文介绍了在Windows环境下,利用Windows的API函数,显示与游览多幅大尺寸彩色位图的几种方法。通过比较几个常用图象函数的运行结果,提出了提高程序运行速度的方法,并给出相关的源程序。  相似文献   

10.
Nearest neighbor search is a core process in many data mining algorithms. Finding reliable closest matches of a test instance is still a challenging task as the effectiveness of many general-purpose distance measures such as \(\ell _p\)-norm decreases as the number of dimensions increases. Their performances vary significantly in different data distributions. This is mainly because they compute the distance between two instances solely based on their geometric positions in the feature space, and data distribution has no influence on the distance measure. This paper presents a simple data-dependent general-purpose dissimilarity measure called ‘\(m_p\)-dissimilarity’. Rather than relying on geometric distance, it measures the dissimilarity between two instances as a probability mass in a region that encloses the two instances in every dimension. It deems two instances in a sparse region to be more similar than two instances of equal inter-point geometric distance in a dense region. Our empirical results in k-NN classification and content-based multimedia information retrieval tasks show that the proposed \(m_p\)-dissimilarity measure produces better task-specific performance than existing widely used general-purpose distance measures such as \(\ell _p\)-norm and cosine distance across a wide range of moderate- to high-dimensional data sets with continuous only, discrete only, and mixed attributes.  相似文献   

11.
Many devices with modern microprocessor have generated an increased attention for transient soft errors. Previous strategies for instruction level temporal redundancy in super-scalar out-of-order processors have up to 45% performance degradation in certain applications compared to normal execution. The reason is that the redundant workload slows down the normal execution. Solutions are proposed to avoid certain redundant execution by reusing the result of the previously executed instructions, but there are still limitations on the instruction level parallelism and the pipeline throughput. In this paper, we propose a novel technique to recover the performance gap between instruction level temporal redundancy and normal execution. We present a set of micro-architectural extensions to implement the reliability prediction and integrate it with the issue logic of a dual instruction stream superscalar core, and conduct extensive evaluations to demonstrate how it can solve the performance problem. Experiments show that in average it can gain back nearly 71.13% of the overall IPC loss caused by redundant execution. Generally, it exhibits much performance and power efficiency within a high transient error rate.  相似文献   

12.
As the width of the processor grows, complexity of a register file (RF) with multiple ports grows more than linearly and leads to larger register access time and higher power consumption. Analysis of SPEC2000 programs reveals that only a small portion of the instructions in a program (16% in integer and 38% in floating-point) require both the source operands. Also, when the programs are executed in an 8-wide processor only a very few (two or less) two-source instructions are executed in a cycle for a significant portion of time (more than 98% for integer and 93% for floating-point), leading to a significant under-utilization of register port bandwidth. In this paper, we propose a novel technique to significantly reduce the number of register ports, with a very minor modification in the select logic to issue only a limited number of two-source instructions each cycle. This is achieved with no significant impact on processor’s overall performance. The novelty of the technique is that it is easy to implement and succeeds in reducing the access time, power, and area of the register file, without aggravating these factors in any other logic on the chip. With this technique in an 8-wide processor, as compared to a conventional 128-entry RF with 16 read ports, for integer programs a register file can be designed with 11 or 10 read ports as these configurations result in instructions per cycle (IPC) degradation of only 0.929% and 3.38%, respectively. This significantly low degradation in IPC is achieved while reducing the register access time by 9% and 12%, respectively, and reducing power by 35% and 50%, respectively. For FP programs, a register file can be designed with 12 read ports (1.16% IPC loss, 8% less access time, and 28% less power) or with 11 read ports (3.5% IPC loss, 9% less access time, and 35% less power). The paper analyzes the performance of all the possible flavors of the proposed technique for register file in both 4-wide and 8-wide processors, and presents a choice of the performance and register port complexity combination to the designer.  相似文献   

13.
This paper describes an implemented, prototype system for a sophisticated, intelligent tutor for instruction in a foreign language. The system is an application of artificial intelligence research in natural language, but it implements several ideas that depart from standard approaches to natural language understanding.

For instance, the semantic analyzer diagnoses several kinds of comprehension problems and semantic errors that a student might make. Some fine distinctions in meaning are represented to detect misuse of words. Not only is a model of good syntax included in the tutor, but also a model of incorrect forms, rich enough to pinpoint specific syntactic mistakes. Finding the intended interpretation is complicated by the likelihood of student errors. Therefore, perfect syntactic form is not necessary for semantic analysis of the student's input.

The problems discussed and solutions presented are closely related to the more general problem of how to respond to a natural language input that surpasses the computer's model of language or of context.  相似文献   


14.
15.
In 1991 Morris proposed an effective screening sensitivity measure to identify the few important factors in models with many factors. The method is based on computing for each input a number of incremental ratios, namely elementary effects, which are then averaged to assess the overall importance of the input. Despite its value, the method is still rarely used and instead local analyses varying one factor at a time around a baseline point are usually employed.In this piece of work we propose a revised version of the elementary effects method, improved in terms of both the definition of the measure and the sampling strategy. In the present form the method shares many of the positive qualities of the variance-based techniques, having the advantage of a lower computational cost, as demonstrated by the analytical examples.The method is employed to assess the sensitivity of a chemical reaction model for dimethylsulphide (DMS), a gas involved in climate change. Results of the sensitivity analysis open up the ground for model reconsideration: some model components may need a more thorough modelling effort while some others may need to be simplified.  相似文献   

16.
This paper presents a new approach to the treatment of arrays in Pascal and Pascal-like languages. The proposed mechanisms allow arrays to be used flexibly in assignments and as parameters to procedures while at the same time preserving the benefits of strong type checking. Several examples to illustrate the mechanisms are presented. Their implementation is also described.  相似文献   

17.
Graphs are widely used for modeling complicated data such as social networks, bibliographical networks and knowledge bases. The growing sizes of graph databases motivate the crucial need for developing powerful and scalable graph-based query engines. We propose a SPARQL-like language, G-SPARQL, for querying attributed graphs. The language enables the expression of different types of graph queries that are of large interest in the databases that are modeled as large graph such as pattern matching, reachability and shortest path queries. Each query can combine both structural predicates and value-based predicates (on the attributes of the graph nodes/edges). We describe an algebraic compilation mechanism for our proposed query language which is extended from the relational algebra and based on the basic construct of building SPARQL queries, the Triple Pattern. We describe an efficient hybrid Memory/Disk representation of large attributed graphs where only the topology of the graph is maintained in memory while the data of the graph are stored in a relational database. The execution engine of our proposed query language splits parts of the query plan to be pushed inside the relational database (using SQL) while the execution of other parts of the query plan is processed using memory-based algorithms, as necessary. Experimental results on real and synthetic datasets demonstrate the efficiency and the scalability of our approach and show that our approach outperforms native graph databases by several factors.  相似文献   

18.
This paper presents a new solver for the exactly one-in-a-set Generalized Traveling Salesman Problem (GTSP). In the GTSP, we are given as input a complete directed graph with edge weights, along with a partition of the vertices into disjoint sets. The objective is to find a cycle (or tour) in the graph that visits each set exactly once and has minimum length. In this paper we present an effective algorithm for the GTSP based on adaptive large neighborhood search. The algorithm operates by repeatedly removing from, and inserting vertices in, the tour. We propose a general insertion mechanism that contains as special cases the well-known nearest, farthest and random insertion mechanisms. We provide extensive benchmarking results for our solver in comparison to the state-of-the-art on a wide range of existing and new problem libraries. We show that on the GTSP-LIB library, the proposed algorithm is competitive with the best known algorithms. On several other libraries, we show that given the same amount of time, the proposed solver finds higher quality solutions than existing approaches, particularly on harder instances that are non-metric and/or whose sets are not clustered.  相似文献   

19.
Both college level students and instructors alike have become frustrated with the modern textbook. Textbooks are expensive, passive in tone; water downed, and rarely has the breadth and depth that is needed for a course. Recently, Myles Johnson a former Psychology Instructor, has come up with an instructional method known as My Text that addresses these frustrations. Here students, with the instructor’s guidance, create their own textbooks. This paper takes a look at some of the Pedagogical merits of this method as a Learner Centered Approach and makes recommendations for future research.  相似文献   

20.
The prism machine is a stack of n cellular arrays, each of size 2n × 2n. Cell (i, j) on level k is connected to cells (i, j), (i + 2k, j), and (i, j + 2k) on level k + 1, 1 ≤ k < n, where the sums are modulo 2n. Such a machine can perform various operations (e.g., “Gaussian” convolutions or least-squares polynomial fits) on image neighborhoods of power-of-2 sizes in every position in O(n) time, unlike a pyramid machine which can do this only in sampled positions. It can also compute the discrete Fourier transform in O(n) time. It consists of n · 4n cells, while a pyramid consists of fewer than 4n+1/3 cells; but in practice n would be at most 10, so that a prism would be at most about seven times as large as a pyramid.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号