
1.

Performance evaluation of a bus-based multistage multiprocessor architecture

《Journal of Systems Architecture》2000,46(1):39-47

This paper proposes and evaluates a class of interconnection networks that provide performance comparable to a multiple bus network at considerably lower cost. These networks, referred to as hybrid networks, are formed by beginning with a multistage network and substituting buses in the second stage. Analytic models are developed to evaluate the performance of the system. The analysis includes both uniform and non-uniform distributions of requests. The results obtained are compared with simulation results.

2.

HARTS: a distributed real-time architecture

Shin K.G. 《Computer》1991,24(5):25-35

The design, implementation, and evaluation of a distributed real-time architecture called HARTS (hexagonal architecture for real-time systems) are discussed, emphasizing its support of time-constrained, fault-tolerant communications and I/O (input/output) requirements. HARTS consists of shared-memory multiprocessor nodes, interconnected by a wrapped hexagonal mesh. This architecture is intended to meet three main requirements of real-time computing: high performance, high reliability, and extensive I/O. The high-level and low-level architecture is described. The evaluation of HARTS, using modeling and simulation with actual parameters derived from its implementation, is reported. Fault-tolerant routing, clock synchronization and the I/O architecture are examined.

3.

Cache performance analysis and optimization on a fused CPU-GPU architecture

Sun Chuanwei, An Hong, Sun Sun, Chen Junshi 《Computer Engineering and Applications》2017,53(2):47-52

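The development of CPUs and GPUs has run into new bottlenecks, and "combining" the two on a single chip has become a new trend. This new heterogeneous architecture puts pressure on the management of shared on-chip resources, and the management of the shared last-level cache (LLC) is critical to performance. Because CPU programs and GPU programs have different characteristics, managing an LLC shared between the CPU and the GPU poses new challenges. By analyzing the memory access characteristics of GPU programs and drawing on earlier cache management schemes, this paper proposes equal static partitioning and optimal static partitioning of the LLC in a fused CPU-GPU system. Experimental results show that cache partitioning effectively avoids interference between CPU and GPU programs. Compared with the conventional LRU policy, equal static partitioning and optimal static partitioning improve overall system performance by 7.68% and 11.62%, respectively.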

4.

Performance evaluation of a ACF-AMDF based pitch detection scheme in real-time

Sandeep Kumar Satish Kumar Singh S. Bhattacharya 《International Journal of Speech Technology》2015,18(4):521-527

5.

Parallel polygon scan conversion algorithms: Performance evaluation on a shared bus architecture

《Computers & Graphics》1986,10(1):7-25

In this paper, three parallel polygon scan conversion algorithms are proposed, and their performance when executed on a shared bus architecture is compared. It is shown that the parallel algorithm that does not use edge coherence performs better than those that use edge coherence. Further, a multiprocessing architecture is proposed to execute the parallel polygon scan conversion algorithms more efficiently than a single shared bus architecture.

6.

SVM-based real-time hyperspectral image classifier on a manycore architecture

《Journal of Systems Architecture》2017

This paper presents a study of the design space of a Support Vector Machine (SVM) classifier with a linear kernel running on a manycore MPPA (Massively Parallel Processor Array) platform. This architecture gathers 256 cores distributed in 16 clusters working in parallel. This study aims at implementing a real-time hyperspectral SVM classifier, where real-time is defined as the time required to capture a hyperspectral image. To do so, two aspects of the SVM classifier have been analyzed: the classification algorithm and the system parallelization. On the one hand, concerning the classification algorithm, first, the classification model has been optimized to fit into the MPPA structure and, secondly, a probability estimation stage has been included to refine the classification results. On the other hand, the system parallelization has been divided into two levels: first, the parallelism of the classification has been exploited taking advantage of the pixel-wise classification methodology supported by the SVM algorithm and, secondly, a double-buffer communication procedure has been implemented to parallelize the image transmission and the cluster classification stages. Experimenting with medical images, an average speedup of 9 has been obtained using a single-cluster and double-buffer implementation with 16 cores working in parallel. As a result, a system whose processing time linearly grows with the number of pixels composing the scene has been implemented. Specifically, only 3 µs are required to process each pixel within the captured scene independently from the spatial resolution of the image.
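
The pixel-wise, linear-kernel classification described in this abstract can be illustrated with a minimal sketch. It assumes a one-vs-rest decision rule (the class with the largest score `w·x + b` wins), which the abstract does not state; the function names and the toy weights are hypothetical, and only the 3 µs per-pixel figure comes from the abstract.

```python
from typing import List, Sequence


def classify_pixel(spectrum: Sequence[float],
                   weights: Sequence[Sequence[float]],
                   biases: Sequence[float]) -> int:
    """Return the index of the class with the highest linear score.

    Assumption: one weight vector and one bias per class (one-vs-rest).
    """
    scores = [
        sum(w * x for w, x in zip(class_weights, spectrum)) + b
        for class_weights, b in zip(weights, biases)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def classify_image(pixels: Sequence[Sequence[float]],
                   weights: Sequence[Sequence[float]],
                   biases: Sequence[float]) -> List[int]:
    """Classify each pixel independently; this independence is what makes
    the work easy to split across cores."""
    return [classify_pixel(p, weights, biases) for p in pixels]


def estimated_time_us(n_pixels: int, per_pixel_us: float = 3.0) -> float:
    """Processing time that grows linearly with the pixel count, using the
    3 microseconds per pixel reported in the abstract."""
    return n_pixels * per_pixel_us
```

Because each pixel is scored on its own, the list of pixels can be cut into chunks and handed to separate workers without any coordination beyond collecting the labels.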

7.

CABARET: rule interpretation in a hybrid architecture

《International journal of man-machine studies》1991,34(6):839-887

Rules often contain terms that are ambiguous, poorly defined or not defined at all. In order to interpret and apply rules containing such terms, appeal must be made to their previous constructions, as in the interpretation of legal statutes through relevant legal cases. We describe a system CABARET (CAse-BAsed REasoning Tool) that provides a domain-independent shell that integrates reasoning with rules and reasoning with previous cases in order to apply rules containing ill-defined terms. The integration of these two reasoning paradigms is performed via a collection of control heuristics, which suggest how to interleave case-based methods and rule-based methods to construct an argument to support a particular interpretation. CABARET is currently instantiated with cases and rules from an area of income tax law, the so-called “home office deduction”. An example of CABARET's processing of an actual tax case is provided in some detail. The advantages of CABARET's hybrid approach to interpretation stem from the synergy derived from interleaving case-based and rule-based tasks.

8.

A survey of real-time microprocessor architectures (cited by: 1; self-citations: 0; other citations: 1)

Shi Wei, Zhang Ming, Guo Yufeng, Gong Rui 《Computer Engineering & Science》2015,37(5):857-864

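Real-time applications have become a rapidly growing class of typical embedded applications. As the core component of a real-time system, the real-time microprocessor architecture is an important research direction in the microprocessor field. Unlike general-purpose processors, which pursue maximum throughput, real-time processors require a tight and computable worst-case execution time. Traditional real-time processors usually adopt relatively simple structures to avoid the execution-time uncertainty that complex structures introduce. As real-time applications demand ever higher processor performance, real-time processors are gradually moving toward multithreaded and multicore structures. In multithreaded and multicore processors, contention for shared resources degrades the determinism of real-time systems and poses greater challenges for real-time processor architecture. This paper surveys real-time microprocessor architectures. It first analyzes traditional real-time processors in terms of instruction set, microarchitecture, memory, I/O, and task scheduling; it then analyzes high-performance real-time processors that adopt multithreaded and multicore structures; finally, it compares several commercial real-time processor architectures and summarizes the current state and future trends of real-time processors.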

9.

Performance evaluation of switched Ethernet for real-time industrial communications (cited by: 12; self-citations: 0; other citations: 12)

Kyung Chang Suk 《Computer Standards & Interfaces》2002,24(5):411-423

The real-time industrial network, often referred to as fieldbus, is an important element for building automated manufacturing systems. Thus, in order to satisfy the real-time requirements of field devices such as sensors, actuators, and controllers, numerous standards organizations and vendors have developed various fieldbus protocols. As a result, the IEC 61158 standard, including Profibus, WorldFIP, and Foundation Fieldbus, was recently announced as an international standard. These fieldbus protocols have an important advantage over the widely used Ethernet (IEEE 802.3) in terms of their deterministic characteristics. However, the application of fieldbus has been limited due to the high cost of hardware and the difficulty in interfacing with multivendor products. In order to solve these problems, computer network technology, especially Ethernet, is being adopted by the industrial automation field. The key technical obstacle for Ethernet in industrial applications is that its nondeterministic behavior makes it inadequate for real-time applications, where the frames containing real-time information, such as control commands and alarm signals, have to be delivered within a certain time limit. Recently, the development of switched Ethernet has shown very promising prospects for industrial applications, because it eliminates uncertainties in network operation and thereby dramatically improves performance. This paper focuses on the application of switched Ethernet to industrial communications. More specifically, this paper presents a performance evaluation of switched Ethernet on an experimental network testbed along with an implementation method for using switched Ethernet for industrial automation.

10.

Performance of symbolic applications on a parallel architecture

Adolfo Guzman Edward J. Krall Patrick F. McGehearty Nader Bagherzadeh 《International journal of parallel programming》1987,16(3):183-214

The results of a study of a family of parallel symbolic architectures executing several parallel applications are presented. The class of architectures being simulated is characterized by a shared memory structure, by a hierarchical interconnect, and by clustered processors. Speedup measurements were obtained from six different application kernels. Measurements were also performed to assess the degradation of speedup as a function of the interconnection delays, and to study the effect of different scheduling algorithms. The results presented support the claim that the proposed architecture would be a powerful parallel symbolic computation system. The paper discusses processor starvation, fine grain parallelism, uneven loads, foreign reference, schedule and indeterminate computation with respect to the applications chosen. This work was completed within the Advanced Computer Architecture Program, Micro-electronics and Technology Computer Corporation, Austin, Texas.

11.

A rule processing architecture based on distributed platform

CHEN Meng-dong YUAN Hao XIE Xiang-hui WU Dong 《Computer Engineering & Science》1990,42(1):18

12.

A rule processing architecture based on a distributed platform

Chen Mengdong, Yuan Hao, Xie Xianghui, Wu Dong 《Computer Engineering & Science》2020,42(1):18-24

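Transforming a dictionary with string transformation rules is an effective method in secure string recovery. However, rule processing is complex, and existing approaches are all implemented in software. To meet practical requirements for processing performance and power consumption, this paper proposes a rule processing architecture based on a distributed platform. It uses FPGA hardware to accelerate rule processing for the first time, and it further accelerates processing by splitting complex rule combinations and distributing them across parallel nodes. Experimental results on the Ant Colony (蚁群) system show that a rule processing system built on this architecture meets practical requirements and that its performance and energy efficiency are significantly higher than those of CPUs and GPUs, demonstrating the effectiveness of the distributed rule processing architecture.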

13.

Performance evaluation of mesh-based NoCs: Implementation of a new architecture and routing algorithm

Sudhanshu Choudhary Shafi Qureshi 《International Journal of Automation and Computing》2012,9(4):403-413

This paper presents the results of experiments conducted in mesh networks on different routing algorithms, traffic generation schemes and switching schemes. A new network on chip (NoC) topology based on partial interconnection of a mesh network is proposed, and a routing algorithm supporting the proposed architecture is developed. The proposed architecture is similar to standard mesh networks, but four extra bidirectional channels are added, which remove the congestion and hotspots found in standard mesh networks with fewer channels. Significant improvement in delay (60% reduction) and throughput (60% increase) was observed using the proposed network and routing when compared with the ideal mesh networks. An increase in the number of channels makes the switches expensive and could increase the area and power consumption. However, the proposed network can be useful in high speed applications with some compromise on area and power.

14.

Performance evaluation of the Cray X1 distributed shared-memory architecture

Dunigan T.H. Jr. Vetter J.S. White J.B. III Worley P.H. 《Micro, IEEE》2005,25(1):30-40

The Cray X1 supercomputer, introduced in 2002, has several interesting architectural features. Two key features are the X1's distributed shared memory and its vector multiprocessors. The Cray X1 supercomputer's distributed shared memory presents a 64-bit global address space that is directly addressable from every MSP with an interconnect bandwidth per computation rate of 1 byte/flop. In this article, we characterize the performance of the X1's distributed shared-memory system and its interconnection network using microbenchmarks and applications.

15.

Performance evaluation of real-time decision-making architectures for computer-integrated manufacturing systems

Robert Y. Al-Jaar Alan A. Desrochers 《Robotics and Computer》1992,9(3):255-277

A general method is proposed for the performance evaluation of a decision-making architecture for computer-integrated manufacturing systems. A decision-making architecture broadens the concept of a control architecture by integrating control, communication and database functions. A modular modeling methodology is developed that captures these features and is applicable to an arbitrary computer-integrated manufacturing architecture. The model is based on generalized stochastic Petri nets and leads to a quantitative evaluation of such performance measures as response time, average utilization of a particular system component, average queue length, etc. The net result is a design tool that can be used to make tradeoffs among the system parameters.

The proposed technique is demonstrated using several real-time decision-making architectures. Several general conclusions are drawn from this investigation. Finally, a Petri net model reduction method is presented for this problem and used to compare the original performance evaluation results with those obtained from the simplified models.

16.

An intranet behavior auditing scheme based on improved regular expression rule grouping

Yu Yihan, Fu Yu, Wu Xiaoping 《Journal of Computer Applications》2016,36(8):2241-2245

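To address the insufficient ability of network security auditing to audit application-layer protocols, an intranet behavior auditing scheme based on improved regular expression (RE) rule grouping is proposed. First, the protocols to be audited are described with regular expressions, and related parameters are set so that protocol states that occur frequently in the intranet or are relatively important to the audit receive high priority in the set of regular expression descriptions. Then, provided that the interaction value between regular expressions is small, high-priority protocol state expressions are placed in the same automaton group as far as possible to generate the audit engine. Finally, the parameters are changed according to audit requirements to carry out security auditing of intranet behavior. Experimental results show that the proposed automaton construction algorithm reduces the number of states after conversion to 10%~20% of that of the classical Thompson nondeterministic finite automaton (NFA) construction algorithm, and that detection throughput is about 8 to 12 times that of a conventional automaton grouping engine. The proposed auditing scheme meets the need for security auditing of application-layer protocols with high accuracy and efficiency.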

17.

GUARDS: a generic upgradable architecture for real-time dependable systems

Powell D. Arlat J. Beus-Dukic L. Bondavalli A. Coppola P. Fantechi A. Jenn E. Rabejac C. Wellings A. 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(6):580-599

The development and validation of fault-tolerant computers for critical real-time applications are currently both costly and time consuming. Often, the underlying technology is out-of-date by the time the computers are ready for deployment. Obsolescence can become a chronic problem when the systems in which they are embedded have lifetimes of several decades. This paper gives an overview of the work carried out in a project that is tackling the issues of cost and rapid obsolescence by defining a generic fault-tolerant computer architecture based essentially on commercial off-the-shelf (COTS) components (both processor hardware boards and real-time operating systems). The architecture uses a limited number of specific, but generic, hardware and software components to implement an architecture that can be configured along three dimensions: redundant channels, redundant lanes, and integrity levels. The two dimensions of physical redundancy allow the definition of a wide variety of instances with different fault tolerance strategies. The integrity level dimension allows application components of different levels of criticality to coexist in the same instance. The paper describes the main concepts of the architecture, the supporting environments for development and validation, and the prototypes currently being implemented.

18.

A deliberative scheduling technique for a real-time agent architecture

《Engineering Applications of Artificial Intelligence》2006,19(5):521-534

In this paper, we present a heuristic to schedule complex task models (tasks that use Artificial Intelligence methods). These tasks are used in a real-time agent architecture called ARTIS. This architecture has been designed to build intelligent agents that work in hard real-time environments. To do this, the architecture provides scheduling at two levels. The first level assures the fulfilment of the hard temporal requirements, and the second level obtains a result of higher quality. The new heuristic, Slack-Slide Scheduling (SSS), works at the second level. It manages two types of methods: progressive refinement methods and multiple methods. The Slack-Slide Scheduling also attempts to reuse previous results in order to make better use of the existing CPU time while the first-level scheduler fulfils the deadlines.

19.

Functional programming on a dataflow architecture: Applications in real-time image processing

Jocelyn Sérot Georges Quénot Bertrand Zavidovique 《Machine Vision and Applications》1993,7(1):44-56

This paper presents a dataflow functional computer (DFFC) developed at the Etablissement Technique Central de l'Armement (ETCA) and dedicated to real-time image processing. Two types of data-driven processing elements, dedicated respectively to low-level and mid-level processing, are integrated in a regular 3D array. The design of the DFFC relies on a close integration of the dataflow-architecture principles and the functional programming concept. An image processing algorithm, expressed with a syntax similar to that of functional programming (FP), is first converted into a dataflow graph. The nodes of this graph are real-time operators that can be implemented on the physical processors of the dataflow machine. This dataflow graph is then mapped directly onto the processor array. The programming environment includes a complete compilation stream from the FP specification to hardware implementation, along with a global operator database. Apart from being a research tool for real-time image processing, the DFFC may also be used to perform the automatic synthesis of autonomous vision automata from a high-level functional specification. An experimental system, including 1024 low-level custom dataflow processors and 12 T800 transputers, was built and can perform up to 50 billion operations/s. Several image processing algorithms were implemented on this system and run in real-time at digital video speed.

20.

Performance evaluation of two new disk scheduling algorithms for real-time systems

Shenze Chen John A. Stankovic James F. Kurose Don Towsley 《Real-Time Systems》1991,3(3):307-336

In this paper, we present two new disk scheduling algorithms for real-time systems. The two algorithms, called SSEDO (Shortest Seek and Earliest Deadline by Ordering) and SSEDV (Shortest Seek and Earliest Deadline by Value), combine deadline information and disk service time information in different ways. The basic idea behind these new algorithms is to give the disk I/O request with the earliest deadline a high priority, but if a request with a larger deadline is very close to the current disk arm position, then it may be assigned the highest priority. The performance of the SSEDO and SSEDV algorithms is compared with three other proposed real-time disk scheduling algorithms, ED, P-SCAN, and FD-SCAN, as well as four conventional algorithms, SSTF, SCAN, C-SCAN, and FCFS. An important aspect of the performance study is that the evaluation is not done in isolation with respect to the disk, but as part of an integrated collection of protocols necessary to support a real-time transaction system. The transaction system model is validated on an actual real-time transaction system testbed, called RT-CARAT. The performance results show that SSEDV outperforms SSEDO; that both of these new algorithms can improve performance by up to 38% over previously known real-time disk scheduling algorithms; and that all of these real-time scheduling algorithms are significantly better than non-real-time algorithms in the sense of minimizing the transaction loss ratio. This work is supported, in part, by the Office of Naval Research under contract N00014-87-K-796, by NSF under contract IRI-8908693, and by an NSF equipment grant CERDCR 8500332.
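
The basic idea stated in this abstract (favor the earliest deadline, but let a request close to the disk arm overtake it) can be sketched as follows. This is only an illustration of that idea, not the paper's algorithm: the `window` and `beta` parameters, the geometric weighting by deadline rank, and all names are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    deadline: float  # absolute deadline of the I/O request
    cylinder: int    # target cylinder on the disk


def pick_next(queue: List[Request], arm: int,
              window: int = 3, beta: float = 2.0) -> Request:
    """Choose the next request to serve.

    Only the `window` requests with the earliest deadlines are considered.
    Each candidate's seek distance is multiplied by beta**rank, where rank 0
    is the earliest deadline, so a later-deadline request wins only when it
    is much closer to the arm. (Hypothetical weighting, chosen for clarity.)
    """
    if not queue:
        raise ValueError("queue is empty")
    candidates = sorted(queue, key=lambda r: r.deadline)[:window]
    rank, best = min(
        enumerate(candidates),
        key=lambda pair: (beta ** pair[0]) * abs(pair[1].cylinder - arm),
    )
    return best
```

With the arm at cylinder 100, a request at cylinder 101 with a slightly later deadline is served before a request at cylinder 500 with the earliest deadline; if the later-deadline request were instead at cylinder 900, the earliest-deadline request would be served first.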