期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Approximate Arithmetic for Low-Power Image Median Filtering

M. Monajati S. M. Fakhraie E. Kabir 《Circuits, Systems, and Signal Processing》2015,34(10):3191-3219

相似文献

2.

A Low-Cost At-Speed BIST Architecture for Embedded Processor and SRAM Cores

M.H. Tehranipour S.M. Fakhraie Z. Navabi M.R. Movahedin 《Journal of Electronic Testing》2004,20(2):155-168

We have introduced a low-cost at-speed BIST architecture that enables conventional microprocessors and DSP cores to test their functional blocks and embedded SRAMs in system-on-a-chip architectures using their existing hardware and software resources. To accommodate our proposed new test methodology, minor modifications should be applied to base processor within its test phase. That is, we modify the controller to interpret some of the instructions differently only within the initial test mode. In this paper, we have proposed a fuctional self-test methodology that is deterministic in nature. In our proposed architecture, a self test program called BIST Program is stored in an embedded ROM as a vehicle for applying tests. We first start with testing processor core using our proposed architedture. Once the testing of the processor core is completed, this core is used to test the embedded SRAMs. A test algorithm which utilizes a mixture of existing memory testing techniques and covers all important memory faults is presented in this paper. The proposed memory test algorithm covers 100% of the faults under the fault model plus a data retention test. The hardware overhead in the proposed architecture is shown to be negligible. This architecture is implemented on UTS-DSP (University of Tehran and Iran Communicaton Industries (SAMA)) IC which has been designed in VLSI Circuits and Systems Laboratory. 相似文献

3.

Designing an ultra-high-speed multiply-accumulate structure

Fatemeh Kashfi S. Mehdi Fakhraie Saeed Safari 《Microelectronics Journal》2008,39(12):1476-1484

In this article, an ultra-high-speed multiply-accumulate (MAC) structure is proposed. This fused MAC block uses low-voltage-swing (LVS) technique in the utilized carry-save adders and the final adder to improve its speed. Carry-save adders and the final adder are implemented with pass-transistor-based Manchester-carry-chain logic. Sense amplifiers are used in the output nodes to amplify the LVS signals to the standard levels of zero and one. With this technique, we achieved the outstanding clock frequency of 15 GHz for a five-stage pipelined MAC, which is 87.5% higher than the highest speed achieved for a pipelined multiplier in 65 nm technology and above, with the power consumption of 25 mW/GHz in 1.2 V voltage supply. 相似文献

4.

An analytical method for reliability aware instruction set extension

Ali Azarpeyvand Mostafa E. Salehi Sied Mehdi Fakhraie 《The Journal of supercomputing》2014,67(1):104-130

Random variations and low reliability of nanometer new silicons are the most important concerns for the fault-tolerant design of large-area powerful integrated circuits. Logic faults in terms of soft errors or transient faults are now serious problems for embedded processing cores. Recently, augmenting an embedded processor with application specific custom instructions is widely used for improving the performance of a processor. Although area, power, and performance of an augmented processor have been considered for efficient custom instruction selection, its reliability consideration is much needed. This is impeding because this action needs exhaustive fault injection and lengthy and expensive simulations. This demand becomes more serious in the case of many-core, larger area and, therefore, more fault-prone integrated circuits, e.g., tera-computing processors. In this work, we propose an analytical modeling solution for such a demanding problem. First, a simple analytical method is introduced that can evaluate the vulnerability of a custom instruction in a time-saving manner. Using this method and our configurable custom instruction vulnerability analysis framework, the effects of type, order, and word length of various operations of different custom instruction subgraphs on the vulnerability of an extensible processor have been explored analytically and experimentally. Based on our results, for example, replacing orders of operators in custom functional units could yield different vulnerabilities to soft errors. Therefore, our approach enables designers to optionally constrain the operand types and also the custom functional unit structures to reach an acceptable vulnerability level at low computational and design time costs. 相似文献

5.

Adaptive fault-tolerant DVFS with dynamic online AVF prediction

Farshad Firouzi Ali Azarpeyvand Mostafa E. Salehi Sied Mehdi Fakhraie 《Microelectronics Reliability》2012,52(6):1197-1208

Advances in silicon technology and shrinking the feature size to nanometer levels make random variations and low reliability of nano-devices the most important concern for fault-tolerant design. Design of reliable and fault-tolerant embedded processors is mostly based on developing techniques that compensate reliability shortcomings by adding hardware or software redundancy. The recently-proposed redundancy adding techniques are generally applied uniformly to all parts of a system and lead to heavy overheads and inefficiencies in terms of performance, power, and area. Efficient employment of non-uniform redundancy becomes possible when a quantitative analysis of a system behavior while encountering transient faults is provided. In this work, we present a quantitative analysis of the behavior of an embedded processor regarding transient faults and propose a new approach that accurately predicts the architecture vulnerability factor (AVF) in real-time. Another critical concern in design of new-silicon processors is power consumption issue. Dynamic voltage and frequency scaling (DVFS) is an effective method for controlling both energy consumption and performance of a system. Since rate of radiation-induced transient faults depends on operating frequency and supply voltage, DVFS techniques are recently shown to have compromising effects on electronic system reliability. Therefore, ignoring the effects of voltage scaling on fault rate could considerably degrade the system reliability. Here, by exploiting the proposed online AVF prediction methodology and based on analytic derivation, we propose a reliability-aware adaptive dynamic voltage and frequency scaling (DVFS) approach in case study of Multi-Processor System on Chip (MPSoC) with Multiple Clock Domain (MCD) pipeline architectures in which the frequency and voltage are scaled by simultaneously considering all three of power consumption, reliability, and performance. Comparing to the traditional methods of reliability-aware DVFS systems, the proposed reliability-aware DVFS method yields 50% better power saving at the same reliability level. 相似文献

6.

Instruction set architectural guidelines for embedded packet-processing engines

Mostafa E. Salehi Sied Mehdi Fakhraie Amir Yazdanbakhsh 《Journal of Systems Architecture》2012,58(3-4):112-125

This paper presents instruction set architectural guidelines for improving general-purpose embedded processors to optimally accommodate packet-processing applications. Similar to other embedded processors such as media processors, packet-processing engines are deployed in embedded applications, where cost and power are as important as performance. In this domain, the growing demands for higher bandwidth and performance besides the ongoing development of new networking protocols and applications call for flexible power- and performance-optimized engines.The instruction set architectural guidelines are extracted from an exhaustive simulation-based profile-driven quantitative analysis of different packet-processing workloads on 32-bit versions of two well-known general-purpose processors, ARM and MIPS. This extensive study has revealed the main performance challenges and tradeoffs in development of evolution path for survival of such general-purpose processors with optimum accommodation of packet-processing functions for future switching-intensive applications. Architectural guidelines include types of instructions, branch offset size, displacement and immediate addressing modes for memory access along with the effective size of these fields, data types of memory operations, and also new branch instructions.The effectiveness of the proposed guidelines is evaluated with the development of a retargetable compilation and simulation framework. Developing the HDL model of the optimized base processor for networking applications and using a logic synthesis tool, we show that enhanced area, power, delay, and power per watt measures are achieved. 相似文献

7.

Reliability aware throughput management of chip multi-processor architecture via thread migration

Fatemeh Pouyan Ali Azarpeyvand Saeed Safari Sied Mehdi Fakhraie 《The Journal of supercomputing》2016,72(4):1363-1380

相似文献

8.

Customized pipeline and instruction set architecture for embedded processing engines

Amir Yazdanbakhsh Mostafa E. Salehi Sied Mehdi Fakhraie 《The Journal of supercomputing》2014,68(2):948-977

Custom instructions potentially improve execution speed and code compression of embedded applications. However, more efficient custom instructions need higher number of simultaneous registerfile accesses. Larger registerfiles are more power hungry with complex forwarding interconnects. Therefore, due to the limited ports of the base processor registerfile, size and efficiency of custom instructions could be generally limited. Recent researches have focused on overcoming this limitation by some innovative architectural techniques supplemented with customized compilations. However, to the best of our knowledge there are few researches that take into account the complete pipeline design and implementation considerations. This paper proposes a customized instruction set and pipeline architecture for an optimized embedded engine. The proposed architecture increases the performance by enhancing the available registerfile data bandwidth through register access pipelining. The achieved improvements are made by introducing double-word custom instructions whose registerfile accesses are overlapped in the pipeline. Potential hazards in such instructions are resolved by the introduced pipeline backwarding concept, yielding higher performance and code compression. While we study the effectiveness of the proposed architecture on domain-specific workloads from packet-processing benchmarks, the developed framework and architecture are applicable to other embedded application domains. 相似文献

9.

Parallel scalable hardware implementation of asynchronous discrete particle swarm optimization

Amin Farmahini-Farahani Shervin Vakili Sied Mehdi Fakhraie Saeed Safari Caro Lucas 《Engineering Applications of Artificial Intelligence》2010,23(2):177-187

This paper presents a novel hardware framework of particle swarm optimization (PSO) for various kinds of discrete optimization problems based on the system-on-a-programmable-chip (SOPC) concept. PSO is a new optimization algorithm with a growing field of applications. Nevertheless, similar to the other evolutionary algorithms, PSO is generally a computationally intensive method which suffers from long execution time. Hence, it is difficult to use PSO in real-time applications in which reaching a proper solution in a limited time is essential. SOPC offers a platform to effectively design flexible systems with a high degree of complexity. A hardware pipelined PSO (PPSO) Core is applied with which the required computational operations of the algorithm are performed. Embedded processors have also been employed to evaluate the fitness values by running programmed software codes. Applying the subparticle method brings the benefit of full scalability to the framework and makes it independent of the particle length. Therefore, more complex and larger problems can be addressed without modifying the architecture of the framework. To speed up the computations, the optimization architecture is implemented on a single chip master–slave multiprocessor structure. Moreover, the asynchronous model of PSO gains parallel efficacy and provides an approach to update particles continuously. Five benchmarks are exploited to evaluate the effectiveness and robustness of the system. The results indicate a speed-up of up to 98 times over the software implementation in the elapsed computation time. Besides, the PPSO Core has been employed for neural network training in an SOPC-based embedded system which approves the system applicability for real-world applications. 相似文献

10.

Scalable closed-boundary analog neural networks

Fakhraie S.M. Farshbaf H. Smith K.C. 《Neural Networks, IEEE Transactions on》2004,15(2):492-504

In many pattern-classification and recognition problems, separation of different swarms of class representatives is necessary. As well, in function-approximation problems, neurons with a local area of influence have demonstrated measurable success. In our previous work, we have shown how intrinsic quadratic characteristics of traditional metal-oxide-semiconductor (MOS) devices can be used to implement hyperspherical discriminating surfaces in hardware-implemented neurons. In this work, we further extend the concept from quadratic forms to more-arbitrary closed-boundary shapes. Accordingly, we demonstrate how intrinsic characteristics of submicron MOS devices can be utilized to implement efficient pattern discriminators for various applications and, through representative simulations, show their success in some typical function-approximation problems. Further, we offer two mathematical interpretations of possible roles for these networks: Geometrically, we show that our networks employ closed hypercone shapes as their discriminating surfaces; analytically, we show that a set of these synapses connected to a common integrating body calculates the distance between their inputs and weight vectors using a power norm. The feasibility of the idea is practically investigated by design, implementation, and test of a three-dimensional (3-D) closed-boundary pattern classifier, fabricated in 0.35-/spl mu/m complimentary MOS, whose results are reflected in this work. 相似文献