Similar literature
 Found 20 similar documents; search took 31 ms
1.
Leakage-Aware Multiprocessor Scheduling   Cited by: 2 (self-citations: 0, citations by others: 2)
When peak performance is unnecessary, Dynamic Voltage Scaling (DVS) can be used to reduce the dynamic power consumption of embedded multiprocessors. In future technologies, however, static power consumption due to leakage current is expected to increase significantly. Then it will be more effective to limit the number of processors employed (i.e., turn some of them off), or to use a combination of DVS and processor shutdown. In this paper, leakage-aware scheduling heuristics are presented that determine the best trade-off between these three techniques: DVS, processor shutdown, and finding the optimal number of processors. Experimental results obtained using a public benchmark set of task graphs and real parallel applications show that our approach reduces the total energy consumption by up to 46% for tight deadlines (1.5× the critical path length) and by up to 73% for loose deadlines (8× the critical path length) compared to an approach that only employs DVS. We also compare the energy consumed by our scheduling algorithms to two absolute lower bounds, one for the case where all processors continuously run at the same frequency, and one for the case where the processors can run at different frequencies and these frequencies may change over time. The results show that the energy reduction achieved by our best approach is close to these theoretical limits.
Ben Juurlink
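
The trade-off between DVS and processor shutdown described in the abstract above can be illustrated with a toy energy model. This is only a sketch under stated assumptions, not the paper's scheduling heuristics: the workload is assumed perfectly parallel, dynamic power is assumed to follow the common first-order model `k_dyn * f**3`, each active processor is assumed to leak a constant `p_static`, and unused processors are assumed to be shut down at zero cost. All function and parameter names are hypothetical.

```python
def energy(n_procs, work, deadline, k_dyn=1.0, p_static=0.5, f_max=1.0):
    """Energy to finish `work` cycles on `n_procs` processors exactly at the deadline.

    Assumed model: every active processor runs at the lowest frequency that
    still meets the deadline (DVS) and also draws constant leakage power.
    """
    freq = work / (n_procs * deadline)        # slowest frequency that meets the deadline
    if freq > f_max:
        return float("inf")                   # infeasible with this few processors
    power = k_dyn * freq ** 3 + p_static      # dynamic plus static power per processor
    return n_procs * deadline * power


def best_processor_count(max_procs, work, deadline, **model):
    """Number of active processors that minimizes total energy under the toy model."""
    return min(range(1, max_procs + 1),
               key=lambda n: energy(n, work, deadline, **model))
```

With no leakage (`p_static=0.0`) the model always prefers as many slow processors as possible; once leakage is non-zero, an intermediate processor count becomes cheaper, which is the effect the abstract describes qualitatively.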

2.
This paper studies how to parallelize the emerging media mining workloads on existing small-scale multi-core processors and future large-scale platforms. Media mining is an emerging technology to extract meaningful knowledge from large amounts of multimedia data, aiming at helping end users search, browse, and manage multimedia data. Many of the media mining applications are very complicated and require a huge amount of computing power. The advent of multi-core architectures provides the acceleration opportunity for media mining. However, to efficiently utilize the multi-core processors, we must effectively execute many threads at the same time. In this paper, we present how to explore the multi-core processors to speed up the computation-intensive media mining applications. We first parallelize two media mining applications by extracting the coarse-grained parallelism and evaluate their parallel speedups on a small-scale multi-core system. Our experiment shows that the coarse-grained parallelization achieves good scaling performance, but not perfect. When examining the memory requirements, we find that these coarse-grained parallelized workloads expose high memory demand. Their working set sizes increase almost linearly with the degree of parallelism, and the instantaneous memory bandwidth usage prevents them from perfect scalability on the 8-core machine. To avoid the memory bandwidth bottleneck, we turn to exploit the fine-grained parallelism and evaluate the parallel performance on the 8-core machine and a simulated 64-core processor. Experimental data show that the fine-grained parallelization demonstrates much lower memory requirements than the coarse-grained one, but exhibits significant read-write data sharing behavior. Therefore, the expensive inter-thread communication limits the parallel speedup on the 8-core machine, while excellent speedup is observed on the large-scale processor as fast core-to-core communication is provided via a shared cache. 
Our study suggests that (1) extracting the coarse-grained parallelism scales well on small-scale platforms, but poorly on large-scale systems; (2) exploiting the fine-grained parallelism is suitable to realize the power of large-scale platforms; (3) future many-core chips can provide shared cache and sufficient on-chip interconnect bandwidth to enable efficient inter-core communication for applications with significant amounts of shared data. In short, this work demonstrates that proper parallelization techniques are critical to the performance of multi-core processors. We also demonstrate that one of the important factors in parallelization is the performance analysis. The parallelization principles, practice, and performance analysis methodology presented in this paper are also useful for everyone to exploit the thread-level parallelism in their applications.
Wenlong Li

3.
Dynamic Voltage Scaling (DVS) is a promising method to achieve energy saving by slowing down the processor into multiple frequency levels in battery-operated embedded systems. However, the worst case execution time (WCET) of the tasks scheduled by DVS must be known ahead of time to ensure their schedulability. In reality, a system’s workloads may change significantly without satisfying any prediction. In other words, a task’s WCET may not provide useful information about its future real execution time (RET). This paper presents a novel Dynamic-Mode EDF scheduling algorithm when workloads change significantly. One of the Single-Mode, Dual-Mode, and Three-Mode frequency setting formats can be applied, based on the RET and the accumulated slack at run-time. Only one combination of the number of modes/speeds, speed-switching transition points, and the frequency scaling factor for each mode can lead to the best energy saving. Experimental results show that, given an RET pattern, our Dynamic-Mode DVS algorithm achieves an average 15% energy savings over the traditional two-mode DVS scheme on hard real-time systems. Additionally, we also consider speed-switching or energy transition overhead, and implement a preliminary test of our proposed algorithm. With a less aggressive voltage scaling strategy (fewer speed changes for each job), deadlines can still be strictly satisfied and an average of 14% energy consumption saving over a non-DVS scheme is observed.
Albert Mo Kim Cheng
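
As background for the abstract above, the classical static baseline for DVS under EDF can be sketched in a few lines. This is not the paper's Dynamic-Mode algorithm. It assumes periodic tasks with deadlines equal to their periods, preemptive EDF on one processor, and execution time that scales inversely with the normalized speed; under those assumptions the task set stays schedulable at any speed no lower than its utilization. The function names and the list of discrete speed levels are hypothetical.

```python
def edf_utilization(tasks):
    """Total utilization of (wcet_at_full_speed, period) task pairs."""
    return sum(wcet / period for wcet, period in tasks)


def lowest_feasible_speed(tasks, speed_levels):
    """Smallest available normalized speed that keeps EDF schedulable.

    Running at speed s stretches every WCET by 1/s, so the scaled
    utilization U/s stays at or below 1 exactly when s >= U.
    """
    utilization = edf_utilization(tasks)
    for speed in sorted(speed_levels):
        if speed >= utilization:
            return speed
    raise ValueError("task set is not schedulable even at the highest speed")
```

For example, three tasks with utilization 0.6 and speed levels 0.25, 0.5, 0.75 and 1.0 would run at 0.75. A run-time scheme such as the one in the abstract goes further by lowering the speed again when jobs finish earlier than their WCET and leave slack.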

4.
Vector digital signal processors (DSPs) offer a good performance to power consumption ratio. Therefore, they are suitable for mobile devices in software defined radio applications. These vector DSPs require input algorithms with vector operations. The performance of vectorized algorithms to a great extent depends on the distribution of data on vector elements. Traditional algorithms for vectorization focus on the extraction of parallelism from a program; we propose an analysis tool that focuses on the selection of an efficient dynamic data mapping for vector DSPs. We transferred Garcia’s communication parallelism graph (Garcia et al., IEEE Trans Parallel Distrib Syst 12: 416–431, 2001) for distributed memory multiprocessor systems to vector DSPs. By alternating the representation of two-dimensional data distributions and the cost models, we are able to determine a dynamic mapping of data on vector elements on the Embedded Vector Processor (EVP) (van Berkel et al., Proceedings of the 2004 software-defined radio technical conference SDR’04, 2004). Additionally, we propose a new efficient algorithm for processing the graph representation that operates in two steps. We demonstrate the capabilities of our tool by describing the vectorization of some MIMO OFDM algorithms.
Andre Kaufmann

5.
This paper presents a flexible controller structure for concurrent processing of memory centric coarse grain data flows. We propose a design flow based on block level pipelining where concurrency among processing blocks is fully maintained. The controller is dynamically reconfigurable to support dynamic data-flow structure changes by localizing control signals. The proposed control design method isolates controllers and processing logics such that system integration is simplified while controllers are locally configured from orthogonal global information. The controller also supports interfacing with external processors for asynchronous processing. The controller for heterogeneous processing blocks is synthesized and verified using Verilog and SystemC on FPGA. We present an example demonstrating the effectiveness of the controllers where dynamic reconfiguration of the execution is feasible.
Sangjin Hong

6.
Architecture for Dynamically Reconfigurable Embedded Systems (ADRES) is a templatized coarse-grained reconfigurable processor architecture. It targets embedded applications which demand high performance, low power and high-level language programmability. Compared with a typical very long instruction word-based digital signal processor, ADRES can exploit higher parallelism by using more scalable hardware with the support of novel compilation techniques. We developed a complete tool-chain, including compiler, simulator and HDL generator. This paper describes the design case of a media processor targeting an H.264 decoder and other video tasks based on the ADRES template. The whole processor design, hardware implementation and application mapping were done in a relatively short period. Yet we obtain C-programmed real-time H.264/AVC CIF decoding at 50 MHz. The die size, clock speed and the power consumption are also very competitive compared with other processors.
S. Dupont

7.
Dynamic Voltage Scaling (DVS) is one of the techniques used to obtain energy-saving in real-time DSP systems. In many DSP systems, some tasks contain conditional instructions that have different execution times for different inputs. Due to the uncertainties in execution time of these tasks, this paper models each varied execution time as a probabilistic random variable and solves the Voltage Assignment with Probability (VAP) Problem. The VAP problem involves finding a voltage level to be used for each node of a data flow graph (DFG) in uniprocessor and multiprocessor DSP systems. This paper proposes two optimal algorithms, one for uniprocessor and one for multiprocessor DSP systems, to minimize the expected total energy consumption while satisfying the timing constraint with a guaranteed confidence probability. The experimental results show that our approach achieves significantly greater energy savings than previous work. For example, our algorithm for multiprocessor systems achieves an average improvement of 56.1% in total energy saving with a 0.80 probability of satisfying the timing constraint.
Edwin H.-M. Sha

8.
This paper describes a framework for fixed-length frame scheduling in all-photonic networks with large propagation delays. We introduce the Fair Matching Algorithm, a novel scheduling approach that results in weighted max-min fair allocation of extra slots, achieves zero rejection for admissible demands, and minimizes the maximum percentage rejection of any connection. We also propose the Minimum Rejection Algorithm, which minimizes total rejection but treats non-critical connections in a fair manner. Finally, we introduce a feedback control system based on Smith’s principle that reduces the effect of prediction errors and increases the speed of the response to the sudden changes in traffic arrival rates. Simulations performed using OPNET Modeler explore the performance of the scheduling and control algorithms we propose.
M. J. Coates
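
The notion of weighted max-min fairness mentioned in the abstract above can be made concrete with the textbook progressive-filling procedure. The sketch below is a generic illustration and not the paper's Fair Matching Algorithm: it assumes a single shared capacity, divisible (non-integer) allocations and strictly positive weights, whereas the paper allocates discrete slots in a frame. The function name and its parameters are hypothetical.

```python
def weighted_max_min(capacity, demands, weights):
    """Weighted max-min fair split of `capacity` among connections.

    Each round offers every unsatisfied connection a share proportional
    to its weight; connections whose demand fits inside that share are
    fully served and their leftover is redistributed in the next round.
    """
    alloc = [0.0] * len(demands)
    active = {i for i, d in enumerate(demands) if d > 0}
    remaining = float(capacity)
    while active and remaining > 1e-12:
        share = remaining / sum(weights[i] for i in active)   # capacity per unit weight
        satisfied = {i for i in active if demands[i] <= share * weights[i]}
        if not satisfied:
            # Nobody can be fully served: split what is left by weight and stop.
            for i in active:
                alloc[i] = share * weights[i]
            break
        for i in satisfied:
            alloc[i] = float(demands[i])
            remaining -= demands[i]
        active -= satisfied
    return alloc
```

With capacity 10, demands `[2, 8, 8]` and weights `[1, 1, 2]`, the first connection is fully served and the remaining 8 units are split 1:2 between the other two.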

9.
10.
As technology scales, transient faults have emerged as a key challenge for reliable embedded system design. This paper proposes a design methodology that incorporates reliability into hardware–software co-design paradigm for embedded systems. We introduce an allocation and scheduling algorithm that efficiently handles conditional execution in multi-rate embedded systems, and selectively duplicates critical tasks to detect or correct transient errors, such that the reliability of the system is improved. Two methods are proposed to insert duplicated tasks into the schedule. The improved reliability is achieved by utilizing the otherwise idle computation resources and taking advantage of the overlapping schedule for mutually exclusive tasks in the conditional task graph, such that it incurs no resource or performance penalty.
M. J. Irwin

11.
In this paper, we consider a joint packet scheduling algorithm for wireless networks and investigate its characteristics. The joint scheduling algorithm is a combination of the Knopp and Humblet (KH) scheduling, which fully exploits multiuser diversity, and the probabilistic weighted round-robin (WRR) scheduling, which does not use multiuser diversity at all. Under the assumption that the wireless channel process for each user is described by the Nakagami-m model, we develop a formula to estimate the tail distribution of the packet delay for an arbitrary user under the joint scheduling. Numerical results exhibit that under the joint scheduling, the ratio of the number of slots assigned for the WRR scheduling to that for the KH scheduling dominates the characteristics of the delay performance.
Gang Uk Hwang

12.
Range-Based Sleep Scheduling (RBSS) for Wireless Sensor Networks   Cited by: 3 (self-citations: 0, citations by others: 3)
Sleep scheduling in a wireless sensor network is the process of deciding which nodes are eligible to sleep (enter power-saving mode) after random deployment to conserve energy while retaining network coverage. Most existing approaches toward this problem require sensor’s location information, which may be impractical considering costly locating overheads. This paper proposes range-based sleep scheduling (RBSS) protocol which needs sensor-to-sensor distance but no location information. RBSS attempts to approach an optimal sensor selection pattern that demands the fewest working (awake) sensors. Simulation results indicate that RBSS is comparable to its location-based counterpart in terms of coverage quality and the reduction of working sensors.
Yang-Min Cheng

13.
14.
Currently, optical networks are being employed to meet the ever-increasing data transfer demands of grid applications, giving rise to the concept of an “optical grid”. Task scheduling is an important issue for an optical grid, for it optimally allocates both grid and optical network resources to accelerate application execution and increase the resource utilization ratio. However, most task scheduling algorithms based on theoretical models may generate accuracy deviations between the scheduled results and the actual finish time of the applications. Accuracy deviations may lead to inefficient resource utilization and unsatisfied Quality of Service (QoS). This paper aims to improve the accuracy of task scheduling algorithms in optical grid environments. We first propose the theoretical task scheduling algorithm and demonstrate that the scheduling result deviates from the actual finish time in the real optical grid environment. Then, we reveal several factors which are likely to influence scheduling accuracy and develop a realistic task scheduling algorithm. We evaluate the theoretical and realistic task scheduling algorithms in our optical grid testbed. The experimental results show that the scheduling accuracy can be improved significantly by the realistic task scheduling algorithm.
Wei Guo

15.
Besides energy constraint, wireless sensor networks should also be able to provide bounded communication delay when they are used to support real-time applications. In this paper, a new routing metric is proposed. It takes into account both energy and delay constraints. It can be used in AODV. By mathematical analysis and simulations, we have shown the efficiency of this new routing metric.
YeQiong Song

16.
We study the problem of assigning clusterheads in a hierarchical Wireless Sensor Network (WSN). That is, for a given hierarchical WSN, how many clusterhead nodes we should assign, and how to geographically allocate these clusterheads. Since an assignment scheme optimizing all factors is impossible, we will focus on the crucial issue of energy efficiency of the WSN. Because the nodes of a WSN are mostly powered by batteries, power saving is an especially important consideration in WSN architecture design. We will propose a hierarchical WSN architecture toward the end of saving energy of both sensor nodes and clusterheads. Using the analytical results, experiments are conducted in which realistic scenarios are simulated.
Dajin Wang

17.
A scheme is presented for reducing the hardware resources needed to implement, on LUT-based FPGA devices, the twiddle factors required in Fast Fourier Transform (FFT) processors. The proposed scheme reduces the number of embedded block RAMs for large FFTs and the number of slices for FFT lengths higher than 128 points. Results are given for Xilinx devices, but they can be generalized to other advanced LUT-based devices such as Altera Stratix.
T. Sansaloni
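
To show why twiddle-factor storage can be reduced at all, the sketch below uses the well-known quarter-wave symmetry of the cosine. The abstract does not say which scheme the paper uses, so this is only a generic software illustration of the idea, not the proposed FPGA design. It assumes an FFT length `n` divisible by 4 and the usual definition of the k-th twiddle factor as exp(-2πik/n); all function names are hypothetical.

```python
import math


def build_quarter_table(n):
    """Store cos(2*pi*j/n) only for j = 0 .. n/4 (one quarter of a period)."""
    if n % 4 != 0:
        raise ValueError("n must be divisible by 4")
    return [math.cos(2 * math.pi * j / n) for j in range(n // 4 + 1)]


def _cos_from_table(table, n, k):
    """cos(2*pi*k/n) for any integer k, rebuilt from the quarter table by symmetry."""
    q = n // 4
    k %= n
    if k <= q:
        return table[k]              # first quadrant: direct lookup
    if k <= 2 * q:
        return -table[2 * q - k]     # cos(pi - x) = -cos(x)
    if k <= 3 * q:
        return -table[k - 2 * q]     # cos(pi + x) = -cos(x)
    return table[n - k]              # cos(2*pi - x) = cos(x)


def twiddle(table, n, k):
    """k-th twiddle factor exp(-2j*pi*k/n) using only the quarter table."""
    cos_part = _cos_from_table(table, n, k)
    sin_part = _cos_from_table(table, n, k - n // 4)   # sin(x) = cos(x - pi/2)
    return complex(cos_part, -sin_part)
```

In this sketch a 16-point transform needs 5 stored real values instead of 16 complex ones; a hardware design would trade that memory saving against the extra address and sign logic.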

18.
While voice over Internet protocol (VoIP) services are expected to be widely supported in wireless mobile networks, the performance of VoIP services has not previously been evaluated in the IEEE 802.16e orthogonal frequency division multiple access (OFDMA) system taking the adaptive modulation and coding scheme into consideration. To support real-time uplink service flows, three different types of scheduling have been designed in the IEEE 802.16e standard: the unsolicited grant service (UGS), the real-time polling service (rtPS), and the extended rtPS (ertPS). In this paper, we compare the three real-time scheduling algorithms in terms of the performance of VoIP services by using the analytical and simulation models that we developed.
Jae-Woo So

19.
In this paper, we propose a new differentiated service model, referred to as Differentiated Service-EDCA (DS-EDCA), for the Enhanced Distributed Channel Access (EDCA) of IEEE 802.11e wireless local area networks (WLANs). With DS-EDCA, both strict priority and weighted fair service can be provided. The strict priority service is provided for high priority traffic through carefully setting the EDCA parameter sets of lower priority traffic; the proportional fairness service is enabled by determining the backoff intervals according to the distributed scheduling discipline (DFS). We also propose a hierarchical link sharing model for IEEE 802.11e WLANs, in which the AP and mobile stations are allocated different amounts of link resource. The performance of DS-EDCA and EDCA is compared via ns-2 simulations. The results show that DS-EDCA outperforms the original EDCA in terms of its support for both strict priority and weighted fair service. More importantly, DS-EDCA can be easily implemented, and is compatible with the IEEE 802.11 Standard.
Meng Chang Chen

20.
In this paper, several algorithms for compressing the feedback of channel quality information are presented and analyzed. These algorithms are developed for a proposed adaptive modulation scheme for future multi-carrier 4G mobile systems. These strategies compress the feedback data and, used together with opportunistic scheduling, drastically reduce the feedback data rate. Thus the adaptive modulation schemes become more suitable and efficient to be implemented in future mobile systems, increasing data throughput and overall system performance.
Arne Svensson


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号