Found 20 similar documents (search time: 31 ms)
1.
Leakage-Aware Multiprocessor Scheduling (total citations: 2; self-citations: 0; citations by others: 2)
When peak performance is unnecessary, Dynamic Voltage Scaling (DVS) can be used to reduce the dynamic power consumption of
embedded multiprocessors. In future technologies, however, static power consumption due to leakage current is expected to
increase significantly. Then it will be more effective to limit the number of processors employed (i.e., turn some of them
off), or to use a combination of DVS and processor shutdown. In this paper, leakage-aware scheduling heuristics are presented
that determine the best trade-off between these three techniques: DVS, processor shutdown, and finding the optimal number
of processors. Experimental results obtained using a public benchmark set of task graphs and real parallel applications show
that our approach reduces the total energy consumption by up to 46% for tight deadlines (1.5× the critical path length) and
by up to 73% for loose deadlines (8× the critical path length) compared to an approach that only employs DVS. We also compare
the energy consumed by our scheduling algorithms to two absolute lower bounds, one for the case where all processors continuously
run at the same frequency, and one for the case where the processors can run at different frequencies and these frequencies
may change over time. The results show that the energy reduction achieved by our best approach is close to these theoretical
limits.
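The trade-off described in this abstract can be illustrated with a toy energy model. The sketch below is not the paper's heuristic. It assumes perfectly divisible work (no task graph), dynamic power proportional to the cube of the normalized frequency, a fixed static power per active processor, and zero energy for processors that are turned off. All function names and constants are hypothetical.

```python
def dvs_only_energy(work, deadline, n_procs, p_dyn=1.0, p_static=0.5):
    """DVS only: all n_procs stay on and stretch the work to the deadline.

    `work` is the total execution time at full speed (normalized frequency 1).
    """
    f = work / (n_procs * deadline)  # lowest frequency that meets the deadline
    if f > 1.0:
        raise ValueError("deadline cannot be met")
    return n_procs * deadline * (p_dyn * f**3 + p_static)


def leakage_aware_energy(work, deadline, max_procs, p_dyn=1.0, p_static=0.5):
    """DVS plus shutdown: choose the processor count and frequency together.

    Per unit of work the energy is p_dyn * f**2 + p_static / f, which is
    minimized at f_crit. Running slower than f_crit wastes static energy,
    so each processor runs at max(f_needed, f_crit) and is then shut down.
    Returns (energy, processors used, frequency).
    """
    f_crit = (p_static / (2.0 * p_dyn)) ** (1.0 / 3.0)
    best = None
    for n in range(1, max_procs + 1):
        f_needed = work / (n * deadline)
        if f_needed > 1.0:
            continue  # too few processors to meet the deadline
        f = min(1.0, max(f_needed, f_crit))
        e = work * (p_dyn * f**2 + p_static / f)
        # Keep the smallest processor count among (near-)equal energies.
        if best is None or e < best[0] - 1e-12:
            best = (e, n, f)
    if best is None:
        raise ValueError("deadline cannot be met")
    return best


if __name__ == "__main__":
    print(dvs_only_energy(8.0, 8.0, 8))
    print(leakage_aware_energy(8.0, 8.0, 8))
```

In this toy model, adding processors beyond the point where the critical frequency becomes reachable brings no further saving, which is why the search keeps the smallest such count.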
Author contact: Ben Juurlink
2.
Wenlong Li Xiaofeng Tong Tao Wang Yimin Zhang Yen-Kuang Chen 《Journal of Signal Processing Systems》2009,57(2):213-228
This paper studies how to parallelize the emerging media mining workloads on existing small-scale multi-core processors and
future large-scale platforms. Media mining is an emerging technology to extract meaningful knowledge from large amounts of
multimedia data, aiming at helping end users search, browse, and manage multimedia data. Many of the media mining applications
are very complicated and require a huge amount of computing power. The advent of multi-core architectures provides the acceleration
opportunity for media mining. However, to efficiently utilize the multi-core processors, we must effectively execute many
threads at the same time. In this paper, we show how to exploit multi-core processors to speed up the computation-intensive
media mining applications. We first parallelize two media mining applications by extracting the coarse-grained parallelism
and evaluate their parallel speedups on a small-scale multi-core system. Our experiment shows that the coarse-grained parallelization
achieves good, though not perfect, scaling performance. When examining the memory requirements, we find that these coarse-grained
parallelized workloads expose high memory demand. Their working set sizes increase almost linearly with the degree of parallelism,
and the instantaneous memory bandwidth usage prevents them from perfect scalability on the 8-core machine. To avoid the memory
bandwidth bottleneck, we turn to exploit the fine-grained parallelism and evaluate the parallel performance on the 8-core
machine and a simulated 64-core processor. Experimental data show that the fine-grained parallelization demonstrates much
lower memory requirements than the coarse-grained one, but exhibits significant read-write data sharing behavior. Therefore,
the expensive inter-thread communication limits the parallel speedup on the 8-core machine, while excellent speedup is observed
on the large-scale processor as fast core-to-core communication is provided via a shared cache. Our study suggests that (1)
extracting the coarse-grained parallelism scales well on small-scale platforms, but poorly on large-scale systems; (2) exploiting
the fine-grained parallelism is suitable to realize the power of large-scale platforms; (3) future many-core chips can provide
shared cache and sufficient on-chip interconnect bandwidth to enable efficient inter-core communication for applications with
significant amounts of shared data. In short, this work demonstrates that proper parallelization techniques are critical to the
performance of multi-core processors. We also demonstrate that one of the important factors in parallelization is the performance
analysis. The parallelization principles, practice, and performance analysis methodology presented in this paper are also
useful for everyone to exploit the thread-level parallelism in their applications.
Author contact: Wenlong Li
3.
Dynamic Voltage Scaling (DVS) is a promising method to achieve energy saving by slowing down the processor into multiple frequency
levels in battery-operated embedded systems. However, the worst case execution time (WCET) of the tasks scheduled by DVS must
be known ahead of time to ensure their schedulability. In reality, a system’s workloads may change significantly without satisfying
any prediction. In other words, a task’s WCET may not provide useful information about its future real execution time (RET).
This paper presents a novel Dynamic-Mode EDF scheduling algorithm for workloads that change significantly. One of the Single-Mode,
Dual-Mode, and Three-Mode frequency setting formats can be applied, based on the RET and the accumulated slack at run-time.
Only one combination of the number of modes/speeds, speed-switching transition points, and the frequency scaling factor for
each mode can lead to the best energy saving. Experimental results show that, given an RET pattern, our Dynamic-Mode DVS algorithm
achieves an average of 15% energy savings over the traditional two-mode DVS scheme on hard real-time systems. We
also consider speed-switching or energy transition overhead, and implement a preliminary test of our proposed algorithm. With
a less aggressive voltage scaling strategy (fewer speed changes for each job), deadlines can still be strictly satisfied and
an average of 14% energy consumption saving over a non-DVS scheme is observed.
Author contact: Albert Mo Kim Cheng
4.
Peter Westermann Ludwig Schwoerer Andre Kaufmann 《Journal of Signal Processing Systems》2009,57(1):57-72
Vector digital signal processors (DSPs) offer a good performance to power consumption ratio. Therefore, they are suitable
for mobile devices in software defined radio applications. These vector DSPs require input algorithms with vector operations.
The performance of vectorized algorithms to a great extent depends on the distribution of data on vector elements. Traditional
algorithms for vectorization focus on the extraction of parallelism from a program; we propose an analysis tool that focuses
on the selection of an efficient dynamic data mapping for vector DSPs. We transferred Garcia’s communication parallelism graph
(Garcia et al., IEEE Trans Parallel Distrib Syst 12: 416–431, 2001) for distributed memory multiprocessor systems to vector DSPs. By alternating the representation of two-dimensional data
distributions and the cost models, we are able to determine a dynamic mapping of data on vector elements on the Embedded Vector
Processor (EVP) (van Berkel et al., Proceedings of the 2004 software-defined radio technical conference SDR’04, 2004). Additionally, we propose a new efficient algorithm for processing the graph representation that operates in two steps.
We demonstrate the capabilities of our tool by describing the vectorization of some MIMO OFDM algorithms.
Author contact: Andre Kaufmann
5.
This paper presents a flexible controller structure for concurrent processing of memory centric coarse grain data flows. We
propose a design flow based on block level pipelining where concurrency among processing blocks is fully maintained. The controller
is dynamically reconfigurable to support dynamic data-flow structure changes by localizing control signals. The proposed control
design method isolates controllers and processing logics such that system integration is simplified while controllers are
locally configured from orthogonal global information. The controller also supports interfacing with external processors for
asynchronous processing. The controller for heterogeneous processing blocks is synthesized and verified using Verilog and SystemC on FPGA. We present an example demonstrating the effectiveness of the controllers where dynamic reconfiguration of the execution
is feasible.
Author contact: Sangjin Hong
6.
B. Mei B. De Sutter T. Vander Aa M. Wouters A. Kanstein S. Dupont 《Journal of Signal Processing Systems》2008,51(3):225-243
Architecture for Dynamically Reconfigurable Embedded Systems (ADRES) is a templatized coarse-grained reconfigurable processor
architecture. It targets embedded applications that demand high performance, low power, and high-level language programmability.
Compared with a typical very long instruction word-based digital signal processor, ADRES can exploit higher parallelism by using
more scalable hardware with the support of novel compilation techniques. We developed a complete tool-chain, including a compiler,
simulator, and HDL generator. This paper describes the design case of a media processor targeting an H.264 decoder and other
video tasks based on the ADRES template. The whole processor design, hardware implementation, and application mapping are done
in a relatively short period. Yet we obtain C-programmed real-time H.264/AVC CIF decoding at 50 MHz. The die size, clock speed
and the power consumption are also very competitive compared with other processors.
Author contact: S. Dupont
7.
Voltage Assignment with Guaranteed Probability Satisfying Timing Constraint for Real-time Multiproceesor DSP (total citations: 1; self-citations: 0; citations by others: 1)
Meikang Qiu Zhiping Jia Chun Xue Zili Shao Edwin H.-M. Sha 《The Journal of VLSI Signal Processing》2007,46(1):55-73
Dynamic Voltage Scaling (DVS) is one of the techniques used to obtain energy-saving in real-time DSP systems. In many DSP
systems, some tasks contain conditional instructions that have different execution times for different inputs. Due to the
uncertainties in execution time of these tasks, this paper models each varied execution time as a probabilistic random variable
and solves the Voltage Assignment with Probability (VAP) problem. The VAP problem involves finding a voltage level to be used
for each node of a data flow graph (DFG) in uniprocessor
and multiprocessor DSP systems. This paper proposes two optimal algorithms, one for uniprocessor and one for multiprocessor
DSP systems, to minimize the expected total energy consumption while satisfying the timing constraint with a guaranteed confidence
probability. The experimental results show that our approach saves significantly more energy than previous work. For example,
our algorithm for multiprocessor systems achieves an average improvement of 56.1% in total energy saving with a probability of 0.80
of satisfying the timing constraint.
Author contact: Edwin H.-M. Sha
8.
This paper describes a framework for fixed-length frame scheduling in all-photonic networks with large propagation delays.
We introduce the Fair Matching Algorithm, a novel scheduling approach that results in weighted max-min fair allocation of extra
slots, achieves zero rejection for admissible demands, and minimizes the maximum percentage rejection of any connection. We
also propose the Minimum Rejection Algorithm, which minimizes total rejection but treats non-critical connections in a fair
manner. Finally, we introduce a feedback control system based on Smith’s principle that reduces the effect of prediction errors
and increases the speed of the response to sudden changes in traffic arrival rates. Simulations performed using OPNET
Modeler explore the performance of the scheduling and control algorithms we propose.
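The notion of weighted max-min fairness mentioned in this abstract can be sketched with the standard progressive-filling procedure. This is not the Fair Matching Algorithm itself, which the abstract does not specify; it is a generic, continuous (non-integer) allocation, whereas a frame scheduler would allocate whole slots. The function name and the example figures are hypothetical.

```python
def weighted_max_min(capacity, demands, weights):
    """Weighted max-min fair split of `capacity` among connections.

    Each connection i receives at most demands[i]. Capacity that a
    connection cannot use is redistributed among the others in
    proportion to their weights.
    """
    alloc = [0.0] * len(demands)
    active = set(range(len(demands)))
    remaining = float(capacity)
    while active and remaining > 1e-12:
        total_w = sum(weights[i] for i in active)
        # Connections whose demand fits within their current weighted share.
        satisfied = [
            i for i in active
            if demands[i] <= remaining * weights[i] / total_w + 1e-12
        ]
        if not satisfied:
            # Everyone left is bottlenecked: split what remains by weight.
            for i in active:
                alloc[i] = remaining * weights[i] / total_w
            break
        for i in satisfied:
            alloc[i] = float(demands[i])
            remaining -= demands[i]
            active.discard(i)
    return alloc


if __name__ == "__main__":
    # Three connections competing for 10 units; the third has double weight.
    print(weighted_max_min(10, [2, 8, 8], [1, 1, 2]))
```

Satisfying every connection that fits within its share in the same round is safe because removing them can only increase the shares of the connections that remain.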
Author contact: M. J. Coates
9.
10.
Y. Xie L. Li M. Kandemir N. Vijaykrishnan M. J. Irwin 《The Journal of VLSI Signal Processing》2007,49(1):87-99
As technology scales, transient faults have emerged as a key challenge for reliable embedded system design. This paper proposes
a design methodology that incorporates reliability into hardware–software co-design paradigm for embedded systems. We introduce
an allocation and scheduling algorithm that efficiently handles conditional execution in multi-rate embedded systems, and
selectively duplicates critical tasks to detect or correct transient errors, such that the reliability of the system is improved.
Two methods are proposed to insert duplicated tasks into the schedule. The improved reliability is achieved by utilizing the
otherwise idle computation resources and taking advantage of the overlapping schedule for mutually exclusive tasks in the
conditional task graph, such that it incurs no resource or performance penalty.
Author contact: M. J. Irwin
11.
In this paper, we consider a joint packet scheduling algorithm for wireless networks and investigate its characteristics.
The joint scheduling algorithm is a combination of the Knopp and Humblet (KH) scheduling, which fully exploits multiuser diversity,
and the probabilistic weighted round-robin (WRR) scheduling, which does not use multiuser diversity at all. Under the assumption
that the wireless channel process for each user is described by the Nakagami-m model, we develop a formula to estimate the tail distribution of the packet delay for an arbitrary user under the joint scheduling.
Numerical results show that under the joint scheduling, the ratio of the number of slots assigned for the WRR scheduling
to that for the KH scheduling dominates the characteristics of the delay performance.
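A per-slot decision for a joint scheduler of this kind can be sketched as follows. The abstract does not say how slots are divided between the two disciplines, so the sketch assumes each slot is independently given to WRR with probability `p_wrr` and to KH otherwise. The function and parameter names are hypothetical.

```python
import random


def joint_schedule(channel_gains, weights, p_wrr, rng):
    """Return the index of the user served in one slot.

    With probability p_wrr the slot uses probabilistic weighted
    round-robin, which ignores the channel state. Otherwise it uses
    Knopp-Humblet (KH) scheduling, which serves the user with the best
    instantaneous channel and so exploits multiuser diversity.
    """
    users = range(len(weights))
    if rng.random() < p_wrr:
        # Probabilistic WRR: pick a user with probability proportional to its weight.
        return rng.choices(users, weights=weights, k=1)[0]
    # KH: pick the user with the largest instantaneous channel gain.
    return max(users, key=lambda u: channel_gains[u])


if __name__ == "__main__":
    rng = random.Random(0)
    gains = [0.2, 0.9, 0.5]  # illustrative channel gains for one slot
    print([joint_schedule(gains, [1, 1, 1], 0.3, rng) for _ in range(10)])
```

Under this assumption, `p_wrr / (1 - p_wrr)` plays the role of the WRR-to-KH slot ratio that the abstract identifies as the dominant factor for delay.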
Author contact: Gang Uk Hwang
12.
Range-Based Sleep Scheduling (RBSS) for Wireless Sensor Networks (total citations: 3; self-citations: 0; citations by others: 3)
Sleep scheduling in a wireless sensor network is the process of deciding which nodes are eligible to sleep (enter power-saving
mode) after random deployment to conserve energy while retaining network coverage. Most existing approaches toward this problem
require the sensors’ location information, which may be impractical considering costly locating overheads. This paper proposes
the range-based sleep scheduling (RBSS) protocol, which needs sensor-to-sensor distances but no location information. RBSS attempts
to approach an optimal sensor selection pattern that demands the fewest working (awake) sensors. Simulation results indicate
that RBSS is comparable to its location-based counterpart in terms of coverage quality and the reduction of working sensors.
Author contact: Yang-Min Cheng
13.
14.
Wei Guo Zhengyu Wang Zhenyu Sun Weiqiang Sun Yaohui Jin Weisheng Hu Chunming Qiao 《Photonic Network Communications》2009,17(3):209-217
Currently optical networks have been employed to meet the ever-increasing data transfer demands of grid applications and thus
give rise to the concept of an “optical grid”. Task scheduling is an important issue for an optical grid, for it optimally
allocates both grid and optical network resources to accelerate application execution and increase the resource utilization
ratio. However, most task scheduling algorithms based on theoretical models may generate accuracy deviations between the scheduled
results and the actual finish time of the applications. Accuracy deviations may lead to inefficient resource utilization
and unsatisfactory Quality of Service (QoS). This paper aims to improve the accuracy of task scheduling algorithms in optical
grid environments. We first propose the theoretical task scheduling algorithm and demonstrate that the scheduling result
deviates from the actual finish time in the real optical grid environment. Then, we reveal several factors which are likely to
influence scheduling accuracy and develop a realistic task scheduling algorithm. We evaluate the theoretical and realistic
task scheduling algorithms in our optical grid testbed. The experimental result shows the scheduling accuracy can be improved
significantly by the realistic task scheduling algorithm.
Author contact: Wei Guo
15.
A New Routing Metric for Satisfying Both Energy and Delay Constraints in Wireless Sensor Networks (total citations: 1; self-citations: 0; citations by others: 1)
Besides meeting energy constraints, wireless sensor networks should also be able to provide bounded communication delay when they are
used to support real-time applications. In this paper, a new routing metric is proposed. It takes into account both energy
and delay constraints. It can be used in AODV. By mathematical analysis and simulations, we have shown the efficiency of this
new routing metric.
Author contact: YeQiong Song
16.
Dajin Wang 《International Journal of Wireless Information Networks》2008,15(2):61-71
We study the problem of assigning clusterheads in a hierarchical Wireless Sensor Network (WSN). That is, for a given hierarchical
WSN, how many clusterhead nodes we should assign, and how to geographically allocate these clusterheads. Since an assignment
scheme optimizing all factors is impossible, we will focus on the crucial issue of energy efficiency of the WSN. Because it
is mostly true that the nodes of WSN are powered by batteries, power saving is an especially important consideration in WSN
architecture design. We will propose a hierarchical WSN architecture toward the end of saving energy of both sensor nodes
and clusterheads. Using the analytical results, we conduct experiments in which realistic scenarios are simulated.
Author contact: Dajin Wang
17.
T. Sansaloni A. Pérez-Pascual V. Torres J. Valls 《The Journal of VLSI Signal Processing》2007,47(2):183-187
A scheme is presented for reducing the hardware resources needed to implement, on LUT-based FPGA devices, the twiddle factors
required in Fast Fourier Transform (FFT) processors. The proposed scheme reduces the number of embedded block RAMs for large FFTs and the
number of slices for FFT lengths greater than 128 points. Results are given for Xilinx devices, but they can be generalized
to other advanced LUT-based devices such as Altera Stratix.
Author contact: T. Sansaloni
18.
Jae-Woo So 《Wireless Personal Communications》2008,47(2):247-263
While voice over Internet protocol (VoIP) services are expected to be widely supported in wireless mobile networks, the
performance of VoIP services has not previously been evaluated in the IEEE 802.16e orthogonal frequency division multiple
access (OFDMA) system taking the adaptive modulation and coding scheme into consideration. To support real-time uplink service
flows, three different types of scheduling have been designed in the IEEE 802.16e standard: the unsolicited grant service
(UGS), the real-time polling service (rtPS), and the extended rtPS (ertPS). In this paper, we compare the three real-time
scheduling algorithms in terms of the performance of VoIP services by using the analytical and simulation models that we developed.
Author contact: Jae-Woo So
19.
In this paper, we propose a new differentiated service model, referred to as Differentiated Service-EDCA (DS-EDCA), for the
Enhanced Distributed Channel Access (EDCA) of IEEE 802.11e wireless local area networks (WLANs). With DS-EDCA, both strict
priority and weighted fair service can be provided. The strict priority service is provided for high priority traffic through
carefully setting the EDCA parameter sets of lower priority traffic; the proportional fairness service is enabled by determining
the backoff intervals according to the distributed scheduling discipline (DFS). We also propose a hierarchical link sharing
model for IEEE 802.11e WLANs, in which AP and mobile stations are allocated different amounts of link resource. The performance
of DS-EDCA and EDCA is compared via ns-2 simulations. The results show that DS-EDCA outperforms the original EDCA in terms
of its support for both strict priority and weighted fair service. More importantly, DS-EDCA can be easily implemented, and
is compatible with the IEEE 802.11 Standard.
Author contact: Meng Chang Chen
20.
Víctor P. Gil Jiménez Thomas Eriksson Ana García Armada M. Julia Fernández-Getino García Tony Ottosson Arne Svensson 《Wireless Personal Communications》2008,47(1):101-112
In this paper, several algorithms for compressing the feedback of channel quality information are presented and analyzed.
These algorithms are developed for a proposed adaptive modulation scheme for future multi-carrier 4G mobile systems. These
strategies compress the feedback data and, used together with opportunistic scheduling, drastically reduce the feedback data
rate. Thus the adaptive modulation schemes become more suitable and efficient for implementation in future mobile systems,
increasing data throughput and overall system performance.
Author contact: Arne Svensson