Similar Documents
20 similar documents found.
1.
Future computing devices are likely to be based on heterogeneous architectures, which comprise multi-core CPUs accompanied by GPUs or special-purpose accelerators. A challenging issue for such devices is how to manage resources effectively to achieve high efficiency and low energy consumption. With multiple new programming models and advanced framework support for heterogeneous computing, many regular applications have benefited greatly from heterogeneous systems. However, extending the success of heterogeneous computing to irregular applications remains a challenge. An irregular program's attributes may vary during execution and are often unpredictable, making it difficult to allocate heterogeneous resources for the highest efficiency. Moreover, irregularity in applications may cause control-flow divergence, load imbalance, and low efficiency in parallel execution. To resolve these issues, we studied and proposed phase-guided dynamic work partitioning, a lightweight and fast analysis technique that collects information during program phases at runtime in order to guide work partitioning in subsequent phases for more efficient work dispatching on heterogeneous systems. We implemented an adaptive runtime system based on this technique and used ray tracing to explore the performance potential of dynamic work distribution in our framework. The experiments show that this approach can be as much as 5 times faster than the original system. The proposed techniques can be applied to other irregular applications with similar properties.
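The phase-guided idea above can be sketched in a few lines: throughput observed in one phase steers the CPU/GPU split for the next. This is a minimal illustration under assumed names (`repartition` and the throughput-proportional rule are not from the paper).

```python
# Hypothetical sketch of phase-guided dynamic work partitioning: after each
# phase, per-device throughput measured in the previous phase guides the
# CPU/GPU split for the next one.

def repartition(cpu_items, cpu_time, gpu_items, gpu_time):
    """Return the fraction of work to give the GPU in the next phase,
    proportional to observed throughput (items per second)."""
    cpu_rate = cpu_items / cpu_time
    gpu_rate = gpu_items / gpu_time
    return gpu_rate / (cpu_rate + gpu_rate)

# Phase 1 observation: CPU processed 200 items in 1.0 s, GPU 800 in 1.0 s.
split = repartition(200, 1.0, 800, 1.0)
print(split)  # 0.8 -> give the GPU 80% of the next phase's work
```

Because the split is recomputed every phase, the runtime tracks the unpredictable behavior of irregular workloads instead of committing to a static partition.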

2.
This paper describes several challenges facing programmers of future edge computing systems, the diverse many-core devices that will soon exemplify commodity mainstream systems. To call attention to the programming challenges ahead, this paper focuses on the most complex of such architectures: integrated, power-conserving systems that are inherently parallel and heterogeneous, with distributed address spaces. When programming such complex systems, new concerns arise: partitioning computation across functional units, data movement and synchronization, managing a diversity of programming models for different devices, and reusing existing legacy and library software. We observe that many of these challenges also arise in programming applications for large-scale heterogeneous distributed computing environments, and that current solutions as well as future research directions in distributed computing can be adapted to commodity computing environments. Optimization decisions are inherently complex due to the large search spaces of possible solutions and the difficulty of predicting performance on increasingly complex architectures. Cognitive techniques are well suited to managing systems of such complexity, as evidenced by recent trends of using cognitive techniques for code mapping and optimization support. Combining these observations, we describe a fundamentally new programming paradigm for complex heterogeneous systems, in which programmers design self-configuring applications and the system automates optimization decisions and manages the allocation of heterogeneous resources.

3.
Cloud computing is a key frontier of current computer technology worldwide, and workflow task scheduling plays an important role in it: scheduling is the policy that maps tasks to appropriate resources for execution. Effective task scheduling is essential for obtaining high performance in a cloud environment. In this paper, we present a workflow task scheduling algorithm based on fuzzy clustering of resources, named FCBWTS. The major objective of scheduling is to minimize the makespan of precedence-constrained applications, which can be modeled as a directed acyclic graph. FCBWTS considers the resource characteristics of cloud computing; this paper defines a group of characteristics that describe the synthetic performance of processing units in the resource system. With these characteristics and the influence of the ready task on the execution time of the critical path, the processing-unit network is preprocessed by a fuzzy clustering method in order to realize a reasonable partition of the processor network. This largely reduces the cost of deciding which processor executes the current task. Performance evaluation using both case data from the recent literature and randomly generated directed acyclic graphs shows that this algorithm outperforms the HEFT and DLS algorithms in both makespan and scheduling time. Copyright © 2014 John Wiley & Sons, Ltd.
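To make the makespan objective concrete, here is a much simplified greedy list scheduler for a DAG on heterogeneous processors. It is an illustrative stand-in, not FCBWTS itself: the fuzzy clustering step is omitted and every name here is an assumption.

```python
# Greedy list scheduling of a precedence-constrained task DAG on
# heterogeneous processors: each task goes to the processor that can
# finish it earliest.
def schedule(tasks, deps, cost, n_procs):
    """tasks: ids in topological order; deps: task -> set of predecessors;
    cost: (task, proc) -> execution time. Returns (makespan, assignment)."""
    proc_free = [0.0] * n_procs          # when each processor becomes idle
    finish = {}                          # task -> finish time
    assign = {}                          # task -> chosen processor
    for t in tasks:
        ready = max((finish[p] for p in deps.get(t, ())), default=0.0)
        best = min(range(n_procs),
                   key=lambda p: max(ready, proc_free[p]) + cost[(t, p)])
        start = max(ready, proc_free[best])
        finish[t] = start + cost[(t, best)]
        proc_free[best] = finish[t]
        assign[t] = best
    return max(finish.values()), assign

# Tiny DAG: A precedes B and C; two heterogeneous processors.
tasks = ["A", "B", "C"]
deps = {"B": {"A"}, "C": {"A"}}
cost = {("A", 0): 2, ("A", 1): 3,
        ("B", 0): 2, ("B", 1): 1,
        ("C", 0): 1, ("C", 1): 2}
makespan, assign = schedule(tasks, deps, cost, 2)
print(makespan)  # 3.0: A on p0 (0-2), then B on p1 (2-3) and C on p0 (2-3)
```

The per-task search over all processors is exactly the cost that FCBWTS reduces by first clustering the processor network.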

4.
Object detection is one of the most important and challenging tasks in computer vision applications. Boosting-based approaches involve computationally intensive operations and several sequential tasks, which makes it very difficult to develop hardware implementations with a high level of parallelism. This work presents a new hardware architecture able to perform object detection based on a cascade classifier in real-time, resource-constrained systems. As a case study, the proposed architecture has been tailored to the face detection task and integrated within a complete heterogeneous embedded system based on a Xilinx Zynq-7000 FPGA-based System-on-Chip. Experimental results show that, thanks to the proposed parallel processing scheme and the runtime-adaptable strategy for sliding sub-windows across the input image, the novel design achieves a frame rate of up to 125 fps at QVGA resolution, significantly outperforming previous works. This performance is obtained using less than 10% of the available on-chip logic resources, with a power consumption of 377 mW at a 100 MHz clock frequency.
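The cascade structure that the hardware exploits can be shown with a toy software sketch: each stage is a cheap test that rejects most negative windows early, so later (more expensive) stages run rarely. The stage functions here are placeholders, not the paper's trained classifiers.

```python
# Toy cascade evaluation over candidate windows: reject at the first
# failing stage, keep only windows that pass every stage.
def cascade_detect(windows, stages):
    """Return windows that pass all stages (early exit on first failure)."""
    hits = []
    for w in windows:
        if all(stage(w) for stage in stages):
            hits.append(w)
    return hits

stages = [lambda w: sum(w) > 10,   # cheap coarse filter, runs on everything
          lambda w: max(w) < 9]    # stricter stage, runs on survivors only
print(cascade_detect([[1, 2, 3], [5, 6, 7], [9, 9, 9]], stages))
# -> [[5, 6, 7]]
```

In hardware, the early-exit behavior is what makes the sequential cascade hard to parallelize naively, and it is exactly what the proposed parallel sub-window scheme works around.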

5.
With the rapid development of the Internet of Things (IoT), there are several security challenges in IoT applications. Compared with the traditional Internet, the IoT has many problems, such as a large number of assets, complex and diverse structures, and a lack of computing resources. Traditional network intrusion detection systems cannot meet the security needs of IoT applications. In view of this situation, this study applies cloud computing and machine learning to IoT intrusion detection to improve detection performance. Traditional intrusion detection algorithms usually require considerable training time and are not suitable for cloud computing because of the limited computing power and storage capacity of cloud nodes; it is therefore necessary to study lightweight intrusion detection algorithms with short training times and high detection accuracy for deployment on cloud nodes. An appropriate classification algorithm is a primary factor in deploying cloud intrusion prevention systems and a prerequisite for the system to respond to intrusions and reduce intrusion threats. This paper discusses problems related to IoT intrusion prevention in cloud computing environments. Based on an analysis of cloud computing security threats, this study extensively explores IoT intrusion detection, cloud node monitoring, and intrusion response in cloud environments using cloud computing, an improved extreme learning machine, and other methods. We use the Multi-Feature Extraction Extreme Learning Machine (MFE-ELM) algorithm for cloud computing, which adds a multi-feature extraction process to cloud servers, and deploy the MFE-ELM algorithm on cloud nodes to detect and discover network intrusions against them.
In our simulation experiments, a classical intrusion detection dataset is selected as a test set, and steps such as data preprocessing, feature engineering, model training, and result analysis are performed. The experimental results show that the proposed algorithm can effectively detect and identify most network data packets with good model performance, achieving efficient intrusion detection for heterogeneous IoT data on cloud nodes. Furthermore, it enables the cloud server to discover nodes with serious security threats in the cloud cluster in real time, so that further protective measures can be taken to obtain an optimal intrusion response strategy for the cluster.
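A multi-feature extraction front end of the kind MFE-ELM prepends to the classifier might look like the following sketch. The specific features and names are assumptions for illustration only; the paper's actual feature set is not reproduced here.

```python
# Illustrative per-flow feature extraction before classification:
# derive simple statistics from packet sizes and inter-arrival gaps.
import statistics

def extract_features(packet_sizes, inter_arrival):
    """Return a dict of summary features for one traffic flow."""
    mean_size = statistics.mean(packet_sizes)
    return {
        "mean_size": mean_size,
        "std_size": statistics.pstdev(packet_sizes),
        "mean_gap": statistics.mean(inter_arrival),
        "burstiness": max(packet_sizes) / mean_size,  # peak-to-mean ratio
    }

f = extract_features([100, 300, 200], [0.1, 0.3])
print(f["mean_size"], f["burstiness"])  # 200 1.5
```

Features like these are cheap to compute on a cloud node, which matches the paper's requirement for lightweight, low-latency detection.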

6.
With the advent of the Internet-of-Things paradigm, data production has grown exponentially and user demand for responsive consumption of data has increased significantly. Herein, we present DART, a fast and lightweight stream processing framework for the IoT environment. Because the DART framework targets a geospatially distributed environment of heterogeneous devices, the framework provides (1) an end-user tool for device registration and application authoring, (2) automatic worker-node monitoring and task allocation, and (3) runtime management of user applications with fault tolerance. To maximize performance, the DART framework adopts an actor model in which applications are segmented into microtasks, each assigned to an actor with a single responsibility. To prove the feasibility of the proposed framework, we implemented the DART system. We also conducted experiments showing that the system can significantly reduce computing burdens and alleviate network load by utilizing the idle resources of intermediate edge devices.
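The single-responsibility actor idea can be sketched minimally as below; DART's actual runtime (distribution, fault tolerance, scheduling) is far richer, and the class and method names here are illustrative assumptions.

```python
# Minimal actor sketch: each actor owns one microtask (its handler) and a
# mailbox; a drain loop processes queued messages in arrival order.
from collections import deque

class Actor:
    def __init__(self, handler):
        self.handler = handler        # the actor's single responsibility
        self.mailbox = deque()

    def send(self, msg):
        self.mailbox.append(msg)      # asynchronous message delivery

    def drain(self):
        out = []
        while self.mailbox:
            out.append(self.handler(self.mailbox.popleft()))
        return out

double = Actor(lambda x: 2 * x)
for v in (1, 2, 3):
    double.send(v)
print(double.drain())  # [2, 4, 6]
```

Because each actor touches only its own mailbox, actors can be placed on whichever heterogeneous device is idle, which is the property the framework exploits.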

7.
Vehicular Cloud Computing (VCC) facilitates real-time execution of many emerging user and intelligent transportation system (ITS) applications by exploiting under-utilized on-board computing resources available in nearby vehicles. These applications have heterogeneous time criticality, i.e., they demand different Quality-of-Service levels. In addition, the mobility of the vehicles makes scheduling different application tasks on the vehicular computing resources a challenging problem. In this article, we formulate the task scheduling problem as a mixed integer linear program (MILP) that increases computation reliability while reducing job execution delay. Vehicular on-board units (OBUs), manufactured by different vendors, have different architectures and computing capabilities. We exploit the MapReduce computation model to address resource heterogeneity and to support computation parallelization. The performance of the proposed solution is evaluated in network simulator version 3 (ns-3) by running MapReduce applications in an urban road environment, and the results are compared with state-of-the-art works. The results show that the proposed task scheduling model achieves significant performance improvements in terms of reliability and job execution time.

8.
The co-synthesis of hardware–software systems for complex embedded applications has been studied extensively, with a focus on various qualitative system objectives such as high-speed performance and low power dissipation. One of the main challenges in the construction of multiprocessor systems for complex real-time applications is to provide the high levels of system availability that users expect. Even though hardware–software co-synthesis has been studied extensively in the recent past, the issues that specifically relate to design exploration for highly available architectures need to be addressed more systematically and in a manner that supports active user participation. In this paper, we propose a user-centric co-synthesis mechanism for generating gracefully degrading, heterogeneous multiprocessor architectures that fulfill the dual objectives of achieving real-time performance and ensuring high levels of system availability at acceptable cost. A flexible interface allows the user to specify rules that capture the user's perceived availability expectations under different working conditions. We propose an algorithm to map these user requirements to the importance attached to the subset of services provided in any functional state. System availability is evaluated on the basis of these user-driven importance values and a CTMC model of the underlying fail-repair process. We employ a stochastic timing model in which all relevant performance parameters, such as task execution times, data arrival times, and data communication times, are taken to be random variables. A stochastic scheduling algorithm assigns start- and completion-time distributions to tasks. A hierarchical genetic algorithm optimizes the selection of resources, i.e., processors and buses, and the task allocations. We report the results of a number of experiments performed with representative task graphs.
Analysis shows that the co-synthesis tool we have developed is effectively driven by the user's availability requirements as well as by the topological characteristics of the task graph, yielding high-quality architectures. We experimentally demonstrate the edge provided by a stochastic timing model in terms of performance assessment, resource utilization, system availability, and cost. An erratum to this article is available.

9.
The unabated flurry of research activities to augment various mobile devices in terms of compute-intensive task execution, by leveraging the heterogeneous resources of available devices in the local vicinity, has created a new research domain called mobile ad hoc cloud (MAC), or mobile cloud, a new type of mobile cloud computing (MCC). MAC is deemed a candidate blueprint for future compute-intensive applications, with the aim of delivering rich functionality and an impressive experience to mobile users. However, MAC is still in its infancy, and a comprehensive survey of the domain has been lacking. In this paper, we survey the state-of-the-art research efforts carried out in the MAC domain. We analyze several problems inhibiting the adoption of MAC and review corresponding solutions by devising a taxonomy. Moreover, MAC roots are analyzed and taxonomized as architectural components, applications, objectives, characteristics, execution model, scheduling type, formation technologies, and node types. The similarities and differences among existing proposed solutions are also investigated, highlighting their advantages and disadvantages, and we compare the literature based on objectives. Furthermore, our study advocates that the problems stem from the intrinsic characteristics of MAC and identifies several new principles. Lastly, several open research challenges, such as incentives, heterogeneity-aware task allocation, mobility, minimal data exchange, and security and privacy, are presented as future research directions. Copyright © 2016 John Wiley & Sons, Ltd.

10.
Resource virtualization has become one of the key technologies of high-powered mobile computing architectures. As mobile devices and multimedia traffic have increased dramatically, the load on mobile cloud computing systems has become heavier. Under such conditions, mobile cloud system reliability becomes a challenging task. In this paper, we propose a new model that uses a naive Bayes classifier for hypervisor failure prediction and prevention in mobile cloud computing. We exploit real-time monitoring data in combination with historical maintenance data, which achieves higher accuracy in failure prediction and early failure-risk detection. After detecting hypervisors at risk, we perform live migration of virtual servers within a cluster, which decreases the load and prevents failures in the cloud. We performed a simulation for verification. According to the experimental results, our proposed model shows good accuracy in failure prediction and the potential to decrease downtime in a hypervisor service. Copyright © 2017 John Wiley & Sons, Ltd.
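A naive Bayes classifier over categorical monitoring signals can be written compactly in plain Python. The feature names and tiny training set below are hypothetical; the sketch only shows the classifier family the paper uses, not its actual model or data.

```python
# Toy naive Bayes over binary monitoring signals: predict whether a
# hypervisor is at risk from labeled historical samples.
import math
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (features_dict, label). Returns a predict closure."""
    label_counts = Counter(lbl for _, lbl in samples)
    feat_counts = defaultdict(Counter)
    for feats, lbl in samples:
        for k, v in feats.items():
            feat_counts[(lbl, k)][v] += 1

    def predict(feats):
        best, best_lp = None, -math.inf
        for lbl, n in label_counts.items():
            lp = math.log(n / len(samples))            # log prior
            for k, v in feats.items():
                c = feat_counts[(lbl, k)][v] + 1       # Laplace smoothing
                lp += math.log(c / (n + 2))            # log likelihood
            if lp > best_lp:
                best, best_lp = lbl, lp
        return best
    return predict

data = [({"high_cpu": 1, "io_err": 1}, "risk"),
        ({"high_cpu": 1, "io_err": 0}, "ok"),
        ({"high_cpu": 0, "io_err": 0}, "ok"),
        ({"high_cpu": 1, "io_err": 1}, "risk")]
predict = train(data)
print(predict({"high_cpu": 1, "io_err": 1}))  # risk
```

In the paper's setting, a "risk" prediction would trigger live migration of that hypervisor's virtual servers before failure occurs.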

11.
黄涛 (Huang Tao), 《电子质量》 (Electronics Quality), 2002, (3): 110-112
This paper discusses real-time multimedia communication in heterogeneous computing environments and high-performance broadband multimedia applications with service guarantees over ATM (Asynchronous Transfer Mode) networks. A real-time multimedia communication system based on native ATM is designed for heterogeneous environments; the system achieves interoperation of real-time multimedia data streams based on the ATM AAL5 protocol between two different computer systems, laying a solid foundation for research on broadband distributed multimedia applications in heterogeneous environments.

12.
Data-intensive Grid applications require huge data transfers between multiple geographically separated computing nodes where computing tasks are executed. For a future WDM network to efficiently support this type of emerging application, neither the traditional approach of establishing lightpaths between given source-destination pairs nor existing application-level approaches that consider computing resources but ignore optical-layer connectivity are sufficient. Instead, lightpath establishment has to be considered jointly with task scheduling to achieve the best performance. In this paper, we study the optimization problems of jointly scheduling both computing resources and network resources. We first formulate two optimization problems, with the objectives of minimizing the completion time of a job and minimizing the resource usage/cost to satisfy a job with a deadline. When the objective is to minimize the completion time, we devise an optimal algorithm for a special type of application. Furthermore, we propose efficient heuristics to deal with general applications under either optimization objective and demonstrate their good performance in simulation.

13.
With the advance of network and computer techniques, scalable computing has become a new trend. To integrate and utilize distributed, heterogeneous resources efficiently, message broadcasting is a crucial technique for distributed computing systems such as grids and clouds. In this paper, we present a Location Aware Broadcasting Scheme (LABS) for performing message broadcast on irregular, heterogeneous networks in distributed systems. LABS introduces a new scheduling scheme based on workstation heterogeneity and network topology. Together with a binomial-tree optimization technique, LABS is able to schedule communications to avoid both node and link contention. To evaluate the performance of the proposed techniques, we implemented LABS along with some well-known algorithms and ran them in a variety of scenarios. Our extensive experiments show that LABS provides reliable performance with lower network latency in different circumstances. In particular, LABS shows significant improvements when the environment is highly heterogeneous. Copyright © 2014 John Wiley & Sons, Ltd.
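The binomial-tree pattern behind LABS can be shown in isolation: in each round, every node that already holds the message forwards it to one new node, so coverage doubles per round. This sketch ignores the heterogeneity weighting that LABS adds; all names are illustrative.

```python
# Binomial-tree broadcast schedule: returns, per round, the list of
# (sender, receiver) pairs. Coverage doubles each round, so n nodes need
# ceil(log2(n)) rounds.
def broadcast_rounds(nodes):
    have = [nodes[0]]                 # the root holds the message initially
    rounds = []
    i = 1
    while i < len(nodes):
        step = []
        for sender in list(have):     # snapshot: only current holders send
            if i >= len(nodes):
                break
            step.append((sender, nodes[i]))
            have.append(nodes[i])
            i += 1
        rounds.append(step)
    return rounds

r = broadcast_rounds(list("ABCDE"))
print(len(r))  # 3 rounds for 5 nodes (ceil(log2(5)) = 3)
```

LABS's contribution is in choosing *which* node each sender contacts in each round, so that slow nodes and congested links do not sit on the critical path of the tree.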

14.
刘昊 (Liu Hao), 《电子质量》 (Electronics Quality), 2010, (12): 1-4
With the development of GPUs, their computing power and memory bandwidth have surpassed those of CPUs, and general-purpose computing on GPUs offers low cost and high performance. Because of their particular properties, cellular neural networks are very well suited to parallel computation on GPUs. This paper therefore proposes a GPU-based heterogeneous cellular neural network algorithm implemented with CUDA and applies it to image edge detection. Experimental results show that, compared with a traditional CPU implementation, the GPU-based edge detection method is tens of times faster, providing a new approach for applying cellular neural networks to real-time image and video processing.
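The per-pixel stencil at the heart of such edge detection is exactly the kind of data-parallel loop that maps well to CUDA threads. Below is a CPU reference sketch; the Laplacian kernel is an assumption chosen for illustration, not necessarily the cellular-neural-network template the paper uses.

```python
# CPU reference for a per-pixel edge-detection stencil: each output pixel
# depends only on its 4-neighborhood, so every pixel can be computed by an
# independent GPU thread.
def laplacian(img):
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):         # on a GPU, this double loop becomes
        for x in range(1, w - 1):     # one thread per (y, x)
            out[y][x] = (4 * img[y][x] - img[y - 1][x] - img[y + 1][x]
                         - img[y][x - 1] - img[y][x + 1])
    return out

img = [[0, 0, 0, 0],
       [0, 9, 9, 0],
       [0, 9, 9, 0],
       [0, 0, 0, 0]]
print(laplacian(img)[1][1])  # 4*9 - 0 - 9 - 0 - 9 = 18
```

Since neighboring pixels are reused by adjacent threads, a CUDA version would stage tiles of the image in shared memory, which is where most of the reported speedup typically comes from.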

15.
Wireless sensor network (WSN) technologies have enabled ubiquitous sensing to intersect many areas of modern-day living. These devices offer the ability to obtain, gather, exchange, and consume environmental measurements from the physical world in a communicating-actuating network called the Internet of Things (IoT). As the number of physical-world objects from heterogeneous network environments grows, the data produced by these objects rise uncontrollably, posing a delicate challenge for scalability management in IoT networks. Cloud computing is a much more mature technology, offering unlimited virtual capabilities in terms of storage capacity and processing power. Ostensibly, cloud computing and the IoT seem to be evolving independently on their own paths, but in reality, integrating clouds with the IoT addresses the inability to scale automatically under the overload caused by the drastic growth in the number of connected devices and/or by the huge amount of data exchanged in IoT networks. In this paper, our objective is to promote scalability management using a hybrid mechanism that combines a traffic-oriented mechanism and a resource-oriented mechanism with adaptation actions. By using autonomic middleware within IoT systems, we seek to improve the architectural design of the monitoring components, based on a cloud computing-oriented scalability solution. The intention is to maximize the number of satisfied requests while maintaining system performance at an acceptable QoS level (system RTT, and the RAM and CPU usage of the middleware). To evaluate the performance of our solution, we performed testbed experiments under different scenarios. In general, the results of our proposal are better than those reported as references.

16.
Unmanned aerial vehicles (UAVs) are used in surveillance and reconnaissance systems for hazardous locations because they can move freely, unconstrained by space. Furthermore, the scope of UAV applications has expanded from simple image-data collection to the analysis of complex image data without human intervention. However, mobile UAV systems such as drones have limited computing resources and battery power, which makes it challenging to use them for long periods. In this paper, we propose AOM (Adaptive Offloading with MPTCP (Multipath TCP)), an architecture for increasing drone operating time. We design a task-offloading management module that uses MPTCP to exploit heterogeneous networks, as well as a response-time prediction module for mission-critical task-offloading decisions. Through a prototype drone implementation, we show that AOM reduces task response time and increases drone operation time.

17.
Cloud computing provides high accessibility, scalability, and flexibility in the era of computing for different practical applications. The Internet of Things (IoT) is a new technology that connects devices and things to provide the services users require. Because of the surge of data and information in the IoT, cloud computing is usually used to manage these data, an arrangement known as cloud-based IoT. Owing to the high volume of requirements, service diversity is one of the critical challenges in cloud-based IoT. Since load balancing is an NP-hard problem in heterogeneous environments, this article provides a new method for response-time reduction using the well-known grey wolf optimization algorithm. In this paper, we assume that the response time equals the execution time of all the tasks, and this parameter must be minimized. The approach determines the status of each virtual machine based on its current load; tasks are then removed from the machine carrying the extra load, depending on the condition of the virtual machine, and transferred to an appropriate virtual machine, where the criterion for assigning a task to a virtual machine is the least distance. Results in the CloudSim simulation environment showed that the response time improves compared with the HBB-LB and EBCA-LB algorithms. The load-imbalance degree is also improved in comparison with TSLBACO and HJSA.
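To show the alpha/beta/delta update at the core of grey wolf optimization, here is a minimal 1-D sketch. The paper applies GWO to VM load balancing; this example just minimizes a toy cost, and all parameter choices are illustrative.

```python
# Minimal grey wolf optimizer: the pack moves toward the three best wolves
# (alpha, beta, delta), with the exploration coefficient `a` shrinking from
# 2 to 0 over the iterations.
import random

def gwo(cost, lo, hi, wolves=8, iters=60, seed=1):
    rng = random.Random(seed)
    pack = [rng.uniform(lo, hi) for _ in range(wolves)]
    for t in range(iters):
        pack.sort(key=cost)
        alpha, beta, delta = pack[0], pack[1], pack[2]
        a = 2 - 2 * t / iters            # exploration shrinks over time
        new = []
        for x in pack:
            moves = []
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(), rng.random()
                A, C = 2 * a * r1 - a, 2 * r2
                moves.append(leader - A * abs(C * leader - x))
            new.append(min(max(sum(moves) / 3, lo), hi))  # clamp to bounds
        pack = new
    return min(pack, key=cost)

best = gwo(lambda x: (x - 3) ** 2, -10, 10)
print(best)  # should land near the optimum x = 3
```

In the load-balancing setting, the "position" would encode a task-to-VM assignment and the cost would be the predicted response time, rather than this scalar toy function.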

18.

With the rapid development of the Internet of Things (IoT), mobile edge computing (MEC) plays an increasingly visible role in providing high-performance, low-latency computing services. However, in the time-varying environment of MEC for IoT services (MEC-IoT), different edge devices and application services exhibit significant heterogeneity in latency, energy consumption, and other respects, posing a severe challenge for efficient task offloading and resource allocation. To address this problem, this paper proposes a dynamic distributed heterogeneous task-offloading algorithm (D2HM), which uses a distributed game mechanism combined with Lyapunov optimization theory to design a dynamic resource-pricing mechanism, achieving differentiated control over different service types and elastic, on-demand allocation of computing resources. Simulation results show that the proposed algorithm can satisfy the diverse computing demands of heterogeneous tasks and reduce the system's average latency while guaranteeing network stability.
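A Lyapunov-style drift-plus-penalty rule of the kind such algorithms build on can be sketched in a few lines. This omits D2HM's distributed pricing game entirely, and the weights and names are assumptions for illustration.

```python
# Drift-plus-penalty offloading sketch: each time slot, pick the option
# (local vs. edge) minimizing V * energy + queue_backlog * delay. A long
# queue shifts the decision toward the lower-delay option, which is how
# Lyapunov optimization trades energy against queue stability.
def choose(queue, v, local_energy, local_delay, edge_energy, edge_delay):
    local = v * local_energy + queue * local_delay
    edge = v * edge_energy + queue * edge_delay
    return "edge" if edge < local else "local"

# Short queue: energy dominates, so stay local.
print(choose(queue=1, v=10, local_energy=1, local_delay=5,
             edge_energy=3, edge_delay=1))   # local
# Long queue: delay dominates, so offload to the edge.
print(choose(queue=50, v=10, local_energy=1, local_delay=5,
             edge_energy=3, edge_delay=1))   # edge
```

The parameter `V` is the usual Lyapunov knob: larger values favor the energy objective at the cost of longer queues, smaller values favor stability and latency.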


19.
Research on a High-Performance Embedded Image Processing System (cited 3 times)
To increase the speed of image processing in computer vision, the parallel-computation characteristics of the three levels of image processing in computer vision are analyzed in depth. Based on a data-parallel processing-element array chip, different combinations of the chip yield a high-performance embedded image processing system. The system provides data parallelism at different levels as well as task parallelism for image processing, meeting image processing's demand for parallel computation and delivering high computing performance for real-time embedded image processing. Moreover, the implementation of the processing-element array chip keeps its size small, satisfying the embedding requirement.

20.
The present development of high-data-rate wireless applications has led to extra bandwidth demands. However, finding new spectrum bandwidth to accommodate these applications and services is challenging because of the scarcity of spectrum resources. In fact, the spectrum is utilized inefficiently under conventional spectrum allocation, so the Federal Communications Commission has proposed a dynamic spectrum access mechanism in cognitive radio, in which unlicensed users can opportunistically borrow unused licensed spectrum; obtaining a contiguous frequency spectrum block, however, remains a challenge. This also has a significant impact on multicarrier transmission systems such as orthogonal frequency division multiplexing (OFDM) and multicarrier code division multiple access (MC-CDMA). As a solution, this paper develops non-contiguous OFDM (NC-OFDM) and non-contiguous MC-CDMA (NC-MC-CDMA) cognitive systems. The implementation of NC-OFDM and NC-MC-CDMA systems provides high data rates via a large number of non-contiguous subcarriers without interfering with existing transmissions. The performance of NC-OFDM and NC-MC-CDMA is evaluated for a mobile scenario in which each propagation path experiences a Doppler frequency shift because of the relative motion between transmitter and receiver. The simulation results of this paper show that the NC-OFDM system is a better candidate than the NC-MC-CDMA system when mobility of cognitive users is considered. Copyright © 2013 John Wiley & Sons, Ltd.
