Similar Documents
 Found 19 similar documents; search took 156 ms
1.
Many-core software is mapped onto the cores of a many-core processor for concurrent execution, and its architecture differs from that of single-core processor software. Existing reliability models built for single-core software therefore do not apply. To address this, based on an analysis of the basic many-core software mapping flow and the many-core software architecture, this paper establishes a reliability model for many-core software. The model reveals the quantitative impact of imbalance in task-module execution times on system reliability. It lays a foundation for a set of reliability design and evaluation methods for many-core software and is of value for designing highly reliable many-core systems.
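The effect of execution-time imbalance described in entry 1 can be illustrated with a toy series-reliability model (a sketch under an assumed exponential per-core failure rate; the paper's actual model is not reproduced here): every core is exposed to failure until the slowest task module, i.e. the makespan, finishes.

```python
import math

def system_reliability(exec_times, failure_rate=1e-4):
    """Series reliability of a many-core system: all cores must stay
    fault-free until the slowest task module (the makespan) finishes.
    A simple exponential failure model per core is assumed."""
    makespan = max(exec_times)
    # Each of the n cores is exposed for the full makespan.
    return math.exp(-failure_rate * makespan * len(exec_times))

# Balanced vs. imbalanced mappings of the same total workload (400 units).
balanced = system_reliability([100, 100, 100, 100])
imbalanced = system_reliability([10, 10, 10, 370])
```

Under this model, even with identical total work, the imbalanced mapping stretches the makespan and thus lowers system reliability.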

2.
Task binding and scheduling are key problems in many-core software synthesis. Owing to the diversity and particularities of many-core platforms, binding and scheduling algorithms must take full account of the characteristics of both the task set and the physical platform. Targeting homogeneous 2D-torus many-core processor platforms, this paper proposes a task binding and scheduling scheme based on the BAMSE approximation algorithm, which binds a set of dependent tasks with communication overheads to physical cores; experiments then explore the performance of the improved BAMSE algorithm for task binding and scheduling on 2D-torus many-core platforms.
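The communication-aware binding problem of entry 2 can be sketched with a toy greedy heuristic on a 2D torus (this is not the BAMSE algorithm itself, only an illustration of the cost structure it optimizes):

```python
def torus_hops(a, b, n):
    """Minimal hop count between cores a and b on an n x n 2D torus:
    wrap-around links allow min(d, n - d) hops per dimension."""
    return sum(min(abs(p - q), n - abs(p - q)) for p, q in zip(a, b))

def comm_cost(placement, traffic, n):
    """Total cost: traffic volume times torus hop distance per task pair."""
    return sum(v * torus_hops(placement[u], placement[w], n)
               for (u, w), v in traffic.items())

def greedy_bind(traffic, n_tasks, n):
    """Bind tasks 0..n_tasks-1 to cores of an n x n torus, placing each
    task on the free core closest (traffic-weighted) to its already
    placed communication partners. A toy greedy heuristic."""
    cores = [(x, y) for x in range(n) for y in range(n)]
    placement = {0: cores[0]}
    free = set(cores[1:])

    def partial_cost(core, t):
        s = 0
        for (u, w), v in traffic.items():
            other = w if u == t else (u if w == t else None)
            if other is not None and other in placement:
                s += v * torus_hops(core, placement[other], n)
        return s

    for t in range(1, n_tasks):
        best = min(free, key=lambda c: partial_cost(c, t))
        placement[t] = best
        free.remove(best)
    return placement

chain = {(0, 1): 5, (1, 2): 5, (2, 3): 5}  # a 4-stage pipeline of tasks
binding = greedy_bind(chain, 4, 3)
```

For this chain, the greedy heuristic places each consecutive pair on adjacent cores, so every edge costs one hop.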

3.
To reduce the computational cost and storage requirements of the kernel affine projection p-norm (KAPP) algorithm, and to improve its convergence speed and steady-state performance when input signals are strongly correlated, this paper proposes a kernel normalized decorrelated APP algorithm based on explicit Gaussian kernel mapping (KNDAPP-GKEM). The algorithm removes the correlation of the input signal in advance via normalized decorrelation, and approximates the kernel function explicitly via Gaussian kernel explicit mapping, eliminating the dependence on historical data and solving KAPP's problem of excessive computation and storage caused by its ever-growing structure. Simulations of nonlinear system identification under α-stable distributed noise show that, for strongly correlated inputs, KNDAPP-GKEM converges quickly, achieves a small steady-state mean square error in nonlinear system identification, and its training time grows only slowly and linearly, which favors practical applications of nonlinear system identification.
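One standard way to obtain an explicit (finite-dimensional) approximation of the Gaussian kernel, as entry 3 requires, is random Fourier features; the paper's GKEM construction may differ, so this is only an illustrative sketch of the idea:

```python
import math
import random

random.seed(0)

def gaussian_kernel(x, y, gamma=0.5):
    """Exact Gaussian kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def make_rff(dim, n_features, gamma=0.5):
    """Random Fourier feature map z(x) such that E[z(x).z(y)] equals the
    Gaussian kernel; frequencies drawn from N(0, 2*gamma) per coordinate."""
    sigma = math.sqrt(2 * gamma)
    W = [[random.gauss(0, sigma) for _ in range(dim)] for _ in range(n_features)]
    b = [random.uniform(0, 2 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def z(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + bi)
                for w, bi in zip(W, b)]
    return z

z = make_rff(dim=2, n_features=4000)
x, y = [0.3, -0.7], [0.1, 0.4]
approx = sum(a * b for a, b in zip(z(x), z(y)))
exact = gaussian_kernel(x, y)
```

Because the map is explicit, inner products in the feature space can be computed without storing historical samples, which is the point of the explicit-mapping step in the abstract.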

4.
Sparse multinomial logistic regression (SMLR), a generalized linear model, is widely used in multi-class classification tasks. SMLR introduces a Laplacian prior into multinomial logistic regression (MLR), making its solution sparse and thereby allowing the classifier to embed feature selection in the classification process. To handle nonlinear data, this paper applies the kernel trick to SMLR, obtaining kernel sparse multinomial logistic regression (KSMLR). KSMLR maps nonlinear feature data through a kernel function into a high- or even infinite-dimensional feature space, where the features can be fully expressed and effectively classified. In addition, a centered-alignment multiple kernel learning algorithm is used: data are mapped by different kernel functions into spaces of different dimensions, and centered-alignment similarity is used to select the multiple kernel weight coefficients flexibly, giving the classifier better generalization. Experimental results show that the proposed sparse multinomial logistic regression based on centered-alignment multiple kernel learning outperforms current conventional classification algorithms in classification accuracy.
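The centered-alignment similarity used in entry 4 to weight kernels can be computed as follows (a minimal sketch with hand-made toy kernel matrices, not the paper's experimental setup):

```python
def center(K):
    """Center a symmetric kernel matrix: Kc = H K H, H = I - (1/m) 1 1^T."""
    m = len(K)
    row = [sum(r) / m for r in K]
    tot = sum(row) / m
    return [[K[i][j] - row[i] - row[j] + tot for j in range(m)]
            for i in range(m)]

def frob(A, B):
    """Frobenius inner product <A, B>_F."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def alignment(K1, K2):
    """Centered kernel alignment: <Kc1, Kc2>_F / (||Kc1||_F ||Kc2||_F)."""
    A, B = center(K1), center(K2)
    return frob(A, B) / (frob(A, A) * frob(B, B)) ** 0.5

# Ideal target kernel for labels [1, 1, -1, -1] and two candidate kernels:
# Kg matches the label structure, Kb is nearly uninformative.
Ky = [[1, 1, -1, -1], [1, 1, -1, -1], [-1, -1, 1, 1], [-1, -1, 1, 1]]
Kg = [[1.0, 0.9, 0.1, 0.1], [0.9, 1.0, 0.1, 0.1],
      [0.1, 0.1, 1.0, 0.9], [0.1, 0.1, 0.9, 1.0]]
Kb = [[1.0, 0.5, 0.5, 0.5], [0.5, 1.0, 0.5, 0.5],
      [0.5, 0.5, 1.0, 0.5], [0.5, 0.5, 0.5, 1.0]]
```

A multiple-kernel learner would assign larger weight to Kg, whose alignment with the label kernel Ky is close to 1.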

5.
As data-intensive application tasks are increasingly deployed on heterogeneous platforms composed of central processing units (CPUs) and graphics processing units (GPUs), efficiently exploiting the parallel resources of GPU hardware has become a pressing problem. By studying task-mapping strategies for a single GPU, this paper proposes a multi-stream directed acyclic graph (MS-DAG) task-mapping strategy. It analyzes node dependencies in the DAG, partitions reasonable parallel branches according to those dependencies, and uses multi-stream pipelined parallelism to realize a task-mapping strategy suited to GPU hardware characteristics. A performance comparison with HEFT under different conditions shows that when processor performance in HEFT is non-uniform, the MS-DAG strategy improves task-mapping efficiency by about 10% over HEFT; when processor performance is uniform, the improvement is 30%.
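The branch-partitioning step of entry 5 can be sketched for a simple fork-join DAG: independent branches found from the node dependencies are exactly the pieces that can be issued on separate streams. This toy assumes branches share no interior nodes; the real MS-DAG strategy handles general dependency structures.

```python
from collections import defaultdict

def parallel_branches(edges, root, sink):
    """Split the interior of a fork-join DAG into independent branches,
    one per child of the root; each branch could run on its own stream.
    Toy sketch: assumes the branches do not share interior nodes."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    branches = []
    for child in succ[root]:
        branch, stack = [], [child]
        while stack:
            n = stack.pop()
            if n == sink or n in branch:
                continue
            branch.append(n)
            stack.extend(succ[n])
        branches.append(branch)
    return branches

edges = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 5)]  # fork at 0, join at 5
branches = parallel_branches(edges, root=0, sink=5)
```

Here the two chains 1→3 and 2→4 are independent, so a runtime could enqueue them on two streams and overlap their execution.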

6.
The two-dimensional FFT is a staple algorithm in image processing, widely used in image filtering, fast convolution, target tracking, and other fields. To meet the real-time processing needs of high-resolution images, a multi-core parallel implementation of the 2D FFT is proposed on the independently developed FT-X many-core DSP. Based on the many-core programming model, task initialization is completed through multi-core task deployment and address-space remapping, achieving 24-core data-parallel processing with a speedup of 19.8x. On this basis, an implicit transposition scheme based on DMA strided transfers is proposed; by arranging matrix addresses, it overcomes the stride-length limit on strided transfers of large matrices. Experimental results show that, at a data size of 8K x 8K, the scheme saves 91% and 65% of the transposition time relative to direct transposition and instruction-based implicit transposition, respectively. A multi-core load imbalance arising in a special case was also identified and resolved, reducing the per-core time gap from 64% to 12% and the overall time by 26%.
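The row-column decomposition underlying the multi-core 2D FFT of entry 6 can be checked on a tiny matrix (naive DFTs stand in for the optimized per-core FFT kernels; the transpose step is where the paper's strided-DMA traffic occurs):

```python
import cmath

def dft(row):
    """Naive 1-D DFT, standing in for an optimized per-core FFT kernel."""
    n = len(row)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(row)) for k in range(n)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def fft2d(mat):
    """Row-column decomposition: transform rows, transpose, transform the
    former columns, transpose back. Rows can be split across cores."""
    step1 = [dft(r) for r in mat]               # per-row transforms
    step2 = [dft(r) for r in transpose(step1)]  # per-column transforms
    return transpose(step2)

def dft2d_direct(mat):
    """Direct 2-D DFT from the definition, as a correctness reference."""
    n, m = len(mat), len(mat[0])
    return [[sum(mat[r][c] * cmath.exp(-2j * cmath.pi * (u * r / n + v * c / m))
                 for r in range(n) for c in range(m))
             for v in range(m)] for u in range(n)]

M = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
F_rc, F_direct = fft2d(M), dft2d_direct(M)
```

The two results agree to floating-point precision, confirming that the per-dimension transforms plus transposes compute the full 2D transform.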

7.
A fast mapping algorithm for networks-on-chip based on topology partitioning   (Cited by 1: 0 self-citations, 1 by others)
This paper builds a mapping model for networks-on-chip with energy consumption and traffic balance as optimization objectives, and proposes a fast mapping algorithm based on topology partitioning (TPBMAP). The algorithm not only accounts for chip layout characteristics to produce regular topologies, but also uses virtual IP cores to amend the communication core graph so as to handle mappings where the numbers of IP cores and network nodes differ. By introducing an optimization model targeting traffic balance and mapping heavily communicating IP cores to the edge regions of the topology, it effectively reduces traffic at the network center; iterative topology partitioning, together with mapping heavily communicating IP cores to adjacent network positions, quickly yields low-energy mappings. Simulation results show that, compared with existing algorithms, the proposed algorithm has clear advantages in mapping speed, network-wide energy consumption, and network-center traffic.

8.
Real-time implementation of space-time adaptive processing for airborne radar   (Cited by 2: 1 self-citation, 1 by others)
Space-time adaptive processing (STAP) algorithms for multi-channel airborne radar are computationally enormous and hard to implement in real time. This paper analyzes in detail the three computational steps of partially adaptive STAP and identifies their inherent parallelism. A parallel partially adaptive STAP algorithm is proposed for multi-DSP systems, including its task partitioning, execution model, task-mapping strategy, and performance evaluation function. The algorithm remaps data between different computation stages. Experiments on real data show that this multi-DSP STAP parallel algorithm achieves good real-time performance.

9.
Parallel implementation of a space-time adaptive processing algorithm   (Cited by 3: 0 self-citations, 3 by others)
范西昆  王永良  陈辉  李强  母其勇 《电子学报》2005,33(12):2222-2225
For multi-DSP (digital signal processor) parallel processing systems, the real-time implementation of space-time adaptive processing (STAP) algorithms is studied. Based on an analysis of the inherent parallelism of partially adaptive STAP algorithms, a task-level parallel STAP algorithm is proposed, and its mapping onto a multi-DSP parallel processing system is given. Experiments on real data show that this parallel STAP algorithm achieves good real-time performance on multi-DSP parallel systems.

10.
Research on communication-energy-oriented 3D NoC mapping   (Cited by 1: 0 self-citations, 1 by others)
李东生  刘琪 《半导体技术》2012,37(7):504-507
Compared with traditional planar structures, the three-dimensional network-on-chip (3D NoC) offers better integration density and performance, allowing more processor cores to be integrated on a single chip. As a structural extension of the 2D NoC, the 3D NoC is superior in performance improvement and low-power design, and has become the mainstream architecture for multi-core systems-on-chip. Mapping means applying an algorithm to find an optimal scheme that assigns the subtasks of a communication task graph to NoC resource nodes while minimizing NoC communication energy. Following the research methodology of the 2D NoC, a communication energy model for the 3D mesh NoC is proposed, and an ant colony algorithm is used to realize communication-energy-oriented NoC mapping. Experimental results show that, on 3D mesh NoC platforms of various network sizes, ant colony mapping reduces communication energy by 23% to 42% compared with random mapping.
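The communication-energy objective that entry 10's ant colony search minimizes can be sketched with a hop-based bit-energy model on a 3D mesh (coefficients and the hops+1 router count are illustrative placeholders, not the paper's calibrated model):

```python
def hops_3d(a, b):
    """Manhattan hop count between nodes a = (x, y, z) and b in a 3D mesh."""
    return sum(abs(p - q) for p, q in zip(a, b))

def comm_energy(mapping, traffic, e_bit=1.0):
    """Communication energy ~ sum over task pairs of
    volume * (hops + 1) * e_bit; hops + 1 counts router traversals."""
    return sum(v * (hops_3d(mapping[u], mapping[w]) + 1) * e_bit
               for (u, w), v in traffic.items())

traffic = {(0, 1): 10, (1, 2): 10}
good = {0: (0, 0, 0), 1: (0, 0, 1), 2: (0, 1, 1)}   # neighbours on the mesh
bad = {0: (0, 0, 0), 1: (1, 1, 1), 2: (0, 0, 1)}    # scattered placement
```

Any mapping heuristic (ant colony or otherwise) simply searches the placement space for low values of this objective; here the adjacent placement costs less than the scattered one.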

11.
In the era of many-core chips, the problem of power density is a serious challenge. This is particularly important in Network-on-Chip (NoC)-based systems, where application mapping determines the resulting power patterns and the workload distribution across the entire chip. Despite this fact, the majority of mapping algorithms focus on performance, and the resulting power patterns are largely ignored. This work investigates this problem. Three power pattern metrics with different scopes are defined, namely power peak, power range, and regional power density. The results of using them as mapping objectives, together with communication cost, in a multi-objective evolutionary mapping approach are investigated. Results show that employing power patterns yields Pareto fronts with different power patterns and features. Results are analysed and discussed. Moreover, a case study of thermal analysis of the resulting power patterns is performed. It shows that using communication cost alone results in large hotspots, which translate into a higher peak and range of chip temperatures. The proposed mapping objectives are shown to significantly improve thermal balancing (by up to 55%) and peak temperature (by up to 7.77%). These results indicate the importance of considering power patterns in the design of NoC-based many-core systems and their direct impact on the reliability and performance of such systems.

12.
With the continued scaling of CMOS devices, the exponential increase in power density has strikingly elevated the temperature of on-chip systems. Thermal-aware design has therefore become a pressing research issue in computing systems, especially for real-time embedded systems with limited cooling. In this paper, the authors formulate the thermal-aware real-time multiprocessor system-on-chip (MPSoC) task allocation and scheduling problem, present a task-to-processor assignment heuristic that improves the thermal profiles of tasks, and propose a task splitting policy that reduces the on-chip peak temperature. The thermal profiles of tasks are improved via task mapping by minimizing task steady-state temperatures, and the task splitting technique reduces the peak temperature by alternating hot task execution with slack time. The proposed algorithms explicitly exploit the thermal characteristics of both tasks and processors to minimize the peak temperature without incurring significant overheads. Extensive simulations of benchmarking tasks were performed to validate the effectiveness of the proposed algorithms. Experimental results show that the task steady-state temperature achieved by the proposed algorithm is on average 3.57 °C lower than that of the benchmarking schemes, and its peak temperature can be up to 11.5% lower.
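The intuition behind entry 12's task splitting, that alternating hot execution with slack lowers the peak temperature, can be reproduced with a first-order RC thermal model (all parameter values here are illustrative placeholders, not the paper's calibrated thermal model):

```python
import math

def simulate_peak(schedule, t_ss=80.0, t_amb=25.0, tau=10.0):
    """First-order RC thermal model: while a hot task runs, the core
    temperature relaxes toward its steady-state value t_ss; during
    slack it relaxes back toward ambient. Returns the peak reached.
    schedule is a list of (duration, running) segments."""
    temp, peak = t_amb, t_amb
    for duration, running in schedule:
        target = t_ss if running else t_amb
        # Exponential relaxation toward the current target temperature;
        # within a segment the extreme occurs at the segment end.
        temp = target + (temp - target) * math.exp(-duration / tau)
        peak = max(peak, temp)
    return peak

# 20 time units of hot execution: run straight vs. split into 4 chunks
# interleaved with equal slack.
unsplit = simulate_peak([(20, True)])
split = simulate_peak([(5, True), (5, False)] * 4)
```

Splitting lets the core cool between chunks, so the temperature never climbs as close to the steady-state value as in the unsplit run.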

13.
This paper describes several challenges facing programmers of future edge computing systems, the diverse many-core devices that will soon exemplify commodity mainstream systems. To call attention to the programming challenges ahead, this paper focuses on the most complex of such architectures: integrated, power-conserving systems, inherently parallel and heterogeneous, with distributed address spaces. When programming such complex systems, new concerns arise: computation partitioning across functional units, data movement and synchronization, managing a diversity of programming models for different devices, and reusing existing legacy and library software. We observe that many of these challenges are also faced in programming applications for large-scale heterogeneous distributed computing environments, and that current solutions as well as future research directions in distributed computing can be adapted to commodity computing environments. Optimization decisions are inherently complex due to the large search spaces of possible solutions and the difficulty of predicting performance on increasingly complex architectures. Cognitive techniques are well suited to managing systems of such complexity, as shown by recent trends of using them for code mapping and optimization support. Combining these, we describe a fundamentally new programming paradigm for complex heterogeneous systems, in which programmers design self-configuring applications and the system automates optimization decisions and manages the allocation of heterogeneous resources.

14.
Certain computing tasks in complex application domains require not only a computing platform with high computational capability but also computing modes matched to the characteristics of the tasks. Based on the relationship between the Hyper-Q feature of the NVIDIA Kepler GK110 architecture and CUDA streams, three computing modes are proposed: single-task parallel, multi-task parallel, and multi-task streaming. Vacancy marking is used to construct and switch among the computing modes; combined with a data buffering mechanism and a task-loading scheme, a many-core multi-computing-mode processing system is designed, realizing multi-mode computation on a many-core processor.

15.
Today's many-core processors are manufactured in inherently unreliable technologies. Massively defective technologies in the production of many-core processors are a direct consequence of feature-size shrinkage in today's CMOS (complementary metal-oxide-semiconductor) technology. Due to these reliability problems, fault tolerance of many-core processors has become one of the major challenges. To reduce the probability of failure, various fault-tolerance techniques can be applied. The most preferable and promising techniques are those that can be easily implemented and have minimal cost while providing a high level of processor fault tolerance. One promising technique for detecting faulty cores, and consequently for performing the first step in providing many-core processor fault tolerance, is mutual testing among processor cores. Mutual testing can be performed either in a random manner or according to a deterministic scheduling policy. In this paper we deal with random execution of mutual tests. The effectiveness of such testing can be evaluated through modeling. We show how stochastic Petri nets can be used for this purpose and obtain results useful for the development and implementation of testing procedures in many-core processors.
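The random mutual testing of entry 15 can be explored with a Monte-Carlo sketch (this idealizes tests by fault-free cores as perfect and stands in for, rather than reproduces, the paper's stochastic Petri net model):

```python
import random

def mutual_test_rounds(n_cores, faulty, rounds, seed=1):
    """Random mutual testing: each round, every fault-free core tests one
    randomly chosen other core; a faulty core counts as detected once any
    fault-free core has tested it. Tests by good cores are assumed perfect."""
    rng = random.Random(seed)
    detected = set()
    good = [c for c in range(n_cores) if c not in faulty]
    for _ in range(rounds):
        for tester in good:
            target = rng.choice([c for c in range(n_cores) if c != tester])
            if target in faulty:
                detected.add(target)
    return detected

few = mutual_test_rounds(16, {3, 7}, rounds=1)
many = mutual_test_rounds(16, {3, 7}, rounds=20)
```

Detection coverage grows with the number of test rounds; after enough random rounds, all faulty cores are found with high probability.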

16.
Considerable research effort is being devoted to the development of image-enhancement algorithms, which improve the quality of displayed digital pictures. Reliable methods for measuring perceived image quality are needed to evaluate the performance of those algorithms, and such measurements require a univariant (i.e., no-reference) approach. The system presented in this paper applies concepts derived from computational intelligence and supports an objective quality-assessment method based on a circular back-propagation (CBP) neural model. The network is trained to predict quality ratings, as scored by human assessors, from numerical features that characterize images. As such, the method aims at reproducing perceived image quality rather than defining a comprehensive model of the human visual system. The connectionist approach makes it possible to decouple the task of feature selection from the consequent mapping of features into an objective quality score. Experimental results on the perceptual effects of a family of contrast-enhancement algorithms confirm the method's effectiveness, as the system renders quite accurately the image quality perceived by human assessors.

17.
This paper deals with the problem of one-to-one mapping of 2^n task modules of a parallel program onto an n-dimensional hypercube multicomputer so as to minimize the total communication cost during the execution of the task. The problem of finding an optimal mapping has been proven to be NP-complete. First we show that the mapping problem in a hypercube multicomputer can be transformed into the problem of finding a set of maximum cutsets on a given task graph using a graph modification technique. Then we propose a repeated mapping scheme, using an existing graph bipartitioning algorithm, for the effective mapping of task modules onto the processors of a hypercube multicomputer. The repeated mapping scheme is shown to be highly effective on a number of test task graphs; it increasingly outperforms the greedy and recursive mapping algorithms as the number of processors increases. Our repeated mapping scheme is shown to be very effective for regular graphs, such as hypercube-isomorphic or 'almost' isomorphic graphs and meshes; it finds optimal mappings on almost all the regular task graphs considered.
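The bipartition-based hypercube mapping idea of entry 17 can be sketched by assigning one address bit per recursive cut, so tasks separated late in the recursion share long address prefixes and land on nearby hypercube nodes. The exhaustive minimum-cut bipartition below is only workable at toy sizes; the paper uses a heuristic graph-bipartitioning algorithm instead.

```python
from itertools import combinations

def best_bipartition(nodes, edges):
    """Exhaustive minimum-cut balanced bipartition (toy sizes only)."""
    nodes = list(nodes)
    half = len(nodes) // 2

    def cut(part):
        p = set(part)
        return sum(1 for u, v in edges if (u in p) != (v in p))

    return min(combinations(nodes, half), key=cut)

def hypercube_map(nodes, edges, bits):
    """Map 2^bits tasks to hypercube addresses: each recursive cut fixes
    one address bit, keeping tightly coupled tasks on nearby nodes."""
    if bits == 0:
        return {nodes[0]: ''}
    left = best_bipartition(nodes, edges)
    right = [n for n in nodes if n not in left]
    sub = lambda part: [(u, v) for u, v in edges if u in part and v in part]
    mapping = {}
    for prefix, part in (('0', list(left)), ('1', right)):
        for task, addr in hypercube_map(part, sub(part), bits - 1).items():
            mapping[task] = prefix + addr
    return mapping

# A 4-task ring mapped onto a 2-dimensional hypercube (4 processors).
m = hypercube_map([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)], 2)
```

Each processor address is used exactly once, and task pairs kept together until the last cut end up on adjacent hypercube nodes (Hamming distance 1).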

18.
Many discriminative classification algorithms are designed for tasks where samples can be represented by fixed-length vectors. However, many examples in the fields of text processing, computational biology and speech recognition are best represented as variable-length sequences of vectors. Although several dynamic kernels have been proposed for mapping sequences of discrete observations into fixed-dimensional feature-spaces, few kernels exist for sequences of continuous observations. This paper introduces continuous rational kernels, an extension of standard rational kernels, as a general framework for classifying sequences of continuous observations. In addition to allowing new task-dependent kernels to be defined, continuous rational kernels allow existing continuous dynamic kernels, such as Fisher and generative kernels, to be calculated using standard weighted finite-state transducer algorithms. Preliminary results on both a large vocabulary continuous speech recognition (LVCSR) task and the TIMIT database are presented.

19.
A real-time image processing system based on dedicated and general-purpose DSPs   (Cited by 3: 0 self-citations, 3 by others)
国澄明  吴涛 《通信学报》1994,15(6):29-36
In the course of research on robot vision, two DSP-based real-time image processing systems were developed. The system built around an array of four dedicated IMS A110 DSPs reaches a processing speed of 840 MOPS and can perform, in real time, image preprocessing and processing operations such as convolution, correlation, and filtering with templates of various sizes, edge detection, enhancement and thresholding, and linear and nonlinear transforms. Both systems use a PC as the host and are equipped with independent A/D and D/A subsystems. The designs adopt an innovative cyclic pipelined structure together with DMA, dual-port RAM, multi-port shared memory regions, window mapping, and other techniques.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号