Design of the Kernel Language for the Parallel Inference Machine (total citations: 1; self-citations: 0; citations by others: 1)
In recent years we have witnessed the tremendous momentum of a second spring of parallel computing. But we should remember the low point of the field more than 20 years ago, and review the lesson behind the question raised at that time of whether “parallel computing will soon be relegated to the trash heap reserved for promising technologies that never quite make it”, posed in an article entitled “The Death of Parallel Computing” by the late Ken Kennedy, a prominent leader in parallel computing. Facing the new era of parallel computing, we should learn from the robust history of sequential computation over the past 60 years, and study the foundation established by the Turing machine model (1936) and its profound impact on that history. To this end, this paper examines the disappointing state of work on parallel Turing machine models over the past 50 years of parallel computing research. The lack of a solid yet intuitive parallel Turing machine model will remain a serious challenge for parallel computing in the future. Our paper addresses this challenge by proposing a parallel Turing machine model. We also discuss why we start from a parallel Turing machine model rather than from other choices.

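A discourse is a natural-language text formed from arguments through semantic association and structured organization. One of the core tasks of discourse analysis is to interpret the semantic relations between arguments. Explicit relations are easy to detect because they carry direct cues, and detection accuracy is currently as high as 90%; implicit relations, by contrast, are hard to detect because they lack such cues, and accuracy is currently only about 40%. To address this problem, based on the hypothesis that "parallel arguments imply parallel relations" and exploiting the ease of detecting explicit discourse relations, this work implements an implicit discourse relation detection method that infers implicit relations from parallel explicit relations, through the recognition of parallel arguments and the disambiguation of parallel relations. Evaluation on the standard Penn Discourse Treebank (PDTB) shows a precision improvement of 17.26%.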

Anaphora is a discourse-level linguistic phenomenon. There is consensus that anaphora resolution should rely on prior sentences within the context of the discourse. We propose to cast anaphora resolution as a semantic inference process in which a combination of multiple strategies, each exploiting different aspects of linguistic knowledge, is employed to provide a coherent resolution of anaphora. A framework which encompasses several salient linguistic parameters such as grammatical role, proximity, repetition, sentence recency and semantic cues is demonstrated. This work also shows how an anaphora-resolution algorithm can be embedded within a framework which captures all the above salient parameters, as well as remedies some of the inadequacies found in any monolithic resolution system. A language-neutral semantic representation characterized by semantic cues is presented in order to capture the distilled information after resolution. The effectiveness of the language-neutral representation, both for machine translation and anaphora resolution, is demonstrated through a set of simulations and evaluations.

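The theoretical limits of classical computers were established by Turing in 1936, and computers with the von Neumann architecture are likewise bounded by the Turing machine model. Because neuromorphic devices have been lacking, neural network models have always run on classical computers. However, the von Neumann architecture does not match the asynchronous parallel structure and communication mechanisms of neural networks, one symptom of which is enormous power consumption. Developing architectures oriented to neural networks is therefore an important direction for artificial intelligence and for information processing in general. A brain-like machine is an intelligent machine modeled on biological neural networks, built from neuromorphic devices, and characterized by spatiotemporal information processing. The idea of the brain-like machine was proposed before the computer was invented, and research and development has been under way for more than 30 years. Several brain-like systems are already online; among them, SpiNNaker focuses on the architecture of brain-like systems and has put forward an effective brain-like scheme. Over the next 20 years or so, the fine-grained mapping of model-animal brains and the human brain is expected to be completed step by step, neuromorphic devices that emulate the information-processing functions of biological neurons and synapses, together with their integration processes, are expected to mature, and brain-like machines whose structure approximates the brain and whose performance far exceeds it may be realized. Like biological brains, brain-like machines are spiking neural networks, and neuromorphic devices possess true randomness, so brain-like machines exhibit rich nonlinear dynamical behavior. It has been proven that any Turing machine can be constructed from a spiking neural network; whether brain-like machines can in theory go beyond Turing machines is a major open question.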

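This paper presents a volume rendering algorithm based on a parallel virtual machine architecture. The algorithm partitions and organizes the volume data by slice, which both reduces communication cost and preserves data locality for each subtask. During task assignment, a database of performance indices is maintained and used to determine each subtask adaptively, achieving load balance. Using an asynchronous binary merging method, all local images can be merged in O(log n) time. For the parallel implementation of visualization algorithms in a virtual machine environment, the authors designed and implemented their own development platform based on TCP/IP and the Socket standard. The proposed algorithm is implemented on this platform, and the system adopts a client/server structure. With respect to task size, virtual machine size ...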
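The O(log n) merging claim in the abstract above can be illustrated with a pairwise (binary tree) reduction. The following is only a minimal sketch, not the authors' implementation: the pixel format (premultiplied color, alpha), the "over" compositing operator, and the names `over` and `binary_merge` are assumptions made for illustration, and the asynchronous scheduling is not modeled. Each round's pairs are independent, so with one worker per pair the n local images are merged in about log2(n) rounds.

```python
def over(front, back):
    """Composite two partial images with the 'over' operator.

    Each image is a list of (premultiplied_color, alpha) pixels;
    this pixel format is an assumption for the sketch.
    """
    return [
        (cf + (1.0 - af) * cb, af + (1.0 - af) * ab)
        for (cf, af), (cb, ab) in zip(front, back)
    ]


def binary_merge(images):
    """Merge depth-ordered partial images pairwise.

    Returns (final_image, rounds). Pairs within a round do not depend
    on each other, so they could be merged concurrently.
    """
    if not images:
        raise ValueError("need at least one partial image")
    rounds = 0
    while len(images) > 1:
        # Merge neighbours (0,1), (2,3), ... so depth order is preserved.
        merged = [over(images[i], images[i + 1])
                  for i in range(0, len(images) - 1, 2)]
        if len(images) % 2 == 1:
            merged.append(images[-1])  # odd image waits for the next round
        images = merged
        rounds += 1
    return images[0], rounds


# Eight one-pixel slabs, each half transparent.
slabs = [[(0.5, 0.5)] for _ in range(8)]
final_image, rounds = binary_merge(slabs)
```

Because "over" is associative, the tree-shaped merge gives the same image as compositing the slabs one after another, as long as neighbours are merged in depth order.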

Moore's law will continue to grant computer architects ever more transistors for the foreseeable future, and parallelism is the key to continued performance scaling in modern microprocessors. This paper systematically presents the achievements of our research project on parallel architecture, which is supported by the National Basic Research 973 Program of China. The innovative approaches and techniques for solving significant problems in parallel architecture design are summarized, including architecture-level optimization, compiler and language-supported technologies, reliability, power-performance efficient design, test and verification challenges, and platform building. Two prototype chips, the multi-heavy-core Godson-3 and the many-light-core Godson-T, are described to demonstrate highly scalable and reconfigurable parallel architecture designs. We also present some of our achievements that appeared in ISCA, MICRO, ISSCC, HPCA, PLDI, PACT, IJCAI, Hot Chips, DATE, IEEE Trans. VLSI, IEEE Micro, IEEE Trans. Computers, etc.

The method described in this paper obtains the two end points of a straight line using a Modified Double Hough Transform (MDHT). It consists of line detection followed by segment extraction. The significance of this work is that the hardware implementation is based on the Content Addressable Memory (CAM) concept. Hence, during the first HT, voting is performed for every scan line of the image, not for every edge pixel. As a result, all the steps that form the first HT (voting, thresholding and local maximum detection) are completed in a low constant time. The two end points of the line are extracted through the second HT. Here, a local neighbor parallel search is likewise performed at the end of each scan line of the image, not at every edge pixel. The execution time is therefore low, since the neighboring range does not exceed a few lines. Experimental results are given to show the accuracy of our approach for use in high-performance pattern recognition systems.

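As computer hardware performance improves, applications that run inference with pre-trained machine learning models are beginning to appear on personal devices as well. Caffe is a popular deep learning framework that excels at tasks such as image classification, but by default it runs on a single core only and cannot fully exploit the computing power of heterogeneous parallel computing devices. Deep learning places high demands on computing performance; if it can be parallelized to make full use of all computing devices, both computation speed and user experience can be improved. Because CP...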

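1. Introduction. In the field of artificial intelligence, production systems are a fairly effective knowledge representation method and have been widely applied. Many of the more successful expert systems today are implemented as production systems [1]. As application domains keep expanding, the scale of their knowledge bases is gradually growing ...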

Minds and Machines - Two long-standing arguments in cognitive science invoke the assumption that holistic inference is computationally infeasible. The first is Fodor’s skeptical argument...

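This paper analyzes the currently popular parallel video server architectures: the distributed architecture, the cluster architecture, the parallel general-purpose computer architecture, and the parallel dedicated video server architecture. Combining their advantages and targeting the characteristics of video applications, it proposes a scalable parallel video server architecture and develops a parallel service system based on it.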

Understanding data depends on the ability to classify information effectively and quickly, within a time frame that preserves the value and potential of the information. Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Self-organizing maps (SOM) are a powerful tool for this purpose. The goal of learning in a self-organizing map is to cause different parts of the network to respond similarly to certain input patterns. Because of its time complexity, using this method is often a critical challenge. In this paper we propose a parallel implementation of the SOM algorithm on parallel processor architectures, namely modern graphics processing units programmed with CUDA. Experimental results show improvements in execution time, with a promising speedup compared with the CPU version and the widely used package SOM_PAK.
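For readers unfamiliar with the SOM algorithm mentioned above, the following is a minimal sequential sketch of the standard training loop, not the paper's CUDA implementation. The 1-D map layout, the linear decay schedules, and the names `best_matching_unit` and `train_som` are assumptions chosen for brevity. The two inner steps, the best-matching-unit search and the per-unit weight update, are independent across units, which is the kind of work a GPU version would typically distribute over threads.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))


def train_som(data, n_units, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a 1-D self-organizing map and return its weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 if sigma0 is not None else n_units / 2.0
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 1e-3)   # shrinking neighbourhood
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood: units near the BMU move the most.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights


data = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [0.9, 1.0]]
som = train_som(data, n_units=4)
```

Each update moves a weight vector part of the way toward the current input, so nearby units on the map end up responding to similar inputs.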

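In recent years, deep neural networks have been widely applied in many fields with great success. As the size and computational cost of neural network models keep growing, many new hardware processors, including GPUs and dedicated accelerators, have been used for deep learning in order to complete the computation efficiently and quickly. Even so, general-purpose processors remain the most common and most readily available computing platform, so exploring how to run neural network algorithms efficiently on them is equally important. In the training phase, multi-core processors can use data parallelism to raise data throughput and speed up training. In the inference phase, however, end-to-end latency is often more important than throughput, because it determines whether the processor is usable in a given scenario. Traditional data-parallel schemes cannot meet the small-batch, low-latency requirements that inference places on the processor. For a multi-core processor, the computation must therefore be split inside the operators in order to make full use of the multi-core hardware resources. Given the computational characteristics of the processor, a fine-grained method is needed to split the operators in the computation graph sensibly, so that the computing potential of the multi-core processor is truly realized. This paper proposes a parallel framework based on operator splitting that extends a processor from a single-core to a multi-core structure with small overhead and produces an efficient splitting scheme for a given network and the characteristics of the underlying processor. Experimental results show that the method effectively reduces the end-to-end latency of various networks on multi-core processors.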
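To make the idea of splitting inside an operator concrete, the sketch below splits a single matrix-multiply operator along its output-column dimension so that each worker computes one slice of the result for the same input. It is an illustration under stated assumptions, not the framework described in the abstract: the choice of split dimension, the use of `ThreadPoolExecutor`, and the names `matmul`, `split_columns` and `split_matmul` are all hypothetical. In CPython, threads running pure-Python arithmetic do not actually execute in parallel, so the code shows the decomposition rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def split_columns(b, parts):
    """Split matrix b into `parts` contiguous column blocks."""
    n = len(b[0])
    bounds = [n * p // parts for p in range(parts + 1)]
    return [[row[lo:hi] for row in b] for lo, hi in zip(bounds, bounds[1:])]


def split_matmul(a, b, parts):
    """Compute a @ b by giving each worker one column block of b."""
    blocks = split_columns(b, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partial = list(pool.map(lambda blk: matmul(a, blk), blocks))
    # Concatenate the partial outputs row by row to rebuild the full result.
    return [sum((p[i] for p in partial), []) for i in range(len(a))]


a = [[1, 2], [3, 4]]
b = [[5, 6, 7], [8, 9, 10]]
result = split_matmul(a, b, parts=2)
```

Unlike data parallelism, which needs several inputs to keep the cores busy, this decomposition keeps every worker occupied on a single input, which is why it suits the low-latency inference setting.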

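Based on reverse thinking, a dimensional synthesis method is proposed for a five-axis parallel machine tool that satisfies workspace requirements. First, polar coordinates are used to describe the orientation workspace of the parallel machine tool. Then, from the workspace requirements, expressions are obtained for the extreme values of the distance between the joint points on the moving platform and the joint points on the fixed platform. Finally, taking the force transmission performance of the links into account, a set of parameters with good performance is obtained. The method is a useful reference for the dimensional synthesis of similar parallel mechanisms.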

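Based on screw theory, locking all kinematic pairs in the limbs of the parallel machine tool that can serve as actuated inputs yields the constraint screws acting on the moving platform. Constraint screws equal in number to the mechanism's degrees of freedom are selected and combined with the structural constraint screws inherent to the mechanism to form a constraint matrix, and the rank of this matrix determines whether the choice of inputs is valid. The ratio of the smallest to the largest singular value of the constraint force/moment matrix is defined as the constraint force/moment isotropy index, which is used to rank the candidate combinations of actuated inputs. The input selection results show that the machine tool has 5 valid input combinations, of which using the 5 P pairs in the UPS limbs as actuated inputs is the best.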
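The isotropy index defined in the abstract above can be written compactly as follows. The symbols are assumptions introduced here, since the abstract names none: G denotes the constraint force/moment matrix and σ_min, σ_max its smallest and largest singular values.

```latex
\eta \;=\; \frac{\sigma_{\min}(\mathbf{G})}{\sigma_{\max}(\mathbf{G})},
\qquad 0 \le \eta \le 1
```

When σ_min > 0, η is the reciprocal of the 2-norm condition number of G. A value of η = 1 means all singular values are equal, and η = 0 means G is singular, so under this index a larger η indicates a better combination of actuated inputs.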

