Similar Documents
19 similar documents found.
1.
In recent years, the growth of big data and cloud computing has made application systems ever more centralized and ever larger, so the performance of the storage system has become an increasingly prominent problem. Parallel file systems have been widely adopted to meet these performance demands, yet most existing optimization methods consider only the application or the parallel file system in isolation and rarely the coordination between the two. Starting from the observation that an application's access pattern on a parallel file system significantly affects storage performance, this paper proposes an optimization method for parallel file systems based on dynamic partitioning. First, machine learning is used to mine the relationships between the individual performance factors and the performance metrics, yielding an optimization model. Second, this model is used to assist parameter tuning of the parallel file system. Finally, a prototype is implemented on the Ceph storage system and a three-tier application is built for performance testing, achieving the goal of optimizing parallel file system access performance. Experimental results show that the proposed method reaches a prediction accuracy of 85%, and that with model-assisted tuning the throughput of the parallel file system improves by roughly 3.6 times.
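The model-assisted tuning loop above can be sketched briefly. The following is a minimal illustration, not the paper's implementation: it fits a regression model to benchmark samples and then searches a small parameter grid for the configuration with the best predicted throughput. The parameter names and numbers are invented placeholders, not real Ceph options.

```python
# A minimal sketch of model-assisted parameter tuning: learn throughput
# as a function of file-system parameters from benchmark samples, then
# pick the best candidate. All names and numbers are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from itertools import product

# Benchmark log: (stripe_size_kb, object_size_mb, client_threads) -> MB/s
X = np.array([[64, 4, 8], [128, 4, 16], [256, 8, 16], [512, 8, 32],
              [64, 16, 32], [256, 16, 8], [512, 4, 8], [128, 8, 32]])
y = np.array([210.0, 340.0, 480.0, 620.0, 390.0, 450.0, 260.0, 530.0])

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Search the parameter grid for the configuration with the best
# predicted throughput, then validate it with a real benchmark run.
grid = list(product([64, 128, 256, 512], [4, 8, 16], [8, 16, 32]))
pred = model.predict(np.array(grid))
best = grid[int(np.argmax(pred))]
print(f"candidate config {best}, predicted {pred.max():.0f} MB/s")
```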

2.
Hyperparameters are usually tuned by trying several values for each and directly selecting the combination with the best performance (direct selection), but this approach is not robust. To address this, a hyperparameter tuning method based on robust design (robust tuning) is proposed. Concretely, taking hyperparameter tuning of the SGNS (skip-gram with negative sampling) algorithm as an example, experiments on a word-inference task show the following. Analysis of variance finds that five of the seven SGNS hyperparameters significantly affect the algorithm's prediction performance; these are designated control factors, and the remaining two are treated as noise factors. Three of the control factors also significantly affect the variance of the performance estimate, so directly selecting parameters by maximizing expected performance alone is unjustified. Robust tuning and direct selection show no significant difference in prediction performance, but robust tuning is markedly more robust to the noise factors. The method offers practical guidance for tuning general deep neural networks.
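A minimal sketch of the ANOVA screening step described above, using a toy results table rather than the paper's data (the factor names and values are illustrative): factors whose p-values fall below a chosen threshold would be treated as control factors.

```python
# A minimal sketch (assumed setup, not the paper's code): ANOVA to test
# whether hyperparameters significantly affect prediction accuracy.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Toy results table: each row is one training run of an SGNS-like model.
df = pd.DataFrame({
    "dim":      [100, 100, 200, 200, 100, 100, 200, 200],
    "negative": [5, 15, 5, 15, 5, 15, 5, 15],
    "accuracy": [0.61, 0.64, 0.66, 0.70, 0.60, 0.65, 0.67, 0.69],
})

# Fit a linear model with main effects and run ANOVA; factors with
# small p-values are candidate control factors.
model = ols("accuracy ~ C(dim) + C(negative)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```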

3.
Natural language understanding in task-oriented dialogue systems parses a user's natural-language utterance and extracts structured information that a computer can process; it comprises two subtasks, intent detection and slot filling. BERT is a recently proposed pre-trained model for natural language processing, and researchers have already built BERT-based NLU models for task-oriented dialogue. Building on that work, this paper proposes an improved NLU model whose encoder is BERT and whose decoder combines an LSTM with an attention mechanism. Two tuning techniques for this model are also proposed: freezing the pre-trained model parameters during training, and using the cased version of the pre-trained model. Both techniques significantly improve the performance of the baseline model as well as the improved model. Experimental results show that, with the improved model and tuning techniques, sentence-level accuracies of 0.8833 and 0.9251 are achieved on the ATIS and Snips datasets, respectively.
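The two tuning techniques lend themselves to a short sketch. This is my own minimal illustration with HuggingFace transformers, not the paper's code: it loads the cased BERT checkpoint and freezes the encoder so that only the LSTM decoder and slot head receive gradients. The slot-label count and example sentence are placeholders.

```python
# A minimal sketch (assumptions: HuggingFace transformers, PyTorch) of
# the two tuning tricks: (1) use the cased BERT checkpoint, (2) freeze
# the encoder and train only the decoder head.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")  # cased variant
encoder = BertModel.from_pretrained("bert-base-cased")

for p in encoder.parameters():          # lock the pre-trained weights
    p.requires_grad = False

# A toy LSTM decoder over BERT token embeddings (slot tags per token).
decoder = torch.nn.LSTM(input_size=768, hidden_size=256, batch_first=True)
slot_head = torch.nn.Linear(256, 10)    # 10 illustrative slot labels

inputs = tokenizer("book a flight to Boston", return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state
out, _ = decoder(hidden)
slot_logits = slot_head(out)            # only decoder params get gradients
print(slot_logits.shape)
```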

4.
The I/O performance of a distributed storage system directly affects the execution efficiency of the distributed system as a whole. Because the potential factors affecting system performance are numerous and intertwined, the performance analysis and modeling of distributed storage systems has long been both a key topic and a difficult one. This paper examines in depth the current state of research and the key open problems in modeling techniques for distributed storage systems, providing constructive guidance for the design, tuning, and evaluation of such systems.

5.
Relation and entity extraction aims to identify named entities in unstructured text and extract the semantic relations between them. Existing two-stage extraction methods suffer from non-reusable models and large numbers of tunable parameters, which complicates engineering deployment. This paper improves on existing methods with prompt tuning and proposes REPT (a model-reused method of two-staged relations and entities extraction with prompt tuning). First, a pre-trained language model is fine-tuned for relation classification; then, prompt tuning is applied to extract entities while reusing the pre-trained language model fine-tuned in the first stage. Experimental results show that the method matches the performance of SOTA models while tuning only about 50% as many parameters as the baseline model.
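Prompt tuning as described above can be sketched generically. This is an assumed soft-prompt setup, not the REPT code: the stage-1 language model is kept frozen, and only a small prompt-embedding matrix (plus a task head, omitted here) would be trained.

```python
# A minimal sketch (my assumption of a generic soft-prompt setup, not
# the REPT implementation): keep the language model frozen and train
# only a small prompt-embedding matrix prepended to the input.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
lm = BertModel.from_pretrained("bert-base-chinese")
for p in lm.parameters():
    p.requires_grad = False             # reuse the stage-1 model as-is

n_prompt = 20
prompt = torch.nn.Parameter(
    torch.randn(n_prompt, lm.config.hidden_size) * 0.02)  # trainable

enc = tokenizer("小明在北京工作", return_tensors="pt")
tok_emb = lm.embeddings.word_embeddings(enc["input_ids"])
# Prepend the soft prompt; only `prompt` (and a task head) is updated.
emb = torch.cat([prompt.unsqueeze(0), tok_emb], dim=1)
out = lm(inputs_embeds=emb).last_hidden_state
print(out.shape)
```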

6.
SQL Server databases are used extremely widely, but their performance often degrades considerably as data volumes grow and applications mature. Database performance monitoring tools fall into three broad categories: system-level monitoring, database-level monitoring, and client application monitoring. This paper discusses the main functions of each category of tool and how to use them; by analyzing the data these tools provide in combination, performance bottlenecks can be identified effectively. Targeted methods for system tuning, design tuning, and program tuning are also proposed.
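As one concrete example of database-level monitoring, the sketch below (my own, assuming pyodbc and sufficient permissions on a local instance) queries the well-known sys.dm_exec_query_stats dynamic management view for the most expensive cached queries.

```python
# A minimal sketch (assuming pyodbc and suitable permissions) of
# database-level monitoring: query a well-known SQL Server DMV for the
# most expensive cached queries by total elapsed time.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=master;Trusted_Connection=yes;"
)
sql = """
SELECT TOP 5 total_elapsed_time, execution_count,
       total_elapsed_time / execution_count AS avg_elapsed_time
FROM sys.dm_exec_query_stats
ORDER BY total_elapsed_time DESC;
"""
for row in conn.cursor().execute(sql):
    print(row.execution_count, row.avg_elapsed_time)
```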

7.
严家政, 专祥涛. 《智能系统学报》, 2022, 17(2): 341-347
When applied to nonlinear time-delay systems, conventional PID control suffers from tedious parameter tuning and performance optimization and often yields unsatisfactory control. To address this, a reinforcement-learning-based algorithm for controller parameter self-tuning and optimization is proposed. The algorithm computes its reward function from dynamic performance indices of the system and learns from experience data of periodic step responses, so that controller parameters can be self-tuned and optimized online without identifying a model of the controlled plant. Using a water-tank level control system as the experimental plant, comparative experiments on parameter tuning and optimization were conducted with different types of PID controller. The results show that, compared with conventional tuning methods, the proposed algorithm eliminates tedious manual tuning, effectively optimizes controller parameters, reduces overshoot of the controlled variable, and improves the dynamic response of the controller.
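The learning loop can be illustrated with a toy plant. The sketch below is my own simplification, with a crude random-search update standing in for the paper's reinforcement learning algorithm: each episode runs a step response, scores it by overshoot and residual error, and keeps gain perturbations that improve the score.

```python
# A minimal sketch (toy plant and reward of my own choosing, not the
# paper's setup): score a PID step response by overshoot and settling
# error, and hill-climb the gains from episode to episode.
import numpy as np

def step_response(kp, ki, kd, T=200, dt=0.1, setpoint=1.0):
    """Simulate PID on a first-order lag tank: dh/dt = (u - h) / 5."""
    h, integ, prev_err, ys = 0.0, 0.0, setpoint, []
    for _ in range(T):
        err = setpoint - h
        integ += err * dt
        u = kp * err + ki * integ + kd * (err - prev_err) / dt
        prev_err = err
        h += (np.clip(u, 0, 2) - h) / 5.0 * dt
        ys.append(h)
    return np.array(ys)

def reward(ys, setpoint=1.0):
    overshoot = max(ys.max() - setpoint, 0.0)
    steady_err = abs(ys[-20:] - setpoint).mean()
    return -(5.0 * overshoot + steady_err)     # penalize both

gains, best = np.array([1.0, 0.1, 0.0]), -np.inf
rng = np.random.default_rng(0)
for episode in range(200):                     # learn across step tests
    trial = np.clip(gains + rng.normal(0, 0.05, 3), 0, None)
    r = reward(step_response(*trial))
    if r > best:                               # keep improving gains
        gains, best = trial, r
print("tuned gains:", gains.round(3), "reward:", round(best, 4))
```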

8.
Conventional object recognition algorithms rely solely on visual features, a limitation this paper addresses with a recognition algorithm that incorporates prior relations. In the training stage, prior relations are represented in a structured graph model: an image-image subgraph, a semantics-semantics subgraph, and the links between the two subgraphs; a random-walk model is then built on this graph. In the recognition stage, the image to be recognized is connected to the image nodes and semantic nodes of the random-walk model, a random walk is performed on the resulting probabilistic model, and its outcome is taken as the recognition result. Experimental results demonstrate the effectiveness of the prior-relation-based algorithm and its strong recognition performance.
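A minimal sketch of the recognition-stage random walk, on an invented five-node image/semantic graph rather than the paper's learned model: a random walk with restart from the query node yields stationary scores over the semantic nodes, which rank the candidate labels.

```python
# A minimal sketch (illustrative graph, not the paper's model) of a
# random walk with restart over a joint image/semantic graph: the
# stationary scores rank candidate labels for the query node.
import numpy as np

# Nodes 0-2: images, nodes 3-4: semantic labels; weights are priors.
A = np.array([[0, .5, .2, .8, .0],
              [.5, 0, .4, .1, .6],
              [.2, .4, 0, .0, .7],
              [.8, .1, .0, 0, .3],
              [.0, .6, .7, .3, 0]], dtype=float)
P = A / A.sum(axis=0, keepdims=True)       # column-stochastic transitions

restart = np.zeros(5); restart[0] = 1.0    # query attaches to image node 0
r, alpha = restart.copy(), 0.15
for _ in range(100):                       # power iteration to convergence
    r = (1 - alpha) * P @ r + alpha * restart
print("label scores:", r[3:])              # higher score -> chosen label
```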

9.
Storage I/O performance lags far behind that of CPUs and memory, and the gap keeps widening, so for the growing class of data-intensive applications the storage system is often the bottleneck. Storage performance depends not only on the storage subsystem's architecture and the performance of its components, but also on the system's workload and application environment. Workload-aware performance tuning means that the storage subsystem dynamically senses its application environment by analyzing workload characteristics and adjusts its operating policies accordingly. This allows the storage subsystem to schedule storage resources more sensibly and thereby improve I/O performance.
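The idea can be illustrated with a toy policy. The heuristic below is my own, not from the paper: it classifies recent I/O as sequential or random from block-address deltas and widens or shrinks a prefetch window accordingly.

```python
# A minimal sketch (heuristic of my own, for illustration) of
# workload-aware tuning: classify recent I/O as sequential or random
# and adjust a prefetch window accordingly.
from collections import deque

class PrefetchTuner:
    def __init__(self):
        self.recent = deque(maxlen=64)   # recent block addresses
        self.prefetch_blocks = 8

    def observe(self, block):
        self.recent.append(block)
        seq = sum(b2 - b1 == 1 for b1, b2 in
                  zip(self.recent, list(self.recent)[1:]))
        ratio = seq / max(len(self.recent) - 1, 1)
        # Mostly sequential -> aggressive prefetch; random -> back off.
        self.prefetch_blocks = 32 if ratio > 0.7 else 4 if ratio < 0.2 else 8

tuner = PrefetchTuner()
for blk in range(100, 150):              # a sequential burst
    tuner.observe(blk)
print("prefetch window:", tuner.prefetch_blocks)   # widened to 32
```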

10.
朱文俊, 徐壮, 秦家佳, 李鹏. 《计算机工程》, 2021, 47(7): 205-211, 217
Network I/O is the key factor limiting storage performance in Redis, and default or manually chosen parameter settings constrain that performance. To address the throughput degradation and high latency caused by improper parameter configuration, a storage I/O optimization method named GTS is proposed. Considering how the parameters of each stage affect storage performance, GTS builds on DPDK's optimization principles, analyzes processing characteristics, and uses a layered-model strategy to predict storage performance, thereby finding the optimal parameter tuning scheme. Experimental results show that, compared with the default parameters, GTS effectively improves storage throughput, and that under write-intensive workloads it achieves lower latency than the ATH algorithm.
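Evaluating one candidate configuration reduces to measuring throughput under load. The sketch below (assuming redis-py and a local server; GTS itself is not public) measures write throughput, using pipelining to cut per-command network I/O.

```python
# A minimal sketch (assuming redis-py and a local server; not the GTS
# method itself) of measuring write throughput under a candidate config.
import time
import redis

r = redis.Redis(host="localhost", port=6379)

def write_throughput(n=10_000, value=b"x" * 256):
    pipe = r.pipeline(transaction=False)   # batch to cut network I/O
    start = time.perf_counter()
    for i in range(n):
        pipe.set(f"bench:{i}", value)
        if i % 500 == 0:
            pipe.execute()
    pipe.execute()
    return n / (time.perf_counter() - start)

print(f"{write_throughput():.0f} SET ops/s")
```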

11.
Designing a JEE (Java Enterprise Edition)-based enterprise application capable of achieving its performance objectives is rather hard. Predicting the performance of this type of system at the design level is difficult and sometimes not viable, because it requires precise knowledge of the expected load conditions and the underlying software infrastructure. Moreover, the pressure for rapid time-to-market leads developers to postpone performance tuning until systems have been developed, packaged, and deployed. In this paper we present a novel approach for automatically detecting performance problems in JEE-based applications and, in turn, suggesting courses of action to correct them. The idea is to allow developers to smoothly identify and eradicate performance anti-patterns by automatically analyzing execution traces. The approach has been implemented as a tool called JEETuningExpert and validated using three well-known JEE reference applications. Specifically, we evaluated the effectiveness of JEETuningExpert for detecting performance problems, measured the overhead imposed by online monitoring of each application, and quantified the improvements achieved after following the suggested corrective actions. These results show empirically that the refactored applications are 40.08%, 76.94%, and 61.13% faster, on average.

12.
Scheduling large-scale applications on heterogeneous distributed computing systems is a fundamental NP-complete problem, and it is critical to achieving good performance and low execution cost. In this paper, we address the scheduling problem for an important class of large-scale Grid applications inspired by the real world, characterized by a huge number of homogeneous, concurrent, and computationally intensive tasks that are the main sources of performance, cost, and storage bottlenecks. We propose a new formulation of this problem based on a cooperative, distributed, game-theoretic method, realized in three algorithms of low time complexity that optimize three metrics important in scientific computing: execution time, economic cost, and storage requirements. We present comprehensive experiments, using both simulation and real-world applications, that demonstrate the effectiveness of our approach in terms of time and fairness compared with related algorithms.

13.
Understanding and tuning the performance of large-scale, long-running applications is difficult: both standard trace-based and statistical methods have substantial shortcomings that limit their usefulness. This paper describes a new performance monitoring approach called Embedded Gossip (EG), designed to enable lightweight online performance monitoring and tuning. EG works by piggybacking performance information on existing messages and correlating that information online, giving each process in a parallel application a weakly consistent global view of the behavior of the entire application. To demonstrate the viability of EG, this paper presents the design and experimental evaluation of two different online monitoring systems and an online global adaptation system driven by Embedded Gossip. In addition, we present a set of metrics for evaluating the suitability of an application for EG-based monitoring and adaptation, a general architecture for implementing EG-based monitoring systems, and a modified global commit algorithm appropriate for use in EG-based global adaptation systems. Together, these results demonstrate that EG is an efficient, low-overhead approach to a wide range of parallel performance monitoring tasks and that its results can effectively drive online global adaptation.
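The piggybacking mechanism is easy to sketch. The toy protocol below is my own illustration, not the paper's system: each process attaches its latest metrics view to outgoing application messages and merges the freshest entries from whatever it receives, so no extra monitoring messages are sent.

```python
# A minimal sketch (my own toy protocol, not the paper's system) of the
# Embedded Gossip idea: each process piggybacks its latest metrics on
# outgoing application messages and merges whatever views it receives.
import time

class EGProcess:
    def __init__(self, rank):
        self.rank = rank
        self.view = {}                     # rank -> (timestamp, load)

    def send(self, payload):
        self.view[self.rank] = (time.time(), self.local_load())
        # Application message + piggybacked gossip, no extra messages.
        return {"payload": payload, "gossip": dict(self.view)}

    def receive(self, msg):
        for rank, (ts, load) in msg["gossip"].items():
            if rank not in self.view or ts > self.view[rank][0]:
                self.view[rank] = (ts, load)   # keep freshest entry
        return msg["payload"]

    def local_load(self):
        return 0.1 * self.rank             # stand-in for a real metric

a, b = EGProcess(0), EGProcess(1)
b.receive(a.send("work-item"))
print(b.view)   # b now has a weakly consistent view including rank 0
```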

14.
Two tuning techniques are proposed for designing decentralized PID controllers, one for weakly coupled and one for general MIMO systems. Each SISO loop is designed separately, and the controller parameters are obtained as the solution of a linear programming problem with constraints on the process stability margins. Despite the SISO approach, loop interactions are accounted for either by Gershgorin bands (non-iterative method) or by an equivalent open-loop process (EOP, iterative method). The tuning results and performance of both methods are illustrated in four simulations of linear processes and in a laboratory-scale application to a Peltier process. Four of the applications compare closed-loop performance between the proposed techniques and techniques from the literature; one illustrates the feasibility of the proposed iterative, EOP-based method for tuning decentralized PIDs on a 5 × 5 system. Moreover, the effect of model uncertainty on the phase and gain margins of the closed-loop process is analyzed.
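The Gershgorin bands used by the non-iterative method can be computed directly. The sketch below evaluates an invented 2 × 2 transfer-function matrix at a few frequencies; the band radius around each diagonal element g_ii is the sum of the off-diagonal magnitudes in that row.

```python
# A minimal sketch (illustrative 2x2 plant) of the Gershgorin bands:
# at each frequency, the band radius around g_ii is the sum of
# off-diagonal magnitudes in row i of G(jw).
import numpy as np

def G(s):
    """A toy 2x2 transfer-function matrix evaluated at s = jw."""
    return np.array([[1/(s + 1),   0.3/(s + 2)],
                     [0.2/(s + 3), 1/(s + 1.5)]])

for w in [0.1, 1.0, 10.0]:
    Gw = G(1j * w)
    for i in range(2):
        radius = sum(abs(Gw[i, j]) for j in range(2) if j != i)
        print(f"w={w:5.1f} loop {i}: |g_ii|={abs(Gw[i, i]):.3f}, "
              f"band radius={radius:.3f}")
```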

15.
Hardware monitoring through performance counters is available on almost all modern processors. Although these counters were originally designed for performance tuning, they have also been used to estimate power consumption. We propose two approaches for modelling and understanding the behaviour of high performance computing (HPC) systems that rely on hardware monitoring counters. We evaluate the effectiveness of our system-modelling approach on two target objectives: optimizing the energy usage of HPC systems and predicting the energy consumption of HPC applications. While hardware monitoring counters are used to model the system, other methods, including partial phase recognition and cross-platform energy prediction, are used for energy optimization and prediction. Experimental results for energy prediction demonstrate that we can accurately predict the peak energy consumption of an application on a target platform, while results for energy optimization indicate that, with no a priori knowledge of the workloads sharing the platform, we can save up to 24% of the overall HPC system's energy consumption under benchmarks and real-life workloads.
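One common way to model power from counters is plain regression. The sketch below uses synthetic data and illustrative counter names, not the paper's measurements: a linear model maps counter rates to power and is then used for prediction.

```python
# A minimal sketch (synthetic data; counter names are illustrative) of
# modelling power from hardware performance counters by linear
# regression, in the spirit of the abstract's modelling approach.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
# Columns: instructions/s, LLC misses/s, memory bandwidth (GB/s)
X = rng.uniform([1e9, 1e6, 1], [5e9, 5e7, 20], size=(200, 3))
true_w = np.array([2e-8, 5e-7, 1.5])               # hidden "physical" weights
power = X @ true_w + 40 + rng.normal(0, 1, 200)    # + idle power + noise

model = LinearRegression().fit(X, power)
sample = np.array([[3e9, 2e7, 10]])
print(f"predicted power: {model.predict(sample)[0]:.1f} W")
```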

16.
Parallel and distributed systems are continuously growing. This enables applications to scale, either by tackling bigger problems in the same period of time or by solving the same problem in a shorter time. Consequently, the methodologies, approaches, and tools of the parallel paradigm must be brought up to date to support the increasing requirements of applications and users. MATE (Monitoring, Analysis and Tuning Environment) provides automatic and dynamic tuning for parallel/distributed applications. Its tuning decisions are made according to performance models, which provide a fast means of deciding what to improve in the execution. However, MATE develops bottlenecks as the application grows, because its analysis process is fully centralized. In this work, we propose a new approach to make MATE scalable, and we present experimental results and analysis validating the proposed approach against the original one.

17.
Network-on-Chip (NoC) interconnect fabrics are categorized according to trade-offs among latency, throughput, speed, and silicon area, and the correctness and performance of these fabrics in Field-Programmable Gate Array (FPGA) applications are assessed through experimentation and simulation. In this paper, we propose a consistent parametric method for evaluating the FPGA performance of three common on-chip interconnect architectures, namely the Mesh, Torus, and Fat-tree architectures. We also investigate how NoC architectures are affected by interconnect and routing parameters, and demonstrate their flexibility and performance through FPGA synthesis and testing of 392 different NoC configurations. In this process, we found that the Flit Data Width (FDW) and Flit Buffer Depth (FBD) parameters have the heaviest impact on FPGA resources, and that these parameters, along with the number of Virtual Channels (VCs), significantly affect reassembly buffering as well as routing and logic requirements at NoC endpoints. Applying our evaluation technique in a detailed, flexible, cycle-accurate simulation, we drive the three NoC architectures with benign (Nearest Neighbor and Uniform) and adversarial (Tornado and Random Permutation) traffic patterns under different numbers of VCs, producing a set of load-delay curves. The results show that, when the router and interconnect parameters are tuned strategically, the Fat-tree network makes the best use of FPGA resources in terms of silicon area, clock frequency, critical path delays, network cost, saturation throughput, and latency, whereas the Mesh and Torus networks show comparatively high resource costs and poor performance under adversarial traffic patterns. Our findings indicate that the Fat-tree network is the most efficient in FPGA resource utilization and maps well onto current Xilinx FPGA devices. This approach helps engineers and architects make an early, informed choice of interconnect and router parameters for large and complex NoCs, and the breadth of our experiments and simulations confirms its suitability for real systems.

18.
Automatic performance debugging of parallel applications involves two main steps: locating performance bottlenecks and uncovering their root causes for performance optimization. Previous work falls short of resolving this challenging issue in two ways: first, several efforts automate locating bottlenecks but present results in a confined way that identifies only performance problems known a priori; second, several tools apply exploratory or confirmatory data analysis to automatically discover relationships in performance data, but these efforts do not focus on locating bottlenecks or uncovering their root causes. The single program, multiple data (SPMD) programming model is widely used for both high performance computing and Cloud computing. In this paper, we design and implement an innovative system, AutoAnalyzer, that automates the process of debugging performance problems of SPMD-style parallel programs, including data collection, performance behavior analysis, locating bottlenecks, and uncovering their root causes. AutoAnalyzer is unique in two respects: first, without any prior knowledge, it automatically locates bottlenecks and uncovers their root causes for performance optimization; second, it is lightweight in terms of the volume of performance data to be collected and analyzed. Our contributions are three-fold. First, we propose two effective clustering algorithms to investigate the existence of performance bottlenecks that cause process behavior dissimilarity or code-region behavior disparity, respectively, together with two searching algorithms to locate the bottlenecks. Second, on the basis of rough set theory, we propose an innovative approach to automatically uncovering the root causes of bottlenecks. Third, on cluster systems with two different configurations, we use two production applications written in Fortran 77 and one open-source code, MPIBZIP2 (http://compression.ca/mpibzip2/), written in C++, to verify the effectiveness and correctness of our methods. For the three applications, we also propose an experimental approach to investigating the effects of different metrics on locating bottlenecks.
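The clustering step can be sketched on toy data (my own illustration, not AutoAnalyzer itself): processes are clustered by their per-code-region timing vectors, and a small outlier cluster flags behavior dissimilarity worth inspecting for a bottleneck.

```python
# A minimal sketch (toy data) of the clustering step: group processes by
# per-code-region timing vectors; a small outlier cluster signals
# behaviour dissimilarity worth inspecting as a bottleneck.
import numpy as np
from sklearn.cluster import KMeans

# Rows: processes; columns: time spent in code regions R0..R3 (seconds).
profiles = np.array([[1.0, 2.1, 0.5, 0.4],
                     [1.1, 2.0, 0.5, 0.5],
                     [0.9, 2.2, 0.6, 0.4],
                     [1.0, 2.1, 0.5, 0.5],
                     [4.8, 2.0, 0.5, 0.4]])   # process 4 stalls in R0

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
for cluster in range(2):
    members = np.where(km.labels_ == cluster)[0]
    print(f"cluster {cluster}: processes {members.tolist()}")
# The singleton cluster flags process 4; its dominant region (R0) is the
# candidate bottleneck whose root cause we would inspect next.
```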

19.
This paper discusses an approach to modelling the storage organizations of future computer systems. Our motivation comes from two generally uncharted areas of computer performance analysis. The first is that of dynamic, or time-dependent, effects: certain parameters critically affect the behaviour of the system, and as they vary over time it becomes important to understand the transient behaviour and the stability of the system. The second area concerns the interaction between distinct intelligent components of the system, particularly in a stressed environment. Our approach, which differs from the more usual queueing approach, is that of dynamical systems theory. We describe it with a particular, though very simple, example that focuses on the relationship between a CPU and an I/O processor. By exploring this example, we are able to introduce a number of important concepts, derive qualitative insights, and identify barometers of performance and control mechanisms.
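In the spirit of the dynamic-systems view, the CPU/I/O interaction can be written as coupled differential equations and integrated to observe the transient. The equations below are my own toy model, not the paper's: two saturating service queues feed each other, and the solver shows how the queue lengths settle over time.

```python
# A minimal sketch (equations of my own choosing, in the spirit of the
# dynamic-systems view) of coupled CPU and I/O-processor queues: watch
# the transient settle instead of assuming steady state.
import numpy as np
from scipy.integrate import solve_ivp

def queues(t, x, arrival=5.0, cpu_rate=10.0, io_rate=6.0, io_frac=0.4):
    q_cpu, q_io = x
    cpu_done = cpu_rate * q_cpu / (1 + q_cpu)   # saturating service
    io_done = io_rate * q_io / (1 + q_io)
    dq_cpu = arrival + io_done - cpu_done       # I/O completions return
    dq_io = io_frac * cpu_done - io_done        # some jobs need I/O
    return [dq_cpu, dq_io]

sol = solve_ivp(queues, (0, 10), [0.0, 0.0], max_step=0.05)
print("final queue lengths:", np.round(sol.y[:, -1], 2))
```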
