Similar Documents
20 similar documents found.
1.
Latent Semantic Indexing (LSI) is a standard approach for extracting and representing the meaning of words in a large set of documents. Recently it has been shown to be useful for identifying concerns in source code as well. The tree cutting strategy plays an important role in obtaining the clusters that identify the concerns. In this contribution the authors compare two tree cutting strategies: the Dynamic Hybrid cut and the commonly used fixed height threshold. Two case studies were performed on the source code of Philips Healthcare to compare the two approaches. While some of the settings are particular to the Philips case, the results show that applying a dynamic threshold, implemented by the Dynamic Hybrid cut, improves on the fixed height threshold in detecting clusters that represent relevant concerns, making the approach as a whole more usable in practice.

2.
Context: Software clustering is a key technique used in reverse engineering to recover a high-level abstraction of software when resources are limited. Very little research has explicitly discussed how to find the optimum set of clusters in a design and how to penalize the formation of singleton clusters during clustering.
Objective: This paper enhances existing agglomerative clustering algorithms with a complementary mechanism. To solve the architecture recovery problem, the proposed approach focuses on minimizing redundant effort and penalizing the formation of singleton clusters during clustering while maintaining the integrity of the results.
Method: An automated solution for cutting a dendrogram, based on least-squares regression, is presented to find the best cut level. A dendrogram is a tree diagram that shows the taxonomic relationships of clusters of software entities. A factor that penalizes clusters that would form singletons is also introduced. Simulations were performed on two open-source projects, and the approach was compared against the exhaustive and highest-gap dendrogram cutting methods as well as two well-known cluster validity indices, Dunn's index and the Davies-Bouldin index.
Results: When comparing the clustering results against the original package diagram, the approach achieved an average accuracy of 90.07% over two simulations after utility classes were removed; utility classes affect the accuracy of software clustering owing to their omnipresent behavior. The approach also successfully penalized the formation of singleton clusters during clustering.
Conclusion: The evaluation indicates that the proposed approach can improve the quality of clustering results by guiding software maintainers through the cut-point selection process, and it can serve as a complementary mechanism to improve the effectiveness of existing clustering algorithms.
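The cut selection with a singleton penalty can be sketched in plain Python. The dendrogram encoding, penalty weight, and scoring rule below are illustrative assumptions, not the paper's exact least-squares formulation:

```python
from collections import defaultdict

def cut_dendrogram(merges, n_leaves, height):
    """Cut a dendrogram at `height`.

    `merges` is a list of (a, b, h) triples in increasing merge height h,
    where the merge of clusters a and b creates cluster id n_leaves + i.
    Returns the resulting clusters as lists of leaf ids.
    """
    parent = list(range(n_leaves + len(merges)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, (a, b, h) in enumerate(merges):
        if h <= height:                    # only merges below the cut apply
            new = n_leaves + i
            parent[find(a)] = new
            parent[find(b)] = new

    clusters = defaultdict(list)
    for leaf in range(n_leaves):
        clusters[find(leaf)].append(leaf)
    return list(clusters.values())

def penalized_score(clusters, singleton_weight=2.0):
    """Lower is better: a cut that produces singleton clusters is penalized."""
    singletons = sum(1 for c in clusters if len(c) == 1)
    return len(clusters) + singleton_weight * singletons
```

Scoring every candidate cut height with `penalized_score` and keeping the minimum mimics the idea of steering the cut away from singleton-producing levels.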

3.
A hybrid approach and methods for representing the VLSI floorplanning problem as evolutionary processes, based on integrating adaptive-behavior models of biological systems with composite architectures of solution algorithms, are described. This makes it possible to handle large-scale problems and obtain high-quality results in reasonable time. Floorplan synthesis proceeds in two phases: in the first, a cut tree is produced using genetic techniques; in the second, the floorplan is formed by convolving the cut tree using collective-adaptation methods. Variants of circuits with variable module orientation and fixed or stochastic module sizes are considered. The probability of obtaining an optimal solution is 0.9, and the average deviation of the solutions from the optimal ones is 1%.

4.
Effective data management is an important issue in large-scale distributed environments such as distributed DBMSs, Peer-to-Peer (P2P) systems, data grids, and the World Wide Web (WWW). It can be achieved with a replication protocol that efficiently decreases communication cost and increases data availability. The Tree Quorum protocol is one of the representative replication protocols, allowing low read cost in the best case, but it has drawbacks: the number of replicas grows rapidly as the tree level increases, and the root replica is a bottleneck. The Grid protocol requires a fixed operation cost regardless of failure conditions. In this paper we propose a new replication protocol, the Dynamic Hybrid protocol, which efficiently improves on the existing protocols. It combines the grid and tree structures so that the overall topology can be flexibly adjusted through three configuration parameters: tree height, number of descendants, and grid depth. For high read availability, the tree height and number of descendants are decreased and the grid depth is increased; for high write availability, the tree height and grid depth are decreased while the number of descendants is increased. We present an analytical model of read/write availability and of the average number of nodes accessed per operation, and we use computer simulation to estimate throughput and communication overhead. The proposed protocol always incurs much lower communication and operation costs than earlier protocols.
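For intuition about the tree part of such analytical models, the read availability of the classic tree-quorum scheme (a read succeeds at a node if the node itself is up, or if a majority of its subtrees can form read quorums) can be computed recursively. The degree, height, and up-probability below are illustrative, not parameters from the paper:

```python
from math import comb

def read_availability(p, degree, height):
    """Probability that a tree-quorum read succeeds in a complete tree.

    p:      probability that any single replica is up
    degree: number of children of each internal node
    height: 0 denotes a single-node tree
    Read-quorum rule: the node itself, or read quorums at a
    majority of its children (the classic tree-quorum recursion).
    """
    if height == 0:
        return p
    q = read_availability(p, degree, height - 1)
    majority = degree // 2 + 1
    # probability that at least `majority` of `degree` subtrees have quorums
    maj = sum(comb(degree, k) * q**k * (1 - q)**(degree - k)
              for k in range(majority, degree + 1))
    return p + (1 - p) * maj
```

With p = 0.9, degree 3, and height 1, the recursion gives about 0.9972, illustrating why tree quorums read cheaply but concentrate load near the root.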

5.
A heuristic is presented for the two-dimensional arbitrary stock-size cutting stock problem, where a set of rectangular items with specified demand is cut from plates of arbitrary sizes that conform to the supplier's provisions, such that the plate cost is minimized. The supplier's provisions are: the lengths and widths of the plates must lie in specified ranges, and the total area of the plates of any one size must reach an area threshold. The proposed algorithm uses a pattern-generation procedure with the all-capacity property to obtain the patterns and combines it with a sequential heuristic procedure to obtain the cutting plan, from which the purchasing decision can be made. Practical and random instances are used to compare the algorithm with a published approach. The results indicate that trim loss can be reduced by more than half when the algorithm is used in plate-purchasing decisions.

6.
Mobile handheld devices, with their small screens and limited computing and storage capacity, are ill-suited to browsing ordinary Web pages; moreover, for PDA and mobile-phone users, it is desirable to "trim" existing Web pages, both to support user customization and to reduce costs. To address these problems, this paper proposes a Web-page segmentation solution for mobile devices: semi-structured HTML documents are first normalized into well-formed structure; the HTML is then converted into a DOM tree according to the DOM specification and cleaned of noise; the page is next partitioned into content-based and link-based blocks, which are segmented and restructured according to layering and user-customization principles; finally, a prototype system was developed on top of the open-source HTMLParser project, and its runtime efficiency and segmentation quality were evaluated. The results show that the approach is practical and of considerable applied value.
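The noise-cleaning step can be sketched with the standard library's HTML parser. The set of noise tags is an assumption, and Python's `html.parser` stands in for the Java HTMLParser library the prototype actually used:

```python
from html.parser import HTMLParser

NOISE_TAGS = {"script", "style", "iframe", "noscript"}  # assumed noise set

class NoiseStripper(HTMLParser):
    """Collects visible text blocks, skipping the content of noise tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # nesting depth inside noise tags
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.depth == 0:    # keep text only outside noise tags
            self.blocks.append(text)

def clean(html):
    parser = NoiseStripper()
    parser.feed(html)
    return parser.blocks
```

The resulting text blocks are the raw material that content-based and link-based blocking would then partition.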

7.
Building on an analysis of the characteristics and limitations of existing program-plagiarism detection systems, a hybrid detection method is proposed that combines text analysis, structure metrics, and attribute counting. Document-fingerprinting techniques and the Winnowing algorithm are used to compute the textual similarity of programs; program code is represented as a Dynamic Control Structure (DCS) tree, and the Winnowing algorithm is applied to DCS trees to obtain structural similarity; information on every variable in a program is collected, and a variable-similarity algorithm analyzes the variable-information nodes to obtain variable similarity; the three similarities are then weighted and combined into an overall code similarity. Experimental results show that the method effectively detects a wide range of plagiarism behaviors. Across different plagiarism thresholds, its precision and recall exceed those of the JPLAG system; in particular, on groups of structurally simple programs, the average precision of the proposed method and of JPLAG is 82.5% and 69.5% respectively, indicating that the proposed method is more effective.
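The Winnowing step can be sketched directly from its published definition: hash all k-grams, then slide a window of w hashes and keep the rightmost minimal hash in each window. The choices of k and w below are arbitrary:

```python
def winnow(text, k=5, w=4):
    """Return the winnowing fingerprint of `text`:
    a set of (hash, position) pairs of selected k-grams."""
    grams = [hash(text[i:i + k]) for i in range(len(text) - k + 1)]
    fingerprints = set()
    for i in range(len(grams) - w + 1):
        window = grams[i:i + w]
        m = min(window)
        # rightmost occurrence of the minimum, per the winnowing rule
        j = max(idx for idx, h in enumerate(window) if h == m)
        fingerprints.add((m, i + j))
    return fingerprints

def similarity(a, b, k=5, w=4):
    """Jaccard similarity over fingerprint hash values (positions ignored)."""
    fa = {h for h, _ in winnow(a, k, w)}
    fb = {h for h, _ in winnow(b, k, w)}
    return len(fa & fb) / len(fa | fb) if fa | fb else 1.0
```

Applying the same routine to a serialized DCS tree instead of raw text is, per the abstract, how the structural similarity is obtained.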

8.
斯琴高娃 《软件》2012,(7):14-17
Large files often need to be split into several smaller files for portability or transfer and then merged again on another computer. This paper explains the principles of file splitting and merging, along with techniques such as automatically generating a self-merging .exe and automatically deleting the merge program, and provides C# source code for the key parts.
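The splitting and merging principle is simply sequential reads and writes. The original provides C# code; this is a hedged Python equivalent with an arbitrary chunk size and part-naming scheme:

```python
def split_file(path, chunk_size):
    """Split `path` into path.part0, path.part1, ...; return the part names."""
    parts = []
    with open(path, "rb") as src:
        i = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:               # EOF
                break
            part = f"{path}.part{i}"
            with open(part, "wb") as dst:
                dst.write(chunk)
            parts.append(part)
            i += 1
    return parts

def merge_files(parts, out_path):
    """Concatenate the parts back into a single file, in order."""
    with open(out_path, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                dst.write(src.read())
```

Because merging is order-sensitive, a real tool would encode the part index in the filename (as above) or in a small header.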

9.
Text clustering is an important branch of clustering research with wide applications in text processing. After describing text-clustering methods based on clustering-feature trees and dynamic index trees, this paper changes the merge threshold in the dynamic-index-tree method from a single linear dependency to one that depends on the radius of the cluster node. Experiments show that the improved algorithm clearly improves both clustering precision and clustering time.

10.
Among software-plagiarism (code-provenance) detection methods, comparison based on abstract syntax trees (ASTs) effectively detects plagiarism techniques such as wholesale copying, variable renaming, and code reordering, and is widely used in plagiarism-detection tools. However, AST-based comparison is helpless against plagiarism that changes variable types or adds meaningless variables. To address this, an improvement to AST-based comparison is proposed: leaf nodes that interfere with the judgment are pruned from the syntax tree so that the original copied text can be recovered for detection, enabling effective detection of plagiarism that changes variable types or adds meaningless variables.
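The pruning idea can be sketched with Python's `ast` module. Python stands in for whatever language the paper targets, and erasing identifier names and type annotations is an illustrative choice of which "interfering leaf nodes" to drop:

```python
import ast

class Normalize(ast.NodeTransformer):
    """Erase leaf details that plagiarists commonly change."""

    def visit_Name(self, node):
        node.id = "_v"              # all variable names compare equal
        return node

    def visit_arg(self, node):
        node.arg = "_v"
        node.annotation = None      # drop (easily changed) type annotations
        return node

def structure_fingerprint(source):
    """A comparable string capturing the pruned AST structure."""
    tree = Normalize().visit(ast.parse(source))
    return ast.dump(tree)
```

Two programs that differ only in variable names or parameter annotations now produce identical fingerprints, while a real structural change still distinguishes them.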

11.
The tool nose radius has an important influence on machining accuracy, cutting force, and other cutting parameters, while the entering angle directly affects cutting deformation and cutting-force variation. To study the influence of the nose radius of a turning tool on the entering angle, geometric relations among the tool elements were established. According to the depth of cut and the nose radius, four cutting conditions are distinguished: (1) nose radius smaller than the depth of cut, entering angle equal to 90°; (2) nose radius smaller than the depth of cut, entering angle below 90°; (3) nose radius smaller than the depth of cut, entering angle above 90°; (4) nose radius larger than the depth of cut. From the geometric relation between the nose radius and the depth of cut, the change in the actual entering angle caused by the nose radius was computed for each of the four conditions. Cutting experiments were performed to verify the analysis, with the experimental entering angle computed from the angle between the thrust force and the feed force. The results confirm that the nose radius reduces the entering angle.
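For the fourth condition, where the nose radius r is at least the depth of cut a_p and cutting takes place entirely on the nose arc, the chord geometry gives an effective entering angle of arccos((r − a_p)/r). This is a textbook relation offered for intuition; it is not claimed to be the paper's exact derivation:

```python
import math

def effective_entering_angle(r, ap):
    """Effective entering angle (degrees) when the nose radius r
    is at least the depth of cut ap, so the engaged edge lies on
    the nose arc: kappa = arccos((r - ap) / r)."""
    if ap > r:
        raise ValueError("relation assumes ap <= r")
    return math.degrees(math.acos((r - ap) / r))
```

For r = 2 mm and a_p = 1 mm the angle is 60°, while a_p = r gives 90°: as the radius grows relative to the depth of cut, the effective entering angle shrinks, consistent with the experimental conclusion above.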

12.
This paper proposes a sliding window approach whose length and time shift are dynamically adapted to improve model confidence, speed, and segmentation accuracy in human action sequences. Activity recognition is the process of inferring an action class from a set of observations acquired by sensors. We address the temporal segmentation of body-part trajectories in Cartesian space, with features generated using the Discrete Fast Fourier Transform (DFFT) and Power Spectrum (PS), and pose it as an entropy minimization problem. Using the entropy of the classifier output as a feedback parameter, we continuously adjust the two key parameters of the sliding window to maximize model confidence at every step. The classifier is a Dynamic Bayesian Network (DBN) in which classes are estimated using Bayesian inference. We compare our approach with our previously developed fixed-window method. Experiments show that our method accurately recognizes and segments activities, with improved model confidence and faster convergence times, and exhibits anticipatory capabilities. Our work demonstrates that entropy feedback mitigates variability problems, and our method is applicable in research areas where action segmentation and classification are used. The source code of a working demo is available online for academic dissemination by request to the authors.
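The entropy-feedback loop can be sketched as follows; the thresholds and the doubling/halving policy are illustrative assumptions, not the paper's exact update rule:

```python
import math

def entropy(posterior):
    """Shannon entropy (bits) of a class posterior distribution."""
    return -sum(p * math.log2(p) for p in posterior if p > 0)

def adapt_window(length, posterior, low=0.5, high=1.5, lo=8, hi=128):
    """Adjust the sliding-window length from classifier entropy:
    high entropy (uncertain) widens the window for more context,
    low entropy (confident) shrinks it for finer segmentation."""
    h = entropy(posterior)
    if h > high:
        return min(hi, length * 2)
    if h < low:
        return max(lo, length // 2)
    return length
```

Applied at every step, this keeps the window just large enough for a confident decision, which is the intuition behind the faster convergence reported above.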

13.
IEEE Software, 1992, 9(1): 83-85
The author examines a concept called source-code escrow, in which a third party holds the source code in trust until and unless certain conditions are fulfilled. Escrow has recently been touted as a good compromise to the problem that software vendors often refuse to provide source code to their users, who typically want it for security. The author discusses the issues involved in deciding when and under what circumstances customers get a vendor's source code, and concludes that escrow can protect both the user's and the vendor's concerns about source-code access as long as it is handled carefully.

14.
Packet classification is widely used in many network services, and HiCuts is the most representative multi-dimensional packet-classification algorithm. Because rule sets are unevenly distributed, simply cutting a field into equal partitions rarely separates the rules into different nodes, so the depth of the decision tree grows sharply and both the time and space efficiency of lookups degrade badly. Extensive statistical analysis shows that the rule fields are not uniformly distributed over their value ranges. Accordingly, an N-HiCuts algorithm based on non-uniform cutting is proposed on top of HiCuts for building the decision tree: unevenly distributed fields are cut non-uniformly according to statistical rules, while evenly distributed fields are cut with an equal-partition function, improving the efficiency of each cut of the rule set. Experiments show that the overall performance of the algorithm is considerably improved.
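The non-uniform cutting idea, placing cut points where rules actually concentrate rather than at equal spacing, can be sketched as quantile-based cut selection; this is a deliberate simplification of whatever statistical rule the paper applies:

```python
def quantile_cuts(endpoints, n_cuts):
    """Place n_cuts cut points at empirical quantiles of the rule
    endpoints on one field, so dense regions get finer cuts."""
    vals = sorted(endpoints)
    return [vals[len(vals) * (i + 1) // (n_cuts + 1)] for i in range(n_cuts)]
```

On a uniform field this degenerates to equal partitioning, while on a skewed field the cut points crowd into the dense region, which is exactly the behavior that keeps the decision tree shallow.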

15.
In this paper we introduce four scenario-cluster-based Lagrangian decomposition procedures for obtaining strong lower bounds on the (optimal) solution value of two-stage stochastic mixed 0-1 problems. At each iteration of the Lagrangian procedures, the traditional aim is to obtain the solution value of the corresponding Lagrangian dual by solving scenario submodels once the nonanticipativity constraints have been dualized. Instead of considering a splitting-variable representation over the set of scenarios, we propose to decompose the model into a set of scenario clusters. We compare the computational performance of four Lagrange multiplier updating procedures, namely the Subgradient Method, the Volume Algorithm, the Progressive Hedging Algorithm, and the Dynamic Constrained Cutting Plane scheme, for different numbers of scenario clusters and different dimensions of the original problem. Our computational experience shows that the cluster-based Lagrangian decomposition bound and its computational effort depend on the number of scenario clusters considered. In any case, our results show that the cluster-based procedures outperform the traditional single-scenario Lagrangian decomposition scheme in both the quality of the bounds and computational effort. All the procedures have been implemented in an experimental C++ code. A broad computational experience is reported on a testbed of randomly generated instances, using the MIP solvers COIN-OR (2010, [18]) and CPLEX (2009, [17]) for the auxiliary mixed 0-1 cluster submodels, the latter run within the open-source COIN-OR engine. We also give computational evidence of the model-tightening effect that preprocessing techniques, cut generation and appending, and parallel computing tools have in stochastic integer optimization. Finally, we observed that the plain use of both solvers provides the optimal solution, in affordable elapsed time, for only two toy instances in the testbed, whereas the proposed procedures provide strong lower bounds (or the same solution value) in considerably shorter elapsed time than the quasi-optimal solutions obtained by other means for the original stochastic problem.
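The subgradient variant can be illustrated on a toy two-scenario problem, min (x1−1)² + (x2−3)² subject to the nonanticipativity constraint x1 = x2, whose dualization separates into one subproblem per scenario. The problem data and step size are invented for illustration:

```python
def solve_scenarios(lam):
    """Minimize (x1-1)^2 + lam*x1 and (x2-3)^2 - lam*x2 separately."""
    x1 = 1 - lam / 2          # stationary point of scenario 1's subproblem
    x2 = 3 + lam / 2          # stationary point of scenario 2's subproblem
    return x1, x2

def subgradient_method(steps=200, step_size=0.5):
    """Dual ascent on the multiplier of the dualized constraint x1 = x2."""
    lam = 0.0
    for _ in range(steps):
        x1, x2 = solve_scenarios(lam)
        g = x1 - x2           # subgradient: violation of nonanticipativity
        lam += step_size * g
    x1, x2 = solve_scenarios(lam)
    dual = (x1 - 1) ** 2 + lam * x1 + (x2 - 3) ** 2 - lam * x2
    return lam, dual
```

For this convex toy there is no duality gap: the multiplier converges to −2, both scenario copies agree at x1 = x2 = 2, and the dual value 2 matches the primal optimum. With mixed 0-1 submodels, as in the paper, the same machinery yields a lower bound rather than the exact optimum.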

16.
Relational information, such as inheritance, aggregation, composition, dependency, invocation, and instance creation, is the most important class of information reflecting code structure and semantics. To better support the understanding and reuse of open-source code, a code-repository construction method based on UML2 relations is proposed. Using a graph database as the implementation platform and the classic abstract syntax tree from language engineering as its foundation, the method designs a semantically rich property-graph data model for Java code, tailored to the features and mechanisms of the Java language, and on this basis persists the graph structure of Java code. To hide the differences among programming-language communities in interpreting relational information in code, the relation types and semantics defined in the UML 2.4 international standard are adopted, and corresponding relation-extraction algorithms are designed to add relation edges between graph nodes. The growth of the graph representation of the code and the storage consumption of the repository were evaluated experimentally on nine common open-source projects. Finally, example queries against the repository are presented.

17.
Under a given key-update communication overhead, efficient and secure multicast key management based on hybrid one-way function trees is studied. The hybrid scheme partitions a group of N members into clusters of M members each and places each cluster at a leaf of the key-management tree. The group controller's minimum key-storage overhead is expressed as a constrained optimization problem in the cluster size, which is then transformed into a fixed-point equation in M. When the key-update communication overhead is constrained to O(log N), the largest root of the fixed-point equation is shown to be the optimal cluster size, which makes the minimum key-storage overhead of the hybrid tree O(N/log N). An algorithm is also designed for constructing a hybrid one-way function tree with minimum storage overhead.
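The reduction to a fixed-point equation M = g(M) can be solved numerically by plain iteration when g is a contraction; the example g below (cosine) is only a stand-in for the scheme's actual cluster-size equation:

```python
import math

def fixed_point(g, x0, tol=1e-12, max_iter=10_000):
    """Iterate x <- g(x) until successive values agree within tol."""
    x = x0
    for _ in range(max_iter):
        nx = g(x)
        if abs(nx - x) < tol:
            return nx
        x = nx
    raise RuntimeError("no convergence: g may not be a contraction here")

# Example: the classic fixed point of cosine, x = cos(x) ~ 0.739085
```

When the equation has several roots, as the abstract's "largest root" suggests, one would restart the iteration (or a bracketing root finder) from different initial points and keep the largest solution found.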

18.
By learning from cutting-area design data and actual production scaling records, mean DBH, mean tree height, residual density, and stand volume were chosen as input neurons. The factors affecting the learning efficiency and prediction accuracy of a BP network were analyzed, and the BP network model for predicting merchantable-timber outturn rates was optimized mainly with respect to the number of hidden-layer neurons, the number of training iterations, the hidden-layer activation function, and the number of training samples, yielding an artificial neural network model for predicting empirical stand outturn rates by timber assortment. This provides a new approach to compiling empirical assortment outturn-rate tables.

19.
Software repositories hold applications that are often categorized to improve the effectiveness of various maintenance tasks. Properly categorized applications allow stakeholders to identify requirements related to their applications and predict maintenance problems in software projects. Manual categorization is expensive, tedious, and laborious, which is why automatic categorization approaches are gaining widespread importance. Unfortunately, for various legal and organizational reasons the applications' source code is often not available, making it difficult to categorize them automatically. In this paper, we propose a novel approach that uses Application Programming Interface (API) calls from third-party libraries to automatically categorize the software applications that use them. Our approach is general: it enables different categorization algorithms to be applied to repositories that contain both source code and bytecode, since API calls can be extracted from either. We compare our approach to a state-of-the-art approach that uses machine learning algorithms for software categorization, and conduct experiments on two large Java repositories: an open-source repository containing 3,286 projects and a closed-source repository with 745 applications for which source code was not available. Our contribution is twofold: we propose a new approach that makes it possible to categorize software projects without any source code, using a small number of API calls as attributes, and we carry out a comprehensive empirical evaluation of automatic categorization approaches.
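The API-calls-as-attributes idea can be sketched as bag-of-API-call sets compared by Jaccard similarity against labeled examples, a deliberately simpler stand-in for the machine-learning algorithms evaluated in the paper; the API names below are invented for illustration:

```python
def categorize(api_calls, labeled):
    """Assign the category of the most similar labeled application.

    api_calls: set of API call names extracted from source or bytecode
    labeled:   list of (category, set_of_api_calls) reference examples
    """
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    # nearest labeled neighbor under Jaccard similarity
    return max(labeled, key=lambda kv: jaccard(api_calls, kv[1]))[0]
```

Because the attributes are just call names, the same feature extraction works whether the calls are read from source files or decompiled from bytecode, which is the key property the approach relies on.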

20.
王冬  赵同林 《数字社区&智能家居》2009,(11):8829-8830,8833
By learning from cutting-area design data and actual production scaling records, mean DBH, mean tree height, residual density, and stand volume were chosen as input neurons. The factors affecting the learning efficiency and prediction accuracy of a BP network were analyzed, and the BP network model for predicting merchantable-timber outturn rates was optimized mainly with respect to the number of hidden-layer neurons, the number of training iterations, the hidden-layer activation function, and the number of training samples, yielding an artificial neural network model for predicting empirical stand outturn rates by timber assortment. This provides a new approach to compiling empirical assortment outturn-rate tables.
