1.

Studying software logging using topic models

Heng Li Tse-Hsun Chen Weiyi Shang Ahmed E. Hassan 《Empirical Software Engineering》2018,23(5):2655-2694

Software developers insert logging statements in their source code to record important runtime information; such logged information is valuable for understanding system usage in production and debugging system failures. However, providing proper logging statements remains a manual and challenging task. Missing an important logging statement may increase the difficulty of debugging a system failure, while too much logging can increase system overhead and mask the truly important information. Intuitively, the actual functionality of a software component is one of the major drivers behind logging decisions. For instance, a method maintaining network communications is more likely to be logged than getters and setters. In this paper, we used automatically computed topics of a code snippet to approximate its functionality. We studied the relationship between the topics of a code snippet and the likelihood of that snippet being logged (i.e., containing a logging statement). Our driving intuition is that certain topics in the source code are more likely to be logged than others. To validate our intuition, we conducted a case study on six open source systems, and we found that i) there exist a small number of “log-intensive” topics that are more likely to be logged than other topics; ii) each pair of the studied systems shares 12% to 62% common topics, and the likelihood of logging such common topics has a statistically significant correlation of 0.35 to 0.62 among all the studied systems; and iii) our topic-based metrics help explain the likelihood of a code snippet being logged, providing an improvement of 3% to 13% on AUC and 6% to 16% on balanced accuracy over a set of baseline metrics that capture the structural information of a code snippet. Our findings highlight that topics contain valuable information that can help guide and drive developers’ logging decisions.

2.

Studying software evolution using artefacts’ shared information content

Tom Arbuckle 《Science of Computer Programming》2011,76(12):1078-1097

To study software evolution, it is necessary to measure artefacts representative of project releases. If we consider the process of software evolution to be copying with subsequent modification, then, by analogy, placing emphasis on what remains the same between releases will lead to focusing on similarity between artefacts. At the same time, software artefacts, stored digitally as binary strings, are all information. This paper introduces a new method for measuring software evolution in terms of artefacts’ shared information content. A similarity value representing the quantity of information shared between artefact pairs is produced using a calculation based on Kolmogorov complexity. Similarity values for releases are then collated over the software’s evolution to form a map quantifying change through lack of similarity. The method has general applicability: it can disregard otherwise salient software features such as programming paradigm, language or application domain because it considers software artefacts purely in terms of the mathematically justified concept of information content. Three open-source projects are analysed to show the method’s utility. Preliminary experiments on udev and git verify the measurement of the projects’ evolutions. An experiment on ArgoUML validates the measured evolution against experimental data from other studies.
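As an illustration of the kind of calculation this abstract describes, the sketch below computes the normalized compression distance (NCD), a standard computable approximation of similarity based on Kolmogorov complexity. The abstract does not give the paper's exact formula, so the use of NCD, the choice of zlib as the compressor, and the helper names `compressed_size`, `ncd` and `similarity_map` are all assumptions made for this example.

```python
import zlib


def compressed_size(data):
    """Length of the zlib-compressed bytes: a computable stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x, y):
    """Normalized compression distance: small for similar inputs, close to 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def similarity_map(releases):
    """Pairwise NCD matrix over a sequence of release artefacts (each given as bytes)."""
    return [[ncd(a, b) for b in releases] for a in releases]


# Hypothetical artefacts: one release, a lightly modified successor, and unrelated data.
release_1 = b"".join(b"def f%d(x): return x + %d\n" % (i, i) for i in range(200))
release_2 = release_1 + b"def g(x): return x * 2\n"
unrelated = b"".join(b"<row id='%d'><v>%d</v></row>\n" % (i * 37 % 1000, i * i) for i in range(200))

matrix = similarity_map([release_1, release_2, unrelated])
```

In this toy setup the distance between the two consecutive releases should be much smaller than the distance to the unrelated data. Note that zlib only looks back 32 KB, so artefacts of realistic size would need a compressor with a larger window.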

3.

Tracking the evolution of social emotions with topic models

Chen Zhu Hengshu Zhu Yong Ge Enhong Chen Qi Liu Tong Xu Hui Xiong 《Knowledge and Information Systems》2016,47(3):517-544

4.

Static test case prioritization using topic models

Stephen W. Thomas Hadi Hemmati Ahmed E. Hassan Dorothea Blostein 《Empirical Software Engineering》2014,19(1):182-212

Software development teams use test suites to test changes to their source code. In many situations, the test suites are so large that executing every test for every source code change is infeasible, due to time and resource constraints. Development teams need to prioritize their test suite so that as many distinct faults as possible are detected early in the execution of the test suite. We consider the problem of static black-box test case prioritization (TCP), where test suites are prioritized without the availability of the source code of the system under test (SUT). We propose a new static black-box TCP technique that represents test cases using a previously unused data source in the test suite: the linguistic data of the test cases, i.e., their identifier names, comments, and string literals. Our technique applies a text analysis algorithm called topic modeling to the linguistic data to approximate the functionality of each test case, allowing our technique to give high priority to test cases that test different functionalities of the SUT. We compare our proposed technique with existing static black-box TCP techniques in a case study of multiple real-world open source systems: several versions of Apache Ant and Apache Derby. We find that our static black-box TCP technique outperforms existing static black-box TCP techniques, and has comparable or better performance than two existing execution-based TCP techniques. Static black-box TCP methods are widely applicable because the only input they require is the source code of the test cases themselves. This contrasts with other TCP techniques which require access to the SUT runtime behavior, to the SUT specification models, or to the SUT source code.
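The idea of giving high priority to test cases that cover different functionalities can be sketched as a greedy farthest-first ordering over per-test topic vectors. The abstract does not spell out the algorithm, so everything below is an assumption for illustration: the Manhattan distance, the greedy rule, the function names `manhattan` and `prioritize`, and the hand-written topic vectors (a real pipeline would obtain them from a topic model fitted to identifiers, comments, and string literals).

```python
def manhattan(u, v):
    """Manhattan (L1) distance between two topic-membership vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def prioritize(topic_vectors):
    """Order test names so that each pick is as far as possible from the tests already picked.

    Ties are broken by input order, because max() returns the first maximal item.
    """
    remaining = list(topic_vectors)
    # Seed with the test that is farthest, in total, from all other tests.
    first = max(remaining, key=lambda t: sum(
        manhattan(topic_vectors[t], topic_vectors[o]) for o in remaining))
    order = [first]
    remaining.remove(first)
    while remaining:
        # Farthest-first: maximise the distance to the nearest already-prioritised test.
        nxt = max(remaining, key=lambda t: min(
            manhattan(topic_vectors[t], topic_vectors[p]) for p in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical topic memberships for four test cases over three topics.
vectors = {
    "t1": [1.0, 0.0, 0.0],
    "t2": [0.75, 0.25, 0.0],  # near-duplicate of t1
    "t3": [0.0, 0.0, 1.0],
    "t4": [0.0, 0.5, 0.5],
}
order = prioritize(vectors)
```

With these vectors the near-duplicate test `t2` is scheduled last, since the functionality it exercises is already approximated by `t1`.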

5.

Modeling topic control to detect influence in conversations using nonparametric topic models

Viet-An Nguyen Jordan Boyd-Graber Philip Resnik Deborah A. Cai Jennifer E. Midberry Yuanxin Wang 《Machine Learning》2014,95(3):381-421

Identifying influential speakers in multi-party conversations has been the focus of research in communication, sociology, and psychology for decades. It has long been acknowledged qualitatively that controlling the topic of a conversation is a sign of influence. To capture who introduces new topics in conversations, we introduce SITS (Speaker Identity for Topic Segmentation), a nonparametric hierarchical Bayesian model that is capable of discovering (1) the topics used in a set of conversations, (2) how these topics are shared across conversations, (3) when these topics change during conversations, and (4) a speaker-specific measure of “topic control”. We validate the model via evaluations using multiple datasets, including work meetings, online discussions, and political debates. Experimental results confirm the effectiveness of SITS in both intrinsic and extrinsic evaluations.

6.

MSR4SM: Using topic models to effectively mining software repositories for software maintenance tasks

《Information and Software Technology》2015

Context: Mining software repositories has emerged as a research direction over the past decade, achieving substantial success in both research and practice to support various software maintenance tasks. Software repositories include bug repositories, communication archives, source control repositories, etc. When using these repositories to support software maintenance, inclusion of irrelevant information in each repository can lead to decreased effectiveness or even wrong results. Objective: This article aims at selecting the relevant information from each of the repositories to improve the effectiveness of software maintenance tasks. Method: For a maintenance task at hand, maintainers need to implement the maintenance request on the current system. In this article, we propose an approach, MSR4SM, to extract the relevant information from each software repository based on the maintenance request and the current system. That is, if the information in a software repository is relevant to either the maintenance request or the current system, this information should be included to perform the current maintenance task. MSR4SM uses the topic model to extract the topics from these software repositories. Then, relevant information in each software repository is extracted based on the topics. Results: MSR4SM is evaluated for two software maintenance tasks, feature location and change impact analysis, which are based on four subject systems, namely jEdit, ArgoUML, Rhino and KOffice. The empirical results show that the effectiveness of traditional software-repository-based maintenance tasks can be greatly improved by MSR4SM. Conclusions: There is a lot of irrelevant information in software repositories. Before we use them to implement a maintenance task at hand, we need to preprocess them. Then, the effectiveness of the software maintenance tasks can be improved.

7.

A survey on the use of topic models when mining software repositories

Tse-Hsun Chen Stephen W. Thomas Ahmed E. Hassan 《Empirical Software Engineering》2016,21(5):1843-1919

Researchers in software engineering have attempted to improve software development by mining and analyzing software repositories. Since the majority of the software engineering data is unstructured, researchers have applied Information Retrieval (IR) techniques to help software development. The recent advances of IR, especially statistical topic models, have helped make sense of unstructured data in software repositories even more. However, even though there are hundreds of studies on applying topic models to software repositories, there is no study that shows how the models are used in the software engineering research community, and which software engineering tasks are being supported through topic models. Moreover, since the performance of these topic models is directly related to the model parameters and usage, knowing how researchers use the topic models may also help future studies make optimal use of such models. Thus, we surveyed 167 articles from the software engineering literature that make use of topic models. We find that i) most studies centre around a limited number of software engineering tasks; ii) most studies use only basic topic models; and iii) researchers usually treat topic models as black boxes without fully exploring their underlying assumptions and parameter values. Our paper provides a starting point for new researchers who are interested in using topic models, and may help new researchers and practitioners determine how to best apply topic models to a particular software engineering task.

8.

Estimating software readiness using predictive models

Tong-Seng Quah 《Information Sciences》2009,179(4):430-5009

In this study, defect tracking is used as a proxy method to predict software readiness. The number of remaining defects in an application under development is one of the most important factors that allow one to decide if a piece of software is ready to be released. By comparing the predicted number of faults with the number of faults discovered in testing, a software manager can decide whether the software is likely ready to be released or not. The predictive model developed in this research can predict: (i) the number of faults (defects) likely to exist, (ii) the estimated number of code changes required to correct a fault and (iii) the estimated amount of time (in minutes) needed to make the changes in the respective classes of the application. The model uses product metrics as independent variables to make predictions. These metrics are selected depending on the nature of the source code with regard to architecture layers, types of faults and contribution factors of these metrics. The use of a neural network model with a genetic training strategy is introduced to improve prediction results for estimating software readiness in this study. This genetic-net combines a genetic algorithm with a statistical estimator to produce a model which also shows the usefulness of inputs. The model is divided into three parts: (1) a prediction model for the presentation logic tier, (2) a prediction model for the business tier and (3) a prediction model for the data access tier. Existing object-oriented metrics and complexity software metrics are used in the business tier prediction model. New sets of metrics have been proposed for the presentation logic tier and data access tier. These metrics are validated using data extracted from real world applications. The trained models can be used as tools to assist software managers in making software release decisions.

9.

Studying just-in-time defect prediction using cross-project models

Yasutaka Kamei Takafumi Fukushima Shane McIntosh Kazuhiro Yamashita Naoyasu Ubayashi Ahmed E. Hassan 《Empirical Software Engineering》2016,21(5):2072-2106

Unlike traditional defect prediction models that identify defect-prone modules, Just-In-Time (JIT) defect prediction models identify defect-inducing changes. As such, JIT defect models can provide earlier feedback for developers, while design decisions are still fresh in their minds. Unfortunately, similar to traditional defect models, JIT models require a large amount of training data, which is not available when projects are in initial development phases. To address this limitation in traditional defect prediction, prior work has proposed cross-project models, i.e., models learned from other projects with sufficient history. However, cross-project models have not yet been explored in the context of JIT prediction. Therefore, in this study, we empirically evaluate the performance of JIT models in a cross-project context. Through an empirical study on 11 open source projects, we find that while JIT models rarely perform well in a cross-project context, their performance tends to improve when using approaches that: (1) select models trained using other projects that are similar to the testing project, (2) combine the data of several other projects to produce a larger pool of training data, and (3) combine the models of several other projects to produce an ensemble model. Our findings empirically confirm that JIT models learned using other projects are a viable solution for projects with limited historical data. However, JIT models tend to perform best in a cross-project context when the data used to learn them are carefully selected.

10.

Prediction of software reliability using connectionist models

Karunanithi N. Whitley D. Malaiya Y.K. 《IEEE Transactions on Software Engineering》1992,18(7):563-574

The usefulness of connectionist models for software reliability growth prediction is illustrated. The applicability of the connectionist approach is explored using various network models, training regimes, and data representation methods. An empirical comparison is made between this approach and five well-known software reliability growth models using actual data sets from several different software projects. The results presented suggest that connectionist models may adapt well across different data sets and exhibit a better predictive accuracy. The analysis shows that the connectionist approach is capable of developing models of varying complexity.

11.

Evolution styles: foundations and models for software architecture evolution

Jeffrey M. Barnes David Garlan Bradley Schmerl 《Software and Systems Modeling》2014,13(2):649-678

As new market opportunities, technologies, platforms, and frameworks become available, systems require large-scale and systematic architectural restructuring to accommodate them. Today’s architects have few techniques to help them plan this architecture evolution. In particular, they have little assistance in planning alternative evolution paths, trading off various aspects of the different paths, or knowing best practices for particular domains. In this paper, we describe an approach for planning and reasoning about architecture evolution. Our approach focuses on providing architects with the means to model prospective evolution paths and supporting analysis to select among these candidate paths. To demonstrate the usefulness of our approach, we show how it can be applied to an actual architecture evolution. In addition, we present some theoretical results about our evolution path constraint specification language.

12.

Developing flexible productivity measurement models using spreadsheet software

《Computers & Industrial Engineering》1988,14(2):161-170

This paper discusses the development of a series of interactive computer models for measuring productivity. Using Lotus 1-2-3, a series of flexible models was developed that can easily be modified to fit the productivity measurement system used by most companies. Rather than force the company's productivity measurement system to fit an available computer model, a company can now tailor the computer model to exactly fit its productivity measurement system.

13.

Additive regularization of topic models

Konstantin Vorontsov Anna Potapenko 《Machine Learning》2015,101(1-3):303-323

14.

A PSO-based parameter estimation method for software reliability models

张克涵 李爱国 宋保维 《计算机工程与应用》2008,44(11):47-49

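Software reliability modeling is an important research area. Existing software reliability models are essentially nonlinear functions, which makes their parameters difficult to estimate. Particle swarm optimization (PSO) is a class of stochastic optimization methods well suited to nonlinear optimization problems. This paper proposes a PSO-based method for estimating the parameters of software reliability models; the key to the method is constructing a suitable fitness function. The method is used to estimate the parameters of the exponential software reliability model and the logarithmic Poisson execution time model for five real software systems. The experimental results show that the method estimates parameters with high accuracy and adapts well to different models.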
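A minimal sketch of PSO-based parameter estimation for an exponential reliability growth model is given below. It is not the paper's method: the abstract does not state its fitness function, so a sum of squared errors is assumed, and the model form a(1 - exp(-bt)), the swarm settings, the parameter bounds, and the synthetic data are all illustrative choices.

```python
import math
import random


def exponential_model(t, a, b):
    """Expected cumulative failures by time t under the assumed form a * (1 - exp(-b * t))."""
    return a * (1.0 - math.exp(-b * t))


def fitness(params, times, observed):
    """Assumed fitness: sum of squared errors between model and data (lower is better)."""
    a, b = params
    return sum((exponential_model(t, a, b) - y) ** 2 for t, y in zip(times, observed))


def pso_fit(times, observed, bounds, n_particles=30, n_iters=200, seed=0):
    """Global-best PSO; returns the best parameters and the best fitness after each iteration."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive and social weights (illustrative values)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, times, observed) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp each coordinate to its search interval.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            f = fitness(pos[i], times, observed)
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
        history.append(gbest_fit)
    return gbest, history


# Synthetic, noise-free failure counts generated from assumed parameters a=100, b=0.1.
times = list(range(1, 21))
observed = [exponential_model(t, 100.0, 0.1) for t in times]
bounds = [(1.0, 500.0), (0.001, 1.0)]
best, history = pso_fit(times, observed, bounds)
```

The same loop can be reused for another model, such as the logarithmic Poisson execution time model, by swapping the model function inside `fitness`; only the fitness function depends on the model, which is why the abstract calls its construction the key step.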

15.

Solving models with inequalities using standard econometric software

Arie ten Cate 《Computational statistics & data analysis》2009,53(6):2055-2060

Simultaneous econometric models may contain pairs of complementary inequalities. It is discussed how to reformulate such models and solve them with econometric software which can handle only equalities. Two approaches are applied: the normal map representation and the Fischer-Burmeister NCP function. The latter seems to work best. The software programs TSP, SAS/ETS and EViews are tested. The test model describes two markets for electricity, each with fluctuating demand and an endogenous production capacity; the capacity of the trade link between the regions is also endogenous.
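For readers unfamiliar with the second approach: the Fischer-Burmeister function is phi(a, b) = sqrt(a^2 + b^2) - a - b, and it equals zero exactly when a >= 0, b >= 0 and a*b = 0. A complementary pair of inequalities can therefore be replaced by the single equation phi(a, b) = 0, which equality-only software can handle. The Python sketch below only demonstrates that property; the function name and the electricity-market reading in the comments are illustrative assumptions, not the paper's model.

```python
import math


def fischer_burmeister(a, b):
    """phi(a, b) = sqrt(a^2 + b^2) - a - b, which is zero iff a >= 0, b >= 0 and a * b == 0."""
    return math.hypot(a, b) - a - b


# Illustrative reading: a = unused capacity, b = shadow price of the capacity constraint.
slack_constraint = fischer_burmeister(2.0, 0.0)    # spare capacity, zero price: complementary
binding_constraint = fischer_burmeister(0.0, 3.0)  # no spare capacity, positive price: complementary
both_positive = fischer_burmeister(1.0, 1.0)       # violates a * b = 0, so phi is nonzero
negative_slack = fischer_burmeister(-1.0, 0.0)     # violates a >= 0, so phi is nonzero
```

One practical point when handing phi to a general equation solver: the function is not differentiable at a = b = 0.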

16.

Syndromic surveillance of Flu on Twitter using weakly supervised temporal topic models

Liangzhe Chen K. S. M. Tozammel Hossain Patrick Butler Naren Ramakrishnan B. Aditya Prakash 《Data mining and knowledge discovery》2016,30(3):681-710

Surveillance of epidemic outbreaks and spread from social media is an important tool for governments and public health authorities. Machine learning techniques for nowcasting the Flu have made significant inroads into correlating social media trends to case counts and prevalence of epidemics in a population. There is a disconnect between data-driven methods for forecasting Flu incidence and epidemiological models that adopt a state-based understanding of transitions, which can lead to sub-optimal predictions. Furthermore, models of epidemiological activity and of social activity, such as that on Twitter, predict different shapes and have important differences. In this paper, we propose two temporal topic models (one unsupervised model as well as one improved weakly-supervised model) to capture hidden states of a user from his tweets and aggregate states in a geographical region for better estimation of trends. We show that our approaches help fill the gap between phenomenological methods for disease surveillance and epidemiological models. We validate our approaches by modeling the Flu using Twitter in multiple countries of South America. We demonstrate that our models can consistently outperform plain vocabulary assessment in Flu case-count predictions, and at the same time get better Flu-peak predictions than competitors. We also show that our fine-grained modeling can reconcile some contrasting behaviors between epidemiological and social models.

17.

Research on topic mining and topic evolution in enterprise competitive intelligence

杨秀璋 武帅 夏换 于小民 范郁锋 《计算机时代》2021,(7):21-27

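This paper studies the hot topics and topic evolution trends of enterprise competitive intelligence in China, using topic mining and topic evolution methods to systematically review the research results in this field. Literature data are automatically extracted and preprocessed with Python; co-word analysis, the LDA model, and knowledge graphs are then used to mine the field's core research groups and hot topics; finally, topic evolution methods are used to trace the development of enterprise competitive intelligence. The study can provide future related exploration in the field of enterprise competitive intelligence with...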

18.

A biterm topic evolution model based on microblog text

史庆伟 刘雨诗 张丰田 《计算机应用》2017,37(5):1407-1412

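To address the problem that traditional topic models ignore both the short length of microblog texts and the dynamic evolution of text, a biterm topic evolution model based on microblog text (BToT) is proposed, and topic evolution analysis is performed on a dataset using the proposed model. BToT introduces a continuous time variable into the text generation process to describe the dynamic evolution of topics along the time dimension, and forms topic-sharing "biterm" structures within documents to enrich the features of short texts. The parameters of BToT are estimated by Gibbs sampling, and topic evolution is analyzed from the resulting topic-time distribution parameters. Validation on a real microblog dataset shows that BToT can describe the latent topic evolution patterns in the dataset and achieves lower perplexity than latent Dirichlet allocation (LDA), the biterm topic model (BTM), and the topic evolution model (ToT).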

19.

Detection of unanticipated faults for autonomous underwater vehicles using online topic models

Ben‐Yair Raanan James Bellingham Yanwu Zhang Mathieu Kemp Brian Kieft Hanumant Singh Yogesh Girdhar 《Journal of Field Robotics》2018,35(5):705-716

For robots to succeed in complex missions, they must be reliable in the face of subsystem failures and environmental challenges. In this paper, we focus on autonomous underwater vehicle (AUV) autonomy as it pertains to self‐perception and health monitoring, and we argue that automatic classification of state‐sensor data represents an important enabling capability. We apply an online Bayesian nonparametric topic modeling technique to AUV sensor data in order to automatically characterize its performance patterns, then demonstrate how in combination with operator‐supplied semantic labels these patterns can be used for fault detection and diagnosis by means of a nearest‐neighbor classifier. The method is evaluated using data collected by the Monterey Bay Aquarium Research Institute's Tethys long‐range AUV in three separate field deployments. Our results show that the proposed method is able to accurately identify and characterize patterns that correspond to various states of the AUV, and classify faults at a high rate of correct detection with a very low false detection rate.

20.

Hashtag-based topic evolution in social media

Md. Hijbul Alam Woo-Jong Ryu SangKeun Lee 《World Wide Web》2017,20(6):1527-1549

The rise of online social media has led to an explosion of metadata-containing user generated content. The tracking of metadata distribution is essential to understand social media. This paper presents two statistical models that detect interpretable topics over time along with their hashtag distribution. A topic is represented by a cluster of words that frequently occur together, and a context is represented by a cluster of hashtags, i.e., the hashtag distribution. The models combine a context with a related topic by jointly modeling words with hashtags and time. Experiments with real-world datasets demonstrate that the proposed models discover topics over time with related contexts effectively.