Similar Literature
20 similar documents found (search time: 62 ms)
1.
Microblogging sites such as Twitter and Facebook are heavily influenced by emotions, sentiments, and data in the modern era. Twitter, a widely used microblogging site where individuals share their thoughts in the form of tweets, has become a major source for sentiment analysis. In recent years, there has been a significant increase in demand for sentiment analysis to identify and classify opinions or expressions in text or tweets. Opinions or expressions of people about a particular topic, situation, person, or product can be identified from sentences and divided into three categories: positive for good, negative for bad, and neutral for mixed or confusing opinions. The process of analyzing changes in sentiment and the combination of these categories is known as "sentiment analysis." In this study, sentiment analysis was performed on a dataset of 90,000 tweets using both deep learning and machine learning methods. The deep learning-based long short-term memory (LSTM) model performed better than the machine learning approaches: LSTM achieved 87% accuracy, while the support vector machine (SVM) classifier achieved slightly worse results at 86%. The study also tested the binary positive/negative setting, where LSTM and SVM both achieved 90% accuracy.
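The three-class setup described above can be illustrated with a from-scratch multinomial Naive Bayes baseline (not the paper's LSTM or SVM, and with invented toy tweets in place of the 90,000-tweet dataset):

```python
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """samples: list of (tokens, label). Returns class priors, per-class word counts, vocab."""
    label_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    for tokens, label in samples:
        word_counts[label].update(tokens)
    vocab = {w for c in word_counts.values() for w in c}
    return label_counts, word_counts, vocab

def predict_nb(tokens, label_counts, word_counts, vocab):
    """Pick the class maximizing log prior + log likelihood with Laplace smoothing."""
    total = sum(label_counts.values())
    best, best_lp = None, float("-inf")
    for label, n in label_counts.items():
        lp = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokens:
            lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Hypothetical labeled tweets standing in for the study's dataset.
train = [
    (["great", "phone"], "positive"),
    (["love", "it"], "positive"),
    (["terrible", "battery"], "negative"),
    (["hate", "this"], "negative"),
    (["arrived", "today"], "neutral"),
]
model = train_nb(train)
print(predict_nb(["great", "phone"], *model))  # → positive
```

The same harness generalizes to the binary positive/negative setting by simply dropping the neutral training examples.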

2.
In emergencies, Twitter is an important platform for gaining situational awareness in real time. Information about Twitter users' locations is therefore fundamental to understanding disaster effects, but location extraction is a challenging task: most Twitter users do not share their location in their tweets. Various methods have been proposed for location extraction, drawing on fields such as statistics and machine learning. This study uses geo-tagged tweets to demonstrate the importance of location in disaster management through three cases. Tweets are obtained using the "earthquake" keyword to determine the location of Twitter users, and are evaluated using the Latent Dirichlet Allocation (LDA) topic model and sentiment analysis with machine learning classification algorithms, including Multinomial and Gaussian Naïve Bayes, Support Vector Machine (SVM), Decision Tree, Random Forest, Extra Trees, Neural Network, k-Nearest Neighbor (kNN), Stochastic Gradient Descent (SGD), and Adaptive Boosting (AdaBoost). In total, 10 machine learning algorithms are applied to sentiment analysis of location-specific disaster-related tweets, aiming at a fast and correct response in a disaster situation. The effectiveness of each algorithm is evaluated to identify the most suitable one, and topic extraction via LDA is provided to comprehend the situation after a disaster. Results from the three cases indicate that the Multinomial Naïve Bayes and Extra Trees algorithms give the best results, with F-measure values over 80%. The study aims to enable a quick response to earthquakes by applying the aforementioned techniques.
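The keyword-plus-geotag filtering step can be sketched as follows (the `"coordinates"` and `"text"` field names are illustrative, not the Twitter API's exact schema, and the tweets are invented):

```python
from collections import Counter

def disaster_tweets(tweets, keyword="earthquake"):
    """Keep geo-tagged tweets mentioning the keyword; tally hits per location."""
    hits = [t for t in tweets
            if t.get("coordinates") and keyword in t["text"].lower()]
    return hits, Counter(t["coordinates"] for t in hits)

tweets = [
    {"text": "Huge earthquake just hit!", "coordinates": "Izmir"},
    {"text": "Earthquake felt downtown", "coordinates": "Istanbul"},
    {"text": "Nice weather today", "coordinates": "Ankara"},
    {"text": "earthquake damage everywhere", "coordinates": None},  # not geo-tagged, dropped
]
hits, per_location = disaster_tweets(tweets)
```

The per-location tallies are the input a downstream LDA or sentiment step would consume, one batch per affected area.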

3.
Twitter is a vibrant platform offering a quick and effective way to analyze users' perceptions of activities on social media. Many researchers and industry experts have turned to Twitter sentiment analysis to recognize stakeholder groups. Sentiment analysis requires advanced approaches that encompass data sentiment analysis and various machine learning tools. This paper assesses sentiment analysis in multiple fields that affect people in real time, using Naive Bayes and Support Vector Machine (SVM) classifiers, and focuses on analyzing sentiment techniques on tweet-behaviour datasets for various spheres such as healthcare and behaviour estimation. In addition, the results explore and validate statistical machine learning classifiers and report the accuracy percentages attained for positive, negative and neutral tweets. We obtained a Twitter Application Programming Interface (API) account and implemented the sentiment analysis approach in Python as a computational measure of users' perceptions; it extracts a massive number of tweets and provides market value to the Twitter account proprietor. To distinguish the results in terms of performance evaluation, an error analysis examines the needs of various stakeholders, including social media analytics researchers, Natural Language Processing (NLP) developers, engineering managers and experts involved in decision making.
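The per-class accuracy percentages reported in work like this come down to a simple tally over classified tweets; a minimal sketch of that reporting step, with hypothetical labels:

```python
from collections import Counter

def sentiment_breakdown(labels):
    """Percentage of positive/negative/neutral labels among classified tweets."""
    counts = Counter(labels)
    total = len(labels) or 1  # guard against an empty batch
    return {k: round(100 * counts[k] / total, 1)
            for k in ("positive", "negative", "neutral")}

labels = ["positive"] * 6 + ["negative"] * 3 + ["neutral"] * 1
print(sentiment_breakdown(labels))  # {'positive': 60.0, 'negative': 30.0, 'neutral': 10.0}
```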

4.
Successful adoption and management of sustainable urban systems hinges on the community embracing these systems. Capturing citizens' ideas, views, and assessments of the built environment is essential to this goal. In collaborative city planning, these are qualified and valued forms of partial knowledge that should be collectively used to shape the decision-making process of urban planning. Among other tools, social media and online social network analytics can provide means to capture elements of such distributed knowledge. While a structured definition of sustainability (normally dictated in a top-down fashion) may not respond well to the pluralist nature of such knowledge acquisition, dealing with the unstructured community inputs, assessments and contributions on social media can be confusing: we can detect fully relevant topics/ideas in community discussions, but they typically suffer from a lack of coherence. In this paper, we advocate a semi-structured approach for capturing, analyzing, and interpreting citizens' inputs. Public officials and professionals can develop the main elements (topical aspects) of sustainability, which act as the skeleton of a taxonomy. It is, however, the community inputs/ideas (in our case collected via social media and parsed) that flesh out that skeleton and augment those topical aspects with the required semantic depth. In more specific terms, we collected tweets for four urban infrastructure mega-projects in North America. We then used a game-with-a-purpose to crowdsource the identification of topics for a training set of tweets, which was used to train machine learning algorithms to cluster the rest of the collected tweets. We studied the semantics (finding the topics) of tweets as well as their sentiment (opposing or supporting a project). Our classification tested different decision trees with different topic hierarchies.
We considered/extracted eight different linguistic features in studying the contents of a tweet. Finally, we examined the accuracy of three algorithms in classifying tweets according to the sequence in the tree and based on the extracted features: k-nearest neighbors, Naïve Bayes classifiers and Support Vector Machines (SVM). On our data set, SVM outperformed the other algorithms. Semantic analysis was insensitive to the depth/number of linguistic features considered. In contrast, sentiment analysis was enhanced when part of speech (PoS) was tracked. Interestingly, our work shows that considering the topic (semantics) of a tweet helped enhance the accuracy of sentiment analysis: including the topical class as a feature in conducting sentiment analysis results in higher accuracies. This could be used as a means to detect the evolution of community opinion, since topic-based social networks are evolving within the communities tweeting about urban projects. It could also be used to identify the topics of top priority to the community, or the ones that have the widest spread of views. In our case, these were mainly the impacts of the design and engineering features on social issues.
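The finding that including the topical class as a feature improves sentiment accuracy amounts to augmenting the linguistic feature vector with a one-hot topic indicator before classification; a sketch (the feature values and topic names are hypothetical, not the paper's actual eight features):

```python
def one_hot(topic, topics):
    """Encode a topic label as a one-hot indicator vector."""
    return [1.0 if topic == t else 0.0 for t in topics]

def augment(linguistic_features, topic, topics):
    """Append the one-hot topic indicator to the linguistic feature vector."""
    return list(linguistic_features) + one_hot(topic, topics)

TOPICS = ["design", "engineering", "social", "cost"]  # illustrative topic taxonomy
vec = augment([0.3, 1.0, 0.0], "social", TOPICS)
```

The augmented vector is what a sentiment classifier (SVM in the paper's case) would then be trained on.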

5.
Consumers increasingly rely on reviews and social media posts provided by others to get information about a service. Especially in the Sharing Economy, the quality of service delivery varies widely; no common quality standard can be expected. Because of the rapidly increasing number of reviews and tweets regarding a particular service, the available information becomes unmanageable for a single individual. However, this data contains valuable insights for platform operators to improve the service and educate individual providers. Therefore, an automated tool to summarize this flood of information is needed. Various approaches to aggregating and analyzing unstructured texts like reviews and tweets have already been proposed. In this research, we present a software toolkit that supports the sentiment analysis workflow informed by the current state-of-the-art. Our holistic toolkit embraces the entire process, from data collection and filtering to automated analysis to an interactive visualization of the results to guide researchers and practitioners in interpreting the results. We give an example of how the tool works by identifying positive and negative sentiments from reviews and tweets regarding Airbnb and delivering insights into the features of service delivery its users most value and most dislike. In doing so, we lay the foundation for learning why people participate in the Sharing Economy and for showing how to use the data. Beyond its application on the Sharing Economy, the proposed toolkit is a step toward providing the research community with an instrument for a holistic sentiment analysis of individual domains of interest.

6.
A major problem in monitoring the online reputation of companies, brands, and other entities is that entity names are often ambiguous (apple may refer to the company, the fruit, the singer, etc.). The problem is particularly hard in microblogging services such as Twitter, where texts are very short and there is little context for disambiguation. In this paper we address the filtering task of determining, out of a set of tweets that contain a company name, which ones do refer to the company. Our approach relies on the identification of filter keywords: those whose presence in a tweet reliably confirms (positive keywords) or discards (negative keywords) that the tweet refers to the company. We describe an algorithm to extract filter keywords that does not use any previously annotated data about the target company. The algorithm classifies 58% of the tweets with 75% accuracy; these can then be used to feed a machine learning algorithm to obtain a complete classification of all tweets with an overall accuracy of 73%. In comparison, a 10-fold validation of the same machine learning algorithm provides an accuracy of 85%, i.e., our unsupervised algorithm has a 14% loss with respect to its supervised counterpart. Our study also shows that (i) filter keywords for Twitter do not derive directly from the public information about the company on the Web: a manual selection of keywords from relevant web sources covers only 15% of the tweets with 86% accuracy; and (ii) filter keywords can indeed be a productive way of classifying tweets: the five best possible keywords cover, on average, 28% of the tweets for a company in our test collection.
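The positive/negative filter-keyword idea can be sketched by scoring each keyword's precision as a company indicator. Note that the paper's algorithm is unsupervised; this toy version uses labeled tweets purely to make the scoring concrete, and the example tweets are invented:

```python
from collections import Counter

def keyword_scores(tweets):
    """tweets: list of (tokens, is_company). Returns, per keyword, the fraction of
    tweets containing it that refer to the company: near 1.0 suggests a positive
    filter keyword, near 0.0 a negative one, and ~0.5 no filtering power."""
    seen, company = Counter(), Counter()
    for tokens, is_company in tweets:
        for w in set(tokens):  # count each keyword once per tweet
            seen[w] += 1
            if is_company:
                company[w] += 1
    return {w: company[w] / seen[w] for w in seen}

data = [
    (["apple", "iphone", "launch"], True),
    (["apple", "ipad", "sale"], True),
    (["apple", "pie", "recipe"], False),
    (["apple", "orchard", "pie"], False),
]
scores = keyword_scores(data)
```

As expected, the ambiguous name itself scores 0.5 while co-occurring terms like "iphone" and "pie" separate cleanly.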

7.
Blogs and social networks have recently become a valuable resource for mining sentiments in fields as diverse as customer relationship management, public opinion tracking and text filtering. In fact, knowledge obtained from social networks such as Twitter and Facebook has been shown to be extremely valuable to marketing research companies, public opinion organizations and other text mining entities. However, Web texts are noisy, posing considerable problems at both the lexical and the syntactic levels. In this research we used a random sample of 3516 tweets to evaluate consumers' sentiment towards well-known brands such as Nokia, T-Mobile, IBM, KLM and DHL. We used an expert-predefined lexicon of around 6800 seed adjectives with known orientation to conduct the analysis. Our results indicate a generally positive consumer sentiment towards several famous brands. By using both a qualitative and quantitative methodology to analyze brands' tweets, this study adds breadth and depth to the debate over attitudes towards cosmopolitan brands.
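Lexicon-based scoring with seed adjectives of known orientation reduces to summing orientations per tweet; a minimal sketch, where the six-word lexicon stands in for the study's ~6800-adjective one:

```python
# Hypothetical miniature of the expert-predefined seed-adjective lexicon:
# +1 for positive orientation, -1 for negative.
LEXICON = {"reliable": 1, "fast": 1, "great": 1,
           "slow": -1, "broken": -1, "disappointing": -1}

def brand_sentiment(tweet_tokens):
    """Sum lexicon orientations over the tweet; the sign gives its polarity."""
    score = sum(LEXICON.get(tok, 0) for tok in tweet_tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(brand_sentiment(["nokia", "phones", "are", "reliable", "and", "fast"]))  # → positive
```

Aggregating these per-tweet polarities over a brand's tweets yields the kind of overall consumer-sentiment estimate the study reports.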

8.
Deniz Kılınç, Software, 2019, 49(9):1352-1364
Many data sources produce large volumes of data, and this Big Data nature requires new distributed processing approaches to extract the valuable information. Real-time sentiment analysis is one of the most demanding research areas and requires powerful Big Data analytics tools such as Spark. Prior literature surveys have shown that, although there is much conventional sentiment analysis research, only a few works perform sentiment analysis in real time. One major point that affects the quality of real-time sentiment analysis is the trustworthiness of the generated data: it is a valuable research question to determine whether the account that generates a sentiment is genuine or not. Since data generated by fake personalities may decrease the accuracy of the outcome, a smart/intelligent service that can identify the source of the data is one of the key points in the analysis. In this context, we include a fake account detection service in the proposed framework. Both the sentiment analysis and fake account detection systems are trained and tested using the Naïve Bayes model from Apache Spark's machine learning library. The developed system consists of four integrated software components: (i) a machine learning and streaming service for sentiment prediction, (ii) a Twitter streaming service to retrieve tweets, (iii) a Twitter fake account detection service to assess the owner of a retrieved tweet, and (iv) a real-time reporting and dashboard component to visualize the results of the sentiment analysis. The sentiment classification performances of the system in offline and real-time modes are 86.77% and 80.93%, respectively.

9.
The rise in popularity of mobile devices has led to a parallel growth in the size of the app store market, prompting several research studies and commercial platforms on mining app stores. App store reviews are used to analyze different aspects of app development and evolution. However, app users' feedback exists not only on the app store: despite the large quantity of posts made daily on social media, the importance and value that these discussions provide remain mostly unused in the context of mobile app development. In this paper, we study how Twitter can provide complementary information to support mobile app development. By analyzing a total of 30,793 apps over a period of six weeks, we found strong correlations between the number of reviews and tweets for most apps. Moreover, by applying machine learning classifiers, topic modeling and subsequent crowd-sourcing, we successfully mined 22.4% additional feature requests and 12.89% additional bug reports from Twitter. We also found that 52.1% of all feature requests and bug reports were discussed in both tweets and reviews. In addition to finding common and unique information from Twitter and the app store, sentiment and content analysis were also performed for 70 randomly selected apps. From this, we found that tweets provided more critical and objective views on apps than reviews from the app store. These results show that app store review mining alone is indeed not enough; other information sources ultimately provide added value and information for app developers.

10.
Twitter has become a major tool for spreading news, disseminating positions and ideas, and commenting on and analyzing current world events. However, with more than 500 million tweets flowing per day, efficient ways of collecting, storing, managing, mining and visualizing all this information are needed. This is especially relevant given that Twitter has no way of indexing tweet contents, and that the only available categorization "mechanism" is the #hashtag, which is totally dependent on a user's will to use it. This paper presents an intelligent platform and framework, named MISNIS (Intelligent Mining of Public Social Networks' Influence in Society), that addresses these issues and allows a non-technical user to easily mine a given topic from a very large tweet corpus and obtain relevant contents and indicators such as user influence or sentiment analysis. Compared to other existing similar platforms, MISNIS is an expert system that includes specifically developed intelligent techniques that: (1) circumvent the Twitter API restrictions that limit access to 1% of all flowing tweets (the platform has been able to collect more than 80% of all flowing Portuguese-language tweets in Portugal when online); and (2) intelligently retrieve most tweets related to a given topic even when the tweets contain neither the topic #hashtag nor user-indicated keywords (a 40% increase in the number of retrieved relevant tweets has been reported in real-world case studies). The platform is currently focused on Portuguese-language tweets posted in Portugal. However, most of the developed technologies are language independent (e.g. intelligent retrieval, sentiment analysis, etc.), and technically MISNIS can easily be expanded to cover other languages and locations.

11.
Zhang Sun, Yin Chunyong. Journal of Computer Applications (《计算机应用》), 2021, 41(6):1631-1639
To address the problems of unimodal feature representation and cross-modal feature fusion in temporal multimodal sentiment analysis, a sentiment analysis model based on multi-task learning is proposed, incorporating a multi-head attention mechanism. First, a convolutional neural network (CNN), a bidirectional gated recurrent unit (BiGRU) network and multi-head self-attention (MHSA) are used to represent the features of each temporal unimodal signal. Then, multi-head attention is used to achieve bidirectional cross-modal information fusion. Finally, following the idea of multi-task learning, auxiliary sentiment polarity classification and sentiment intensity regression tasks are added to improve the overall performance of the main sentiment-score regression task. Experimental results show that, compared with a multimodal factorization model, the binary classification accuracy of the proposed model improves by 7.8 percentage points on the CMU-MOSEI multimodal dataset and by 3.1 percentage points on CMU-MOSI. The model is suitable for sentiment analysis in multimodal scenarios and can provide decision support for applications such as product recommendation, stock market prediction and public opinion monitoring.

12.
The emergence of Web 2.0 has drastically altered the way users perceive the Internet, by improving information sharing, collaboration and interoperability. Micro-blogging is one of the most popular Web 2.0 applications, and related services, like Twitter, have evolved into a practical means for sharing opinions on almost all aspects of everyday life. Consequently, micro-blogging web sites have become rich data sources for opinion mining and sentiment analysis. In this direction, text-based sentiment classifiers often prove inefficient, since tweets typically do not consist of representative and syntactically consistent words, due to the imposed character limit. This paper proposes the deployment of original ontology-based techniques towards a more efficient sentiment analysis of Twitter posts. The novelty of the proposed approach is that posts are not simply characterized by a single sentiment score, as is the case with machine learning-based classifiers, but instead receive a sentiment grade for each distinct notion in the post. Overall, our proposed architecture results in a more detailed analysis of post opinions regarding a specific topic.

13.
The popularity of many social media sites has prompted both academic and practical research on the possibility of mining social media data for the analysis of public sentiment. Studies have suggested that public emotions shown through Twitter could be well correlated with the Dow Jones Industrial Average. However, it remains unclear how public sentiment, as reflected on social media, can be used to predict the stock price movement of a particular publicly listed company. In this study, we attempt to fill this research void by proposing a technique, called SMeDA-SA, to mine Twitter data for sentiment analysis and then predict the stock movement of specific listed companies. For the purpose of experimentation, we collected 200 million tweets that mentioned one or more of 30 companies listed on NASDAQ or the New York Stock Exchange. SMeDA-SA performs its task by first extracting ambiguous textual messages from these tweets to create a list of words that reflects public sentiment. SMeDA-SA then uses a data mining algorithm to expand the word list by adding emotional phrases so as to better classify sentiments in the tweets. With SMeDA-SA, we discover that the stock movement of many companies can be predicted rather accurately, with an average accuracy over 70%. This paper describes how SMeDA-SA can be used to mine social media data for sentiments. It also presents the key implications of our study.
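The word-list expansion step can be sketched as counting co-occurrences with seed sentiment words (SMeDA-SA's actual data mining algorithm is more involved; the seeds and tweets here are invented, and a real system would also filter stopwords before admitting candidates):

```python
from collections import Counter

def expand_word_list(seeds, tweets, min_cooccur=2):
    """Add to the seed set any word co-occurring with a seed in >= min_cooccur tweets."""
    cooccur = Counter()
    for tokens in tweets:
        toks = set(tokens)
        for _seed in seeds & toks:          # tweet contains at least one seed word
            for w in toks - seeds:
                cooccur[w] += 1
    return set(seeds) | {w for w, n in cooccur.items() if n >= min_cooccur}

seeds = {"bullish", "bearish"}
tweets = [
    ["feeling", "bullish", "on", "tech"],
    ["bullish", "momentum", "on", "earnings"],
    ["bearish", "on", "retail"],
]
expanded = expand_word_list(seeds, tweets)
```

In this toy run only "on" clears the threshold, which illustrates exactly why stopword filtering matters before the expanded list feeds a classifier.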

14.
Sentiment analysis is the natural language processing task dealing with sentiment detection and classification from texts. In recent years, due to the growth in the quantity and fast spreading of user-generated content online, and the impact such information has on events, people and companies worldwide, this task has attracted a substantial body of research in the field. Although different methods have been proposed for distinct types of text, the research community has concentrated less on developing methods for languages other than English. In this context, the present work studies the possibility of employing machine translation systems and supervised methods to build models able to detect and classify sentiment in languages for which fewer or no resources are available for this task compared to English, stressing the impact of translation quality on sentiment classification performance. Our extensive evaluation scenarios show that machine translation systems are approaching a good level of maturity and that, in combination with appropriate machine learning algorithms and carefully chosen features, they can be used to build sentiment analysis systems whose performance is comparable to that obtained for English.

15.
Online discussions about software applications and services that take place on web-based communication platforms represent an invaluable knowledge source for diverse software engineering tasks, including requirements elicitation. The amount of research work on developing effective tool-supported analysis methods is rapidly increasing, as part of so-called software analytics. Textual messages in app store reviews, tweets, and online discussions taking place in mailing lists and user forums are processed by combining natural language techniques to filter out irrelevant data with text mining and machine learning algorithms to classify messages into different categories, such as bug reports and feature requests. Our research objective is to exploit a linguistic technique based on speech-acts for the analysis of online discussions, with the ultimate goal of discovering requirements-relevant information. In this paper, we present a revised and extended version of the speech-acts based analysis technique, which we previously presented at CAiSE 2017, together with a detailed experimental characterisation of its properties. Datasets used in the experimental evaluation are taken from a widely used open source software project (161,120 textual comments), as well as from an industrial project in the home energy management domain, and we make them available for experiment replication purposes. On these datasets, our approach is able to successfully classify messages into Feature/Enhancement and Other, with F-measures of 0.81 and 0.84, respectively. We also found evidence that there is an association between types of speech-acts and categories of issues, and that there is correlation between some of the speech-acts and issue priority, thus motivating further research on the exploitation of our speech-acts based analysis technique in semi-automated multi-criteria requirements prioritisation.

16.
Twitter and Reddit are two of the most popular social media sites used today. In this paper, we study the use of machine learning and WordNet-based classifiers to generate an interest profile from a user's tweets and use this to recommend loosely related Reddit threads which the reader is most likely to be interested in. We introduce a genre classification algorithm that uses a similarity measure derived from the WordNet lexical database for English to label genres for nouns in tweets. The proposed algorithm generates a user's interest profile from their tweets based on a reference taxonomy of genres derived from the genre-tagged Brown Corpus augmented with a technology genre. The top K genres of a user's interest profile can then be used to recommend subreddit articles in those genres. Experiments using real-life test cases collected from Twitter were conducted to compare genre classification performance using the WordNet classifier and machine learning classifiers such as SVM, Random Forests, and an ensemble of Bayesian classifiers. Empirically, we obtained similar results from the two approaches given a sufficient number of tweets. Machine learning algorithms as well as the WordNet ontology thus appear to be viable tools for developing a recommendation engine based on genre classification. One advantage of the WordNet approach is its simplicity: no learning is required. However, the WordNet classifier tends to have poor precision for users with very few tweets.

17.
One of the most rapidly growing and emerging research areas in the information technology industry is Big Data analytics. Big Data is created on social websites like Facebook, WhatsApp, Twitter, etc., where opinions about products, persons, initiatives, political issues, research achievements, and entertainment are discussed. A single data analytics method cannot be applied across social websites, since their data formats differ. Several approaches, techniques, and tools have been used for big data analytics, opinion mining, and sentiment analysis, but accuracy has yet to be improved. The proposed work performs sentiment analysis on Twitter data for cloth products using Simulated Annealing incorporated with a Multiclass Support Vector Machine (SA-MSVM). SA-MSVM is a hybrid heuristic approach for selecting and classifying text-based sentiment words, following the Natural Language Processing (NLP) process, applied to tweets extracted from the Twitter dataset. A simulated annealing algorithm searches for relevant features and selects and identifies the sentiment terms that customers criticize. SA-MSVM is implemented and evaluated in MATLAB, and the results are verified. The results conclude that SA-MSVM has more potential for sentiment analysis and classification than the existing Support Vector Machine (SVM) approach: SA-MSVM obtained 96.34% accuracy in classifying product reviews, compared with existing systems.
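Simulated annealing for feature selection can be sketched as a random walk over feature subsets that accepts worse moves with a temperature-scaled probability. The scorer below is a stand-in for the MSVM accuracy the paper optimizes, and all feature names are illustrative:

```python
import math
import random

def simulated_annealing(features, score, steps=200, t0=1.0, seed=7):
    """Search feature subsets; worse moves are accepted with prob exp(delta / T)."""
    rng = random.Random(seed)
    current = set(features)
    cur_s = score(current)
    best, best_s = set(current), cur_s
    for step in range(1, steps + 1):
        t = t0 / step                      # cooling schedule
        candidate = set(current)
        candidate.symmetric_difference_update({rng.choice(features)})  # flip one feature
        cand_s = score(candidate)
        if cand_s >= cur_s or rng.random() < math.exp((cand_s - cur_s) / t):
            current, cur_s = candidate, cand_s
            if cur_s > best_s:
                best, best_s = set(current), cur_s
    return best, best_s

# Stand-in scorer: rewards two "useful" features, penalizes subset size.
USEFUL = {"unigrams", "negation"}
def score(subset):
    return len(subset & USEFUL) - 0.1 * len(subset)

features = ["unigrams", "negation", "emoji", "length", "hashtags"]
best, best_s = simulated_annealing(features, score)
```

In the paper's setting the scorer would be the cross-validated accuracy of an SVM trained on the candidate feature subset.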

18.
The educational disruption caused by the COVID-19 pandemic has fuelled a plethora of investments in, and uses of, educational technologies for Emergency Remote Learning (ERL). Despite the significance of online learning for ERL across most educational institutions, perceptions of online learning during this pandemic are widely mixed. This study therefore examines public perception of online learning for ERL during COVID-19. The study sample comprised 31,009 English-language tweets extracted and cleaned using the Twitter API, Python libraries and NVivo, from 10 March 2020 to 25 July 2020, using the keywords COVID-19, corona, e-learning, online learning and distance learning. Collected tweets were analysed using word frequencies of unigrams and bigrams, sentiment analysis, topic modelling, sentiment labelling, and cluster and trend analysis. The results identified both positive and negative sentiments within the dataset, along with a set of topics: learning support, COVID-19, online learning, schools, distance learning, e-learning, students, and education, which were clustered among each other. The number of daily COVID-19 related cases had a weak linear relationship with the number of online learning tweets, owing to the low number of tweets during the vacation period from April to June 2020. The number of tweets increased during the early weeks of July 2020 as a result of increasingly mixed reactions to the reopening of schools. The findings and recommendations underscore the need for educational systems, government agencies, and other stakeholders to practically implement online learning measures and strategies for ERL in the quest to reopen schools.
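The unigram/bigram frequency step can be sketched with a Counter over sliding token windows (toy tweets stand in for the 31,009-tweet sample):

```python
from collections import Counter

def ngram_freqs(tweets, n=2):
    """Count n-grams (as token tuples) across a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        tokens = text.lower().split()
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts

tweets = [
    "online learning is hard",
    "online learning during covid",
    "distance learning during covid",
]
bigrams = ngram_freqs(tweets, n=2)
print(bigrams.most_common(2))
```

Calling `ngram_freqs(tweets, n=1)` gives the unigram counts; the most common n-grams are what the study's word-frequency tables report.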

19.
Release notes are an important source of information about a new software release. Such notes contain information regarding what is new, changed, or fixed in a release. Despite the importance of release notes, they are rarely explored in the research literature, and little is known about the information they contain, e.g., its content and structure. To better understand the types of information contained in release notes, we manually analyzed 85 release notes across 15 different software systems. In our manual analysis, we identify six different types of information (e.g., caveats and addressed issues) contained in release notes. Addressed issues refer to new features, bugs, and improvements that were integrated in that particular release. We observe that most release notes list only a selected number of addressed issues (i.e., 6-26% of all addressed issues in a release). We investigated nine different factors (e.g., issue priority and type) to better understand the likelihood of an issue being listed in release notes. The investigation is conducted on eight release notes of three software systems using four machine learning techniques. Results show that certain factors, e.g., issue type, have a higher influence on the likelihood of an issue being listed in release notes. We use machine learning techniques to automatically suggest the issues to be listed in release notes. Our results show that issues listed in all release notes can be automatically determined with an average precision of 84% and an average recall of 90%. To train and build the classification models, we also explored three scenarios: (a) having the user label some issues for a release and automatically suggesting the remaining issues for that particular release, (b) using the previous release notes of the same software system, and (c) using prior releases of the current software system and the rest of the studied software systems.
Our results show that the content of release notes varies between software systems and across versions of the same software system. Nevertheless, automated techniques can provide reasonable support to the writers of such notes with little training data. Our study provides developers with empirically supported advice about release notes, instead of relying simply on ad hoc advice from online inquiries.

20.
Many tasks related to sentiment analysis rely on sentiment lexicons: lexical resources containing information about the emotional implications of words (e.g., positive or negative sentiment orientation). In this work, we present an automatic method for building lemma-level sentiment lexicons, which has been applied to obtain lexicons for English, Spanish and three other official languages of Spain. Our lexicons are multi-layered, allowing applications to trade off between the number of available words and the accuracy of the estimations. Our evaluations show high accuracy values in all cases. As a previous step to the lemma-level lexicons, we built a synset-level lexicon for English similar to SentiWordNet 3.0, one of the most used sentiment lexicons today. We made several improvements to the original SentiWordNet 3.0 building method, yielding significantly better estimations of positivity and negativity according to our evaluations. The resource containing all the lexicons, ML-SentiCon, is publicly available.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号