Similar Documents
20 similar documents found (search time: 31 ms)
1.
Feature selection and sentiment analysis are two active research areas, driven by advances in computing and the growing use of social media. High-dimensional feature sets are a key issue in sentiment analysis: they can decrease classification accuracy and make it difficult to obtain an optimal feature subset. Furthermore, most reviews from social media carry a lot of noise and irrelevant information. Therefore, this study proposes a new text-feature selection method, RSTLBO, that combines rough set theory (RST) with teaching-learning-based optimization (TLBO). The framework for developing RSTLBO comprises four stages: (1) acquiring the standard datasets (user reviews of six major U.S. airlines) used to validate the feature selection methods; (2) pre-processing the dataset by applying text processing methods from natural language processing, combined with linguistic processing techniques, to support high classification accuracy; (3) applying the RSTLBO method; and (4) using the selected features for sentiment classification with a Support Vector Machine (SVM). Results show an improvement in sentiment analysis when natural language processing is combined with linguistic processing for text processing. More importantly, the proposed RSTLBO feature selection algorithm produces improved sentiment analysis.
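The abstract does not spell out RSTLBO's internals; as an illustrative sketch of the TLBO half only, here is a minimal teacher/learner loop minimizing a toy sphere objective. The function names, population size, bounds, and objective are assumptions; the actual method operates on binary feature subsets scored via rough set theory, not a continuous objective.

```python
import numpy as np

def tlbo_minimize(f, dim, pop=20, iters=100, lo=-5.0, hi=5.0, seed=0):
    """Minimal teaching-learning-based optimization (TLBO) sketch."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, (pop, dim))
    fit = np.apply_along_axis(f, 1, X)
    for _ in range(iters):
        # Teacher phase: pull learners toward the current best solution
        teacher = X[fit.argmin()]
        TF = rng.integers(1, 3)  # teaching factor: 1 or 2
        new = np.clip(X + rng.random((pop, dim)) * (teacher - TF * X.mean(axis=0)), lo, hi)
        nf = np.apply_along_axis(f, 1, new)
        better = nf < fit
        X[better], fit[better] = new[better], nf[better]
        # Learner phase: each learner moves relative to a random peer
        for i in range(pop):
            j = int(rng.integers(pop))
            if j == i:
                continue
            step = (X[i] - X[j]) if fit[i] < fit[j] else (X[j] - X[i])
            cand = np.clip(X[i] + rng.random(dim) * step, lo, hi)
            cf = f(cand)
            if cf < fit[i]:
                X[i], fit[i] = cand, cf
    return X[fit.argmin()], fit.min()

best, val = tlbo_minimize(lambda x: float(np.sum(x ** 2)), dim=5)
```

TLBO is parameter-light (no crossover or mutation rates), which is part of its appeal for wrapper-style feature selection.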

2.
The localization of clinically important points in brain images is crucial for many neurological studies. Conventional manual landmark annotation requires expertise and is often time-consuming. In this work, we propose an automatic approach for interest-point localization in brain images using a landmark-annotated atlas (LAA). The landmark detection procedure is formulated as a problem of finding corresponding points of the atlas. The LAA is constructed from a set of brain images with clinically relevant landmarks annotated. It provides not only the spatial information of the interest points of the brain but also the optimal features for landmark detection through a learning process. Evaluation was performed on 3D magnetic resonance (MR) data using cross-validation. The results demonstrate that the proposed method achieves an accuracy of ~2 mm, which outperforms traditional methods such as the block matching technique and direct image registration. © 2012 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 22, 145–152, 2012

3.
In this article, we describe a process for applying sentiment analysis to accelerate the identification of topics and trends in datasets of critical incident responses. Using our work applying the critical incident technique (CIT) at two hospital systems as a case study, we propose a practical methodology to identify trends in narrative data using emotional content assessment and data visualization. An overall framework is provided for engineering managers that is generalizable to other large narrative datasets, such as consumer complaints or survey results, for identifying the patterns in narrative responses more quickly and economically than by manual reviews.

4.
Feedback collection and analysis have long been an important subject. Traditional techniques for student-feedback analysis are based on questionnaire-based data collection and analysis. However, students also express their opinions on online social media sites, and this feedback needs to be analyzed. This study develops a fuzzy-based sentiment analysis system for analyzing student feedback and satisfaction by assigning proper sentiment scores to opinion words and polarity shifters present in the input reviews. Our technique computes the sentiment score of student feedback reviews and then applies a fuzzy-logic module to analyze and quantify students' satisfaction at a fine-grained level. The experimental results reveal that the proposed work outperforms the baseline studies as well as state-of-the-art machine learning classifiers.

5.
The semantic gap problem in image retrieval has motivated much work on automatic image annotation, aimed at enabling computers to automatically assign keywords to images. The basic measure for evaluating annotation performance is usually annotation accuracy: the fraction of relevant images correctly classified by a given classifier or image annotation system. The evaluation result can thus be thought of as a surrogate for the judgment of real users. However, the ability of this kind of quantitative measure to fully evaluate the performance and value of image annotation systems is limited. This paper introduces two complementary metrics related to annotation accuracy rates, which help to further assess the robustness and stability of image annotation systems: (i) the number of annotated keywords with zero-rate accuracy and (ii) the coefficient of variation of annotation accuracy. Evaluation results based on three datasets show that these two metrics are very useful for drawing more reliable conclusions about image annotation systems.

6.
As the COVID-19 pandemic swept the globe, social media platforms became an essential source of information and communication for many. International students, in particular, turned to Twitter to express their struggles and hardships during this difficult time. To better understand the sentiments and experiences of these international students, we developed the Situational Aspect-Based Annotation and Classification (SABAC) text mining framework. This framework uses a three-layer approach, combining baseline deep learning (DL) models with machine learning (ML) models as meta-classifiers to accurately predict the sentiments and aspects expressed in tweets from our collected Student-COVID-19 dataset. Using the proposed aspect2class annotation algorithm, we labeled bulk unlabeled tweets according to the aspect terms they contain. We also recognized the challenge of reducing the data's high dimensionality and sparsity to improve performance and annotation on unlabeled datasets; to address this issue, we proposed the Volatile Stopwords Filtering (VSF) technique. On the resulting Student-COVID Twitter dataset, the framework achieved an accuracy of 93.21% when using a random forest as the meta-classifier. Through testing on three benchmark datasets, we found that the SABAC ensemble framework performed exceptionally well. Our findings show that international students during the pandemic faced various issues, including stress, uncertainty, health concerns, financial stress, and difficulties with online classes and returning to school. By analyzing and summarizing these annotated tweets, decision-makers can better understand and address the real-time problems international students face during the ongoing pandemic.

7.
Supervised machine learning approaches are effective in text mining, but their success relies heavily on manually annotated corpora. However, there are limited numbers of annotated biomedical event corpora, and the available datasets contain insufficient examples for training classifiers; the common cure is to seek large amounts of training samples from unlabeled data, but such datasets often contain many mislabeled samples, which degrade classifier performance. Therefore, this study proposes a novel error-data detection approach suitable for reducing noise in unlabeled biomedical event data. First, we construct the mislabeled dataset through error-data analysis with the development dataset. The sample pairs' vector representations are then obtained by means of sequence patterns and a joint model of a convolutional neural network and a long short-term memory recurrent neural network. Following this, a sample identification strategy is proposed, using error detection based on pair representations for unlabeled data. The selected samples are then added to enrich the training dataset and improve classification performance. On the BioNLP Shared Task GENIA data, the experimental results indicate that the proposed approach is competent at extracting biomedical events from the biomedical literature. Our approach can effectively filter out noisy examples and build a satisfactory prediction model.

8.
Annotating process information on three-dimensional models is a current focus of 3D digital process-planning research. To address this problem, this article proposes a machining process information annotation method based on MBD technology and GB/T 24734-2009. The method effectively standardizes how machining process annotation symbols are constructed and combined, and associates process information with the model through 3D annotation, so that process designers and manufacturing personnel can quickly and intuitively browse and retrieve machining process information. On this basis, secondary development was carried out on the Pro/E platform, and the feasibility and practicality of the method were verified through an example.

9.
Sentiment classification is a useful tool for classifying reviews by the sentiments and attitudes expressed towards a product or service. Existing studies heavily rely on sentiment classification methods that require fully annotated inputs. However, limited labelled text is available, making the acquisition of fully annotated input costly and labour-intensive. Lately, semi-supervised methods have emerged, as they require only partially labelled input but perform comparably to supervised methods. Nevertheless, some works have reported that the performance of a semi-supervised model degrades after adding unlabelled instances into training. The literature also shows that not all unlabelled instances are equally useful; thus, identifying the informative unlabelled instances is beneficial when training a semi-supervised model. To achieve this, an informative score is proposed and incorporated into semi-supervised sentiment classification. The evaluation compares a semi-supervised method without an informative score against one with it. By using the informative score in the instance selection strategy to identify informative unlabelled instances, semi-supervised models perform better than models that do not incorporate informative scores into their training. Although the semi-supervised models incorporating an informative score do not surpass the supervised models, the results are still promising: the differences in performance are small, at 2% to 5%, while the number of labelled instances used is greatly reduced from 100% to 40%. The best result of the proposed instance selection strategy is achieved when incorporating the informative score with a baseline confidence score at a 0.5:0.5 ratio using only 40% labelled data.
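The abstract reports the best result when the informative score and a baseline confidence score are combined at a 0.5:0.5 ratio. A weighted sum is one natural reading of that combination; the sketch below is a hedged illustration under that assumption, not the paper's exact scoring, and all names and numbers are invented.

```python
def combined_score(informative, confidence, w=0.5):
    """Weighted combination of an informative score and a confidence score.
    The 0.5:0.5 default mirrors the best-reported ratio; the weighted-sum
    form itself is an assumption about how the ratio is applied."""
    return w * informative + (1 - w) * confidence

def select_instances(scored, k):
    """scored: list of (instance, informative, confidence) tuples.
    Rank unlabelled instances and keep the top-k for pseudo-labelling."""
    ranked = sorted(scored, key=lambda t: combined_score(t[1], t[2]),
                    reverse=True)
    return [inst for inst, _, _ in ranked[:k]]

picks = select_instances([("a", 0.9, 0.2), ("b", 0.5, 0.9), ("c", 0.1, 0.1)], k=2)
print(picks)  # → ['b', 'a']
```

The selected instances would then be pseudo-labelled and added to the training set in the usual self-training loop.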

10.
Liquid chromatography coupled to mass spectrometry is routinely used for metabolomics experiments. In contrast to the fairly routine and automated data acquisition steps, subsequent compound annotation and identification require extensive manual analysis and thus form a major bottleneck in data interpretation. Here we present CAMERA, a Bioconductor package integrating algorithms to extract compound spectra, annotate isotope and adduct peaks, and propose the accurate compound mass even in highly complex data. To evaluate the algorithms, we compared the annotation of CAMERA against a manually defined annotation for a mixture of known compounds spiked into a complex matrix at different concentrations. CAMERA successfully extracted accurate masses for 89.7% and 90.3% of the annotatable compounds in positive and negative ion modes, respectively. Furthermore, we present a novel annotation approach that combines spectral information of data acquired in opposite ion modes to further improve the annotation rate. We demonstrate the utility of CAMERA in two different, easily adoptable plant metabolomics experiments, where the application of CAMERA drastically reduced the amount of manual analysis.
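Isotope annotation of the kind CAMERA performs exploits the fixed ~1.0034 Da spacing between a monoisotopic peak and its ¹³C isotope peak. The toy sketch below illustrates only that spacing rule; the peak values, tolerance, and function names are assumptions, and CAMERA itself is an R package with considerably more elaborate grouping logic.

```python
# Toy sketch of isotope-peak pairing by the ~1.0034 Da 13C spacing.
ISOTOPE_SPACING = 1.00336  # mass difference between 13C and 12C, in Da

def pair_isotopes(peaks, charge=1, tol=0.01):
    """peaks: list of (mz, intensity) tuples.
    Returns (monoisotopic, isotope) index pairs: the isotope peak sits one
    spacing/charge higher in m/z and is less intense than its parent."""
    pairs = []
    spacing = ISOTOPE_SPACING / charge
    for i, (mz_i, int_i) in enumerate(peaks):
        for j, (mz_j, int_j) in enumerate(peaks):
            if abs((mz_j - mz_i) - spacing) < tol and int_j < int_i:
                pairs.append((i, j))
    return pairs

peaks = [(200.000, 5e6), (201.003, 9e5), (350.200, 2e6)]
print(pair_isotopes(peaks))  # → [(0, 1)]
```

Real annotation additionally checks the intensity ratio against plausible carbon counts and handles multiply charged ions; this sketch keeps only the mass-spacing idea.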

11.
Sentiment analysis (SA) is one of the basic research directions in natural language processing (NLP); it is widely adopted for news, product reviews, and politics. Aspect-based sentiment analysis (ABSA) aims at identifying the sentiment polarity of a given target context. Previous sentiment analysis models suffer from insufficient extraction of features, which results in low accuracy. Hence, this research work develops deep-semantic and contextual knowledge networks (DSCNet). DSCNet exploits semantic and contextual knowledge to understand the context and enhance accuracy for given aspects. First, temporal relationships are established; then deep semantic knowledge and contextual knowledge are introduced. Further, a deep integration layer is introduced to measure the importance of features for efficient extraction across different dimensions. The novelty of the DSCNet model lies in its introduction of deep contextual knowledge. DSCNet is evaluated on three datasets, i.e., the Restaurant, Laptop, and Twitter datasets, considering deep learning (DL) metrics such as precision, recall, accuracy, and Macro-F1 score. A comparative analysis is also carried out against different baseline methods in terms of accuracy and Macro-F1 score. DSCNet achieves 92.59% accuracy on the Restaurant dataset, 86.99% on the Laptop dataset, and 78.76% on the Twitter dataset.

12.
Anatomical analysis of the liver region is an essential and key step for liver-related disease diagnosis and treatment. One of the challenging issues is to annotate the functional regions of the liver automatically or semi-automatically by analyzing computed tomography (CT) images. The present study develops a complete liver annotation system with an improved vessel-skeletonization method for CT images. In the first step, an automatic level set method and a customized region-growing method are applied to extract the liver region, including vessels and tumors. Next, a modified iterative thinning method is developed to obtain the geometric structure of the liver vessels and mark a vessel skeleton. The three-dimensional information is transformed into a tree data structure for storage. Based on the branch distribution of the portal vein skeleton, a model-based method with a modified nearest neighbor segment approximation (NNSA) algorithm is adopted for the functional liver anatomy. Three experiments involving five 64-row liver CT datasets were performed, and the accuracies of the segmentation and annotation results were validated by an experienced doctor. Compared with other methods, our proposed vessel-skeletonization method can simultaneously preserve the connectivity of the vasculature topology and generate the skeleton in a shorter time. Furthermore, the proposed annotation system provides both visual and measurable information about livers. These experimental results demonstrate the usefulness and effectiveness of the proposed method. Our liver annotation system is helpful for evaluating liver function and supporting the diagnosis of liver disease.

13.
Owing to the continuous barrage of cyber threats, there is a massive amount of cyber threat intelligence. However, a great deal of it comes from textual sources, and for its analysis many security analysts rely on cumbersome and time-consuming manual efforts. Cybersecurity knowledge graphs play a significant role in the automatic analysis of cyber threat intelligence. As the foundation for constructing a cybersecurity knowledge graph, named entity recognition (NER) is required for identifying critical threat-related elements from textual cyber threat intelligence. Recently, deep neural network-based models have attained very good results in NER. However, the performance of these models relies heavily on the amount of labeled data, and labeled data in cybersecurity is scarce. In this paper, we therefore propose an adversarial active learning framework to effectively select informative samples for further annotation. In addition, leveraging the long short-term memory (LSTM) network and the bidirectional LSTM (BiLSTM) network, we propose a novel NER model by introducing a dynamic attention mechanism into a BiLSTM-LSTM encoder-decoder. With the selected informative samples annotated, the proposed NER model is retrained; as a result, its performance is incrementally enhanced at low labeling cost. Experimental results show the effectiveness of the proposed method.

14.
Tay CJ, Quan C, Chen L. Applied Optics, 2005, 44(8): 1401–1409.
A three-frame phase-shifting algorithm with a constant but unknown phase shift is proposed. The algorithm is based on background-intensity removal prior to phase retrieval to eliminate an undetermined factor in the fringe pattern. The proposed method is validated on three-dimensional profilometry by fringe projection and on deformation measurement by means of digital speckle shearing interferometry. For a fringe pattern with slowly varying background intensity, the background removal is achieved in the frequency domain. For a speckle pattern, a background-removal technique is integrated with the three-frame algorithm. In this process, manual intervention is minimal and high computational speed is achieved. In addition, high-frequency phase signals are not removed in the noise-reduction process, as is the case with the bandpass-filtering technique. The accuracy of the method is discussed.
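With the background removed, three equally shifted frames can be written as I_k = B·cos(φ + (k − 2)α), which gives cos α = (I1 + I3)/(2·I2) and tan φ = (I1 − I3)/(2·I2·sin α). The sketch below demonstrates that relation on synthetic noiseless data only; the phase ramp and shift value are illustrative assumptions, and the authors' background-removal and filtering steps are omitted.

```python
import numpy as np

# Synthetic background-removed fringe frames with an unknown constant shift
x = np.linspace(0, 4 * np.pi, 500)
phi_true = 0.8 * x                       # illustrative phase ramp (assumption)
alpha_true, B = 1.2, 1.0
I1, I2, I3 = (B * np.cos(phi_true + k * alpha_true) for k in (-1, 0, 1))

# Estimate the unknown shift from pixels where I2 is well conditioned
mask = np.abs(I2) > 0.5
alpha = np.mean(np.arccos(np.clip((I1 + I3)[mask] / (2 * I2[mask]), -1.0, 1.0)))

# Retrieve the wrapped phase using the estimated shift
phi = np.arctan2((I1 - I3) / (2 * np.sin(alpha)), I2)
```

On noisy data the per-pixel cos α estimates would be averaged more carefully (or fitted), which is exactly where the paper's background-removal quality matters.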

15.
Nowadays, the amount of web data is increasing rapidly, which presents a serious challenge to web monitoring. Text sentiment analysis, an important research topic in natural language processing, is a crucial task in web monitoring. The accuracy of traditional text sentiment analysis methods may degrade when dealing with massive data. Deep learning has been a hot research topic in artificial intelligence in recent years, and several research groups have studied the sentiment analysis of English texts using deep learning methods; in contrast, relatively few works have so far considered Chinese text sentiment analysis in this direction. In this paper, a method for analyzing Chinese text sentiment is proposed based on the convolutional neural network (CNN) in order to improve analysis accuracy. The feature values of the CNN after training are nonuniformly distributed; to overcome this problem, a method for normalizing the feature values is proposed. Moreover, the dimensions of the text features are optimized through simulations. Finally, a method for updating the learning rate during CNN training is presented in order to achieve better performance. Experimental results on typical datasets indicate that the accuracy of the proposed method improves on that of traditional supervised machine learning methods, e.g., the support vector machine.
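The abstract does not specify the paper's normalization scheme for the nonuniformly distributed CNN feature values; column-wise min-max scaling is one common way to flatten such a distribution. The sketch below is a minimal illustration under that assumption, not the paper's method.

```python
import numpy as np

def minmax_normalize(features, eps=1e-12):
    """Column-wise min-max scaling to [0, 1]. A common normalization choice;
    the paper's actual scheme is not specified in the abstract."""
    f = np.asarray(features, dtype=float)
    lo, hi = f.min(axis=0), f.max(axis=0)
    return (f - lo) / (hi - lo + eps)  # eps guards against constant columns

# Each row is one sample's feature vector (illustrative numbers)
scaled = minmax_normalize([[1.0, 10.0], [3.0, 20.0], [2.0, 30.0]])
```

Scaling each feature dimension to a common range keeps no single dimension from dominating the downstream classifier.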

16.
Atlas-based segmentation is a high-level segmentation technique which has become a standard paradigm for exploiting prior knowledge in image segmentation. Recent multiatlas-based methods have provided highly accurate segmentations of different parts of the human body by propagating manual delineations from multiple atlases in a dataset to a query subject and fusing them. The female pelvic region is known to be highly variable, which makes the segmentation task difficult. We propose here an approach for the segmentation of magnetic resonance imaging (MRI), called multiatlas-based segmentation using online machine learning (OML). The proposed approach allows separating regions which may be affected by cervical cancer in a female pelvic MRI. It is based on an online learning method for constructing the dataset of atlases. The experiments demonstrate the higher accuracy of the suggested approach compared to a segmentation technique based on a fixed dataset of atlases and to a single-atlas-based segmentation technique.

17.
User‐generated reviews can serve as an efficient tool for evaluating the customer‐perceived quality of online products and services. This article proposes a joint control chart for monitoring the quantitative evolution of document‐level topics and sentiments in online customer reviews. A sequential model is constructed to convert the temporally correlated document collections to topic and sentiment distributions, which are subsequently used to monitor the topics that users are concerned about and the topic‐specific opinions in an ongoing product and service process. Simulation studies on various data scenarios demonstrate the superior performance of the proposed control chart in terms of both detecting shifts and identifying truly out‐of‐control terms.
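The article's joint topic-sentiment chart is more elaborate than the abstract reveals; as a minimal illustration of monitoring a single topic's share of reviews, here is a classical Shewhart p-chart sketch. All numbers, names, and the topic label are illustrative assumptions.

```python
import math

def p_chart_limits(p_bar, n, k=3.0):
    """Shewhart p-chart control limits for a baseline proportion p_bar
    observed in samples of size n, at k standard deviations."""
    sigma = math.sqrt(p_bar * (1.0 - p_bar) / n)
    return max(0.0, p_bar - k * sigma), min(1.0, p_bar + k * sigma)

def out_of_control(props, p_bar, n):
    """Indices of monitored proportions falling outside the control limits."""
    lcl, ucl = p_chart_limits(p_bar, n)
    return [i for i, p in enumerate(props) if p < lcl or p > ucl]

# Daily share of reviews mentioning a hypothetical "delivery" topic
print(out_of_control([0.10, 0.12, 0.31, 0.09], p_bar=0.11, n=200))  # → [2]
```

A sudden jump in a topic's share (day 2 above) is the kind of shift the proposed joint chart is designed to detect, alongside sentiment shifts.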

18.
Social networking services (SNSs) provide massive data that can be a very influential source of information during pandemic outbreaks. This study shows that social media analysis can be used as a crisis detector (e.g., for understanding the sentiment of social media users regarding various pandemic outbreaks). The novel Coronavirus Disease 2019 (COVID-19), commonly known as coronavirus, affected everyone worldwide in 2020, and streaming Twitter data have revealed the status of the COVID-19 outbreak in the most affected regions. This study focuses on identifying COVID-19-related cases from Twitter messages (tweets) without requiring medical records. For this purpose, we propose an intelligent model using traditional machine learning approaches, such as the support vector machine (SVM), logistic regression (LR), naïve Bayes (NB), random forest (RF), and decision tree (DT), with term frequency-inverse document frequency (TF-IDF) features to detect the COVID-19 pandemic in tweets. The proposed model classifies tweets into four categories: confirmed, deaths, recovered, and suspected. For the experimental analysis, tweet data on the COVID-19 pandemic are analyzed to evaluate the results of the traditional machine learning approaches. A benchmark dataset of COVID-19 Twitter messages is developed and can be used for future research studies. The experiments show that the proposed approach is promising in detecting the COVID-19 pandemic in tweets, with overall accuracy, precision, recall, and F1 score between 70% and 80%; confusion matrices are reported for the machine learning approaches (i.e., SVM, NB, LR, RF, and DT) with the TF-IDF feature extraction technique.
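The TF-IDF weighting mentioned above can be sketched in a few lines. This is a minimal textbook variant (raw term frequency times log inverse document frequency); production libraries such as scikit-learn's TfidfVectorizer add IDF smoothing and vector normalization, and the example tweets are invented.

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists. Returns one {term: tf-idf weight} dict per
    document, using tf = count/len(doc) and idf = log(N/df)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

tweets = [["confirmed", "cases", "rising"],
          ["recovered", "cases", "reported"]]
w = tfidf(tweets)
```

Note that a term appearing in every document ("cases" here) gets weight zero, which is exactly why TF-IDF downweights uninformative common words before classification.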

19.
Printed circuit boards (PCBs) are essential to the proper functioning of any electronic device: they are installed in almost all electronic devices, and if a PCB does not function properly, the whole machine may fail. Researchers have therefore worked to develop error-free PCBs. Initially, PCBs were examined manually, but human error meant that defective PCBs were sometimes categorized as non-defective, so this traditional manual examination was transformed into automated systems. Image processing and computer vision techniques were then applied to extract defects, but these alone did not yield good results either, and machine learning and artificial intelligence techniques followed. In this study, we apply deep neural networks to detect defects in PCBs. Pretrained VGG16 and Inception networks were used to extract the relevant features. The DeepPCB dataset was used; it contains 1500 pairs of defective and non-defective images. Image pre-processing and data augmentation techniques were applied to enlarge the training set, and convolutional neural networks were applied to classify the test data. The results were compared with a state-of-the-art technique, and the proposed methodology outperformed it. Standard performance evaluation metrics were applied: precision of 94.11%, recall of 89.23%, F-measure of 91.91%, and accuracy of 92.67%.
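The reported metrics relate to one another in a fixed way: the F-measure is the harmonic mean of precision and recall, all three derived from confusion-matrix counts. The sketch below computes them from hypothetical counts, not the paper's actual confusion matrix.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F-measure from confusion-matrix counts:
    tp = true positives, fp = false positives, fn = false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Illustrative counts only, not taken from the paper
p, r, f = prf(tp=116, fp=7, fn=14)
```

A high precision with lower recall, as reported above, indicates the model rarely flags a good board as defective but misses some true defects.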

20.
We propose an image-based framework for electrical energy meter reading. Our aim is to extract the image region that depicts the digits and then recognize them to record the consumed units. By combining the readings of serial numbers and energy meter units, an automatic billing system using the Internet of Things and a graphical user interface is deployable in a real-time setup. However, such region extraction and character recognition become challenging due to image variations caused by several factors, such as partial occlusion due to dust on the meter display, orientation and scale variations caused by camera positioning, and non-uniform illumination caused by shades. To this end, our work evaluates and compares the state-of-the-art deep learning algorithm You Only Look Once (YOLO) along with traditional handcrafted features for text extraction and recognition. Our image dataset contains 10,000 images of electrical energy meters and is further expanded by data augmentation such as in-plane rotation and scaling to make the deep learning algorithms robust to these image variations. For training and evaluation, the image dataset is annotated to produce the ground truth for all the images. YOLO achieves superior performance over the traditional handcrafted features, with an average recognition rate of 98% for all digits, and proves robust against the mentioned image variations. Our proposed method can greatly reduce the time and effort involved in current meter reading, where workers visit door to door, take images of meters, and manually extract readings from these images.
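Detection results like these are typically scored by matching predicted digit boxes to ground-truth annotations via intersection-over-union (IoU); the abstract does not state the exact matching criterion, so the sketch below is a generic illustration with invented box coordinates.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # zero if boxes don't overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # → 0.3333333333333333
```

A prediction is usually counted as correct when its IoU with a ground-truth box exceeds a threshold (0.5 is a common convention).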


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号