Similar literature
1.
In this article, we examine the effectiveness of bootstrapping supervised machine-learning polarity classifiers with the help of a domain-independent rule-based classifier that relies on a lexical resource, i.e., a polarity lexicon, and a set of linguistic rules. The benefit of this method is that, although no labeled training data are required, it allows a classifier to capture in-domain knowledge by training a supervised classifier with in-domain features, such as bag of words, on instances labeled by a rule-based classifier. Thus, this approach can be considered a simple and effective method for domain adaptation. Among the components of this approach, we investigate how important the quality of the rule-based classifier is and what features are useful for the supervised classifier. In particular, the former addresses the question of how far linguistic modeling is relevant for this task. We not only examine how this method performs under more difficult settings in which classes are not balanced and mixed reviews are included in the data set but also compare how this linguistically-driven method relates to state-of-the-art statistical domain adaptation.
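
The bootstrapping idea in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy lexicon, the single negation rule, the example reviews, and the choice of a multinomial Naive Bayes model over bag-of-words features are not taken from the article, which does not name its lexicon, rules, or supervised learner.

```python
import math
from collections import Counter

# Hypothetical toy polarity lexicon and negators (not from the article).
LEXICON = {"good": 1, "great": 1, "bad": -1, "poor": -1}
NEGATORS = {"not", "never"}


def rule_label(text):
    """Rule-based labeler: sum lexicon polarities; a negator flips the next polar word."""
    score, flip = 0, False
    for tok in text.lower().split():
        if tok in NEGATORS:
            flip = True
        elif tok in LEXICON:
            score += -LEXICON[tok] if flip else LEXICON[tok]
            flip = False
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return None  # no decision: the instance is left out of the training set


def train_nb(labeled):
    """Collect bag-of-words counts per class for a multinomial Naive Bayes model."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text, label in labeled:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, doc_counts, vocab


def predict_nb(model, text):
    """Pick the class with the highest add-one-smoothed log posterior."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_lp = None, -math.inf
    for label in ("pos", "neg"):
        lp = math.log((doc_counts[label] + 1) / (total_docs + 2))
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are ignored
                lp += math.log((word_counts[label][tok] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best


# Unlabeled in-domain reviews (invented); labels come only from the rules.
unlabeled = [
    "great battery and good screen",
    "good battery great value",
    "bad screen and poor battery laggy",
    "poor value laggy and bad",
    "not good laggy",
]
auto_labeled = [(t, rule_label(t)) for t in unlabeled]
auto_labeled = [(t, y) for t, y in auto_labeled if y is not None]
model = train_nb(auto_labeled)
```

In this toy setup the word "laggy" is absent from the lexicon, so the rule-based labeler alone cannot judge it, but the supervised model picks up its in-domain association with negative reviews from the automatically labeled instances.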

2.
Evolutionary-based data mining techniques are increasingly applied to problems in the bioinformatics domain. We investigate an important aspect of predicting the folded 3D structure of proteins from their unfolded residue sequence using evolutionary-based machine learning techniques. Our approach is to predict specific features of residues in folded protein chains, in particular features derived from Delaunay tessellations, Gabriel graphs and relative neighborhood graphs as well as minimum spanning trees. Several standard machine learning algorithms were compared to a state-of-the-art learning method, a learning classifier system (LCS), that is capable of generating compact and interpretable rule sets. Predictions were performed for various degrees of precision using a range of experimental parameters. Examples of the rules obtained are presented. The LCS produces results with good predictive performance and generates competent yet simple and interpretable classification rules.

3.
In this paper, we present a human emotion recognition system based on audio and spatio-temporal visual features. The proposed system has been tested on an audio-visual emotion data set with different subjects of both genders. The mel-frequency cepstral coefficient (MFCC) and prosodic features are first identified and then extracted from emotional speech. For facial expressions, spatio-temporal features are extracted from visual streams. Principal component analysis (PCA) is applied to reduce the dimensionality of the visual features while capturing 97 % of the variance. A codebook is constructed for both audio and visual features in Euclidean space. Histograms of occurrences are then used as input to a state-of-the-art SVM classifier to obtain the judgment of each classifier. Moreover, the judgments from each classifier are combined using the Bayes sum rule (BSR) as a final decision step. The proposed system is tested on a public data set to recognize human emotions. Experimental results and simulations showed that using visual features only yields on average 74.15 % accuracy, while using audio features only gives an average recognition accuracy of 67.39 %. By combining both audio and visual features, the overall system accuracy improves significantly, up to 80.27 %.
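
The final fusion step can be sketched as follows. The abstract does not spell out the exact form of the Bayes sum rule it uses, so this assumes the common sum rule for classifier combination: per-class posteriors from each classifier are averaged and the class with the highest average wins. The class names and probabilities below are invented for illustration.

```python
def sum_rule_fusion(posteriors):
    """Average per-class posteriors across classifiers and return (label, fused scores).

    posteriors: list of dicts mapping class name -> probability, one dict per classifier.
    """
    classes = list(posteriors[0].keys())
    fused = {c: sum(p[c] for p in posteriors) / len(posteriors) for c in classes}
    label = max(fused, key=fused.get)
    return label, fused


# Hypothetical posteriors from an audio classifier and a visual classifier.
audio = {"happy": 0.5, "sad": 0.3, "angry": 0.2}
visual = {"happy": 0.2, "sad": 0.7, "angry": 0.1}
label, fused = sum_rule_fusion([audio, visual])
```

Here the audio classifier alone would choose "happy", but the averaged scores favor "sad", which shows how a confident modality can override a less certain one.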

4.
Advances in high-throughput genome sequencing technology have led to an explosion in the amount of sequence data that are available. The determination of protein function using experimental techniques is time-consuming and expensive; the use of machine-learning techniques to assess protein function rapidly may be useful in streamlining this process. The problem of assigning functional classes to proteins is complicated by the fact that a single protein can participate in several different pathways and thus can have multiple functions. We have developed a tree-based classifier that is capable of handling multiple-labelled data and gaining an insight into the multi-functional nature of proteins. We call the resulting tree a recursive maximum contrast tree (RMCT) and the resulting classifier a multiple-labelled instance classifier (MLIC). We investigate the synergy of machine-learning-based ensemble methods and physicochemical-based feature augments. We test our algorithm on protein phylogenetic profiles generated from 60 completely sequenced genomes and we compare our results with those achieved by algorithms such as support vector machines and decision trees.

5.
Hand gesture recognition provides an alternative means of human-computer interaction for many devices. In this work, we have developed a classifier-fusion-based dynamic free-air hand gesture recognition system to identify isolated gestures. Different users gesticulate at different speeds for the same gesture. Hence, when comparing different samples of the same gesture, variations due to differences in gesturing speed should not contribute to the dissimilarity score. We have therefore introduced a two-level speed normalization procedure using DTW and Euclidean distance-based techniques. Three features, namely ‘orientation between consecutive points’, ‘speed’ and ‘orientation between first and every trajectory point’, were used for the speed normalization. Moreover, in the feature extraction stage, 44 features were selected from the existing literature. Using the full feature set could lead to overfitting and information redundancy and may increase computational complexity because of the higher dimension. We have tried to overcome this difficulty by selecting an optimal set of features using analysis of variance and incremental feature selection techniques. The performance of the system was evaluated using this optimal set of features for individual classifiers such as ANN, SVM, k-NN and Naïve Bayes. Finally, the decisions of the individual classifiers were combined using a classifier fusion model. Based on the experimental results, it may be concluded that classifier fusion provides satisfactory results compared to the individual classifiers. An accuracy of 94.78 % was achieved using the classifier fusion technique, compared to baseline CRF (85.07 %) and HCRF (89.91 %) models.
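
To see why dynamic time warping (DTW) helps with gesturing speed, consider a minimal sketch of the textbook DTW distance on one-dimensional sequences. This is not the two-level normalization procedure of the paper, whose details the abstract does not give; the sequences stand in for a per-point feature such as orientation and are invented.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats its point
                cost[i][j - 1],      # b advances, a repeats its point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# The same hypothetical gesture performed slowly and quickly.
slow = [0, 0, 1, 1, 2, 2]
fast = [0, 1, 2]
same_gesture = dtw_distance(slow, fast)
different_gesture = dtw_distance(fast, [0, 1, 3])
```

Because DTW may align one point with several consecutive points of the other sequence, the slow and fast versions of the same trajectory end up with zero distance, while a trajectory that actually differs in shape does not.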

6.
This paper proposes a system for the early automatic recognition of health problems that manifest themselves in a distinctive form of gait. The purpose of the system is to prolong the autonomous living of the elderly at home. When the system identifies a health problem, it automatically notifies a physician and provides an explanation of the automatic diagnosis. The gait of the elderly user is captured using a motion-capture system, which consists of body-worn tags and wall-mounted sensors. The positions of the tags are acquired by the sensors and the resulting time series of position coordinates are analyzed with machine-learning algorithms in order to recognize a specific health problem. Novel semantic features based on medical knowledge for training a machine-learning classifier are proposed in this paper. The classifier classifies the user’s gait into: 1) normal, 2) with hemiplegia, 3) with Parkinson’s disease, 4) with pain in the back and 5) with pain in the leg. Two studies are presented: 1) the feasibility of automatic recognition and 2) the impact of tag placement and noise level on the accuracy of the recognition of health problems. The experimental results of the first study (12 tags, no noise) showed that the k-nearest neighbors and neural network algorithms achieved classification accuracies of 100%. The experimental results of the second study showed that classification accuracy of over 99% is achievable using several machine-learning algorithms and 8 or more tags with up to 15 mm standard deviation of noise. The results show that the proposed approach achieves high classification accuracy and can be used as a guide for further studies in the increasingly important area of Ambient Assisted Living.
Since the system uses semantic features and an artificial-intelligence approach to interpret the health state, provides a natural explanation of the hypothesis and is embedded in the domestic environment of the elderly person, it is an example of semantic ambient media for Ambient Assisted Living.

7.
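Proteins that interact with nucleic acids play extremely important roles in many aspects of gene function, and predicting whether a protein interacts with nucleic acids has attracted wide attention in bioinformatics. In this paper, amino acid composition, physicochemical properties of amino acids, and protein structure information are used as feature parameters, and the support vector machine (SVM) method is used to predict proteins that interact with nucleic acids. Three protein data sets, for proteins interacting with rRNA, RNA and DNA respectively, were used to train SVMs; the optimal kernel function was selected, its parameters were optimized, and classification models were built and used to predict whether a protein interacts with nucleic acids. The results show that, even for proteins with homologous similarity below 40%, 10-fold cross-validation on the three data sets gives prediction accuracies of 93.75%, 83.41% and 81.85%, respectively. Tested on external test sets, the resulting models give prediction accuracies of 93.8%, 84.2% and 81.9%, respectively. On this basis, we built an online software system for predicting whether a protein interacts with nucleic acids. It is available at http://chemdata.shu.edu.cn/protein_na.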

8.
This paper presents a fault diagnosis technique based on acoustic emission (AE) analysis with the Hilbert–Huang Transform (HHT) and a data mining tool. HHT analyzes the AE signal using intrinsic mode functions (IMFs), which are extracted using the process of Empirical Mode Decomposition (EMD). Instead of a time domain approach with the Hilbert transform, the FFT of the IMFs from the HHT process is used to represent the time-frequency domain approach for an efficient signal response from a rolling element bearing. Further, extracted statistical and acoustic features are used to select a proper data-mining-based fault classifier with or without a filter. The K-nearest neighbor algorithm is observed to be the more efficient classifier with default parameter settings in WEKA. The APF-KNN approach, which is based on an asymmetric proximity function with optimized feature selection and shows better classification accuracy, is used. Experimental evaluation of the time-frequency approach is presented for five bearing conditions: healthy bearing and bearings with outer race, inner race, ball and combined defects. The experimental results show that the proposed method can increase reliability for the fault diagnosis of ball bearings.

9.
An SVM-AdaBoost facial expression recognition system
This study is focused on improving the recognition rate and processing time of facial recognition systems. First, the skin is detected by pixel-based methods to reduce the search space for the maximum rejection classifier (MRC), which detects the face. The detected face is normalized by a discrete cosine transform (DCT) and down-sampled by a Bessel transform. Gabor feature extraction techniques were used to extract thousands of facial features that represent facial deformation patterns. An AdaBoost-based hypothesis is formulated to select a few hundred Gabor features that are potential candidates for expression recognition. The selected features were fed into a support vector machine (SVM) classifier to train it. Average recognition rates of 97.57 % and 92.33 % are registered on the JAFFE and Yale databases, respectively. The execution time of the proposed method is also significantly lower than that of other methods. Overall, the proposed method performs better than other methods.

10.
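Although research on sentiment analysis has made great progress in recent years, cross-domain aspect-level sentiment analysis remains a challenge. Existing methods mainly focus on the information shared by the source and target domains and ignore information specific to the target domain. In addition, sentiment words are important information in a sentence: they not only reflect the sentiment polarity of an aspect but can also be divided into shared sentiment words and domain-specific sentiment words. To address target-domain-specific information and sentiment words, this paper proposes a domain-specific sentiment word attention model (DSSW-ATT). The model sets up two independent subspaces, uses attention mechanisms to extract shared sentiment word features and domain-specific sentiment word features respectively, builds a corresponding shared-feature classifier and specific-feature classifier, and fuses the two kinds of features using co-training. The paper also constructs an aspect-level user review data set for the hotel domain (source domain) and the mobile phone domain (target domain). Experimental results on this data set show that the method clearly outperforms the baseline methods.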

11.
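The pressing need to retrieve required information from the Web quickly and accurately has given rise to domain-specific search engine technology. In a domain-specific search engine, the web crawler is responsible for collecting information about a particular professional domain from the Web and is a core component of the search engine. This paper studies the crawling of Chinese domain-specific web pages. Using the KL distance, it verifies that the content of a web page and the context of its links differ in distribution. On this basis, it proposes a crawling algorithm for Chinese domain-specific web pages that uses link anchor text and its surrounding context as features and is guided by a Naïve Bayes classifier, and it designs an algorithm for automatically obtaining training data with link class labels. Taking the crawling of financial web pages as an example, the proposed algorithms were tested offline and online. The results show that the crawler guided by the Naïve Bayes classifier achieves a harvest rate of nearly 90% for domain-specific pages.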

12.
It is widely believed that the human brain is a complicated network and that many neurological disorders such as Alzheimer’s disease (AD) are related to abnormal changes in the brain network architecture. In this work, we present a kernel-based method to establish a network for each subject using mean cortical thickness, which we refer to hereafter as the individual’s network. We construct individual networks for 83 subjects, including AD patients and normal controls (NC), taken from the Open Access Series of Imaging Studies database. The network edge features are used to predict AD/NC with machine learning techniques. As the number of edge features is much larger than the number of samples, feature selection is applied to avoid the adverse impact of high-dimensional data on classifier performance. We use a hybrid feature selection that combines filter and wrapper methods, and compare the performance of six different combinations of them. Finally, support vector machines are trained using the selected features. To obtain an unbiased evaluation of our method, we use a nested cross-validation framework to choose the optimal hyper-parameters of the classifier and evaluate the generalization of the method. We report a best accuracy of 90.4 % using the proposed method in the leave-one-out analysis, outperforming the result obtained with the raw cortical thickness data by more than 10 %.
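
The nested cross-validation mentioned in this abstract can be sketched as below. This is a generic outline, not the authors' pipeline: the fold construction, the `fit_score` callback, and the candidate hyper-parameter values are hypothetical names introduced for illustration. The point it shows is that hyper-parameters are chosen using only the outer training indices, so the outer test fold never influences model selection.

```python
def k_folds(indices, k):
    """Yield (train, test) index lists for k interleaved folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def nested_cv(n_samples, outer_k, inner_k, params, fit_score):
    """Return the mean outer-fold score.

    fit_score(train_idx, test_idx, param) -> float is a hypothetical callback
    that trains on train_idx with the given hyper-parameter and scores on test_idx.
    """
    outer_scores = []
    for train, test in k_folds(list(range(n_samples)), outer_k):

        def inner_mean(param):
            # Inner loop: only indices from the outer training split are used.
            scores = [fit_score(tr, va, param) for tr, va in k_folds(train, inner_k)]
            return sum(scores) / len(scores)

        best_param = max(params, key=inner_mean)
        outer_scores.append(fit_score(train, test, best_param))
    return sum(outer_scores) / len(outer_scores)


# Stand-in scorer (invented): records every call and pretends 0.5 is the best setting.
calls = []


def fake_fit_score(train_idx, test_idx, param):
    calls.append((set(train_idx), set(test_idx)))
    return 1.0 if param == 0.5 else 0.0


mean_score = nested_cv(12, outer_k=3, inner_k=2, params=[0.1, 0.5], fit_score=fake_fit_score)
```

Setting `outer_k` equal to the number of samples gives a leave-one-out outer loop, which is the evaluation setting the abstract reports.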

13.
14.
The security of handwritten documents is very important in authentication systems. In this paper, a forgery detection method is proposed for verifying handwritten documents. The method introduces two types of novel features: macro and micro. Macro features capture the structure of the handwriting, while micro features capture more detailed information. The micro features also try to recover, from offline data, properties similar to online properties such as pen pressure and velocity. After these features are extracted, PCA is applied to them to reduce the feature vector. A simple positive classifier is then used to detect forgeries. Importantly, the weights of this classifier are adjusted using positive data only, because forgery samples cannot be used in the adjusting phase. To test the proposed method, a Persian handwritten data set was prepared using four kinds of forgeries: random, unskilled, skilled, and mimic. This data set consists of numbers written out in words as reference words. Across these different reference words, the best result was a correct rejection rate of 87 % with a correct acceptance rate of 97 %. We believe the proposed method can be applied to other languages by adjusting some parameters, but because it is very important to have the data in a high-resolution format (e.g. 1,200 dpi) and no existing database has such resolution, the method was only applied to the data set we gathered.

15.
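Diagnosing breast cancer from histopathological images with machine learning saves considerable manpower and resources, so improving the recognition accuracy for breast cancer histopathological images is of practical significance. To address the problem that single classifiers and ensemble-learning classifier models have a limited observation domain and easily fall into local optima, a classifier model based on joint training is proposed. The single classifiers influence one another to widen the observed perceptual domain and find the estimation point with minimum loss; the hyper-parameters are iteratively optimized according to this estimation point, and the classifier with the best fit is obtained through joint training. This both draws on the strengths of the different classifier models to improve generalization and enlarges the observation domain of the model, so that the global optimum is reached faster while recognition accuracy improves. Experiments show that the proposed jointly trained classifier improves the classification performance for breast cancer histopathological images: the benign/malignant classification accuracies at magnifications of 40×, 100×, 200× and 400× are 99.67%, 98.08%, 99.01% and 96.34%, respectively.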

16.
17.
In this paper, we perform a noise analysis to assess the robustness to noise of a neural classifier designed for multi-class diagnosis of rolling element bearings. We work on vibration signals collected by means of two accelerometers and we consider ten levels of noise, each characterized by a different signal-to-noise ratio, ranging from 40.55 to -11.35 dB. We classify the noisy signals by means of a neural classifier initially trained on signals without noise and then we repeat the training process with signals affected by increasing levels of noise. We show that by adding noisy signals to the training set we can significantly increase the classification accuracy of a single classifier. Finally, we apply the two most used strategies to combine classifiers, classifier fusion and classifier selection, and show that, in both cases, we can significantly increase the performance of the single best classifier. In particular, classifier selection achieves the best results for low and medium levels of noise, while classifier fusion is the most accurate for high levels of noise. The analysis presented in the paper can be used to identify both the type of classifier (e.g., single classifier or classifier ensemble) and how many and which noise levels should be used in the training phase in order to achieve the desired classification accuracy in the application domain of interest.
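
Noise levels of this kind are defined by the signal-to-noise ratio in decibels, SNR = 10 log10(P_signal / P_noise). The sketch below shows one way to corrupt a clean signal at a chosen SNR; it assumes additive Gaussian noise scaled to hit the target ratio exactly, which the abstract does not confirm, and the sine wave is only a stand-in for a real vibration signal.

```python
import math
import random


def add_noise(signal, snr_db, seed=0):
    """Return signal plus Gaussian noise scaled so the result has the requested SNR (dB)."""
    rng = random.Random(seed)
    p_signal = sum(x * x for x in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_noise = sum(n * n for n in noise) / len(noise)
    # Required noise power follows from SNR_dB = 10 * log10(p_signal / p_target).
    p_target = p_signal / (10 ** (snr_db / 10))
    scale = math.sqrt(p_target / p_noise)
    return [x + scale * n for x, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """Compute the SNR in dB of a noisy signal relative to its clean version."""
    p_signal = sum(x * x for x in clean) / len(clean)
    p_noise = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_signal / p_noise)


# Hypothetical clean signal: a sine wave sampled at 200 points.
clean = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
mild = add_noise(clean, 40.55)   # noise power far below signal power
heavy = add_noise(clean, -10.0)  # illustrative negative SNR: noise dominates
```

A negative SNR in dB simply means the noise power exceeds the signal power, which is the regime where the abstract reports classifier fusion to be most accurate.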

18.
19.
Video indexing is employed to represent the features of video sequences. Motion vectors derived from compressed video are preferred for video indexing because they can be accessed by partial decoding; thus, they are used extensively in various video analysis and indexing applications. In this study, we introduce an efficient compressed-domain video indexing method and implement it on H.264/AVC coded videos. The video retrieval experiments indicate that retrieval based on the proposed indexing method outperforms motion-vector-based video retrieval in 74 % of queries with little increase in computation time. Furthermore, we compared our method with a pixel-level video indexing method that employs both temporal and spatial features. Experimental results indicate that our method outperforms the pixel-level method in both performance and speed. Hence, considering the speed and precision characteristics of indexing methods, the proposed method is an efficient indexing method that can be used in various video indexing and retrieval applications.

20.
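To address the problem that traditional machine learning algorithms still require manually designed feature representations, a protein subcellular localization algorithm based on a stacked denoising autoencoder (SDAE) deep network is proposed. First, features are extracted from protein sequences using the improved pseudo-amino acid composition (PseAAC), pseudo position-specific scoring matrix (PsePSSM) and conjoint triad (CT) methods, and the feature vectors obtained by these three methods are fused to give a new feature representation model of protein sequences. Next, the fused feature vectors are fed into the SDAE deep network to learn more effective feature representations automatically. Then a Softmax regression classifier is used to predict the subcellular class, with leave-one-out cross-validation on two data sets, Viral proteins and Plant proteins. Finally, the results of the proposed algorithm are compared with those of several existing algorithms such as mGOASVM and HybridGO-Loc. Experimental results show that the proposed algorithm achieves an accuracy of 98.24% on the Viral proteins data set, 9.35 percentage points higher than mGOASVM, and an accuracy of 97.63% on the Plant proteins data set, 10.21 and 4.07 percentage points higher than mGOASVM and HybridGO-Loc, respectively. These results indicate that the proposed algorithm can effectively improve the accuracy of protein subcellular localization prediction.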

