Similar literature
 20 similar articles found (search time: 31 ms)
1.
In this article, we examine the effectiveness of bootstrapping supervised machine-learning polarity classifiers with the help of a domain-independent rule-based classifier that relies on a lexical resource, i.e., a polarity lexicon, and a set of linguistic rules. The benefit of this method is that, although no labeled training data are required, the classifier can still capture in-domain knowledge: a supervised classifier is trained with in-domain features, such as bag of words, on instances labeled by the rule-based classifier. This approach can thus be regarded as a simple and effective method for domain adaptation. Among the components of this approach, we investigate how important the quality of the rule-based classifier is and which features are useful for the supervised classifier. The former, in particular, addresses the question of how relevant linguistic modeling is for this task. We also examine how the method performs under more difficult settings, in which classes are not balanced and mixed reviews are included in the data set, and compare this linguistically driven method with state-of-the-art statistical domain adaptation.
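
The bootstrapping idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' system: the toy lexicon, the single negation rule, the example reviews and the helper names (`rule_label`, `train_nb`, `predict_nb`) are all hypothetical, and a small hand-written Naive Bayes stands in for whatever supervised classifier the article uses.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy polarity lexicon; a real lexicon would be far larger.
LEXICON = {"good": 1, "great": 1, "excellent": 1, "bad": -1, "poor": -1, "awful": -1}
NEGATORS = {"not", "never", "no"}

def rule_label(text):
    """Domain-independent rule: sum lexicon scores; a negator flips the next word."""
    score, flip = 0, 1
    for tok in text.lower().split():
        if tok in NEGATORS:
            flip = -1
            continue
        score += flip * LEXICON.get(tok, 0)
        flip = 1
    if score == 0:
        return None  # the rule-based classifier abstains
    return "pos" if score > 0 else "neg"

def train_nb(docs, labels):
    """Train a bag-of-words multinomial Naive Bayes model."""
    counts, priors = defaultdict(Counter), Counter(labels)
    for doc, y in zip(docs, labels):
        counts[y].update(doc.lower().split())
    vocab = {w for c in counts.values() for w in c}
    return counts, priors, vocab

def predict_nb(model, text):
    """Return the class with the highest add-one-smoothed log posterior."""
    counts, priors, vocab = model
    total = sum(priors.values())
    best, best_lp = None, -math.inf
    for y in priors:
        denom = sum(counts[y].values()) + len(vocab)
        lp = math.log(priors[y] / total)
        for tok in text.lower().split():
            if tok in vocab:
                lp += math.log((counts[y][tok] + 1) / denom)
        if lp > best_lp:
            best, best_lp = y, lp
    return best

# Unlabeled in-domain reviews (invented for this sketch).
unlabeled = [
    "great battery and excellent screen",
    "good battery great value",
    "awful screen and poor speakers",
    "bad speakers poor value laggy",
    "the box is blue",
]
pairs = [(doc, rule_label(doc)) for doc in unlabeled]
train = [(doc, y) for doc, y in pairs if y is not None]  # drop abstentions
model = train_nb([d for d, _ in train], [y for _, y in train])
```

In this toy run, `predict_nb(model, "laggy speakers")` leans negative even though neither word is in the lexicon, which is the kind of in-domain knowledge the supervised classifier is meant to pick up from rule-labeled data.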

2.
Evolutionary-based data mining techniques are increasingly applied to problems in the bioinformatics domain. We investigate an important aspect of predicting the folded 3D structure of proteins from their unfolded residue sequence using evolutionary-based machine learning techniques. Our approach is to predict specific features of residues in folded protein chains, in particular features derived from Delaunay tessellations, Gabriel graphs and relative neighborhood graphs, as well as minimum spanning trees. Several standard machine learning algorithms were compared with a state-of-the-art learning method, a learning classifier system (LCS), that is capable of generating compact and interpretable rule sets. Predictions were performed for various degrees of precision using a range of experimental parameters. Examples of the rules obtained are presented. The LCS produces results with good predictive performance and generates competent yet simple and interpretable classification rules.

3.
In this paper, we present a human emotion recognition system based on audio and spatio-temporal visual features. The mel-frequency cepstral coefficients (MFCC) and prosodic features are first identified and then extracted from emotional speech. For facial expressions, spatio-temporal features are extracted from the visual streams. Principal component analysis (PCA) is applied to reduce the dimensionality of the visual features while capturing 97 % of the variance. A codebook is constructed for both audio and visual features using Euclidean space. The occurrence histograms are then used as input to a state-of-the-art SVM classifier to obtain the judgment of each classifier. The judgments from the classifiers are combined using the Bayes sum rule (BSR) as the final decision step. The proposed system is tested on a public audio-visual emotion data set with different subjects of both genders. Experimental results show that visual features alone yield an average accuracy of 74.15 %, while audio features alone give an average recognition accuracy of 67.39 %. Combining audio and visual features improves the overall system accuracy to 80.27 %.
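
The final decision step can be illustrated with a minimal sketch of sum-rule fusion. It assumes that each classifier outputs a posterior probability per class and that the sum rule is applied with equal weights, as in the usual textbook form; the abstract does not give these details, and the class names and probabilities below are invented.

```python
def sum_rule(posteriors):
    """Fuse per-classifier posteriors by averaging them and picking the top class.

    posteriors: list of dicts mapping class label -> probability,
    one dict per classifier (all dicts share the same labels).
    """
    classes = list(posteriors[0])
    fused = {c: sum(p[c] for p in posteriors) / len(posteriors) for c in classes}
    return max(fused, key=fused.get), fused

# Hypothetical outputs of the audio and visual classifiers for one clip.
audio = {"happy": 0.5, "sad": 0.3, "angry": 0.2}
visual = {"happy": 0.2, "sad": 0.7, "angry": 0.1}

label, fused = sum_rule([audio, visual])
```

Here the audio classifier alone would choose "happy", but the fused scores favor "sad" because the visual classifier is more confident.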

4.
Advances in high-throughput genome sequencing technology have led to an explosion in the amount of sequence data available. Determining protein function with experimental techniques is time-consuming and expensive; using machine-learning techniques to assess protein function rapidly may help streamline this process. The problem of assigning functional classes to proteins is complicated by the fact that a single protein can participate in several different pathways and thus can have multiple functions. We have developed a tree-based classifier that is capable of handling multiple-labelled data and of giving insight into the multi-functional nature of proteins. We call the resulting tree a recursive maximum contrast tree (RMCT) and the resulting classifier a multiple-labelled instance classifier (MLIC). We investigate the synergy of machine-learning-based ensemble methods and physicochemical-based feature augmentation. We test our algorithm on protein phylogenetic profiles generated from 60 completely sequenced genomes and compare our results with those achieved by algorithms such as support vector machines and decision trees.

5.
Hand gesture recognition offers an alternative means of human-computer interaction for many devices. In this work, we have developed a classifier-fusion-based dynamic free-air hand gesture recognition system to identify isolated gestures. Different users perform the same gesture at different speeds. Hence, when different samples of the same gesture are compared, variations due to gesturing speed should not contribute to the dissimilarity score. We therefore introduce a two-level speed normalization procedure using DTW and Euclidean distance-based techniques. Three features, namely ‘orientation between consecutive points’, ‘speed’ and ‘orientation between the first and every trajectory point’, were used for the speed normalization. In the feature extraction stage, 44 features were selected from the existing literature. Using the full feature set could lead to overfitting and information redundancy and may increase computational complexity because of the higher dimensionality. We therefore select an optimal subset of features using analysis of variance and incremental feature selection. The performance of the system was evaluated using this optimal feature set with individual classifiers, namely ANN, SVM, k-NN and Naïve Bayes. Finally, the decisions of the individual classifiers were combined using a classifier fusion model. The experimental results indicate that classifier fusion gives satisfactory results compared with the individual classifiers: an accuracy of 94.78 % was achieved using classifier fusion, compared with the baseline CRF (85.07 %) and HCRF (89.91 %) models.
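
To make the role of DTW in speed normalization concrete, the following is a minimal sketch of the classic dynamic time warping distance on one-dimensional sequences. It is the standard textbook recurrence, not the paper's two-level procedure, and the example trajectories are invented.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    cost[i][j] holds the minimal accumulated |a - b| cost of aligning
    the first i samples of a with the first j samples of b.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# The same gesture feature traced slowly and quickly (hypothetical values).
slow = [0, 0, 1, 1, 2, 2, 3, 3]
fast = [0, 1, 2, 3]
```

Because DTW may align one sample with several, `dtw_distance(slow, fast)` is zero even though the sequences differ in length, which is why speed differences alone do not add to the dissimilarity score.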

6.
This paper proposes a system for the early automatic recognition of health problems that manifest themselves in a distinctive form of gait. The purpose of the system is to prolong the autonomous living of the elderly at home. When the system identifies a health problem, it automatically notifies a physician and provides an explanation of the automatic diagnosis. The gait of the elderly user is captured using a motion-capture system, which consists of body-worn tags and wall-mounted sensors. The positions of the tags are acquired by the sensors, and the resulting time series of position coordinates are analyzed with machine-learning algorithms in order to recognize a specific health problem. Novel semantic features based on medical knowledge are proposed for training a machine-learning classifier. The classifier classifies the user’s gait as: 1) normal, 2) with hemiplegia, 3) with Parkinson’s disease, 4) with pain in the back or 5) with pain in the leg. Two studies are presented: 1) the feasibility of automatic recognition and 2) the impact of tag placement and noise level on the accuracy of recognizing health problems. The experimental results of the first study (12 tags, no noise) showed that the k-nearest neighbors and neural network algorithms achieved classification accuracies of 100%. The experimental results of the second study showed that a classification accuracy of over 99% is achievable using several machine-learning algorithms and 8 or more tags with up to 15 mm standard deviation of noise. The results show that the proposed approach achieves high classification accuracy and can be used as a guide for further studies in the increasingly important area of Ambient Assisted Living.
Since the system uses semantic features and an artificial-intelligence approach to interpret the health state, provides a natural explanation of the hypothesis and is embedded in the domestic environment of the elderly person, it is an example of semantic ambient media for Ambient Assisted Living.

7.
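Proteins that interact with nucleic acids play extremely important roles in many aspects of gene function, and predicting whether a protein interacts with nucleic acids has attracted wide attention in bioinformatics. In this paper, amino acid composition, physicochemical properties of amino acids, and protein structure information are used as feature parameters, and nucleic-acid-interacting proteins are predicted with the support vector machine (SVM) method. Three data sets of proteins that interact with rRNA, RNA and DNA, respectively, were used to train SVMs; the optimal kernel function was selected and its parameters were optimized to build classification models, which were then used to predict whether a protein interacts with nucleic acids. The results show that, even for proteins with homology below 40%, 10-fold cross-validation on the three data sets gave prediction accuracies of 93.75%, 83.41% and 81.85%, respectively. On external test sets, the resulting models gave prediction accuracies of 93.8%, 84.2% and 81.9%, respectively. On this basis, we built an online software system for predicting whether a protein interacts with nucleic acids, available at http://chemdata.shu.edu.cn/protein_na.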
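
Of the feature types named in this abstract, amino acid composition is the simplest to illustrate. The sketch below computes a 20-dimensional composition vector; the function name, the ordering of residues and the handling of non-standard letters are assumptions made for this example, not details taken from the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical

def aa_composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence.

    Non-standard letters (e.g. X, B, Z) are ignored; an empty or fully
    non-standard sequence yields a zero vector.
    """
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / n for a in AMINO_ACIDS]

features = aa_composition("AACD")  # toy sequence, not a real protein
```

A vector like `features` could then be concatenated with other descriptors and passed to an SVM; that step is omitted here because the abstract does not specify the kernel or parameters.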

8.
An SVM-AdaBoost facial expression recognition system   (Cited by: 1 in total; self-citations: 0; citations by others: 1)
This study focuses on improving the recognition rate and processing time of facial recognition systems. First, the skin is detected by pixel-based methods to reduce the search space for the maximum rejection classifier (MRC), which detects the face. The detected face is normalized by a discrete cosine transform (DCT) and down-sampled by a Bessel transform. Gabor feature extraction techniques are used to extract thousands of facial features that represent facial deformation patterns. An AdaBoost-based hypothesis is formulated to select a few hundred Gabor features that are potential candidates for expression recognition. The selected features are fed into a support vector machine (SVM) classifier to train it. Average recognition rates of 97.57 % and 92.33 % are achieved on the JAFFE and Yale databases, respectively. The execution time of the proposed method is also significantly lower than that of other methods. Overall, the proposed method performs better than the other methods.

9.
This paper presents a fault diagnosis technique based on acoustic emission (AE) analysis with the Hilbert–Huang Transform (HHT) and a data mining tool. HHT analyzes the AE signal using intrinsic mode functions (IMFs), which are extracted by Empirical Mode Decomposition (EMD). Instead of a time-domain approach with the Hilbert transform, the FFT of the IMFs from the HHT process is used to represent a time-frequency-domain approach for efficient signal response from rolling element bearings. Further, the extracted statistical and acoustic features are used to select a suitable data-mining-based fault classifier, with or without a filter. The K-nearest neighbor algorithm is observed to be the more efficient classifier with default parameter settings in WEKA. The APF-KNN approach, which is based on an asymmetric proximity function with optimized feature selection, shows better classification accuracy and is therefore used. An experimental evaluation of the time-frequency approach is presented for five bearing conditions: healthy bearing and bearings with outer race, inner race, ball and combined defects. The experimental results show that the proposed method can increase the reliability of fault diagnosis for ball bearings.

10.
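Although research on sentiment analysis has made great progress in recent years, cross-domain aspect-level sentiment analysis remains a challenge. Existing methods mainly focus on information shared by the source and target domains and ignore information specific to the target domain. In addition, sentiment words are important information in a sentence: they not only reflect the sentiment polarity of an aspect but can also be divided into shared sentiment words and domain-specific sentiment words. To address the target domain's specific information and sentiment words, this paper proposes a domain-specific sentiment word attention model (DSSW...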

11.
It is widely believed that the human brain is a complicated network and that many neurological disorders, such as Alzheimer’s disease (AD), are related to abnormal changes in the architecture of the brain network. In this work, we present a kernel-based method to establish a network for each subject using mean cortical thickness, which we refer to hereafter as the individual’s network. We construct individual networks for 83 subjects, including AD patients and normal controls (NC), taken from the Open Access Series of Imaging Studies database. The network edge features are used to predict AD/NC with machine learning. As the number of edge features is much larger than the number of samples, feature selection is applied to avoid the adverse impact of high-dimensional data on classifier performance. We use a hybrid feature selection that combines filter and wrapper methods, and compare the performance of six different combinations of them. Finally, support vector machines are trained using the selected features. To obtain an unbiased evaluation of our method, we use a nested cross-validation framework to choose the optimal hyper-parameters of the classifier and to evaluate the generalization of the method. We report a best accuracy of 90.4 % using the proposed method in the leave-one-out analysis, outperforming that obtained with the raw cortical thickness data by more than 10 %.
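
The nested cross-validation framework mentioned above can be sketched as index bookkeeping. The outer loop here is leave-one-out over 83 subjects, as in the abstract; the inner 5-fold split used for hyper-parameter selection is an assumption, since the abstract does not state the inner scheme, and the helper names are hypothetical.

```python
def k_folds(indices, k):
    """Yield (train, validation) index lists for k interleaved folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]

def nested_loo(n_samples, inner_k=5):
    """Yield (test_index, outer_train, inner_splits) for nested cross-validation.

    Hyper-parameters would be tuned on inner_splits only; the held-out
    test_index is touched once, after tuning, to score the final model.
    """
    for test in range(n_samples):
        outer_train = [i for i in range(n_samples) if i != test]
        yield test, outer_train, list(k_folds(outer_train, inner_k))

splits = list(nested_loo(83))
```

The point of the nesting is that the held-out subject never appears in any inner fold, so the hyper-parameter choice cannot leak information from the test sample into the accuracy estimate.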

12.
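The urgent need to retrieve required information from the Web quickly and accurately has given rise to specialized search engine technology. In a specialized search engine, the Web crawler is responsible for collecting information on a particular professional domain from the Web and is a core component of the engine. This paper studies the crawling of Chinese domain-specific web pages. Using the KL distance, it verifies that the content of a web page and the context surrounding a link differ in distribution. On this basis, it proposes a crawling algorithm for Chinese domain-specific web pages that is guided by a Naive Bayes classifier and uses link anchor text and its surrounding context as features, and it designs an algorithm for automatically obtaining training data with link class labels. Taking the crawling of finance web pages as an example, the proposed algorithms were tested offline and online; the results show that the crawler guided by the Naive Bayes classifier achieves a harvest rate of nearly 90% for domain-specific pages.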
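
The KL distance used in this abstract to compare page content with link context can be illustrated with a short sketch. It computes the Kullback-Leibler divergence between two smoothed unigram distributions; the add-one smoothing, the shared vocabulary and the English toy tokens are assumptions for this example (the paper works on Chinese text and does not describe its estimator in the abstract).

```python
import math
from collections import Counter

def distribution(tokens, vocab, alpha=1.0):
    """Unigram distribution over vocab with additive (add-alpha) smoothing."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p_tokens, q_tokens):
    """KL(P || Q) between the smoothed unigram distributions of two token lists."""
    vocab = set(p_tokens) | set(q_tokens)
    p = distribution(p_tokens, vocab)
    q = distribution(q_tokens, vocab)
    return sum(p[w] * math.log(p[w] / q[w]) for w in vocab)

# Hypothetical tokens from a page body and from the text around a link.
page_content = "stock market bond fund stock".split()
link_context = "click here fund news".split()

divergence = kl_divergence(page_content, link_context)
```

A clearly positive `divergence` indicates that the two word distributions differ, which is the observation the abstract uses to motivate treating link context as its own feature source. Note that the measure is asymmetric, so KL(P || Q) and KL(Q || P) generally differ.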

13.
14.
The security of handwritten documents is very important in authentication systems. In this paper, a forgery detection method is proposed for verifying handwritten documents. The method introduces two types of novel features: macro and micro. Macro features capture the structure of the handwriting, while micro features capture more detailed information. The micro features also attempt to recover, from offline data, properties similar to online properties such as pen pressure and velocity. After the features are extracted, PCA is applied to reduce the dimensionality of the feature vector. A simple positive classifier is used separately to detect forgeries. Importantly, the weights of this classifier are adjusted using positive data only, because forgery samples cannot be used in the adjustment phase. To test the proposed method, a Persian handwritten data set was prepared using four kinds of forgeries: random, unskilled, skilled, and mimic. The data set consists of numbers written out as words, which serve as reference words. Across these reference words, the best result was 87 % correct rejection with 97 % correct acceptance. We believe the proposed method can be applied to other languages by adjusting some parameters; however, because the data must be in a high-resolution format (e.g. 1,200 dpi) and no existing database has such resolution, the method was applied only to the data set we gathered.

15.
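Diagnosing breast cancer from histopathological images with machine learning saves considerable manpower and resources, so improving the recognition accuracy for such images is of practical significance. To address the problem that single classifiers and ensemble-learning classifiers have a limited observation domain and easily fall into local optima, a classifier model based on joint training is proposed. The single classifiers influence one another to enlarge the observed perception domain and find the estimation point with minimum loss; the hyperparameters are iteratively optimized according to this estimation point, and the classifier with the best fit is obtained by joint training. This draws on the strengths of different classifier models to improve generalization, and the enlarged observation domain allows the global optimum to be reached faster while improving recognition accuracy. Experiments show that the proposed jointly trained classifier improves the classification of breast cancer histopathological images: the benign/malignant classification accuracies at magnifications of 40×, 100×, 200× and 400× are 99.67%, 98.08%, 99.01% and 96.34%, respectively.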

16.
17.
18.
In this paper, we perform a noise analysis to assess the robustness to noise of a neural classifier that performs multi-class diagnosis of rolling element bearings. We work on vibration signals collected by two accelerometers and consider ten levels of noise, each characterized by a different signal-to-noise ratio, ranging from 40.55 to -11.35 dB. We classify the noisy signals with a neural classifier initially trained on signals without noise, and then repeat the training process with signals affected by increasing levels of noise. We show that adding noisy signals to the training set significantly increases the classification accuracy of a single classifier. Finally, we apply the two most widely used strategies for combining classifiers, classifier fusion and classifier selection, and show that both significantly improve on the performance of the single best classifier. In particular, classifier selection achieves the best results for low and medium levels of noise, while classifier fusion is the most accurate for high levels of noise. The analysis presented in the paper can be used to identify both the type of classifier (e.g., single classifier or classifier ensemble) and how many and which noise levels should be used in the training phase in order to achieve the desired classification accuracy in the application domain of interest.
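
Generating a noisy signal at a prescribed signal-to-noise ratio, as needed for this kind of robustness study, can be sketched as follows. The additive white Gaussian noise model, the synthetic sine wave and the function names are assumptions for illustration; the abstract does not say how the noise was generated.

```python
import math
import random

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise so that the result has the requested SNR in dB.

    SNR_dB = 10 * log10(P_signal / P_noise), hence
    P_noise = P_signal / 10 ** (SNR_dB / 10).
    """
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = p_signal / (10 ** (snr_db / 10))
    sigma = math.sqrt(p_noise)
    return [x + rng.gauss(0.0, sigma) for x in signal]

def measured_snr_db(clean, noisy):
    """Estimate the SNR in dB from a clean signal and its noisy version."""
    p_signal = sum(x * x for x in clean) / len(clean)
    p_noise = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_signal / p_noise)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
clean = [math.sin(2 * math.pi * 5 * t / 1000) for t in range(20000)]
noisy_low = add_noise(clean, 40.55, rng)    # mildest level named in the abstract
noisy_high = add_noise(clean, -11.35, rng)  # harshest level named in the abstract
```

At -11.35 dB the noise power is roughly 13.6 times the signal power, which gives a sense of how severe the harshest setting is.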

19.
Video indexing is employed to represent the features of video sequences. Motion vectors derived from compressed video are preferred for video indexing because they can be accessed by partial decoding; thus, they are used extensively in various video analysis and indexing applications. In this study, we introduce an efficient compressed-domain video indexing method and implement it on H.264/AVC coded videos. The video retrieval experiments indicate that retrieval based on the proposed indexing method outperforms motion-vector-based retrieval in 74 % of queries, with little increase in computation time. Furthermore, we compare our method with a pixel-level video indexing method that employs both temporal and spatial features. The experimental results indicate that our method outperforms the pixel-level method in both performance and speed. Hence, considering the speed and precision of indexing methods, the proposed method is an efficient one that can be used in various video indexing and retrieval applications.

20.
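A two-layer hybrid classifier is proposed for predicting the oxidation-reduction state of cysteines in proteins. The first layer, a global linear classifier, takes amino acid percentage composition as input; the second layer, a local SVM classifier, takes the local sequence around each cysteine as input. The study uses 639 protein polypeptide chains from the April 2002 PISCES culled PDB database, containing 584 disulfide bonds and 2,904 cysteines in total. Under a rigorous jackknife test, the highest accuracy in predicting cysteine redox state was 84.1% (cysteine level) and 80.1% (protein level). The results show that the two-layer hybrid classifier, built by combining global protein information with local sequence context, achieves high prediction accuracy. They also show that both the global amino acid composition and the local sequence around a cysteine carry information relevant to disulfide bond formation, suggesting that whether a cysteine forms a disulfide bond depends on the global structural information of the protein and is also influenced by local sequence information.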
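
The input to the second, local layer can be illustrated by extracting a fixed-width sequence window around every cysteine. This is a sketch under assumptions: the window half-width, the padding symbol and the function name are hypothetical, since the abstract does not state the window size or how chain ends are handled.

```python
def cysteine_windows(sequence, half_width=3, pad="X"):
    """Return (position, window) pairs for every cysteine in a protein sequence.

    Each window holds half_width residues on either side of the cysteine;
    positions beyond the chain ends are filled with the pad symbol.
    """
    seq = sequence.upper()
    padded = pad * half_width + seq + pad * half_width
    windows = []
    for i, residue in enumerate(seq):
        if residue == "C":
            # In the padded string the cysteine sits at index i + half_width.
            windows.append((i, padded[i:i + 2 * half_width + 1]))
    return windows

windows = cysteine_windows("MKCAAGC", half_width=2)  # toy sequence
```

Each window would then be encoded numerically (for example, one-hot per residue) before being passed to the local classifier; that encoding is left out because the abstract does not describe it.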


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号