Similar literature
 20 similar documents found; search time 781 ms
1.
《Computers & chemistry》1993,17(2):219-227
A neural network classification method has been developed as an alternative approach to the search/organization problem of large molecular databases. Two artificial neural systems have been implemented on a Cray for rapid protein/nucleic acid classification of unknown sequences. The system employs an n-gram hashing function for sequence encoding and modular back-propagation networks for classification. The protein system, which classifies proteins into PIR (Protein Identification Resource) superfamilies, has achieved 82–100% sensitivity at a speed that is about an order of magnitude faster than other search methods. The pilot nucleic acid system showed a 91–97% classification accuracy. The software tool could be used as a filter program to reduce the database search time and help organize the molecular sequence databases. The tool is generally applicable to any database that is organized according to family relationships.
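The n-gram hashing step mentioned in this abstract can be illustrated with a minimal sketch. It is not the published encoder: the function name `ngram_hash_vector`, the CRC32 hash, the bucket count, and the normalization are all assumptions chosen for illustration.

```python
import zlib


def ngram_hash_vector(sequence, n=2, size=16):
    """Map a sequence to a fixed-length vector of hashed n-gram frequencies.

    Hypothetical helper: each overlapping n-gram is hashed into one of
    `size` buckets (CRC32 is an assumed, deterministic hash), and the
    bucket counts are normalized to sum to 1.
    """
    vec = [0] * size
    for i in range(len(sequence) - n + 1):
        gram = sequence[i:i + n]
        bucket = zlib.crc32(gram.encode("ascii")) % size
        vec[bucket] += 1
    total = sum(vec)
    # A sequence shorter than n yields no n-grams; return the zero vector.
    return [count / total for count in vec] if total else vec
```

A vector like this has the same length for every sequence, so it could serve as the input layer of a feed-forward classifier regardless of sequence length.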

2.
A modified counter-propagation (CP) algorithm with supervised learning vector quantizer (LVQ) and dynamic node allocation has been developed for rapid classification of molecular sequences. The molecular sequences were encoded into neural input vectors using an n-gram hashing method for word extraction and a singular value decomposition (SVD) method for vector compression. The neural networks used were three-layered, forward-only CP networks that performed nearest neighbor classification. Several factors affecting the CP performance were evaluated, including weight initialization, Kohonen layer dimensioning, winner selection and weight update mechanisms. The performance of the modified CP network was compared with the back-propagation (BP) neural network and the k-nearest neighbor method. The major advantages of the CP network are its training and classification speed and its capability to extract statistical properties of the input data. The combined BP and CP networks can classify nucleic acid or protein sequences with close to 100% accuracy at a rate about one order of magnitude faster than other currently available methods.
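As a rough illustration of supervised vector quantization with nearest-neighbor classification, the sketch below implements the textbook LVQ1 update rule. It is not the modified CP algorithm of the abstract: dynamic node allocation and the other modifications are omitted, and the names `lvq1_train` and `classify`, the learning rate, and the epoch count are assumptions.

```python
def _nearest(prototypes, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(
        range(len(prototypes)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(prototypes[i][0], x)),
    )


def lvq1_train(prototypes, data, lr=0.1, epochs=10):
    """Basic LVQ1: pull the winning prototype toward a correctly labeled
    sample and push it away from an incorrectly labeled one."""
    protos = [(list(w), label) for w, label in prototypes]  # copy the weights
    for _ in range(epochs):
        for x, y in data:
            w, label = protos[_nearest(protos, x)]
            sign = 1.0 if label == y else -1.0
            for d in range(len(w)):
                w[d] += sign * lr * (x[d] - w[d])
    return protos


def classify(prototypes, x):
    """Nearest-prototype classification."""
    return prototypes[_nearest(prototypes, x)][1]
```

Because only the winning prototype is updated for each sample, a training pass costs one distance scan per sample, which is consistent with the speed advantage the abstract attributes to this family of networks.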

3.
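A new method is proposed for classifying protein sequences into families using a BP neural network based on the Levenberg-Marquardt (LM) algorithm. The method uses dipeptide composition to extract features from protein sequences and evaluates the relative importance of the features according to impact factors. A three-layer artificial neural network is constructed with the improved LM optimization algorithm for BP networks. After learning three families from the PIR database, the network classified unknown protein sequences with accuracies of 98.9%, 98.1%, and 97.8%, respectively.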
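Dipeptide composition, the feature extraction named in this abstract, is commonly computed as the relative frequency of each of the 400 ordered amino-acid pairs. The sketch below follows that common definition; the function name and the choice to skip pairs containing non-standard residues are assumptions, not details taken from the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return a 400-dimensional vector of adjacent-pair frequencies.

    Pairs containing a non-standard residue are skipped (an assumption).
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for a, b in zip(seq, seq[1:]):
        pair = a + b
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]
```

For the toy sequence `"ACAC"` the adjacent pairs are AC, CA, AC, so the AC entry is 2/3 and the CA entry is 1/3.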

4.
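Text classification methods based on word embeddings and convolutional neural networks have achieved good results in sentiment analysis research. Such methods mainly use context-based word embedding features, but the embedding process usually does not consider the sentiment polarity of the words themselves, and these methods often fail to make effective use of resources such as the many manually constructed sentiment lexicons. To address these problems, this paper proposes a sentiment classification method that combines a sentiment lexicon with a convolutional neural network. Entries in the sentiment lexicon are used to build abstract representations of the words in a text; a convolutional neural network then extracts sequence features of the abstracted words, which are used for sentiment polarity classification. On the COAE2014 dataset of the Chinese Opinion Analysis Evaluation, the proposed method outperformed current mainstream convolutional neural networks as well as the naive Bayes support vector machine.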

5.
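Traditional two-stream convolutional neural networks have difficulty understanding long-duration action information, and the generalization ability of the model declines when long-term temporal stream information is lost. To address this problem, the paper proposes a human action recognition method based on the fusion of a two-stream network and a support vector machine. First, the RGB image of each frame in the video and the corresponding dense optical flow sequence images in the vertical direction are extracted to obtain the spatial and temporal information of the actions in the video; these are fed into the spatial-domain and temporal-domain networks, respectively, for pre-training, after which features are extracted. Then, a parallel fusion strategy is applied to the equal-dimensional feature vectors extracted by the two-stream network to improve their representational power. Finally, the fused feature vectors are input to a linear support vector machine for training and classification. Experiments on the KTH and UCF Sports datasets show that the proposed method achieves good classification performance.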

6.
This paper considers the n-job, m-machine permutation flowshop with the objective of minimizing the mean flowtime. Initial sequences that are structured to enhance the performance of local search techniques are constructed from job rankings delivered by a trained neural network. The network is trained on data collected from optimal sequences obtained from solved examples of flowshop problems. Once trained, the neural network provides rankable measures that can be used to construct a sequence in which jobs are located as close as possible to the positions they would occupy in an optimal sequence. The contribution of these ‘neural’ sequences in improving the performance of some common local search techniques, such as adjacent pairwise interchange and tabu search, is examined. Tests using initial sequences generated by different heuristics show that the sequences suggested by the neural networks are more effective in directing neighborhood search methods to lower local optima.
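To make the objective and the adjacent pairwise interchange neighborhood concrete, here is a minimal sketch. The data layout (`proc[j][k]` as the processing time of job j on machine k) and both function names are assumptions; the neural ranking that would supply the initial sequence is not modeled.

```python
def mean_flowtime(sequence, proc):
    """Mean completion time of the jobs in a permutation flowshop.

    proc[j][k] is the (assumed) processing time of job j on machine k.
    """
    machines = len(proc[0])
    finish = [0] * machines  # completion time of the previous job per machine
    total = 0
    for job in sequence:
        prev = 0  # completion time of this job on the previous machine
        for k in range(machines):
            start = max(finish[k], prev)
            finish[k] = start + proc[job][k]
            prev = finish[k]
        total += finish[-1]
    return total / len(sequence)


def adjacent_pairwise_interchange(sequence, proc):
    """Swap neighboring jobs while any swap lowers the mean flowtime."""
    best = list(sequence)
    best_cost = mean_flowtime(best, proc)
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            cand = best[:]
            cand[i], cand[i + 1] = cand[i + 1], cand[i]
            cost = mean_flowtime(cand, proc)
            if cost < best_cost:
                best, best_cost = cand, cost
                improved = True
    return best, best_cost
```

The search stops at the first sequence that no single adjacent swap improves, which is a local optimum; this is why the quality of the initial sequence matters for the result.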

7.
Machine learning is being implemented in bioinformatics and computational biology to solve challenging problems that emerge in the analysis and modeling of biological data such as DNA, RNA, and protein. The major problems in classifying protein sequences into existing families/superfamilies are the following: the selection of a suitable sequence encoding method, the extraction of an optimized subset of features that possesses significant discriminatory information, and the adaptation of an appropriate learning algorithm that classifies protein sequences with higher classification accuracy. The accurate classification of protein sequences would be helpful in determining the structure and function of novel protein sequences. In this article, we have proposed a distance-based sequence encoding algorithm that captures the sequence's statistical characteristics along with amino acid sequence order information. A statistical metric-based feature selection algorithm is then adopted to identify the reduced set of features to represent the original feature space. The performance of the proposed technique is validated using some of the best performing classifiers implemented previously for protein sequence classification. An average classification accuracy of 92% was achieved on the yeast protein sequence data set downloaded from the benchmark UniProtKB database.

8.
Backpropagation neural networks have been applied to prediction and classification problems in many real world situations. However, a drawback of this type of neural network is that it requires a full set of input data, and real world data is seldom complete. We have investigated two ways of dealing with incomplete data (network reduction using multiple neural network classifiers, and value substitution using estimated values from predictor networks) and compared their performance with an induction method. On a thyroid disease database collected in a clinical situation, we found that the network reduction method was superior. We conclude that network reduction can be a useful method for dealing with missing values in diagnostic systems based on backpropagation neural networks.

9.
The paper presents a neural network based multi-classifier system for the identification of Escherichia coli promoter sequences in strings of DNA. As each gene in DNA is preceded by a promoter sequence, the successful location of an E. coli promoter leads to the identification of the corresponding E. coli gene in the DNA sequence. A set of 324 known E. coli promoters and a set of 429 known non-promoter sequences were encoded using four different encoding methods. The encoded sequences were then used to train four different neural networks. The classification results of the four individual neural networks were then combined through an aggregation function, which used a variation of the logarithmic opinion pool method. The weights of this function were determined by a genetic algorithm. The multi-classifier system was then tested on 159 known promoter sequences and 171 non-promoter sequences not contained in the training set. The results show that the same data set, when presented to neural networks in different forms, can provide slightly varying results. They also show that when the opinions of several classifiers on the same input data are integrated within a multi-classifier system, the results are better than those of the individual neural networks. Our multi-classifier system outperforms other prediction systems for E. coli promoters developed so far.
Vasile Palade
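The logarithmic opinion pool mentioned in abstract 9 combines classifier outputs as a weighted geometric mean, with the combined probability of class c proportional to the product of p_i(c)^{w_i}. The sketch below shows the plain pool only; the paper uses a variation whose details are not given here, and the weights, which the paper obtains from a genetic algorithm, are simply passed in. The function name and the `eps` floor are assumptions.

```python
import math


def log_opinion_pool(prob_lists, weights, eps=1e-12):
    """Combine per-classifier class probabilities by a weighted geometric mean.

    prob_lists[i][c] is classifier i's probability for class c. `eps` is an
    assumed floor that avoids log(0).
    """
    n_classes = len(prob_lists[0])
    log_scores = [
        sum(w * math.log(max(p[c], eps)) for p, w in zip(prob_lists, weights))
        for c in range(n_classes)
    ]
    peak = max(log_scores)  # subtract the maximum for numerical stability
    unnorm = [math.exp(s - peak) for s in log_scores]
    z = sum(unnorm)
    return [u / z for u in unnorm]
```

With equal weights of 0.5, two classifiers reporting `[0.8, 0.2]` and `[0.6, 0.4]` yield a pooled promoter probability of sqrt(0.48) / (sqrt(0.48) + sqrt(0.08)), roughly 0.71.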

10.
Natural languages are known for their expressive richness. Many sentences can be used to represent the same underlying meaning. Only modelling the observed surface word sequence can result in poor context coverage and generalization, for example, when using n-gram language models (LMs). This paper proposes a novel form of language model, the paraphrastic LM, that addresses these issues. A phrase level paraphrase model statistically learned from standard text data with no semantic annotation is used to generate multiple paraphrase variants. LM probabilities are then estimated by maximizing their marginal probability. Multi-level language models estimated at both the word level and the phrase level are combined. An efficient weighted finite state transducer (WFST) based paraphrase generation approach is also presented. Significant error rate reductions of 0.5–0.6% absolute were obtained over the baseline n-gram LMs on two state-of-the-art recognition tasks for English conversational telephone speech and Mandarin Chinese broadcast speech using a paraphrastic multi-level LM modelling both word and phrase sequences. When it is further combined with word and phrase level feed-forward neural network LMs, significant error rate reductions of 0.9% absolute (9% relative) and 0.5% absolute (5% relative) were obtained over the baseline n-gram and neural network LMs respectively.

11.
In this paper we describe an elegant and efficient approach to coupling reordering and decoding in statistical machine translation, where the n-gram translation model is also employed as a distortion model. The reordering search problem is tackled through a set of linguistically motivated rewrite rules, which are used to extend a monotonic search graph with reordering hypotheses. The extended graph is traversed in the global search when a fully informed decision can be taken. Further experiments show that the n-gram translation model can be successfully used as a reordering model when estimated with reordered source words. Experiments are reported on the Europarl task (Spanish–English and English–Spanish). Results are presented regarding translation accuracy and computational efficiency, showing significant improvements in translation quality with respect to monotonic search for both translation directions at a very low computational cost.

12.
This research is concerned with a gradient descent training algorithm for a target network that makes use of a helper feed-forward network (FFN) to represent the cost function required for training the target network. A helper FFN is trained because the cost relation for the target is not differentiable. The transfer function of the trained helper FFN provides a differentiable cost function of the parameter vector for the target network, allowing gradient search methods to find the optimum values of the parameters. The method is applied to the training of discrete recurrent networks (DRNNs) that are used as a tool for classification of temporal sequences of characters from some alphabet and identification of a finite state machine (FSM) that may have produced all the sequences. Classification of sequences that are input to the DRNN is based on the terminal state of the network after the last element in the input sequence has been processed. If the DRNN is to be used for classifying sequences, the terminal states for class 0 sequences must be distinct from the terminal states for class 1 sequences. The cost value to be used in training must therefore be a function of this disjointedness and no more. The outcome is a cost relationship that is not continuous but discrete, so either derivative-free methods or the method suggested in this paper have to be used. In the latter case, the transfer function of the helper FFN that is trained using the cost function is a differentiable function that can be used in the training of the DRNN using gradient descent. Acknowledgement: This work was supported by a discovery grant from the Government of Canada. The comments made by the reviewers are also greatly appreciated and have proven to be quite useful.

13.
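To improve the accuracy of text classification, this paper proposes a new text classification method based on quantum PSO and an RBF neural network. First, a set of keywords describing the sample categories is built, and a fuzzy vector space model is used to construct the feature vector of each class of samples. An RBF neural network then performs automatic text classification, and an improved quantum PSO optimizes the parameters of the RBF network to improve its approximation ability. A selection of documents from the China Journal Net (中国期刊网) was used as experimental data, and the results show that the classification precision of the proposed method is clearly higher than that of comparable methods.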

14.
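Modifying images with gradient-based attacks can reduce the accuracy of neural-network-based classification by about 10%. To address this problem, the idea of moving target defense from the cyberspace domain is used to increase the robustness of neural networks against such attacks. The concept of "differential immunity" is defined for an ensemble of networks, and the interaction between the defender and the users of the network is modeled as a repeated Bayesian Stackelberg game. On this basis, one trained network is selected from the ensemble to classify each input image. The defense reduces classification errors on perturbed images from the MNIST database while maintaining high classification accuracy on normal test images. It can be combined with existing defense mechanisms to help secure neural networks.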

15.

In order to improve the accuracy of rolling bearing fault diagnosis in mechanical equipment, a new fault diagnosis method based on a back propagation neural network optimized by the cuckoo search algorithm is proposed. The method uses the global search ability of the cuckoo search algorithm to search for the best weights and thresholds, which are then passed to the back propagation neural network. In this paper, wavelet packet decomposition is used for feature extraction from vibration signals. The energy values of different frequency bands are obtained through wavelet packet decomposition, and they are input as feature vectors into the optimized back propagation neural network to identify different fault types of rolling bearings. Three sets of comparative simulation experiments in Matlab show that, under the same conditions, the proposed back propagation neural network optimized by the cuckoo search algorithm requires the fewest training iterations and achieves the highest diagnostic accuracy among the six models compared. In the complex classification experiment with the same fault location but different bearing diameters, the fault recognition rate of the back propagation neural network optimized by the cuckoo search algorithm is 96.25%.
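The band-energy features described in abstract 15 can be sketched with a small wavelet packet decomposition. The abstract does not name the wavelet or the decomposition depth, so the Haar filter, the default `level=3`, the normalization, and both function names are assumptions. Bands are returned in natural (tree) order rather than sorted by frequency, and the signal length is assumed to be a multiple of 2**level.

```python
import math


def haar_split(signal):
    """One Haar analysis step: return (approximation, detail) coefficients."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def wavelet_packet_energies(signal, level=3):
    """Relative energy of each of the 2**level wavelet packet bands."""
    bands = [list(signal)]
    for _ in range(level):
        next_bands = []
        for band in bands:
            # Unlike the plain wavelet transform, both halves are split again.
            next_bands.extend(haar_split(band))
        bands = next_bands
    energies = [sum(x * x for x in band) for band in bands]
    total = sum(energies)
    return [e / total for e in energies] if total else energies
```

The resulting vector of 2**level relative energies is the kind of feature vector that would be fed to the classifier; a constant signal, for example, puts all of its energy in the first band.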

16.
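An energy consumption prediction method using reinforcement learning-based generative adversarial networks (Re-GAN) is proposed. The algorithm combines reinforcement learning with generative adversarial networks: the generator and the discriminator of the GAN (Generative Adversarial Nets) are constructed as the agent and the reward function of reinforcement learning, respectively. During training, the current real energy consumption sequence is taken as the input state of the agent, and a set of fixed-length generated sequences is constructed. The discriminator and Monte Carlo search are then combined to build the reward function for the current sequence, which serves as the reward for the first energy consumption value that follows the real sample sequence. On this basis, an objective function of the reward is constructed and solved for the optimal parameters. Finally, the proposed algorithm is tested on the publicly available building energy consumption data of the Downing Street complex (唐宁街综合大楼); the experimental results show that it achieves higher prediction accuracy than the multilayer perceptron, the gated recurrent neural network, and the convolutional neural network.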

17.
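A fault detection method based on a multi-structure neural network is proposed. To handle sudden structural faults in nonlinear systems typified by fighter aircraft, a multi-structure neural network is constructed. At the input layer, a dyadic discrete wavelet transform is applied to the residual signal, and its detail coefficients at multiple scales are extracted as the fault feature vector. This vector is fed to a neural network classifier for pattern classification, after which a next-stage fault-degree identification network estimates the magnitude of the fault. Simulation results show that the method provides an effective way to detect combined structural faults in fighter aircraft.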

18.
Many machine learning problems in natural language processing, transaction-log analysis, or computational biology require the analysis of variable-length sequences or, more generally, distributions of variable-length sequences. Kernel methods introduced for fixed-size vectors have proven very successful in a variety of machine learning tasks. We recently introduced a new and general kernel framework, rational kernels, to extend these methods to the analysis of variable-length sequences or, more generally, distributions given by weighted automata. These kernels are efficient to compute and have been successfully used in applications such as spoken-dialog classification with Support Vector Machines. However, the rational kernels previously introduced in these applications do not fully encompass distributions over alternate sequences. They are based only on the counts of co-occurring subsequences averaged over the alternate paths, without taking into account information about the higher-order moments of the distributions of these counts. In this paper, we introduce a new family of rational kernels, moment kernels, that precisely exploits this additional information. These kernels are distribution kernels based on moments of counts of strings. We describe efficient algorithms to compute moment kernels and apply them to several difficult spoken-dialog classification tasks. Our experiments show that using the second moment of the counts of n-gram sequences consistently improves the classification accuracy in these tasks. Editors: Dan Roth and Pascale Fung

19.

In many classification problems, it is necessary to consider the specific location in an n-dimensional space from which features have been calculated. For example, considering the location of features extracted from specific areas of a two-dimensional space, such as an image, could improve the understanding of a scene for a video surveillance system. In the same way, the same features extracted from different locations could mean different actions for a 3D HCI system. In this paper, we present a self-organizing feature map able to preserve the topology of the locations in the n-dimensional space from which the feature vectors have been extracted. The main contribution is to preserve the topology of the original space implicitly, because considering the locations of the extracted features and their topology could ease the solution to certain problems. Specifically, the paper proposes the n-dimensional constrained self-organizing map preserving the input topology (nD-SOM-PINT). Features in adjacent areas of the n-dimensional space used to extract the feature vectors are explicitly placed in adjacent areas of the nD-SOM-PINT, constraining the neural network structure and learning. As a case study, the neural network has been instantiated to represent and classify features, as trajectories extracted from a sequence of images, into a high level of semantic understanding. Experiments have been thoroughly carried out using the CAVIAR datasets (Corridor, Frontal and Inria), taking into account the global behaviour of an individual, in order to validate the ability to preserve the topology of the two-dimensional space and obtain high-performance trajectory classification, in contrast to not considering the location of features. Moreover, a brief example has been included to validate the nD-SOM-PINT proposal in a domain other than individual trajectories. Results confirm the high accuracy of the nD-SOM-PINT, which outperforms previous methods aimed at classifying the same datasets.

20.
《Intelligent Data Analysis》1998,2(1-4):287-301
In this paper, we investigate a form of modular neural network for classification with (a) pre-separated input vectors entering its specialist (expert) networks, (b) specialist networks which are self-organized (radial-basis function or self-targeted feedforward type) and (c) a single-layer net that fuses (or integrates) the specialists. When the modular architecture is applied to spatiotemporal sequences, the Specialist Nets are recurrent; specifically, we use the Input Recurrent type. The Specialist Networks (SNs) learn to divide their input space into a number of equivalence classes defined by self-organized clustering and learning using the statistical properties of the input domain. Once the specialists have settled in their training, the Fusion Network is trained by any supervised method to map to the semantic classes. We discuss the fact that this architecture and its training are quite distinct from the hierarchical mixture of experts (HME) type as well as from stacked generalization. Because the equivalence classes to which the SNs map the input vectors are determined by the natural clustering of the input data, the SNs learn rapidly and accurately. The fusion network also trains rapidly by reason of its simplicity. We argue, on theoretical grounds, that the accuracy of the system should be positively correlated to the product of the number of equivalence classes for all of the SNs. This network was applied, as an empirical test case, to the classification of melodies presented as direct audio events (temporal sequences) played by a human and subject, therefore, to biological variations. The audio input was divided into two modes: (a) frequency (or pitch) variation and (b) rhythm, both as functions of time. The results and observations show the technique to be very robust and support the theoretical deductions concerning accuracy.
