20 similar documents found; search took 15 ms
1.
E. Aygün Author Vitae B.J. Oommen Author Vitae Z. Cataltepe Author Vitae 《Pattern recognition》2010,43(11):3891-3899
We consider the problem of classifying peptides using the information residing in their syntactic representations. This problem, which has been studied for more than a decade, has typically been investigated using distance-based metrics that involve the edit operations required in the peptide comparisons. In this paper, we demonstrate that the Optimal and Information Theoretic (OIT) model of Oommen and Kashyap [22], applicable to syntactic pattern recognition, can be used to tackle the peptide classification problem. We advocate that one can model the differences between compared strings as a mutation model consisting of random substitutions, insertions and deletions obeying the OIT model. Thus, in this paper, we show that the probability measure obtained from the OIT model can be perceived as a sequence similarity metric, from which a support vector machine (SVM)-based peptide classifier can be devised. The classifier we have built has been tested on eight different substitution matrices and on two different data sets, namely, the HIV-1 protease cleavage sites and the T-cell epitopes. The results show that the OIT model performs significantly better than one using a Needleman-Wunsch sequence alignment score, that it is less sensitive to the substitution matrix than the other methods compared, and that, when combined with an SVM, it is among the best peptide classification methods available.
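The OIT model itself is probabilistic, but the edit-operation view it builds on can be illustrated with the classic Levenshtein distance. Below is a minimal sketch (not the paper's method) that turns that distance into a bounded similarity score of the kind one might feed to an SVM with a precomputed kernel; the function names are illustrative only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of substitutions,
    insertions and deletions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Map the distance into [0, 1], e.g. for a precomputed SVM kernel."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))
```

A genuine OIT-based classifier would replace this crisp distance with the model's probability measure, but the interface to the SVM is the same: a pairwise similarity matrix.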
2.
3.
Stefan Scherer John Kane Christer Gobl Friedhelm Schwenker 《Computer Speech and Language》2013,27(1):263-287
The dynamic use of voice qualities in spoken language can reveal useful information about a speaker's attitude, mood and affective state. This information may be very desirable for a range of speech technology applications, both input and output. However, voice quality annotation of speech signals may frequently produce far from consistent labeling: groups of annotators may disagree on the perceived voice quality, but whom should one trust, or is the truth somewhere in between? The current study first describes a voice quality feature set suitable for differentiating voice qualities on a tense-to-breathy dimension. It then uses these features as inputs to a fuzzy-input fuzzy-output support vector machine (F2SVM) algorithm, which is in turn capable of softly categorizing voice quality recordings. The F2SVM is compared in a thorough analysis to standard crisp approaches and shows promising results, outperforming, for example, standard support vector machines whose sole difference is that the F2SVM receives fuzzy label information during training. Overall, it is possible to achieve accuracies of around 90% for both speaker-dependent (cross validation) and speaker-independent (leave-one-speaker-out validation) experiments. Additionally, the F2SVM achieves an accuracy of 82% in a cross-corpus experiment (i.e. training and testing on entirely different recording conditions) in a frame-wise analysis, and of around 97% after temporally integrating over full sentences. Furthermore, the fuzzy output measures gave performances close to those of human annotators.
4.
We present two algorithms that are near optimal with respect to the number of inversions present in the input. One of the algorithms is a variation of insertion sort, and the other is a variation of merge sort. The number of comparisons performed by our algorithms, on an input sequence of length n that has I inversions, is at most n log2(I/n) + O(n). Moreover, both algorithms have implementations that run in time O(n log(I/n) + n). All previously published algorithms require at least cn log(I/n) comparisons for some c > 1.
M. L. Fredman was supported in part by NSF grant CCR-9732689.
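The abstract's bounds are parameterized by the inversion count I. The authors' near-optimal algorithms are not reproduced here, but I itself can be computed with a standard merge-sort pass, a sketch of which follows:

```python
def count_inversions(seq):
    """Count pairs (i, j) with i < j and seq[i] > seq[j] using a
    merge-sort pass; runs in O(n log n) regardless of I."""
    def sort(a):
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv_l = sort(a[:mid])
        right, inv_r = sort(a[mid:])
        merged, i, j, inv = [], 0, 0, inv_l + inv_r
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                # Every remaining left element exceeds right[j]:
                # each such pair is an inversion.
                inv += len(left) - i
                merged.append(right[j]); j += 1
        merged.extend(left[i:]); merged.extend(right[j:])
        return merged, inv
    return sort(list(seq))[1]
```

An adaptive sorter in the sense of the abstract would instead spend only about n log2(I/n) comparisons when I is small, rather than the fixed O(n log n) cost above.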
5.
Extreme learning machines: new trends and applications
The extreme learning machine (ELM), as a new learning framework, has drawn increasing attention in areas such as large-scale computing, high-speed signal processing and artificial intelligence. ELM aims to break the barriers between conventional artificial learning techniques and biological learning mechanisms, and represents a suite of machine learning techniques in which the hidden neurons need not be tuned. ELM theories and algorithms argue that "random hidden neurons" capture the essence of some brain learning mechanisms, as well as the intuitive sense that the efficiency of brain learning need not rely on the computing power of neurons. Thus, compared with traditional neural networks and support vector machines, ELM offers significant advantages such as fast learning speed, ease of implementation and minimal human intervention. Due to its remarkable generalization performance and implementation efficiency, ELM has been applied in a wide range of applications. In this paper, we first provide an overview of newly derived ELM theories and approaches. Further, with the ongoing development of multilayer feature representation, some new trends in ELM-based hierarchical learning are discussed. Moreover, we present several interesting ELM applications to showcase the practical advances on this subject.
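The core recipe the survey describes, random untuned hidden neurons plus a least-squares output layer, fits in a few lines. This is a toy pure-Python sketch (normal equations solved by Gaussian elimination), not any reference implementation; class and function names are illustrative.

```python
import math, random

def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination
    with partial pivoting (adequate for this tiny demo)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

class ELM:
    """Single-hidden-layer net: random, untuned hidden weights;
    only the linear output weights are fit, by least squares."""
    def __init__(self, n_in, n_hidden, seed=0):
        rnd = random.Random(seed)
        self.W = [[rnd.uniform(-1, 1) for _ in range(n_in)]
                  for _ in range(n_hidden)]
        self.b = [rnd.uniform(-1, 1) for _ in range(n_hidden)]
        self.beta = None

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
                for row, bi in zip(self.W, self.b)]

    def fit(self, X, y):
        H = [self._hidden(x) for x in X]
        k = len(H[0])
        # Normal equations: (H^T H) beta = H^T y
        HtH = [[sum(h[i] * h[j] for h in H) for j in range(k)] for i in range(k)]
        Hty = [sum(h[i] * yi for h, yi in zip(H, y)) for i in range(k)]
        self.beta = solve(HtH, Hty)
        return self

    def predict(self, x):
        return sum(b * h for b, h in zip(self.beta, self._hidden(x)))
```

Note that training involves no iterative weight tuning at all: the only "learning" is one linear solve, which is where the fast-training claim comes from.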
6.
BRANKO D. KOVAČEVIĆ 《International journal of systems science》2013,44(12):2393-2407
A decision-theoretic approach to the estimation of unknown parameters from a linear discrete-time dynamic measurement model in the presence of disturbance uncertainty is considered. The unknown disturbance statistics are characterized by a certain class of distributions to which the real disturbance distribution is confined. Using game theory and the asymptotic estimation error covariance matrix as the criteria of how good an estimator is, the stochastic gradient-type algorithm is shown to be optimal in the min-max sense. Since the optimal solution is not tractable in practice, several suboptimal procedures are derived on the basis of suitable approximations. The convergence of the derived algorithms is established theoretically using the ordinary differential equation approach. Monte Carlo simulation results are presented for the quantitative performance evaluation of the algorithms.
7.
To address the substantial loss of feature information caused by the Boolean model in Chinese question classification, a new feature-weight computation method is proposed. During question feature extraction, an information entropy algorithm is combined with a hospital ontology concept model to compute the question feature model, and on this basis a support vector machine is used to classify the Chinese questions. Experiments on the Chinese question set of a metropolitan hospital question-answering system demonstrate the effectiveness of the method, with coarse-category and fine-category accuracies reaching 89.0% and 87.1%, respectively.
8.
Adversarial training has become the dominant approach to domain adaptation: a domain classifier aligns the feature distributions of the source and target domains, reducing the distribution gap between them. However, existing domain adaptation methods only shrink the distance between data from different domains, without considering the relationship between the target-domain data distribution and the decision boundary, which degrades the within-domain separability of features from different target-domain classes. To remedy this shortcoming, an unsupervised domain adaptation algorithm based on adversarial training on classification discrepancy and information entropy (ACDIE) is proposed. The algorithm uses the disagreement between two classifiers to align the inter-domain discrepancy, while minimizing information entropy to reduce uncertainty, moving target-domain features away from the decision boundary and improving the separability of the classes. Experimental results on digit-recognition datasets and the Office-31 dataset show that ACDIE learns better feature representations and markedly improves domain-adaptation classification accuracy.
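The entropy-minimization ingredient is easy to isolate: for each unlabeled target sample one computes the Shannon entropy of the classifier's softmax output and minimizes its average, pushing predictions toward confident, low-entropy outputs away from the decision boundary. A minimal sketch of the per-sample term (the full ACDIE loss has further adversarial terms; the function name is illustrative):

```python
import math

def prediction_entropy(probs):
    """Shannon entropy (nats) of one softmax output vector.
    Minimizing the average of this over unlabeled target samples
    encourages confident (low-entropy) predictions."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

A uniform output like [0.5, 0.5] sits on the decision boundary and has maximal entropy; a one-hot output has entropy zero, which is what the minimization drives toward.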
9.
Sperner's lemma states that any admissible coloring of any triangulation of the unit triangle has a 3-colored triangle. In this paper, we first show that any algorithm to find this 3-colored triangle that treats the coloring itself as an oracle must be in the worst case linear in the size of the triangulation. Successively, we apply this lower bound to solve three open questions on robust machines posed by Hartmanis and Hemachandra.
Received: November 4, 1993
10.
Recently Zhang and Brockett extended the framework of 'minimum discrimination information' (MDI) estimation techniques to include quadratic constraints. They claimed their approach was quite different from the usual Lagrange duality theory. We show that the dual problem obtained by Zhang and Brockett is actually a geometric dual. Hence the quadratically constrained MDI estimation can be enriched by the theory of generalized geometric programming.
11.
A robust-estimation-based extreme learning machine
The extreme learning machine (ELM) is a single-hidden-layer feedforward neural network (SLFN) that, compared with traditional neural network algorithms, has a simple structure, a fast learning speed and good generalization performance. The output weights of an ELM are computed by least squares (LS), but the classical LS estimator has poor resistance to outliers: it tends to exaggerate the influence of outliers and noise, so the trained model parameters can be inaccurate or even entirely wrong. To solve this problem, a robust extreme learning machine (RBELM) is proposed that computes the output weights by M-estimation-based weighted least squares instead of ordinary least squares. Regression and classification experiments on several data sets show that the method effectively reduces the influence of outliers and has good robustness.
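The substitution of weighted for ordinary least squares can be shown on a deliberately tiny problem. The sketch below applies Huber M-estimation weights inside iteratively reweighted least squares (IRLS) to a one-variable line fit; it is not the paper's RBELM (which reweights the ELM output-weight solve), and applying the conventional Huber constant k = 1.345 to raw, unscaled residuals is a simplifying assumption.

```python
def huber_weight(r, k=1.345):
    """M-estimation weight for residual r: 1 inside the threshold,
    k/|r| outside, so large residuals (outliers) lose influence."""
    return 1.0 if abs(r) <= k else k / abs(r)

def robust_line_fit(xs, ys, iters=20):
    """Fit y = a*x + b by iteratively reweighted least squares."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        w = [huber_weight(y - (a * x + b)) for x, y in zip(xs, ys)]
        sw = sum(w)
        sx = sum(wi * x for wi, x in zip(w, xs))
        sy = sum(wi * y for wi, y in zip(w, ys))
        sxx = sum(wi * x * x for wi, x in zip(w, xs))
        sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
        denom = sw * sxx - sx * sx
        a = (sw * sxy - sx * sy) / denom   # weighted least-squares slope
        b = (sy - a * sx) / sw             # weighted least-squares intercept
    return a, b
```

With one gross outlier injected into otherwise clean data, ordinary least squares is dragged far off the true line, while the IRLS fit stays close to it, which is the behavior the RBELM abstract claims for its output-weight estimate.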
12.
Classification algorithms are used in many domains to extract information from data, predict the entry probability of events of interest, and, eventually, support decision making. This paper explores the potential of extreme learning machines (ELMs), a recently proposed type of artificial neural network, for consumer credit risk management. ELMs possess some interesting properties, which might enable them to improve the quality of model-based decision support. To test this, we empirically compare ELMs to established scoring techniques according to three performance criteria: ease of use, resource consumption, and predictive accuracy. The mathematical roots of ELMs suggest that they are especially suitable as base models within ensemble classifiers. Therefore, to obtain a holistic picture of their potential, we assess ELMs in isolation and in conjunction with different ensemble frameworks. The empirical results confirm the conceptual advantages of ELMs and indicate that they are a valuable alternative to other credit risk modelling methods.
13.
The discovery of diversity patterns from binary data is an important data mining task. In this paper, we propose the problem of mining highly diverse patterns called non-redundant diversity patterns (NDPs). In this framework, entropy is adopted to measure the diversity of itemsets. In addition, an algorithm called NDP miner is proposed to exploit both monotone properties of the entropy diversity measure and pruning power for the efficient discovery of non-redundant diversity patterns. Finally, our experimental results are given to show that the NDP miner can efficiently identify non-redundant diversity patterns.
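The entropy diversity measure for an itemset can be computed directly from the data. The sketch below treats each row of a binary data set as a tuple, projects it onto the itemset's columns, and takes the Shannon entropy of the resulting joint value distribution; this follows the standard entropy-of-itemsets idea, though the paper's exact definition may differ in details.

```python
import math

def entropy_diversity(rows, itemset):
    """Shannon entropy (bits) of the joint value distribution that the
    columns in `itemset` take across binary rows; higher = more diverse."""
    counts = {}
    for row in rows:
        key = tuple(row[i] for i in itemset)
        counts[key] = counts.get(key, 0) + 1
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A column that is constant across all rows contributes zero entropy, while an itemset whose value combinations are spread uniformly scores the maximum, which is what makes entropy a natural diversity measure here.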
14.
15.
Vector quantization using information theoretic concepts
The process of representing a large data set with a smaller number of vectors in the best possible way, also known as vector quantization, has been intensively studied in recent years. Very efficient algorithms like the Kohonen self-organizing map (SOM) and the Linde-Buzo-Gray (LBG) algorithm have been devised. In this paper a physical approach to the problem is taken, and it is shown that, by considering the processing elements as points moving in a potential field, an algorithm as efficient as those mentioned above can be derived. Unlike SOM and LBG, this algorithm has a clear physical interpretation and relies on minimization of a well-defined cost function. It is also shown how the potential-field approach can be linked to information theory by use of the Parzen density estimator. In the light of information theory it becomes clear that minimizing the free energy of the system is in fact equivalent to minimizing a divergence measure between the distribution of the data and the distribution of the processing elements; hence, the algorithm can be seen as a density matching method.
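The link to information theory runs through the Parzen density estimator the abstract mentions. A minimal one-dimensional Gaussian-kernel version looks like this (a sketch of the estimator only, not of the paper's full divergence-minimizing algorithm):

```python
import math

def parzen_density(x, samples, h):
    """Parzen-window (kernel) density estimate at x: the average of
    Gaussian kernels of bandwidth h centred on the samples."""
    norm = 1.0 / (h * math.sqrt(2 * math.pi))
    return sum(norm * math.exp(-0.5 * ((x - s) / h) ** 2)
               for s in samples) / len(samples)
```

Estimating both the data density and the processing-element density this way is what lets the free energy of the "particle" system be rewritten as a divergence between the two distributions.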
16.
The paper describes the K-winner machine (KWM) model for classification. KWM training uses unsupervised vector quantization and subsequent calibration to label data-space partitions. A K-winner classifier seeks the largest set of best-matching prototypes agreeing on a test pattern, and provides a local-level measure of confidence. A theoretical analysis characterizes the growth function of a K-winner classifier, and the result leads to tight bounds on generalization performance. The method proves suitable for high-dimensional multiclass problems with large amounts of data. Experimental results on both a synthetic and a real domain (NIST handwritten numerals) confirm the approach's effectiveness and the consistency of the theoretical framework.
17.
Objective: The huge number of bands in hyperspectral imagery leads to the "curse of dimensionality" in interpretation and classification. To address this, building on the K-means clustering algorithm, we propose a hyperspectral image classification algorithm based on entropy-weighted K-means clustering with global information, which accounts for the importance of each band to the different clusters while also considering between-cluster information. Method: First, band weights are introduced to characterize the importance of each band to each cluster, and an entropy measure is defined to express these weights. Second, to avoid locally optimal clusterings, a between-cluster distance measure is introduced to achieve a globally optimal clustering. Finally, both measures are incorporated into the K-means objective function, and the optimal classification is obtained by minimizing this objective. Results: To validate the proposed method, the ground-object classes in the reference maps of the Salinas and Pavia University hyperspectral images were merged according to the degree of difference in their spectral reflectance, and the merged maps were used as new reference classifications. The proposed algorithm and the traditional K-means algorithm were applied to both images, and the results were evaluated qualitatively and quantitatively. For the merged classes, whose spectral-reflectance differences are large, the proposed algorithm gives visually better classifications than traditional K-means; in terms of accuracy, its overall accuracies are 92.20% and 82.96%, versus 83.39% and 67.06% for K-means, gains of 8.81 and 15.9 percentage points. Conclusion: The experimental results show that the proposed entropy-weighted, globally informed K-means algorithm classifies well ground-object classes with varying degrees of spectral-reflectance difference in hyperspectral images.
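The entropy-weighting step can be illustrated in isolation. The sketch below follows the common entropy-weighted K-means style update, in which each band's weight decreases with its within-cluster dispersion D_j and a parameter gamma controls the entropy pull toward uniform weights; the paper's exact objective (which also includes a between-cluster term) may differ, so treat this as an assumption-laden illustration.

```python
import math

def update_band_weights(dispersions, gamma):
    """Entropy-regularized weight update: bands with small
    within-cluster dispersion D_j receive larger weights.
    w_j is proportional to exp(-D_j / gamma), then normalized;
    larger gamma pulls the weights toward uniform."""
    expo = [math.exp(-d / gamma) for d in dispersions]
    s = sum(expo)
    return [e / s for e in expo]
```

The softmax-like form is exactly what minimizing a sum of weighted dispersions plus gamma times the weight entropy yields in closed form, which is why the update is cheap to compute per cluster.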
18.
A new class of discrete random fields designed for quick simulation and covariance inference under inhomogeneous conditions is introduced and studied. Simulation of these correlated fields can be done in a single pass instead of relying on multi-pass convergent methods like the Gibbs sampler or other Markov chain Monte Carlo algorithms. The fields are constructed directly from an undirected graph with specified marginal probability mass functions and covariances between nearby vertices, in a manner that makes simulation quite feasible yet maintains the desired properties. Special cases of these correlated fields have been deployed successfully in data authentication, object detection and CAPTCHA generation. Further applications in maximum likelihood estimation and classification, such as optical character recognition, are also presented.
19.
Fault masking can reduce the effectiveness of a test suite. We propose an information theoretic measure, Squeeziness, as the theoretical basis for avoiding fault masking. We begin by explaining fault masking and the relationship between collisions and fault masking. We then define Squeeziness and demonstrate by experiment that there is a strong correlation between Squeeziness and the likelihood of collisions. We conclude with comments on how Squeeziness could be the foundation for generating test suites that minimise the likelihood of fault masking.
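Squeeziness is, roughly, the entropy a component destroys through collisions, i.e. distinct inputs mapped to the same output so that an erroneous internal state can no longer be distinguished at the output. The sketch below computes that entropy loss for a uniform distribution over a finite input set; treat the uniform-input assumption and the exact normalization as simplifications of the published definition.

```python
import math

def squeeziness(f, inputs):
    """Entropy (bits) lost by pushing a uniform distribution over
    `inputs` through f: H(inputs) - H(f(inputs)). Collisions are
    what make this positive; an injective f loses nothing."""
    n = len(inputs)
    h_in = math.log2(n)  # entropy of the uniform input distribution
    counts = {}
    for x in inputs:
        y = f(x)
        counts[y] = counts.get(y, 0) + 1
    h_out = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h_in - h_out
```

An injective function squeezes nothing, while a constant function collapses all n inputs into one output and loses the full log2(n) bits, the worst case for fault masking.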
20.
A normalized measure is established to provide quantitative information about the degree of observability for discrete-time, stochastically autonomous systems. This measure is based on generalized information theoretic quantities (generalized entropy, mutual information) of the system state and the observations, where the system state can be a discrete or a continuous random vector. Some important properties are presented. For the linear case, an explicit formula for the degree of observability is derived, and the equivalence between the proposed measure and the traditional rank condition is proved. Curves for the degree of observability are depicted in a simple example.
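As a toy illustration of the information-theoretic idea (not the paper's normalized, multi-step measure), the mutual information between a scalar Gaussian state and a single linear observation has a closed form, and it vanishes exactly when the observation gain does, i.e. when the state is unobservable from y.

```python
import math

def gaussian_mutual_information(a, var_x, var_v):
    """I(x; y) in nats for the scalar linear-Gaussian observation
    y = a*x + v, with x ~ N(0, var_x) and noise v ~ N(0, var_v):
    I = 0.5 * ln(var_y / var_{y|x}) = 0.5 * ln(1 + a^2 * var_x / var_v)."""
    return 0.5 * math.log(1.0 + a * a * var_x / var_v)
```

Increasing the gain a (or reducing the noise variance) monotonically increases the mutual information, matching the intuition that a degree-of-observability measure should grade how much the observations reveal about the state.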