Similar literature
 Found 20 similar documents; search took 10 ms
1.
Due to idiosyncrasies in their syntax, semantics or frequency, Multiword Expressions (MWEs) have received special attention from the NLP community, as the methods and techniques developed for the treatment of simplex words are not necessarily suitable for them. This is certainly the case for the automatic acquisition of MWEs from corpora. A lot of effort has been directed to the task of automatically identifying them, with considerable success. In this paper, we propose an approach for the identification of MWEs in a multilingual context, as a by-product of a word alignment process, that not only identifies possible MWE candidates but also associates some multiword expressions with semantics. The results indicate that the approach is feasible and that it demands few tools and resources, so it could, for example, facilitate and speed up lexicographic work.

2.
In this paper, we present a novel scheme for linear feature extraction in classification. The method is based on the maximization of the mutual information (MI) between the features extracted and the classes. The sum of the MI corresponding to each of the features is taken as a heuristic that approximates the MI of the whole output vector. Then, a component-by-component gradient-ascent method is proposed for the maximization of the MI, similar to the gradient-based entropy optimization used in independent component analysis (ICA). The simulation results show that the method is not only competitive with existing supervised feature extraction methods in all cases studied, but also remarkably outperforms them when the data are characterized by strongly nonlinear boundaries between classes.

3.
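Chen Shichao, Yu Bin. 《计算机应用》 (Journal of Computer Applications), 2011, 31(4): 1070-1073
To reduce the impact of the inherent problems of the mutual information method on term filtering, a dual-threshold mutual information filtering method is proposed, together with a threshold-determination algorithm based on a local evaluation index. Through data sampling, statistics, and computation, the algorithm quickly and accurately yields the optimal upper and lower thresholds. Unlike single-threshold mutual information filtering, the method filters and extracts candidate terms by setting two thresholds, without changing the mutual information formula. Experimental results show that, under the same conditions, the method significantly improves precision and F-measure.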
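
The dual-threshold idea in this abstract can be illustrated with a minimal sketch. It assumes that a candidate is kept when its pointwise mutual information lies between the lower and upper thresholds, and that candidates are adjacent token pairs. The abstract confirms neither detail, and its threshold-determination algorithm is not reproduced here. The function names `pmi` and `dual_threshold_filter` are hypothetical.

```python
import math
from collections import Counter


def pmi(pair_count, left_count, right_count, total_pairs):
    """Pointwise mutual information of an adjacent pair, in bits."""
    p_xy = pair_count / total_pairs
    p_x = left_count / total_pairs
    p_y = right_count / total_pairs
    return math.log2(p_xy / (p_x * p_y))


def dual_threshold_filter(tokens, lower, upper):
    """Keep adjacent pairs whose PMI lies in [lower, upper].

    The keep-inside-the-band rule is an assumption made for illustration.
    """
    pairs = list(zip(tokens, tokens[1:]))
    pair_counts = Counter(pairs)
    left_counts = Counter(a for a, _ in pairs)    # counts as first element
    right_counts = Counter(b for _, b in pairs)   # counts as second element
    total = len(pairs)
    kept = {}
    for (a, b), count in pair_counts.items():
        score = pmi(count, left_counts[a], right_counts[b], total)
        if lower <= score <= upper:
            kept[(a, b)] = score
    return kept
```

In this sketch the upper threshold is what discards a pair that occurs only once but scores very high, a case a single lower threshold would let through.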

4.
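Mutual information feature extraction with an optimally computable feature size   Cited 3 times (self-citations: 0, citations by others: 3)
Using matrix eigenvector decomposition, this paper proposes a mutual information feature extraction method in which the optimal number of features can be computed. First, the class-separability properties of the mutual information criterion under a Gaussian distribution assumption are discussed, and existing typical algorithms are shown to be special cases of the proposed algorithm. Next, after giving the criterion a rigorous mathematical interpretation, an algorithm is proposed that computes the optimal feature size through matrix eigenvector decomposition. Finally, the effectiveness of the method is verified on real data.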

5.
To recognize expressions accurately, facial expression recognition (FER) systems require robust feature extraction and feature selection methods. In this paper, a normalized mutual information based feature selection technique is proposed for FER systems. The technique is derived from an existing method, the max-relevance and min-redundancy (mRMR) method. We propose, however, to normalize the mutual information used in this method so that neither the relevance nor the redundancy dominates. For feature extraction, the curvelet transform is used. After feature extraction and selection, the feature space is reduced by linear discriminant analysis (LDA). Finally, a hidden Markov model (HMM) is used to recognize the expressions. The proposed FER system (CNF-FER) is validated on four publicly available standard datasets, using 10-fold cross-validation for each. CNF-FER outperformed existing well-known statistical and state-of-the-art methods, achieving a weighted average recognition rate of 99% across all the datasets.
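
A rough sketch of mRMR-style selection with normalized mutual information follows. The abstract does not state which normalization is used, so dividing by the smaller of the two entropies is an assumption. The greedy "relevance minus mean redundancy" score is the common mRMR difference form, not necessarily the authors' exact criterion. Features are assumed to be discrete, and all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for paired discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def normalized_mi(xs, ys):
    """MI divided by min(H(X), H(Y)); this normalization is an assumption."""
    denom = min(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / denom if denom > 0 else 0.0


def select_features(features, labels, k):
    """Greedy mRMR-style selection.

    features: dict mapping a feature name to its list of discrete values.
    Returns up to k feature names in the order they were chosen.
    """
    selected = []
    remaining = list(features)

    def score(name):
        relevance = normalized_mi(features[name], labels)
        if not selected:
            return relevance
        redundancy = sum(
            normalized_mi(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because both terms of the score lie in [0, 1] after normalization, neither relevance nor redundancy can dominate merely through its scale, which is the motivation the abstract gives.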

6.
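This work has three aims: to determine for which values of the parameter k the improved mutual information method (PMI^k) overcomes the tendency of pointwise mutual information (PMI) to overestimate the association strength between two low-frequency strings that always occur together; to solve the problem that some terms cannot be extracted because of segmentation errors when a term extraction system uses a word-segmented corpus; and to improve the portability of term extraction systems. To these ends, an algorithm is proposed that combines PMI^k with two basic filtering rules to extract terms from an unsegmented corpus. First, PMI^k is used to compute the association strength between two characters and determine the 2-gram seeds to be expanded. Second, PMI^k is used to compute the association strength between each 2-gram seed and the characters to its left and right, to decide whether the 2-gram can be expanded into a 3-gram; iterating this step yields longer candidate terms. Finally, the two basic filtering rules remove junk strings from the candidates to give the final result. Theoretical analysis shows that PMI^k overcomes the shortcoming of PMI when k ≥ 3 (k ∈ N+). Experiments on a 1 GB Sina Finance blog corpus and a 300 MB Baidu Tieba corpus confirm the theoretical analysis; PMI^k achieves higher precision than PMI, and the algorithm is highly portable.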
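
The effect of the exponent k can be seen in a small sketch. It assumes the commonly used definition PMI^k(x, y) = log2(p(x, y)^k / (p(x) p(y))), which the abstract does not spell out. It scores only character bigrams in unsegmented text; the seed expansion and the two filtering rules are not reproduced. The names `pmi_k` and `bigram_scores` are hypothetical.

```python
import math
from collections import Counter


def pmi_k(p_xy, p_x, p_y, k=3):
    """PMI^k in bits; k=1 gives ordinary PMI (assumed definition)."""
    return math.log2(p_xy ** k / (p_x * p_y))


def bigram_scores(text, k=3):
    """Score every adjacent character pair of an unsegmented string."""
    char_counts = Counter(text)
    bigram_counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    n_chars = len(text)
    n_bigrams = len(text) - 1
    scores = {}
    for bigram, count in bigram_counts.items():
        scores[bigram] = pmi_k(
            count / n_bigrams,
            char_counts[bigram[0]] / n_chars,
            char_counts[bigram[1]] / n_chars,
            k,
        )
    return scores
```

For the toy string "abababxy", the pair "xy" occurs once and "ab" three times. With k=1 the rare pair scores higher, while with k=3 the frequent pair does, which matches the behaviour the abstract attributes to k ≥ 3.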

7.
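Starting from linear mutual information feature extraction, and by studying the linear invariance of the mutual information gradient in kernel space, a fast and efficient nonlinear feature extraction method is proposed. The method uses a fast quadratic-entropy algorithm for mutual information together with a gradient-ascent search strategy to extract discriminative nonlinear higher-order statistics. It avoids the eigenvalue decomposition required by traditional nonlinear feature extraction, which effectively reduces the computational cost. Projection and classification experiments on UCT data show that the method clearly outperforms traditional algorithms in both the separability of the projected space and time complexity.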

8.
In this paper, we present a comprehensive approach for extracting and relating Arabic multiword expressions (MWEs) from social networks. We collected and processed 15 million tweets to form our data set. Due to the complexity of processing Arabic and the lack of resources, we built an experimental system to extract and relate similar MWEs using statistical methods. We introduce a new metric for measuring valid MWEs in social networks. We compare the results obtained from our experimental system against a semantic graph obtained from a web knowledge base.

9.
10.
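Because linear transformations cannot adequately preserve the nonlinear structure of data, while nonlinear transformations often require a large amount of complex computation, a fast and efficient nonlinear feature extraction method is proposed. By studying the linear invariance of the mutual information gradient in kernel space, and by using a fast quadratic-entropy algorithm for mutual information together with a gradient-ascent search strategy, the method extracts discriminative nonlinear higher-order statistics while effectively reducing the computational cost. Detailed data projection and classification experiments show that the method outperforms traditional algorithms in both classification performance and time complexity.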

11.
This paper provides a unifying view of three discriminant linear feature extraction methods: linear discriminant analysis, heteroscedastic discriminant analysis and maximization of mutual information. We propose a model-independent reformulation of the criteria related to these three methods that stresses their similarities and elucidates their differences. Based on assumptions for the probability distribution of the classification data, we obtain sufficient conditions under which two or more of the above criteria coincide. It is shown that these conditions also suffice for Bayes optimality of the criteria. Our approach results in an information-theoretic derivation of linear discriminant analysis and heteroscedastic discriminant analysis. Finally, regarding linear discriminant analysis, we discuss its relation to multidimensional independent component analysis and derive suboptimality bounds based on information theory.

12.
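Automatic extraction and alignment of multiword expressions in English-Chinese comparable corpora   Cited 3 times (self-citations: 1, citations by others: 2)
Multiword expressions (MWEs) are used not only to improve the quality of current machine translation systems but also in other areas of natural language processing, such as cross-language retrieval and data mining. To this end, a method combining semantic templates with statistical tools is proposed for automatically extracting native English MWEs from a triple comparable corpus. Word similarity is computed with thesaurus-based and distributional methods to widen MWE coverage. The GIZA++ alignment algorithm is used to extract the corresponding Chinese MWEs, translation probabilities are computed statistically, and the best English-Chinese MWE translation pairs are selected according to these probabilities. Experimental results show that the method effectively improves the accuracy of MWE extraction and alignment.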

13.
Recently, mutual interdependence analysis (MIA) has been successfully used to extract representations, or “mutual features”, accounting for samples in the class. For example, a mutual feature is a face signature under varying illumination conditions or a speaker signature under varying channel conditions. A mutual feature is a linear regression that is equally correlated with all samples of the input class. Previous work discussed two equivalent definitions of this problem and a generalization of its solution called generalized MIA (GMIA). Moreover, it showed how mutual features can be computed and employed. This paper uses a parametrized version, GMIA(λ), to pursue a deeper understanding of what GMIA features really represent. It defines a generative signal model that is used to interpret GMIA(λ) and to visualize how it differs from MIA, principal component analysis, and independent component analysis. Finally, we analyze the effect of λ on the feature extraction performance of GMIA(λ) in two standard pattern recognition problems: illumination-independent face recognition and text-independent speaker verification.

14.
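Maximum fuzzy mutual information for image segmentation   Cited 3 times (self-citations: 3, citations by others: 0)
To select image thresholds more effectively, the maximum fuzzy entropy (MFE) criterion is combined with the maximum mutual information (MMI) criterion to give a maximum fuzzy mutual information (MFMI) criterion. To determine the optimal number of segmentation classes, a criterion based on the difference of fuzzy mutual information (dFMI) is also proposed. Combining these two improvements yields a new multi-threshold segmentation algorithm, the maximum fuzzy mutual information (MFMI) segmentation algorithm. Simulations were run on synthetic images, non-destructive testing images, and standard test images, with comparisons against the uncombined MFE and MMI criteria, classical thresholding methods such as OTSU and MET, and the popular fuzzy C-means (FCM) algorithm. MFMI has the lowest misclassification rate, at the cost of a longer running time. Overall, MFMI is an effective image segmentation method.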

15.
Minimum output mutual information is regarded as a natural criterion for independent component analysis (ICA) and is used as the performance measure in many ICA algorithms. Two common approaches in information-theoretic ICA algorithms are minimum mutual information and maximum output entropy approaches. In the former approach, we substitute some form of probability density function (pdf) estimate into the mutual information expression, and in the latter we incorporate the source pdf assumption in the algorithm through the use of nonlinearities matched to the corresponding cumulative distribution functions (cdfs). Alternative solutions to ICA use higher-order cumulant-based optimization criteria, which are related to either one of these approaches through truncated series approximations for densities. In this article, we propose a new ICA algorithm motivated by the maximum entropy principle (for estimating signal distributions). The optimality criterion is the minimum output mutual information, where the estimated pdfs are from the exponential family and are approximate solutions to a constrained entropy maximization problem. This approach yields an upper bound for the actual mutual information of the output signals, hence the name minimax mutual information ICA algorithm. In addition, we demonstrate that for a specific selection of the constraint functions in the maximum entropy density estimation procedure, the algorithm relates strongly to ICA methods using higher-order cumulants.

16.
We extend the multimodal image registration method described in Alexander and Summers [Fast registration algorithm using a variational principle for mutual information, Proc. SPIE Int. Soc. Opt. Eng. 5032 (2003) 1053–1063] to nonlinear registration. A variational principle maximizing mutual information leads to an Euler–Lagrange (EL) equation for the displacement field, represented here in a basis of cubic B-spline functions. A cost function is constructed from the sum of squares of the residuals of the EL equation at a subset of pixels where the magnitude of the spatial gradient of intensity exceeds a user-chosen threshold. The unknown coefficients in the displacement field representation are evaluated using a Levenberg–Marquardt minimization procedure. The proposed method was successfully applied to several image pairs of the same and different modalities, and an artificially constructed series of images containing nonlinear distortions and noise.

17.
In this paper we propose a new way to achieve a navigation task (visual path following) for a non-holonomic vehicle. We consider an image-based navigation process. We show that it is possible to navigate along a visual path without relying on the extraction, matching and tracking of geometric visual features such as keypoints. The proposed approach relies directly on the information (entropy) contained in the image signal. We show that it is possible to build a control law directly from the maximization of the shared information between the current image and the next key image in the visual path. The shared information between those two images is obtained using mutual information, which is known to be robust to illumination variations and occlusions. Moreover, the generally complex task of feature extraction and matching is avoided. Both simulations and experiments on a real vehicle are presented and show the possibilities and advantages offered by the proposed method.

18.
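A principal component analysis feature selection algorithm based on mutual information   Cited 3 times (self-citations: 0, citations by others: 3)
Principal component analysis is a commonly used feature selection algorithm. The classical approach computes the correlation between features, but correlation cannot assess nonlinear relationships between variables. Mutual information measures the strength of the dependence between two variables and is not limited to linear correlation. On this basis, a principal component analysis feature selection algorithm based on mutual information is proposed. The algorithm computes the mutual information between features, uses the eigenvalues of the mutual information matrix as the criterion for determining the number of principal components, and evaluates the effect of the resulting feature selection. The proposed method is compared with traditional principal component analysis on examples, and classification performance is analyzed with a neural network as the classifier.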
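
The core step described here, building a mutual information matrix between features and reading the number of components off its eigenvalues, might look like the sketch below. It assumes discrete features. The cumulative-share rule with an 0.85 threshold is an illustrative assumption, since the abstract says only that the eigenvalues serve as the criterion. Eigenvalues are obtained with a plain Jacobi rotation routine so the example needs no external library. A mutual information matrix is not guaranteed to be positive semidefinite, so the share rule should be read loosely. All names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y); I(X;X) equals H(X)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def mi_matrix(columns):
    """Symmetric matrix of pairwise mutual information between features."""
    m = len(columns)
    return [[mutual_information(columns[i], columns[j]) for j in range(m)]
            for i in range(m)]


def symmetric_eigenvalues(matrix, sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-20:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for r in range(n):  # rotate columns p and q
                    arp, arq = a[r][p], a[r][q]
                    a[r][p] = c * arp - s * arq
                    a[r][q] = s * arp + c * arq
                for r in range(n):  # rotate rows p and q
                    apr, aqr = a[p][r], a[q][r]
                    a[p][r] = c * apr - s * aqr
                    a[q][r] = s * apr + c * aqr
    return sorted((a[i][i] for i in range(n)), reverse=True)


def components_needed(eigenvalues, share=0.85):
    """Smallest count whose eigenvalue sum reaches `share` of the total."""
    total = sum(eigenvalues)
    running = 0.0
    for count, value in enumerate(eigenvalues, start=1):
        running += value
        if running >= share * total:
            return count
    return len(eigenvalues)
```

With two identical binary features and one independent feature, the matrix has one large eigenvalue for the duplicated pair and a smaller one for the independent feature, so two components are retained under the assumed rule.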

19.
Xu Xianfa, Chen Zhe, Yin Fuliang. Applied Intelligence, 2022, 52(13): 14888-14904
Multi-view representation is a crucial but challenging issue in visual recognition tasks. To address this issue, a deep mutual information multi-view representation method is...

20.
We propose a graph model for the mutual information based clustering problem. This problem was originally formulated as a constrained optimization problem with respect to the conditional probability distribution of clusters. Based on the stationary distribution induced from the problem setting, we propose a function which measures the relevance among data objects under the problem setting. This function is utilized to capture the relation among data objects, and the entire set of objects is represented as an edge-weighted graph in which pairs of objects are connected by edges weighted with their relevance. We show that, in hard assignment, the clustering problem can be approximated as a combinatorial problem over the proposed graph model when data is uniformly distributed. By representing the data objects as a graph based on our graph model, various graph based algorithms can be utilized to solve the clustering problem over the graph. The proposed approach is evaluated on the text clustering problem over the 20 Newsgroups and TREC datasets. The results are encouraging and indicate the effectiveness of our approach.
