Similar documents
1.
Two textual metrics, “Frequency Rank” (FR) and “Intimacy”, are proposed in this paper to measure word usage and collocation characteristics, two important aspects of text style. The FR, derived from the local index numbers of the terms in a sentence ordered by their global frequency, provides single-term-level information. Intimacy models the relationship between a word and the others, i.e. how close a term is to the other terms in the same sentence. Two textual features for capturing language variation, “Frequency Rank Ratio” (FRR) and “Overall Intimacy” (OI), are derived from the two proposed metrics. Using the derived features, language variation among documents can be visualized in a text space. Three corpora consisting of documents of diverse topics, genres, regions, and dates of writing are designed and collected to evaluate the proposed algorithms. Extensive simulations are conducted to verify the feasibility and performance of our implementation. Both theoretical analyses based on entropy and the simulations demonstrate the feasibility of our method. We also show that the proposed algorithm can be used to visualize the closeness of several western languages. Variation of modern English over time is also recognizable with our analysis method. Finally, our method is compared to conventional text classification implementations. The comparative results indicate that our method outperforms the others.

2.
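The vector space model (VSM) represents a text as a feature vector and is widely used in text classification, pattern recognition, and related fields. For long texts, however, conventional VSM modeling can suffer from a dimensionality explosion, which lowers efficiency and makes classification quality hard to guarantee. To address the high dimensionality of the VSM, a method is proposed that uses word sense and word frequency to reduce the dimensionality of the text model, improving efficiency and accuracy. A synonym clustering method optimized by polysemy disambiguation is proposed: after the sense of each polysemous word is determined from its context, feature terms are weighted by the similarity of their senses, and terms with similar senses are merged. The new method greatly reduces the dimensionality of the feature vector, and polysemy disambiguation improves the accuracy of feature extraction. Comparison with other text feature extraction and text classification methods shows that the algorithm clearly improves both efficiency and accuracy.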
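
As a rough illustration of how merging near-synonymous features shrinks a vector space model, the sketch below maps terms to a canonical form through a hand-written synonym table before counting. The table `SYNONYMS`, the function names, and the sample texts are assumptions made for illustration; the polysemy disambiguation and similarity weighting described in the abstract are not reproduced.

```python
import math
from collections import Counter

# Hypothetical synonym table: each term maps to a canonical feature.
SYNONYMS = {"automobile": "car", "auto": "car", "quick": "fast"}

def to_vector(tokens, synonyms=None):
    """Count term frequencies, optionally merging synonyms into one feature."""
    synonyms = synonyms or {}
    return Counter(synonyms.get(t, t) for t in tokens)

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

doc_a = "the quick automobile".split()
doc_b = "the fast car".split()

# Without merging, the two texts share only "the"; with merging they coincide.
plain = cosine(to_vector(doc_a), to_vector(doc_b))
merged = cosine(to_vector(doc_a, SYNONYMS), to_vector(doc_b, SYNONYMS))
```

In this toy case the joint vocabulary drops from five features to three, and the similarity of the two texts rises accordingly.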

3.
We study the general problem of detecting the morphic images of a given word in an arbitrary text. We introduce the concept of the rank of a pattern, which measures the complexity of its recognition in terms of periodicity. This notion leads to an improvement of the general naive algorithm. A special class of equations in words is also considered.

4.
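By analyzing the correlation between feature terms and categories, a new feature weighting method is proposed. It computes a feature weight from three quantities: the number of times the term occurs in a given class, how concentrated the term is in a single class, and how evenly the term is distributed within a given class. In experiments against TF-IDF, the proposed TF-Var feature weighting method clearly improves the micro-averaged classification accuracy.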

5.
E. Zampieri. Calcolo, 1989, 26(1): 61-91
In this paper we consider the numerical approximation of elliptic problems by spectral methods in domains subdivided into substructures. We review an iterative procedure with interface relaxation, reducing the given differential problem to a sequence of Dirichlet and mixed Neumann-Dirichlet problems on each subdomain. The iterative procedure is applied to both tau and collocation spectral approximations. Two optimal strategies for the automatic selection of the relaxation parameter are indicated. We present several numerical experiments showing the convergence properties of the iterative scheme with respect to the decomposition. A multilevel technique for domain decomposition methods is proposed.

6.
The aim of this paper is to introduce stochastic collocation methods in topology optimization for mechanical systems with material and geometric uncertainties. The random variations are modeled by a memory-less transformation of spatially varying Gaussian random fields, which ensures their physical admissibility. The stochastic collocation method, combined with the proposed material and geometry uncertainty models, provides robust designs by utilizing already developed deterministic solvers. The computational cost is discussed in detail, and ways to decrease it, such as sparse grids and discretization refinement, are proposed and demonstrated as well. The method is utilized in the design of compliant mechanisms.
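
The idea of reusing a deterministic solver at a few collocation points can be sketched for a single Gaussian input. The example uses the standard three-point Gauss-Hermite rule; `toy_solver` is a hypothetical stand-in for a real deterministic solver, and nothing here reflects the paper's random-field models or sparse grids.

```python
import math

# Three-point Gauss-Hermite rule for a standard normal variable
# (probabilists' convention); exact for polynomials up to degree 5.
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)

def collocation_stats(solver, mean, std):
    """Estimate mean and variance of solver(x) for x ~ N(mean, std^2),
    calling the deterministic solver only at the collocation points."""
    outputs = [solver(mean + std * z) for z in NODES]
    est_mean = sum(w * y for w, y in zip(WEIGHTS, outputs))
    est_second = sum(w * y * y for w, y in zip(WEIGHTS, outputs))
    return est_mean, est_second - est_mean ** 2

def toy_solver(x):
    """Hypothetical deterministic model: the response is quadratic in the input."""
    return x * x

m, v = collocation_stats(toy_solver, mean=1.0, std=0.1)
```

For this quadratic response the rule is exact, so the estimates match the closed-form values: mean 1 + 0.1² = 1.01 and variance 4·0.1² + 2·0.1⁴ = 0.0402. Only three solver calls are needed.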

7.
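Feature selection based on word frequency differences and an improved TF-IDF formula   Total citations: 18, self-citations: 2, citations by others: 18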
Luo Xin, Xia Delin, Yan Puliu. 计算机应用 (Journal of Computer Applications), 2005, 25(9): 2031-2033
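The quality of document vectorization strongly affects the speed and accuracy of text classification. The TF-IDF formula, the mutual information formula, and the information gain formula commonly used in document vectorization are analyzed. A feature selection method based on word frequency differences and an improved TF-IDF formula are proposed to improve the quality of feature selection and the speed and accuracy of text classification.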
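
For orientation, the classical TF-IDF weighting that such work starts from can be sketched as follows. This is the textbook baseline, tf × log(N / df), not the improved formula of the paper, which the abstract does not spell out; the function name and the sample documents are assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return weights

docs = [["price", "stock", "stock"], ["stock", "match"], ["match", "goal", "goal"]]
w = tf_idf(docs)
```

A term that occurs in every document receives weight zero under this baseline, which is one of the behaviors that modified formulas typically try to refine.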

8.
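As the Internet expands, more and more subjective review texts containing opinions appear online. Mining the sentiment words in these texts and determining their polarity has considerable practical significance and commercial value. To this end, a translation-based method for extracting sentiment words is proposed. A Chinese-English machine translation system translates a Chinese seed sentiment lexicon to produce candidate English words; hypernyms, hyponyms, synonyms, or antonyms of the candidates are extracted from WordNet and translated back into Chinese, from which Chinese sentiment words are extracted. In addition, the polarity of each candidate English word is determined with SentiWordNet and mapped onto the target Chinese sentiment word, thereby determining the polarity of the Chinese word. Experimental results show that the method effectively improves both the efficiency of sentiment word identification and the accuracy of polarity classification.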

9.
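Text feature selection is a core technique in text classification. To address the shortcomings of the information gain (IG) model, the IG model is improved step by step according to how the frequency of a feature term is distributed at different levels of the text: the document-based within-class distribution of the term, the frequency-based within-class distribution, and the between-class distribution of term frequency. The result is an optimized IG feature selection method based on term frequency distribution information. Subsequent text classification experiments confirm the effectiveness of the optimized IG model.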
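
The unmodified information gain criterion that the abstract takes as its starting point can be sketched from per-class document counts. This is the standard definition, IG(t) = H(C) − [P(t)·H(C|t) + P(¬t)·H(C|¬t)]; the frequency-distribution refinements of the paper are not shown, and the function names are assumptions.

```python
import math

def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(with_term, without_term):
    """Information gain of a term from per-class document counts.

    with_term[i]    = documents of class i that contain the term
    without_term[i] = documents of class i that do not contain it
    """
    class_totals = [a + b for a, b in zip(with_term, without_term)]
    n = sum(class_totals)
    p_term = sum(with_term) / n
    conditional = (p_term * entropy(with_term)
                   + (1 - p_term) * entropy(without_term))
    return entropy(class_totals) - conditional

# A term found only in class 0 versus a term spread evenly over both classes.
perfect = information_gain([10, 0], [0, 10])
useless = information_gain([5, 5], [5, 5])
```

With two equally sized classes, a term confined to one class gains the full 1 bit, while an evenly spread term gains nothing. Note that the criterion looks only at presence or absence in a document, which is the kind of limitation that frequency-aware variants target.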

10.
11.
This paper presents a unified framework for uncertainty quantification (UQ) in microelectromechanical systems (MEMS). The goal is to model uncertainties in the input parameters of micromechanical devices and to quantify their effect on the final performance of the device. We consider different electromechanical actuators that operate using a combination of electrostatic and electrothermal modes of actuation, for which high-fidelity numerical models have been developed. We use a data-driven framework to generate stochastic models based on experimentally observed uncertainties in geometric and material parameters. Since we are primarily interested in quantifying the statistics of the output parameters of interest, we develop an adaptive refinement strategy to efficiently propagate the uncertainty through the device model, in order to obtain quantities like the mean and the variance of the stochastic solution with minimal computational effort. We demonstrate the efficacy of this framework by performing UQ in some examples of electrostatic and electrothermomechanical microactuators. We also validate the method by comparing our results with experimentally determined uncertainties in an electrostatic microswitch. We show how our framework results in the accurate computation of uncertainties in micromechanical systems with lower computational effort.

12.
This paper presents a finger-knuckle-print (FKP) recognition system comprising three functional phases: (1) a novel technique for feature extraction based on the structure function, (2) a new classifier based on triangular norms (T-norms), and (3) novel techniques for rank-level fusion. The features derived from the structure function capture the variation in the texture of the FKP. We have also proposed a classifier based on the Frank T-norm, which addresses the uncertainty in the intensity levels of the image. We have also adapted the Choquet integral for rank-level fusion to further improve the identification accuracy of the individual FKP. The Choquet integral has never been used for rank-level fusion in the literature. The fuzzy densities are learned using reinforced hybrid bacterial foraging-particle swarm optimization (BF-PSO). The integral takes care of the overlapping information between the different instances of FKPs. We have also proposed the use of an entropy-based function for rank-level fusion. The rigorous experimental results of the rank-level fusion show a significant improvement in identification accuracy.
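
To make the fusion step more concrete, the discrete Choquet integral can be sketched for a handful of sources. The fuzzy measure below is written out by hand and the source names are hypothetical; in the paper the densities are learned with BF-PSO, which is not reproduced here.

```python
def choquet_integral(scores, measure):
    """Discrete Choquet integral of non-negative scores.

    scores  : {source: value}
    measure : {frozenset of sources: fuzzy measure}, full set mapped to 1.0
    """
    # Visit sources from the highest score to the lowest.
    ordered = sorted(scores, key=scores.get, reverse=True)
    total = 0.0
    for i, source in enumerate(ordered):
        next_value = scores[ordered[i + 1]] if i + 1 < len(ordered) else 0.0
        coalition = frozenset(ordered[:i + 1])  # the i+1 best-scoring sources
        total += (scores[source] - next_value) * measure[coalition]
    return total

# Hypothetical measure over two FKP instances; 0.6 + 0.5 > 1.0 signals
# that the two sources carry overlapping information.
measure = {
    frozenset({"left_index"}): 0.6,
    frozenset({"left_middle"}): 0.5,
    frozenset({"left_index", "left_middle"}): 1.0,
}
fused = choquet_integral({"left_index": 0.8, "left_middle": 0.4}, measure)
```

Here the fused score is (0.8 − 0.4)·0.6 + 0.4·1.0 = 0.64. When the measure is additive the integral reduces to an ordinary weighted sum, so the non-additive case is what lets it account for redundancy between sources.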

13.
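In recent years, the patterns of fund transactions in illegal financial activities involving large numbers of people have drawn considerable attention from researchers. To proactively discover criminal groups behind abnormal accounts from bank transaction data, a group prediction method based on an asymmetric intimacy network of bank accounts is proposed. First, a general network model of bank account transactions is built, embedding the time-series transaction data in the network structure. Then, using information on the direct and indirect transaction relationships between nodes, a method for computing asymmetric intimacy between accounts is proposed. Finally, the asymmetric interaction information of nodes on the intimacy network yields an abnormality tendency index for each node. Experiments on real data containing pyramid-scheme groups show that the intimacy-network-based group prediction method can effectively discover potential pyramid-scheme participants.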

14.
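An online handwritten whole-word recognition system for Uyghur is presented, based on a dual-engine recognition model combining HMM and GMM. In the GMM part, the system extracts 8-direction features, generates 8-direction feature pattern images, locates spatial sampling points, and extracts blurred directional features. After refined iterative training of the model, the GMM model file is obtained. In the HMM part, the system uses a stroke-segment feature method to obtain the feature sequence of stroke segmentation points; after refined iterative training, the HMM model file is obtained. The GMM and HMM model files are each packaged separately and then packaged jointly into a dictionary. The recognition rate reached 97% in the first round of experiments and 99% in the second.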

15.
Dr. G. Keller. Computing, 1982, 28(3): 199-211
A general class of piecewise functions is described which leads to the same order of convergence of collocation methods as piecewise polynomials. This order only depends on the collocation points used.

16.
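The design of a controllable frequency divider based on FPGA programmable cells is described; its duty cycle is adjustable over an integer range and its division ratio has a minimum step of 0.1. The accuracy of the divider is analyzed, and an improved dual-modulus fractional division method that greatly improves the accuracy of fractional division is proposed: integer/half-integer conversion dual-modulus fractional division. After simulation, synthesis, and post-synthesis simulation, the design was implemented on a Xilinx FPGA.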
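
The conventional dual-modulus scheme that the improved method builds on can be simulated in a few lines. This is a behavioral sketch of the classical technique (switching between N and N+1 under control of an accumulator), not the integer/half-integer variant of the paper and not HDL; the function and parameter names are assumptions.

```python
def dual_modulus_sequence(integer_part, tenths, cycles):
    """Return the per-cycle division ratios chosen by a phase accumulator.

    The divider switches between integer_part and integer_part + 1 so that
    the average ratio over ten output cycles is integer_part + tenths / 10.
    """
    acc = 0
    ratios = []
    for _ in range(cycles):
        acc += tenths
        if acc >= 10:
            # Accumulator overflow: this output cycle counts one extra input clock.
            acc -= 10
            ratios.append(integer_part + 1)
        else:
            ratios.append(integer_part)
    return ratios

# Divide by 5.3: three of every ten output cycles divide by 6, the rest by 5.
seq = dual_modulus_sequence(integer_part=5, tenths=3, cycles=10)
average = sum(seq) / len(seq)
```

The average ratio is exact over ten cycles, but each individual output period is off by up to one input clock, which is the source of the jitter that accuracy analyses of fractional dividers address.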

17.
International Journal of Information Security - The problem of attacks on neural networks through input modification (i.e., adversarial examples) has attracted much attention recently. Being...

18.
Yu Bin, Jia Yaqiong. 微计算机信息 (Microcomputer Information), 2007, 23(20): 152-154
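This paper proposes a speech recognition system based on the TMS320VC5402. The whole system is designed around the TMS320VC5402 as its core circuit, with the TLC320AD50C performing A/D conversion. The TMS320VC5402 recognizes the speech signal and then communicates with the robot, and the AT89S52 drives an LCD to display the recognition result.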

19.
As the amount of online Chinese content grows, there is a critical need for effective Chinese word segmentation approaches to facilitate Web computing applications in a range of domains, including terrorism informatics. Most existing Chinese word segmentation approaches are either statistics-based or dictionary-based. The pure statistical method has lower precision, while the pure dictionary-based method cannot deal with new words beyond the dictionary. In this paper, we propose a hybrid method that avoids the limitations of both types of approaches. Through the use of a suffix tree and mutual information (MI) with the dictionary, our segmenter, called IASeg, achieves high accuracy in word segmentation when domain training is available. It can also identify new words through MI-based token merging and dictionary updating. In addition, with the proposed Improved Bigram method, IASeg can process N-grams. To evaluate the performance of our segmenter, we compare it with two well-known systems, the Hylanda segmenter and the ICTCLAS segmenter, using a terrorism-centric corpus and a general corpus. The experimental results show that IASeg performs better than the benchmarks in both precision and recall for the domain-specific corpus and achieves comparable performance for the general corpus.
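
The role of mutual information in spotting characters that belong together can be sketched with pointwise mutual information over adjacent character pairs. The tiny corpus, the threshold, and the function names are assumptions for illustration; IASeg's suffix tree, dictionary lookup, and Improved Bigram method are not reproduced.

```python
import math
from collections import Counter

def pmi_table(sequences):
    """Pointwise mutual information (bits) for each adjacent character pair."""
    unigrams = Counter(ch for seq in sequences for ch in seq)
    bigrams = Counter(seq[i:i + 2]
                      for seq in sequences for i in range(len(seq) - 1))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    return {
        pair: math.log2((count / n_bi)
                        / ((unigrams[pair[0]] / n_uni) * (unigrams[pair[1]] / n_uni)))
        for pair, count in bigrams.items()
    }

def merge_candidates(sequences, threshold):
    """Adjacent pairs whose PMI reaches the threshold: candidates for merging."""
    table = pmi_table(sequences)
    return sorted(pair for pair, score in table.items() if score >= threshold)

# Toy corpus of unsegmented strings (hypothetical).
corpus = ["北京大学", "北京很大", "大学很好"]
table = pmi_table(corpus)
candidates = merge_candidates(corpus, threshold=2.2)
```

On this toy corpus the pairs 北京, 大学, and 很好 pass the threshold, while cross-word pairs such as 京大 do not. The pair 很好 scores highly after a single occurrence, a reminder that raw PMI favors rare pairs and is usually combined with frequency cutoffs or a dictionary.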

20.
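Online shopping reviews are consumers' direct feedback on purchased goods. Mining valuable knowledge from them helps merchants carry out precision marketing and personalized recommendation and helps consumers make purchase decisions. Taking clothing reviews on a large comprehensive Chinese e-commerce platform as the object of study, the reviews are segmented into words, high-frequency words are selected, the co-occurrence relations among them are analyzed, and a high-frequency word co-occurrence network is built. The analysis reveals several structural features of the hot words in the reviews and shows that a small number of nodes play a dominant role in the operation of the review network. This offers the field of online review mining an approach for studying how the structural properties of the high-frequency word co-occurrence network interact with sales.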
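
The pipeline described here (segment, keep high-frequency words, link words that co-occur) can be sketched on already segmented reviews. The sample reviews, the `top_k` cutoff, and the use of node degree as the structural measure are assumptions; the abstract does not state which network metrics the study used.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_network(reviews, top_k):
    """Build weighted edges between the top_k most frequent words.

    Two words are linked when they appear in the same review; the edge
    weight is the number of reviews in which they co-occur.
    """
    freq = Counter(word for review in reviews for word in review)
    frequent = {word for word, _ in freq.most_common(top_k)}
    edges = Counter()
    for review in reviews:
        present = sorted(set(review) & frequent)
        edges.update(combinations(present, 2))
    return edges

def degrees(edges):
    """Number of distinct neighbours of each node."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

# Hypothetical, already segmented clothing reviews.
reviews = [
    ["quality", "fabric", "size"],
    ["quality", "delivery"],
    ["quality", "fabric"],
    ["quality", "size"],
]
edges = cooccurrence_network(reviews, top_k=4)
deg = degrees(edges)
```

In this toy network "quality" is connected to every other word, a small-scale version of the observation that a few nodes dominate the review network.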
