Similar Documents
20 similar documents retrieved.
1.
A Method for Acquiring Chinese Part-of-Speech Tagging Rules Based on Collocation Patterns   (cited 2 times: 0 self-citations, 2 by others)
This paper presents a method for acquiring Chinese part-of-speech (POS) tagging rules based on collocation patterns. The method automatically extracts candidate collocation-pattern rules from a POS-tagged corpus, selects those candidates whose confidence exceeds a given threshold, and then refines the rule set by repeatedly testing it on new corpora. Applying the acquired rules to Chinese POS tagging markedly improves tagging accuracy.
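The abstract does not give the acquisition algorithm itself. The following is a minimal illustrative sketch, assuming a collocation pattern of the form (previous tag, word) and hypothetical confidence and frequency thresholds; the paper's actual pattern templates may differ.

```python
from collections import Counter, defaultdict

def acquire_collocation_rules(tagged_sentences, min_confidence=0.9, min_count=5):
    """Collect (prev_tag, word) -> tag rules whose confidence exceeds a threshold.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    The pattern shape and both thresholds are illustrative assumptions, not
    the paper's actual definition of a collocation pattern.
    """
    pattern_counts = Counter()                 # occurrences of (prev_tag, word)
    pattern_tag_counts = defaultdict(Counter)  # occurrences of (prev_tag, word) with each tag

    for sent in tagged_sentences:
        prev_tag = "<S>"                       # sentence-start marker
        for word, tag in sent:
            pattern = (prev_tag, word)
            pattern_counts[pattern] += 1
            pattern_tag_counts[pattern][tag] += 1
            prev_tag = tag

    rules = []
    for pattern, total in pattern_counts.items():
        if total < min_count:
            continue
        tag, hits = pattern_tag_counts[pattern].most_common(1)[0]
        confidence = hits / total
        if confidence >= min_confidence:
            rules.append((pattern, tag, confidence))
    return rules

if __name__ == "__main__":
    corpus = [[("我", "r"), ("喜欢", "v"), ("音乐", "n")],
              [("他", "r"), ("喜欢", "v"), ("跑步", "v")]]
    print(acquire_collocation_rules(corpus, min_confidence=0.5, min_count=1))
```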

2.
Research on Automatic Proofreading of POS Tags in Chinese Corpora   (cited 6 times: 0 self-citations, 6 by others)
Disambiguating the part of speech of multi-category (ambiguous) words is a difficult problem in POS tagging of Chinese corpora and seriously degrades tagging quality. To address it, this paper proposes an automatic proofreading method for the POS tags of ambiguous words. Using data-mining techniques, it extracts useful information from a correctly tagged training corpus, automatically generates POS correction rules for ambiguous words, and applies these rules to automatically proofread the machine's initial tagging, thereby improving the tagging quality of ambiguous words. Closed and open tests on a 500,000-word Chinese corpus show that, after proofreading, the POS tagging accuracy of ambiguous words improves by 11.32% and 5.97%, respectively.

3.
This paper proposes a method for automatically acquiring POS correction rules for ambiguous words from a correctly tagged training corpus, and designs and implements a corresponding automatic POS proofreading system. By automatically proofreading Chinese text, it further improves POS tagging quality.

4.
Automatic POS tagging of Tibetan is indispensable groundwork for the subsequent syntactic, semantic, and discourse analysis stages of Tibetan information processing. Handling POS ambiguity is the key to automatic Tibetan POS tagging and a difficult problem in Tibetan information processing. This paper analyzes the POS ambiguity problem in Tibetan POS tagging and proposes a disambiguation method that conforms to Tibetan grammatical rules and is practical for Tibetan POS tagging. Experiments show that the method disambiguates POS effectively in automatic Tibetan tagging and yields a measurable improvement in tagging accuracy.

5.
To identify and automatically annotate discourse coherence (semantic) relations, this paper proposes an automatic annotation method that integrates several kinds of grammatical information about connectives. The method uses the POS distribution rules of connectives to filter out non-connectives and mark potential connectives, matches them against the pattern table of a connective lexicon, and combines collocation distance, collocation strength, and syntactic position to obtain valid discourse coherence patterns, on the basis of which the semantic relations are annotated. Experiments verify the effectiveness of the method.

6.
Many approaches to POS tagging exist, but current Uyghur POS tagging methods are mainly rule-based and their accuracy is not yet fully satisfactory. Building on a large manually annotated corpus, this paper studies automatic Uyghur POS tagging based on N-gram language models, analyzes parameter selection and data smoothing for the N-gram model, compares the performance of bigram and trigram models on Uyghur POS tagging, and examines how the tagset and the size of the training corpus affect tagging accuracy. Experimental results show that the method tags Uyghur text effectively.
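The abstract describes an N-gram tagger with smoothing but gives no formulas or code. Below is a minimal bigram (first-order HMM) tagger sketch with add-one smoothing; the toy word forms and tagset are made up and are not the paper's Uyghur corpus, tagset, or smoothing scheme.

```python
from collections import Counter
import math

def train_bigram_tagger(tagged_sentences):
    """Estimate counts for P(tag | prev_tag) and P(word | tag)."""
    trans, emit, tags, vocab = Counter(), Counter(), Counter(), set()
    for sent in tagged_sentences:
        prev = "<S>"
        for word, tag in sent:
            trans[(prev, tag)] += 1
            emit[(tag, word)] += 1
            tags[tag] += 1
            vocab.add(word)
            prev = tag
    tags["<S>"] += len(tagged_sentences)
    return trans, emit, tags, vocab

def viterbi(words, trans, emit, tags, vocab):
    """Most probable tag sequence under an add-one-smoothed bigram model."""
    tagset = [t for t in tags if t != "<S>"]
    v = len(vocab) + 1  # +1 reserves probability mass for unknown words

    def log_t(prev, tag):   # smoothed transition probability
        return math.log((trans[(prev, tag)] + 1) / (tags[prev] + len(tagset)))

    def log_e(tag, word):   # smoothed emission probability
        return math.log((emit[(tag, word)] + 1) / (tags[tag] + v))

    best = [{t: (log_t("<S>", t) + log_e(t, words[0]), ["<S>", t]) for t in tagset}]
    for word in words[1:]:
        column = {}
        for t in tagset:
            score, path = max(
                (best[-1][p][0] + log_t(p, t) + log_e(t, word), best[-1][p][1])
                for p in tagset)
            column[t] = (score, path + [t])
        best.append(column)
    return max(best[-1].values())[1][1:]   # drop the "<S>" marker

if __name__ == "__main__":
    toy = [[("men", "PRON"), ("kitab", "NOUN"), ("oqudum", "VERB")],
           [("u", "PRON"), ("nan", "NOUN"), ("yedi", "VERB")]]
    model = train_bigram_tagger(toy)
    print(viterbi(["men", "nan", "yedim"], *model))
```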

7.
Targeting the structural characteristics of Classical Chinese texts, which are small in size, short in sentence length, and strongly patterned, this paper proposes a fast treebank construction method based on acquiring word/POS matching patterns. It reduces the syntactic annotation process to four steps: acquiring candidate matching patterns, formulating syntactic transformation rules, automatically generating syntax trees, and final manual proofreading. The method greatly reduces the manual annotation workload and the engineering cost of treebank construction, and the acquired matching rules also have practical value for teaching and research on Classical Chinese.

8.
POS ambiguity is central to automatic POS tagging, and the accuracy of determining the POS of out-of-vocabulary (unknown) words in particular strongly affects overall tagging performance. This paper studies disambiguation methods for ambiguous words and, considering the respective strengths and limitations of statistical and rule-based approaches, proposes an automatic tagging method that combines a hidden Markov model with error-driven learning. It then describes how, with this method, POS tagging and POS guessing for unknown words can be performed under the limited condition of having only a single lexicon. Experimental results show that the method effectively improves the tagging accuracy of unknown words.
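The paper combines an HMM with error-driven learning, but the abstract does not give the rule templates. The sketch below shows one simplified error-driven (transformation-based) step, assuming a single hypothetical template of the form "change tag A to tag B when the previous tag is C", applied on top of any initial tagger's output.

```python
from collections import Counter

def learn_correction_rules(initial_tags, gold_tags, min_gain=2):
    """Learn rules (prev_tag, wrong_tag) -> correct_tag from tagging errors.

    initial_tags, gold_tags: parallel lists of tag sequences (one per sentence).
    fixes  counts tokens a candidate rule would correct;
    breaks counts correctly tagged tokens sharing the same context, which any
    rule on that context would spoil. A rule is kept only when its net gain
    reaches min_gain. This single template is an illustrative simplification
    of error-driven learning, not the paper's rule set.
    """
    fixes, breaks = Counter(), Counter()
    for init_sent, gold_sent in zip(initial_tags, gold_tags):
        prev = "<S>"
        for init, gold in zip(init_sent, gold_sent):
            ctx = (prev, init)
            if init != gold:
                fixes[(ctx, gold)] += 1
            else:
                breaks[ctx] += 1
            prev = gold
    return {ctx: to for (ctx, to), f in fixes.items()
            if f - breaks[ctx] >= min_gain}

def apply_rules(tag_sequences, rules):
    """Apply the learned correction rules left to right."""
    corrected = []
    for sent in tag_sequences:
        prev, out = "<S>", []
        for tag in sent:
            tag = rules.get((prev, tag), tag)
            out.append(tag)
            prev = tag
        corrected.append(out)
    return corrected

if __name__ == "__main__":
    gold = [["r", "v", "n"], ["r", "v", "v"]]
    init = [["r", "n", "n"], ["r", "n", "v"]]   # initial tagger confuses v/n after r
    rules = learn_correction_rules(init, gold, min_gain=2)
    print(rules)                                 # {('r', 'n'): 'v'}
    print(apply_rules(init, rules))
```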

9.
To address the low accuracy of current Chinese ambiguous-word tagging, this paper proposes a method that combines rules with statistical models. First, three statistical models (hidden Markov model, maximum entropy, and conditional random field) are used to tag ambiguous words. Then an improved mutual-information algorithm is applied to acquiring POS tagging rules, which are obtained by computing the correlation between the target word and the word units before and after it. Finally, the acquired rules are combined with the statistical tagging algorithms to tag ambiguous words. Experimental results show that after the rule component is added, the average POS tagging accuracy improves by about 5%.
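The abstract mentions an "improved mutual information" score between the target word and its neighboring word units but does not define it. As one plausible reading, the sketch below computes ordinary pointwise mutual information between an immediate neighbor word and the POS tag taken by the target word; the window size and thresholds are assumptions.

```python
import math
from collections import Counter

def pmi_tagging_rules(tagged_sentences, target_word, min_pmi=1.0, min_count=3):
    """Score (neighbor_word, tag-of-target) associations with pointwise mutual information.

    Every (neighbor word, target tag) pair around an occurrence of `target_word`
    is one event; PMI(c, t) = log [ P(c, t) / (P(c) * P(t)) ] over those events.
    The paper's 'improved' variant is not specified in the abstract, so plain
    PMI and the thresholds here are illustrative assumptions.
    """
    joint, ctx, tag_c = Counter(), Counter(), Counter()
    for sent in tagged_sentences:
        words = [w for w, _ in sent]
        for i, (word, tag) in enumerate(sent):
            if word != target_word:
                continue
            for j in (i - 1, i + 1):          # immediate left/right neighbors only
                if 0 <= j < len(words):
                    joint[(words[j], tag)] += 1
                    ctx[words[j]] += 1
                    tag_c[tag] += 1
    n = sum(joint.values())
    rules = []
    for (c, t), k in joint.items():
        if k < min_count:
            continue
        pmi = math.log(k * n / (ctx[c] * tag_c[t]))
        if pmi >= min_pmi:
            rules.append((c, t, pmi))
    return sorted(rules, key=lambda r: -r[2])
```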

10.
Prosodic Phrase Segmentation Based on Statistics of POS Features at Boundary Points   (cited 10 times: 6 self-citations, 4 by others)
Rule-based text processing systems require a large number of rules to be compiled when they are built, and their robustness on large-scale real text is hard to guarantee, so this paper explores statistical methods for prosodic phrase segmentation. The text is first segmented into words and POS-tagged automatically; then boundary patterns and probability information for prosodic phrase break points, obtained from a manually annotated corpus, are used to automatically predict break points in the text; finally, rules are applied for appropriate error correction. In closed and open tests on one thousand sentences of real text, POS tagging accuracy is about 95%, the recall of prosodic phrase segmentation is about 60%, and its precision reaches 80%.
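The abstract says boundary patterns and probabilities drive the prediction but does not give the model. The sketch below estimates P(break | left POS, right POS) from break-annotated sentences and predicts a break where that probability exceeds a hypothetical threshold; the POS-pair pattern is a simplification of the paper's boundary features.

```python
from collections import Counter

def train_boundary_model(annotated_sentences):
    """Estimate P(break | left_pos, right_pos) between adjacent words.

    annotated_sentences: lists of (word, pos, break_after) triples, where
    break_after is True if a prosodic phrase boundary follows the word.
    """
    seen, broke = Counter(), Counter()
    for sent in annotated_sentences:
        for (w1, p1, brk), (w2, p2, _) in zip(sent, sent[1:]):
            seen[(p1, p2)] += 1
            if brk:
                broke[(p1, p2)] += 1
    return {pair: broke[pair] / n for pair, n in seen.items()}

def predict_breaks(pos_tagged_sentence, model, threshold=0.5):
    """Mark a break between adjacent words whose POS pair is break-prone."""
    breaks = []
    for (w1, p1), (w2, p2) in zip(pos_tagged_sentence, pos_tagged_sentence[1:]):
        breaks.append(model.get((p1, p2), 0.0) >= threshold)
    return breaks
```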

11.
When faced with classification tasks, traditional association rule mining tends to miss non-frequent rules and to predict with limited accuracy. To obtain rules that are correct, reasonable, and more complete, this paper proposes an improved method, DT-AR (decision tree-association rule algorithm), which uses a decision-tree pruning strategy to supplement the association rule set. The method obtains an association rule set with the FP-Growth (frequent pattern growth) algorithm, builds a post-pruned decision tree with C4.5 and extracts classification rules from it, merges the two rule sets after iterative confidence-based filtering, and then classifies by voting with confidence values as weights. Experimental results show that, compared with traditional association rule mining and decision-tree pruning methods, the rules obtained by this method classify the data sets more accurately.
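The details of DT-AR are not in the abstract, so the sketch below only illustrates the final confidence-weighted voting step, with the merged rules represented generically as (antecedent item set, class label, confidence); rule generation with FP-Growth and C4.5 pruning is not reproduced, and the toy rules are invented.

```python
def confidence_weighted_vote(rules, instance_items, default_class=None):
    """Classify one instance by confidence-weighted voting over matching rules.

    rules: iterable of (antecedent, label, confidence), where antecedent is a
    set of items; a rule matches if its antecedent is a subset of the instance.
    """
    votes = {}
    for antecedent, label, confidence in rules:
        if set(antecedent) <= set(instance_items):
            votes[label] = votes.get(label, 0.0) + confidence
    if not votes:
        return default_class
    return max(votes, key=votes.get)

if __name__ == "__main__":
    merged_rules = [({"outlook=sunny", "humidity=high"}, "no", 0.90),   # e.g. from association rules
                    ({"outlook=overcast"}, "yes", 0.95),                # e.g. from the decision tree
                    ({"humidity=high"}, "no", 0.60)]
    print(confidence_weighted_vote(merged_rules, {"outlook=sunny", "humidity=high"}))
```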

12.
For rough-set-based data mining, this paper proposes a statistical method for reducing the degree of uncertainty of boundary elements. Based on the statistical regularities of boundary elements, the method selects suitable covers from the minimal covers produced by attribute reduction to form rules, thereby making fuller use of attribute reduction and of the data resources in the data warehouse and improving the effectiveness of rough-set-based data mining.

13.
Most existing rule extraction methods can only extract rules from consistent decision systems, and the extracted rules are highly redundant and hard for users to understand. To address this, a rule extraction method based on object-set covering is proposed: rough set theory is used to partition the object set into equivalence classes, valid rules are generated according to the degree of consistency and similarity of attribute values, and rule extraction for inconsistent decision systems is handled through equivalence-class partitioning and object-set covering. Results on worked examples show that the extracted rules are simple, reliable, and fairly robust.
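The method relies on partitioning the object set into equivalence classes and generating rules from them. As a minimal sketch of those two rough set steps only (not the paper's covering algorithm), the code below partitions a decision table by condition attributes and emits a certain rule for every class whose objects share one decision value; the toy table is invented.

```python
from collections import defaultdict

def equivalence_classes(table, condition_attrs):
    """Partition objects (dicts) by their values on the condition attributes."""
    classes = defaultdict(list)
    for obj in table:
        key = tuple(obj[a] for a in condition_attrs)
        classes[key].append(obj)
    return classes

def certain_rules(table, condition_attrs, decision_attr):
    """Emit one certain rule per consistent equivalence class.

    A class is consistent when all its objects agree on the decision value;
    inconsistent classes (the boundary region) are skipped here, whereas the
    paper's covering method also derives rules for them.
    """
    rules = []
    for key, objs in equivalence_classes(table, condition_attrs).items():
        decisions = {o[decision_attr] for o in objs}
        if len(decisions) == 1:
            rules.append((dict(zip(condition_attrs, key)), decisions.pop()))
    return rules

if __name__ == "__main__":
    data = [{"headache": "yes", "temp": "high",   "flu": "yes"},
            {"headache": "yes", "temp": "high",   "flu": "yes"},
            {"headache": "no",  "temp": "normal", "flu": "no"},
            {"headache": "no",  "temp": "high",   "flu": "yes"},
            {"headache": "no",  "temp": "high",   "flu": "no"}]   # conflicts with the previous row
    for cond, dec in certain_rules(data, ["headache", "temp"], "flu"):
        print(cond, "=>", dec)
```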

14.
Medical data usually contain a high degree of uncertainty, which leads to problems such as diagnostic disparity. With the continually rising expenditure on health care, controlling that expenditure by improving the diagnosis process is an urgent task and poses a great challenge. Rough sets have been shown to be a useful technique for dealing with uncertain data. This paper applies rough sets to identify the set of significant symptoms causing diseases and to induce decision rules using data from a Taiwanese otolaryngology clinic. The data are limited to rhinology and laryngology and contain 657 records, each including 12 condition attributes and one decision attribute. The clinic's physician agrees with the set of significant symptoms identified and favors the reduct containing one critical symptom. To generate decision rules, we use the LEM2 and Explore algorithms and discuss these rules in terms of the number of rules and the implications of certain rules. Our results show that LEM2 generates fewer certain rules and that its proportion of approximate rules is higher. We also observe that LEM2's certain rules form a sensible pattern that Explore lacks. Discovery of this pattern is considered potentially helpful in improving medical diagnosis.

15.
As an extension of the soft set, the bijective soft set can be used to mine data in soft set environments and has been studied and applied in several fields. However, even a small proportion of faulty data can cause bijective soft sets to lose much of their recognition ability when mining data. This study therefore aims to improve the fault-data tolerance of bijective soft set-based data mining. First, some notions and operations of the bijective soft set at a β-misclassification degree are defined. Algorithms for finding an optimal β, reductions, cores, decision rules, and misclassified data are then proposed. A real problem, obtaining shoreline resource evaluation rules, is used to validate the model. The results show that the proposed model is fault tolerant and improves the tolerance of bijective soft set-based data mining to faulty data. Moreover, the proposed method can help decision makers discover faulty data for further analysis.

16.
The degree of malignancy in brain glioma is assessed based on magnetic resonance imaging (MRI) findings and clinical data before operation. These data contain irrelevant features, while uncertainties and missing values also exist. Rough set theory can deal with vagueness and uncertainty in data analysis, and can efficiently remove redundant information. In this paper, a rough set method is applied to predict the degree of malignancy. As feature selection can improve classification accuracy effectively, rough set feature selection algorithms are employed to select features. The selected feature subsets are used to generate decision rules for the classification task. A rough set attribute reduction algorithm that employs a search method based on particle swarm optimization (PSO) is proposed in this paper and compared with other rough set reduction algorithms. Experimental results show that reducts found by the proposed algorithm are more efficient and can generate decision rules with better classification performance. The rough set rule-based method can achieve higher classification accuracy than other intelligent analysis methods such as neural networks, decision trees and a fuzzy rule extraction algorithm based on Fuzzy Min-Max Neural Networks (FRE-FMMNN). Moreover, the decision rules induced by the rough set rule induction algorithm can reveal regular and interpretable patterns of the relations between glioma MRI features and the degree of malignancy, which are helpful for medical experts.
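The paper searches for reducts with PSO, but the abstract does not include the fitness function. The sketch below shows the standard rough set dependency degree, gamma_B(D) = |POS_B(D)| / |U|, computed for a candidate attribute subset; using it (possibly combined with subset size) as particle fitness is an assumption, and the toy data are invented.

```python
from collections import defaultdict

def dependency_degree(table, attrs, decision_attr):
    """Rough set dependency degree gamma_B(D) = |POS_B(D)| / |U|.

    table: list of dict objects; attrs: candidate attribute subset B.
    Objects whose condition-equivalence class agrees on the decision belong
    to the positive region POS_B(D).
    """
    classes = defaultdict(list)
    for obj in table:
        classes[tuple(obj[a] for a in attrs)].append(obj)
    positive = sum(len(objs) for objs in classes.values()
                   if len({o[decision_attr] for o in objs}) == 1)
    return positive / len(table)

if __name__ == "__main__":
    data = [{"a": 1, "b": 0, "d": "x"},
            {"a": 1, "b": 1, "d": "y"},
            {"a": 0, "b": 1, "d": "y"},
            {"a": 1, "b": 0, "d": "y"}]          # conflicts with the first row on {a, b}
    print(dependency_degree(data, ["a"], "d"))       # coarser candidate subset
    print(dependency_degree(data, ["a", "b"], "d"))  # full condition set
```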

17.
Knowledge Discovery in Stock Time Series with Rough Set Theory   (cited 3 times: 0 self-citations, 3 by others)
This paper applies rough set theory to knowledge discovery in time series. The discovery process consists of three parts: time series preprocessing, attribute reduction, and rule extraction. Preprocessing first cleans the data with signal-processing techniques and then segments the cleaned time series according to the trend of a chosen variable so that the trend within each segment is constant; the time series is thus converted into a sequence of static patterns (each pattern representing one behavioral trend), removing its time dependence. The attributes that determine each pattern are extracted to form an information table suitable for rough set theory, which is then used for attribute reduction and rule extraction; the resulting rules can be used to predict the future behavior of the time series. Finally, the method is applied to stock trend prediction with good results.

18.
王晓鹏 《计算机仿真》2020,37(1):234-238
Mining interval-valued attribute data sets can effectively reveal the relationships among the data. Existing mining methods do not cluster large-scale data first, so the mining process consumes a large amount of memory and mining accuracy is low; to address this, a new mining algorithm for interval-valued attribute data sets is proposed. Modules for problem definition, data preparation, data extraction, pattern prediction, and data clustering are analyzed in detail to cluster the interval-valued attribute data. Based on the clustering results, the data are divided into multiple data sets, itemsets satisfying the minimum support are selected as frequent itemsets, association rules between the data sets are extracted, and these rules are incorporated into the data-computation step to complete the mining. Simulations show that, compared with traditional mining algorithms, the proposed algorithm uses less memory and achieves higher mining accuracy.

19.
Linguistic rules in natural language are useful and consistent with the human way of thinking, and their interpretability makes them very important in multi-criteria decision making. This paper concentrates on extracting linguistic rules from data sets. To this end, we first analyze how to extract complex linguistic data summaries based on fuzzy logic. We then formalize linguistic rules based on these summaries, in which the degree of confidence of a linguistic rule extracted from a data set can be explained by linguistic quantifiers and its linguistic truth from the fuzzy-logic point of view. To obtain linguistic rules with a higher degree of linguistic truth, a genetic algorithm is used to optimize the number and parameters of the membership functions of the linguistic values. Computational results show that the proposed method is a viable alternative for extracting linguistic rules with linguistic truth from data sets.
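The abstract omits the truth-evaluation formulas, so the sketch below uses Zadeh's classic evaluation of a quantified summary "Q of the records are P": T = mu_Q( (1/n) * sum mu_P(x_i) ). The "most" quantifier shape, the "high" membership bounds, and the sample data are all made-up assumptions, not the paper's optimized membership functions.

```python
def mu_most(r):
    """A piecewise-linear membership for the quantifier 'most' (assumed shape)."""
    if r <= 0.3:
        return 0.0
    if r >= 0.8:
        return 1.0
    return (r - 0.3) / 0.5

def mu_high(x, low=50.0, high=80.0):
    """Membership of a numeric value in the fuzzy set 'high' (assumed bounds)."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

def truth_of_summary(values, mu_predicate, mu_quantifier):
    """Zadeh's truth degree of 'Q of the y's are P' for a crisp data column."""
    proportion = sum(mu_predicate(v) for v in values) / len(values)
    return mu_quantifier(proportion)

if __name__ == "__main__":
    scores = [55, 62, 71, 90, 85, 40, 78]
    # Truth of the linguistic summary "most of the scores are high"
    print(round(truth_of_summary(scores, mu_high, mu_most), 3))
```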

20.
The Extraction of Trading Rules From Stock Market Data Using Rough Sets   (cited 1 time: 0 self-citations, 1 by others)
We propose a rough set approach to the extraction of trading rules for discriminating between bullish and bearish patterns in the stock market. Rough set theory is quite valuable for extracting trading rules because it can be used to discover dependencies in data while reducing the effect of superfluous factors in noisy data. In addition, it does not generate a trading signal when the market pattern is uncertain, because the selection of reducts and the extraction of rules are controlled by the strength of each reduct and rule. The experimental results are encouraging and show the usefulness of the rough set approach for stock market analysis with respect to profitability.
