Similar literature
 Found 19 similar documents (search time: 281 ms)
1.
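Modern Chinese contains many ambiguous phrase structures, and the part-of-speech tags in a sentence are not enough on their own to recover the correct collocation relations between words. This paper studies a large number of ambiguous phrase instances and analyzes the boundary ambiguity and structural-relation ambiguity that a computer faces when processing Chinese structures. Building on existing phrase-structure rules, it identifies seven structural ambiguity patterns, argues that the key to analyzing them is judging four basic kinds of collocation information, and implements a disambiguation algorithm based on semantic knowledge and collocation knowledge. Disambiguation experiments on 887 phrases show that the accuracy of phrase-structure processing rises from 82.30% to 87.18%.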

2.
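Automatic Identification of Chinese Base Phrases   Total citations: 20, self-citations: 10, citations by others: 20
This paper applies instance-based MBL (Memory-Based Learning) to identify the boundaries and categories of nine common types of Chinese base phrases, and uses the internal structure of phrases together with lexical information to resolve the boundary ambiguities and phrase-type ambiguities that arise during prediction. The experiments also compare results with and without lexical information in the feature vector. The results are fairly satisfactory: precision on the nine base phrase types reaches 95.2% and recall reaches 93.7%.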

3.
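A Probabilistic Model for Identifying Chinese Base Noun Phrases Combined with Syntactic Composition Templates   Total citations: 5, self-citations: 0, citations by others: 5
The paper first gives a formal definition of the Chinese base noun phrase (baseNP) and shows that the definition is operational by extracting baseNP syntactic composition templates. It points out that these templates are only a necessary condition for identifying a baseNP, not a necessary and sufficient one, so templates alone cannot resolve the boundary ambiguity and phrase-type ambiguity that arise in baseNP identification. Accordingly, the syntactic composition templates, which capture the internal makeup of a baseNP, are combined with an N-gram model, which captures contextual constraints, to form a new model for Chinese baseNP identification. Experiments show that the model outperforms...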

4.
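Determining a Tag Set for Chinese Phrase Annotation   Total citations: 25, self-citations: 9, citations by others: 16
This paper proposes a basic tag set for Chinese phrase annotation and analyzes in depth the properties of different phrase types in terms of syntactic function and structural composition, with the aim of providing a unified standard for the automatic processing and manual proofreading of Chinese phrase chunking and annotation.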

5.
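A model for analyzing Chinese base phrases is proposed. It separates phrase boundary detection from phrase labeling, assumes the two processes are independent of each other, and builds a separate maximum entropy model for each. The key to a maximum entropy model is selecting effective features; the paper gives the feature spaces for the two steps along with the feature selection process and algorithm. Experiments show that the model reaches a precision of 95.27% for phrase boundary detection and 96.2% for labeling.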

6.
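This paper analyzes noun phrases with a modifier-head structure. It introduces two different classification standards for Chinese phrases and the common forms of noun phrases, proposes a formal definition of the Chinese modifier-head noun phrase (MHNP), and classifies MHNPs by their structural features. Based on analysis, statistics, and generalization over a manually annotated corpus, modifier-head noun phrases containing the particle "的" are annotated automatically with a method driven by feature words.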

7.
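As an important phrase type, the prepositional phrase is widely distributed in Chinese, and identifying Chinese prepositional phrases correctly matters for many tasks and applications in natural language processing. This paper reviews recent research on identifying Chinese prepositional phrases, describes the related work in some detail in terms of research objects, experimental evaluation criteria, and specific methods, and finally summarizes some characteristics of this line of research and offers several suggestions for future work.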

8.
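Boundary Detection and Statistical Analysis of Base Noun Phrases in a Mongolian Corpus   Total citations: 2, self-citations: 0, citations by others: 2
This exploratory study addresses the boundary detection of Mongolian base noun phrases on the basis of a Mongolian corpus tagged with parts of speech. Information about the internal structure of a base noun phrase plays an important role in detecting its boundaries. Several factors determine that internal structure, but the word-class information of the phrase's constituents is the most basic one. Taking word-class information as the core and adding some restricting conditions, we construct a set of formal rules for identifying base noun phrases and test base noun phrase annotation on real corpus data.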

9.
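The evaluative phrase is one kind of evaluation factor and an important part of research on sentiment orientation in Chinese. Evaluative phrases fall into five classes: "evaluative word + evaluative word", "modifier + evaluative word", "ordinary word + evaluative word", "modifier + ordinary word", and "ordinary word + ordinary word". Different classes of evaluative phrase call for different orientation analysis strategies. The basic method of this paper is the interaction between phrase computation rules and a phrase evaluation dictionary. Phrase computation rules should be formulated on the principle of combining general and specific properties, and the phrase evaluation dictionary should be built on the principle of the minimal evaluation factor. Experiments show that the phrase computation rules and the phrase dictionary improve the accuracy of the orientation analysis system and that the method is effective.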

10.
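Because they are widely used, structurally distinctive, and internally complex in meaning, Chinese compound noun phrases have long been an important object of study in linguistic analysis and Chinese information processing. Language resources for compound noun phrases are extremely scarce in China. Existing knowledge bases cover only noun-noun compounds, and no knowledge base has yet been built for compound noun phrases that contain verbs. Most existing compound noun phrase knowledge bases are also detached from context and carry no sentence-level information. To address this situation, this paper collects corpus data from multiple domains, establishes a new system of semantic relations, and builds by annotation a sizable knowledge base of semantic relations in base compound nouns that includes sentence information. The annotation focuses on the boundaries of base compound noun phrases within sentences and the semantic relations between the components inside each phrase; the knowledge base contains 27,007 sentences in total. The paper reports a detailed quantitative analysis of the annotated knowledge base. Finally, using the annotated knowledge base, baseline models are applied to automatic boundary detection and semantic classification of base compound noun phrases, and the experimental results and possible directions for improvement are summarized and analyzed.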

11.
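The Role of High-Frequency Maximal Overlapping Ambiguity Strings in Chinese Automatic Word Segmentation   Total citations: 41, self-citations: 9, citations by others: 41
Overlapping ambiguity strings are an important factor affecting the accuracy of Chinese automatic word segmentation systems. This paper introduces the concept of the maximal overlapping ambiguity string and divides such strings into two main types, true and pseudo. Examining a Chinese corpus of about 100 million characters, we find that the high-frequency maximal overlapping ambiguity strings show considerable coverage and stability: the top 4,619 account for 59.20% of occurrences, and this coverage is little affected by changes of domain. Of these, 4,279 are pseudo ambiguities, with coverage as high as 53.35%. Based on this analysis, we propose a memory-based strategy for handling high-frequency maximal overlapping ambiguity strings, which can effectively improve the accuracy of practical, unrestricted Chinese automatic word segmentation systems.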
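
As a rough illustration of what an overlapping ambiguity string is, the sketch below finds spans of text in which lexicon words cross one another (AB and BC are both words inside ABC) and merges each chain of crossing words into one string. The function name `find_overlap_ambiguities`, the `max_word_len` parameter, and the toy lexicon are assumptions made for this example; this is one simple reading of the idea, not the procedure or the exact definition used in the paper.

```python
from itertools import combinations


def find_overlap_ambiguities(text, lexicon, max_word_len=4):
    """Return merged spans of text covered by chains of crossing lexicon words."""
    # Every lexicon word of length >= 2 that occurs in the text, as (start, end).
    spans = [
        (i, j)
        for i in range(len(text))
        for j in range(i + 2, min(len(text), i + max_word_len) + 1)
        if text[i:j] in lexicon
    ]
    parent = list(range(len(spans)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    crossing = set()
    for p, q in combinations(range(len(spans)), 2):
        (a, b), (c, d) = spans[p], spans[q]
        # Two words cross when they share characters but neither contains the other.
        if a < c < b < d or c < a < d < b:
            parent[find(p)] = find(q)
            crossing.update((p, q))

    # Merge each connected chain of crossing words into one (start, end) span.
    groups = {}
    for idx in crossing:
        root = find(idx)
        start, end = spans[idx]
        lo, hi = groups.get(root, (start, end))
        groups[root] = (min(lo, start), max(hi, end))
    return [text[s:e] for s, e in sorted(groups.values())]


toy_lexicon = {"结合", "合成", "成分", "分子"}  # hypothetical mini-lexicon
ambiguous = find_overlap_ambiguities("他结合成分子时", toy_lexicon)
```

A memory-based strategy of the kind the abstract mentions could then store the preferred segmentation for each frequently seen string and look it up at segmentation time; that lookup table is not shown here.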

12.
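A Study of Combinational Ambiguity Strings in Chinese Word Segmentation   Total citations: 6, self-citations: 0, citations by others: 6
Combinational ambiguity is a hard problem in Chinese automatic word segmentation, and the difficulty is twofold: discovering combinational ambiguity strings and resolving the ambiguity. This paper studies how the part of speech of a combinational ambiguity string changes depending on whether the string is split, and proposes a new method for collecting such strings automatically. Experimental results show that the method discovers combinational ambiguity strings effectively: more than 400 were detected in the January 1998 issues of People's Daily (《人民日报》), far more than conventional methods detect. A maximum entropy model was then used to disambiguate 60 combinational ambiguity strings, and the effect of six kinds of features and their combinations on disambiguation performance was examined. The average disambiguation accuracy reached 88.05%.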

13.
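Overlapping ambiguity is one of the main types of ambiguity in Chinese automatic word segmentation, and existing segmentation systems do not yet handle it entirely satisfactorily. Many studies of overlapping ambiguity have been based on general-purpose corpora, but none so far has been based on domain-specific corpora. Using a medium-sized general-purpose Chinese word list, a general-purpose corpus of about 900 million characters, and two domain-specific corpora that cover 55 specialized domains and total about 140 million characters, this study carries out an exhaustive, multi-angle examination of two things: the statistical properties, in the domain-specific corpora, of the high-frequency overlapping ambiguity strings extracted from the general-purpose corpus, and the domain-related statistical properties of the overlapping ambiguity strings extracted from the domain-specific corpora. The observations are of some reference value for designing Chinese automatic word segmentation algorithms aimed at specialized domains.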

14.
There has been increasing interest in using Stackelberg games (known as security games) to allocate limited security resources against different attacker types with a specific probability distribution. However, real problems of this kind often face ambiguous information, such as imprecise, unreliable and absent payoffs, and ambiguous assignments of these payoffs. To this end, based on decision theory and the Dempster–Shafer theory of evidence, this paper proposes a novel framework that can handle these common types of ambiguity. More specifically, this paper deploys the underlying principles of existing rules from decision theory as a way to characterise different attitudes to ambiguity during the transformation of ambiguous payoffs into point‐valued payoffs. Hence, our framework has some good properties: (i) it subsumes traditional security games without ambiguous payoffs, (ii) a uniform margin of error will not affect the results and (iii) the influence of complete ignorance can be minimised. Also, our framework is evaluated by using nine different transformation rules, under various conditions and constraints, against 73,000 randomly generated games (the first comprehensive empirical evaluation to date). The evaluation reveals the benefits of each transformation rule and confirms that different rules can model individuals' different attitudes to ambiguity.
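
To make the idea of turning an ambiguous payoff into a point value concrete, here is a minimal sketch using the Hurwicz criterion, a standard rule from decision theory that mixes the best and worst cases according to an optimism coefficient. The abstract does not say which nine rules the paper evaluates, so the choice of Hurwicz, the interval representation of an ambiguous payoff, and the names `hurwicz_point_value` and `optimism` are all assumptions made for illustration.

```python
def hurwicz_point_value(interval, optimism):
    """Collapse an interval payoff (low, high) to one number.

    optimism = 0 gives the pessimistic (worst-case) value,
    optimism = 1 gives the optimistic (best-case) value.
    """
    low, high = interval
    if low > high:
        raise ValueError("interval must satisfy low <= high")
    if not 0.0 <= optimism <= 1.0:
        raise ValueError("optimism must lie in [0, 1]")
    return optimism * high + (1.0 - optimism) * low


# Hypothetical ambiguous payoff: the defender's payoff is only known to lie in [2, 6].
cautious = hurwicz_point_value((2.0, 6.0), 0.25)
pessimist = hurwicz_point_value((2.0, 6.0), 0.0)
optimist = hurwicz_point_value((2.0, 6.0), 1.0)
```

Changing `optimism` changes the point value fed into an ordinary security game solver, which is one simple way a single rule can model different attitudes to ambiguity.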

15.
16.
Ambiguity, defined in this study as the existence of two or more interpretations of the same cue, is an essential component of ‘fuzziness’ in new product development (NPD) projects. In this paper, we present a model by which ambiguity in NPD projects can be classified and managed. The model is grounded in case data from four NPD projects in companies making medical devices. Ambiguity is classified according to two axes: subjects of ambiguity and sources of ambiguity. Subjects of ambiguity include product, market, process and organizational resources. Sources of ambiguity include multiplicity, novelty, validity and reliability. Ambiguity can be managed by two means: reducing or sustaining it. If clarity is a main priority in the NPD project, reducing ambiguity is necessary and can be effectively achieved by applying the hypothetical‐deductive method. If novelty and flexibility are high project priorities, sustaining certain ambiguities can be useful. Managing ambiguity requires a constant harmonizing of the need for clarity and the need for novelty and flexibility.

17.
In this paper, we present a typology of ambiguity in Chinese, which includes morphological, lexical, syntactic, semantic, and contextual ambiguities. Examples are shown for each type of ambiguity and sometimes for subtypes. Ambiguity resolution strategies used in the ALICE machine translation system are presented in various levels of detail. A disambiguation model, called Four-Step, is proposed for resolving syntactic ambiguities involving serial verb construction and predication. As the name suggests, the model comprises four steps: well-formedness checking, preference for argument readings, precondition checking, and late closure. For resolving semantic ambiguity, we propose a new formalism, called Semantic Functional Grammar (SFG). SFG integrates the concept of Semantic Grammar into Lexical-Functional Grammar (LFG) such that the functional structures (f-structures) include semantic functions in addition to grammatical functions. For dealing with lexical and contextual ambiguities, we briefly describe the mechanisms used in the ALICE system. As for morphological ambiguity, its resolution is a problem of word-boundary decision (segmentation) and is beyond the scope of this research. The mechanisms presented in the paper have been successfully applied to the translation of Chinese news headlines in the ALICE system. This research was supported partly by the Industrial Technology Research Institute, Taiwan, under a grant for doctoral study to this author.

18.
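At present, entity recognition and dependency analysis rely mainly on deep end-to-end methods based on supervised learning. This approach has two problems: it cannot incorporate background knowledge, and it cannot recognize the multi-granularity, nested features of natural language. To address these problems, dependency-syntax annotation rules based on phrase windows are proposed, a Chinese Phrase Window Dataset (CPWD) is annotated, and an accompanying multi-dimensional end-to-end phrase recognition model (the MDM model) is designed. The annotation rules take the phrase as the most...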

19.
Bayesian games can handle incomplete information about players' types. However, in real life, the information may be not only incomplete but also ambiguous for lack of sufficient evidence, i.e., a player cannot have a precise probability for each type of the other players. To address this issue, this paper first extends Bayesian games to ambiguous Bayesian games. Then, we introduce the concept of a solution to this kind of game and discuss its properties, especially solution existence, how the ambiguity degree and the players' ambiguity attitude influence the outcomes of an ambiguous Bayesian game, the case of lower boundary probability, and the missing situation. We also illustrate our game model, especially in the public security domain.

