Similar Documents
1.
International crime and terrorism have drawn increasing attention in recent years. Retrieving relevant information from criminal records and suspect communications is important in combating international crime and terrorism. However, most of this information is written in languages other than English and is stored in various locations. Information sharing between countries therefore presents the challenge of cross-lingual semantic interoperability. In this work, we propose a new approach – the associate constraint network – to generate a cross-lingual concept space from a parallel corpus, and benchmark it against a previously developed technique, the Hopfield network. The associate constraint network is a constraint programming based algorithm, and the problem of generating the cross-lingual concept space is formulated as a constraint satisfaction problem. Nodes and arcs in an associate constraint network represent terms extracted from parallel corpora and their associations. Constraints are defined for the nodes, and node consistency and network satisfaction are also defined. Backmarking is developed to search for a feasible solution. Our experimental results show that the associate constraint network outperforms the Hopfield network in precision, recall and efficiency. The cross-lingual concept space generated with this method can assist crime analysts in determining the relevance of criminals, crimes, locations and activities across multiple languages, information that is not available in traditional thesauri and dictionaries.
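As an illustration of the constraint-satisfaction formulation, here is a minimal Python sketch that solves a toy associate constraint network. Plain chronological backtracking stands in for the paper's backmarking refinement, and the terms, domains, threshold and association scores are all hypothetical.

```python
# Toy associate constraint network: variables are source-language terms,
# domains are candidate translations, and constraints require the
# corpus-derived association between assigned values to stay above a
# threshold. All data below are hypothetical.

def score(a, b, assoc):
    # Associations are symmetric; look the pair up in either order.
    return assoc.get((a, b), assoc.get((b, a), 0.0))

def consistent(assignment, value, assoc, threshold=0.5):
    # Node consistency: the candidate must be sufficiently associated
    # with every value already assigned in the network.
    return all(score(value, v, assoc) >= threshold for v in assignment.values())

def backtrack(variables, domains, assoc, assignment=None):
    assignment = assignment if assignment is not None else {}
    if len(assignment) == len(variables):
        return assignment                       # network satisfied
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        if consistent(assignment, value, assoc):
            assignment[var] = value
            result = backtrack(variables, domains, assoc, assignment)
            if result is not None:
                return result
            del assignment[var]                 # undo and try the next value
    return None                                 # no feasible solution

variables = ["heroin", "smuggling"]
domains = {"heroin": ["海洛因", "毒品"], "smuggling": ["走私", "偷运"]}
assoc = {("海洛因", "走私"): 0.8, ("海洛因", "偷运"): 0.4,
         ("毒品", "走私"): 0.6, ("毒品", "偷运"): 0.7}
print(backtrack(variables, domains, assoc))     # {'heroin': '海洛因', 'smuggling': '走私'}
```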

2.
The Web is a universal repository of human knowledge and culture which has allowed unprecedented sharing of ideas and information on a scale never seen before. It can also be considered a universal digital library, interconnecting digital libraries in multiple domains and languages. Besides advances in information technology, the global economy has also accelerated the development of inter-organizational information systems. Managing knowledge obtained in multilingual information systems across multiple geographical regions is an essential component of contemporary inter-organizational information systems. An organization cannot claim to be global unless it is capable of overcoming the cultural and language barriers in its knowledge management. Cross-lingual semantic interoperability is a challenge in multilingual knowledge management systems. Dictionaries are widely utilized in commercial systems to cross the language barrier. However, the terms available in a dictionary are always limited; as language evolves, new words such as technical terms and named entities (e.g. RFID and Baidu) are created from time to time. To solve the problem of cross-lingual semantic interoperability, an associative constraint network approach is investigated to construct a cross-lingual thesaurus automatically. In this work, we have investigated the backmarking algorithm and the forward evaluation algorithm to resolve the constraint satisfaction problem represented by the associative constraint network. Experiments show that the forward evaluation algorithm outperforms backmarking in precision and recall, but that backmarking is more efficient. We have also benchmarked against our earlier technique, the Hopfield network, and shown that the associative constraint network (with either backmarking or forward evaluation) outperforms it in precision, recall and efficiency.
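To illustrate the contrast between the two search strategies, the sketch below applies standard forward checking, pruning unassigned domains after each assignment, as a hedged stand-in for the paper's forward evaluation algorithm; the toy data mirror the previous sketch.

```python
# Forward-checking sketch (a stand-in for the paper's forward evaluation):
# after assigning a value, prune from every unassigned domain the
# candidates whose association with that value falls below the threshold.

def score(a, b, assoc):
    return assoc.get((a, b), assoc.get((b, a), 0.0))

def forward_search(variables, domains, assoc, threshold=0.5):
    if not variables:
        return {}
    var, rest = variables[0], variables[1:]
    for value in domains[var]:
        pruned = {v: [c for c in domains[v] if score(value, c, assoc) >= threshold]
                  for v in rest}
        if all(pruned[v] for v in rest):        # abandon value if any domain empties
            result = forward_search(rest, {**domains, **pruned}, assoc, threshold)
            if result is not None:
                return {var: value, **result}
    return None

variables = ["heroin", "smuggling"]
domains = {"heroin": ["海洛因", "毒品"], "smuggling": ["走私", "偷运"]}
assoc = {("海洛因", "走私"): 0.8, ("海洛因", "偷运"): 0.4,
         ("毒品", "走私"): 0.6, ("毒品", "偷运"): 0.7}
print(forward_search(variables, domains, assoc))
```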

3.
It has long been a dream to have available a single, centralized, semantic thesaurus or terminology taxonomy to support research in a variety of fields. Much human and computational effort has gone into constructing such resources, including the original WordNet and subsequent wordnets in various languages. To produce such resources one has to overcome well-known problems in achieving both wide coverage and internal consistency within a single wordnet and across many wordnets. In particular, one has to ensure that alternative valid taxonomizations covering the same basic terms are recognized and treated appropriately. In this paper we describe a pipeline of new, powerful, minimally supervised, automated algorithms that can be used to construct terminology taxonomies and wordnets, in various languages, by harvesting large amounts of online domain-specific or general text. We illustrate the effectiveness of the algorithms both in building localized, domain-specific wordnets and in highlighting and investigating certain deeper ontological problems, such as parallel generalization hierarchies. We show shortcomings and gaps in the manually constructed English WordNet in various domains.

4.
Computing the semantic similarity/relatedness between terms is an important research area for several disciplines, including artificial intelligence, cognitive science, linguistics, psychology, biomedicine and information retrieval. Such measures exploit knowledge bases to express the semantics of concepts. Some approaches, such as the information theoretical ones, rely on knowledge structure, while others, such as the gloss-based ones, use knowledge content. Firstly, based on structure, we propose a new intrinsic Information Content (IC) computing method that quantifies the subgraph formed by the ancestors of the target concept. Taxonomic measures, including the IC-based ones, consume topological parameters that must be extracted from taxonomies treated as Directed Acyclic Graphs (DAGs). Accordingly, we propose a routine of graph algorithms able to provide basic parameters such as depth, ancestors, descendants and the Lowest Common Subsumer (LCS). The IC-computing method is assessed using several knowledge structures: the WordNet noun and verb “is a” taxonomies, the Wikipedia Category Graph (WCG), and the MeSH taxonomy. We also propose an aggregation schema that exploits the WordNet “is a” taxonomy and the WCG in a complementary way through the IC-based measures to improve coverage. Secondly, taking content into consideration, we propose a gloss-based semantic similarity measure built on a noun weighting mechanism that uses our IC-computing method, together with the WordNet, Wiktionary and Wikipedia resources. Further evaluation is performed on various items, including nouns, verbs, multiword expressions and biomedical datasets, using well-recognized benchmarks. The results indicate an improvement in similarity and relatedness assessment accuracy.
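The graph routines the abstract mentions map naturally onto NLTK's WordNet interface; a minimal sketch that extracts these taxonomical parameters (it does not reproduce the paper's IC formula):

```python
# Extracting the taxonomical parameters of a concept from WordNet's noun
# "is a" DAG with NLTK (run nltk.download('wordnet') once beforehand).
from nltk.corpus import wordnet as wn

def ancestors(synset):
    """All hypernyms reachable from the synset: its subsumer subgraph."""
    return set(synset.closure(lambda s: s.hypernyms()))

def descendants(synset):
    return set(synset.closure(lambda s: s.hyponyms()))

dog, cat = wn.synset('dog.n.01'), wn.synset('cat.n.01')
print(len(ancestors(dog)))                  # size of the ancestor subgraph
print(len(descendants(dog)))                # descendant count
print(dog.max_depth())                      # depth in the taxonomy
print(dog.lowest_common_hypernyms(cat))     # Lowest Common Subsumer (LCS)
```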

5.
Concrete concepts are often easier to understand than abstract concepts. The notion of abstractness is thus closely tied to the organisation of our semantic memory, and more specifically our internal lexicon, which underlies our word sense disambiguation (WSD) mechanisms. State-of-the-art automatic WSD systems often draw on a variety of contextual cues and assign word senses by an optimal combination of statistical classifiers. The validity of various lexico-semantic resources as models of our internal lexicon and the cognitive aspects pertinent to the lexical sensitivity of WSD are seldom questioned. We attempt to address these issues by examining psychological evidence of the internal lexicon and its compatibility with the information available from computational lexicons. In particular, we compare the responses from a word association task against existing lexical resources, WordNet and SUMO, to explore the relation between sense abstractness and semantic activation, and thus the implications for semantic network models and the lexical sensitivity of WSD. Our results suggest that concrete senses are more readily activated than abstract senses, and broad associations are more easily triggered than narrow paradigmatic associations. The results are expected to inform the construction of lexico-semantic resources and WSD strategies.

6.
This paper presents a novel approach to improve the interoperability between four semantic resources that incorporate predicate information. Our proposal defines a set of automatic methods for mapping the semantic knowledge included in WordNet, VerbNet, PropBank and FrameNet. We use advanced graph-based word sense disambiguation algorithms and corpus alignment methods to automatically establish the appropriate mappings among their lexical entries and roles. We study different settings for each method using SemLink as a gold standard for evaluation. The results show that the new approach provides productive and reliable mappings. In fact, the mappings obtained automatically outnumber the set of original mappings in SemLink. Finally, we also present a new version of the Predicate Matrix, a lexical-semantic resource resulting from the integration of the mappings obtained by our automatic methods and SemLink.

7.
The Princeton WordNet® (PWN) is a widely used lexical knowledge database for semantic information processing. There are now many wordnets under creation for languages worldwide. In this paper, we endeavor to construct a wordnet for Pre-Qin ancient Chinese (PQAC), called PQAC WordNet (PQAC-WN), to process the semantic information of PQAC. Most recently constructed wordnets have been established either manually by experts or automatically using resources from which translation pairs between English and the target language can be extracted. The former method, however, is time-consuming, and the latter, owing to a lack of language resources, cannot be applied to PQAC. We therefore propose a method based on word definitions in a monolingual dictionary. Specifically, for each sense, kernel words are first extracted from its definition, and the senses of each kernel word are then determined by graph-based Word Sense Disambiguation. Finally, one optimal sense is chosen from the kernel word senses to guide the mapping between the word sense and a PWN synset. In this research, we find that 66 % of PQAC senses can be shared with English, and another 14 % are language-specific senses that were added to PQAC-WN as new synsets. Overall, the automatic mapping achieves a precision of over 85 %.

8.
Traditional term weighting schemes in text categorization, such as TF-IDF, exploit only the statistical information of terms in documents. In this paper, we instead propose a novel term weighting scheme that exploits the semantics of categories and indexing terms. Specifically, the semantics of categories are represented by the senses of the terms appearing in the category labels, together with their interpretation in WordNet; the weight of a term is then correlated with its semantic similarity to a category. Experimental results on three commonly used data sets show that the proposed approach outperforms TF-IDF in cases where the amount of training data is small or the documents focus on well-defined categories. In addition, the proposed approach compares favorably with two previous studies.
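To make the idea concrete, here is a minimal sketch of one possible semantic term weight, the maximum WordNet path similarity between a term's senses and a category label's senses; the function and the weighting choice are illustrative assumptions, not the paper's exact scheme.

```python
# Hypothetical semantic term weight: max path similarity between any
# sense of the term and any sense of the category label (0.0 when a
# word has no WordNet entry or the senses are incomparable).
from nltk.corpus import wordnet as wn

def semantic_weight(term, category_label):
    scores = [s1.path_similarity(s2) or 0.0     # None for cross-POS pairs
              for s1 in wn.synsets(term)
              for s2 in wn.synsets(category_label)]
    return max(scores, default=0.0)

print(semantic_weight('goalkeeper', 'sport'))   # typically higher: on-topic
print(semantic_weight('invoice', 'sport'))      # typically lower: off-topic
```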

9.
This paper proposes a noun sense disambiguation algorithm based primarily on conceptual relatedness. Unlike existing algorithms, it extends the pairwise semantic distance defined over WordNet to a semantic density over a set of senses, thereby quantifying the relatedness of the whole sense set. Relatedness is converted into semantic density before disambiguation is performed. An LSH-like semantic hashing scheme over WordNet is also proposed, which greatly reduces the computational complexity both of the semantic density and of the disambiguation algorithm as a whole. The algorithm is tested and evaluated on SemCor.
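A minimal sketch of the idea, assuming semantic density is approximated by the mean pairwise similarity within a sense set; the paper's exact density definition and its LSH-style hashing are not reproduced.

```python
# Assumed density: mean pairwise WordNet path similarity within a sense set.
# A noun is disambiguated by choosing the candidate sense that maximises
# the density of the set formed with the context words' senses.
from itertools import combinations
from nltk.corpus import wordnet as wn

def semantic_density(synsets):
    pairs = list(combinations(synsets, 2))
    if not pairs:
        return 0.0
    return sum((a.path_similarity(b) or 0.0) for a, b in pairs) / len(pairs)

context = [wn.synsets(w)[0] for w in ('money', 'loan', 'deposit')]
best = max(wn.synsets('bank', pos='n'),
           key=lambda s: semantic_density(context + [s]))
print(best, '->', best.definition())
```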

10.
Ontologies are widely considered the building blocks of the semantic web, and with them comes the data interoperability issue. As ontologies are not necessarily labelled in the same natural language, one way to achieve semantic interoperability is cross-lingual ontology mapping. Translation techniques are often used as an intermediate step to translate the conceptual labels within an ontology. This approach essentially removes the natural language barrier in the mapping environment and enables the application of monolingual ontology mapping tools. This paper shows that the key to this translation-based approach to cross-lingual ontology mapping lies in selecting appropriate ontology label translations in a given mapping context. Appropriateness of the translations in the context of cross-lingual ontology mapping differs from the ontology localisation point of view: the former aims to generate correct mappings, whereas the latter aims to adapt specifications of conceptualisations to target communities. This paper further demonstrates that the mapping outcome of the translation-based approach is conditioned on the translations selected in the intermediate label translation step. In particular, this paper presents the design, implementation and evaluation of a novel cross-lingual ontology mapping system, SOCOM++. SOCOM++ provides configurable properties that a user can manipulate when selecting label translations in order to adjust the subsequent mapping outcome. The evaluation shows that, for the same pair of ontologies, the mappings between them can be adjusted by tuning the translations of the ontology labels, a finding not shown in previous research.

11.
A Semantic Network of English: The Mother of All WordNets
We give a brief outline of the design and contents of the English lexical database WordNet, which serves as a model for similarly conceived wordnets in several European languages. WordNet is a semantic network in which the meanings of nouns, verbs, adjectives and adverbs are represented in terms of their links to other (groups of) words via conceptual-semantic and lexical relations. Each part of speech is treated differently, reflecting its different semantic properties. We briefly discuss polysemy in WordNet, focusing on the case of meaning extensions in the verb lexicon. Finally, we outline the potential uses of WordNet not only for applications in natural language processing, but also for research in stylistic analyses in conjunction with a semantic concordance.

12.
Computing semantic similarity/relatedness between concepts and words is an important issue in many research fields. Information theoretic approaches exploit the notion of Information Content (IC), which provides a better understanding of a concept's semantics. In this paper, we present a comprehensive, critical survey of IC metrics. We then propose a new intrinsic IC computing method that uses taxonomical features extracted from an ontology for a particular concept, quantifying the subgraph formed by the concept's subsumers using depth and descendant count as taxonomical parameters. In a second part, we integrate this IC metric into a new parameterized multistrategy approach for measuring word semantic relatedness. This measure exploits WordNet features such as the noun “is a” taxonomy, the nominalization relation (allowing the use of the verb “is a” taxonomy) and the shared words (overlaps) in glosses. Our work has been evaluated and compared with related works using a wide set of benchmarks conceived for word semantic similarity/relatedness tasks. The results show that our IC method and the new relatedness measure correlate better with human judgments than related works.
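As a rough illustration of combining the two named parameters, the sketch below uses a hypothetical IC formula over depth and descendant count, with assumed WordNet 3.0 constants; it follows the spirit of the abstract, not the paper's actual metric.

```python
# Hypothetical intrinsic IC: concepts that are deep and have few
# descendants score near 1; shallow, densely subclassed concepts near 0.
# MAX_NODES / MAX_DEPTH are assumed WordNet 3.0 noun-taxonomy constants.
import math
from nltk.corpus import wordnet as wn

MAX_NODES = 82115          # noun synsets in WordNet 3.0
MAX_DEPTH = 20             # approximate max depth of the noun taxonomy

def intrinsic_ic(synset, k=0.5):
    hypo = len(set(synset.closure(lambda s: s.hyponyms())))
    depth = max(synset.max_depth(), 1)
    return (k * (1 - math.log(hypo + 1) / math.log(MAX_NODES))
            + (1 - k) * math.log(depth) / math.log(MAX_DEPTH))

print(intrinsic_ic(wn.synset('entity.n.01')))   # near 0: abstract root
print(intrinsic_ic(wn.synset('beagle.n.01')))   # near 1: specific concept
```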

13.
This paper proposes a cross-lingual sentiment classification algorithm for product reviews based on semantic fusion. Starting from the semantic representation of short texts, the algorithm first uses the open-source tool Word2Vec to pre-generate word embedding vectors as information representations for the different languages. Next, based on the statistical associations between word vectors across languages, an auto-associative memory relation is used to fuse and extract cross-lingual document semantics. Then, exploiting the local receptive fields and weight sharing of convolutional neural networks, the complex semantic expressions under the auto-associative memory model are fused to obtain phrase-fusion features of varying lengths. The deep neural network can thus learn dense combinations of high-level semantic features in any of the languages and output classification predictions. To verify its effectiveness, the model was compared experimentally with several recent methods. The results show that the model is well suited to positive/negative sentiment classification of cross-lingual sentiment corpora and clearly outperforms existing algorithms.
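For reference, a minimal PyTorch sketch of the convolutional stage alone: parallel filter widths with shared weights, max-pooled into a fixed-length feature for binary sentiment prediction. The embedding layer is a placeholder for pre-trained Word2Vec vectors, the auto-associative fusion step is omitted, and all sizes are illustrative.

```python
# CNN sentiment classifier sketch (hypothetical sizes); the embedding
# layer stands in for pre-trained Word2Vec vectors.
import torch
import torch.nn as nn

class SentimentCNN(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, n_filters=64,
                 widths=(2, 3, 4)):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # One shared-weight convolution per phrase length.
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, w) for w in widths)
        self.out = nn.Linear(n_filters * len(widths), 2)  # pos/neg logits

    def forward(self, token_ids):                   # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)     # (batch, emb_dim, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.out(torch.cat(pooled, dim=1))   # (batch, 2)

logits = SentimentCNN()(torch.randint(0, 10000, (8, 40)))
print(logits.shape)                                 # torch.Size([8, 2])
```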

14.
Legal information retrieval needs legal knowledge to improve search strategies. To this end, the LOIS project is concerned with the construction of a multilingual WordNet for cross-lingual information retrieval in the legal domain. In this article, we set out how a hybrid approach, featuring lexically and legally grounded conceptual representations, can fit the cross-lingual information retrieval needs of both legal professionals and laymen.

15.
This paper improves retrieval methods based on the vector space model and proposes an information retrieval model based on ontology semantics. The WordNet lexicon is used as a reference ontology to compute the semantic similarity between concepts; the weights of the index terms in the query vector are adjusted according to the similarity between query terms, and the index terms are expanded with synonyms, hypernyms and hyponyms drawn from the WordNet ontology, on which basis the similarity between query and document is defined. Compared with traditional word-form-based retrieval methods, this method can improve retrieval precision at the semantic level.
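A minimal sketch of the expansion step using NLTK, growing a query term with synonyms, hypernyms and hyponyms; the weight-adjustment and similarity details are omitted.

```python
# Expand a query term with WordNet synonyms, hypernyms and hyponyms.
from nltk.corpus import wordnet as wn

def expand(term):
    expanded = set()
    for s in wn.synsets(term):
        for related in [s] + s.hypernyms() + s.hyponyms():
            expanded.update(l.name().replace('_', ' ')
                            for l in related.lemmas())
    return expanded

print(sorted(expand('car'))[:10])   # a sample of the expanded term set
```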

16.
Semantic similarity computation is applied in a wide range of fields, from psychology, linguistics and cognitive science to artificial intelligence. This paper proposes estimating the semantic similarity between two words through an Information Content computation that relies only on HowNet. Experiments show that, compared with traditional algorithms that estimate semantic similarity by computing Information Content from WordNet and large corpora, the proposed algorithm is easier to compute and agrees more closely with human judgments of semantic similarity.

17.
The Resource Space Model is a data model that can effectively and flexibly manage the digital resources in a cyber-physical system from multidimensional and hierarchical perspectives. This paper focuses on constructing resource spaces automatically. We propose a framework that organizes a set of digital resources along different semantic dimensions, combining human background knowledge from WordNet and Wikipedia. The construction process includes four steps: extracting candidate keywords, building semantic graphs, detecting semantic communities and generating the resource space. An unsupervised statistical topic model (Latent Dirichlet Allocation) is applied to extract candidate keywords for the facets. To better interpret the meanings of the facets found by LDA, we map the keywords to Wikipedia concepts, calculate word relatedness using WordNet's noun synsets and construct the corresponding semantic graphs. Semantic communities are then identified by the GN algorithm. After extracting candidate axes based on the Wikipedia concept hierarchy, the final axes of the resource space are ranked and selected through three different ranking strategies. The experimental results demonstrate that the proposed framework can organize resources automatically and effectively.
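A condensed sketch of two of the four steps on toy data: LDA keyword extraction with gensim and community detection with networkx, assuming the GN algorithm refers to Girvan–Newman; the Wikipedia/WordNet mapping and axis-ranking steps are omitted.

```python
# LDA facet keywords (gensim) + Girvan-Newman communities (networkx)
# on a hypothetical toy corpus and relatedness graph.
import networkx as nx
from networkx.algorithms.community import girvan_newman
from gensim import corpora, models

docs = [["wordnet", "semantic", "graph"],
        ["resource", "space", "model"],
        ["semantic", "community", "graph"]]
dictionary = corpora.Dictionary(docs)
bow = [dictionary.doc2bow(d) for d in docs]
lda = models.LdaModel(bow, num_topics=2, id2word=dictionary)
keywords = [w for t in range(2) for w, _ in lda.show_topic(t, topn=3)]

G = nx.Graph()   # keyword relatedness graph (edge weights assumed)
G.add_weighted_edges_from([("wordnet", "semantic", 0.8),
                           ("semantic", "graph", 0.7),
                           ("resource", "space", 0.9)])
communities = next(girvan_newman(G))   # first split into communities
print(keywords, [sorted(c) for c in communities])
```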

18.
Information access methods must be improved to overcome the information overload that most professionals face nowadays. Text classification tasks, like Text Categorization (TC), help users access the great amount of text they find on the Internet and in their organizations. TC is the classification of documents into a predefined set of categories. Most approaches to automatic TC are based on the utilization of a training collection, which is a set of manually classified documents. Other emerging linguistic resources, like lexical databases, can also be used for classification tasks. This article describes an approach to TC based on the integration of a training collection (Reuters-21578) and a lexical database (WordNet 1.6) as knowledge sources. Lexical databases accumulate information on the lexical items of one or several languages. This information must be filtered in order to make effective use of it in our model of TC. This filtering process is a Word Sense Disambiguation (WSD) task. WSD is the identification of the sense of words in context, an intermediate process in many natural language processing tasks like machine translation or multilingual information retrieval. We present the utilization of WSD as an aid for TC. Our approach to WSD is also based on the integration of two linguistic resources: a training collection (SemCor and Reuters-21578) and a lexical database (WordNet 1.6). We have developed a series of experiments that show that TC and WSD based on the integration of linguistic resources are very effective, and that WSD is necessary to effectively integrate linguistic resources in TC.
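As a pointer to how the WSD filtering step might look with today's tooling, here is a hedged sketch using NLTK's simplified Lesk implementation; the paper's own WordNet 1.6/SemCor-based disambiguation is more involved.

```python
# Simplified Lesk WSD: pick the synset whose gloss overlaps most with
# the context, before the term is used as a classification feature.
from nltk.wsd import lesk

sentence = "the bank approved the loan yesterday".split()
sense = lesk(sentence, "bank", pos="n")
print(sense, "->", sense.definition())
```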

19.
Deep-learning-based cross-lingual sentiment analysis models rely on a pre-trained bilingual word embedding (BWE) dictionary to obtain vector representations of source- and target-language texts. To address the difficulty of obtaining a BWE dictionary, this paper proposes a cross-lingual text sentiment analysis method based on sentiment-feature word-vector representations, introducing sentiment supervision from the source language to obtain sentiment-aware word vectors for the source language, so that the word vectors...

20.
This paper presents a three-step approach to enhance interoperability between heterogeneous semantic resources. Firstly, we construct homogeneous representations of these resources in a pivot format, namely OWL DL, with respect to the semantics expressed by the original representation languages. Secondly, mappings are established between concepts of these standardised resources and stored in a so-called articulation ontology. Thirdly, an approach for ranking those mappings is suggested in order to best fit users' needs. This approach is currently being implemented in the Semantic Resources “Interoperabilisation” and Linking System (SRILS). The mapping results as well as the work to be done are discussed.
