首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.

Text summarization presents several challenges such as considering semantic relationships among words, dealing with redundancy and information diversity issues. Seeking to overcome these problems, we propose in this paper a new graph-based Arabic summarization system that combines statistical and semantic analysis. The proposed approach utilizes ontology hierarchical structure and relations to provide a more accurate similarity measurement between terms in order to improve the quality of the summary. The proposed method is based on a two-dimensional graph model that makes uses statistical and semantic similarities. The statistical similarity is based on the content overlap between two sentences, while the semantic similarity is computed using the semantic information extracted from a lexical database whose use enables our system to apply reasoning by measuring semantic distance between real human concepts. The weighted ranking algorithm PageRank is performed on the graph to produce significant score for all document sentences. The score of each sentence is performed by adding other statistical features. In addition, we address redundancy and information diversity issues by using an adapted version of Maximal Marginal Relevance method. Experimental results on EASC and our own datasets showed the effectiveness of our proposed approach over existing summarization systems.

  相似文献   

2.
刘柏嵩 《计算机工程》2008,34(8):229-231
提出一种通用的多策略本体学习框架,通过对Web上各专业领域文档集进行挖掘来实现本体自动构建。讨论本体学习中本体概念的抽取、概念之间语义关系的抽取和分类体系的自动构建等关键技术,通过实验对算法进行测试和评价。由于集成了多种机器学习算法,该方法在概念抽取和语义关系学习方面具有更高的准确性,采用通用本体WordNet和HowNet作为语料库,可适用于不同的专业领域。通过按需获取Web文档,该方法能实时生成本体。  相似文献   

3.
This study develops an ontology building process for extracting conceptual tags and hierarchies in textual corpus. Though humans have been creating ontologies for many years, efficient ontology building processes in textual corpus are extremely ad hoc. Several issues have identified including how to recognize terminology in textual document, name concept tags in terminologies, and derive conceptual hierarchies among concepts. The proposed approach is extraction technique combinations to produce ontology prototype for editors. The empirical feedback indicates that elicitation synergy is productive during the early stages of building. Additionally, this elicitation synergy is especially useful for ontology editors who lack reference models of a working domain and who encounter textual corpus as major knowledge sources.  相似文献   

4.
Key concept extraction is a major step for ontology learning that aims to build an ontology by identifying relevant domain concepts and their semantic relationships from a text corpus. The success of ontology development using key concept extraction strongly relies on the degree of relevance of the key concepts identified. If the identified key concepts are not closely relevant to the domain, the constructed ontology will not be able to correctly and fully represent the domain knowledge. In this paper, we propose a novel method, named CFinder, for key concept extraction. Given a text corpus in the target domain, CFinder first extracts noun phrases using their linguistic patterns based on Part-Of-Speech (POS) tags as candidates for key concepts. To calculate the weights (or importance) of these candidates within the domain, CFinder combines their statistical knowledge and domain-specific knowledge indicating their relative importance within the domain. The calculated weights are further enhanced by considering an inner structural pattern of the candidates. The effectiveness of CFinder is evaluated with a recently developed ontology for the domain of ‘emergency management for mass gatherings’ against the state-of-the-art methods for key concept extraction including—Text2Onto, KP-Miner and Moki. The comparative evaluation results show that CFinder statistically significantly outperforms all the three methods in terms of F-measure and average precision.  相似文献   

5.
Many natural language processing areas use semantic roles in order to improve the applications of the extracted information, the question answering and the machine translation, etc. In Arabic, the work of constructing the semantic role labeling system or the annotated corpus is extremely limited compared to their speaker’s number and to English language as well. In this paper, we present a supervised method for the semantic role labeling of Arabic sentences. Hence, we use the feedback capacity of the case-based reasoning to annotate new sentences from already annotated ones besides the use of the Arabic PropBank as a reference to the semantic labels. We test our method under a wide range corpus that contains 2332 attributes and 5291 arguments. Accordingly, an Arabic semantic role labeling system is tested, for the first time, in that corpus. As a result, our method shows the ability to annotate new sentences from the labeled sentences or the construction of the annotated corpus.  相似文献   

6.
In this paper, we proposed a novel approach based on topic ontology for tag recommendation. The proposed approach intelligently generates tag suggestions to blogs. In this approach, we construct topic ontology through enriching the set of categories in existing small ontology called as Open Directory Project. To construct topic ontology, a set of topics and their associated semantic relationships is identified automatically from the corpus‐based external knowledge resources such as Wikipedia and WordNet. The construction relies on two folds such as concept acquisition and semantic relation extraction. In the first fold, a topic‐mapping algorithm is developed to acquire the concepts from the semantic of Wikipedia. A semantic similarity‐clustering algorithm is used to compute the semantic similarity measure to group the set of similar concepts. The second is the semantic relation extraction algorithm, which derives associated semantic relations between the set of extracted topics from the lexical patterns between synsets in WordNet. A suitable software prototype is created to implement the topic ontology construction process. A Jena API framework is used to organize the set of extracted semantic concepts and their corresponding relationship in the form of knowledgeable representation of Web ontology language. Thus, Protégé tool provides the platform to visualize the automatically constructed topic ontology successfully. Using the constructed topic ontology, we can generate and suggest the most suitable tags for the new resource to users. The applicability of topic ontology with a spreading activation algorithm supports efficient recommendation in practice that can recommend the most popular tags for a specific resource. The spreading activation algorithm can assign the interest scores to the existing extracted blog content and tags. The weight of the tags is computed based on the activation score determined from the similarity between the topics in constructed topic ontology and content of the existing blogs. High‐quality tags that has the highest activation score is recommended to the users. Finally, we conducted experimental evaluation of our tag recommendation approach using a large set of real‐world data sets. Our experimental results explore and compare the capabilities of our proposed topic ontology with the spreading activation tag recommendation approach with respect to the existing AutoTag mechanism. And also discuss about the improvement in precision and recall of recommended tags on the data sets of Delicious and BibSonomy. The experiment shows that tag recommendation using topic ontology results in the folksonomy enrichment. Thus, we report the results of an experiment mean to improve the performance of the tag recommendation approach and its quality.  相似文献   

7.
特定领域本体自动构造方法   总被引:7,自引:0,他引:7       下载免费PDF全文
何婷婷  张小鹏 《计算机工程》2007,33(22):235-237
提出了一种自动构造特定领域本体的方法,该方法应用术语抽取和多重聚类技术。在术语抽取阶段,通过术语在专业语料与背景语料中出现概率的对比,采用LLR公式对术语进行评分,取得了更好的抽取效果。在层级关系发现过程中,采用上下文共现信息结合HowNet中词语的语义相似度,进行术语间相似度度量,力求获得术语间最合理的相关状况。同时改进了k-medoids聚类算法,更准确地发现术语的层级关系,进而构造出特定领域的本体。  相似文献   

8.
Information sources such as relational databases, spreadsheets, XML, JSON, and Web APIs contain a tremendous amount of structured data that can be leveraged to build and augment knowledge graphs. However, they rarely provide a semantic model to describe their contents. Semantic models of data sources represent the implicit meaning of the data by specifying the concepts and the relationships within the data. Such models are the key ingredients to automatically publish the data into knowledge graphs. Manually modeling the semantics of data sources requires significant effort and expertise, and although desirable, building these models automatically is a challenging problem. Most of the related work focuses on semantic annotation of the data fields (source attributes). However, constructing a semantic model that explicitly describes the relationships between the attributes in addition to their semantic types is critical.We present a novel approach that exploits the knowledge from a domain ontology and the semantic models of previously modeled sources to automatically learn a rich semantic model for a new source. This model represents the semantics of the new source in terms of the concepts and relationships defined by the domain ontology. Given some sample data from the new source, we leverage the knowledge in the domain ontology and the known semantic models to construct a weighted graph that represents the space of plausible semantic models for the new source. Then, we compute the top k candidate semantic models and suggest to the user a ranked list of the semantic models for the new source. The approach takes into account user corrections to learn more accurate semantic models on future data sources. Our evaluation shows that our method generates expressive semantic models for data sources and services with minimal user input. These precise models make it possible to automatically integrate the data across sources and provide rich support for source discovery and service composition. They also make it possible to automatically publish semantic data into knowledge graphs.  相似文献   

9.
10.
11.
Seed URLs selection for focused Web crawler intends to guide related and valuable information that meets a user's personal information requirement and provide more effective information retrieval. In this paper, we propose a seed URLs selection approach based on user-interest ontology. In order to enrich semantic query, we first intend to apply Formal Concept Analysis to construct user-interest concept lattice with user log profile. By using concept lattice merger, we construct the user-interest ontology which can describe the implicit concepts and relationships between them more appropriately for semantic representation and query match. On the other hand, we make full use of the user-interest ontology for extracting the user interest topic area and expanding user queries to receive the most related pages as seed URLs, which is an entrance of the focused crawler. In particular, we focus on how to refine the user topic area using the bipartite directed graph. The experiment proves that the user-interest ontology can be achieved effectively by merging concept lattices and that our proposed approach can select high quality seed URLs collection and improve the average precision of focused Web crawler.  相似文献   

12.
Ontologies have been intensively applied for improving multimedia search and retrieval by providing explicit meaning to visual content. Several multimedia ontologies have been recently proposed as knowledge models suitable for narrowing the well known semantic gap and for enabling the semantic interpretation of images. Since these ontologies have been created in different application contexts, establishing links between them, a task known as ontology matching, promises to fully unlock their potential in support of multimedia search and retrieval. This paper proposes and compares empirically two extensional ontology matching techniques applied to an important semantic image retrieval issue: automatically associating common-sense knowledge to multimedia concepts. First, we extend a previously introduced textual concept matching approach to use both textual and visual representation of images. In addition, a novel matching technique based on a multi-modal graph is proposed. We argue that the textual and visual modalities have to be seen as complementary rather than as exclusive sources of extensional information in order to improve the efficiency of the application of an ontology matching approach in the multimedia domain. An experimental evaluation is included in the paper.  相似文献   

13.
In this paper we present an enhanced multi-modality ontology-based approach for web image retrieval step by step. Several ontology-based approaches have been made in the field of multimedia retrieval. Our multi-modality approach is one of the earliest attempts to integrate information from different modalities and apply the model in a complex domain. In order to develop the model, we need to answer the following questions: (1) how to find the proper structure and construct an ontology which can integrate information from different modalities; (2) how to quantify the matching degree (concept similarity) and provide an independent ranking mechanism; (3) how to ensure the scalability of this approach when applied to large domains. The first question has been answered by our multi-modality ontology which has been discussed in Wang et al. (Does ontology help in image retrieval? In: Asia-Pacific workshop on visual information processing, 2006) and its extension (Wang et al., Does ontology help in image retrieval?—a comparison between keyword, text ontology and multi-modality ontology approaches, ACM Press, New York, NY, USA, pp 109–112, 2006). More details about this work is given later. The main focus of this paper is that we propose a new ranking mechanism using Spearman’s ranking correlation to measure the similarity of concepts in the ontology. We take the priorities of information from different modalities into consideration. This algorithm gives the answer of the second question. The semantic matchmaking result is quantized and the degree of similarity between concepts is calculated. For the third question, importing of ontology will resolve the scalability issue but computing concept similarity and identify relationships when integrating different ontologies will be beyond the scope of this paper. To convince readers that our multi-modality ontology and concept similarity ranking is the right step forward, we decided to work on the animal kingdom. We believe this domain is challenging as demonstrated by images depict animals in a wide range of aspects, pose, configurations and appearances. We experimented with a data sets of 4,000 web images. Based on ground truth, we analyze the image content and text information, build up the enhanced multi-modality ontology and compare the retrieval results. Results show that we can even classify close animal species which share similar appearances and we can infer their hidden relationships from the canine family graph. By assigning a ranking to the semantic relationships we show unequivocal evidence that our improved model achieves good accuracy and performs comparable result with the Google re-ranking result in our previous work.  相似文献   

14.
In this paper, we developed an automatic extraction model of synonyms, which is used to construct our Quranic Arabic WordNet (QAWN) that depends on traditional Arabic dictionaries. In this work, we rely on three resources. First, the Boundary Annotated Quran Corpus that contains Quran words, Part-of-Speech, root and other related information. Second, the lexicon resources that was used to collect a set of derived words for Quranic words. Third, traditional Arabic dictionaries, which were used to extract the meaning of words with distinction of different senses. The objective of this work is to link the Quranic words of similar meanings in order to generate synonym sets (synsets). To accomplish that, we used term frequency and inverse document frequency in vector space model, and we then computed cosine similarities between Quranic words based on textual definitions that are extracted from traditional Arabic dictionaries. Words of highest similarity were grouped together to form a synset. Our QAWN consists of 6918 synsets that were constructed from about 8400 unique word senses, on average of 5 senses for each word. Based on our experimental evaluation, the average recall of the baseline system was 7.01 %, whereas the average recall of the QAWN was 34.13 % which improved the recall of semantic search for Quran concepts by 27 %.  相似文献   

15.
In this paper, we introduce an approach to task-driven ontology design which is based on information discovery from database schemas. Techniques for semi-automatically discovering terms and relationships used in the information space, denoting concepts, their properties and links are proposed, which are applied in two stages. At the first stage, the focus is on the discovery of heterogeneity/ambiguity of data representations in different schemas. For this purpose, schema elements are compared according to defined comparison features and similarity coefficients are evaluated. This stage produces a set of candidates for unification into ontology concepts. At the second stage, decisions are made on which candidates to unify into concepts and on how to relate concepts by semantic links. Ontology concepts and links can be accessed according to different perspectives, so that the ontology can serve different purposes, such as, providing a search space for powerful mechanisms for concept location, setting a basis for query formulation and processing, and establishing a reference for recognizing terminological relationships between elements in different schemas.  相似文献   

16.
A video retrieval system user hopes to find relevant information when the proposed queries are ambiguous. The retrieval process based on detecting concepts remains ineffective in such a situation. Potential relationships between concepts have been shown as a valuable knowledge resource that can enhance the retrieval effectiveness, even for ambiguous queries. Recent researches in multimedia retrieval have focused on ontology modeling as a common framework to manage knowledge. Handling these ontologies has to cope with issues related to generic knowledge management and processing scalability. Considering these issues, we suggest a context-based fuzzy ontology framework for video content analysis and indexing. In this paper, we focused on the way in which we modeled our fuzzy ontology: First, we populate automatically the generated ontology by gathering various available video annotation datasets. Then, the ontology content was used to infer enhanced video semantic interpretation. Finally, considering user feedback, the content of the ontology was improved. Experimental results showed that our approach achieves the goal of scalability while at the same time allowing better video content semantic interpretation.  相似文献   

17.
提出了一种词汇和本体概念间的语义相似度计算方法。该方法利用编辑距离和维基百科从语法和语义两方面综合考虑词汇和概念间的语义相似度。在领域本体的指导下,将方法应用于语义标注过程,建立词汇与本体概念之间的映射。在标注过程中建立知识库,提高算法性能,实验结果说明该方法是行之有效的。  相似文献   

18.
本体作为领域知识的表示方法,已经成为语义Web的基础。本体通常由领域专家建立,用于表示领域中概念以及概念与概念之间的关系。但这也使得普通用户难以理解本体中描述的信息。普通用户往往希望本体中的信息能够以自然语言的形式描述。这正是本文讨论的主要问题。本文采用分治策略,利用基于嵌套复杂模板的解决方案,设计并实现了本体知识文摘的算法。我们开发了一个原型系统SWARMS,并将该文摘算法进行了运用。初步的实验表明,本文提出的方法取得较好的结果。  相似文献   

19.
In this paper we show how to resolve the ambiguity of concepts that are extracted from visual stream with the help of identified concepts from associated textual stream. The disambiguation is performed at the concept-level based on semantic closeness over the domain ontology. The semantic closeness is a function of the distance between the concept to be disambiguated and selected associated concepts in the ontology. In this process, the image concepts will be disambiguated with any associated concept from the image and/or the text. The ability of the text concepts to resolve the ambiguity in the image concepts is varied. The best talent to resolve the ambiguity of an image concept occurs when the same concept(s) is stated clearly in both image and text, while, the worst case occurs when the image concept is an isolated concept that has no semantically close text concept. WordNet and the image labels with selected senses are used to construct the domain ontology used in the disambiguation process. The improved accuracy, as shown in the results, proves the ability of the proposed disambiguation process.  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号