Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
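This paper proposes an ontology-based method for representing web sessions, called semantic sessions, together with a method for clustering and visualizing sessions. For clustering, a similarity measure between semantic sessions, the semantic common path similarity measure (SMSCP), is defined from the common paths that users follow when browsing a website, and its effectiveness is evaluated with an improved k-medoids clustering algorithm. For visualization, a layered cloud table (层云表) is used to display the clustering results. Experiments show that the proposed clustering and visualization methods are more effective and easier to understand.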

2.
A multiagent approach to coreference resolution is described for extracting information from natural-language texts for ontology population. Special class agents corresponding to ontology classes are defined. They analyze the information available in the corresponding ontology instances. The results of this analysis are used to assign values to instance attributes, to detect duplicate and equivalent instances, to fix coreferential relations, and to determine the weights of the information connections used to resolve ambiguities. Coreferences are resolved using a multifactor similarity measure between extracted objects that combines semantic, contextual, positional, and grammatical similarity measures. The class agents work within the multiagent approach to text analysis aimed at ontology population.

3.
This paper proves a theorem on metrics that take into account each expert's ordering of elementary propositions in models (the assignment of real numbers (0, 1) to the variables of a formula) and the degrees to which the models are scattered over the variables. This approach is proposed for the first time. Examples demonstrating the novelty of the metrics are presented, and a method is proposed for constructing new metrics from previously obtained and/or already available metrics.

4.
This paper considers the problem of externally validating the semantic coherence of topic models. The Fowlkes-Mallows index, a known clustering validation metric, is generalized to overlapping partitions and multi-labeled collections, making it suitable for validating topic modeling algorithms. In addition, we propose new probabilistic metrics inspired by the concepts of recall and precision. The proposed metrics have clear probabilistic interpretations and can also be applied to validate and compare other soft and overlapping clustering algorithms. The approach is exemplified by using the Reuters-21578 multi-labeled collection to validate LDA models, and Monte Carlo simulations are then used to show convergence to the correct results. Additional statistical evidence is provided to clarify the relationship between the metrics presented.
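For orientation, the classical Fowlkes-Mallows index for two hard (non-overlapping) partitions can be sketched as below. This is only the standard pair-counting definition, not the generalized version for overlapping partitions that the abstract describes; the function name and the sample labels are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt


def fowlkes_mallows(labels_true, labels_pred):
    """Classical Fowlkes-Mallows index for two hard partitions.

    FM = TP / sqrt((TP + FP) * (TP + FN)), counted over all item pairs.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1      # pair grouped together in both partitions
        elif same_pred:
            fp += 1      # grouped together only in the prediction
        elif same_true:
            fn += 1      # grouped together only in the reference
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))


# Identical partitions (up to label renaming) score 1.0.
score = fowlkes_mallows([0, 0, 1, 1], [1, 1, 0, 0])
```

Counting pairs directly is quadratic in the number of items, which is acceptable for a small illustration; a contingency-table formulation would be the usual choice for large collections.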

5.
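An Ontology Mapping Framework for Semantic Web Services* (total citations: 2; self-citations: 0; citations by others: 2)
Ontology heterogeneity hinders the interoperability of Semantic Web services. To address ontology heterogeneity in Semantic Web services, and considering that most current ontology mapping systems are inefficient and produce insufficiently accurate mappings, this paper proposes an ontology mapping method and system framework suited to Semantic Web services. The method uses machine learning to raise the degree of automation of ontology mapping and uses comprehensive evaluation techniques to correct the mapping results, improving mapping accuracy. Experiments on the OAEI 2007 benchmark test dataset (benchmarks) show that the system's performance largely meets expectations and that it can effectively resolve ontology heterogeneity in Semantic Web services.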

6.
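赵亮, 刘建辉, 王星. Computer Science (《计算机科学》), 2016, 43(6): 280-282, 307
Similarity analysis of categorical variables is an important step in data mining tasks. Existing similarity algorithms for categorical variables ignore differences between variables, are strongly affected by imbalanced distributions, and cannot be applied to mixed datasets. To overcome these shortcomings, a categorical-variable similarity algorithm based on the Hellinger distance is proposed. The algorithm accumulates the distributional differences of the different attribute variables within the subsets corresponding to the categorical variable and uses the sum as the similarity, and it supports mixed datasets. When the proposed algorithm is plugged into a clustering algorithm and applied to public UCI datasets, the results show considerable improvements in accuracy, effectiveness, and stability.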
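A minimal sketch of the underlying idea, assuming discrete empirical distributions: the Hellinger distance between two distributions, and one possible reading of "accumulating distributional differences" in which two values of a categorical attribute are compared by summing the Hellinger distances between the conditional distributions of every other attribute. The helper names and the record layout (a list of dicts) are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def hellinger(p, q):
    """Hellinger distance between two discrete distributions given as dicts."""
    keys = set(p) | set(q)
    s = sum((sqrt(p.get(k, 0.0)) - sqrt(q.get(k, 0.0))) ** 2 for k in keys)
    return sqrt(s / 2.0)


def conditional_distribution(rows, attr, value, other):
    """Empirical distribution of `other` among rows where row[attr] == value."""
    counts = Counter(r[other] for r in rows if r[attr] == value)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()} if total else {}


def value_dissimilarity(rows, attr, v1, v2):
    """Sum of Hellinger distances over all other attributes (illustrative)."""
    others = [a for a in rows[0] if a != attr]
    return sum(
        hellinger(conditional_distribution(rows, attr, v1, o),
                  conditional_distribution(rows, attr, v2, o))
        for o in others
    )


# Hypothetical records: "red" and "blue" never share a shape, so the
# conditional shape distributions are disjoint.
rows = [
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "round"},
    {"color": "blue", "shape": "square"},
]
d = value_dissimilarity(rows, "color", "red", "blue")
```

The Hellinger distance is bounded in [0, 1], so each attribute contributes at most 1 to the sum regardless of how many distinct values it has.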

7.
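王志华, 金燕, 李占波. Computer Engineering (《计算机工程》), 2011, 37(11): 83-85, 88
Content-based Semantic Web retrieval considers only the content itself and ignores differences between users, so it cannot accurately reflect user needs. To address this, an adaptive Semantic Web retrieval framework is proposed. For Chinese Web documents, an ontology learning method is given with the help of the HowNet knowledge base; a user information base is built by extracting objective, explicit, and implicit user information; and algorithms are designed to construct the user's initial query ontology and personalized query ontology, thereby achieving adaptive retrieval for users. Experimental results show that the method achieves high retrieval efficiency.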

8.
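The Internet holds massive amounts of data, and finding useful information within it has become a crucial problem. The Semantic Web offers hope of solving this problem. However, there is a large gap between the current state of the Web and the Semantic Web: the massive volume of unstructured page content is hard to convert directly into semantic knowledge. This paper proposes a semantic annotation method based on document content. It uses the semantic environment expressed by an ontology, namely the frequency with which ontology-related vocabulary and its semantic context appear in a document, to annotate the document semantically. Experiments show that the method achieves good results but is strongly affected by two factors: the quality of the ontology knowledge and the quality of the annotated documents.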

9.
The variable selection procedures currently available in model-based clustering assume that the irrelevant clustering variables are either all independent of or all linked with the relevant clustering variables. A more versatile variable selection model is proposed that allows three possible roles for each variable: relevant clustering variables, irrelevant clustering variables that depend on some of the relevant clustering variables, and irrelevant clustering variables that are totally independent of all the relevant variables. A model selection criterion and a variable selection algorithm are derived for this new variable role modeling. The identifiability of the model and the consistency of the variable selection criterion are also established. Numerical experiments highlight the value of this new modeling.

10.
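To address the difficulty of acquiring prior knowledge when fusing heterogeneous data in the mining domain, the poor real-time performance of Internet of Things (IoT) ontology libraries, and the low efficiency of manually annotating instance object data, an automatic semantic annotation method for the mine semantic IoT is proposed. A framework for the semantic processing of sensor data is given. On the one hand, the professional domain and scope of the ontology are determined, and a domain ontology is built by reusing the Stream Annotation Ontology (SAO) as the basis for driving semantic annotation. On the other hand, machine learning methods are used for feature extraction and data analysis on the sensed data streams, mining relationships between concepts from massive data. Knowledge obtained through data mining drives the updating and refinement of the ontology, enabling dynamic ontology updates and extension and more precise semantic annotation, and improving machine understanding. The semantic annotation process from ontology to instantiation is illustrated using main-shaft faults of a mine hoisting system as an example. Combining domain expert knowledge with ontology reuse, the "seven-step method" is used to build a main-drive fault ontology for the mine hoisting system. To make the attribute descriptions of instance data more accurate, principal component analysis (PCA) and K-means clustering are used to reduce the dimensionality of the dataset and group it, extracting relationships between data attributes and concepts. The Semantic Web Rule Language (SWRL) is used to annotate relationships between specific antecedent conditions and consequent concepts, optimizing the domain ontology. Experimental results show that, during ontology instantiation, machine learning techniques can automatically extract concepts from sensor data and achieve automatic semantic annotation of sensor data.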

11.
A few clustering techniques for categorical data exist to group objects having similar characteristics. Some can handle uncertainty in the clustering process, while others have stability issues. However, the performance of these techniques is limited by low accuracy and high computational complexity. This paper proposes a new technique, called maximum dependency attributes (MDA), for selecting a clustering attribute. The proposed approach is based on rough set theory and takes into account the dependency of attributes in the database. We analyze and compare the performance of the MDA technique with the bi-clustering, total roughness (TR), and min–min roughness (MMR) techniques on four test cases. The results establish the better performance of the proposed approach.
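The attribute dependency that rough set theory provides can be sketched as follows. This is a minimal illustration of the standard dependency degree (the size of the positive region divided by the size of the universe), assuming records stored as dicts; it does not reproduce the MDA selection procedure itself, and the function names are hypothetical.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())


def dependency_degree(rows, cond_attrs, dec_attrs):
    """Rough-set dependency of dec_attrs on cond_attrs: |POS| / |U|."""
    dec_classes = partition(rows, dec_attrs)
    positive = 0
    for block in partition(rows, cond_attrs):
        # A block belongs to the positive region when it lies entirely
        # inside one equivalence class of the dependent attributes.
        if any(block <= d for d in dec_classes):
            positive += len(block)
    return positive / len(rows)


# Hypothetical four-object information system with attributes "a" and "b".
rows = [
    {"a": 0, "b": 0},
    {"a": 0, "b": 0},
    {"a": 1, "b": 0},
    {"a": 1, "b": 1},
]
b_on_a = dependency_degree(rows, ["a"], ["b"])  # how well "a" determines "b"
a_on_b = dependency_degree(rows, ["b"], ["a"])  # how well "b" determines "a"
```

A degree of 1 means the dependent attributes are fully determined by the conditioning attributes; a degree of 0 means no object can be classified with certainty.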

12.
A variety of clustering algorithms exist to group objects having similar characteristics, but many of them are difficult to implement for categorical data. Some algorithms cannot handle categorical data at all, and others cannot handle the uncertainty inherent in categorical data, which is a prerequisite for clustering such data. An algorithm termed minimum-minimum roughness (MMR) was proposed, which uses rough set theory to deal with these problems in clustering categorical data. Later, many algorithms were developed to improve the handling of hybrid data. This research proposes information-theoretic dependency roughness (ITDR), another technique for categorical data clustering that takes into account the information-theoretic dependency degree of attributes in categorical-valued information systems. It is second to none among its predecessors: MMR, MMeR, SDR, and standard-deviation of standard-deviation roughness (SSDR). Experimental results on two benchmark UCI datasets show that the ITDR technique is better than the baseline categorical data clustering technique with respect to computational complexity and cluster purity.

13.
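Research on ontology complexity still lacks a systematic and comprehensive method. Starting from the structural characteristics of ontology concept models, this paper draws an analogy between them and the structure of complex networks, and studies ontology structure using the methods and property parameters of complex network research. The GO ontology, which is widely used in biology, is taken as a sample, and its average path length, degree distribution, clustering coefficient, and other parameters are computed and analyzed. The results show that it has the scale-free property but not the small-world property.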
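The three network parameters named in this abstract can be computed on a small graph as sketched below. The sketch assumes an undirected graph stored as a dict mapping each node to a set of neighbours; treating an ontology as undirected is a simplification made here, and the toy graph is invented for illustration rather than taken from the GO ontology.

```python
from collections import Counter, deque
from itertools import combinations


def average_path_length(adj):
    """Mean shortest-path length over all connected ordered node pairs."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:                      # breadth-first search from source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def clustering_coefficient(adj):
    """Average local clustering coefficient of an undirected graph."""
    coeffs = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs.append(2.0 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)


def degree_distribution(adj):
    """Map each degree to the fraction of nodes having that degree."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {deg: c / len(adj) for deg, c in counts.items()}


# Toy graph: a triangle a-b-c with a pendant node d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
apl = average_path_length(graph)
cc = clustering_coefficient(graph)
dd = degree_distribution(graph)
```

A small-world network combines a short average path length with a high clustering coefficient, while a scale-free network shows a power-law degree distribution; checking either property in practice requires comparing these values against suitable random-graph baselines.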

14.
Content-based image retrieval (CBIR) systems can provide more precise results by taking user feedback into account. Two relevance feedback learning paradigms are short-term learning (STL) and long-term learning (LTL). This paper proposes a collaborative CBIR system that uses both STL and LTL. The proposed system introduces three fusion methods (fusion of retrieved images, fusion of ranks, and fusion of similarities) to make STL and LTL cooperate. The fusion methods are examined in a CBIR system equipped with a proposed statistical semantic clustering (SSC) method for LTL. The SSC method is based on the concept of semantic categories of images: it applies clustering techniques and constructs a relevancy matrix between images and semantic categories. The results of the SSC method with the suggested fusion methods are compared with two state-of-the-art LTL methods, namely the virtual feature based method and dynamic semantic clustering. Comparative results confirm the efficiency of the proposed method. Furthermore, experimental results demonstrate that, for a single LTL method, different fusion methods lead to different results.

15.
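Ontology-Based Semantic Retrieval of Legal Information (total citations: 3; self-citations: 0; citations by others: 3)
The massive volume of legal information on the Web and its ambiguity make accurate and efficient query and retrieval difficult, which in turn constrains judicial adjudication and decision-making. To better solve the problems in judicial information retrieval, and drawing on research into domain ontology methods and Semantic Web technologies in China and abroad, a case-oriented prototype for semantic retrieval of legal information is built using the concept of ontology, providing a reference for knowledge management and information retrieval in the legal domain.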

16.
Ontology reuse is recommended as a key factor in developing cost-effective, high-quality ontologies because it can reduce development costs by avoiding the rebuilding of existing ontologies. Selecting the desired ontology from existing ontologies is essential for ontology reuse. Until now, much research on ontology selection has focused on lexical-level support. However, in these cases, it is almost impossible to find an ontology that includes all the concepts matched by the search terms at the semantic level. Finding an ontology that meets users’ needs requires a new ontology selection and ranking mechanism based on semantic similarity matching. We propose an ontology selection and ranking model consisting of selection standards and metrics based on better semantic matching capabilities. The proposed model has two novel features that distinguish it from previous research models. First, it makes ontology selection and ranking more practical and effective by enabling semantic matching of taxonomic or relational links between concepts. Second, it identifies which measures should be used to rank ontologies in a given context and what weight should be assigned to each selection measure.

17.
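Arbitrary-Shape Clustering That Handles Mixed Attributes (total citations: 1; self-citations: 1; citations by others: 0)
Clustering is a very active research branch of data mining, and clustering of arbitrary shapes remains an open problem. This paper proposes a between-cluster dissimilarity measure that incorporates the value frequencies of categorical attributes, together with a definition of similarity between an object and a cluster. On this basis, it proposes a clustering algorithm that can handle arbitrary shapes and mixed-attribute datasets. The algorithm is tested on synthetic and real datasets and compared with related algorithms; the experimental results show that it is effective and feasible.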

18.
This article proposes an ontology-based topological representation of remote-sensing images. Semantics, especially those related to the topological relationships between the objects represented, are not explicit in remote-sensing images, and this limits spatial analysis. Our aim is to provide an explicit ontological definition of the topological relations between objects in the image, using the Quadtree data structure for spatial indexing. This structure is explicitly defined in an ontology, which allows the representations obtained to be interpreted automatically, taking into account the topological relations and increasing the spatial analytical capabilities. The representation has been validated by a case study of semantic retrieval based on the normalized difference vegetation index (NDVI), taking into account the topological relations between NDVI regions in images. In the experiments, we compare the effectiveness of the results of eight queries using four traditional supervised image classification algorithms and the proposed representation. The experimental results show the feasibility of the proposal and support the idea that the image retrieval process can provide a semantic complement to remote-sensing images. The proposed representation contributes to the incorporation of semantics into geographical data, especially remote-sensing images, and it can be used to develop applications in the Geospatial Semantic Web.

19.
Data mining algorithms such as data classification or clustering methods exploit features of entities to characterise, group or classify them according to their resemblance. In the past, many feature extraction methods focused on the analysis of numerical or categorical properties. In recent years, motivated by the success of the Information Society and the WWW, which has made available enormous amounts of textual electronic resources, researchers have proposed semantic data classification and clustering methods that exploit textual data at a conceptual level. To do so, these methods rely on pre-annotated inputs in which text has been mapped to its formal semantics according to one or several knowledge structures (e.g. ontologies, taxonomies). Hence, they are hampered by the bottleneck introduced by the manual semantic mapping process. To tackle this problem, this paper presents a domain-independent, automatic and unsupervised method to detect relevant features from heterogeneous textual resources, associating them with concepts modelled in a background ontology. The method has been applied to raw text resources and also to semi-structured ones (Wikipedia articles). It has been tested in the Tourism domain, showing promising results.

20.

Text document clustering separates a collection of documents into several clusters such that the documents within a cluster are substantially similar and distinct from the documents in other clusters. The high-dimensional, sparse document-term matrix reduces the efficiency of the clustering process. This study proposes a new way of clustering documents using a domain ontology and the WordNet ontology. The main objective is to increase the quality of the cluster output. The work investigates the method of selecting feature dimensions to reduce the features of the document-term matrix. Sports documents are clustered using conventional K-Means with the dimension reduction (DR) feature selection process and density-based clustering. A novel approach, named ontology-based document clustering, is proposed for grouping text documents. The technique was developed in three critical steps. The first step starts with data pre-processing, and the features are reduced by the DR method using Info-Gain. The documents are clustered using two clustering methods: K-Means and density-based clustering with the DR feature selection process. These methods are used to validate the findings of ontology-based clustering, and the study compares them using measurement metrics. The second step examines the development of the sports domain ontology and describes the principles and relationships of the terms using sports-related documents. The semantic web rational process is used to test the ontology for validation purposes. An algorithm for retrieving synonyms of the sports domain ontology terms has been proposed and implemented. The terms retrieved from the documents and the sports ontology concepts are mapped to the synonym sets retrieved from the WordNet ontology. The suggested technique is based on synonyms of the mapped concepts.
The proposed ontology approach employs the reduced feature set to cluster the text documents. The results are compared with two traditional approaches on two datasets. The proposed ontology-based clustering approach is found to be effective, clustering the documents with high precision, recall, and accuracy. In addition, this study compares different RDF serialization formats for the sports ontology.


