Found 20 similar documents (search time: 15 ms)
1.
Edgar Meij Marc Bron Laura Hollink Bouke Huurnink Maarten de Rijke 《Journal of Web Semantics》2011,9(4):418-433
We introduce the task of mapping search engine queries to DBpedia, a major linking hub in the Linking Open Data cloud. We propose and compare various methods for addressing this task, using a mixture of information retrieval and machine learning techniques. Specifically, we present a supervised machine learning-based method to determine which concepts are intended by a user issuing a query. The concepts are obtained from an ontology and may be used to provide contextual information, related concepts, or navigational suggestions to the user submitting the query. Our approach first ranks candidate concepts using a language modeling for information retrieval framework. We then extract query, concept, and search-history feature vectors for these concepts. Using manual annotations we inform a machine learning algorithm that learns how to select concepts from the candidates given an input query. Simply performing a lexical match between the queries and concepts is found to perform poorly, as does using retrieval alone, i.e., omitting the concept selection stage. Our proposed method significantly improves upon these baselines and we find that support vector machines achieve the best performance out of the machine learning algorithms evaluated.
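A minimal sketch of the two-stage pipeline this abstract describes: candidate concepts are first ranked with a Dirichlet-smoothed query-likelihood language model, and an SVM then decides which candidates the user intended. The candidate texts, collection statistics, and the two features (retrieval score, lexical overlap) are illustrative assumptions, not the paper's actual feature set:

```python
import math
from collections import Counter
from sklearn.svm import SVC

def query_likelihood(query_terms, concept_text, collection_freq, mu=2000):
    """Dirichlet-smoothed log P(query | concept pseudo-document)."""
    tf = Counter(concept_text.split())
    doc_len = sum(tf.values())
    score = 0.0
    for t in query_terms:
        p_coll = collection_freq.get(t, 1e-9)          # background term probability
        score += math.log((tf[t] + mu * p_coll) / (doc_len + mu))
    return score

# Stage 1: rank toy candidate concepts for an ambiguous query.
collection_freq = {"jaguar": 0.001, "car": 0.01, "cat": 0.005}
candidates = {
    "dbpedia:Jaguar_Cars": "jaguar car british manufacturer",
    "dbpedia:Jaguar": "jaguar cat big feline americas",
}
query = ["jaguar", "car"]
ranked = sorted(candidates,
                key=lambda c: -query_likelihood(query, candidates[c], collection_freq))

# Stage 2: an SVM selects intended concepts from (retrieval score, lexical
# overlap) feature vectors; training data here is invented for illustration.
X_train = [[-3.1, 1.0], [-7.4, 0.0], [-2.8, 1.0], [-9.0, 0.0]]
y_train = [1, 0, 1, 0]  # 1 = concept was intended by the user
clf = SVC(kernel="rbf").fit(X_train, y_train)
print(ranked, clf.predict([[-3.0, 1.0]]))
```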
2.
Locating sentiment orientation quickly, accurately, and comprehensively in massive amounts of Internet text is a major challenge currently facing the big-data field. Text sentiment classification methods fall roughly into two categories: those based on semantic understanding and those based on supervised machine learning. The advantage of semantic understanding for sentiment classification is that it can classify texts from any domain, but it is easily affected by the varied sentence patterns and collocations of Chinese, so its accuracy is limited. Supervised machine learning can achieve fairly high sentiment classification accuracy, but a classifier that performs well in one domain does not adapt to sentiment classification in a new domain. Building on information-gain-based feature reduction for high-dimensional text, this paper combines optimized semantic understanding with machine learning and designs a new hybrid algorithmic framework for Chinese sentiment classification. Multiple comparative experiments based on this framework verify that it delivers high and stable classification accuracy across different domains.
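The information-gain feature-reduction step can be approximated with off-the-shelf components. In this sketch, mutual information stands in for information gain, the corpus is a toy stand-in, and Naive Bayes plays the role of the supervised learner:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy pre-segmented Chinese corpus; 1 = positive, 0 = negative.
docs = ["服务 很好 满意", "质量 差 失望", "物流 快 好评", "太 差 退货"]
labels = [1, 0, 1, 0]

pipe = make_pipeline(
    CountVectorizer(token_pattern=r"(?u)\b\w+\b"),  # high-dimensional bag of words
    SelectKBest(mutual_info_classif, k=3),          # keep the 3 most informative terms
    MultinomialNB(),                                # the supervised learner
)
pipe.fit(docs, labels)
print(pipe.predict(["质量 很好"]))
```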
3.
Semantic labelling refers to the problem of assigning known labels to the elements of structured information from a source such as an HTML table or an RDF dump with unknown semantics. In recent years it has become progressively more relevant due to the growth of available structured information in the Web of Data that needs to be labelled in order to integrate it into data systems. The existing approaches for semantic labelling have several drawbacks that make them unappealing, if not impossible, to use in certain scenarios: not accepting nested structures as input, being unable to label structural elements, not being customisable, requiring groups of instances when labelling, requiring matching instances to named entities in a knowledge base, not detecting numeric data, or not supporting complex features. In this article, we propose TAPON-MT, a framework for machine learning semantic labelling. Our framework does not have the former limitations, which makes it domain-independent and customisable. We have implemented it with a graphical interface that eases the creation and analysis of models, and we offer a web service API for their application. We have also validated it with a subset of the National Science Foundation awards dataset, and our conclusion is that TAPON-MT creates models to label information that are effective and efficient in practice.
4.
We explore contextual and dispositional correlates of the motivation to contribute to open source initiatives. We examine how the context of the open source project, and the personal values of contributors, are related to the types of motivations for contributing. A web-based survey was administered to 300 contributors in two prominent open source contexts: software and content. As hypothesized, software contributors placed a greater emphasis on reputation-gaining and self-development motivations, compared with content contributors, who placed a greater emphasis on altruistic motives. Furthermore, the hypothesized relationships were found between contributors’ personal values and their motivations for contributing.
5.
The nature of scientific and technological data collection is evolving rapidly: data volumes and rates grow exponentially, with increasing complexity and information content, and there has been a transition from static data sets to data streams that must be analyzed in real time. Interesting or anomalous phenomena must be quickly characterized and followed up with additional measurements via optimal deployment of limited assets. Modern astronomy presents a variety of such phenomena in the form of transient events in digital synoptic sky surveys, including cosmic explosions (supernovae, gamma ray bursts), relativistic phenomena (black hole formation, jets), potentially hazardous asteroids, etc. We have been developing a set of machine learning tools to detect, classify and plan a response to transient events for astronomy applications, using the Catalina Real-time Transient Survey (CRTS) as a scientific and methodological testbed. The ability to respond rapidly to the potentially most interesting events is a key bottleneck that limits the scientific returns from the current and anticipated synoptic sky surveys. Similar challenges arise in other contexts, from environmental monitoring using sensor networks to autonomous spacecraft systems. Given the exponential growth of data rates and the time-critical response, we need a fully automated and robust approach. We describe the results obtained to date, and possible future developments.
6.
Learning to match ontologies on the Semantic Web  Total citations: 19 (self-citations: 0, cited by others: 19)
On the Semantic Web, data will inevitably come from many different ontologies, and information processing across ontologies is not possible without knowing the semantic mappings between them. Manually finding such mappings is tedious, error-prone, and clearly not possible on the Web scale. Hence the development of tools to assist in the ontology mapping process is crucial to the success of the Semantic Web. We describe GLUE, a system that employs machine learning techniques to find such mappings. Given two ontologies, for each concept in one ontology GLUE finds the most similar concept in the other ontology. We give well-founded probabilistic definitions to several practical similarity measures and show that GLUE can work with all of them. Another key feature of GLUE is that it uses multiple learning strategies, each of which exploits well a different type of information either in the data instances or in the taxonomic structure of the ontologies. To further improve matching accuracy, we extend GLUE to incorporate commonsense knowledge and domain constraints into the matching process. Our approach is thus distinguished in that it works with a variety of well-defined similarity notions and that it efficiently incorporates multiple types of knowledge. We describe a set of experiments on several real-world domains and show that GLUE proposes highly accurate semantic mappings. Finally, we extend GLUE to find complex mappings between ontologies and describe experiments that show the promise of the approach. Received: 16 December 2002, Accepted: 16 April 2003, Published online: 17 September 2003. Edited by V. Atluri, A. Joshi, and Y. Yesha
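As a concrete illustration of one probabilistic similarity notion GLUE can work with, the sketch below estimates a Jaccard-style coefficient P(A,B) / P(A or B) from instance memberships. The instance sets are synthetic; in GLUE itself the memberships come from its learned base classifiers:

```python
def jaccard_similarity(ext_a, ext_b):
    """Estimate P(A,B) / P(A or B) from the extensions of two concepts."""
    union = ext_a | ext_b
    return len(ext_a & ext_b) / len(union) if union else 0.0

instances = set(range(100))                                  # shared instance pool
courses  = {i for i in instances if i % 2 == 0}              # classified into "Courses"
classes_ = {i for i in instances if i % 2 == 0 and i < 80}   # classified into "Classes"
print(jaccard_similarity(courses, classes_))  # high value suggests a concept match
```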
7.
Knowledge-based problem solving requires a rich and relatively complete concept system, especially when the task is not restricted to a specific domain. Taking the Representational Redescription model from cognitive psychology as its theoretical basis, this paper proposes a method for representing and developing a concept system based on object representations, and studies in detail how concepts are represented and how they develop at four different levels. The work overcomes the limitations of previous research on this problem in artificial intelligence and cognitive psychology, and helps improve the reasoning and problem-solving capabilities of knowledge-based systems.
8.
9.
In recent years, designing useful learning diagnosis systems has become a hot research topic in the literature. In order to help teachers easily analyze students’ profiles in intelligent tutoring systems, it is essential that students’ portfolios can be transformed into useful information that reflects the extent of students’ participation in the curriculum activity. It is observed that students’ portfolios seldom reflect students’ actual studying behaviors in the learning diagnosis systems given in the literature; we thus propose three kinds of learning parameter improvement mechanisms in this research to establish effective parameters that are frequently used in learning platforms. The proposed learning parameter improvement mechanisms calculate a student’s effective online learning time, extract the portion of a message in the discussion section that is strongly related to the learning topics, and detect plagiarism in students’ homework, respectively. The derived numeric parameters are then fed into a Support Vector Machine (SVM) classifier to predict each learner’s performance, in order to verify whether they mirror the student’s studying behaviors. The experimental results show that the prediction rate of the SVM classifier increases by up to 35.7% on average after the inputs to the classifier are “purified” by the learning parameter improvement mechanisms. This result reveals that the proposed algorithms indeed produce effective learning parameters for commonly used e-learning platforms.
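A hedged sketch of the final prediction stage: the three derived parameters named above (effective online learning time, on-topic discussion ratio, homework originality) feed an SVM that predicts pass/fail. All numeric values and the target labels are invented for illustration:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Columns: [effective online hours, on-topic discussion ratio, originality score]
X = [[12.5, 0.8, 0.95],
     [ 3.0, 0.2, 0.40],
     [ 9.0, 0.6, 0.90],
     [ 1.5, 0.1, 0.30]]
y = [1, 0, 1, 0]  # 1 = learner passed

model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))  # scale, then classify
model.fit(X, y)
print(model.predict([[10.0, 0.7, 0.85]]))
```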
10.
朱诗能 韩萌 杨书蓉 代震龙 杨文艳 丁剑 《计算机工程与应用》2025,61(2):59-72
In real-world scenarios, learning from data streams faces the problem of class imbalance: with too little training data, learning algorithms cannot effectively identify minority-class samples. To survey the state of the art and the open challenges in ensemble classification of imbalanced data streams, this paper draws on the recent literature in the field, analyzes and summarizes ensemble methods for imbalanced data streams in detail from the perspectives of decision rules (based on weighting, selection, and voting) and learning paradigms (based on cost-sensitive learning, active learning, and incremental learning), and compares the performance of algorithms evaluated on the same datasets. For imbalance in different types of complex data streams, the corresponding ensemble classification algorithms are summarized along four dimensions (concept drift, multi-class data, noise, and class overlap), and the time complexity of classic algorithms is analyzed. Finally, directions for future ensemble strategies are proposed for the classification challenges posed by imbalance in dynamic data streams, streams with missing information, multi-label data streams, and uncertain data streams.
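As a concrete instance of the weighting-based decision rules this survey covers, the sketch below implements weighted majority voting, where each member's weight is assumed to be its recent recall on the minority class:

```python
def weighted_vote(predictions, weights):
    """Combine per-classifier labels with a weighted majority vote."""
    score = {}
    for label, w in zip(predictions, weights):
        score[label] = score.get(label, 0.0) + w
    return max(score, key=score.get)

members = [0, 1, 1]        # labels predicted by three base classifiers (1 = minority)
weights = [0.4, 0.9, 0.7]  # assumed: minority-class recall over a sliding window
print(weighted_vote(members, weights))  # -> 1
```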
11.
Access to legal information and, in particular, to legal literature is examined for the creation of a search and retrieval system for Italian legal literature. The design and implementation of services such as integrated access to a wide range of resources are described, with a particular focus on the importance of exploiting metadata assigned to disparate legal material. The integration of structured repositories and Web documents is the main purpose of the system: it is constructed on the basis of a federation system with service-provider functions, aiming at creating a centralized index of legal resources. The index is based on a uniform metadata view, created for structured data by means of the OAI approach and for Web documents by a machine learning approach, which, in this paper, has been assessed as regards document classification. Semantic searching is a major requirement for legal literature users, and a solution based on the exploitation of Dublin Core metadata, as well as the use of legal ontologies and related terms prepared for accessing indexed articles, has been implemented.
E. Francesconi
12.
In this paper we address the problem of providing an order of relevance, or ranking, among entities’ properties used in RDF datasets, Linked Data and SPARQL endpoints. We first motivate the importance of ranking RDF properties by providing two killer applications for the problem, namely property tagging and entity visualization. Motivated by the desiderata of these applications, we propose to apply Machine Learning to Rank (MLR) techniques to the problem of ranking RDF properties. Our devised solution is based on a deep empirical study of all the dimensions involved: feature selection, the MLR algorithm, and model training. The major advantages of our approach are the following: (a) flexibility/personalization, as the properties’ relevance can be user-specified by personalizing the training set in a supervised approach, or set by a novel automatic classification approach based on SWiPE; (b) speed, since it can be applied without computing frequencies over the whole dataset, leveraging existing fast MLR algorithms; (c) effectiveness, as it can be applied even when no ontology data is available by using novel dataset-independent features; (d) precision, which is high both in terms of F-measure and Spearman’s rho. Experimental results show that the proposed MLR framework outperforms the two existing approaches in the literature that address RDF property ranking.
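One way to realize the MLR step is the pointwise learning-to-rank sketch below: a regressor learns relevance grades from per-property features, and the properties of a new entity are sorted by predicted grade. The feature set, grades, and property names are hypothetical, not the paper's:

```python
from sklearn.ensemble import GradientBoostingRegressor

# Assumed features per property: [defined in ontology?, label length, template position]
X_train = [[1, 4, 1], [0, 12, 5], [1, 6, 2], [0, 20, 9]]
y_train = [3.0, 1.0, 2.5, 0.5]  # manually assigned relevance grades

ranker = GradientBoostingRegressor().fit(X_train, y_train)

# Rank the properties of a new entity by predicted relevance.
props = {"dbo:birthDate": [1, 9, 2], "dbo:wikiPageID": [0, 10, 8]}
ranking = sorted(props, key=lambda p: -ranker.predict([props[p]])[0])
print(ranking)
```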
13.
In this system paper, we describe the DL-Learner framework, which supports supervised machine learning using OWL and RDF for background knowledge representation. It can be beneficial in various data and schema analysis tasks, with applications in standard machine learning scenarios, e.g. in the life sciences, as well as Semantic Web specific applications such as ontology learning and enrichment. Since its creation in 2007, it has become the main OWL and RDF-based software framework for supervised structured machine learning; it includes several algorithm implementations and usage examples, and has applications built on top of it. The article gives an overview of the framework with a focus on algorithms and use cases.
14.
With the growing adoption of Building Information Modeling (BIM), specialized applications have been developed to perform domain-specific analyses. These applications need tailored information with respect to a BIM model element’s attributes and relationships. In particular, architectural elements need further qualification concerning their geometric and functional ‘subtypes’ to support exact simulations and compliance checks. BIM and its underlying data schema, the Industry Foundation Classes (IFC), provide a rich representation with which to exchange semantic entity and relationship data. However, subtypes for individual elements are not represented by default and often require manual designation, leaving the process vulnerable to errors and omissions. Existing research to enrich the semantics of IFC model entities employed domain-specific rule sets that scrutinize their legitimacy and modify them, if and when necessary. However, such an approach is limited in its scalability and comprehensibility. This study explored the use of 3D geometric deep neural networks originating from computer vision research. Specifically, Multi-view CNN (MVCNN) and PointNet were investigated to determine their applicability in extracting unique features of door (IfcDoor) and wall (IfcWall) element subtypes, which in turn can be leveraged to automate subtype classification. Test results indicated MVCNN had the best prediction performance, while PointNet’s accuracy was hampered by resolution loss due to selective use of point cloud data. The research confirmed deep neural networks as a viable solution for distinguishing BIM element subtypes, the critical factor being their ability to detect subtle differences in local geometries.
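A minimal PyTorch sketch of the MVCNN idea: a shared 2D encoder processes each rendered view of an element, features are max-pooled across views ("view pooling"), and a linear head predicts the subtype. The network size, view count, and four-subtype output are illustrative assumptions, not the study's architecture:

```python
import torch
import torch.nn as nn

class TinyMVCNN(nn.Module):
    def __init__(self, n_subtypes=4):
        super().__init__()
        self.encoder = nn.Sequential(          # shared across all views
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),   # -> (B*V, 8*4*4)
        )
        self.head = nn.Linear(8 * 4 * 4, n_subtypes)

    def forward(self, views):                  # views: (B, V, 1, H, W)
        b, v = views.shape[:2]
        feats = self.encoder(views.flatten(0, 1)).view(b, v, -1)
        pooled, _ = feats.max(dim=1)           # view pooling: elementwise max
        return self.head(pooled)

logits = TinyMVCNN()(torch.randn(2, 12, 1, 32, 32))  # 2 elements, 12 views each
print(logits.shape)  # torch.Size([2, 4])
```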
15.
Learning to integrate web taxonomies  Total citations: 1 (self-citations: 0, cited by others: 1)
We investigate machine learning methods for automatically integrating objects from different taxonomies into a master taxonomy. This problem is not only currently pervasive on the Web, but is also important to the emerging Semantic Web. A straightforward approach to automating this process would be to build classifiers through machine learning and then use these classifiers to classify objects from the source taxonomies into categories of the master taxonomy. However, conventional machine learning algorithms totally ignore the availability of the source taxonomies. In fact, source and master taxonomies often have common categories under different names or other more complex semantic overlaps. We introduce two techniques that exploit the semantic overlap between the source and master taxonomies to build better classifiers for the master taxonomy. The first technique, Cluster Shrinkage, biases the learning algorithm against splitting source categories by making objects in the same category appear more similar to each other. The second technique, Co-Bootstrapping, tries to facilitate the exploitation of inter-taxonomy relationships by providing category indicator functions as additional features for the objects. Our experiments with real-world Web data show that these proposed add-on techniques can enhance various machine learning algorithms to achieve substantial improvements in performance for taxonomy integration.
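Cluster Shrinkage can be sketched as pulling each object's feature vector toward the centroid of its source-taxonomy category, so same-category objects look more alike to the downstream learner. The shrinkage factor lam and the toy vectors below are assumptions for illustration:

```python
import numpy as np

def cluster_shrinkage(X, categories, lam=0.5):
    """Shrink each row of X toward the centroid of its source category."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    for c in set(categories):
        idx = [i for i, ci in enumerate(categories) if ci == c]
        centroid = X[idx].mean(axis=0)
        out[idx] = lam * centroid + (1 - lam) * X[idx]   # convex combination
    return out

X = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
print(cluster_shrinkage(X, ["sports", "sports", "arts"]))
```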
16.
John P. Eakins 《Pattern recognition》2002,35(1):3-14
Research into techniques for the retrieval of images by semantic content is still in its infancy. This paper reviews recent trends in the field, distinguishing four separate lines of activity: automatic scene analysis, model-based and statistical approaches to object classification, and adaptive learning from user feedback. It compares the strengths and weaknesses of model-based and adaptive techniques, and argues that further advances in the field are likely to involve the increasing use of techniques from the field of artificial intelligence.
17.
Truth discovery, as a means of integrating conflicting information provided by different data sources, has been widely studied in the traditional database field. However, many existing truth discovery methods do not suit data-stream applications, mainly because they all involve iterative processes. This paper studies continuous truth discovery over a particular kind of data stream: sensor data streams. Exploiting the characteristics of sensor data and its applications, it proposes a strategy that evaluates source reliability at a variable frequency, reducing the number of iterative executions and improving the efficiency of truth discovery over multi-source sensor streams at each moment. The paper first defines and studies the conditions that changes in source reliability between adjacent moments must satisfy for the relative and cumulative errors of truth discovery over sensor streams to remain small, and then gives a probabilistic model to predict the probability that source reliability satisfies these conditions. Integrating these results, it maximizes the reliability-evaluation period to improve efficiency, under the constraint that the predicted cumulative error stays below a given threshold with a certain probability, and casts this as an optimization problem. On this basis, it proposes CTF-Stream (Continuous Truth Finding over Sensor Data Streams), an algorithm that evaluates source reliability at a variable frequency: CTF-Stream combines historical data to determine dynamically when to evaluate source reliability, improving efficiency while guaranteeing that the truth discovery results reach a user-specified accuracy. Finally, experiments on real sensor data further verify the algorithm's efficiency and accuracy for truth discovery over sensor data streams.
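For context, the sketch below shows the baseline iterative truth-discovery loop that CTF-Stream avoids re-running at every timestamp: the truth estimate is a reliability-weighted average of the sources' readings, and reliabilities are re-fit from each source's error. The inverse-error update and the readings are simplifying assumptions; CTF-Stream's variable-frequency skipping policy is omitted here:

```python
def truth_discovery(readings, n_iter=10):
    """Iteratively co-estimate the truth and per-source reliabilities."""
    sources = list(readings)
    w = {s: 1.0 for s in sources}                # initial reliabilities
    for _ in range(n_iter):
        truth = sum(w[s] * readings[s] for s in sources) / sum(w.values())
        for s in sources:                        # re-fit reliability from error
            w[s] = 1.0 / (abs(readings[s] - truth) + 1e-6)
    return truth, w

readings = {"sensor_a": 20.1, "sensor_b": 19.9, "sensor_c": 25.0}  # one timestamp
truth, weights = truth_discovery(readings)
print(round(truth, 2))  # converges toward the agreeing sources
```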
18.
19.
In ensemble classification, how to update base classifiers dynamically and how to assign them appropriate weights have long been research focuses. Addressing these two points, this paper proposes the BIE and BIWE algorithms. BIE uses the accuracy of the most recently trained base classifier to determine whether the ensemble should replace poorly performing base classifiers and how many to replace, achieving dynamic, iterative updates of the ensemble. Building on this, BIWE introduces a weighting function that obtains the best base-classifier weights for data streams with different parameter characteristics, thereby improving the overall performance of the ensemble. Experimental results show that, with accuracy equal to or slightly higher than that of the compared algorithms, BIE reduces the number of leaves, the number of nodes, and the depth of the generated trees; BIWE not only achieves higher accuracy than the compared algorithms but also greatly reduces the size of the generated trees.
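A minimal sketch of a BIE-style update rule under assumed semantics: the classifier trained on the newest block replaces weaker members whenever its accuracy beats theirs, and BIWE's weighting function is simplified here to "weight = recent accuracy":

```python
def update_ensemble(ensemble, new_clf, new_acc, max_size=5):
    """ensemble: list of (classifier, accuracy_on_recent_block) pairs."""
    ensemble = sorted(ensemble, key=lambda m: m[1])            # worst member first
    while ensemble and len(ensemble) >= max_size and ensemble[0][1] < new_acc:
        ensemble.pop(0)                                        # drop a weaker member
    if len(ensemble) < max_size:
        ensemble.append((new_clf, new_acc))                    # admit the new one
    weights = [acc for _, acc in ensemble]                     # simplified BIWE weights
    return ensemble, weights

ens = [("c1", 0.62), ("c2", 0.71), ("c3", 0.55), ("c4", 0.80), ("c5", 0.58)]
ens, w = update_ensemble(ens, "c_new", 0.75)
print([m[0] for m in ens], w)  # "c3" replaced by "c_new"
```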
20.
Most data-stream classification algorithms solve two problems: the unbounded length of data streams and concept drift. However, these algorithms require human experts to label every instance for the training set, which is unrealistic in an environment where streams arrive at high speed and must be classified quickly, since labelling instances takes time and money. In that setting, training a classifier by supervised learning yields a weak classifier because labelled data are scarce. This paper proposes a data-stream classification algorithm based on active learning: it selects only a small fraction of all instances for manual labelling, namely the samples classified with low confidence, which greatly reduces the number of instances that must be labelled by hand. Experimental results show that, even under concept drift, the algorithm trains a classifier on the stream with few labelled instances and classifies well.
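The low-confidence selection strategy can be sketched as uncertainty sampling over a simulated stream: only instances whose maximum predicted class probability falls below a threshold are sent to the oracle for labelling. The threshold, model, and stand-in oracle below are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier(loss="log_loss")           # incremental probabilistic learner

# Seed the model on a small labelled batch.
X0, y0 = rng.normal(size=(20, 2)), rng.integers(0, 2, 20)
clf.partial_fit(X0, y0, classes=[0, 1])

queried = 0
for x in rng.normal(size=(100, 2)):            # simulated unlabelled stream
    proba = clf.predict_proba([x])[0]
    if proba.max() < 0.6:                      # low confidence -> request a label
        y = int(x.sum() > 0)                   # stand-in for the human oracle
        clf.partial_fit([x], [y])              # learn from the new label
        queried += 1
print(f"labels requested: {queried} / 100")
```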