Similar Documents
20 similar documents retrieved.
1.
This paper discusses knowledge discovery in relational databases. Taking rough set (RS) theory as the mathematical tool, it combines attribute-oriented induction with learning from examples and applies the combination to attribute generalization and reduction for knowledge discovery in relational databases. An algorithm for attribute generalization is introduced. The meaning of attribute reduction is explained from an information-theoretic viewpoint: information gain is proposed as a significance measure for attributes, and an attribute-reduction algorithm based on this measure is given.
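
As a rough illustration of the information-gain significance measure the abstract proposes, here is a minimal Python sketch; the toy table, attribute names, and the ranking step are our own illustrative assumptions, not the paper's algorithm:

    import math
    from collections import Counter

    def entropy(labels):
        # Shannon entropy of a sequence of class labels.
        n = len(labels)
        return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(rows, attr, class_attr):
        # Entropy reduction obtained by partitioning the rows on one attribute.
        base = entropy([r[class_attr] for r in rows])
        remainder = 0.0
        for value in {r[attr] for r in rows}:
            subset = [r[class_attr] for r in rows if r[attr] == value]
            remainder += len(subset) / len(rows) * entropy(subset)
        return base - remainder

    # Rank attributes by gain; an attribute-reduction algorithm would keep
    # high-gain attributes while the retained set still determines the class.
    table = [{"outlook": "sunny", "windy": False, "play": "no"},
             {"outlook": "rainy", "windy": True, "play": "no"},
             {"outlook": "sunny", "windy": True, "play": "yes"}]
    for a in ("outlook", "windy"):
        print(a, information_gain(table, a, "play"))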

2.
Graph analysis by data visualization involves achieving a series of topology-based tasks. When the graph data belong to a domain that contains multiple node and link types, as in the case of semantic graphs, topology-based tasks become more challenging. To reduce visual complexity in semantic graphs, we propose an approach based on applying relational operations such as selecting and joining nodes of different types. We use node aggregation to reflect the relational operations in the graph, and introduce glyphs for representing aggregated nodes. Using glyphs lets us encode the connectivity information of multiple nodes with a single glyph, and the glyph's visual parameters can encode node attributes or type-specific information. Rather than performing the operations in the data abstraction layer and presenting the user with the resulting visualization, we propose an interactive approach where the user can iteratively apply the relational operations directly on the visualization. We demonstrate the efficiency of our method through the results of a usability study that includes a case study on a subset of the International Movie Database. The results of the controlled experiment in our usability study indicate a statistically significant contribution in reducing the completion time of the evaluation tasks.
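
A minimal sketch of the kind of type-based node aggregation described above, assuming a simple dict-and-tuple graph representation; all names are hypothetical, and the paper's glyph encoding itself is not modeled:

    def aggregate_by_type(nodes, edges, node_type):
        # Collapse every node of one type into a single aggregate (glyph) node,
        # rewiring edges so the glyph keeps the members' connectivity.
        member_ids = {n["id"] for n in nodes if n["type"] == node_type}
        glyph_id = "agg:" + node_type
        new_nodes = [n for n in nodes if n["id"] not in member_ids]
        new_nodes.append({"id": glyph_id, "type": node_type, "members": member_ids})
        new_edges = set()
        for u, v in edges:
            u = glyph_id if u in member_ids else u
            v = glyph_id if v in member_ids else v
            if u != v:                      # drop edges internal to the aggregate
                new_edges.add((u, v))
        return new_nodes, sorted(new_edges)

    nodes = [{"id": "p1", "type": "person"}, {"id": "p2", "type": "person"},
             {"id": "m1", "type": "movie"}]
    edges = [("p1", "m1"), ("p2", "m1"), ("p1", "p2")]
    print(aggregate_by_type(nodes, edges, "person"))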

3.
We present a method to learn maximal generalized decision rules from databases by integrating discretization, generalization and rough set feature selection. Our method reduces the data both horizontally and vertically. In the first phase, discretization and generalization are integrated and the numeric attributes are discretized into a few intervals. The primitive values of symbolic attributes are replaced by high-level concepts, and some obviously superfluous or irrelevant symbolic attributes are eliminated. Horizontal reduction is accomplished by merging identical tuples after the substitution of an attribute value by its higher-level value in a pre-defined concept hierarchy for symbolic attributes, or after the discretization of continuous (numeric) attributes. This phase greatly decreases the number of tuples in the database. In the second phase, a novel context-sensitive feature merit measure is used to rank the features, and a subset of relevant attributes is chosen based on rough set theory and the merit values of the features. A reduced table is obtained by removing the attributes that are not in the relevant attribute subset, further reducing the data set vertically without destroying the interdependence relationships between classes and attributes. Rough set-based value reduction is then performed on the reduced table and all redundant condition values are dropped. Finally, the tuples in the reduced table are transformed into a set of maximal generalized decision rules. Experimental results on UCI data sets and a real market database demonstrate that our method can dramatically reduce the feature space and improve learning accuracy.
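
The horizontal-reduction phase can be sketched in a few lines: discretize or generalize each tuple, then merge duplicates while keeping counts so no class-frequency information is lost. This is our own minimal reading of the phase, with illustrative cut points and data:

    from collections import Counter

    def discretize(value, cuts):
        # Map a numeric value to the index of its interval, given sorted cut points.
        for i, cut in enumerate(cuts):
            if value < cut:
                return i
        return len(cuts)

    def horizontal_reduction(rows):
        # Merge tuples that became identical after generalization/discretization.
        return Counter(tuple(r) for r in rows)

    raw = [(23, "clerk", "low"), (25, "clerk", "low"), (47, "manager", "high")]
    generalized = [(discretize(age, [30, 60]), job, label) for age, job, label in raw]
    print(horizontal_reduction(generalized))
    # Counter({(0, 'clerk', 'low'): 2, (1, 'manager', 'high'): 1})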

4.
Error Reduction through Learning Multiple Descriptions

5.
With the rapid growth of information on the Internet and in electronic government, automatic multi-document summarization has become an important task. Multi-document summarization is an optimization problem requiring simultaneous optimization of more than one objective function. In this study, when building summaries from multiple documents, we attempt to balance two objectives: content coverage and redundancy. Our goal is to investigate three fundamental aspects of the problem, i.e., designing an optimization model, solving the optimization problem, and mapping the solution to the best summary. We model multi-document summarization as a Quadratic Boolean Programming (QBP) problem where the objective function is a weighted combination of the content coverage and redundancy objectives. The objective function scores candidate summaries based on the identified salient sentences and the overlap between selected sentences. An innovative aspect of our model lies in its ability to remove redundancy while selecting representative sentences. The QBP problem is solved using a binary differential evolution algorithm. The model has been evaluated on the DUC2002, DUC2004 and DUC2006 data sets. We evaluated our model automatically using the ROUGE toolkit and report the significance of our results through 95% confidence intervals. The experimental results show that the optimization-based approach to document summarization is truly a promising research direction.
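
A minimal sketch of a QBP-style objective over a binary sentence-selection vector, with coverage rewarded linearly and redundancy penalized quadratically; the weighting, scores, and random data are illustrative assumptions, and the binary differential evolution search is only indicated:

    import numpy as np

    def qbp_objective(x, relevance, similarity, lam=0.4):
        # x          : 0/1 vector selecting sentences for the summary
        # relevance  : per-sentence content-coverage scores
        # similarity : pairwise sentence-similarity matrix
        # lam        : trade-off weight between coverage and redundancy
        coverage = relevance @ x
        redundancy = x @ similarity @ x      # quadratic penalty on similar pairs
        return coverage - lam * redundancy

    rng = np.random.default_rng(0)
    n = 6
    rel = rng.random(n)
    sim = rng.random((n, n)); sim = (sim + sim.T) / 2; np.fill_diagonal(sim, 0)
    x = rng.integers(0, 2, n)            # one candidate summary
    print(qbp_objective(x, rel, sim))    # a binary differential-evolution loop
                                         # would search over such 0/1 vectors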

6.
A graduated assignment algorithm for graph matching
A graduated assignment algorithm for graph matching is presented which is fast and accurate even in the presence of high noise. By combining graduated nonconvexity, two-way (assignment) constraints, and sparsity, large improvements in accuracy and speed are achieved. Its low-order computational complexity [O(lm), where l and m are the number of links in the two graphs] and robustness in the presence of noise offer advantages over traditional combinatorial approaches. The algorithm, not restricted to any special class of graph, is applied to subgraph isomorphism, weighted graph matching, and attributed relational graph matching. To illustrate the performance of the algorithm, attributed relational graphs derived from objects are matched. Then, results from twenty-five thousand experiments conducted on 100-node random graphs of varying types (graphs with only zero-one links, weighted graphs, and graphs with node attributes and multiple link types) are reported. No comparable results have previously been reported by any other graph matching algorithm in the research literature. Twenty-five hundred control experiments were conducted using a relaxation labeling algorithm, and large improvements in accuracy are demonstrated.
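
A compact sketch of the softassign core of graduated assignment, assuming plain weighted adjacency matrices and a simplified compatibility computation (the paper's formulation also handles node attributes and multiple link types); the annealing schedule values are illustrative:

    import numpy as np

    def sinkhorn(M, iters=50):
        # Alternating row/column normalization pushes M toward a doubly
        # stochastic matrix, enforcing the two-way assignment constraints.
        for _ in range(iters):
            M = M / M.sum(axis=1, keepdims=True)
            M = M / M.sum(axis=0, keepdims=True)
        return M

    def graduated_assignment(A, B, beta=0.5, beta_max=10.0, rate=1.075):
        # A, B: weighted adjacency matrices of the two graphs to match.
        n, m = len(A), len(B)
        M = np.full((n, m), 1.0 / m)            # soft match matrix
        while beta < beta_max:                  # graduated non-convexity schedule
            for _ in range(5):
                Q = A @ M @ B.T                 # gradient of the matching energy
                M = sinkhorn(np.exp(beta * Q))  # softassign at this temperature
            beta *= rate
        return M                                # harden by row-wise argmax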

7.
Information sources such as relational databases, spreadsheets, XML, JSON, and Web APIs contain a tremendous amount of structured data that can be leveraged to build and augment knowledge graphs. However, they rarely provide a semantic model to describe their contents. Semantic models of data sources represent the implicit meaning of the data by specifying the concepts and the relationships within the data. Such models are the key ingredients to automatically publish the data into knowledge graphs. Manually modeling the semantics of data sources requires significant effort and expertise, and although desirable, building these models automatically is a challenging problem. Most of the related work focuses on semantic annotation of the data fields (source attributes). However, constructing a semantic model that explicitly describes the relationships between the attributes in addition to their semantic types is critical. We present a novel approach that exploits the knowledge from a domain ontology and the semantic models of previously modeled sources to automatically learn a rich semantic model for a new source. This model represents the semantics of the new source in terms of the concepts and relationships defined by the domain ontology. Given some sample data from the new source, we leverage the knowledge in the domain ontology and the known semantic models to construct a weighted graph that represents the space of plausible semantic models for the new source. Then, we compute the top k candidate semantic models and suggest to the user a ranked list of the semantic models for the new source. The approach takes into account user corrections to learn more accurate semantic models on future data sources. Our evaluation shows that our method generates expressive semantic models for data sources and services with minimal user input. These precise models make it possible to automatically integrate the data across sources and provide rich support for source discovery and service composition. They also make it possible to automatically publish semantic data into knowledge graphs.

8.
We present a new approach for the visual analysis of state transition graphs. We deal with multivariate graphs where a number of attributes are associated with every node. Our method provides an interactive attribute-based clustering facility. Clustering results in metric, hierarchical and relational data, represented in a single visualization. To visualize hierarchically structured quantitative data, we introduce a novel technique: the bar tree. We combine this with a node-link diagram to visualize the hierarchy and an arc diagram to visualize relational data. Our method enables the user to gain significant insight into large state transition graphs containing tens of thousands of nodes. We illustrate the effectiveness of our approach by applying it to a real-world use case: a graph modeling the behavior of an industrial wafer stepper, containing 55,043 nodes and 289,443 edges.

9.
The discovery of interesting patterns in relational databases is an important data mining task. This paper is concerned with the development of a search algorithm for first-order hypothesis spaces, adopting an important pruning technique (termed subset pruning here) from association rule mining in a first-order setting. The basic search algorithm is extended by so-called requires and excludes constraints, which allow prior knowledge about the data, such as mutual exclusion or generalization relationships among attributes, to be declared and exploited for further structuring and restricting the search space. Furthermore, it is illustrated how taxonomies and numerical attributes are processed in the search algorithm. Several task settings using different interestingness criteria and search modes with corresponding pruning criteria are described. Three settings serve as test beds for evaluation of the proposed approach. The experimental evaluation shows that the impact of subset pruning is significant, since it reduces the number of hypothesis evaluations in many cases by about 50%. The impact of generalization relationships is shown to be less effective in our experimental set-up.
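
Subset pruning itself is easy to sketch: a candidate conjunction is discarded as soon as any sub-conjunction is already known to fail the (anti-monotone) interestingness criterion. The literals and the frozenset encoding below are illustrative assumptions:

    from itertools import combinations

    def subset_pruned(candidate, failed):
        # Apriori-style subset pruning: a conjunction of literals can be
        # discarded if any proper sub-conjunction already failed, since
        # frequency only decreases as literals are added.
        return any(frozenset(sub) in failed
                   for k in range(1, len(candidate))
                   for sub in combinations(sorted(candidate), k))

    failed = {frozenset({"purchase(X,Y)", "luxury(Y)"})}
    candidate = {"purchase(X,Y)", "luxury(Y)", "student(X)"}
    print(subset_pruned(candidate, failed))   # True: a failing subset exists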

10.
Identifier attributes (very high-dimensional categorical attributes such as particular product ids or people's names) are rarely incorporated in statistical modeling. However, they can play an important role in relational modeling: it may be informative to have communicated with a particular set of people or to have purchased a particular set of products. A key limitation of existing relational modeling techniques is how they aggregate bags (multisets) of values from related entities. The aggregations used by existing methods are simple summaries of the distributions of features of related entities, e.g., MEAN, MODE, SUM, or COUNT. This paper's main contribution is the introduction of aggregation operators that capture more information about the value distributions by storing meta-data about those distributions and referencing this meta-data when aggregating, for example by computing class-conditional distributional distances. Such aggregations are particularly important for high-dimensional categorical attributes, for which the simple aggregates provide little information. In the first half of the paper we provide general guidelines for designing aggregation operators, introduce the new aggregators in the context of the relational learning system ACORA (Automated Construction of Relational Attributes), and provide theoretical justification. We also conjecture special properties of identifier attributes, e.g., that they proxy for unobserved attributes and for information deeper in the relationship network. In the second half of the paper we provide extensive empirical evidence that the distribution-based aggregators indeed facilitate modeling with high-dimensional categorical attributes, and in support of the aforementioned conjectures.
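
A minimal sketch of a distribution-based aggregator in the spirit described above: a bag of identifier values is turned into distances from stored class-conditional reference distributions. The L1 distance, vocabulary, and reference values are illustrative assumptions; ACORA's actual operators differ in detail:

    from collections import Counter

    def distribution(bag, vocab):
        # Empirical distribution of a bag of categorical values over a vocabulary.
        counts = Counter(bag)
        total = max(len(bag), 1)
        return [counts[v] / total for v in vocab]

    def aggregate_by_distance(bag, vocab, class_distributions):
        # Replace the bag with its L1 distances to the stored class-conditional
        # reference distributions (the aggregator's meta-data), yielding a few
        # informative numeric features instead of simple summaries.
        p = distribution(bag, vocab)
        return {cls: sum(abs(a - b) for a, b in zip(p, q))
                for cls, q in class_distributions.items()}

    vocab = ["item1", "item2", "item3"]
    refs = {"churner": [0.7, 0.2, 0.1], "loyal": [0.1, 0.3, 0.6]}
    print(aggregate_by_distance(["item1", "item1", "item2"], vocab, refs))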

11.
Traditional representations of association rules cannot show the essential relationships between concepts, lack a concept-level view, and ignore the sharing of knowledge-discovery results. The concept lattice, a data structure that vividly and concisely expresses the generalization and specialization relationships between concepts, has unique advantages for visualizing association rules and discovering latent knowledge. This paper proposes a concept-lattice-based method for visualizing association rules: with concepts as the search unit, it finds the paths of the association rules to be displayed in the concept lattice, lifting the associations between attributes to the concept level, and gives corresponding visualization strategies and algorithms for multi-pattern rules. Association-rule analysis and visualization were carried out on the book-borrowing records of a university library; the experimental results show that the visualization method works well for knowledge discovery and sharing.
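
For readers unfamiliar with concept lattices, the following minimal sketch derives a formal concept (an extent together with its closed intent) from a toy object-attribute context; the borrowing-record-style data and all names are illustrative only:

    def extent(attrs, context):
        # Objects possessing every attribute in attrs (context: object -> attrs).
        return {o for o, a in context.items() if attrs <= a}

    def intent(objs, context):
        # Attributes shared by every object in objs.
        sets = [context[o] for o in objs]
        return set.intersection(*sets) if sets else set()

    def concept_of(attrs, context):
        # The formal concept generated by an attribute set.
        objs = extent(attrs, context)
        return objs, intent(objs, context)

    context = {"b1": {"cs", "ai"}, "b2": {"cs", "db"}, "b3": {"cs", "ai", "db"}}
    print(concept_of({"ai"}, context))   # extent {b1, b3}, intent {cs, ai}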

12.
To address the poor generalization and inter-class overlap of existing cross-scenario face anti-spoofing models, a face anti-spoofing method based on conditional adversarial domain generalization is proposed. First, features from multiple source domains are extracted with a U-Net embedding an attention mechanism and a ResNet-18 encoder; the extracted features are fed into an auxiliary classifier, and the encoder output and the classifier's predictions are fused by multilinear mapping and passed to a domain discriminator for adversarial training, aligning the multiple source domains at both the feature and the class level. Second, to reduce the impact on domain generalization of hard-to-transfer samples with unreliable predictions, an entropy function is used to control sample priority, improving generalization performance. In addition, face depth maps are added to further capture the discriminative features between live faces and spoofs, and an asymmetric triplet loss serves as auxiliary supervision to improve intra-class compactness and inter-class separability. Comparative experiments on public face anti-spoofing datasets verify the effectiveness of the proposed method.
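
Two of the ingredients above, multilinear fusion of features with class predictions and entropy-based sample weighting, can be sketched as follows, in the spirit of entropy-conditioned adversarial training (e.g. CDAN+E); shapes, names, and constants are illustrative assumptions, not the paper's exact design:

    import torch

    def multilinear_fusion(features, probs):
        # Outer product of encoder features and classifier predictions,
        # flattened; this conditions the domain discriminator on the class.
        return torch.bmm(probs.unsqueeze(2), features.unsqueeze(1)).flatten(1)

    def entropy_weight(probs, eps=1e-8):
        # Prioritize samples by prediction certainty: confident (low-entropy)
        # predictions receive larger weights than uncertain ones.
        h = -(probs * (probs + eps).log()).sum(dim=1)
        return 1.0 + torch.exp(-h)

    feats = torch.randn(4, 128)                        # encoder output, batch of 4
    probs = torch.softmax(torch.randn(4, 2), dim=1)    # live / spoof predictions
    print(multilinear_fusion(feats, probs).shape)      # torch.Size([4, 256])
    print(entropy_weight(probs))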

13.
Objective: Because breast tumor lesions are well concealed and metastasize easily, computer-aided diagnosis (CAD) is currently used to find and diagnose tumors as early as possible. However, medical image data are scarce and expensive to annotate, so deep-learning-based X-ray breast tumor detection under full supervision performs poorly and generalizes weakly; moreover, the domain shift caused by noise also degrades tumor detection across environments. To address these challenges, a single-domain generalization X-ray breast tumor detection method is proposed. Method: A single-domain generalization model (SDGM) is proposed for X-ray breast tumor detection, using ResNet-50 (residual network-50) as the backbone feature-extraction network. A domain feature enhancement module (DFEM) is designed to effectively fuse the global information in upsampling and downsampling to suppress noise, and an instance generalization module (IGM) is designed at the detection head to regularize and whiten the class-semantic information of each instance, improving the model's generalization; by learning from a small number of annotated medical images, …

14.
Mining optimized gain rules for numeric attributes
Association rules are useful for determining correlations between attributes of a relation and have applications in the marketing, financial, and retail sectors. Furthermore, optimized association rules are an effective way to focus on the most interesting characteristics involving certain attributes. Optimized association rules are permitted to contain uninstantiated attributes and the problem is to determine instantiations such that either the support, confidence, or gain of the rule is maximized. In this paper, we generalize the optimized gain association rule problem by permitting rules to contain disjunctions over uninstantiated numeric attributes. Our generalized association rules enable us to extract more useful information about seasonal and local patterns involving the uninstantiated attribute. For rules containing a single numeric attribute, we present an algorithm with linear complexity for computing optimized gain rules. Furthermore, we propose a bucketing technique that can result in a significant reduction in input size by coalescing contiguous values without sacrificing optimality. We also present an approximation algorithm based on dynamic programming for two numeric attributes. Using recent results on binary space partitioning trees, we show that the approximations are within a constant factor of the optimal optimized gain rules. Our experimental results with synthetic data sets for a single numeric attribute demonstrate that our algorithm scales up linearly with the attribute's domain size as well as the number of disjunctions. In addition, we show that applying our optimized rule framework to a population survey real-life data set enables us to discover interesting underlying correlations among the attributes.
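
For a single numeric attribute, one common formulation reduces the optimized-gain problem to a maximum-sum run over discretized buckets, which a Kadane-style linear scan solves; the gain definition and data below are illustrative, not lifted from the paper:

    def max_gain_interval(buckets, theta):
        # Each bucket i contributes g(i) = hits(i) - theta * support(i); the
        # optimized-gain interval is the contiguous run of buckets that
        # maximizes the summed contribution (classic maximum-subarray scan).
        best, best_span, run, start = float("-inf"), None, 0.0, 0
        for i, (support, hits) in enumerate(buckets):
            g = hits - theta * support
            if run <= 0:
                run, start = g, i
            else:
                run += g
            if run > best:
                best, best_span = run, (start, i)
        return best, best_span

    # (support, hits) per discretized bucket of the numeric attribute
    buckets = [(10, 2), (8, 7), (12, 11), (9, 1), (7, 6)]
    print(max_gain_interval(buckets, theta=0.5))   # (8.0, (1, 2))

Extending this to k disjoint intervals, as the generalized problem requires, is typically handled with dynamic programming over the same bucket contributions.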

15.
This paper presents a methodology developed for a study to evaluate the state of the art of automated map generalization in commercial software without applying any customization. The objectives of this study are to learn more about generic and specific requirements for automated map generalization, to show possibilities and limitations of commercial generalization software, and to identify areas for further research. The methodology had to consider all types of heterogeneity to guarantee independent testing and evaluation of available generalization solutions. The paper presents the two main steps of the methodology. The first step is the analysis of map requirements for automated generalization, which consisted of sourcing representative test cases, defining map specifications in generalization constraints, harmonizing constraints across the test cases, and analyzing the types of constraints that were defined. The second step of the methodology is the evaluation of generalized outputs. In this step, three evaluation methods were integrated to balance between human and machine evaluation and to expose possible inconsistencies. In the discussion the applied methodology is evaluated and areas for further research are identified.

16.
Kihong Heo  Hakjoo Oh  Kwangkeun Yi 《Software》2017,47(11):1677-1705
We present a practical technique for achieving a scalable and precise global static analysis by selectively applying context-sensitivity and the octagon relational domain. For precise analysis, context-sensitivity and relational analysis are key properties, but it has been hard to combine both of them in practice. Our approach turns on those precision-improving features only when the analysis is likely to gain the precision needed to resolve given queries. The guidance comes from an impact pre-analysis that estimates the impact of a fully context-sensitive and relational octagon analysis. We designed a cost-effective pre-analysis and implemented this method in a realistic octagon analysis for full C. The experimental results show that our approach proves eight times more queries while reducing the time cost by 73.1% compared with a partially relational octagon analysis enabled by a syntactic heuristic.

17.
We expand on a recent paper by Courrieu which introduces three algorithms for determining the distance between any point and the interpolation domain associated with a feedforward neural network; this distance has been shown to have a significant relation to the network's generalization capability. A further neural-like relaxation algorithm is presented here, which is proven to naturally solve the problem originally posed by Courrieu. The algorithm is based on a powerful result developed in the context of Markov chain theory, and turns out to be a special case of a more general relaxation model which has long since become a standard technique in the machine vision domain. Some experiments are presented which confirm the validity of the proposed approach.

18.
郑学伟 《微机发展》2014,(12):64-68
In semantic web research, the construction of domain ontologies is still essentially a manual process, and how to construct ontologies automatically remains an open problem. This work adopts a graph-based construction principle, applying an MCL-clustering automatic ontology-construction algorithm for concept extraction and relation computation. Domain text documents are mapped to document concept graphs; for relation computation, a gSpan-based frequent-subgraph algorithm is used to extract arbitrary relations, yielding a domain ontology described in OWL-DL; closed-loop correction through an evaluation feedback mechanism is the core idea of the study.
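
A minimal sketch of the MCL (Markov Cluster) iteration mentioned above, run on a toy adjacency matrix; the parameter values and stopping rule are illustrative simplifications of the full algorithm:

    import numpy as np

    def mcl(adjacency, inflation=2.0, iters=30):
        # Markov Cluster sketch: alternate expansion (squaring the flow matrix)
        # and inflation (elementwise power, then renormalize columns) until the
        # flow stabilizes; attractors of the limit matrix define the clusters.
        M = adjacency + np.eye(len(adjacency))    # self-loops stabilize MCL
        M = M / M.sum(axis=0, keepdims=True)      # column-stochastic
        for _ in range(iters):
            M = M @ M                             # expansion: flow spreads
            M = M ** inflation                    # inflation: strong flows win
            M = M / M.sum(axis=0, keepdims=True)
        return M

    A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], float)
    print(np.round(mcl(A), 2))   # rows retaining mass mark the cluster attractors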

19.
RRL is a relational reinforcement learning system based on Q-learning in relational state-action spaces. It aims to enable agents to learn how to act in an environment that has no natural representation as a tuple of constants. For relational reinforcement learning, the learning algorithm used to approximate the mapping between state-action pairs and their so-called Q(uality)-value has to be very reliable, and it has to be able to handle the relational representation of state-action pairs. In this paper we investigate the use of Gaussian processes to approximate the Q-values of state-action pairs. In order to employ Gaussian processes in a relational setting we propose graph kernels as a covariance function between state-action pairs. The standard prediction mechanism for Gaussian processes requires a matrix inversion which can become unstable when the kernel matrix has low rank. These instabilities can be avoided by employing QR-factorization. This leads to better and more stable performance of the algorithm and a more efficient incremental update mechanism. Experiments conducted in the blocks world and with the Tetris game show that Gaussian processes with graph kernels can compete with, and often improve on, regression trees and instance-based regression as a generalization algorithm for RRL.
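
The QR-based prediction step can be sketched directly: solve the regularized kernel system by QR factorization rather than explicit inversion. The tiny kernel matrix below stands in for a graph-kernel matrix between state-action pairs; all values are illustrative:

    import numpy as np

    def gp_mean_qr(K, y, k_star, noise=1e-6):
        # GP mean prediction: solve (K + noise*I) alpha = y via a QR
        # factorization instead of an explicit inverse, which is the stable
        # choice when the kernel matrix has low rank.
        A = K + noise * np.eye(len(K))
        Q, R = np.linalg.qr(A)
        alpha = np.linalg.solve(R, Q.T @ y)   # R is triangular: back-substitution
        return k_star @ alpha

    K = np.array([[1.0, 0.9], [0.9, 1.0]])   # kernel between two stored pairs
    y = np.array([0.0, 1.0])                 # their observed Q-values
    k_star = np.array([0.95, 0.95])          # kernel to a new state-action pair
    print(gp_mean_qr(K, y, k_star))          # predicted Q-value, roughly 0.5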

20.
C-Tree-Based Polygon Generalization for Scaleless GIS
Scaleless (scale-independent) GIS is one of the core technologies of the digital Earth and Web GIS, but with the broad application and deepening development of GIS, existing GIS technology can no longer meet the needs of the information society. One important open problem is how a GIS's volume of spatial data can grow and shrink automatically as the map scale changes. Addressing the selection and merging techniques in polygon generalization for scaleless GIS, and building on a full discussion of the quantitative law and qualitative principles of selection as well as the principles of merging, this paper proposes C-Tree, a data-organization strategy for polygon layers, and gives a C-Tree-based polygon generalization algorithm. Given large-scale map polygon-layer data, the algorithm efficiently performs the selection and merging operations of generalization and outputs small-scale map layer data. The algorithm has been successfully applied in an integrated spatio-temporal intelligent urban-construction information system, with satisfactory results.
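
As a sketch of a quantitative selection law of the kind the abstract invokes, here is the classic radical law of Töpfer and Pillewizer; it is a standard cartographic formula offered for illustration, not necessarily the paper's own selection rule:

    import math

    def topfer_count(n_source, source_denominator, target_denominator):
        # Radical law: number of features to retain when generalizing from
        # scale 1:source_denominator to scale 1:target_denominator.
        return round(n_source * math.sqrt(source_denominator / target_denominator))

    print(topfer_count(1000, 10_000, 50_000))   # ~447 polygons survive at 1:50000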
