Similar literature
1.
We consider linguistic data(base) summaries in the sense of Yager [Information Sciences 28 (1982) 69-86], exemplified by “most employees are young and well paid” (with some degree of truth added), for a personnel database, as an intuitive, human-consistent, natural-language-based knowledge discovery tool. We first present an extension of Yager’s classic approach that involves more sophisticated criteria of goodness, search methods, etc. We advocate the use of the concept of a protoform (prototypical form), recently advocated by Zadeh [A prototype-centered approach to adding deduction capabilities to search engines—the concept of a protoform. BISC Seminar, University of California, Berkeley, 2002], as a general form of a linguistic data summary. We present an extension of our interactive approach, based on fuzzy logic and fuzzy database queries, which makes it possible to implement such linguistic data summaries. We show how fuzzy queries are related to linguistic summaries, and show that one can introduce a hierarchy of protoforms, or abstract summaries, in the sense of Zadeh’s latest [A prototype-centered approach to adding deduction capabilities to search engines—the concept of a protoform. BISC Seminar, University of California, Berkeley, 2002] ideas, meant mainly for increasing the deduction capabilities of search engines. For illustration, we show an implementation for a sales database at a computer retailer, employing a type of protoform of a linguistic summary.
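
To make the idea of a summary "with some degree of truth added" concrete, here is a minimal sketch of how the truth of "most employees are young" might be computed as the quantifier's membership of the mean summarizer membership. The membership functions `mu_young` and `mu_most`, their breakpoints, and the sample ages are illustrative assumptions, not taken from the paper.

```python
def mu_young(age):
    """Hypothetical fuzzy set 'young': fully young up to 30, not young from 45."""
    if age <= 30:
        return 1.0
    if age >= 45:
        return 0.0
    return (45 - age) / 15


def mu_most(proportion):
    """Assumed piecewise-linear quantifier 'most' on proportions in [0, 1]."""
    if proportion >= 0.8:
        return 1.0
    if proportion <= 0.3:
        return 0.0
    return 2 * proportion - 0.6


def truth_of_summary(records, mu_summarizer, mu_quantifier):
    """Truth of 'Q records are S': quantifier applied to the mean membership."""
    degrees = [mu_summarizer(r) for r in records]
    return mu_quantifier(sum(degrees) / len(degrees))


ages = [25, 28, 33, 40, 52]  # made-up personnel data
truth = truth_of_summary(ages, mu_young, mu_most)
print(f"T(most employees are young) = {truth:.3f}")
```

A compound summarizer such as "young and well paid" could be handled the same way by taking the minimum of the two memberships per record, which is one common choice of fuzzy conjunction.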

2.
In this work we consider a fuzzy set based approach to the issue of discovery in databases (database mining). The concept of linguistic summaries is described and shown to be a user-friendly way to present information contained in a database. We discuss methods for measuring the amount of information provided by a linguistic summary. The issue of conjecturing, that is, how to decide which summaries may be informative, is discussed. We suggest two approaches to help us focus on relevant summaries. The first method, called the template method, makes use of linguistic concepts related to the domain of the attributes involved in the summaries. The second approach uses the mountain clustering method to help focus our summaries. © 1996 John Wiley & Sons, Inc.

3.
Creating linguistic summaries of data has been a goal of the artificial and computational intelligence communities for many years. Summaries of written text have garnered the most attention. More recently, creating summaries of imagery and other sensed data has become important as a means of compressing large amounts of data and communicating with humans. In this paper, we consider the question of comparing sets of summaries generated from sensed data. In an earlier work, we developed a metric between individual protoform‐based summaries; and here, as a next step, we propose aggregation methods to fuse these individual distances. We provide a case study from eldercare where the goal is to compare different nighttime patterns for change detection. © 2012 Wiley Periodicals, Inc.

4.
Data Mining in Large Databases Using Domain Generalization Graphs   Total citations: 5 (self-citations: 0, citations by others: 5)
Attribute-oriented generalization summarizes the information in a relational database by repeatedly replacing specific attribute values with more general concepts according to user-defined concept hierarchies. We introduce domain generalization graphs for controlling the generalization of a set of attributes and show how they are constructed. We then present serial and parallel versions of the Multi-Attribute Generalization algorithm for traversing the generalization state space described by joining the domain generalization graphs for multiple attributes. Based upon a generate-and-test approach, the algorithm generates all possible summaries consistent with the domain generalization graphs. Our experimental results show that significant speedups are possible by partitioning path combinations from the DGGs across multiple processors. We also rank the interestingness of the resulting summaries using measures based upon variance and relative entropy. Our experimental results also show that these measures provide an effective basis for analyzing summary data generated from relational databases. Variance appears more useful because it tends to rank the less complex summaries (i.e., those with few attributes and/or tuples) as more interesting.
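
The basic step of attribute-oriented generalization described above can be sketched as follows: replace each value of one attribute with its parent concept, then merge tuples that have become identical and count them. The hierarchy `CITY_TO_PROVINCE`, the helper names, and the sample rows are hypothetical; this is a single generalization step, not the Multi-Attribute Generalization algorithm itself.

```python
from collections import Counter

# Hypothetical one-level concept hierarchy for the "city" attribute.
CITY_TO_PROVINCE = {
    "Regina": "Saskatchewan",
    "Saskatoon": "Saskatchewan",
    "Calgary": "Alberta",
}


def generalize(rows, attribute, mapping):
    """Replace each value of `attribute` with its parent concept ('ANY' if unmapped)."""
    return [{**row, attribute: mapping.get(row[attribute], "ANY")} for row in rows]


def summarize(rows):
    """Merge identical tuples and count how many source rows each one covers."""
    return Counter(tuple(sorted(row.items())) for row in rows)


rows = [
    {"city": "Regina", "product": "laptop"},
    {"city": "Saskatoon", "product": "laptop"},
    {"city": "Calgary", "product": "laptop"},
]
summary = summarize(generalize(rows, "city", CITY_TO_PROVINCE))
for generalized_tuple, count in summary.items():
    print(dict(generalized_tuple), count)
```

Repeating `generalize` with the next level of a hierarchy, for each attribute in turn, would enumerate further summaries of the kind the abstract ranks by interestingness.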

5.
In this paper we address extractive summarization of long threads in online discussion fora. We present an elaborate user evaluation study to determine human preferences in forum summarization and to create a reference data set. We showed long threads to ten different raters and asked them to create a summary by selecting the posts that they considered to be the most important for the thread. We study the agreement between human raters on the summarization task, and we show how multiple reference summaries can be combined to develop a successful model for automatic summarization. We found that although the inter-rater agreement for the summarization task was slight to fair, the automatic summarizer obtained reasonable results in terms of precision, recall, and ROUGE. Moreover, when human raters were asked to choose between the summary created by another human and the summary created by our model in a blind side-by-side comparison, they judged the model’s summary equal to or better than the human summary in over half of the cases. This shows that even for a summarization task with low inter-rater agreement, a model can be trained that generates sensible summaries. In addition, we investigated the potential for personalized summarization. However, the results for the three raters involved in this experiment were inconclusive. We release the reference summaries as a publicly available dataset.

6.
Temporal XML: modeling, indexing, and query processing   Total citations: 1 (self-citations: 0, citations by others: 1)
In this paper we address the problem of modeling and implementing temporal data in XML. We propose a data model for tracking historical information in an XML document and for recovering the state of the document as of any given time. We study the temporal constraints imposed by the data model, and present algorithms for validating a temporal XML document against these constraints, along with methods for fixing inconsistent documents. In addition, we discuss different ways of mapping the abstract representation into a temporal XML document, and introduce TXPath, a temporal XML query language that extends XPath 2.0. In the second part of the paper, we present our approach for summarizing and indexing temporal XML documents. In particular we show that by indexing continuous paths, i.e., paths that are valid continuously during a certain interval in a temporal XML graph, we can dramatically increase query performance. To achieve this, we introduce a new class of summaries, denoted TSummary, that adds the time dimension to the well-known path summarization schemes. Within this framework, we present two new summaries: LCP and Interval summaries. The indexing scheme, denoted TempIndex, integrates these summaries with additional data structures. We give a query processing strategy based on TempIndex and a type of ancestor-descendant encoding, denoted temporal interval encoding. We present a persistent implementation of TempIndex, and a comparison against a system based on a non-temporal path index, and one based on DOM. Finally, we sketch a language for updates, and show that the cost of updating the index is compatible with real-world requirements.

7.
The amount of data that is generated during the execution of a business process is growing. As a consequence, it is increasingly hard to extract useful information from the large amount of data that is produced. Linguistic summarization helps to point business analysts in the direction of useful information by verbalizing interesting patterns that exist in the data. In previous work we showed how linguistic summarization can be used to automatically generate diagnostic statements about event logs, such as ‘for most cases that contained the sequence ABC, the throughput time was long’. However, we also showed that our technique produced too many of these statements to be useful in a practical setting. Therefore, this paper presents a novel technique for linguistic summarization of event logs, which generates linguistic summaries that are concise enough to be used in a practical setting, while at the same time enriching the summaries that are produced by also enabling conjunctive statements. The improved technique is based on pruning and clustering of linguistic summaries. We show that it can be used to reduce the number of summary statements by 80–100% compared to previous work. In a survey among 51 practitioners, we found that practitioners consider linguistic summarization useful and easy to use, and would intend to use it if it were commercially available.

8.
Summaries are an essential component of video retrieval and browsing systems. Most research in video summarization has focused on content analysis to obtain compact yet comprehensive representations of video items. However, important aspects such as how they can be effectively integrated in mobile interfaces and how to predict the quality and usability of the summaries have not been investigated. Conventional summaries are limited to a single instance of a certain length (i.e. a single scale). In contrast, scalable summaries target representations with multiple scales, that is, a set of summaries with increasing length in which longer summaries include more information about the video. Thus, scalability provides high flexibility that can be exploited in devices such as smartphones or tablets to provide versions of the summary adapted to the limited visualization area. In this paper, we explore the application of scalable storyboards to summary adaptation and zoomable video navigation in handheld devices. By introducing a new adaptation dimension related to the summarization scale, we can formulate navigation and adaptation in a two-dimensional adaptation space, where different navigation actions modify the trajectory in that space. We also describe the challenges of evaluating scalable summaries and some usability issues that arise from having multiple scales, proposing some objective metrics that can provide useful insight about their potential quality and usability without requiring very costly user studies. Experimental results show a reasonable agreement with the trends shown in subjective evaluations. Experiments also show that content-based scalable storyboards are less redundant and useful than the content-blind baselines.

9.
International Journal on Software Tools for Technology Transfer - We show how to underapproximate the procedure summaries of recursive programs over the integers using off-the-shelf analyzers for...

10.
Bayesian models are increasingly used to analyze complex multivariate outcome data. However, diagnostics for such models have not been well developed. We present a diagnostic method of evaluating the fit of Bayesian models for multivariate data based on posterior predictive model checking (PPMC), a technique in which observed data are compared to replicated data generated from model predictions. Most previous work on PPMC has focused on the use of test quantities that are scalar summaries of the data and parameters. However, scalar summaries are unlikely to capture the rich features of multivariate data. We introduce the use of dissimilarity measures for checking Bayesian models for multivariate outcome data. This method has the advantage of checking the fit of the model to the complete data vectors or vector summaries with reduced dimension, providing a comprehensive picture of model fit. An application with longitudinal binary data illustrates the methods.

11.
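The ECU is the core of an automotive engine control system, so research on intelligent ECUs with independent intellectual property rights is essential. Taking domestic conditions into account and building on an analysis of existing systems, we adopt advanced international technology and an integrated design approach to present a design for an intelligent electronic engine controller based on the CAN bus, including the hardware control structure and the software control algorithms. This is preliminary groundwork for the subsequent design of the overall control system.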

12.
One of the critical issues in Web-based e-commerce has been how to efficiently and effectively integrate and query heterogeneous, diverse e-catalogs. We propose an integration framework for building and querying catalogs. Our approach is based on a hybrid of the peer-to-peer data sharing paradigm and a Web-services architecture. Peers in our system serve as domain-specific data integration mediators. Links between peers are established based on the similarity of the domains they represent. These relationships are used for routing queries among peers. As the number of catalogs involved grows, the need to filter out irrelevant data sources will increase. We apply a summarisation technique to summarise the content of catalogs. The summaries are used to pre-select data sources that are relevant to a user query. We use the terms e-catalog and catalog interchangeably.

13.
For some years, data summarization techniques have been developed to handle the growth of databases. However, these techniques are usually not accompanied by tools that let end-users make efficient use of the produced summaries. This paper presents a first attempt to develop a querying tool for the SAINTETIQ summarization model. The proposed search algorithm takes advantage of the hierarchical structure of the SAINTETIQ summaries to efficiently answer questions such as “how are, on some attributes, the tuples which have specific characteristics?” Moreover, this algorithm can be seen both as a boolean querying mechanism over a hierarchy of summaries, and as a flexible querying mechanism over the underlying relational tuples.

14.
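An RFID Integration Middleware Framework for Enterprise Applications   Total citations: 1 (self-citations: 0, citations by others: 1)
This paper addresses the integration of RFID systems with enterprise application systems by designing and implementing an RFID integration middleware. The middleware consists of a document server, a hardware server, a business process control engine, a data processing engine, database management, and a public information service module. Data stream techniques are applied to the filtering and encapsulation of RFID data; a rule engine and IoC (inversion of control) are used to integrate RFID data with business processes and to make the processing components configurable; and a distributed architecture improves the middleware's deployment flexibility.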

15.
Metasearch engines offer better coverage and are more fault-tolerant and expandable than single search engines. A metasearch engine is required to post queries to, and obtain retrieval results from, several other Internet search engines. In this paper, we describe the use of the extensible style language (XSL) to support metasearches. We show how XSL can transform a query, expressed in XML, into different forms for different search engines. We show how the retrieval results could be transformed into a standard format so that the metasearch engine can interpret the retrieved data, filtering out irrelevant information (e.g. advertisements). The proposed structure treats the metasearch engine and the individual search engines as separate modules with a clearly defined communication structure through XSL. Thus, the system is more extensible than coding the structure and syntactic transformation processes. It allows other new search engines to be included just through plug-and-play, requiring only that the new transformation of XML for this search engine be included in the XSL.

16.
XML is rapidly emerging as a standard for exchanging business data on the World Wide Web. For the foreseeable future, however, most business data will continue to be stored in relational database systems. Consequently, if XML is to fulfill its potential, some mechanism is needed to publish relational data as XML documents. Towards that goal, one of the major challenges is finding a way to efficiently structure and tag data from one or more tables as a hierarchical XML document. Different alternatives are possible depending on when this processing takes place and how much of it is done inside the relational engine. In this paper, we characterize and study the performance of these alternatives. Among other things, we explore the use of new scalar and aggregate functions in SQL for constructing complex XML documents directly in the relational engine. We also explore different execution plans for generating the content of an XML document. The results of an experimental study show that constructing XML documents inside the relational engine can have a significant performance benefit. Our results also show the superiority of having the relational engine use what we call an “outer union plan” to generate the content of an XML document. Received: 15 October 2000 / Accepted: 15 April 2001 Published online: 28 June 2001
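
As a rough illustration of the "structure and tag" task, the sketch below joins two tables and builds a hierarchical XML document from the sorted rows. It performs the tagging outside the relational engine, which is only one of the alternatives the abstract mentions, and it is not the paper's outer union plan. The schema, table names, and data are made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical two-table schema held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (10, 1, 'disk'), (11, 1, 'cable'), (12, 2, 'mouse');
""")

# Sorting by the parent key lets one pass over the rows rebuild the hierarchy.
rows = conn.execute("""
SELECT c.id, c.name, o.id, o.item
FROM customer c LEFT JOIN orders o ON o.customer_id = c.id
ORDER BY c.id, o.id
""").fetchall()

root = ET.Element("customers")
current_id, current = None, None
for cust_id, name, order_id, item in rows:
    if cust_id != current_id:  # first row of a new customer: open a new parent element
        current = ET.SubElement(root, "customer", name=name)
        current_id = cust_id
    if order_id is not None:  # the LEFT JOIN yields NULL for customers without orders
        ET.SubElement(current, "order", id=str(order_id)).text = item

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The trade-off the abstract studies is whether this grouping and tagging should instead happen inside the engine, so that the client receives finished XML rather than flat rows.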

17.
Linguistic rules in natural language are useful and consistent with the human way of thinking. They are very important in multi-criteria decision making due to their interpretability. In this paper, our discussion concentrates on extracting linguistic rules from data sets. To this end, we first analyze how to extract complex linguistic data summaries based on fuzzy logic. Then, we formalize linguistic rules based on complex linguistic data summaries, in which the degree of confidence of a linguistic rule derived from a data set can be explained by linguistic quantifiers and its linguistic truth from the fuzzy logical point of view. In order to obtain a linguistic rule with a higher degree of linguistic truth, a genetic algorithm is used to optimize the number and parameters of the membership functions of linguistic values. Computational results show that the proposed method is an alternative method for extracting linguistic rules with linguistic truth from data sets.

18.
XML path summaries are compact structures representing all the simple parent-child paths of an XML document. Such paths have also been used in many works as a basis for partitioning the document’s content in a persistent store, in the form of path indices or path tables. We revisit the notions of path summaries and path-driven storage model in the context of current-day XML databases. This context is characterized by complex queries, typically expressed in an XQuery subset, and by the presence of efficient encoding techniques such as structural node identifiers. We review a path summary’s many uses for query optimization, and give them a common basis, namely relevant paths. We discuss summary-based tree pattern minimization and present some efficient summary-based minimization heuristics. We consider relevant path computation and provide a time- and memory-efficient computation algorithm. We combine the principle of path partitioning with the presence of structural identifiers in a simple path-partitioned storage model, which allows for selective data access and efficient query plans. This model improves the efficiency of twig query processing by up to two orders of magnitude over the similar tag-partitioned indexing model. We have implemented the path-partitioned storage model and path summaries in the XQueC compressed database prototype [8]. We present an experimental evaluation of a path summary’s practical feasibility and of tree pattern matching in a path-partitioned store.
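
A path summary in the sense of the first sentence above can be sketched as the set of distinct root-to-node tag paths of a document. The function name `path_summary` and the sample document are assumptions for illustration; a real summary would typically be stored as a tree and carry extra statistics.

```python
import xml.etree.ElementTree as ET


def path_summary(xml_text):
    """Return the sorted distinct parent-child tag paths found in the document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        paths.add(path)  # repeated paths collapse into a single summary entry
        for child in node:
            walk(child, path)

    walk(root, "")
    return sorted(paths)


doc = "<lib><book><title/><author/></book><book><title/></book></lib>"
summary = path_summary(doc)
print(summary)
```

Here five element nodes collapse into four paths; on large, regular documents the summary is usually far smaller than the document, which is what makes it usable for query optimization and for partitioning storage by path.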

19.
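Schema and data transformations arise in most information management tasks, yet they have been studied in isolation. We believe a unified mechanism exists: we view middleware as a transformation engine and discuss when and how transformations should be performed.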

20.
The need for a programming language abstraction for timed preemption is argued, and several possibilities for such an abstraction are presented. One, called engines, is adopted. Engines are an abstraction of bounded computation, not a process abstraction in the usual sense. However, in conjunction with first class continuations, engines allow a language to be extended with time-sharing implementations for a variety of process abstraction facilities. We present a direct implementation of hiaton streams. Engine nesting refers to the initiation of an engine computation by an already running engine. We consider the need for engine nesting and show how it may be accomplished in a manner that charges a parent engine for the computation of its offspring. We conclude by discussing the importance of simple and general abstractions such as engines.
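
The notion of an engine as bounded computation can be approximated with Python generators, as a loose analogy only: the paper builds engines on timed preemption and first class continuations, whereas this sketch relies on cooperative `yield` points, each counted as one tick. The names `make_engine` and `count_to` are hypothetical.

```python
def make_engine(gen):
    """Wrap a generator as an engine; each completed yield costs one tick of fuel."""

    def run(fuel):
        for used in range(fuel):
            try:
                next(gen)
            except StopIteration as done:
                # Computation finished: report its value and the unused fuel.
                return ("done", done.value, fuel - used)
        # Fuel ran out: hand back a new engine that resumes the same computation.
        return ("expired", make_engine(gen), 0)

    return run


def count_to(n):
    """Toy computation: sum 1..n, yielding once per unit of work."""
    total = 0
    for i in range(1, n + 1):
        total += i
        yield
    return total


engine = make_engine(count_to(5))
status, resumed, _ = engine(3)        # not enough fuel for five steps
status2, value, left = resumed(10)    # resume with a fresh budget
print(status, status2, value, left)
```

A round-robin scheduler over a queue of such engines, each run with a fixed fuel budget and requeued when it expires, gives a simple form of the time-sharing the abstract refers to.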
