Similar documents
1.
We study efficient discovery of proximity word-association patterns, defined by a sequence of strings and a proximity gap, from a collection of texts with positive and negative labels. We present an algorithm that finds all d-strings k-proximity word-association patterns that maximize the number of texts whose matching agrees with their labels. It runs in expected time $O(k^{d-1}n \log^d n)$ and space $O(k^{d-1}n)$, where $n$ is the total length of the texts, if the texts are uniformly random strings. We also show that the problem of finding one of the best word-association patterns with arbitrarily many strings is MAX SNP-hard. Shinichi Shimozono, Ph.D.: He is an Associate Professor of the Department of Artificial Intelligence at Kyushu Institute of Technology, Iizuka, Japan. He obtained the B.S. degree in Physics from Kyushu University, the M.S. degree from the Graduate School of Information Science at Kyushu University, and the Dr.Sci. degree in 1996 from Kyushu University. His research interests are primarily in the design and analysis of algorithms for intractable problems. Hiroki Arimura, Ph.D.: He is an Associate Professor of the Department of Informatics at Kyushu University, Fukuoka, Japan. He has also been a researcher with Precursory Research for Embryonic Science and Technology, Japan Science and Technology Corporation (JST), since 1999. He received the B.S. degree in 1988 in Physics, the M.S. degree in 1990 and the Dr.Sci. degree in 1994 in Information Systems, all from Kyushu University. His research interests include data mining, computational learning theory, and inductive logic programming. Setsuo Arikawa, Ph.D.: He is a Professor of the Department of Informatics and the Director of the University Library at Kyushu University, Fukuoka, Japan. He received the B.S. degree in 1964, the M.S. degree in 1966 and the Dr.Sci. degree in 1969, all in Mathematics, from Kyushu University. His research interests include Discovery Science, Algorithmic Learning Theory, Logic and Inference/Reasoning in AI, Pattern Matching Algorithms and Library Science. He is the principal investigator of the Discovery Science Project sponsored by the Grant-in-Aid for Scientific Research on Priority Areas from the Ministry of Education, Science, Sports and Culture (ESSC), Japan.
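The pattern class in this abstract is concrete enough to sketch. Below is a minimal Python illustration (our own reading, with invented function names, not the authors' algorithm) of what it means for a text to match a d-strings k-proximity pattern: the strings must occur in order, each starting within k characters of the end of the previous match, and a pattern is scored by how many labeled texts its match status agrees with.

    def matches(text: str, pattern: list[str], k: int) -> bool:
        """True if the strings of `pattern` occur in `text` in order,
        each starting within a gap of at most k characters of the previous match."""
        pos = text.find(pattern[0])
        while pos != -1:
            end, ok = pos + len(pattern[0]), True
            for s in pattern[1:]:
                nxt = text.find(s, end)          # earliest occurrence after previous match
                if nxt == -1 or nxt - end > k:   # missing, or proximity gap exceeded
                    ok = False
                    break
                end = nxt + len(s)
            if ok:
                return True
            pos = text.find(pattern[0], pos + 1)  # retry from a later first occurrence
        return False

    def agreement(pattern, k, labeled_texts):
        """Number of (text, label) pairs whose match status equals their label."""
        return sum(matches(t, pattern, k) == label for t, label in labeled_texts)

    texts = [("the quick brown fox", True), ("the slow fox", False)]
    print(agreement(["quick", "fox"], 10, texts))  # -> 2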

2.
A logic-based approach to the specification of active database functionality is presented which not only endows active databases with a well-defined and well-understood formal semantics, but also tightly integrates them with deductive databases. The problem of endowing deductive databases with rule-based active behaviour has been addressed in different ways. Typical approaches include accounting for active behaviour by extending the operational semantics of deductive databases, or, conversely, accounting for deductive capabilities by constraining the operational semantics of active databases. The main contribution of the paper is an alternative approach in which a class of active databases is defined whose operational semantics is naturally integrated with the operational semantics of deductive databases without either of them strictly subsuming the other. The approach is demonstrated via the formalization of the syntax and semantics of an active-rule language that can be smoothly incorporated into existing deductive databases, because the standard formalization of deductive databases is reused rather than altered or extended. One distinctive feature of the paper is its use of a history, as defined in the Kowalski-Sergot event calculus, to define event occurrences, database states and actions on these. This has proved to be a suitable foundation for a comprehensive logical account of the set of concepts underpinning active databases. The paper thus contributes a logical perspective to the ongoing task of developing a formal theory of active databases. Alvaro Adolfo Antunes Fernandes, Ph.D.: He received a B.Sc. in Economics (Rio de Janeiro, 1984), an M.Sc. in Knowledge-Based Systems (Edinburgh, 1990) and a Ph.D. in Computer Science (Heriot-Watt, 1995). He worked as a Research Associate at Heriot-Watt University from December 1990 until December 1995. In January 1996 he joined the Department of Mathematical and Computing Sciences at Goldsmiths College, University of London, as a Lecturer. His current research interests include advanced data- and knowledge-base technology, logic programming, and software engineering. M. Howard Williams, Ph.D., D.Sc.: He obtained his Ph.D. in ionospheric physics and recently a D.Sc. in Computer Science. He was appointed as the first lecturer in Computer Science at Rhodes University in 1970. During the following decade he rose to Professor of Computer Science, and in 1980 was appointed Professor of Computer Science at Heriot-Watt University. From 1980 to 1988 he served as Head of Department, and then as Director of Research until 1992. He is now head of the Database Research Group at Heriot-Watt University. His current research interests include active databases, deductive object-oriented databases, spatial databases, parallel databases and telemedicine. Norman W. Paton, Ph.D.: He received a B.Sc. in Computing Science from the University of Aberdeen in 1986. From 1986 to 1989 he worked as a Research Assistant at the University of Aberdeen, receiving a Ph.D. in 1989. From 1989 to 1995 he was a Lecturer in Computer Science at Heriot-Watt University. Since July 1995, he has been a Senior Lecturer in the Department of Computer Science at the University of Manchester. His current research interests include active databases, deductive object-oriented databases, spatial databases and database interfaces.
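As a rough, hypothetical illustration of the history-based reading (not the paper's formalization), the sketch below records timestamped event occurrences, reconstructs the database state from that history in event-calculus style, and fires event-condition-action rules against it; all names are invented for the example.

    from dataclasses import dataclass, field

    @dataclass
    class Rule:
        event: str            # event name that triggers the rule
        condition: object     # predicate over (state, payload)
        action: object        # action executed when the condition holds

    @dataclass
    class ActiveDB:
        history: list = field(default_factory=list)   # (time, event, payload)
        rules: list = field(default_factory=list)
        clock: int = 0

        def state(self):
            """Reconstruct the current state from the event history."""
            db = {}
            for _, ev, payload in self.history:
                if ev == "insert":
                    db[payload["key"]] = payload["value"]
                elif ev == "delete":
                    db.pop(payload["key"], None)
            return db

        def occur(self, event, payload):
            """Record an event occurrence, then fire any triggered rules."""
            self.clock += 1
            self.history.append((self.clock, event, payload))
            for r in self.rules:
                if r.event == event and r.condition(self.state(), payload):
                    r.action(self, payload)

    db = ActiveDB()
    db.rules.append(Rule("insert",
                         lambda st, p: p["value"] < 0,     # condition: negative value
                         lambda d, p: d.occur("delete", {"key": p["key"]})))
    db.occur("insert", {"key": "x", "value": -1})
    print(db.state())  # -> {} : the rule's compensating delete removed the tuple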

3.
Databases in large organizations are used for supporting the storage and manipulation of both operational and aggregate data. The operational data is characterized by frequent updates based on changes to values of items in the real world. Aggregate data is used in statistical analysis for decision making, forecasting and formulation of business strategies. The accuracy of the results of statistical processing depends on the processing resources allocated to the update transactions relative to the query and statistical reporting requests. Different types of statistical processing may require varying degrees of data accuracy. For example, long-range planning decisions may require less accurate data than those needed for tactical decisions. In this paper we propose a model for estimating the accuracy of a database and for the planning of computing resources based on database system performance and the accuracy requirements of various groups of users.
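A toy version of the underlying trade-off (our simplification, not the paper's model) can be written down directly: if real-world changes arrive faster than the system can apply updates, only a fraction of the stored values stay accurate, and the required update-processing rate can be planned from an accuracy target.

    def expected_accuracy(change_rate: float, apply_rate: float) -> float:
        """Fraction of real-world changes reflected in the database, assuming
        changes arrive at `change_rate` and updates are applied at `apply_rate`
        (both per unit time); a deliberately crude steady-state estimate."""
        return 1.0 if apply_rate >= change_rate else apply_rate / change_rate

    def required_apply_rate(change_rate: float, accuracy_goal: float) -> float:
        """Resource planning: update-processing rate needed to hit a target."""
        return change_rate * accuracy_goal

    print(expected_accuracy(100.0, 80.0))    # -> 0.8
    print(required_apply_rate(100.0, 0.95))  # -> 95.0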

4.
Partial information in databases can arise when information from several databases is combined. Even if each database is complete for some “world”, the combined databases will not be, and answers to queries against such combined databases can only be approximated. In this paper we describe various situations in which a precise answer cannot be obtained for a query asked against multiple databases. Based on an analysis of these situations, we propose a classification of constructs that can be used to model approximations.

The main goal of the paper is to study several formal models of approximations and their semantics. In particular, we obtain universality properties for these models of approximations. Universality properties suggest syntax for languages with approximations based on the operations which are naturally associated with them. We prove universality properties for most of the approximation constructs. Then we design languages built around datatypes given by the approximation constructs. A straightforward approach results in languages that have a number of limitations. In an attempt to overcome those limitations, we explain how all the languages can be embedded into a language for conjunctive and disjunctive sets from Libkin and Wong (1996) and demonstrate its usefulness in querying independent databases. We also discuss the semantics of approximation constructs and the relationship between them.
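One simple way to picture an approximation construct (a generic sandwich-style sketch, not the paper's formal models) is as a pair of sets: a lower bound of tuples certainly in the answer and an upper bound of tuples possibly in it. Queries then operate on both bounds at once; all names below are invented for the example.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Approx:
        certain: frozenset    # tuples known to be in the answer (lower bound)
        possible: frozenset   # tuples that may be in the answer (upper bound)

        def combine(self, other):
            """Merge answers from two independent databases: a tuple is certain
            if either source is sure of it, possible if either admits it."""
            return Approx(self.certain | other.certain,
                          self.possible | other.possible)

        def where(self, pred):
            """Filtering preserves the sandwich: apply the predicate to both bounds."""
            return Approx(frozenset(filter(pred, self.certain)),
                          frozenset(filter(pred, self.possible)))

    a = Approx(frozenset({("ann", 30)}), frozenset({("ann", 30), ("bob", 41)}))
    b = Approx(frozenset({("bob", 41)}), frozenset({("bob", 41), ("eve", 25)}))
    print(sorted(a.combine(b).where(lambda t: t[1] > 28).certain))
    # -> [('ann', 30), ('bob', 41)] : the certain part of the approximate answer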


5.
The semantics of engineering data can be represented in terms of constraints and can be maintained via constraint checking and enforcement. Conventional databases (e.g. relational databases) are inadequate for maintaining engineering data semantics because they have no effective means of representing and checking/enforcing constraints. As an extension of relational databases, deductive databases overcome this inadequacy by enabling constraint representation and checking. However, they too lack the ability to enforce constraints. A logic-based mechanism for enforcing constraints in deductive databases is presented in this paper. The mechanism is composed of two operators: a truth-enforcement operator and a falsity-enforcement operator. The mechanics of these operators and their use for integrity enforcement are described herein.
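A minimal sketch of the two operators under assumed semantics (the paper's actual mechanics are richer): truth enforcement adds the facts needed to make a rule's consequence hold, while falsity enforcement retracts facts until a forbidden atom can no longer be derived.

    def enforce_truth(facts: set, rule) -> set:
        """Truth enforcement: if the rule body holds, add the head atom."""
        body, head = rule
        return facts | {head} if body <= facts else facts

    def enforce_falsity(facts: set, forbidden: str, rules) -> set:
        """Falsity enforcement: retract the forbidden atom and keep retracting
        supporting facts until no rule can rederive it (a fixpoint)."""
        facts = facts - {forbidden}
        changed = True
        while changed:
            changed = False
            for body, head in rules:
                if head == forbidden and body <= facts:
                    facts = facts - {next(iter(body))}   # drop one supporting fact
                    changed = True
        return facts

    facts = {"emp(joe)", "dept(d1)"}
    rule = (frozenset({"emp(joe)", "dept(d1)"}), "works(joe,d1)")
    facts = enforce_truth(facts, rule)
    print("works(joe,d1)" in facts)                         # -> True
    print(enforce_falsity(facts, "works(joe,d1)", [rule]))  # one support retracted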

6.
7.
8.
This article examines the use of small-scale databases in introductory Social Science courses at degree level. It suggests that databases can act as an introduction to both statistical practice and investigative learning, but only if supported by considerations of methodology. The results of a research project where students used a specially written database are reported. This indicates that there are both strengths and weaknesses in the use of databases, but it is suggested that these weaknesses can be overcome to a large extent by carefully considering what we expect students to be able to do with statistics. The use of generic databases and spreadsheets as methods of handling data is also discussed.

9.
In this paper, we deal with the problem of verifying the local stratifiability of logic programs and databases, as presented by Przymusinski. The notion of dependency graphs is generalized from representing the priority relation between predicate symbols to representing the priority between atoms. Necessary and sufficient conditions for the local stratifiability of logic programs are presented, and algorithms for performing the verification are developed. Finally, we prove that a database DB containing clauses with disjunctive consequents can easily be converted into a logic program P such that DB is locally stratified iff P is locally stratified. Yi-Dong Shen, Dr.: Department of Computer Science, Chongqing University, Chongqing 630044, P.R. China (present address: c/o Ping Ran, Department of Heat Power Engineering, Chongqing University). Research interests: Artificial Intelligence, Deductive Databases, Logic Programming, Non-Monotonic Reasoning, Parallel Processing
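The generalized dependency graph suggests a direct check, sketched below for ground programs (an illustration of the standard idea, not the paper's algorithms): build a graph on atoms with an edge from each rule head to each body atom, mark edges that pass through negation, and reject any cycle containing a marked edge.

    import itertools

    def locally_stratified(rules) -> bool:
        """rules: (head, positive_body, negative_body) triples over ground atoms.
        Returns False iff the atom dependency graph has a cycle through negation."""
        atoms = set(itertools.chain.from_iterable(
            [h, *pb, *nb] for h, pb, nb in rules))
        edges = {a: [] for a in atoms}        # head -> [(body atom, is_negative)]
        for h, pb, nb in rules:
            edges[h] += [(b, False) for b in pb] + [(b, True) for b in nb]

        def neg_cycle_from(start):
            stack, seen = [(start, False)], set()
            while stack:
                atom, crossed_neg = stack.pop()
                for b, is_neg in edges[atom]:
                    n = crossed_neg or is_neg
                    if b == start and n:      # back at the start via a negative edge
                        return True
                    if (b, n) not in seen:
                        seen.add((b, n))
                        stack.append((b, n))
            return False

        return not any(neg_cycle_from(a) for a in atoms)

    # p <- not q ; q <- not p : the classic non-stratifiable pair
    print(locally_stratified([("p", [], ["q"]), ("q", [], ["p"])]))  # -> False
    print(locally_stratified([("p", [], ["q"]), ("q", [], [])]))     # -> True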

10.
Today’s applications are both object-oriented and based on a new type of three-tiered client–server architecture with clients, processing servers, and data servers as cornerstones. Recognising these trends, industry and researchers have been engaged in defining standards and technologies for communication among the components of Distributed Information Systems and for providing compatible mechanisms to access databases, but a key problem with these complex architectures is still their performance. This paper presents a tool for predicting the performance of systems based on CORBA and DCOM as distributed-object architectures, and OLE-DB and PL/SQL as data-access architectures. The tool is an extension of SMART, a workbench that exploits analytical and simulation performance models to predict the performance of database applications.
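The analytical side of such a tool can be illustrated with the most basic building block of performance prediction, an M/M/1 queue per tier (a generic sketch, not SMART's actual models; the tier figures are assumed).

    def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
        """Mean response time of an M/M/1 queue, the kind of analytical
        building block a performance workbench composes per tier."""
        if arrival_rate >= service_rate:
            raise ValueError("unstable tier: utilization >= 1")
        return 1.0 / (service_rate - arrival_rate)

    # client requests flow through a processing server, then a data server
    tiers = [(40.0, 100.0), (40.0, 60.0)]   # (requests/s, completions/s) per tier
    print(sum(mm1_response_time(lam, mu) for lam, mu in tiers))  # ~0.0667 s/request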

11.
Some significant progress related to multidimensional data analysis has been achieved in the past few years, including the design of fast algorithms for computing datacubes, selecting precomputed group-bys to materialize, and designing efficient storage structures for multidimensional data. However, little work has been carried out on multidimensional query optimization issues. In particular, the response time (or evaluation cost) for answering several related dimensional queries simultaneously is crucial to OLAP applications. Recently, Zhao et al. first addressed this problem by presenting three heuristic algorithms. In this paper we first consider in detail two cases of the problem in which all the queries are either hash-based star joins or index-based star joins only. In the case of the hash-based star join, we devise a polynomial approximation algorithm which delivers a plan whose evaluation cost is $O(n^{\epsilon})$ times the optimal, where $n$ is the number of queries and $\epsilon$ is a fixed constant with $0 < \epsilon \le 1$. We also present an exponential algorithm which delivers a plan with the optimal evaluation cost. In the case of the index-based star join, we present a heuristic algorithm which delivers a plan whose evaluation cost is $n$ times the optimal, and an exponential algorithm which delivers a plan with the optimal evaluation cost. We then consider a general case in which both hash-based and index-based star-join queries are included. For this case, we give a possible improvement on the work of Zhao et al., based on an analysis of their solutions. We also develop another heuristic and an exact algorithm for the problem. We finally conduct a performance study by implementing our algorithms. The experimental results demonstrate that the solutions delivered for the restricted cases are always within two times the optimal, which confirms our theoretical upper bounds. In fact, these experiments produce much better results than our theoretical estimates. To the best of our knowledge, this is the only development of polynomial algorithms for the first two cases that deliver plans with deterministic performance guarantees on the quality of the plans generated. Previous approaches, including that of [ZDNS98], may generate a feasible plan for the problem in these two cases, but they do not provide any performance guarantee, i.e., the plans generated by their algorithms can be arbitrarily far from the optimal one. Received: July 21, 1998 / Accepted: August 26, 1999

12.
Over the years many efficient algorithms for the multiplierless design of multiple constant multiplications (MCMs) have been introduced. These algorithms primarily focus on finding the fewest number of addition/subtraction operations that generate the MCM. Although the complexity of an MCM design is decreased by reducing the number of operations, their solutions may not lead to an MCM design with optimal area at gate-level since they do not consider the implementation costs of the operations in hardware. This article introduces two approximate algorithms that aim to optimize the area of the MCM operation by taking into account the gate-level implementation of each addition and subtraction operation which realizes a constant multiplication. To find the optimal tradeoff between area and delay, the proposed algorithms are further extended to find an MCM design with optimal area under a delay constraint. Experimental results clearly indicate that the solutions of the proposed algorithms lead to significantly better MCM designs at gate-level when compared to those obtained by the solutions of algorithms designed for the optimization of the number of operations.
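The gap between operation count and gate-level area is easy to see with a toy cost model (ours, not the article's): realize each constant multiplication by shift-and-add over its canonical signed-digit (CSD) form and charge each adder/subtractor its bit-width, ignoring the sharing of intermediate terms that real MCM algorithms exploit.

    def csd_digits(c: int) -> list[int]:
        """Canonical signed-digit form of c: digits in {-1, 0, 1}, LSB first."""
        digits = []
        while c != 0:
            if c & 1:
                d = 2 - (c & 3)    # +1 if c % 4 == 1, -1 if c % 4 == 3
                c -= d
            else:
                d = 0
            digits.append(d)
            c >>= 1
        return digits

    def mcm_adder_area(constants, width=16):
        """Each nonzero CSD digit beyond the first costs one width-bit
        adder/subtractor; sharing of intermediate terms is ignored here."""
        total = 0
        for c in constants:
            nonzero = sum(d != 0 for d in csd_digits(abs(c)))
            total += max(nonzero - 1, 0) * width   # area in full-adder units
        return total

    # 7 = 8 - 1 needs one subtractor; 105 needs three operations in CSD form
    print(mcm_adder_area([7, 105]))  # -> 64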

13.
Modern spatial database applications built on top of distributed and heterogeneous spatial information sources, such as conventional spatial databases underlying Geographical Information Systems (GIS), spatial data files and spatial information acquired or inferred from the Web, suffer from data integration and topological consistency problems. This increasingly results in incomplete information, which makes answering range queries over incomplete spatial databases a leading research challenge in spatial database systems research. A significant instance of this setting is the application scenario in which the geometrical information on a subset of spatial database objects is incomplete while the spatial database still stores topological relations among these objects (e.g., containment relations). Focusing on this scenario, in this paper we propose and experimentally assess a novel technique for efficiently answering range queries over incomplete spatial databases by integrating geometrical information and topological reasoning. We also propose I-SQE (Spatial Query Engine for Incomplete Information), an innovative query engine implementing this technique. The proposed technique proves not only effective but also efficient on both synthetic and real-life spatial data sets, and it enhances the quality and the expressive power of retrieved answers by taking advantage of representing spatial database objects at both the geometrical and the topological level.

14.
Spatial database operations are typically performed in two steps. In the filtering step, indexes and the minimum bounding rectangles (MBRs) of the objects are used to quickly determine a set of candidate objects. In the refinement step, the actual geometries of the objects are retrieved and compared to the query geometry or each other. Because of the complexity of the computational geometry algorithms involved, the CPU cost of the refinement step is usually the dominant cost of the operation for complex geometries such as polygons. Although many run-time and pre-processing-based heuristics have been proposed to alleviate this problem, the CPU cost still remains the bottleneck. In this paper, we propose a novel approach to address this problem using the efficient rendering and searching capabilities of modern graphics hardware. This approach does not require expensive pre-processing of the data or changes to existing storage and index structures, and is applicable to both intersection and distance predicates. We evaluate this approach by comparing the performance with leading software solutions. The results show that by combining hardware and software methods, the overall computational cost can be reduced substantially for both spatial selections and joins. We integrated this hardware/software co-processing technique into a popular database to evaluate its performance in the presence of indexes, pre-processing and other proprietary optimizations. Extensive experimentation with real-world data sets show that the hardware-accelerated technique not only outperforms the run-time software solutions but also performs as well if not better than pre-processing-assisted techniques.
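The two-step structure is straightforward to sketch. The following illustration uses the shapely library for the exact predicate and shows the standard filter/refine pipeline (the paper's contribution replaces the refinement predicate with rendering-based tests on graphics hardware):

    from shapely.geometry import Polygon

    def mbr_overlaps(a, b) -> bool:
        """Filtering step: cheap overlap test on minimum bounding rectangles."""
        ax0, ay0, ax1, ay1 = a.bounds
        bx0, by0, bx1, by1 = b.bounds
        return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

    def spatial_join(polys_r, polys_s):
        """Filter with MBRs, then refine with the exact geometric predicate,
        which is the CPU-bound step the paper offloads to graphics hardware."""
        return [(i, j)
                for i, r in enumerate(polys_r)
                for j, s in enumerate(polys_s)
                if mbr_overlaps(r, s) and r.intersects(s)]   # refinement step

    r = [Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])]
    s = [Polygon([(3, 3), (6, 3), (6, 6), (3, 6)]),           # survives both steps
         Polygon([(10, 10), (12, 10), (12, 12), (10, 12)])]   # pruned by the filter
    print(spatial_join(r, s))  # -> [(0, 0)]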

15.
A database allows its users to reduce uncertainty about the world. However, not all properties of all objects can always be stored in a database. As a result, the user may have to use probabilistic inference rules to estimate the data required for his decisions. A decision based on such estimated data may not be perfect. The authors call the costs associated with such suboptimal decisions the cost of incomplete information. This cost can be reduced by expanding the database to contain more information; such expansion will increase the data-related costs because of additional data collection, manipulation, storage, and retrieval. A database designer must then consider the trade-off between the cost of incomplete information and the data-related costs, and choose a design that minimizes the overall cost to the organization. In temporal databases, the sheer volume of the data involved makes such a trade-off at design time all the more important. The authors develop probabilistic inference rules that allow one to infer missing values in the spatial as well as the temporal dimension. They then use the framework to develop guidelines for designing and reorganizing temporal databases, explicitly trading off the cost of incomplete information against the data-related costs
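The design-time trade-off can be written as a single objective: total cost = expected cost of incomplete information + data-related costs, minimized over candidate designs. A toy enumeration under assumed linear cost figures (ours, not the authors' model):

    def total_cost(design):
        """Suboptimal-decision cost falls as more attributes are stored;
        storage and maintenance cost rises. All figures are assumed."""
        missing = design["attributes"] - design["stored"]
        return (design["penalty_per_missing"] * missing
                + design["cost_per_stored"] * design["stored"])

    designs = [{"attributes": 10, "stored": k,
                "penalty_per_missing": 7.0, "cost_per_stored": 4.0}
               for k in range(11)]
    best = min(designs, key=total_cost)
    print(best["stored"], total_cost(best))  # -> 10 40.0 (penalty outweighs storage)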

16.
The design of distributed computer systems (DCS) requires compromise among several desirable but conflicting objectives. Design tools for this purpose should, therefore, facilitate the process of making such tradeoffs. To this end, this paper presents a prototype Decision Support System (DSS) which uses multicriteria decision-making techniques as the underlying methodology to aid the designer in making compromises in a systematic and efficient manner. While there are several isolated subproblems of DCS design that can be modelled and solved quantitatively, there are also many design aspects that are difficult to quantify and/or formalize. Thus the design process requires a synthesis of analytic model execution and informed judgement on the part of the designer. The DSS aids in this iterative process by executing appropriate models and generating a sequence of ‘Pareto-optimal’, or non-dominated, solution vectors. The relatively unstructured task of making tradeoffs among the components of each vector is left in the hands of the designer. Efficiency is achieved by avoiding needless search through clearly inferior solutions and focusing only on non-dominated ones. Various details of the prototype, implemented on a UNIVAC 1100 and an IBM PC AT, are highlighted in an example session. The potential advantages of using multicriteria techniques, as opposed to the widespread practice of using single-criterion optimization methods, are also noted in this paper.
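The non-dominated filtering at the heart of the DSS is easy to state in code; a minimal generic sketch for minimization objectives (not the prototype's implementation):

    def dominates(a, b) -> bool:
        """a dominates b (minimization): no worse everywhere, better somewhere."""
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def pareto_front(solutions):
        """Keep only the vectors not dominated by any other candidate."""
        return [s for s in solutions
                if not any(dominates(t, s) for t in solutions if t is not s)]

    # (cost, response time) vectors for candidate DCS designs
    candidates = [(100, 9.0), (120, 5.0), (110, 9.5), (150, 5.0)]
    print(pareto_front(candidates))  # -> [(100, 9.0), (120, 5.0)]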

17.
To ensure that a Pareto-optimal solution set is obtained in service composition, service composition is modeled as a multi-objective optimization problem in which several quality-of-service (QoS) attributes are optimized simultaneously. A general preprocessing method based on the types of the QoS attributes is proposed, along with a multi-objective ant colony algorithm that uses multiple pheromone tables and a single heuristic-information table. Each ant randomly selects one pheromone table when constructing a feasible solution; after each colony cycle all pheromone evaporates, but for each objective function only the best solution receives a pheromone deposit. After many colony cycles the Pareto-optimal solution set is obtained. Experimental results show that the method provides a good optimization scheme for Web service composition, with high accuracy.
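A compact sketch of the scheme just described, with invented QoS data and parameter values (the abstract does not detail the heuristic-information term, so candidate selection here uses pheromone weights only):

    import random

    TASKS, CANDS, N_OBJ = 3, 4, 2
    RHO, Q = 0.1, 1.0                 # evaporation rate and deposit amount
    random.seed(1)
    # hypothetical QoS matrices: LATENCY[t][c], COST[t][c] for candidate c of task t
    LATENCY = [[random.uniform(1, 9) for _ in range(CANDS)] for _ in range(TASKS)]
    COST = [[random.uniform(1, 9) for _ in range(CANDS)] for _ in range(TASKS)]
    pher = [[[1.0] * CANDS for _ in range(TASKS)] for _ in range(N_OBJ)]

    def qos(sol):
        """Objective vector to minimize: total latency and total cost."""
        return (sum(LATENCY[t][c] for t, c in enumerate(sol)),
                sum(COST[t][c] for t, c in enumerate(sol)))

    def build_solution():
        table = random.randrange(N_OBJ)   # each ant picks one pheromone table
        return tuple(random.choices(range(CANDS), weights=pher[table][t])[0]
                     for t in range(TASKS))

    archive = set()
    for cycle in range(50):
        ants = [build_solution() for _ in range(10)]
        for k in range(N_OBJ):
            for t in range(TASKS):        # all pheromone evaporates each cycle
                pher[k][t] = [(1 - RHO) * v for v in pher[k][t]]
            best = min(ants, key=lambda s: qos(s)[k])
            for t, c in enumerate(best):  # only the per-objective best deposits
                pher[k][t][c] += Q
            archive.add(best)

    # the non-dominated members of the archive approximate the Pareto set
    front = [s for s in archive
             if not any(all(a <= b for a, b in zip(qos(o), qos(s))) and qos(o) != qos(s)
                        for o in archive)]
    print(sorted(front))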

18.
In areas as diverse as earth remote sensing, astronomy, and medical imaging, image acquisition technology has undergone tremendous improvements in recent years. The vast amounts of scientific data are potential treasure-troves for scientific investigation and analysis. Unfortunately, advances in our ability to deal with this volume of data in an effective manner have not paralleled the hardware gains. While special-purpose tools for particular applications exist, there is a dearth of useful general-purpose software tools and algorithms which can assist a scientist in exploring large scientific image databases. This paper presents our recent progress in developing interactive semi-automated image database exploration tools based on pattern recognition and machine learning technology. We first present a completed and successful application that illustrates the basic approach: the SKICAT system used for the reduction and analysis of a 3 terabyte astronomical data set. SKICAT integrates techniques from image processing, data classification, and database management. It represents a system in which machine learning played a powerful and enabling role, and solved a difficult, scientifically significant problem. We then proceed to discuss the general problem of automated image database exploration, the particular aspects of image databases which distinguish them from other databases, and how this impacts the application of off-the-shelf learning algorithms to problems of this nature. A second large image database is used to ground this discussion: Magellan's images of the surface of the planet Venus. The paper concludes with a discussion of current and future challenges.
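The classification core of such a pipeline can be suggested in a few lines with scikit-learn (an illustrative stand-in with fabricated image-derived features, not the actual SKICAT system or its data):

    from sklearn.tree import DecisionTreeClassifier
    import numpy as np

    rng = np.random.default_rng(0)
    # hypothetical per-object features measured from survey images:
    # (area, brightness, ellipticity) -- stand-ins for cataloged attributes
    stars = rng.normal([5, 0.9, 0.1], 0.05, size=(200, 3))
    galaxies = rng.normal([20, 0.5, 0.4], 0.05, size=(200, 3))
    X = np.vstack([stars, galaxies])
    y = np.array(["star"] * 200 + ["galaxy"] * 200)

    clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
    print(clf.predict([[6, 0.88, 0.12]]))   # expected: ['star']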

19.
In this paper we study a cell of the subdivision induced by a union of n half-lines (or rays) in the plane. We present two results. The first is a novel proof of the O(n) bound on the number of edges of the boundary of such a cell, which is essentially of methodological interest. The second is an algorithm for constructing the boundary of any cell, which runs in optimal Θ(n log n) time. By-products of our results are the notions of skeleton and of skeletal order, which may be of interest in their own right. This work was partly supported by CEE ESPRIT Project P-940, by the Ecole Normale Supérieure, Paris, and by NSF Grant ECS-84-10902. This work was done in part while this author was visiting the Ecole Normale Supérieure, Paris, France.
