Similar documents
20 similar documents found.
1.
2.
Semantic schema theory is a theoretical model used to describe the behavior of evolutionary algorithms. It partitions the search space into schemata, defined at the semantic level, and studies their distribution during evolution. Semantic schema theory has definite advantages over the popular syntactic schema theories, whose reliability and usefulness have been criticized. The integration of semantic awareness into genetic programming (GP) in recent years also sheds new light on schema theory investigations. This paper extends recent work on the semantic schema theory of GP by utilizing information-based clustering. To this end, we first define the notion of semantics for a tree, based on the mutual information between its output vector and the target, and introduce semantic building blocks to facilitate the modeling of semantic schemata. We then propose information-based clustering to cluster the building blocks. Trees are represented in terms of the active occurrence of building-block clusters, and schema instances are characterized by an instantiation function over this representation. Finally, the expected number of schema samples is predicted by the suggested theory. To evaluate the suggested schema, several experiments were conducted, investigating the generalization, diversity-preserving capability, and efficiency of the schema. The results are encouraging and remarkably promising compared with the existing semantic schema.
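As an illustrative sketch (not the paper's code), the notion of semantics defined above, the mutual information between a tree's output vector and the target, can be estimated empirically for discrete output vectors:

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        # log2( p(x,y) / (p(x) p(y)) ), written with counts
        mi += p_joint * log2(c * n / (px[x] * py[y]))
    return mi

# Hypothetical tree output vector vs. target over six fitness cases.
outputs = [0, 0, 1, 1, 0, 1]
target  = [0, 0, 1, 1, 1, 1]
print(round(mutual_information(outputs, target), 3))
```

For continuous GP outputs the vectors would first have to be discretized; the values here are purely hypothetical.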

3.
This paper addresses the problem of handling semantic heterogeneity during database schema integration. We focus on the semantics of terms used as identifiers in schema definitions. Our solution does not rely on the names of the schema elements or the structure of the schemas. Instead, we utilize formal ontologies consisting of intensional definitions of terms represented in a logical language. The approach is based on similarity relations between intensional definitions in different ontologies. We present the definitions of similarity relations based on intensional definitions in formal ontologies. The extensional consequences of intensional relations are addressed. The paper shows how similarity relations are discovered by a reasoning system using a higher-level ontology. These similarity relations are then used to derive an integrated schema in two steps. First, we show how to use similarity relations to generate the class hierarchy of the global schema. Second, we explain how to enhance the class definitions with attributes. This approach reduces the cost of generating or re-generating global schemas for tightly-coupled federated databases.

4.
5.
Multimedia data mining refers to pattern discovery, rule extraction, and knowledge acquisition from multimedia databases. Two typical tasks in multimedia data mining are visual data classification and clustering in terms of semantics. The performance of such classification or clustering systems is often unfavorable due to the use of low-level features for image representation, as well as improper similarity metrics for measuring the closeness between multimedia objects. This paper considers the problem of modeling similarity for semantic image clustering. A collection of semantic images and feed-forward neural networks are used to approximate a characteristic function of equivalence classes, termed a learning pseudo metric (LPM). Empirical criteria for evaluating the goodness of the LPM are established. An LPM-based k-means rule is then employed for semantic image clustering, where two impurity indices, classification performance, and robustness are used for performance evaluation. An artificial image database with 11 semantic classes is employed for our simulation studies. The results demonstrate the merits and usefulness of our proposed techniques for multimedia data mining.

6.
7.
When transforming a relational database (RDB) schema into an object-oriented database (OODB) schema, much effort has been devoted to examining key and inclusion dependency (ID) constraints to identify classes and establish inheritance and association relationships between classes. However, to further remove the original data redundancy and update anomalies, multi-valued dependencies (MVDs) should also be examined. In this paper, we discuss class structures and define well-structured classes. Based on MVDs, a theorem is given for transforming a relation schema into a well-structured class. To transform an RDB schema into an OODB schema, a composition process that simplifies the input RDB schema and an algorithm that transforms the simplified RDB schema into well-structured OODB classes are developed.

8.
A methodology for integration of heterogeneous databases
The transformation of existing local databases to meet diverse application needs at the global level is performed through a four-layered procedure that stresses total schema integration and virtual integration of local databases. The proposed methodology covers both schema integration and database integration, and uses a four-layered schema architecture (local schemata, local object schemata, global schema, and global view schemata), with each layer presenting an integrated view of the concepts that characterize the layer below. Mechanisms for accomplishing this objective are presented in theoretical terms, along with a running example. Object equivalence classes, property equivalence classes, and other related concepts are discussed in the context of logical integration of heterogeneous schemata, while object instance equivalence classes, property instance equivalence classes, and other related concepts are discussed for data integration purposes. The proposed methodology resolves naming conflicts, scaling conflicts, type conflicts, level-of-abstraction conflicts, and other types of conflicts during schema integration, and data inconsistencies during data integration.
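The object and property equivalence classes used for logical integration can be sketched as the transitive closure of declared pairwise equivalences, computed union-find style. This is an illustrative sketch rather than the paper's algorithm, and the attribute names are hypothetical:

```python
def merge_equivalences(pairs):
    """Group schema elements declared pairwise-equivalent into equivalence
    classes (the transitive closure of the declared equivalences)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)  # union the two classes

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())

# Hypothetical attribute-name equivalences declared across local schemata.
pairs = [("ssn", "social_sec_no"), ("social_sec_no", "tax_id"),
         ("name", "full_name")]
print(merge_equivalences(pairs))
```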

9.
10.
A method for learning knowledge from a database is used to address the bottleneck of manual knowledge acquisition. An attempt is made to improve representation with the assistance of experts and from computer-resident knowledge. The knowledge representation is described in the framework of a conceptual schema consisting of a semantic model and an event model. A concept classifies a domain into different subdomains. As a method of knowledge acquisition, inductive learning techniques are used for rule generation. The theory of rough sets is used in designing the learning algorithm. Examples of certain concepts are used to induce general specifications of the concepts, called classification rules. The basic approach is to partition the information into equivalence classes and to derive conclusions based on equivalence relations. In a sense, this is a data-reduction process, where the goal is to reduce a large database of information to a small number of rules describing the domain. This completely integrated approach includes the user interface, semantics, constraints, representation of temporal events, induction, etc.
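A minimal sketch of the rough-set idea described above, assuming a toy symbolic table (the attributes and rows are invented for illustration): rows are partitioned into equivalence classes by the condition attributes, and a class whose members agree on the decision attribute yields a certain classification rule.

```python
def partition(rows, attrs):
    """Partition rows into equivalence classes of the
    indiscernibility relation on the given attributes."""
    classes = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, []).append(row)
    return classes

def certain_rules(rows, attrs, decision):
    """Rules from classes lying entirely inside one decision concept
    (i.e., inside its lower approximation)."""
    rules = []
    for key, members in partition(rows, attrs).items():
        decisions = {m[decision] for m in members}
        if len(decisions) == 1:
            rules.append((dict(zip(attrs, key)), decisions.pop()))
    return rules

# Hypothetical decision table.
table = [
    {"fever": "yes", "cough": "yes", "flu": "yes"},
    {"fever": "yes", "cough": "yes", "flu": "yes"},
    {"fever": "no",  "cough": "yes", "flu": "no"},
    {"fever": "no",  "cough": "no",  "flu": "no"},
]
for cond, dec in certain_rules(table, ["fever", "cough"], "flu"):
    print(cond, "->", dec)
```

The four rows reduce to three rules, which is the data-reduction effect the abstract describes.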

11.
12.
While recent data management technologies, such as object-oriented techniques, address the problem of database schema evolution, standard information systems currently in use raise challenging evolution problems. This paper examines database evolution from the developer's point of view. It shows how requirements changes are propagated to database schemas, to data, and to programs through a general strategy. This strategy requires documentation of the database design. When absent, such documentation has to be rebuilt through reverse engineering techniques. Our approach, called DB-MAIN, relies on a generic database model and on a transformational paradigm stating that database engineering processes can be modeled by schema transformations. Indeed, a transformation provides both structural and instance mappings that formally define how to modify database structures and contents. We describe both the complete approach and a simplified one, and compare their merits and drawbacks. We then analyze the problem of program modification and describe a CASE tool that can assist developers in their task of system evolution. We illustrate our approach with Biomaze, a biochemical knowledge base whose database is rapidly evolving.

13.
On using partial supervision for text categorization
We discuss the merits of building text categorization systems using supervised clustering techniques. Traditional approaches to document classification over a predefined set of classes are often unable to provide sufficient accuracy because of the difficulty of fitting a manually categorized collection of documents to a given classification model. This is especially the case for heterogeneous collections of Web documents, which have varying styles, vocabulary, and authorship. Hence, we investigate the use of clustering to create the set of categories and its use for the classification of documents. Completely unsupervised clustering has the disadvantage that it has difficulty isolating sufficiently fine-grained classes of documents relating to a coherent subject matter. We use the information from a preexisting taxonomy to supervise the creation of a set of related clusters, though with some freedom in defining and creating the classes. We show that the advantage of using partially supervised clustering is that it is possible to have some control over the range of subjects that one would like the categorization system to address, while retaining a precise mathematical definition of how each category is defined. This a priori knowledge of each category's definition then provides an extremely effective way to categorize documents. We also discuss a new technique to help the classifier distinguish better among closely related clusters.
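A hedged sketch of partially supervised clustering in the spirit described above (not the authors' algorithm): clusters are seeded from a preexisting taxonomy's labeled documents, and unlabeled documents then join and refine the nearest cluster. The categories and documents are invented.

```python
from collections import Counter
from math import sqrt

def centroid(docs):
    """Mean bag-of-words vector of a group of documents."""
    total = Counter()
    for d in docs:
        total.update(d)
    return {t: c / len(docs) for t, c in total.items()}

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def supervised_clusters(labeled, unlabeled, rounds=3):
    """Seed one cluster per taxonomy category from labeled documents,
    then let unlabeled documents refine those clusters."""
    clusters = {cat: list(docs) for cat, docs in labeled.items()}
    for _ in range(rounds):
        cents = {cat: centroid(docs) for cat, docs in clusters.items()}
        clusters = {cat: list(docs) for cat, docs in labeled.items()}
        for doc in unlabeled:
            best = max(cents, key=lambda cat: cosine(doc, cents[cat]))
            clusters[best].append(doc)
    return clusters

# Hypothetical taxonomy seeds and unlabeled documents.
labeled = {"sports": [Counter("goal match".split())],
           "tech":   [Counter("cpu chip".split())]}
unlabeled = [Counter("match referee goal".split()),
             Counter("chip silicon".split())]
result = supervised_clusters(labeled, unlabeled)
```

The taxonomy fixes which clusters exist; the unlabeled data is free to reshape their contents, which mirrors the "some freedom in defining the classes" point above.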

14.
Multi-label learning deals with the problem where each instance is associated with multiple labels simultaneously. The task of this learning paradigm is to predict the label set for each unseen instance by analyzing training instances with known label sets. In this paper, a neural-network-based multi-label learning algorithm named Ml-rbf is proposed, which is derived from traditional radial basis function (RBF) methods. Briefly, the first layer of an Ml-rbf neural network is formed by conducting clustering analysis on the instances of each possible class, where the centroid of each clustered group is regarded as the prototype vector of a basis function. After that, the second-layer weights of the Ml-rbf neural network are learned by minimizing a sum-of-squares error function. Specifically, the information encoded in the prototype vectors corresponding to all classes is fully exploited to optimize the weights corresponding to each specific class. Experiments on three real-world multi-label data sets show that Ml-rbf achieves highly competitive performance compared to other well-established multi-label learning algorithms.
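A rough sketch of the two-layer scheme described above, under simplifying assumptions (a toy k-means, Gaussian basis functions, and a plain least-squares solve standing in for the paper's training procedure; the data is invented):

```python
import numpy as np

def ml_rbf_fit(X, Y, k=2, sigma=1.0, seed=0):
    """Cluster the instances of each label to get prototype vectors, then
    fit second-layer weights by minimizing a sum-of-squares error."""
    rng = np.random.default_rng(seed)
    prototypes = []
    for c in range(Y.shape[1]):
        Xc = X[Y[:, c] == 1]
        # toy k-means on this label's instances
        centers = Xc[rng.choice(len(Xc), size=min(k, len(Xc)), replace=False)]
        for _ in range(10):
            d = ((Xc[:, None] - centers[None]) ** 2).sum(-1)
            lab = d.argmin(1)
            centers = np.array([Xc[lab == j].mean(0) if (lab == j).any()
                                else centers[j] for j in range(len(centers))])
        prototypes.append(centers)
    P = np.vstack(prototypes)
    Phi = np.exp(-((X[:, None] - P[None]) ** 2).sum(-1) / (2 * sigma ** 2))
    Phi = np.hstack([Phi, np.ones((len(X), 1))])  # bias column
    W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return P, W

def ml_rbf_predict(X, P, W, sigma=1.0):
    Phi = np.exp(-((X[:, None] - P[None]) ** 2).sum(-1) / (2 * sigma ** 2))
    Phi = np.hstack([Phi, np.ones((len(X), 1))])
    return Phi @ W

# Hypothetical 1-D instances with two labels.
X = np.array([[0.0], [0.5], [5.0], [5.5]])
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
P, W = ml_rbf_fit(X, Y, k=1)
pred = ml_rbf_predict(X, P, W)
```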

15.
Extracting Schema from an OEM Database
While the schema-less feature of the OEM (Object Exchange Model) gives flexibility in representing semi-structured data, it brings difficulty in formulating database queries. Extracting schema from an OEM database thus becomes an important research topic. This paper presents a new approach to this topic with the following features. (1) In addition to representing the nested label structure of an OEM database, the proposed OEM schema keeps up-to-date information about the instance objects of the database. This object-level information is useful in speeding up query evaluation. (2) The OEM schema is explicitly represented as a label-set, which is easy to construct and update. (3) The OEM schema of a database is statically built and dynamically updated. The time complexity of building the OEM schema is linear in the size of the OEM database. (4) The approach is applicable to a wide range of areas where the underlying schema is much smaller than the database itself (e.g., data warehouses built from a set of heterogeneous databases).
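The label-set idea can be illustrated with a sketch that walks a nested-dictionary stand-in for an OEM database and collects every label path in one linear pass (the data and representation are hypothetical, not the paper's):

```python
def extract_schema(obj, prefix=()):
    """Collect the label-set of an OEM-style nested object: every
    label path reachable from the root, in one pass over the data."""
    labels = set()
    if isinstance(obj, dict):
        for label, child in obj.items():
            path = prefix + (label,)
            labels.add(path)
            labels |= extract_schema(child, path)
    elif isinstance(obj, list):
        for child in obj:
            labels |= extract_schema(child, prefix)
    return labels

# Hypothetical semi-structured database; the second book lacks an author,
# yet the extracted label-set still covers both shapes.
db = {"library": {"book": [{"title": "DB", "author": {"name": "Lee"}},
                           {"title": "AI"}]}}
for path in sorted(extract_schema(db)):
    print("/".join(path))
```

The resulting label-set (5 paths) is much smaller than the database, the property that makes the approach attractive for heterogeneous data warehouses.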

16.
Representation and reasoning about actions is a widespread area in the domain of Artificial Intelligence. The representation involves natural language instructions, which are based on linguistic concepts, while the reasoning methodology deals with logical structures. In the computational domain, several theories pertaining to the state-space approach have been proposed to represent and reason about actions. Considering these aspects, this paper provides an account of work from the viewpoints of linguistics, logic, and action representation formalisms. Based on this study, the paper then proposes a seven-axis categorization scheme that can be used to compare and analyze different theories.

17.
Optimization of approximate decision rules in information systems based on VPRS
Based on the variable precision rough set (VPRS) model, an approximate reduct of approximately consistent equivalence classes is defined for decision information systems. A corresponding discernibility function is constructed, and Boolean reasoning is used to compute the approximate reducts of the approximately consistent equivalence classes, from which simplified forms of the approximate decision rules are obtained. The simplified decision rules derived by this method are consistent with the approximate decision rules of the original system.
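An illustrative sketch of the variable-precision idea (not the paper's reduct construction): an equivalence class belongs to the β-lower approximation of a concept when the fraction of its members inside the concept reaches the precision threshold β, so classes with a tolerable level of inconsistency are still accepted. The classes and concept below are invented.

```python
def beta_lower_approximation(classes, concept, beta=0.8):
    """Variable-precision lower approximation: keep an equivalence class E
    when |E ∩ concept| / |E| reaches the precision level beta."""
    concept = set(concept)
    return [E for E in classes
            if len(concept & set(E)) / len(E) >= beta]

classes = [[1, 2, 3, 4, 5], [6, 7], [8, 9, 10]]
concept = {1, 2, 3, 4, 8}
# At beta = 1.0 no class qualifies; relaxing beta admits the mostly-
# consistent class [1..5] (4 of 5 members inside the concept).
print(beta_lower_approximation(classes, concept, beta=0.8))
```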

18.
Exception handling plays a key role in dynamic workflow management, which enables streamlined business processes. Handling application-specific exceptions is a knowledge-intensive process involving different decision-making strategies and a variety of knowledge, especially much fuzzy knowledge. Current efforts in workflow exception management are not adequate to support knowledge-based exception handling. This paper proposes a hybrid exception handling approach based on two extended knowledge models, i.e., the generalized fuzzy event–condition–action (GFECA) rule and the typed fuzzy Petri net extended by process knowledge (TFPN-PK). The approach realizes integrated representation and reasoning of fuzzy and non-fuzzy knowledge, as well as of specific application domain knowledge and workflow process knowledge. In addition, it supports two handling strategies, i.e., direct decision and analysis-based decision, during exception management. The approach fills the gaps in existing related research, which provides only direct exception handling and neglects fuzzy knowledge. Based on TFPN-PK, a weighted fuzzy reasoning algorithm is designed to address the reasoning problem of uncertain goal propositions and known goal concepts by combining forward reasoning with backward reasoning, thereby facilitating cause analysis and the handling of workflow exceptions. A prototype system is developed to implement the proposed approach.
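A minimal sketch of weighted fuzzy forward reasoning in the spirit of the approach (the rule format, propositions, and certainty factors are invented; the paper's TFPN-PK algorithm also performs backward reasoning, which is omitted here):

```python
def fuzzy_forward(rules, facts, rounds=5):
    """Forward chaining over weighted fuzzy rules: a rule fires with the
    minimum truth of its antecedents times its certainty factor, and a
    proposition keeps the maximum truth value derived for it."""
    truth = dict(facts)
    for _ in range(rounds):
        for antecedents, consequent, cf in rules:
            if all(a in truth for a in antecedents):
                t = min(truth[a] for a in antecedents) * cf
                if t > truth.get(consequent, 0.0):
                    truth[consequent] = t
    return truth

# Hypothetical workflow-exception rules and fuzzy observations.
rules = [
    (("server_down", "payment_pending"), "order_exception", 0.9),
    (("order_exception",), "notify_admin", 0.8),
]
facts = {"server_down": 0.7, "payment_pending": 0.9}
truth = fuzzy_forward(rules, facts)
print(truth["order_exception"], truth["notify_admin"])
```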

19.
A path-method (PM) is a mechanism to retrieve or update information relevant to one class in an object-oriented database (OODB) that is not stored with that class but with some other class. The PM traverses a chain of classes and connections that ends at the class where the required information is stored. However, it is a difficult task for a user to write PMs, because doing so might require comprehensive knowledge of many classes of the conceptual schema, while a typical user often has incomplete or even inconsistent knowledge of the schema. We are currently developing a system, called Path-Method Generator (PMG), which generates PMs automatically according to a naive user's requests. One algorithm of PMG uses numerical access relevance between pairs of classes as a guide for the traversal of an OODB schema. In this paper we define the notion of access relevance to measure the significance of the (indirect) connection between any two classes in an OODB and present efficient algorithms to compute access relevance. Manual PM generation in an interoperable multi-object-oriented database (IM-OODB) is even more difficult than in a single OODB, since a user has to be familiar with several OODBs. We use a hierarchical approach to develop efficient online algorithms for the computation of access relevances in an IM-OODB, based on precomputed access relevances for each autonomous OODB. In an IM-OODB the access relevances are used as a guide in generating PMs between the classes of different OODBs.
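One natural reading of access relevance, sketched here as an assumption rather than the paper's definition, is a max-product measure over chains of weighted class connections, which (for weights in [0, 1]) can be computed Dijkstra-style:

```python
import heapq

def access_relevance(graph, src):
    """Best max-product relevance from src to every reachable class,
    assuming connection weights in [0, 1] (so products only shrink)."""
    best = {src: 1.0}
    heap = [(-1.0, src)]
    while heap:
        neg, u = heapq.heappop(heap)
        rel = -neg
        if rel < best.get(u, 0.0):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            cand = rel * w
            if cand > best.get(v, 0.0):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    return best

# Hypothetical schema graph: class -> [(connected class, weight)].
schema = {
    "Patient":   [("Visit", 0.9), ("Insurance", 0.4)],
    "Visit":     [("Doctor", 0.8)],
    "Insurance": [("Doctor", 0.5)],
}
rel = access_relevance(schema, "Patient")
# The Patient -> Visit -> Doctor chain (0.9 * 0.8) beats
# Patient -> Insurance -> Doctor (0.4 * 0.5), so that chain
# would guide path-method generation.
print(rel["Doctor"])
```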

20.
Text categorization (TC) is the automated assignment of text documents to predefined categories based on document contents. TC has been an application domain for many learning approaches, which have proved effective. Nevertheless, TC poses many challenges to machine learning. In this paper, we suggest, for text categorization, the integration of external WordNet lexical information to supplement training data for a semi-supervised clustering algorithm that (i) uses a finite design set of labeled data to (ii) help agglomerative hierarchical clustering (AHC) algorithms partition a finite set of unlabeled data and then (iii) terminates without the capacity to classify other objects. This algorithm is the "semi-supervised agglomerative hierarchical clustering algorithm" (ssAHC). Our experiments use the Reuters-21578 database and consist of binary classifications for categories selected from the 89 TOPICS classes of the Reuters collection. Using the vector space model (VSM), each document is represented by its original feature vector augmented with an external feature vector generated using WordNet. We verify experimentally that the integration of WordNet helps ssAHC improve its performance, effectively addresses the classification of documents into categories with few training documents, and does not interfere with the use of training data. © 2001 John Wiley & Sons, Inc.
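A hedged sketch of the feature-augmentation step: a toy synonym table stands in for WordNet, and each document's term-frequency vector is extended with down-weighted related terms (the weight, table, and sentences are invented):

```python
from collections import Counter

# Toy stand-in for WordNet: maps a term to related lexical features.
RELATED = {"car": ["automobile", "vehicle"], "bank": ["institution"]}

def vsm_vector(doc):
    """Plain term-frequency (bag-of-words) vector for a document."""
    return Counter(doc.lower().split())

def augment(vec, related=RELATED, weight=0.5):
    """Append an external feature vector built from lexical relations,
    down-weighted so it supplements rather than overrides the
    original terms."""
    extra = Counter()
    for term, tf in vec.items():
        for rel in related.get(term, []):
            extra[rel] += weight * tf
    vec = Counter(vec)
    vec.update(extra)  # Counter.update adds counts
    return vec

v = augment(vsm_vector("the car hit the car"))
print(v["car"], v["automobile"])
```

The point of the down-weighting is that the external features help match sparse categories without drowning out the document's own vocabulary.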
