Similar Documents
20 similar documents found.
1.
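Parallelizing incremental ETL processes is an effective way to improve the real-time freshness of data in an operational data store (ODS). Drawing on the theory of Communicating Sequential Processes (CSP), this paper studies a model of the incremental ETL process, formally analyzes how the execution states of incremental ETL process events change in a parallel environment, and proposes a parallel scheduling algorithm for incremental ETL processes, which addresses the question of how to schedule such processes in a parallel environment. Application and practice show that the model and algorithm place a low load on the source systems and deliver data with high timeliness.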

2.
Knowledge and Information Systems - NoSQL technologies have become a common component in many information systems and software applications. These technologies are focused on performance, enabling...

3.
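Research on Incremental Data Extraction Mechanisms in ETL   Total citations: 7, self-citations: 0, citations by others: 7
To achieve efficient updating of data in a data warehouse, this paper discusses the common mechanisms used for incremental data extraction in the ETL (extraction, transformation, and loading) subsystem of a data warehouse, describes their principles, preconditions, methods, and runtime efficiency in detail, and analyzes and compares the strengths and weaknesses of each mechanism in terms of compatibility, completeness, performance, and intrusiveness. Finally, it summarizes the main principles and criteria to follow when selecting an incremental data extraction mechanism.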

4.
Distributed and Parallel Databases - When NoSQL database systems are used in an agile software development setting, data model changes occur frequently and thus, data is routinely stored in...

5.
Update propagation and transaction atomicity are major obstacles to the development of replicated databases. Many practical applications, such as automated teller machine networks, flight reservation, and parts inventory control, do not require these properties. In this paper we present an approach for incrementally updating a distributed, replicated database without requiring multi-site atomic commit protocols. We prove that the mechanism is correct, as it asymptotically performs all the updates on all the copies. Our approach has two important characteristics: it is progressive and non-blocking. Progressive means that the transaction's coordinator always commits, possibly together with a group of other sites; the update is later propagated asynchronously to the remaining sites. Non-blocking means that each site can take unilateral decisions at each step of the algorithm. Sites that cannot commit updates are brought to the same final state by means of a reconciliation mechanism. This mechanism uses the history logs stored locally at each site to bring sites to agreement. It requires a small auxiliary data structure, called a reception vector, to keep track of the time up to which the other sites are guaranteed to be up to date. Several optimizations to the basic mechanism are also discussed. Recommended by: Ahmed Elmagarmid
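
A minimal Python sketch of the reconciliation idea above, under stated assumptions: the abstract gives no data layout, so the names `Update`, `Site`, and `reconcile` are hypothetical; the "time" recorded in the reception vector is modeled as a position in the sender's local history log; and conflicting writes to the same key are resolved last-writer-wins by (timestamp, origin). None of this is taken from the paper. It only shows how a per-peer reception vector lets a site ship exactly the part of its log a peer may be missing, with duplicate delivery made harmless by idempotent application.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Update:
    origin: str     # site where the update was first committed
    timestamp: int  # logical commit time at the origin
    key: str
    value: str


@dataclass
class Site:
    name: str
    data: dict = field(default_factory=dict)       # key -> ((timestamp, origin), value)
    log: list = field(default_factory=list)        # local history log, in arrival order
    reception: dict = field(default_factory=dict)  # peer -> log prefix the peer is known to hold (assumed model)

    def commit(self, timestamp: int, key: str, value: str) -> None:
        """Progressive commit: apply locally at once, propagate later via reconcile()."""
        self.apply(Update(self.name, timestamp, key, value))

    def apply(self, u: Update) -> None:
        """Apply each update at most once; the newest (timestamp, origin) wins per key."""
        if u in self.log:
            return  # already seen: re-delivery is a no-op
        self.log.append(u)
        stamp = (u.timestamp, u.origin)
        current = self.data.get(u.key)
        if current is None or stamp > current[0]:
            self.data[u.key] = (stamp, u.value)

    def value(self, key: str) -> str:
        return self.data[key][1]


def reconcile(sender: Site, receiver: Site) -> int:
    """Ship the suffix of the sender's log the receiver is not yet known to hold."""
    start = sender.reception.get(receiver.name, 0)
    missing = sender.log[start:]
    for u in missing:
        receiver.apply(u)
    sender.reception[receiver.name] = len(sender.log)  # receiver is now current up to here
    return len(missing)
```

For example, if site A commits x at time 1 and site B commits x at time 2, the passes A to B, B to C, and C to A leave all three sites holding B's value, and a repeated B to C pass ships nothing.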

6.
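Research on Load Balancing Strategies for Distributed ETL   Total citations: 1, self-citations: 0, citations by others: 1
Starting from an analysis of why load balancing matters in distributed ETL, and addressing the low efficiency of traditional ETL when applied to distributed data warehouses, this paper proposes a strategy that partitions the data extracted by distributed ETL nodes according to the type of data each node extracts, together with a new load balancing model, R-CN, which combines a chain-network model with Routers. On this basis, it proposes a load scheduling and balancing strategy for distributed ETL nodes that combines ETL data sharding with the R-CN model. The strategy substantially increases the data processing capacity of ETL nodes and effectively improves the efficiency of distributed ETL.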
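
The abstract does not describe how the R-CN model routes work, so the following sketch swaps in a plain, well-known heuristic to illustrate the general idea of balancing type-based shards across ETL nodes: greedy longest-processing-time (LPT) assignment, where the largest remaining shard always goes to the currently least-loaded node. The function `assign_shards`, the shard names, their sizes, and the node names are all hypothetical; this is not the paper's strategy.

```python
import heapq


def assign_shards(shard_sizes: dict[str, int], nodes: list[str]) -> dict[str, list[str]]:
    """Greedy LPT: hand the largest remaining shard to the least-loaded node (stand-in for R-CN)."""
    heap = [(0, i, node) for i, node in enumerate(nodes)]  # (current load, tie-breaker, node)
    heapq.heapify(heap)
    plan: dict[str, list[str]] = {node: [] for node in nodes}
    # Largest shards first; ties broken by shard name so the plan is deterministic.
    for shard, size in sorted(shard_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, i, node = heapq.heappop(heap)
        plan[node].append(shard)
        heapq.heappush(heap, (load + size, i, node))
    return plan
```

With hypothetical shards orders (50), logs (40), customers (30), and products (20) on two nodes, each node ends up with a load of 70.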

7.
NoSQL systems are increasingly adopted for Web applications that require a degree of scalability relational database systems cannot provide. Although NoSQL systems were not designed to support joins, the need to support them has emerged as these systems are applied to a wide variety of applications. Furthermore, joins performed in NoSQL systems are generally similarity joins, which find pairs of similar records, rather than exact-match joins. Since Web applications often use the MapReduce framework, we develop a solution for performing similarity joins in NoSQL systems using the MapReduce framework.
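
One common way to run a similarity join on MapReduce, sketched here in plain Python rather than on a real cluster (the paper's own algorithm is not described in the abstract): the map phase emits each record under every token it contains, the shuffle groups records by token, and the reduce phase verifies candidate pairs with Jaccard similarity. The functions `map_phase`, `shuffle`, and `reduce_phase`, the whitespace tokenizer, and the choice of Jaccard as the similarity measure are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def tokenize(text: str) -> frozenset:
    return frozenset(text.lower().split())


def jaccard(a: frozenset, b: frozenset) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def map_phase(records: dict):
    """Map: emit (token, record_id) once for every distinct token of every record."""
    for rid, text in records.items():
        for token in tokenize(text):
            yield token, rid


def shuffle(pairs) -> dict:
    """Shuffle: group record ids by token, as the framework would between the phases."""
    groups = defaultdict(list)
    for token, rid in pairs:
        groups[token].append(rid)
    return groups


def reduce_phase(groups: dict, records: dict, threshold: float) -> set:
    """Reduce: verify every pair sharing a token; the set drops pairs found under several tokens."""
    result = set()
    for rids in groups.values():
        for a, b in combinations(sorted(rids), 2):
            if jaccard(tokenize(records[a]), tokenize(records[b])) >= threshold:
                result.add((a, b))
    return result


def similarity_join(records: dict, threshold: float = 0.5) -> set:
    return reduce_phase(shuffle(map_phase(records)), records, threshold)
```

Emitting a record under every token keeps the sketch simple but produces many duplicate candidates for common tokens; practical set-similarity joins usually add a filter such as prefix filtering to shrink the candidate set.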

8.
The ICL Distributed Array Processor (DAP) is an SIMD array processor containing a large, 2-D array of bit serial processing elements. The architecture of the DAP makes it well suited to data processing applications where searching operations must be carried out on large numbers of data records. This paper discusses the use of the DAP for two such applications, these being the scanning of serial text files and the clustering of a range of types of database. The processing efficiency of the DAP, when compared with a serial processor, is greatest when fixed length records are processed.

9.
Over the last decade, 3D face models have been used extensively in many applications, such as face recognition, facial animation, and facial expression analysis. 3D Morphable Models (MMs) have become a popular tool for building 3D face models and fitting them to images. Critical to the success of MMs is the ability to build a generic 3D face model. The major limitations in the MM building process are: (1) collecting 3D data usually involves expensive laser scans and complex capture setups, (2) the number of available 3D databases is limited, and they typically lack expression variability, and (3) finding correspondences and registering the 3D model is a labor-intensive and error-prone process.

10.
In recent years, the widespread use of wireless sensor networks has drawn the attention of many researchers to distributed estimation. Two popular learning algorithms, incremental least mean square (ILMS) and diffusion least mean square (DLMS), have been reported for distributed estimation using the data collected from sensor nodes. However, because these algorithms are derivative-based, they tend to converge to local minima, particularly when minimizing multimodal cost functions. Hence, for problems such as distributed parameter estimation of IIR systems, alternative distributed algorithms need to be developed. With this in view, the present paper proposes two population-based incremental particle swarm optimization (IPSO) algorithms for estimating the parameters of noisy IIR systems. The proposed IPSO algorithms, however, perform poorly when the training samples are contaminated with outliers. To alleviate this problem, the paper proposes a robust distributed algorithm (RDIPSO) for the IIR system identification task. Simulation results on benchmark IIR systems demonstrate that the proposed algorithms provide excellent identification performance in all cases, even when the training samples are contaminated with outliers.

11.
Knowledge discovery in databases using lattices   Total citations: 3, self-citations: 0, citations by others: 3
The rapid pace at which data gathering, storage, and distribution technologies are developing is outpacing our advances in techniques for helping humans analyse, understand, and digest the vast amounts of resulting data. This has led to the birth of knowledge discovery in databases (KDD) and data mining: a process whose goal is to selectively extract knowledge from data. A range of techniques, including neural networks, rule-based systems, case-based reasoning, machine learning, and statistics, can be applied to the problem. We discuss the use of concept lattices to determine dependencies in the data mining process. We first define concept lattices and then show how they represent knowledge and how they are formed from raw data. Finally, we show how the lattice-based technique addresses different processes in KDD, especially the visualization and navigation of discovered knowledge.
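
To make "how they are formed from raw data" concrete, here is a small sketch that derives every formal concept (extent, intent) from a binary object-attribute table, together with the covering edges of the resulting lattice. It relies on the standard fact that every intent is an intersection of object rows (plus the full attribute set, which is the intent of the empty object set). The brute-force enumeration is chosen for clarity rather than efficiency and is not necessarily the construction the paper uses; the function names and the toy context are hypothetical.

```python
from itertools import product

Concept = tuple[frozenset, frozenset]  # (extent: objects, intent: attributes)


def concepts(context: dict[str, set[str]]) -> list[Concept]:
    """All formal concepts of a binary object -> attributes context."""
    all_attrs = frozenset().union(*context.values())
    # Close the set of candidate intents under intersection with each object row.
    intents = {all_attrs}
    for row in context.values():
        intents |= {intent & row for intent in intents}
    result = []
    for intent in intents:
        # Extent: every object that has all attributes of the intent.
        extent = frozenset(g for g, row in context.items() if intent <= row)
        result.append((extent, intent))
    return sorted(result, key=lambda c: (len(c[0]), sorted(c[0])))


def hasse_edges(cs: list[Concept]) -> list[tuple[Concept, Concept]]:
    """Cover relation: lower < upper by extent, with no concept strictly in between."""
    return [
        (lower, upper)
        for lower, upper in product(cs, repeat=2)
        if lower[0] < upper[0] and not any(lower[0] < mid[0] < upper[0] for mid in cs)
    ]
```

For the context duck: {flies, swims, bird}, eagle: {flies, bird}, fish: {swims}, this yields four concepts, for example ({duck, eagle}, {flies, bird}), linked by four covering edges.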

12.
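Banking systems use a growing number and variety of databases, which places higher demands on database operations and maintenance (O&M). By building a unified, centralized database O&M management platform, this work integrates the O&M of heterogeneous databases, reduces the staffing and vendor resource costs of database O&M, enables a proactive and preventive O&M model, improves the efficiency of database O&M management and the timeliness of incident handling, and enhances the stability of the banking system.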

13.
Development processes in engineering disciplines are inherently complex. Throughout the development process, the system to be built is modeled from different perspectives, on different levels of abstraction, and with different intents. Since state-of-the-art development processes are highly incremental and iterative, models of the system are not constructed in one shot; rather, they are extended and improved repeatedly. Furthermore, models are related by manifold dependencies and need to be maintained mutually consistent with respect to these dependencies. Thus, tools are urgently needed which assist developers in maintaining consistency between inter-dependent and evolving models. These tools have to operate incrementally, i.e., they have to propagate changes performed on one model into related models which are affected by these changes. In addition, they need to support user interactions in settings where the effects of changes cannot be determined automatically and deterministically. We present an algorithm for incremental and interactive consistency maintenance which meets these requirements. The algorithm is based on graphs, which are used as the data model for representing the models to be integrated, and graph transformation rules, which describe the modifications of the graphs to be performed on a high level of abstraction. This paper is an extended version of [6].

14.
Incremental mining has attracted the attention of many researchers due to its usefulness in online applications. Many algorithms have thus been proposed for incrementally mining frequent itemsets. Maintaining a frequent-itemset lattice (FIL) is difficult for databases with large numbers of frequent itemsets, especially huge databases, due to the storage of links of nodes in the lattice. However, generating association rules from a FIL has been shown to be more effective than traditional methods such as directly generating rules from frequent itemsets or frequent closed itemsets. Therefore, when the number of frequent itemsets is not huge (i.e., they can be stored in the lattice without excessive memory overhead), the lattice-based approach outperforms approaches which mine association rules from frequent itemsets/frequent closed itemsets. However, incremental algorithms for building FILs have not yet been proposed. This paper proposes an effective approach for the maintenance of a FIL based on the pre-large concept in incremental mining. The building process of a FIL is first improved using two proposed theorems regarding the paternity relation between two nodes in the lattice. An effective approach for maintaining a FIL with dynamically inserted data is then proposed based on the pre-large and the diffset concepts. The experimental results show that the proposed approach outperforms the batch approach for building a FIL in terms of execution time.
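
The pre-large concept mentioned above is usually formulated (following Hong et al.) with two support thresholds: an upper threshold S_u, the ordinary minimum support, and a lower threshold S_l < S_u. Itemsets are large, pre-large, or small, and as long as the number of newly inserted transactions stays within the safety bound f = floor((S_u - S_l) * d / (1 - S_u)), where d is the size of the original database, no itemset that was small in the original database can become large, so the original data need not be rescanned. The sketch below shows only this classification and bound; it does not cover the paper's lattice maintenance or diffsets, and the function names are hypothetical.

```python
import math


def classify(count: int, total: int, upper: float, lower: float) -> str:
    """Large, pre-large, or small under the two support thresholds (lower < upper)."""
    support = count / total
    if support >= upper:
        return "large"
    if support >= lower:
        return "pre-large"
    return "small"


def safety_bound(original_size: int, upper: float, lower: float) -> int:
    """Number of inserted transactions tolerated before an itemset that was small
    in the original database could become large (so a rescan would be needed)."""
    return math.floor((upper - lower) * original_size / (1 - upper))


def needs_rescan(inserted_so_far: int, original_size: int, upper: float, lower: float) -> bool:
    return inserted_so_far > safety_bound(original_size, upper, lower)
```

For instance, with d = 1000, S_u = 0.06, and S_l = 0.04, the bound is floor(20 / 0.94) = 21 inserted transactions.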

15.
Eliminating technical obstacles in innovation pipelines using CAIs   Total citations: 1, self-citations: 0, citations by others: 1
Ill-structured problems are a difficult type of problem to solve. When such a problem is encountered in an innovation pipeline, a technical obstacle emerges. A comparison of the definitions of ill-structured problems and of inventive problems in the theory of inventive problem solving (TRIZ) shows that, in engineering, the latter are a subset of the former. As a result, computer-aided innovation (CAI) systems (CAIs) based on TRIZ can be applied to solve some of the ill-structured problems that appear in an innovation pipeline. A model including two technical obstacles is developed for an innovation pipeline, and a case study is carried out to show how to eliminate the technical obstacles using the model.

16.
Data mining can be defined as a process for finding trends and patterns in large data sets. An important technique for extracting useful information, such as regularities, from (usually historical) data is association rule mining. Most research on data mining concentrates on the traditional relational data model. The query flocks technique, however, which extends the concept of association rule mining with a "generate-and-test" model for different kinds of patterns, can also be applied to deductive databases. In this paper, the query flocks technique is extended with view definitions, including recursive views. Although in our system the query flocks technique can be applied to a database schema that includes both the intensional database (IDB), or rules, and the extensional database (EDB), or tabled relations, we have designed an architecture that compiles query flocks from Datalog into SQL so that commercially available database management systems (DBMS) can serve as the underlying engine of our system. However, since recursive Datalog views (IDBs) cannot be converted directly into SQL statements, they are materialized before the final compilation step. Optimizations suited to the extended query flocks are also introduced on this architecture. Using a prototype system developed on a commercial database environment, the advantages of the new architecture and of the optimizations are demonstrated.
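
A typical query flock is the market-basket pair flock: a parameterized query such as answer(B) :- baskets(B, $1) AND baskets(B, $2) together with a filter such as COUNT(answer.B) >= min_support, which keeps every parameter pair ($1, $2) whose answer set is large enough. Since the paper compiles flocks into SQL to reuse a commercial DBMS, the sketch below shows what such a compiled, non-recursive flock might look like, using Python's built-in sqlite3 as a stand-in engine. The table and column names and the `pair_flock` function are hypothetical, the $1 < $2 ordering is an assumption that avoids reporting each pair twice, and recursive views (which the paper materializes first) are not covered.

```python
import sqlite3


def pair_flock(baskets: dict[int, list[str]], min_support: int) -> list[tuple[str, str, int]]:
    """Evaluate the pair flock: keep ($1, $2) whose answer set holds >= min_support baskets."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE baskets (bid INTEGER, item TEXT)")  # EDB relation baskets(B, I)
    con.executemany(
        "INSERT INTO baskets VALUES (?, ?)",
        [(bid, item) for bid, items in baskets.items() for item in set(items)],
    )
    # Generate: self-join binds the parameters $1, $2; test: HAVING applies the flock filter.
    rows = con.execute(
        """
        SELECT b1.item, b2.item, COUNT(*)
        FROM baskets AS b1
        JOIN baskets AS b2 ON b1.bid = b2.bid AND b1.item < b2.item
        GROUP BY b1.item, b2.item
        HAVING COUNT(*) >= ?
        ORDER BY b1.item, b2.item
        """,
        (min_support,),
    ).fetchall()
    con.close()
    return rows
```

Pushing the filter into a HAVING clause lets the DBMS do both the "generate" (the join that binds parameters) and the "test" (the count filter) in one statement.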

17.
In the context of a federated database containing similar information in local component databases, we consider the problem of discovering the different local representations which database designers might use to map the same entity. An extension of the normal SQL system catalog to hold additional domain metadata is discussed. This enables common local mappings to be discovered and allows SQL retrieval queries expressed in terms of the logical federated schema to be transformed to match the appropriate local database schema. The result is a practical means of coping with differing mappings in federated databases without having to resort to special higher order languages or a layer of information mediation middleware. Copyright © 1999 John Wiley & Sons, Ltd.

18.
The rapid growth of life science databases demands the fusion of knowledge from heterogeneous databases to answer complex biological questions. The discrepancies in nomenclature, various schemas and incompatible formats of biological databases, however, result in a significant lack of interoperability among databases. Therefore, data preparation is a key prerequisite for biological database mining. Integrating diverse biological molecular databases is an essential action to cope with the heterogeneity of biological databases and guarantee efficient data mining. However, the inconsistency in biological databases is a key issue for data integration. This paper proposes a framework to detect the inconsistency in biological databases using ontologies. A numeric estimate is provided to measure the inconsistency and identify those biological databases that are appropriate for further mining applications. This aids in enhancing the quality of databases and guaranteeing accurate and efficient mining of biological databases.

19.
Data cubes capture general trends aggregated from the multidimensional data of a categorical relation. Given two relations, interesting knowledge can be revealed by comparing the two underlying data cubes: trend reversals or particular phenomena that are irrelevant in one data cube may clearly appear in the other. To capture such trend reversals, we have proposed the concept of the Emerging Cube. In this article, we focus on two new approaches for computing Emerging Cubes. Both are designed to be integrated within standard OLAP systems, since they require no additional or complex data structures. Our first approach is based on SQL. We propose three queries with different aims. The most efficient query uses a particular data structure that merges the two input relations so that a single data cube computation suffices. This query works well even when voluminous data are processed. Our second approach is algorithmic and aims to improve efficiency and scalability while preserving integration capability. The E-Idea algorithm works à la BUC and takes the specific features of Emerging Cubes into account. E-Idea is automaton-based and adapts its behavior to the current execution context. Our proposals are validated by various experiments in which we measure query response time. Comparative experiments show that E-Idea's response time is proportional to the size of the Emerging Cube. Experiments also demonstrate that Emerging Cubes can be extracted in practice within a time compatible with user expectations.
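
The merged-relation idea above (one cube computation over both inputs) can be sketched in Python as follows. This is an illustration under assumptions, not the paper's SQL queries or the E-Idea algorithm: the measure is assumed to be a simple count, "*" is assumed to denote a rolled-up dimension, and a cell is treated as emerging when its count is below a threshold in the first relation and at least a threshold in the second. The function names and thresholds are hypothetical.

```python
from itertools import product

ALL = "*"  # roll-up value for a dimension


def merged_cube(r1: list[tuple], r2: list[tuple]) -> dict[tuple, list[int]]:
    """Single cube computation over both relations: each cell keeps [count in r1, count in r2]."""
    counts: dict[tuple, list[int]] = {}
    for source, relation in ((0, r1), (1, r2)):
        for row in relation:
            # Every generalisation of the row: each dimension is kept or rolled up to ALL.
            for mask in product((False, True), repeat=len(row)):
                cell = tuple(ALL if rolled else v for v, rolled in zip(row, mask))
                counts.setdefault(cell, [0, 0])[source] += 1
    return counts


def emerging_cells(r1, r2, max_count_r1: int, min_count_r2: int) -> dict[tuple, tuple[int, int]]:
    """Cells that are rare in r1 (count < max_count_r1) but frequent in r2 (count >= min_count_r2)."""
    return {
        cell: (c1, c2)
        for cell, (c1, c2) in merged_cube(r1, r2).items()
        if c1 < max_count_r1 and c2 >= min_count_r2
    }
```

Enumerating all 2^n roll-ups per row is only practical for a handful of dimensions; bottom-up algorithms in the BUC family exist precisely to avoid materializing cells that cannot pass the thresholds.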

20.
Distributed databases allow us to integrate data from different sources which have not previously been combined. The Dempster–Shafer theory of evidence and evidential reasoning are particularly suited to the integration of distributed databases. Evidential functions are suited to represent evidence from different sources. Evidential reasoning is carried out by the well‐known orthogonal sum. Previous work has defined linguistic summaries to discover knowledge by using fuzzy set theory and using evidence theory to define summaries. In this paper we study linguistic summaries and their applications to knowledge discovery in distributed databases. © 2000 John Wiley & Sons, Inc.
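
The orthogonal sum mentioned above is Dempster's rule of combination: for mass functions m1 and m2 over the same frame, m(A) = (sum of m1(B) * m2(C) over all B, C with B ∩ C = A) / (1 - K) for non-empty A, where K is the total mass of pairs with an empty intersection. A minimal Python sketch, assuming each source's evidence is given as a mapping from focal elements (frozensets) to masses; exact fractions are used so the renormalization is easy to check. The function name and the example masses are hypothetical and not taken from the paper.

```python
from fractions import Fraction
from itertools import product

Mass = dict[frozenset, Fraction]  # focal element -> mass


def orthogonal_sum(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: multiply masses, keep non-empty intersections, renormalise by 1 - K."""
    combined: Mass = {}
    conflict = Fraction(0)  # K: total mass that falls on the empty set
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        common = b & c
        if common:
            combined[common] = combined.get(common, Fraction(0)) + mb * mc
        else:
            conflict += mb * mc
    if conflict == 1:
        raise ValueError("totally conflicting evidence: the orthogonal sum is undefined")
    return {a: v / (1 - conflict) for a, v in combined.items()}
```

For example, on the frame {a, b, c}, combining m1({a}) = 3/5, m1(frame) = 2/5 with m2({b}) = 1/2, m2(frame) = 1/2 gives K = 3/10 and the combined masses {a}: 3/7, {b}: 2/7, frame: 2/7.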


Copyright © Beijing Qinyun Technology Development Co., Ltd.   Beijing ICP Filing No. 09084417