Similar Documents
20 similar documents found
1.
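With the spread of object-oriented programming and the wide adoption of Web technologies, the database field urgently needs to support these new technologies efficiently, yet traditional relational databases perform poorly here because of their inherent limitations. To address this problem, and in light of the current state of object databases and XML databases, this paper proposes a conceptual model of an application-driven, multi-interface, hybrid post-relational database architecture and validates it theoretically.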

2.
With the era of data explosion arriving, multidimensional visualization, one of the most helpful data analysis technologies, is increasingly applied to multidimensional data analysis tasks. Correlation analysis is an efficient technique for revealing the complex relationships among the dimensions of multidimensional data. However, for multidimensional data with complex dimension features, traditional correlation analysis methods are inaccurate and limited. In this paper, we introduce an improved Pearson correlation coefficient to detect linear correlations between dimensions and mutual information correlation analysis to detect non-linear ones. For the linear case, all dimensions are classified into three groups according to their distributions, and appropriate parameters are then selected for each group to calculate their correlations. For the non-linear case, we cluster the data within each dimension and then calculate the resulting probability distributions to analyze the dimensions' correlations and dependencies based on mutual information. Finally, we use the relationships between dimensions as the criteria for interactive axis ordering in parallel coordinate displays.
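The two correlation measures named above can be sketched in a few lines. This is a minimal illustration, not the authors' method: it uses the plain Pearson coefficient rather than the improved, group-specific variant, and it substitutes equal-width binning (a hypothetical `bins` parameter) for the per-dimension clustering step before estimating mutual information.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def discretize(values, bins):
    """Equal-width binning (a stand-in for clustering the data within a dimension)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant dimension
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys, bins=4):
    """Mutual information, in bits, between two discretized dimensions."""
    a, b = discretize(xs, bins), discretize(ys, bins)
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # p(i,j) * log2( p(i,j) / (p(i) p(j)) ), with counts converted to probabilities
    return sum(
        (c / n) * math.log2(c * n / (pa[i] * pb[j]))
        for (i, j), c in pab.items()
    )


if __name__ == "__main__":
    xs = [i / 10 for i in range(-20, 21)]
    quad = [x * x for x in xs]  # non-linear, symmetric dependence
    print(pearson(xs, quad), mutual_information(xs, quad))
```

On the symmetric quadratic relationship the Pearson coefficient comes out essentially zero while the mutual information is clearly positive, which is why the non-linear case needs a separate measure.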

3.
A new multidimensional data structure, the multidimensional tree (MD-tree), is proposed. The MD-tree extends the concept of the B-tree to multidimensional data, so it is a height-balanced tree similar to the B-tree. Its theoretical worst-case storage utilization is guaranteed to exceed 66.7% (2/3) of full capacity. The structure of the MD-tree and the algorithms for insertion, deletion, and spatial searching are described. The performance of the MD-tree is compared with that of conventional methods in a series of simulation tests. The results indicate that storage utilization exceeds 80% in practice and that retrieval performance and dynamic characteristics are superior to those of conventional methods.

4.
Innovations in Systems and Software Engineering - Materialized views are heavily used to speed up the query response time of any data-centric application. In the literature, the construction and...

5.
Traffic congestion is a major concern in many cities around the world. Previous work mainly focuses on predicting congestion and analyzing traffic flows, while the congestion correlation between road segments has not yet been studied. In this paper, we propose a three-phase framework to explore the congestion correlation between road segments using multiple real-world data sources. In the first phase, we extract congestion information on each road segment from the GPS trajectories of over 10,000 taxis, define congestion correlation, and propose a corresponding mining algorithm to find all existing correlations. In the second phase, we extract various features for each pair of road segments from road network and POI data. In the last phase, the results of the first two phases are fed into several classifiers to predict congestion correlation. We further analyze the important features and evaluate the trained classifiers through experiments. We found several important patterns that lead to high or low congestion correlation, and they can facilitate building various transportation applications. In addition, we found that traffic congestion correlation has obvious directionality and transmissibility. The proposed techniques are general and can be applied to other pairwise correlation analyses.

6.
Correlation analysis is regarded as a significant challenge in mining multidimensional data streams. Existing correlation analysis methods for data stream mining generally focus on one-dimensional data streams, so the identification of underlying correlations among multivariate arrays (e.g., sensor data) has long been neglected, and canonical correlation analysis (CCA) has rarely been applied to multidimensional data streams. In this study, a novel CCA-based correlation analysis algorithm, called ApproxCCA, is proposed to explore the correlations between two multidimensional data streams in resource-constrained environments. By introducing unequal probability sampling and low-rank approximation to reduce the dimensionality of the product matrix composed of the sample covariance matrix and the sample variance matrix, ApproxCCA improves computational efficiency while preserving analytical precision. Experimental results on synthetic and real data sets indicate that ApproxCCA overcomes the computational bottleneck of traditional CCA and accurately detects the correlations between two multidimensional data streams.

7.
The most effective technique for enhancing the performance of multidimensional databases is to materialize redundant aggregates called views. In the classical approach to materialization, each view includes all and only the measures of the cube it aggregates. In this paper, we investigate the benefits of materializing views in vertical fragments, with the aim of minimizing workload response time. We formalize the fragmentation problem as a 0–1 integer linear programming problem, which is then solved with a standard integer programming solver to determine the optimal fragmentation for a given workload. Finally, we demonstrate the usefulness of fragmentation with a large set of experimental results based on the TPC-H benchmark.
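To make the search space concrete, the sketch below enumerates every vertical fragmentation (set partition) of a handful of measures and keeps the cheapest one. It is a brute-force stand-in for the 0–1 ILP and solver described above, feasible only for a few measures, and its cost model is an assumption for illustration, not the paper's: a query scans each fragment holding any measure it needs, paying the fragment width plus a hypothetical fixed `overhead` per fragment for reassembling tuples.

```python
def partitions(items):
    """Yield every set partition of the list `items` as a list of frozensets."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # put `first` into each existing fragment, or into a new one
        for i in range(len(part)):
            yield part[:i] + [part[i] | {first}] + part[i + 1:]
        yield part + [frozenset([first])]


def workload_cost(fragments, workload, overhead=1):
    """Toy cost: each query reads every fragment sharing a measure with it."""
    total = 0
    for measures, freq in workload:
        total += freq * sum(len(f) + overhead for f in fragments if f & measures)
    return total


def best_fragmentation(measures, workload, overhead=1):
    """Exhaustive search over all partitions (exponential; tiny inputs only)."""
    return min(partitions(list(measures)),
               key=lambda p: workload_cost(p, workload, overhead))


if __name__ == "__main__":
    # hypothetical workload: (measures a query needs, query frequency)
    workload = [({"qty", "price"}, 10), ({"discount"}, 5),
                ({"qty", "price", "discount", "tax"}, 1)]
    best = best_fragmentation(["qty", "price", "discount", "tax"], workload)
    print(sorted(sorted(f) for f in best), workload_cost(best, workload))
```

With this toy workload the frequent query on `qty` and `price` keeps those two measures in one fragment, while `discount` and `tax`, which are rarely read together, end up in separate fragments. A real formulation would replace the enumeration with binary variables and a solver, as the abstract describes.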

8.
Using mobile brokerage service as an example, we propose and test a multidimensional and hierarchical model of mobile service (m-service) quality using a sample of 338 respondents from the two largest m-service providers in China: China Mobile and China Unicom. Through three-stage validation, we are able to confirm all three levels of the proposed hierarchical structure where a customer’s perceived m-service quality includes primary dimensions of interaction, outcome, and environment qualities. Each primary dimension further has its sub-dimensions. Our empirical results also show that corporate image moderates the effects of environment and outcome qualities on the service quality. Our proposed model provides implications for future research on mobile commerce.

9.
A new method for multidimensional distribution analysis, which applies a data compression technique to knowledge-based mean-force potentials between residues in order to analyze protein sequence-structure compatibility, performs much better than conventional 1D distance-based potentials derived from binned distributions.

10.
By introducing a reordering scheme for multidimensional data, we propose a unified fast algorithm that jointly employs the one-dimensional W transform and the multidimensional discrete polynomial transform to compute eleven types of multidimensional discrete orthogonal transforms: three types of m-dimensional discrete cosine transforms (m-D DCTs), four types of m-dimensional discrete W transforms (m-D DWTs), with the m-dimensional Hartley transform as a special case, and four types of generalized discrete Fourier transforms (m-D GDFTs). For real input, the number of multiplications the proposed algorithm needs for all eleven types of m-D discrete orthogonal transforms is only 1/m of that required by the commonly used row-column methods; for complex input, it is further reduced to 1/(2m). The number of additions required is also reduced considerably. Furthermore, the proposed algorithm has a simple computational structure, is easy to implement on a computer, and th

11.
An algorithm that fits a continuous function to sparse multidimensional data is presented. The algorithm uses a representation in terms of lower-dimensional component functions of coordinates defined in an automated way and also permits dimensionality reduction. Neural networks are used to construct the component functions.

Program summary

Program title: RS_HDMR_NN
Catalogue identifier: AEEI_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEI_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 19 566
No. of bytes in distributed program, including test data, etc.: 327 856
Distribution format: tar.gz
Programming language: MatLab R2007b
Computer: any computer running MatLab
Operating system: Windows XP, Windows Vista, UNIX, Linux
Classification: 4.9
External routines: Neural Network Toolbox Version 5.1 (R2007b)
Nature of problem: Fitting a smooth, easily integrable and differentiable function to a very sparse (∼2–3 points per dimension), multidimensional (D ≥ 6), large (∼10⁴–10⁵ data) dataset.
Solution method: A multivariate function is represented as a sum of a small number of terms, each of which is a low-dimensional function of optimised coordinates. The optimal coordinates reduce both the dimensionality and the number of terms. Neural networks (including exponential neurons) are used to obtain a general and robust method and a functional form that is easily differentiated and integrated (in the case of exponential neurons).
Running time: Depends strongly on the dataset to be modelled and the chosen structure of the approximating function; ranges from about a minute for ∼10³ data in 3-D to about a day for ∼10⁵ data in 15-D.

12.

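In the field of artificial intelligence, ChatGPT has attracted wide attention as an important technological breakthrough. This paper discusses ChatGPT's position in the development of AI and its impact on the future of AI. First, it introduces ChatGPT's excellent dialogue-generation ability, which enables it to handle almost all natural language processing tasks and will lead to its use in many scenarios as a data generator, knowledge-mining tool, model dispatcher, and natural interaction interface. Next, it analyzes ChatGPT's limitations with respect to factual errors, toxic content generation, safety, fairness, interpretability, and data privacy, and discusses why, as a tool that assists humans, ChatGPT needs clearly defined capability boundaries and an expanded range of capabilities. Then, it analyzes the definition of "truth" from the classical representation of concepts, and explains why ChatGPT cannot distinguish truth from falsehood in terms of the non-equivalence of the "three designations" of a concept. In discussing the future of AI, it analyzes short- to medium-term technology trends in terms of expanding applications, overcoming limitations, and exploring theory, and discusses the long-term development path in terms of the relationships among four levels: perceptual, cognitive, emotional, and behavioral intelligence. Finally, it discusses the potential impact of ChatGPT, as a representative of cognitive intelligence, on cognitive costs, educational requirements, the understanding of the Turing test, opportunities and challenges for academia, information cocoons, energy and environmental issues, and productivity gains.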


13.
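Objective: Image co-segmentation separates foreground objects from background regions using multiple reference images and has been widely applied in areas such as image classification and object recognition. However, most existing co-segmentation algorithms are only suitable for settings where the background varies greatly while the foreground remains nearly unchanged. To address this, a new unsupervised co-segmentation algorithm is proposed. Method: The proposed method is unsupervised. Building on hierarchical image segmentation, it uses a progressive optimization framework to update the estimates of the foreground and background models separately, and it incorporates hierarchical region-similarity associations both within each image and across images to further improve the robustness of these model estimates. The method requires no prior sample learning, can process two or more images simultaneously, handles cases with multiple foreground objects, and adapts well to changes in the foreground object class. Results: Experiments on the iCoseg and MSRC image datasets show that the algorithm does not require images to differ significantly in foreground and background and, compared with existing classical methods, is better suited to more general scenes, such as those with drastic foreground changes or multiple foreground objects. Conclusion: The method effectively distinguishes foreground from background by recursively estimating the appearance distributions of the superpixels obtained from hierarchical image segmentation, and it fuses region associations within images and across different images to improve the consistency of the foreground and background distribution estimates. Experiments show that when the foreground varies significantly, the method is more robust than existing methods.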

14.
Successful data warehouse (DW) design needs to be based on a requirement analysis phase in order to adequately represent the information needs of DW users. Moreover, since the DW integrates the information provided by data sources, it is also crucial to take these sources into account throughout the development process to obtain a consistent reconciliation of data sources and information needs. In this paper, we start by summarizing our approach to specifying user requirements for data warehouses and obtaining a conceptual multidimensional model that captures these requirements. Then, we use multidimensional normal forms to define a set of Query/View/Transformation (QVT) relations that ensure the conceptual multidimensional model obtained from user requirements agrees with the available data sources that will populate the DW. Thus, we propose a hybrid approach to developing DWs: we first obtain the conceptual multidimensional model of the DW from user requirements, and then we verify and enforce its correctness against the data sources using a set of QVT relations based on multidimensional normal forms. Finally, we provide some snapshots of the CASE tool we used to implement our QVT relations.

15.
Random indexing (RI) is a lightweight dimension reduction method that is used, for example, to approximate vector semantic relationships in online natural language processing systems. Here we generalise RI to multidimensional arrays and thereby enable approximation of higher-order statistical relationships in data. The generalised method is a sparse implementation of random projections, which are also the theoretical basis for ordinary RI and other randomisation approaches to dimensionality reduction and data representation. We present numerical experiments demonstrating that a multidimensional generalisation of RI is feasible, including comparisons with ordinary RI and principal component analysis. The RI method is well suited for online processing of data streams because relationship weights can be updated incrementally in a fixed-size distributed representation, and inner products can be approximated on the fly at low computational cost. An open source implementation of generalised RI is provided.
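One way to read the generalisation to arrays is to encode each index of a second-order weight W[i][j] with its own sparse random index vector and accumulate weight × (r_i ⊗ r_j) into a fixed-size state matrix S; reading back r_iᵀ S r_j then approximates W[i][j]. The sketch below illustrates that idea under stated assumptions: ternary index vectors with a hypothetical `nonzeros` count k of ±1 entries, created lazily per key, and estimates normalised by k² because each index vector has squared norm k. It is not the authors' open source implementation.

```python
import random


def index_vector(dim, nonzeros, rng):
    """Sparse ternary random index vector: `nonzeros` entries set to +1 or -1."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((1, -1))
    return vec


class RandomIndex2D:
    """Fixed-size d1 x d2 state approximating a large sparse weight matrix W[i][j]."""

    def __init__(self, d1, d2, nonzeros=4, seed=0):
        self.rng = random.Random(seed)
        self.d1, self.d2, self.k = d1, d2, nonzeros
        self.state = [[0.0] * d2 for _ in range(d1)]
        self.rows, self.cols = {}, {}  # lazily created index vectors per key

    def _vec(self, table, key, dim):
        if key not in table:
            table[key] = index_vector(dim, self.k, self.rng)
        return table[key]

    def update(self, i, j, weight):
        """Incrementally add weight * outer(r_i, r_j) to the state."""
        ri = self._vec(self.rows, i, self.d1)
        rj = self._vec(self.cols, j, self.d2)
        for a, x in enumerate(ri):
            if x:
                for b, y in enumerate(rj):
                    if y:
                        self.state[a][b] += weight * x * y

    def estimate(self, i, j):
        """Approximate W[i][j] as r_i^T S r_j / k^2."""
        ri = self._vec(self.rows, i, self.d1)
        rj = self._vec(self.cols, j, self.d2)
        total = sum(x * self.state[a][b] * y
                    for a, x in enumerate(ri) if x
                    for b, y in enumerate(rj) if y)
        return total / (self.k * self.k)
```

The state stays d1 × d2 no matter how many distinct keys are updated. Overlaps between index vectors add noise to the estimates, which roughly shrinks as the state dimensions grow relative to the number of stored relationships.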

16.
Management of multidimensional discrete data
Spatial database management involves two main categories of data: vector and raster data. The former has been investigated in depth; the latter still lacks a sound framework. Current DBMSs either regard raster data as pure byte sequences whose underlying semantics are unknown to the DBMS, or do not complement array structures with storage mechanisms suitable for huge arrays, or are designed as specialized systems with sophisticated imaging functionality but no general database capabilities (e.g., a query language). Many types of array data will require database support in the future, notably 2-D images, audio data and general signal time series (1-D), animations (3-D), static or time-variant voxel fields (3-D and 4-D), and the ISO/IEC PIKS (Programmer's Imaging Kernel System) BasicImage type (5-D). In this article, we propose comprehensive support for multidimensional discrete data (MDD) in databases, including operations on arrays of arbitrary size over arbitrary data types. A set of requirements is developed, a small set of language constructs based on a formal algebraic semantics is proposed, and a novel MDD architecture is outlined to provide the basis for efficient MDD query evaluation.
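As an illustration of array operations over arbitrary domains and cell types, the sketch below stores an n-D array densely in row-major order over an inclusive integer domain and supports cell access, sub-array extraction, and a cell-wise operation. The class and method names (`MDD`, `trim`, `induced`) are hypothetical, and the code does not reproduce the article's language constructs, algebra, or storage architecture.

```python
from itertools import product
from typing import Callable, List, Sequence


class MDD:
    """Hypothetical dense n-D array over an inclusive integer domain [lo, hi]."""

    def __init__(self, lo: Sequence[int], hi: Sequence[int], cells: List):
        self.lo, self.hi = tuple(lo), tuple(hi)
        self.shape = tuple(h - low + 1 for low, h in zip(self.lo, self.hi))
        size = 1
        for extent in self.shape:
            size *= extent
        if len(cells) != size:
            raise ValueError("cell count does not match the domain")
        self.cells = list(cells)  # row-major: last axis varies fastest

    def _offset(self, point: Sequence[int]) -> int:
        """Linearize an n-D coordinate into an index into self.cells."""
        off = 0
        for p, low, extent in zip(point, self.lo, self.shape):
            if not low <= p < low + extent:
                raise IndexError(f"{tuple(point)} outside {self.lo}..{self.hi}")
            off = off * extent + (p - low)
        return off

    def __getitem__(self, point: Sequence[int]):
        return self.cells[self._offset(point)]

    def trim(self, lo: Sequence[int], hi: Sequence[int]) -> "MDD":
        """Extract the sub-array over [lo, hi], keeping the original coordinates."""
        ranges = [range(a, b + 1) for a, b in zip(lo, hi)]
        return MDD(lo, hi, [self[p] for p in product(*ranges)])

    def induced(self, fn: Callable) -> "MDD":
        """Apply fn to every cell, producing a new array over the same domain."""
        return MDD(self.lo, self.hi, [fn(c) for c in self.cells])
```

Because `trim` keeps the original coordinates, `a.trim((1, 1), (2, 2))[(2, 2)]` refers to the same cell as `a[(2, 2)]`, and cells can hold any Python value.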

19.
The problem of computing the empirical cumulative distribution function (ECDF) of N points in k-dimensional space has been studied and motivated recently by Bentley [1], whose solution uses recursive multidimensional divide-and-conquer. In this paper, the problem is treated as a generalization of the problem of computing the inversion of a permutation. An algorithm of Knuth [3] is then extended to yield an O(kN(log₂N)^(k−1)) solution to the ECDF problem, which is comparable to Bentley's solution. Neither solution approaches the O(kN log₂N) lower bound, and both are worse than the O(kN²) 'brute force' algorithm for large k. The new algorithm, however, has the advantage of being highly parallel, so a fast solution exists with parallel processors.
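For k = 2 the link to inversion counting is easy to see: after sorting the points by x, the ECDF count of a point is the number of earlier points with a smaller y, the complement of counting inversions, which merge sort does in O(N log N). The sketch below covers only that two-dimensional case and assumes all x and all y coordinates are distinct; it is not the paper's k-dimensional extension of Knuth's algorithm, and `ecdf_counts_2d` is a hypothetical name.

```python
def ecdf_counts_2d(points):
    """For each point p, count points q with q.x < p.x and q.y < p.y.

    Assumes distinct x and distinct y coordinates. Points are processed in
    x-order; a merge sort on y then counts, for every point, the earlier
    (smaller-x) points with smaller y, in O(N log N) overall.
    """
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    counts = [0] * len(points)

    def sort_count(idx):
        # idx holds point indices in x-order; returns them sorted by y and
        # credits each right-half point with the left-half points below it.
        if len(idx) <= 1:
            return idx
        mid = len(idx) // 2
        left, right = sort_count(idx[:mid]), sort_count(idx[mid:])
        merged, li = [], 0
        for j in right:
            while li < len(left) and points[left[li]][1] < points[j][1]:
                merged.append(left[li])
                li += 1
            counts[j] += li  # left-half points with smaller x and smaller y
            merged.append(j)
        merged.extend(left[li:])
        return merged

    sort_count(order)
    return counts
```

Under the distinctness assumption, `(counts[i] + 1) / N` is the ECDF value at point i, since the point itself also satisfies the "≤ in every coordinate" condition.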

20.
Existing models for cluster analysis typically consist of a number of attributes that describe the objects to be partitioned and one single latent variable that represents the clusters to be identified. When one analyzes data using such a model, one is looking for one way to cluster data that is jointly defined by all the attributes. In other words, one performs unidimensional clustering. This is not always appropriate. For complex data with many attributes, it is more reasonable to consider multidimensional clustering, i.e., to partition data along multiple dimensions. In this paper, we present a method for performing multidimensional clustering on categorical data and show its superiority over unidimensional clustering.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration 京ICP备09084417号