Similar literature (20 results)
1.
Quantile computation has many applications including data mining and financial data analysis. It has been shown that an ε-approximate summary can be maintained so that, given a quantile query (φ, ε), the data item at rank ⌈φN⌉ may be approximately obtained within the rank error precision εN over all N data items in a data stream or in a sliding window. However, scalable online processing of massive continuous quantile queries with different φ and ε poses a new challenge because the summary is continuously updated with new arrivals of data items. In this paper, first we aim to dramatically reduce the number of distinct query results by grouping a set of different queries into a cluster so that they can be processed virtually as a single query while the precision requirements from users can be retained. Second, we aim to minimize the total query processing costs. Efficient algorithms are developed to minimize the total number of times for reprocessing clusters and to produce the minimum number of clusters, respectively. The techniques are extended to maintain near-optimal clustering when queries are registered and removed in an arbitrary fashion against whole data streams or sliding windows. In addition to theoretical analysis, our performance study indicates that the proposed techniques are indeed scalable with respect to the number of input queries as well as the number of items and the item arrival rate in a data stream.
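The query semantics in this abstract can be illustrated with a minimal sketch. It only shows what it means for an answer to a quantile query (φ, ε) to lie within rank error εN; it is not the paper's summary structure or its query-clustering algorithm. The function names `exact_quantile` and `is_eps_approximate` are hypothetical, and the whole data set is held in memory purely for illustration.

```python
import math
from bisect import bisect_left, bisect_right

def exact_quantile(items, phi):
    """Return the item at rank ceil(phi * N) (1-based) in sorted order."""
    ordered = sorted(items)
    rank = max(1, math.ceil(phi * len(ordered)))
    return ordered[rank - 1]

def is_eps_approximate(items, phi, eps, answer):
    """Check whether `answer` has some rank within eps * N of ceil(phi * N)."""
    ordered = sorted(items)
    n = len(ordered)
    target = max(1, math.ceil(phi * n))
    # With duplicates, `answer` occupies a range of 1-based ranks [lo, hi].
    lo = bisect_left(ordered, answer) + 1
    hi = bisect_right(ordered, answer)
    if hi < lo:
        return False  # answer does not occur in the data at all
    # Distance from the target rank to the nearest rank held by `answer`.
    if lo <= target <= hi:
        gap = 0
    else:
        gap = min(abs(lo - target), abs(hi - target))
    return gap <= eps * n

data = list(range(1, 101))           # N = 100 (illustrative data)
median = exact_quantile(data, 0.5)   # item at rank ceil(0.5 * 100) = 50
```

Under these assumptions, any item whose rank is at most εN away from the target rank is an acceptable answer, which is what allows queries with nearby φ and loose ε to share one result.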

2.
In this paper, we present a novel approach for multimedia data indexing and retrieval that is machine independent and highly flexible for sharing multimedia data across applications. Traditional multimedia data indexing and retrieval problems have been attacked using the central data server as the main focus, and most of the indexing and query processing for retrieval is highly application dependent. This precludes the reuse of created indices and query processing mechanisms for multimedia data which, in general, have a wide variety of uses across applications. The approach proposed in this paper addresses three issues: (1) multimedia data indexing; (2) inference or query processing; and (3) combining indices and inference or query mechanisms with the data to facilitate machine independence in retrieval and query processing. We emphasize the third issue, as multimedia data are typically huge in size and require intra-data indexing. We describe how the proposed approach addresses various problems faced by application developers in indexing and retrieval of multimedia data. Finally, we present two applications developed based on the proposed approach: video indexing, and video content authorization for presentation.

3.
4.
Interest in multimedia database management systems has grown rapidly due to the need to store huge volumes of multimedia data in computer systems. An important building block of a multimedia database system is the query processor, and a query optimizer embedded in the query processor is needed to answer user queries efficiently. The query optimization problem has been widely studied for conventional database systems; however, it is a new research area for multimedia database systems. Due to the differences in query processing strategies, query optimization techniques used in multimedia database systems are different from those used in traditional databases. In this paper, a query optimization strategy is proposed for processing spatio-temporal queries in video database systems. The proposed strategy includes reordering algorithms to be applied to the query execution tree. The performance results obtained by testing the reordering algorithms on different query sets are also presented.

5.
Advances in multimedia data acquisition and storage technology have led to the growth of very large multimedia databases. Analyzing this huge amount of multimedia data to discover useful knowledge is a challenging problem. This challenge has opened the opportunity for research in Multimedia Data Mining (MDM). Multimedia data mining can be defined as the process of finding interesting patterns from media data such as audio, video, image and text that are not ordinarily accessible by basic queries and associated results. The motivation for doing MDM is to use the discovered patterns to improve decision making. MDM has therefore attracted significant research efforts in developing methods and tools to organize, manage, search and perform domain specific tasks for data from domains such as surveillance, meetings, broadcast news, sports, archives, movies, medical data, as well as personal and online media collections. This paper presents a survey on the problems and solutions in Multimedia Data Mining, approached from the following angles: feature extraction, transformation and representation techniques, data mining techniques, and current multimedia data mining systems in various application domains. We discuss main aspects of feature extraction, transformation and representation techniques. These aspects are: level of feature extraction, feature fusion, features synchronization, feature correlation discovery and accurate representation of multimedia data. Comparison of MDM techniques with state of the art video processing, audio processing and image processing techniques is also provided. Similarly, we compare MDM techniques with the state of the art data mining techniques involving clustering, classification, sequence pattern mining, association rule mining and visualization. We review current multimedia data mining systems in detail, grouping them according to problem formulations and approaches. The review includes supervised and unsupervised discovery of events and actions from one or more continuous sequences. We also do a detailed analysis to understand what has been achieved and what are the remaining gaps where future research efforts could be focussed. We then conclude this survey with a look at open research directions.

6.
Query languages for multi-sensor data sources generally deal with spatial–temporal data that in many applications are of geographical type. Such applications are quite often concerned with dynamic activities where the collected sensor data are streaming in from multiple sensors. Data uncertainty is one of the most important issues that the query language must deal with. Other aspects of concern are sensor data fusion and the association of multiple object observations. Demonstrating the dynamic aspects is generally difficult, as real-time scenarios cannot easily be set up, tested and run realistically. To overcome this problem, the sigma query language (ΣQL) has been attached to a simulation framework. Together with this framework, scenarios can be set up to form the basis for testing and dynamic illustration of the query language. Eventually the query language can be used to support decision making as well. Within the simulation framework, input data come from sensor models that can eventually be replaced by data from real sensors. Services can be integrated with the information system, used for various purposes and supported by the various capabilities of the query language. A consequence of this approach is that the information delivered by the services, including the query language, can be used as input to an operational picture that eventually can be used to demonstrate on-going dynamic processes. In this work, an extension to ΣQL, called VisualΣQL, will be discussed together with some other relevant services useful in dynamic situations as complements to the query language. Furthermore, the use of the system will be illustrated and discussed by means of a scenario that has been run in the simulation environment.

7.

The continuous k-nearest neighbor query is one of the most important query types for sharing multimedia data or continuously identifying mobile users in location-based services (LBS). Various methods have been proposed to efficiently process the continuous k-NN query. However, most of the existing methods suffer from high computation time and large memory requirements because they unnecessarily access cells to find the nearest cells on a grid index. Furthermore, most methods do not consider the movement of a query. In this paper, we propose a new processing scheme for the continuous k-nearest neighbor query to efficiently support multimedia data sharing and transmission in LBS. The proposed method uses the patterns of the distance relationships among the cells in a grid index. The basic idea is to normalize the distance relationships as certain patterns. Using this approach, the proposed scheme significantly improves the overall performance of the query processing. It is shown through various experiments that our proposed method outperforms the existing methods in terms of query processing time and storage overhead.


8.
Indexing high-dimensional data for efficient in-memory similarity search   Total citations: 3 (self-citations: 0, citations by others: 3)
In main memory systems, the L2 cache typically employs cache line sizes of 32-128 bytes. These values are relatively small compared to high-dimensional data, e.g., >32D. The consequence is that existing techniques (on low-dimensional data) that minimize cache misses are no longer effective. We present a novel index structure, called Δ-tree, to speed up the high-dimensional query in main memory environment. The Δ-tree is a multilevel structure where each level represents the data space at different dimensionalities: the number of dimensions increases toward the leaf level. The remaining dimensions are obtained using principal component analysis. Each level of the tree serves to prune the search space more efficiently as the lower dimensions can reduce the distance computation and better exploit the small cache line size. Additionally, the top-down clustering scheme can capture the feature of the data set and, hence, reduces the search space. We also propose an extension, called Δ⁺-tree, that globally clusters the data space and then partitions clusters into small regions. The Δ⁺-tree can further reduce the computational cost and cache misses. We conducted extensive experiments to evaluate the proposed structures against existing techniques on different kinds of data sets. Our results show that the Δ⁺-tree is superior in most cases.
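The pruning idea in this abstract rests on a simple fact: a Euclidean distance computed over only the first few coordinates never exceeds the distance over all coordinates, so it can serve as a cheap lower bound. The sketch below shows that fact in a plain linear scan. It is not the Δ-tree itself: there is no tree or clustering, the vectors are assumed to be already rotated by principal component analysis, and the names `prefix_sq_dist` and `nearest_with_prefix_pruning` are hypothetical.

```python
def prefix_sq_dist(a, b, dims):
    """Squared Euclidean distance over the first `dims` coordinates only."""
    return sum((a[i] - b[i]) ** 2 for i in range(dims))

def nearest_with_prefix_pruning(points, query, levels):
    """Find the nearest neighbour of `query`, pruning with low-dimensional prefixes.

    `levels` is an increasing list of dimensionalities, e.g. [2, 4, 8].
    Returns (best_point, best_squared_distance, number_of_full_computations).
    """
    full = len(query)
    best, best_d2, full_computations = None, float("inf"), 0
    for p in points:
        pruned = False
        for dims in levels:
            # A prefix distance is a lower bound on the full distance, so if
            # it already reaches the best distance so far, p cannot improve it.
            if prefix_sq_dist(p, query, dims) >= best_d2:
                pruned = True
                break
        if pruned:
            continue
        full_computations += 1
        d2 = prefix_sq_dist(p, query, full)
        if d2 < best_d2:
            best, best_d2 = p, d2
    return best, best_d2, full_computations

# Illustrative 4-dimensional data; only the first 2 dimensions are used to prune.
points = [(0, 0, 0, 0), (5, 5, 0, 0), (1, 0, 0, 1), (9, 9, 9, 9)]
result = nearest_with_prefix_pruning(points, (1, 0, 0, 0), levels=[2])
```

In this example two of the four candidates are rejected after looking at only two coordinates, which is the effect the multilevel structure is described as exploiting.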

9.
A multimedia presentation is a synchronized, and possibly interactive, delivery of multimedia data to users. We expect that, in the future, multimedia presentations will be stored in and queried from multimedia databases. In an earlier work, we designed a graphical query language, called GVISUAL, that allows users to query multimedia presentations based on content information. In this paper, we discuss GVISUAL query processing techniques for multimedia presentations. More specifically, we discuss the translation of GVISUAL queries into an operator-based language, called O-Algebra, with three new operators, and efficient implementations of the new O-Algebra operators using a coding system called nodecodes.

10.
Spatial databases-accomplishments and research needs   Total citations: 8 (self-citations: 0, citations by others: 8)
Spatial databases, addressing the growing data management and analysis needs of spatial applications such as geographic information systems, have been an active area of research for more than two decades. This research has produced a taxonomy of models for space, spatial data types and operators, spatial query languages and processing strategies, as well as spatial indexes and clustering techniques. However, more research is needed to improve support for network and field data, as well as query processing (e.g., cost models, bulk load). Another important need is to apply spatial data management accomplishments to newer applications, such as data warehouses and multimedia information systems. The objective of this paper is to identify recent accomplishments and associated research needs of the near term.

11.
Cloud computing techniques take the form of distributed computing, using multiple computers to execute computations simultaneously on the service side. To process the increasing quantity of multimedia data, numerous large-scale multimedia data storage and computing techniques have been developed for cloud computing. Of all these techniques, Hadoop plays a key role. Hadoop, a computing cluster formed from low-priced hardware, can perform parallel computation over petabytes of multimedia data. Hadoop features high reliability, high efficiency, and high scalability. The large-scale multimedia data computing techniques include not only the core techniques, Hadoop and MapReduce, but also data collection techniques, such as File Transfer Protocol and Flume. In addition, distributed system configuration allocation, automatic installation, and monitoring platform building and management techniques are all included. As a result, only by integrating all these techniques can a reliable large-scale multimedia data platform be offered. In this paper, we introduce how cloud computing can make a breakthrough by proposing a multimedia social network dataset on the Hadoop platform and implementing a prototype version. Detailed specifications and design issues are discussed as well. An important finding of this article is that multimedia social networking analysis takes less time on a cloud Hadoop platform than on a single computer. The advantages of cloud computing over traditional data processing practices are demonstrated in this article. The applicable framework designs and the tools available for large-scale data processing are also proposed. We report the experimental multimedia data, including data sizes and processing times.

12.
Uncertain data are data with uncertainty information, which exist widely in database applications. In recent years, uncertainty in data has brought challenges in almost all database management areas such as data modeling, query representation, query processing, and data mining. There is no doubt that uncertain data management has become a hot research topic in the field of data management. In this study, we explore problems in managing uncertain data, present state-of-the-art solutions, and provide future research directions in this area. The discussed uncertain data management techniques include data modeling, query processing, and data mining in uncertain data in the forms of relational, XML, graph, and stream.

13.
It is widely recognized that the integration of information retrieval (IR) and database (DB) techniques provides users with a broad range of high quality services. Along this direction, IR-styled m-keyword query processing over a relational database in an RDBMS framework has been well studied. It finds all hidden interconnected tuple structures, for example connected trees that contain keywords and are interconnected by sequences of primary/foreign key relationships among tuples. A new challenging issue is how to monitor events that are implicitly interrelated over an open-ended relational data stream for a user-given m-keyword query. Such a relational data stream is a sequence of tuple insertion/deletion operations. The difficulty of the problem is related to the number of costly joins to be processed over time when tuples are inserted and/or deleted. Such cost is mainly affected by three parameters, namely, the number of keywords, the maximum size of interconnected tuple structures, and the complexity of the database schema when it is viewed as a schema graph. In this paper, we propose new approaches. First, we propose a novel algorithm to efficiently determine all the joins that need to be processed for answering an m-keyword query. Second, we propose a new demand-driven approach to process such a query over a high speed relational data stream. We show that we can achieve high efficiency by significantly reducing the number of intermediate results when processing joins over a relational data stream. The proposed new techniques allow us to achieve high scalability in terms of both query plan generation and query plan execution. We conducted extensive experimental studies using synthetic data and real data to simulate a relational data stream. Our approach significantly outperforms existing algorithms.

14.
Adaptable similarity queries based on quadratic form distance functions are widely popular in data mining application domains including multimedia, CAD, molecular biology or medical image databases. Recently it has been recognized that quantization of feature vectors can substantially improve query processing for Euclidean distance functions, as demonstrated by the scan-based VA-file and the index structure IQ-tree. In this paper, we address the problem that determining quadratic form distances between quantized vectors is difficult and computationally expensive. Our solution provides a variety of new approximation techniques for quantized vectors which are combined by an extended multistep query processing architecture. In our analysis section, we show that the filter steps complement each other. Consequently, it is useful to apply our filters in combination. We show the superiority of our approach over other architectures and over competitive query processing methods. In our experimental evaluation, the sequential scan is outperformed by a factor of 2.3. Compared to the X-tree on 64-dimensional color histogram data, we measured an improvement factor of 5.7.
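For readers unfamiliar with the distance function named in this abstract, a quadratic form distance is commonly defined as d_A(x, y) = sqrt((x − y)ᵀ A (x − y)) for a similarity matrix A; with A equal to the identity it reduces to the Euclidean distance. The following minimal sketch computes it directly. The function name and the example matrix are illustrative assumptions, and the sketch does not include the quantization or filter steps the abstract describes.

```python
import math

def quadratic_form_distance(x, y, A):
    """d_A(x, y) = sqrt((x - y)^T A (x - y)) for a similarity matrix A."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    total = 0.0
    for i, di in enumerate(diff):
        for j, dj in enumerate(diff):
            total += di * A[i][j] * dj
    return math.sqrt(total)

# Two 3-bin histograms whose mass sits in neighbouring bins.
x, y = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)

identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
# Hypothetical matrix that treats adjacent bins as partially similar.
neighbour_aware = [[1.0, 0.5, 0.0],
                   [0.5, 1.0, 0.5],
                   [0.0, 0.5, 1.0]]

d_euclid = quadratic_form_distance(x, y, identity)
d_adapted = quadratic_form_distance(x, y, neighbour_aware)
```

The off-diagonal entries make histograms that differ only between similar bins appear closer than under the Euclidean distance, which is the sense in which such queries are adaptable. The double loop also shows why the cost is quadratic in the dimension.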

15.
Providing integrated access to multiple, distributed, heterogeneous databases and other information sources has become one of the leading issues in database research and the industry. One of the most effective approaches is to extract and integrate information of interest from each source in advance and store it in a centralized repository (known as a data warehouse). When a query is posed, it is evaluated directly at the warehouse without accessing the original information sources. One of the techniques that this approach uses to improve the efficiency of query processing is materialized views. Materialized views are widely used in data warehouses, and various methods for relational databases have been developed. In this paper, we first discuss an object deputy approach to realize materialized object views for data warehouses which can also incorporate object-oriented databases. A framework has been developed using Smalltalk to prepare data for data warehousing, in which an object deputy model and database connecting tools have been implemented. The object deputy model can provide an easy-to-use way to resolve inconsistency and conflicts while preparing data for data warehousing, as evidenced by our empirical study.

16.
We describe two scenarios of user tasks in which access to multimedia data plays a significant role. Because current multimedia databases cannot support these tasks, we introduce three new requirements on multimedia databases: multimedia objects should be active objects, querying is an interaction process, and query processing uses multiple representations. We discuss three techniques to handle multimedia objects as active objects. Also, we introduce a promising database architecture to meet the new user requirements. Agents within the database handle objects' representations, and a search engine on top of a conventional database handles relevance feedback and multiple representations.

17.
The internet revolution has made information acquisition easy and cheap and has been producing massive high-dimensional multimedia data, including text, audio, images, animation, video, etc. High-dimensional multimedia data bring new opportunities to modern society and challenges to researchers in the multimedia domain as well. The goal of this special issue is to bridge the gap between machine learning methods and the real requirements of the high-dimensional multimedia domain, aiming at gaining insight into the relationship between current multimedia data and past ones, and also at accurately predicting the future trends of multimedia data. Specifically, this special issue targeted the most recent technical progress on learning techniques for high-dimensional multimedia data.

18.
In recent years, the availability of complex data repositories (e.g., multimedia, genomic, semistructured databases) has paved the way to new potentials as to data querying. In this scenario, similarity and fuzzy techniques have proven to be successful principles for effective data retrieval. However, most proposals are domain specific and lack a general and integrated approach to deal with generalized complex queries, i.e., queries where multiple conditions are expressed, possibly on complex as well as on traditional data. To overcome such limitations, much work has been devoted to the development of middleware systems to support query processing on multiple repositories. Along similar lines, we present a formal framework to permeate complex similarity and fuzzy queries within a relational database system. As an example, we focus on multimedia data, which is represented in an integrated view with common database data. We have designed an application layer that relies on an algebraic query language, extended with MM-tailored operators, and that maps complex similarity and fuzzy queries to standard SQL statements that can be processed by a relational database system, exploiting standard facilities of modern extensible RDBMS. To show the applicability of our proposal, we implemented a prototype that provides the user with rich query capabilities, ranging from traditional database queries to complex queries gathering a mixture of Boolean, similarity, and fuzzy predicates on the data.

19.
This work introduces decentralized query processing techniques based on MIDAS, a novel distributed multidimensional index. In particular, MIDAS implements a distributed k-d tree, where leaves correspond to peers, and internal nodes dictate message routing. MIDAS requires that peers maintain little network information, and features mechanisms that support fault tolerance and load balancing. The proposed algorithms process point and range queries over the multidimensional indexed space in only O(log n) hops in expectation, where n is the network size. For nearest neighbor queries, two processing alternatives are discussed. The first, termed eager processing, has low latency (expected value of O(log n) hops) but may involve a large number of peers. The second, termed iterative processing, has higher latency (expected value of O(log² n) hops) but involves far fewer peers. A detailed experimental evaluation demonstrates that our query processing techniques outperform existing methods for settings involving real spatial data as well as in the case of high dimensional synthetic data.
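The structure described in this abstract, a k-d tree whose leaves correspond to peers and whose internal nodes dictate routing, can be pictured with a small in-process sketch. It is a toy simulation under stated assumptions, not MIDAS: the space is split at midpoints in round-robin dimension order, the peer names are made up, and a "hop" is simply one step down the tree rather than a real network message. The names `Node`, `build` and `route` are hypothetical.

```python
class Node:
    """Internal node (split on one dimension) or leaf (owned by one peer)."""
    def __init__(self, dim=None, split=None, left=None, right=None, peer=None):
        self.dim, self.split = dim, split
        self.left, self.right = left, right
        self.peer = peer

def build(peers, lo, hi, depth=0):
    """Recursively halve the box [lo, hi] until one peer owns each region."""
    if len(peers) == 1:
        return Node(peer=peers[0])
    dim = depth % len(lo)                 # cycle through the dimensions
    split = (lo[dim] + hi[dim]) / 2       # midpoint split (an assumption)
    mid = len(peers) // 2
    left_hi = list(hi)
    left_hi[dim] = split
    right_lo = list(lo)
    right_lo[dim] = split
    return Node(dim, split,
                build(peers[:mid], lo, left_hi, depth + 1),
                build(peers[mid:], right_lo, hi, depth + 1))

def route(root, point):
    """Follow split decisions to the responsible peer; return (peer, hops)."""
    node, hops = root, 0
    while node.peer is None:
        node = node.left if point[node.dim] < node.split else node.right
        hops += 1
    return node.peer, hops

peers = ["p%d" % i for i in range(8)]     # 8 illustrative peers
tree = build(peers, [0.0, 0.0], [1.0, 1.0])
owner, hops = route(tree, (0.9, 0.9))
```

With a balanced tree over n peers the number of steps to reach the owner of a point grows with log n, which gives some intuition for the logarithmic hop counts quoted for point queries.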

20.
Unstructured peer-to-peer infrastructure has been widely employed to support large-scale distributed applications. Many of these applications, such as location-based services and multimedia content distribution, require the support of range selection queries. Under the widely-adopted query shipping protocols, the cost of query processing is affected by the number of result copies or replicas in the system. Since range queries can return results that include poorly-replicated data items, the cost of these queries is usually dominated by the retrieval cost of these data items. In this work, we propose a popularity-aware prefetch-based approach that can effectively facilitate the caching of poorly-replicated data items that are potentially requested in subsequent range queries, resulting in substantial cost savings. We prove that the performance of retrieving poorly-replicated data items is guaranteed to improve under an increasing query load. Extensive experiments show that the overall range query processing cost decreases significantly under various query load settings.
