Similar literature
Found 20 similar documents; search took 15 ms.
1.
In this paper, we identify a novel and interesting type of query, the contextual ranking query, which returns the ranks of query tuples among context tuples given in the query. Contextual ranking queries are useful for OLAP and decision support applications in non-traditional data exploration. They provide a mechanism to quickly identify where tuples stand within the context. We extend the SQL language to express contextual ranking queries and propose a general partition-based framework for processing them. In this framework, we use a novel method that utilizes bitmap indices built on ranking functions. This method can efficiently identify a small number of candidate tuples and thus achieves lower cost than alternative methods. We analytically investigate the advantages and drawbacks of these methods according to a preliminary cost model. Experimental results suggest that the algorithm using bitmap indices on ranking functions can be substantially more efficient than other methods.
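To make the notion of a contextual rank concrete, here is a minimal Python sketch that ranks one query tuple among a set of context tuples. The function name `contextual_rank`, the dictionary-shaped tuples, and the scoring function are illustrative assumptions; the code is a naive linear scan, not the bitmap-index method the abstract refers to.

```python
from typing import Callable, Iterable, Mapping

Row = Mapping[str, float]

def contextual_rank(query: Row, context: Iterable[Row],
                    score: Callable[[Row], float]) -> int:
    """Return the 1-based rank of `query` among `context` (higher score ranks first)."""
    q = score(query)
    # The rank is one more than the number of context tuples that score strictly higher.
    return 1 + sum(1 for row in context if score(row) > q)

# Hypothetical data and ranking function, for illustration only.
context = [
    {"price": 10.0, "rating": 4.5},
    {"price": 20.0, "rating": 3.0},
    {"price": 15.0, "rating": 4.0},
]
query = {"price": 12.0, "rating": 4.2}

def score(row: Row) -> float:
    return row["rating"] - 0.01 * row["price"]

rank = contextual_rank(query, context, score)
print(rank)
```

A scan like this touches every context tuple; the appeal of an index on the ranking function is that most of those comparisons could be avoided.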

2.
3.
4.
A key problem in retrieving, integrating and mining rich, high-quality information from massive online Deep Web Databases (WDBs) is how to automatically and effectively discover and recognize the entry points of domain-specific WDBs, i.e., forms, on the Web. This is a challenging task because the forms of domain-specific WDBs are dynamic and heterogeneous and are very sparsely distributed over several trillion Web pages. Although significant efforts have been made to address the problem and its special cases, more effective solutions are still needed that achieve a satisfactory harvest rate and coverage rate of domain-specific WDBs’ forms simultaneously. In this paper, an Enhanced Form-Focused Crawler for domain-specific WDBs (E-FFC) is proposed as a novel framework to address the limitations of existing solutions. The E-FFC, based on the divide-and-conquer strategy, employs a series of novel and effective strategies and algorithms, including a two-step page classifier, a link scoring strategy, classifiers for advanced searchable and domain-specific forms, and crawling stopping criteria, in order to optimize both the harvest rate and the coverage rate of domain-specific WDBs’ forms. Experiments with the E-FFC over a number of real Web pages in a set of representative domains show that it outperforms existing domain-specific Deep Web Form-Focused Crawlers in terms of harvest rate, coverage rate and crawling robustness.

5.
K.  Wen-Syan  M.   《Data & Knowledge Engineering》2000,35(3):259-298
Since media-based evaluation yields similarity values, the result of a multimedia database query, Q(Y1,…,Yn), is defined as an ordered list SQ of n-tuples of the form X1,…,Xn. The query Q itself is composed of a set of fuzzy and crisp predicates, constants, variables, and conjunction, disjunction, and negation operators. Since many multimedia applications require partial matches, SQ includes results which do not satisfy all predicates. Due to the ranking and partial match requirements, traditional query processing techniques do not apply to multimedia databases. In this paper, we first focus on the problem of “given a multimedia query which consists of multiple fuzzy and crisp predicates, providing the user with a meaningful final ranking”. More specifically, we study the problem of merging similarity values in queries with multiple fuzzy predicates. We describe the essential multimedia retrieval semantics, compare these with the known approaches, and propose a semantics which captures the requirements of the multimedia retrieval problem. We then build on these results in answering the related problem of “given a multimedia query which consists of multiple fuzzy and crisp predicates, finding an efficient way to process the query.” We develop an algorithm to efficiently process queries with unordered fuzzy predicates (sub-queries). Although this algorithm can work with different fuzzy semantics, it benefits from the statistical properties of the semantics proposed in this paper. We also present experimental results for evaluating the proposed algorithm in terms of quality of results and search space reduction.
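The choice of merge function matters because it can change the final ranking. The Python sketch below compares two textbook fuzzy-conjunction semantics, minimum and product, on made-up similarity values. Both functions and the candidate data are assumptions for illustration; neither is claimed to be the semantics the paper proposes.

```python
import math
from typing import Callable, Dict, List, Sequence

def merge_min(scores: Sequence[float]) -> float:
    """Min semantics: a conjunction is only as good as its worst predicate."""
    return min(scores)

def merge_product(scores: Sequence[float]) -> float:
    """Product semantics: every predicate contributes to the merged value."""
    return math.prod(scores)

def ranking(candidates: Dict[str, Sequence[float]],
            merge: Callable[[Sequence[float]], float]) -> List[str]:
    """Order candidate ids by merged similarity, best first."""
    return sorted(candidates, key=lambda name: merge(candidates[name]), reverse=True)

# Hypothetical per-predicate similarity values for three candidate objects.
candidates = {
    "a": [0.9, 0.1],   # strong on one predicate, weak on the other
    "b": [0.5, 0.5],
    "c": [0.6, 0.45],
}

print(ranking(candidates, merge_min))      # b, c, a
print(ranking(candidates, merge_product))  # c, b, a
```

Here the two semantics disagree on whether `b` or `c` is the best match, which is the kind of difference a study of merge semantics has to account for.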

6.
Query processing in uncertain databases has become increasingly important due to the wide existence of uncertain data in many real applications. Unlike the handling of precise data, uncertain query processing must account for data uncertainty and answer queries with confidence guarantees. In this paper, we formulate and tackle an important query, namely the probabilistic inverse ranking (PIR) query, which retrieves the possible ranks of a given query object in an uncertain database with confidence above a probability threshold. We present effective pruning methods to reduce the PIR search space, which can be seamlessly integrated into an efficient query procedure. Moreover, we tackle the problem of PIR query processing in high dimensional spaces by reducing high dimensional uncertain data to a lower dimensional space. Furthermore, we study three interesting and useful aggregate PIR queries, that is, MAX, top-m, and AVG PIRs. We also study an important query type, PIR with an uncertain query object (namely UQ-PIR), and design specific rules to facilitate the pruning. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed approaches over both real and synthetic data sets, under various experimental settings.
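As a rough illustration of what a PIR query returns, the Python sketch below computes the rank distribution of a query object under two simplifying assumptions that are not taken from the paper: the uncertain objects are independent, and for each one the probability that it ranks above the query object is already known. The function names are hypothetical, and no pruning is attempted.

```python
from typing import List, Tuple

def rank_distribution(p_above: List[float]) -> List[float]:
    """dist[k] = probability that exactly k objects rank above the query object."""
    dist = [1.0]
    for p in p_above:
        nxt = [0.0] * (len(dist) + 1)
        for k, pr in enumerate(dist):
            nxt[k] += pr * (1.0 - p)   # this object ranks below the query object
            nxt[k + 1] += pr * p       # this object ranks above the query object
        dist = nxt
    return dist

def probabilistic_inverse_ranks(p_above: List[float],
                                threshold: float) -> List[Tuple[int, float]]:
    """Return (rank, probability) pairs whose probability reaches the threshold."""
    dist = rank_distribution(p_above)
    return [(k + 1, pr) for k, pr in enumerate(dist) if pr >= threshold]

# Two uncertain objects, each ranking above the query object with probability 0.5.
dist = rank_distribution([0.5, 0.5])
answer = probabilistic_inverse_ranks([0.5, 0.5], threshold=0.3)
print(dist, answer)
```

The dynamic program costs quadratic time in the number of objects, which is why pruning objects that certainly rank above or below the query object is attractive.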

7.
Efficient fuzzy ranking queries in uncertain databases   Cited by: 1 (self-citations: 1, citations by others: 0)
Recently, uncertain data have received considerable attention along with technical advances in geographical tracking, sensor networks, RFID, etc. Ranking queries over uncertain data have also become a research focus of uncertain data management. With the rapidly growing number of applications of fuzzy set theory, many queries now involve fuzzy conditions, and these fuzzy conditions are widely applied for querying uncertain data. For instance, in a weather monitoring system, weather data are inherently uncertain due to measurement errors. A user may want weather data depicting heavy rain, where "heavy" is ambiguous in the fuzzy query. However, fuzzy queries cannot guarantee that the expected results are returned from uncertain databases. In this paper, we study a novel kind of ranking query, Fuzzy Ranking queries (FRanking queries), which extend the traditional notion of ranking queries. FRanking queries are able to handle fuzzy queries submitted by users and return the k results that are most likely to satisfy the fuzzy queries in uncertain databases. Due to the fuzzy query conditions, the ranks of tuples cannot be evaluated by existing ranking functions. We propose the Fuzzy Ranking Function to calculate tuples' ranks in uncertain databases for both attribute-level and tuple-level uncertainty models. Our ranking function takes both the uncertainty and the fuzzy semantics into account. FRanking queries are formally defined based on the Fuzzy Ranking Function. For answering FRanking queries, we present a pruning method which safely prunes unnecessary tuples to reduce the search space. To further improve efficiency, we design an algorithm, namely the Incremental Membership Algorithm (IMA), which efficiently answers FRanking queries by evaluating the ranks of incremental tuples under each threshold for the fuzzy set. We demonstrate the effectiveness and efficiency of our methods through theoretical analysis and experiments with synthetic and real datasets.

8.
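周帆, 李树全, 肖春静, 吴跃. 《计算机应用》 (Journal of Computer Applications), 2010, 30(10): 2605-2609
The wide application of technologies such as sensor networks has produced large amounts of uncertain data. In recent years, processing and querying uncertain data has become a hot research topic in the database and data mining fields. One focal point is how to extend the top-k queries and ranking queries of traditional relational databases to uncertain data. This paper studies the top-k and ranking query algorithms for uncertain databases proposed in recent years, summarizes and compares the semantics and application scenarios that the different query algorithms suit, and analyzes in detail the execution efficiency and algorithmic complexity of each algorithm. It also summarizes the challenges facing top-k and ranking queries over uncertain data and possible research directions.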

9.
A unified approach to ranking in probabilistic databases   Cited by: 1 (self-citations: 0, citations by others: 1)
Ranking is a fundamental operation in data analysis and decision support and plays an even more crucial role if the dataset being explored exhibits uncertainty. This has led to much work in recent years on understanding how to rank the tuples in a probabilistic dataset. In this article, we present a unified approach to ranking and top-k query processing in probabilistic databases by viewing it as a multi-criterion optimization problem and by deriving a set of features that capture the key properties of a probabilistic dataset that dictate the ranked result. We contend that a single, specific ranking function may not suffice for probabilistic databases, and we instead propose two parameterized ranking functions, called PRF^ω and PRF^e, that generalize or can approximate many of the previously proposed ranking functions. We present novel generating functions-based algorithms for efficiently ranking large datasets according to these ranking functions, even if the datasets exhibit complex correlations modeled using probabilistic and/xor trees or Markov networks. We further propose that the parameters of the ranking function be learned from user preferences, and we develop an approach to learn those parameters. Finally, we present a comprehensive experimental study that illustrates the effectiveness of our parameterized ranking functions, especially PRF^e, at approximating other ranking functions and the scalability of our proposed algorithms for exact or approximate ranking.
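The generating-function idea can be sketched for the simplest case, independent tuples. In the Python code below, tuples are sorted by score in descending order and tuple i exists with probability p_i; the probability that tuple i exists and sits at rank j is then p_i times the coefficient of x^(j-1) in the product of (1 - p_l + p_l·x) over all higher-scored tuples l. The weighted sum in `weighted_rank_score` and the exponential weight α^j are written in the spirit of a parameterized ranking function and are assumptions here; the exact definitions of PRF^ω and PRF^e should be taken from the article itself.

```python
from typing import Callable, List

def rank_distributions(probs: List[float]) -> List[List[float]]:
    """For tuples sorted by score (best first) with independent existence
    probabilities `probs`, return d where d[i][j] = P(tuple i exists at rank j+1)."""
    result = []
    poly = [1.0]  # coefficients of prod over higher-scored tuples of (1 - p + p*x)
    for p in probs:
        result.append([p * c for c in poly])
        nxt = [0.0] * (len(poly) + 1)
        for k, c in enumerate(poly):
            nxt[k] += c * (1.0 - p)   # higher-scored tuple absent
            nxt[k + 1] += c * p       # higher-scored tuple present, pushes rank down
        poly = nxt
    return result

def weighted_rank_score(dist: List[float], weight: Callable[[int], float]) -> float:
    """Sum over ranks j of weight(j) * P(tuple is at rank j)."""
    return sum(weight(j + 1) * pr for j, pr in enumerate(dist))

# Two tuples, each existing with probability 0.5; exponential weight with alpha = 0.5.
dists = rank_distributions([0.5, 0.5])
alpha = 0.5
scores = [weighted_rank_score(d, lambda j: alpha ** j) for d in dists]
print(dists, scores)
```

Correlated tuples, which the abstract handles with and/xor trees or Markov networks, are outside the scope of this sketch.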

10.
11.
Internet users may face the problem of empty or too few answers when they pose a strict query to a Web database. To address this problem, we develop a general framework that enables automatic query relaxation and top-k result ranking. Our framework consists of two processing steps. The first step is query relaxation. Based on the user's original query, we estimate how much the user cares about each specified attribute by measuring the distribution of its specified value in the database. If the specified value of an attribute is rare, the attribute may be important to the user. According to the attribute importance, the original query is then rewritten as a relaxed query by expanding the range of each query criterion. The degree of relaxation of each specified attribute varies adaptively with the attribute weight. The most important attribute is relaxed the least, so that the answers returned by the relaxed query are as relevant as possible to the user's original intention. The second step is top-k result ranking. In this step, we first generate user contextual preferences from the query history and then use them to create a priori orders of tuples during off-line pre-processing. Only a few representative orders are saved, each corresponding to a set of contexts. These orders and their associated contexts are then used at query time to quickly provide the top-k relevant answers by means of the top-k evaluation algorithm. Results of a preliminary user study demonstrate that our query relaxation and top-k result ranking methods can capture users' preferences effectively. The efficiency and effectiveness of our approach are also demonstrated.
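The relaxation step can be sketched as follows. This Python example weights each queried attribute by the rarity of its specified value and then widens each criterion into a range, relaxing important attributes less. The IDF-like rarity formula, the `max_fraction` parameter, the function names, and the sample data are all assumptions made for illustration, not details taken from the paper.

```python
import math
from typing import Dict, List, Tuple

def attribute_weights(rows: List[Dict[str, float]],
                      query: Dict[str, float]) -> Dict[str, float]:
    """Weight each queried attribute by how rare its specified value is."""
    n = len(rows)
    rarity = {}
    for attr, value in query.items():
        matches = sum(1 for row in rows if row[attr] == value)
        rarity[attr] = math.log((n + 1) / (matches + 1))  # assumed IDF-like measure
    total = sum(rarity.values()) or 1.0
    return {attr: r / total for attr, r in rarity.items()}

def relax(query: Dict[str, float], weights: Dict[str, float],
          max_fraction: float = 0.5) -> Dict[str, Tuple[float, float]]:
    """Turn each exact criterion into a range; higher weight means less relaxation."""
    relaxed = {}
    for attr, value in query.items():
        margin = max_fraction * (1.0 - weights[attr]) * abs(value)
        relaxed[attr] = (value - margin, value + margin)
    return relaxed

# Hypothetical used-car table: price 10000 is common, mileage 30000 is rare.
rows = [
    {"price": 10000, "mileage": 30000},
    {"price": 10000, "mileage": 50000},
    {"price": 10000, "mileage": 50000},
    {"price": 12000, "mileage": 50000},
]
query = {"price": 10000, "mileage": 30000}
weights = attribute_weights(rows, query)
ranges = relax(query, weights)
print(weights, ranges)
```

In this toy table the rare mileage value receives the larger weight, so its range stays tight while the price range is widened more.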

12.
13.
There is significant commercial and research interest in location-based web search engines. Given a number of search keywords and one or more locations (geographical points) that a user is interested in, a location-based web search retrieves and ranks the most textually and spatially relevant web pages. In this type of search, both the spatial and textual information should be indexed. Currently, no efficient index structure exists that can handle both the spatial and textual aspects of data simultaneously and accurately. Existing approaches either index space and text separately or use inefficient hybrid index structures with poor performance and inaccurate results. Moreover, most of these approaches cannot accurately rank web pages based on a combination of space and text and are not easy to integrate into existing search engines. In this paper, we propose a new index structure called Spatial-Keyword Inverted File for Points to handle point-based indexing of web documents in an integrated and efficient manner. To seamlessly find and rank relevant documents, we develop a new distance measure called spatial tf-idf. We propose four variants of spatial-keyword relevance scores and two algorithms to perform top-k searches. As verified by experiments, our proposed techniques outperform existing index structures in terms of search performance and accuracy.

14.
Many web databases can be seen as providing partial and overlapping information about entities in the world. To answer queries effectively, we need to integrate the information about the individual entities that are fragmented over multiple sources. At first blush this is just the inverse of the traditional database normalization problem: rather than go from a universal relation to normalized tables, we want to reconstruct the universal relation given the tables (sources). The standard way of reconstructing the entities involves joining the tables. Unfortunately, because of the autonomous and decentralized way in which the sources are populated, they often do not have Primary Key–Foreign Key relations. While tables may share attributes, naive joins over these shared attributes can result in the reconstruction of many spurious entities, thus seriously compromising precision. Our system, SmartInt, is aimed at addressing the problem of data integration in such scenarios. Given a query, our system uses Approximate Functional Dependencies (AFDs) to piece together a tree of relevant tables to answer it. The result tuples produced by our system strike a favorable balance between precision and recall.

15.
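A Web service-based mechanism for sharing and synchronizing heterogeneous databases   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper analyzes the problems of sharing and synchronizing heterogeneous databases in enterprise data integration and proposes a Web Service-based method for integrating geographically distributed heterogeneous databases. Heterogeneous database sources at different sites are connected through Web Services to form a heterogeneous central database that offers users a transparent, unified interface. Users can not only query the central database but also insert, delete, and update its data, and these changes are synchronized to the heterogeneous source databases; changes to data and structure on the source database side are likewise synchronized to the central database. The key techniques are then described in detail. Finally, an example shows how the proposed framework can be applied in practice.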

16.
《Computer Networks》2002,38(6):779-794
This paper describes the design and use of a synthetic web proxy workload generator called ProWGen to investigate the sensitivity of web proxy cache replacement policies to five selected web workload characteristics. Three representative cache replacement policies are considered in the simulation study: a recency-based policy called least-recently-used, a frequency-based policy called least-frequently-used-with-aging, and a size-based policy called greedy-dual-size. Trace-driven simulations with synthetic workloads from ProWGen show the relative sensitivity of these cache replacement policies to three web workload characteristics: the slope of the Zipf-like document popularity distribution, the degree of temporal locality in the document referencing behaviour, and the correlation (if any) between document size and document popularity. The three replacement policies are relatively insensitive to the percentage of one-timers in the workload, and to the Pareto tail index of the heavy-tailed document size distribution. Performance differences between the three cache replacement policies are also highlighted.
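A trace-driven experiment of this kind can be sketched in a few lines of Python. The code below draws a synthetic request trace from a Zipf-like popularity distribution and measures the hit rate of a least-recently-used cache. It is not ProWGen: the generator models only the Zipf slope, ignores document sizes, one-timers and temporal locality, and all names and parameter values are assumptions for illustration.

```python
import random
from collections import OrderedDict
from typing import List

def zipf_trace(num_docs: int, length: int, alpha: float, seed: int = 0) -> List[int]:
    """Draw document ids with popularity proportional to 1 / rank**alpha."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_docs + 1)]
    return rng.choices(range(num_docs), weights=weights, k=length)

def lru_hit_rate(trace: List[int], capacity: int) -> float:
    """Simulate an LRU cache holding `capacity` documents (capacity >= 1)."""
    cache: "OrderedDict[int, bool]" = OrderedDict()
    hits = 0
    for doc in trace:
        if doc in cache:
            hits += 1
            cache.move_to_end(doc)          # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the least recently used document
            cache[doc] = True
    return hits / len(trace)

trace = zipf_trace(num_docs=1000, length=20000, alpha=0.8)
small = lru_hit_rate(trace, capacity=10)
large = lru_hit_rate(trace, capacity=100)
print(small, large)
```

Re-running the simulation with different values of `alpha` is one way to observe the sensitivity to the popularity slope that the abstract reports.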

17.
Multimedia Tools and Applications - Discovering the relevant web services for specific applications in the dynamically changing business world becomes very critical. Researchers have used many...

18.
The development of efficient algorithms for learning from large relational databases is an important task in applicative machine learning. In this paper, we study knowledge discovery in relational databases and develop an attribute-oriented learning method which extracts generalization rules from relational databases. The method adopts the artificial intelligence “learning-from-examples” paradigm and applies in the learning process an attribute-oriented concept tree ascending technique which integrates database operations with the learning process and provides a simple and efficient way of learning from databases. The method learns both characteristic rules and classification rules of a learning concept, where a characteristic rule characterizes the properties shared by all the facts of the class being learned, while a classification rule characterizes the properties that distinguish the class being learned from other classes. The learning result could be a conjunctive rule or a rule with a small number of disjuncts. Moreover, learning can be performed with databases containing noisy data and exceptional cases using database statistics. Our analysis of the algorithms shows that attribute-oriented induction substantially reduces the computational complexity of the database learning process.
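Concept tree ascension can be illustrated with a small Python sketch: values of one attribute are repeatedly replaced by their parent concept until the number of distinct values falls to a threshold, after which identical generalized tuples are merged and counted. The function names, the threshold, the concept hierarchy, and the sample rows are hypothetical, and the sketch lifts all values one level at a time, which is a simplification rather than the paper's exact procedure.

```python
from collections import Counter
from typing import Dict, List

def generalize(rows: List[Dict[str, str]], attr: str,
               parent: Dict[str, str], threshold: int) -> List[Dict[str, str]]:
    """Ascend the concept tree on `attr` until at most `threshold` distinct values remain."""
    rows = [dict(row) for row in rows]  # work on a copy
    while len({row[attr] for row in rows}) > threshold:
        changed = False
        for row in rows:
            if row[attr] in parent:
                row[attr] = parent[row[attr]]   # replace value by its parent concept
                changed = True
        if not changed:                          # already at the top of the tree
            break
    return rows

def merge(rows: List[Dict[str, str]]) -> Counter:
    """Merge identical generalized tuples, keeping a count for each."""
    return Counter(tuple(sorted(row.items())) for row in rows)

# Hypothetical concept hierarchy and student table.
parent = {
    "Vancouver": "British Columbia", "Victoria": "British Columbia",
    "Toronto": "Ontario",
    "British Columbia": "Canada", "Ontario": "Canada",
}
rows = [
    {"major": "science", "birthplace": "Vancouver"},
    {"major": "science", "birthplace": "Victoria"},
    {"major": "arts", "birthplace": "Toronto"},
]
generalized = generalize(rows, "birthplace", parent, threshold=2)
rules = merge(generalized)
print(rules)
```

The merged tuples with their counts are the raw material from which a characteristic rule, such as "science students were born in British Columbia", could be read off.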

19.
《Applied Soft Computing》2007,7(1):398-410
Personalized search engines are important tools for finding web documents for specific users, because they are able to provide the location of information on the WWW as accurately as possible, using efficient methods of data mining and knowledge discovery. Traditional search engines vary in type and features, including support for different functionality and ranking methods. New search engines that use link structures have produced improved search results which can overcome the limitations of conventional text-based search engines. Going a step further, this paper presents a system that provides users with personalized results derived from a search engine that uses link structures. The fuzzy document retrieval system (constructed from a fuzzy concept network based on the user's profile) personalizes the results yielded by link-based search engines according to the preferences of the specific user. A preliminary experiment with six subjects indicates that the developed system is capable of finding web pages that are not only relevant but also personalized, depending on the preferences of the user.

20.
Vitense HS  Jacko JA  Emery VK 《Ergonomics》2003,46(1-3):68-87
Multimodal interfaces offer great potential to humanize interactions with computers by employing a multitude of perceptual channels. This paper reports on a novel multimodal interface using auditory, haptic and visual feedback in a direct manipulation task to establish new recommendations for multimodal feedback, in particular uni-, bi- and trimodal feedback. A close examination of combinations of uni-, bi- and trimodal feedback is necessary to determine which enhances performance without increasing workload. Thirty-two participants were asked to complete a task consisting of a series of 'drag-and-drops' while the type of feedback was manipulated. Each participant was exposed to three unimodal feedback conditions, three bimodal feedback conditions and one trimodal feedback condition that used auditory, visual and haptic feedback alone, and in combination. Performance under the different conditions was assessed with measures of trial completion time, target highlight time and a self-reported workload assessment captured by the NASA Task Load Index (NASA-TLX). The findings suggest that certain types of bimodal feedback can enhance performance while lowering self-perceived mental demand.
