Found 20 similar documents (search time: 0 ms)
1.
2.
It is widely recognized that effective ranking methods for relational data (e.g., tuples) let users overcome the limitations of the traditional Boolean retrieval model and the difficulty of writing structured queries. To rank a tuple, term-frequency-based methods such as tf × idf (term frequency × inverse document frequency) schemes have commonly been adopted in the literature, simply treating a tuple as a single document. However, we have observed that in many cases tf × idf schemes fail to produce effective rankings or meaningful orderings for relational data with categorical attributes, which are pervasive today. To support this fundamental aspect of relational data, we apply notions from correlation analysis to estimate the strength of relationships between queries and data. This paper proposes a probabilistic ranking model that exploits the statistical relationships present in relational data with categorical attributes. Given a set of query terms, information on attribute values correlated with those terms is used to estimate the relevance of a tuple to the query. To quantify this information, we compute the degree of dependency between correlated attribute values on a Bayesian network. Moreover, we avoid the prohibitive cost of computing insignificant ranking features by adopting a limited node-independence assumption. Our probabilistic ranking model is domain-independent and leverages only data statistics, without any prior knowledge such as user query logs. Experimental results show that our work improves ranking effectiveness on real-world datasets with reasonable query processing efficiency compared to related work.
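The tf × idf baseline this abstract argues against can be sketched by treating each tuple as a bag of attribute-value terms. The following is a minimal illustration, not the paper's method; the sample rows and query are invented for the example:

```python
import math
from collections import Counter

def tfidf_scores(tuples, query_terms):
    """Score each tuple as a bag of attribute-value terms, tf * idf style."""
    n = len(tuples)
    # document frequency: number of tuples containing each term
    df = Counter()
    for t in tuples:
        for term in set(t):
            df[term] += 1
    scores = []
    for t in tuples:
        tf = Counter(t)
        score = sum(tf[q] * math.log(n / df[q]) for q in query_terms if q in df)
        scores.append(score)
    return scores

rows = [("sedan", "red", "manual"),
        ("sedan", "blue", "auto"),
        ("truck", "red", "auto")]
print(tfidf_scores(rows, ["sedan"]))
```

Note how the first two tuples receive identical scores: with categorical attributes, term frequency within a tuple is almost always 0 or 1, which is exactly why such schemes produce ties instead of a specific ordering.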
3.
A hypercube algorithm to solve the list ranking problem is presented. Let n be the length of the list, and let p be the number of processors of the hypercube. The algorithm described runs in time O(n/p) when n = Ω(p^(1+ε)) for any constant ε > 0, and in time O(n log n / p + log³ p) otherwise. This clearly attains a linear speedup when n = Ω(p^(1+ε)). Efficient balancing and routing schemes had to be used to achieve the linear speedup. The authors use these techniques to obtain efficient hypercube algorithms for many basic graph problems, such as tree expression evaluation, connected and biconnected components, ear decomposition, and st-numbering. These problems are also addressed in the restricted model of one-port communication.
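The list ranking problem itself (computing each node's distance to the end of a linked list) is classically attacked with pointer jumping, the idea that parallel algorithms of this kind build on. A minimal sequential simulation, not the paper's hypercube algorithm:

```python
def list_rank(succ):
    """Rank each node = distance to the tail, via pointer jumping.
    succ[i] = index of the next node, or i itself for the tail."""
    n = len(succ)
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    nxt = list(succ)
    # O(log n) jumping rounds; each round doubles the distance covered
    for _ in range(n.bit_length()):
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank

# list 0 -> 2 -> 1 -> 3 (tail)
print(list_rank([2, 3, 1, 3]))  # ranks: [3, 1, 2, 0]
```

On a PRAM each of the n list cells updates in parallel, giving O(log n) rounds; the hypercube work in this paper is about achieving that with only p processors and realistic communication costs.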
4.
5.
Efficient aggregation algorithms for compressed data warehouses (total citations: 9; self: 0; others: 9)
Aggregation and cube are important operations for online analytical processing (OLAP). Many efficient algorithms to compute aggregation and cube for relational OLAP have been developed. Some work has been done on efficiently computing the cube for multidimensional data warehouses that store data sets in multidimensional arrays rather than in tables. However, to our knowledge, there is nothing to date in the literature describing aggregation algorithms on compressed data warehouses for multidimensional OLAP. This paper presents a set of aggregation algorithms on compressed data warehouses for multidimensional OLAP. These algorithms operate directly on data sets compressed by mapping-complete compression methods, without the need to first decompress them. The algorithms have different performance behaviors as a function of the data set parameters, the sizes of the outputs, and main memory availability. The algorithms are described, and their I/O and CPU cost functions are presented. A decision procedure to select the most efficient algorithm for a given aggregation request is also proposed. The analysis and experimental results show that the algorithms perform better on sparse data than previous aggregation algorithms.
6.
PeiZong Lee, IEEE Transactions on Parallel and Distributed Systems, 1997, 8(8): 825-839
Data distribution has been one of the most important research topics in parallelizing compilers for distributed-memory parallel computers. A good data distribution scheme should consider both computation load balance and communication overhead. In this paper, we show that data redistribution is necessary for executing a sequence of Do-loops if the communication cost of performing the sequence is larger than a threshold value. Based on this observation, we can prune the search space and derive efficient dynamic programming algorithms for determining effective data distribution schemes to execute a sequence of Do-loops with a general structure. Experimental studies on a 32-node nCUBE-2 computer are also presented.
7.
Square-root algorithms for least-squares estimation (total citations: 1; self: 0; others: 1)
We present several new algorithms, and more generally a new approach, to recursive estimation for linear dynamical systems. Earlier results in this area have been obtained by several others, especially Potter, Golub, Dyer and McReynolds, Kaminski, Schmidt, Bryson, and Bierman, on what are known as square-root algorithms. Our results are more comprehensive. They also show how constancy of parameters can be exploited to reduce the number of computations and to obtain new forms of the Chandrasekhar-type equations for computing the filter gain. Our approach is essentially based on certain simple geometric interpretations of the overall estimation problem. One of our goals is to attract attention to non-Riccati-based studies of estimation problems.
8.
9.
Generalization properties of support vector machines, orthogonal least squares and zero-order regularized orthogonal least squares algorithms are studied using simulation. For high signal-to-noise ratios (40 dB), mixed results are obtained, but for a low signal-to-noise ratio, the prediction performance of support vector machines is better than the orthogonal least squares algorithm in the examples considered. However, the latter can usually give a parsimonious model with very fast training and testing time. Two new algorithms are therefore proposed that combine the orthogonal least squares algorithm with support vector machines to give a parsimonious model with good prediction accuracy in the low signal-to-noise ratio case.
10.
Watermarking relational data: framework, algorithms and analysis (total citations: 3; self: 0; others: 3)
We enunciate the need for watermarking database relations to deter data piracy, identify the characteristics of relational data that pose unique challenges for watermarking, and delineate desirable properties of a watermarking system for relational data. We then present an effective watermarking technique geared for relational data. This technique ensures that some bit positions of some of the attributes of some of the tuples contain specific values. The specific bit locations and values are algorithmically determined under the control of a secret key known only to the owner of the data. This bit pattern constitutes the watermark. Only someone with access to the secret key can detect the watermark with high probability. Detecting the watermark requires access neither to the original data nor to the watermark, and the watermark can be easily and efficiently maintained in the presence of insertions, updates, and deletions. Our analysis shows that the proposed technique is robust against various forms of malicious attacks as well as benign updates to the data. Using an implementation running on DB2, we also show that the algorithms perform well enough to be used in real-world applications. Received: 29 July 2002; Accepted: 10 December 2002; Published online: 10 July 2003. Edited by P. Bernstein. A preliminary version of this paper appeared in the Proceedings of the 28th VLDB Conference, Hong Kong, China, 2002.
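The core mechanism described here, keyed selection of tuples, bit positions, and bit values, can be sketched as follows. This is a hypothetical simplification for illustration only, not the paper's exact scheme; the parameter names `gamma` (selecting roughly 1/gamma of tuples) and `num_lsb` (how many low-order bits are candidates) are assumptions:

```python
import hmac
import hashlib

def _keyed_choice(pk, key):
    """Derive a large pseudo-random integer from the primary key and secret key."""
    return int.from_bytes(hmac.new(key, str(pk).encode(), hashlib.sha256).digest(), "big")

def mark_tuples(rows, key, gamma=3, num_lsb=2):
    """Embed a keyed watermark: for ~1/gamma of the (pk, value) rows,
    force one low-order bit of the numeric attribute to a key-determined value."""
    marked = []
    for pk, value in rows:
        h = _keyed_choice(pk, key)
        if h % gamma == 0:                       # this tuple carries a mark
            bit_pos = (h // gamma) % num_lsb
            bit_val = (h // (gamma * num_lsb)) % 2
            value = (value & ~(1 << bit_pos)) | (bit_val << bit_pos)
        marked.append((pk, value))
    return marked

def detect(rows, key, gamma=3, num_lsb=2):
    """Count how many selected tuples carry the expected bit."""
    total = match = 0
    for pk, value in rows:
        h = _keyed_choice(pk, key)
        if h % gamma == 0:
            total += 1
            bit_pos = (h // gamma) % num_lsb
            bit_val = (h // (gamma * num_lsb)) % 2
            match += ((value >> bit_pos) & 1) == bit_val
    return match, total
```

Because selection and bit choice depend only on the primary key and the secret key, detection needs neither the original table nor a stored watermark, matching the property claimed in the abstract.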
11.
Many data sharing applications require that published data protect sensitive information pertaining to individuals, such as the diseases of patients, the credit rating of a customer, or the salary of an employee. Meanwhile, certain information is required to be published. In this paper, we consider data-publishing applications where the publisher specifies both the sensitive information and the information to be shared. An adversary can infer the real value of a sensitive entry with high confidence by using the published data. The goal is to protect sensitive information in the presence of data inference using association rules derived from the published data. We formulate the inference attack framework and develop complexity results. We show that computing a safe partial table is an NP-hard problem. We classify the general problem into subcases based on the requirements on the published information, and propose algorithms for finding a safe partial table to publish. We have conducted an empirical study to evaluate these algorithms on real data. The test results show that the proposed algorithms can produce approximately maximal published data and improve on the performance of existing algorithms.
Supported by the Program for New Century Excellent Talents in Universities (Grant No. NCET-06-0290), the National Natural Science Foundation of China (Grant Nos. 60828004, 60503036), and the Fok Ying Tong Education Foundation Award (Grant No. 104027).
12.
J.L. Wolf, D.M. Dias, P.S. Yu, J. Turek, IEEE Transactions on Knowledge and Data Engineering, 1994, 6(6): 990-997
Parallel processing is an attractive option for relational database systems. As in any parallel environment, however, load balancing is a critical issue that affects overall performance. For one common database operation in particular, the join of two relations, load balancing in conventional parallel algorithms can be severely hampered by a natural phenomenon known as data skew. In a pair of recent papers (J. Wolf et al., 1993; 1993), we described two new join algorithms designed to address the data skew problem. We propose significant improvements to both algorithms, increasing their effectiveness while simultaneously decreasing their execution times. The paper then focuses on the comparative performance of the improved algorithms and their more conventional counterparts. The new algorithms outperform their conventional counterparts in the presence of almost any skew at all, and dramatically so in cases of high skew.
13.
The goal of approximate policy evaluation is to “best” represent a target value function according to a specific criterion. Different algorithms offer different choices of the optimization criterion. Two popular least-squares algorithms for performing this task are the Bellman residual method, which minimizes the Bellman residual, and the fixed point method, which minimizes the projection of the Bellman residual. When used within policy iteration, the fixed point algorithm tends to ultimately find better performing policies whereas the Bellman residual algorithm exhibits more stable behavior between rounds of policy iteration. We propose two hybrid least-squares algorithms to try to combine the advantages of these algorithms. We provide an analytical and geometric interpretation of hybrid algorithms and demonstrate their utility on a simple problem. Experimental results on both small and large domains suggest hybrid algorithms may find solutions that lead to better policies when performing policy iteration.
14.
Gae-won You, Information Sciences, 2008, 178(20): 3925-3942
As data of an unprecedented scale become accessible on the Web, personalization, which narrows retrieval down to user-specific information needs, is becoming more and more critical. For instance, while web search engines traditionally retrieve the same results for all users, they have begun to offer beta services that personalize results to user-specific contexts such as prior search history or other application contexts. In clear contrast to search engines dealing with unstructured text data, this paper studies how to enable such personalization in the context of structured data retrieval. In particular, we adopt a contextual ranking model to formalize personalization as a cost-based optimization over collected contextual rankings. With this formalism, personalization can be abstracted as the cost-optimal retrieval of the contextual ranking that most closely matches the user-specific retrieval context. With the retrieved matching context, we adopt a machine learning approach to effectively and efficiently identify the ideal personalized ranked results for the specific user. Our empirical evaluations over synthetic and real-life data validate both the efficiency and effectiveness of our framework.
15.
D. Beyer, A. Noack, C. Lewerentz, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 31(2): 137-149
Calculating with graphs and relations has many applications in the analysis of software systems, for example, the detection of design patterns or patterns of problematic design and the computation of design metrics. These applications require an expressive query language, in particular, for the detection of graph patterns, and an efficient evaluation of the queries even for large graphs. In this paper, we introduce RML, a simple language for querying and manipulating relations based on predicate calculus, and CrocoPat, an interpreter for RML programs. RML is general because it enables the manipulation not only of graphs (i.e., binary relations), but of relations of arbitrary arity. CrocoPat executes RML programs efficiently because it internally represents relations as binary decision diagrams, a data structure that is well-known as a compact representation of large relations in computer-aided verification. We evaluate RML by giving example programs for several software analyses and CrocoPat by comparing its performance with calculators for binary relations, a Prolog system, and a relational database management system.
16.
This paper presents new algorithms, fuzzy c-medoids (FCMdd) and robust fuzzy c-medoids (RFCMdd), for fuzzy clustering of relational data. The objective functions are based on selecting c representative objects (medoids) from the data set in such a way that the total fuzzy dissimilarity within each cluster is minimized. A comparison of FCMdd with the well-known relational fuzzy c-means algorithm (RFCM) shows that FCMdd is more efficient. We present several applications of these algorithms to Web mining, including Web document clustering, snippet clustering, and Web access log analysis.
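The alternating scheme behind FCMdd, update fuzzy memberships from the current medoids, then re-pick each medoid to minimize the fuzzy within-cluster dissimilarity, can be sketched as follows. This is a minimal illustration of the general idea, not the paper's exact algorithm (the fuzzifier `m`, iteration cap, and initialization are assumptions):

```python
import random

def fcmdd(dissim, c, m=2.0, iters=20, seed=0):
    """Fuzzy c-medoids sketch on a relational (dissimilarity) matrix.
    dissim[i][j] = dissimilarity between objects i and j."""
    n = len(dissim)
    rng = random.Random(seed)
    medoids = rng.sample(range(n), c)
    u = [[0.0] * c for _ in range(n)]
    for _ in range(iters):
        # membership of object i in cluster k: standard fuzzy update
        for i in range(n):
            d = [max(dissim[i][medoids[k]], 1e-12) for k in range(c)]
            for k in range(c):
                u[i][k] = 1.0 / sum((d[k] / d[l]) ** (1.0 / (m - 1)) for l in range(c))
        # re-pick each medoid to minimize the fuzzy within-cluster cost
        new = []
        for k in range(c):
            cost = lambda j: sum((u[i][k] ** m) * dissim[i][j] for i in range(n))
            new.append(min(range(n), key=cost))
        if new == medoids:
            break
        medoids = new
    return medoids, u
```

Because medoids are chosen from the objects themselves using only the dissimilarity matrix, no vector-space averages are needed, which is what makes the approach applicable to relational data such as Web access logs.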
17.
The use of a special-purpose VLSI chip for relational operations is proposed. The chip is structured like a tree with processors at the nodes, called TOP (Tree of Processors). Each node is capable of storing a data element and of performing elementary operations on elements. A table of n tuples of k elements each (e.g., a relation defined as in database theory) is stored in n subtrees of at least k nodes each, at the lowest levels of TOP. The upper portion of TOP is used for routing and bookkeeping purposes. A number of elementary operations are defined for the nodes, and high-level operations on tables are performed as combinations of the former. In particular, some operations for data input/output and update are discussed, and the basic operations of UNION, DIFFERENCE, PROJECTION, PRODUCT, SELECTION, and JOIN, defined in relational algebra, are studied for TOP realization. Even the most complex operations are executed in O(kn) steps, that is, the size of the data. This result is optimal in our system, where we assume that data are transmitted to TOP through channels of constant bandwidth.
Dedicated to Professor S. Faedo on his 70th birthday.
This research has been partially supported by the Ministero della Pubblica Istruzione of Italy.
18.
Parallel algorithms for relational coarsest partition problems (total citations: 2; self: 0; others: 2)
Relational Coarsest Partition Problems (RCPPs) play a vital role in verifying concurrent systems. It is known that RCPPs are P-complete, and hence it may not be possible to design polylog-time parallel algorithms for these problems. In this paper, we present two efficient parallel algorithms for RCPP, in which the associated labeled transition system is assumed to have m transitions and n states. The first algorithm runs in O(n^(1+ε)) time using m/n^ε CREW PRAM processors, for any fixed ε < 1. This algorithm is analogous to, and optimal with respect to, the sequential algorithm of P.C. Kanellakis and S.A. Smolka (1990). The second algorithm runs in O(n log n) time using m/n CREW PRAM processors. This algorithm is analogous to, and nearly optimal with respect to, the sequential algorithm of R. Paige and R.E. Tarjan (1987).
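The refinement at the heart of RCPP can be illustrated with a naive sequential sketch in the Kanellakis-Smolka style (repeatedly split blocks whose states disagree on which blocks they can reach); this is an illustration of the problem being parallelized, not the paper's PRAM algorithms:

```python
def coarsest_partition(states, transitions, init):
    """Naive partition refinement for a labeled transition system.
    transitions: set of (src, label, dst) triples; init: initial partition (list of sets)."""
    partition = [set(b) for b in init]
    changed = True
    while changed:
        changed = False
        block_of = {s: i for i, b in enumerate(partition) for s in b}
        # signature of a state: the set of (label, target block) pairs it can reach
        sig = {s: frozenset((a, block_of[t]) for (x, a, t) in transitions if x == s)
               for s in states}
        new_partition = []
        for b in partition:
            groups = {}
            for s in b:
                groups.setdefault(sig[s], set()).add(s)
            new_partition.extend(groups.values())
            if len(groups) > 1:
                changed = True
        partition = new_partition
    return partition
```

Each pass is easy to state but inherently sequential-looking (later splits depend on earlier ones), which is why the P-completeness result mentioned above makes polylog-time parallelization unlikely and efficient parallel algorithms interesting.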
19.
Vali Derhami, Elahe Khodadadian, Mohammad Ghasemzadeh, Ali Mohammad Zareh Bidoki, Applied Soft Computing, 2013, 13(4): 1686-1692
Ranking web pages so that the most relevant pages are presented for users' queries is one of the main issues in any search engine. In this paper, two new ranking algorithms are offered, using Reinforcement Learning (RL) concepts. RL is a powerful technique of modern artificial intelligence that interactively tunes an agent's parameters. In the first step, by formulating ranking as an RL problem, a new connectivity-based ranking algorithm, called RL_Rank, is proposed. In RL_Rank, the agent is considered a surfer who travels between web pages by clicking randomly on a link in the current page. Each web page is considered a state, and the value function of a state is used to determine the score of that state (page). The reward corresponds to the number of out-links from the current page. Rank scores in RL_Rank are computed recursively, and the convergence of these scores is proved. In the next step, we introduce a new hybrid approach combining BM25, a content-based algorithm, with RL_Rank. Both proposed algorithms are evaluated on well-known benchmark datasets and analyzed according to the relevant criteria. Experimental results show that using RL concepts leads to significant improvements in ranking algorithms.
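A connectivity-based value iteration in the spirit of RL_Rank can be sketched as below. The exact update rule here (reward = out-link count, uniform random click, discount `gamma`) is an assumption for illustration, not necessarily the paper's formulation:

```python
def rl_rank(out_links, gamma=0.9, iters=100):
    """Value iteration over a link graph: V(p) = R(p) + gamma * avg V over pages
    linked from p, with reward R(p) = number of out-links of p.
    out_links: dict mapping page -> list of linked pages."""
    v = {p: 0.0 for p in out_links}
    for _ in range(iters):
        v = {p: len(qs) + gamma * (sum(v[q] for q in qs) / len(qs) if qs else 0.0)
             for p, qs in out_links.items()}
    return v

# toy graph: A links to B and C, B links to C, C is a sink
print(rl_rank({"A": ["B", "C"], "B": ["C"], "C": []}))
```

With gamma < 1 the update is a contraction, so the scores converge, mirroring the convergence claim in the abstract; pages that link into well-connected pages accumulate higher values.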
20.
Association is an important research topic in the field of data mining. This paper studies fuzzy association rule mining. To address the problem that ordinary association rules cannot precisely express the associations among fuzzy information in a database, a new fuzzy association rule mining algorithm, FARM_New, is proposed. The results show that the algorithm is effective and improves the speed of fuzzy mining.
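The basic quantities of fuzzy association rule mining can be sketched as follows: each record maps fuzzy items (e.g., "age:young") to membership degrees in [0, 1], and rule support/confidence replace crisp counts with t-norm-combined memberships. This is a generic illustration of fuzzy support, not the FARM_New algorithm itself; `min` as the t-norm is an assumption:

```python
def fuzzy_support(records, antecedent, consequent):
    """Fuzzy support and confidence for the rule antecedent -> consequent.
    records: list of dicts mapping fuzzy item -> membership degree in [0, 1]."""
    n = len(records)
    # min as the t-norm: degree to which a record satisfies both items
    both = sum(min(r.get(antecedent, 0.0), r.get(consequent, 0.0)) for r in records)
    ante = sum(r.get(antecedent, 0.0) for r in records)
    support = both / n
    confidence = both / ante if ante else 0.0
    return support, confidence
```

Crisp rules are the special case where every membership degree is 0 or 1, which is why fuzzy rules can express graded associations that ordinary rules cannot.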