首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
The increasing popularity of location-based social networks encourages more and more users to share their experiences. It deeply impacts the decision of customers when shopping, traveling, and so on. This paper studies the problem of top-K valuable documents query over geo-textual data stream. Many researchers have studied this problem. However, they do not consider the reliability of documents, where some unreliable documents may mislead customers to make improper decisions. In addition, they lack the ability to prune documents with low representativeness. In order to increase user satisfaction in recommendation systems, we propose a novel framework named PDS. It first employs an efficiently machine learning technique named ELM to prune unreliable documents, and then uses a novel index named \(\mathcal {GH}\) to maintain documents. For one thing, this index maintains a group of pruning values to filter low quality documents. For another, it utilizes the unique property of sliding window to further enhance the PDS performance. Theoretical analysis and extensive experimental results demonstrate the effectiveness of the proposed algorithms.  相似文献   

2.
Reachability is a fundamental problem on large-scale networks emerging nowadays in various application domains, such as social networks, communication networks, biological networks, road networks, etc. It has been studied extensively. However, little existing work has studied reachability with realistic constraints imposed on graphs with real-valued edge or node weights. In fact, such weights are very common in many real-world networks, for example, the bandwidth of a link in communication networks, the reliability of an interaction between two proteins in PPI networks, and the handling capacity of a warehouse/storage point in a distribution network. In this paper, we formalize a new yet important reachability query in weighted undirected graphs, called weight constraint reachability (WCR) query that asks: is there a path between nodes \(a\) and \(b\), on which each real-valued edge (or node) weight satisfies a range constraint. We discover an interesting property of WCR, based on which, we design a novel edge-based index structure to answer the WCR query in \(O(1)\) time. Furthermore, we consider the case when the index cannot entirely fit in the memory, which can be very common for emerging massive networks. An I/O-efficient index is proposed, which provides constant I/O (precisely four I/Os) query time with \(O(|V|\log |V|)\) disk-based index size. Extensive experimental studies on both real and synthetic datasets demonstrate the efficiency and scalability of our solutions in answering the WCR query.  相似文献   

3.
The best way of selecting samples in algebraic attacks against block ciphers is not well explored and understood. We introduce a simple strategy for selecting the plaintexts and demonstrate its strength by breaking reduced-round KATAN32, LBlock and SIMON. For each case, we present a practical attack on reduced-round version which outperforms previous attempts of algebraic cryptanalysis whose complexities were close to exhaustive search. The attack is based on the selection of samples using cube attack and ElimLin which was presented at FSE’12, and a new technique called Universal Proning. In the case of LBlock, we break 10 out of 32 rounds. In KATAN32, we break 78 out of 254 rounds. Unlike previous attempts which break smaller number of rounds, we do not guess any bit of the key and we only use structural properties of the cipher to be able to break a higher number of rounds with much lower complexity. We show that cube attacks owe their success to the same properties and therefore can be used as a heuristic for selecting the samples in an algebraic attack. The performance of ElimLin is further enhanced by the new Universal Proning technique, which allows to discover linear equations that are not found by ElimLin.  相似文献   

4.
Bit commitment schemes are at the basis of modern cryptography. Since information-theoretic security is impossible both in the classical and in the quantum regime, we examine computationally secure commitment schemes. In this paper we study worst-case complexity assumptions that imply quantum bit commitment schemes. First, we show that QSZK \({\not\subseteq}\) QMA implies a computationally hiding and statistically binding auxiliary-input quantum commitment scheme. We then extend our result to show that the much weaker assumption QIP \({\not\subseteq}\) QMA (which is weaker than PSPACE \({\not\subseteq}\) PP) implies the existence of auxiliary-input commitment schemes with quantum advice. Finally, to strengthen the plausibility of the separation QSZK \({\not\subseteq}\) QMA, we find a quantum oracle relative to which honest-verifier QSZK is not contained in QCMA, the class of languages that can be verified using a classical proof in quantum polynomial time.  相似文献   

5.
6.
Multiversion databases store both current and historical data. Rows are typically annotated with timestamps representing the period when the row is/was valid. We develop novel techniques to reduce index maintenance in multiversion databases, so that indexes can be used effectively for analytical queries over current data without being a heavy burden on transaction throughput. To achieve this end, we re-design persistent index data structures in the storage hierarchy to employ an extra level of indirection. The indirection level is stored on solid-state disks that can support very fast random I/Os, so that traversing the extra level of indirection incurs a relatively small overhead. The extra level of indirection dramatically reduces the number of magnetic disk I/Os that are needed for index updates and localizes maintenance to indexes on updated attributes. Additionally, we batch insertions within the indirection layer in order to reduce physical disk I/Os for indexing new records. In this work, we further exploit SSDs by introducing novel DeltaBlock techniques for storing the recent changes to data on SSDs. Using our DeltaBlock, we propose an efficient method to periodically flush the recently changed data from SSDs to HDDs such that, on the one hand, we keep track of every change (or delta) for every record, and, on the other hand, we avoid redundantly storing the unchanged portion of updated records. By reducing the index maintenance overhead on transactions, we enable operational data stores to create more indexes to support queries. We have developed a prototype of our indirection proposal by extending the widely used generalized search tree open-source project, which is also employed in PostgreSQL. Our working implementation demonstrates that we can significantly reduce index maintenance and/or query processing cost by a factor of 3. For the insertion of new records, our novel batching technique can save up to 90 % of the insertion time. For updates, our prototype demonstrates that we can significantly reduce the database size by up to 80 % even with a modest space allocated for DeltaBlocks on SSDs.  相似文献   

7.
Many real-world knowledge-based systems must deal with information coming from different sources that invariably leads to incompleteness, overspecification, or inherently uncertain content. The presence of these varying levels of uncertainty doesn’t mean that the information is worthless – rather, these are hurdles that the knowledge engineer must learn to work with. In this paper, we continue work on an argumentation-based framework that extends the well-known Defeasible Logic Programming (DeLP) language with probabilistic uncertainty, giving rise to the Defeasible Logic Programming with Presumptions and Probabilistic Environments (DeLP3E) model. Our prior work focused on the problem of belief revision in DeLP3E, where we proposed a non-prioritized class of revision operators called AFO (Annotation Function-based Operators) to solve this problem. In this paper, we further study this class and argue that in some cases it may be desirable to define revision operators that take quantitative aspects into account, such as how the probabilities of certain literals or formulas of interest change after the revision takes place. To the best of our knowledge, this problem has not been addressed in the argumentation literature to date. We propose the QAFO (Quantitative Annotation Function-based Operators) class of operators, a subclass of AFO, and then go on to study the complexity of several problems related to their specification and application in revising knowledge bases. Finally, we present an algorithm for computing the probability that a literal is warranted in a DeLP3E knowledge base, and discuss how it could be applied towards implementing QAFO-style operators that compute approximations rather than exact operations.  相似文献   

8.
The Compact Muon Solenoid (CMS) experiment at the European Organization for Nuclear Research (CERN) deploys its data collections, simulation and analysis activities on a distributed computing infrastructure involving more than 70 sites worldwide. The historical usage data recorded by this large infrastructure is a rich source of information for system tuning and capacity planning. In this paper we investigate how to leverage machine learning on this huge amount of data in order to discover patterns and correlations useful to enhance the overall efficiency of the distributed infrastructure in terms of CPU utilization and task completion time. In particular we propose a scalable pipeline of components built on top of the Spark engine for large-scale data processing, whose goal is collecting from different sites the dataset access logs, organizing them into weekly snapshots, and training, on these snapshots, predictive models able to forecast which datasets will become popular over time. The high accuracy achieved indicates the ability of the learned model to correctly separate popular datasets from unpopular ones. Dataset popularity predictions are then exploited within a novel data caching policy, called PPC (Popularity Prediction Caching). We evaluate the performance of PPC against popular caching policy baselines like LRU (Least Recently Used). The experiments conducted on large traces of real dataset accesses show that PPC outperforms LRU reducing the number of cache misses up to 20% in some sites.  相似文献   

9.
Frequent subgraph mining has been extensively studied on certain graph data. However, uncertainty is intrinsic in graph data in practice, but there is very few work on mining uncertain graph data. This paper focuses on mining frequent subgraphs over uncertain graph data under the probabilistic semantics. Specifically, a measure called ${\varphi}$ -frequent probability is introduced to evaluate the degree of recurrence of subgraphs. Given a set of uncertain graphs and two real numbers ${0 < \varphi, \tau < 1}$ , the goal is to quickly find all subgraphs with ${\varphi}$ -frequent probability at least τ. Due to the NP-hardness of the problem and to the #P-hardness of computing the ${\varphi}$ -frequent probability of a subgraph, an approximate mining algorithm is proposed to produce an ${(\varepsilon, \delta)}$ -approximate set Π of “frequent subgraphs”, where ${0 < \varepsilon < \tau}$ is error tolerance, and 0 <?δ?< 1 is a confidence bound. The algorithm guarantees that (1) any frequent subgraph S is contained in Π with probability at least ((1 ? δ) /2) s , where s is the number of edges in S; (2) any infrequent subgraph with ${\varphi}$ -frequent probability less than ${\tau - \varepsilon}$ is contained in Π with probability at most δ/2. The theoretical analysis shows that to obtain any frequent subgraph with probability at least 1 ? Δ, the input parameter δ of the algorithm must be set to at most ${1 - 2 (1 - \Delta)^{1 / \ell_{\max}}}$ , where 0 <?Δ <?1, and ? max is the maximum number of edges in frequent subgraphs. Extensive experiments on real uncertain graph data verify that the proposed algorithm is practically efficient and has very high approximation quality. Moreover, the difference between the probabilistic semantics and the expected semantics on mining frequent subgraphs over uncertain graph data has been discussed in this paper for the first time.  相似文献   

10.
We introduce a rewrite-based specification language for modelling probabilistic concurrent and distributed systems. The language, based on PMaude, has both a rigorous formal basis and the characteristics of a high-level rule-based programming language. Furthermore, we provide tool support for performing discrete-event simulations of models written in PMaude, and for statistically analyzing various quantitative aspects of such models based on the samples that are generated through discrete-event simulation. Because distributed and concurrent communication protocols can be modelled using actors (concurrent objects with asynchronous message passing), we provide an actor PMaude module. The module aids writing specifications in a probabilistic actor formalism. This allows us to easily write specifications that are purely probabilistic – and not just non-deterministic. The absence of such (un-quantified) non-determinism in a probabilistic system is necessary for a form of statistical analysis that we also discuss. Specifically, we introduce a query language called Quantitative Temporal Expressions (or QuaTEx in short), to query various quantitative aspects of a probabilistic model. We also describe a statistical technique to evaluate QuaTEx expressions for a probabilistic model.  相似文献   

11.
This paper extends Common2, the family of objects that implement and are wait-free implementable from 2 consensus objects, in two ways: First, the stack object is shown to be in the family, refuting a conjecture to the contrary [6]. Second, Common2 is investigated in the unbounded concurrency model, whereas until now it was considered only in an n-process model. We show that the fetch-and-add, test-and-set , and stack objects are in Common2 even with respect to this stronger notion of wait-free implementation. Our constructions rely on a wait-free implementation of immediate snapshots in the unbounded concurrency model, which was previously not known to be possible. The introduction of unbounded concurrency to the study of Common2 opens several directions of research: are there objects that have n-process implementations but are not unbounded concurrency implementable? We conjecture that swap is such an object. Additionally, the hope is that a queue impossibility proof, which eludes us in the n-process model, will be easier to establish in the unbounded concurrency model.  相似文献   

12.
Continuous visible nearest neighbor query processing in spatial databases   总被引:1,自引:0,他引:1  
In this paper, we identify and solve a new type of spatial queries, called continuous visible nearest neighbor (CVNN) search. Given a data set P, an obstacle set O, and a query line segment q in a two-dimensional space, a CVNN query returns a set of \({\langle p, R\rangle}\) tuples such that \({p \in P}\) is the nearest neighbor to every point r along the interval \({R \subseteq q}\) as well as p is visible to r. Note that p may be NULL, meaning that all points in P are invisible to all points in R due to the obstruction of some obstacles in O. In contrast to existing continuous nearest neighbor query, CVNN retrieval considers the impact of obstacles on visibility between objects, which is ignored by most of spatial queries. We formulate the problem, analyze its unique characteristics, and develop efficient algorithms for exact CVNN query processing. Our methods (1) utilize conventional data-partitioning indices (e.g., R-trees) on both P and O, (2) tackle the CVNN search by performing a single query for the entire query line segment, and (3) only access the data points and obstacles relevant to the final query result by employing a suite of effective pruning heuristics. In addition, several interesting variations of CVNN queries have been introduced, and they can be supported by our techniques, which further demonstrates the flexibility of the proposed algorithms. A comprehensive experimental evaluation using both real and synthetic data sets has been conducted to verify the effectiveness of our proposed pruning heuristics and the performance of our proposed algorithms.  相似文献   

13.
In this paper, we introduce a new problem termed query reverse engineering (QRE). Given a database \(D\) and a result table \(T\) —the output of some known or unknown query \(Q\) on \(D\) —the goal of QRE is to reverse-engineer a query \(Q'\) such that the output of query \(Q'\) on database \(D\) (denoted by \(Q'(D)\) ) is equal to \(T\) (i.e., \(Q(D)\) ). The QRE problem has useful applications in database usability, data analysis, and data security. In this work, we propose a data-driven approach, TALOS for Tree-based classifier with At Least One Semantics, that is based on a novel dynamic data classification formulation and extend the approach to efficiently support the three key dimensions of the QRE problem: whether the input query is known/unknown, supporting different query fragments, and supporting multiple database versions.  相似文献   

14.
This paper presents the Argonauts multi-agent framework which was developed as part of a one year student project at Technische Universität Dortmund. The Argonauts framework builds on a BDI approach to model rational agents that act cooperatively in a dynamic and indeterministically changing environment. However, our agent model extends the traditional BDI approach in several aspects, most notably by incorporating motivation into the agent’s goal selection mechanism. The framework has been applied by the Argonauts team in the 2010 version of the annual multi-agent programming contest organized by Technische Universität Clausthal. In this paper, we present a high-level specification and analysis of the actual system used for solving the given scenario. We do this by applying the GAIA methodology, a high-level and iterative approach to model communication and roles in multi-agent scenarios. We further describe the technical details and insights gained during our participation in the multi-agent programming contest.  相似文献   

15.
16.
17.
The goal of fairness-aware classification is to categorize data while taking into account potential issues of fairness, discrimination, neutrality, and/or independence. For example, when applying data mining technologies to university admissions, admission criteria must be non-discriminatory and fair with regard to sensitive features, such as gender or race. In this context, such fairness can be formalized as statistical independence between classification results and sensitive features. The main purpose of this paper is to analyze this formal fairness in order to achieve better trade-offs between fairness and prediction accuracy, which is important for applying fairness-aware classifiers in practical use. We focus on a fairness-aware classifier, Calders and Verwer’s two-naive-Bayes (CV2NB) method, which has been shown to be superior to other classifiers in terms of fairness. We hypothesize that this superiority is due to the difference in types of independence. That is, because CV2NB achieves actual independence, rather than satisfying model-based independence like the other classifiers, it can account for model bias and a deterministic decision rule. We empirically validate this hypothesis by modifying two fairness-aware classifiers, a prejudice remover method and a reject option-based classification (ROC) method, so as to satisfy actual independence. The fairness of these two modified methods was drastically improved, showing the importance of maintaining actual independence, rather than model-based independence. We additionally extend an approach adopted in the ROC method so as to make it applicable to classifiers other than those with generative models, such as SVMs.  相似文献   

18.
19.
We study the problem of answering k -hop reachability queries in a directed graph, i.e., whether there exists a directed path of length $k$ , from a source query vertex to a target query vertex in the input graph. The problem of $k$ -hop reachability is a general problem of the classic reachability (where $k=\infty $ ). Existing indexes for processing classic reachability queries, as well as for processing shortest path distance queries, are not applicable or not efficient for processing $k$ -hop reachability queries. We propose an efficient index for processing $k$ -hop reachability queries. Our experimental results on a wide range of real datasets show that our method is efficient and scalable in terms of both index construction and query processing.  相似文献   

20.
Order dependencies (ODs) describe a relationship of order between lists of attributes in a relational table. ODs can help to understand the semantics of datasets and the applications producing them. They have applications in the field of query optimization by suggesting query rewrites. Also, the existence of an OD in a table can provide hints on which integrity constraints are valid for the domain of the data at hand. This work is the first to describe the discovery problem for order dependencies in a principled manner by characterizing the search space, developing and proving pruning rules, and presenting the algorithm Order, which finds all order dependencies in a given table. Order traverses the lattice of permutations of attributes in a level-wise bottom-up manner. In a comprehensive evaluation, we show that it is efficient even for various large datasets.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号