共查询到20条相似文献,搜索用时 15 毫秒
2.
Keyword search has been recently extended to relational databases to retrieve information from text-rich attributes. However,
all the existing methods focus on finding individual tuples matching a set of query keywords from one table or the join of
multiple tables. In this paper, we motivate a novel problem of aggregate keyword search: finding minimal group-bys covering
a set of query keywords well, which is useful in many applications. We develop two interesting approaches to tackle the problem.
We further extend our methods to allow partial matches and matches using a keyword ontology. An extensive empirical evaluation
using both real data sets and synthetic data sets is reported to verify the effectiveness of aggregate keyword search and
the efficiency of our methods. 相似文献
3.
Keyword search is the most popular technique for querying large tree-structured datasets, often of unknown structure, in the web. Recent keyword search approaches return lowest common ancestors (LCAs) of the keyword matches ranked with respect to their relevance to the keyword query. A major challenge of a ranking approach is the efficiency of its algorithms as the number of keywords and the size and complexity of the data increase. To face this challenge most of the known approaches restrict their ranking to a subset of the LCAs (e.g., SLCAs, ELCAs), missing relevant results.In this work, we design novel top-k-size stack-based algorithms on tree-structured data. Our algorithms implement ranking semantics for keyword queries which is based on the concept of LCA size. Similar to metric selection in information retrieval, LCA size reflects the proximity of keyword matches in the data tree. This semantics does not rank a predefined subset of LCAs and through a layered presentation of results, it demonstrates improved effectiveness compared to previous relevant approaches. To address performance challenges our algorithms exploit a lattice of the partitions of the keyword set, which empowers a linear time performance. This result is obtained without the support of auxiliary precomputed data structures. An extensive experimental study on various and large datasets confirms the theoretical analysis. The results show that, in contrast to other approaches, our algorithms scale smoothly when the size of the dataset and the number of keywords increase. 相似文献
4.
对于加密云数据的搜索,传统的关键词模糊搜索方案虽然能搜索到相关文档,但是搜索的结果并不令人满意。在用户输入正确的情况下,无法完成近似搜索,当用户出现拼写错误时,返回的结果中包含大量无关关键词文档,严重浪费了带宽资源。针对目前在加密云数据下关键词模糊搜索的缺陷,提出了一种新型的关键词模糊搜索方案,通过对关键词计算相关度分数并对文档根据相关度分数进行排序,将top-k(即相关度最高的k个文档)个文档返回给搜索用户,减少了不必要的带宽浪费和用户寻找有效文档的时间消耗,提供了更加有效的搜索结果,并且通过引入虚假陷门集,增大了云服务器对文档关键词的分析难度,增加了系统的隐私性保护。 相似文献
5.
Keyword based search systems are becoming increasingly popular and are considered a key feature in many information management systems. Keyword based search approaches have the significant advantage of not requiring users to know how data is organized or stored. Typical approaches assume the dataset to be modeled as a graph, where answers to queries are sub-graphs ranked according to some criteria. Exploring the graph and building and ranking quality pose a number of challenges. In this paper, we discuss Y aanii, an approach for effective Keyword Search over graph-modeled Web data. Y aanii contains a novel approach to keyword search, by extracting the best results from the first set of answers and then combining a solution building algorithm with a ranking technique. In addition to the algorithms and the processes for building result sets, we provide a detailed study of the computational and ranking complexity of Y aanii and compare it with other approaches. We show that Y aanii is superior in terms of efficiency and quality of returned results from both the experimental and theoretical aspects. 相似文献
6.
As probabilistic data management is becoming one of the main research focuses and keyword search is turning into a more popular query means, it is natural to think how to support keyword queries on probabilistic XML data. With regards to keyword query on deterministic XML documents, ELCA (Exclusive Lowest Common Ancestor) semantics allows more relevant fragments rooted at the ELCAs to appear as results and is more popular compared with other keyword query result semantics (such as SLCAs). In this paper, we investigate how to evaluate ELCA results for keyword queries on probabilistic XML documents. After defining probabilistic ELCA semantics in terms of possible world semantics, we propose an approach to compute ELCA probabilities without generating possible worlds. Then we develop an efficient stack-based algorithm that can find all probabilistic ELCA results and their ELCA probabilities for a given keyword query on a probabilistic XML document. Finally, we experimentally evaluate the proposed ELCA algorithm and compare it with its SLCA counterpart in aspects of result probability, time and space efficiency, and scalability. 相似文献
7.
为实现加密数据的细粒度密文搜索,并确保第三方服务器诚实可靠地执行搜索过程,同时尽可能降低用户端的计算和通信代价,提出支持密文搜索可验证的属性基可搜索加密方案。通过引入对称密钥加密体制,承诺方案和强一次性消息认证码,以经典的属性加密方案为基础构造算法,实现密文关键字的细粒度搜索以及搜索过程的可验证性,并证明方案具有选择性的数据安全性和搜索索引安全性,以及验证可靠性。与同类方案相比,该方案在达到同等安全性要求的情况下,进一步提高了终端用户的计算和通信效率。 相似文献
8.
Mobile computing over intelligent mobile is affecting human’s habits of obtaining information over Internet, especially keyword search. Most of previous keyword search works are mainly focused on traditional web data sources, in which the performance can be improved by adding more computing power and/or building more offline-computed index. However, it is very challenging to apply the traditional keyword search methods to mobile web-based keyword search because mobile computing has many different features, e.g., frequent disconnections, variety of bandwidths, limited power of mobile devices, limited data size to be downloaded, etc.. To address this challenge, in this paper we design an adaptive mobile-based XML keyword search approach, called XBridge-Mobile, that can derive the semantics of a keyword query and generate a set of effective structured patterns by analyzing the given keyword query and the schemas of XML data sources. Each structured pattern represents one of user’s possible search intentions. The patterns will be firstly sent to the mobile client from web server. And then, the mobile client can select some interested patterns to load the results. By doing this, we can reduce the communication cost a lot between web server and mobile client because only the derived patterns and a few results need to be transferred, not all the keyword search results, by which we can save lots of expenses when the downloaded data is priced. In addition, we can economically maintain the frequent structured pattern queries in the mobile device, which can further reduce the expense of downloading data. At last, we analyze and propose a ranking function to measure the quality of keyword search results, design a set of algorithms to optimize mobile keyword search based on the maintained structured patterns, and present the experimental study of XBridge-Mobile with real XML datasets. 相似文献
9.
Many challenging real world problems involve multi-label data streams. Efficient methods exist for multi-label classification in non-streaming scenarios. However, learning in evolving streaming scenarios is more challenging, as classifiers must be able to deal with huge numbers of examples and to adapt to change using limited time and memory while being ready to predict at any point. This paper proposes a new experimental framework for learning and evaluating on multi-label data streams, and uses it to study the performance of various methods. From this study, we develop a multi-label Hoeffding tree with multi-label classifiers at the leaves. We show empirically that this method is well suited to this challenging task. Using our new framework, which allows us to generate realistic multi-label data streams with concept drift (as well as real data), we compare with a selection of baseline methods, as well as new learning methods from the literature, and show that our Hoeffding tree method achieves fast and more accurate performance. 相似文献
10.
Frontiers of Information Technology & Electronic Engineering - Keyword search is an alternative for structured languages in querying graph-structured data. A result to a keyword query is a... 相似文献
11.
Various approaches for keyword search in different settings (e.g., relational databases and XML) actually deal with the problem of enumerating K-fragments. For a given set of keywords K, a K-fragment is a subtree T of the given data graph, such that T contains all the keywords of K and no proper subtree of T has this property. There are three types of K-fragments: directed, undirected and strong. This paper describes efficient algorithms for enumerating K-fragments. Specifically, for all three types of K-fragments, algorithms are given for enumerating all K-fragments with polynomial delay and polynomial space. It is shown how these algorithms can be enhanced to enumerate K-fragments in a heuristic order. For directed K-fragments and acyclic data graphs, an algorithm is given for enumerating with polynomial delay in the order of increasing weight (i.e., the ranked order), assuming that K is of a fixed size. 相似文献
12.
针对基于数据图的关系数据库关键词查询结果的排序问题, 提出了基于多因素的结果二度排序法。该方法结合结果结构权重和信息检索中常用的内容匹配, 首先采用结果路径权重衡量关键词之间的关联紧密程度对结果粗排序; 然后, 对于结构权重相等的结果, 引入信息元组中的关键词词频和包含关键词的信息量对结果细排序。实验分析表明, 该排序方法能将与查询条件高度相关的结果排在前面, 提高结果的查准率。 相似文献
13.
传统的可搜索加密方案仅支持精确匹配的搜索,在效率和性能上都不能适应云计算环境。用支持多种字符串相似性操作的R+树构建索引,实现了云计算中对加密数据的模糊关键字搜索;用编辑距离来量化关键字的相似度,提出了一种可以返回与关键字更接近的文件检索方法。通过字符串聚类提高了模糊关键字搜索的效率。 相似文献
14.
Cloud storage over the internet gives opportunities for easy data sharing. To preserve the privacy of sharing data, the outsourced data is usually encrypted. The searchable encryption technique provides a solution to find the target data in the encrypted form. And the public-key encryption with keyword search is regarded as a major approach for the searchable encryption technique. However, there are still several privacy leakage challenges for the further adoption of these major schemes. One is how to resist the keyword guessing attack which still leaks data user’s keywords privacy. Another is how to construct the access control policy to prevent illegal access of outsourced data sharing since illegal access always leak the privacy of user’s attribute. In our paper, we firstly try to design a novel secure keyword index to resist the keyword guessing attack from access pattern and search pattern. Second, we propose an attribute-based encryption scheme which supports an enhanced fine-grained access control search. This allows the authenticated users to access different data although their searching request contains the same queried keywords, and meanwhile unauthenticated users cannot get any attribute privacy information. Third, we give security proofs to show that the construction of keyword index is against keyword guessing attack from the access pattern and search pattern, and our scheme is proved to be IND-CPA secure (the indistinguishability under chosen plaintext attack) under the standard model. Finally, theoretical analyses and a series of experiments are conducted to demonstrate the efficiency of our scheme. 相似文献
15.
随着在线地图应用的普及,基于地图的空间对象检索成为一个重要的工具而被广泛使用,技术也比较成熟。人们在地图上经常进行确定性目标点查询,例如用户提交关键词“咖啡店”,地图应用会在地图上标记所有的咖啡店,用户还可以通过进一步操作获取咖啡店的详细信息。但实际生活中存在另一种需求,例如用户想找到一个区域,在这个区域内要有“咖啡店”、“学校”和“旅店”这三类对象,称这样的查询为不确定性区域检索查询。目前对地图应用的研究无法解决不确定性区域检索的问题。而利用矩形剪枝和top-k推荐能够通过用户提交的关键字,给用户返回若干候选区域。 相似文献
16.
Multimedia Tools and Applications - A substitute solution for various organizations of data owners to store their data in the cloud using storage as a service(SaaS). The outsourced sensitive data... 相似文献
17.
Searchable encryption (SE) techniques allow cloud clients to easily store data and search encrypted data in a privacy-preserving manner, where most of SE schemes treat the cloud server as honest-but-curious. However, in practice, the cloud server is a semi-honest-but-curious third-party, which only executes a fraction of search operations and returns a fraction of false search results to save its computational and bandwidth resources. Thus, it is important to provide a results verification method to guarantee the correctness of the search results. Existing SE schemes allow multiple data owners to upload different records to the cloud server, but these schemes have very high computational and storage overheads when applied in a different but more practical setting where each record is co-owned by multiple data owners. To address this problem, we develop a verifiable keyword search over encrypted data in multi-owner settings (VKSE-MO) scheme by exploiting the multisignatures technique. Thus, our scheme only requires a single index for each record and data users are assured of the correctness of the search results in challenging settings. Our formal security analysis proved that the VKSE-MO scheme is secure against a chosen-keyword attack under a random oracle model. In addition, our empirical study using a real-world dataset demonstrated the efficiency and feasibility of the proposed scheme in practice. 相似文献
18.
The scalability problem in data mining involves the development of methods for handling large databases with limited computational resources such as memory and computation time. In this paper, two scalable clustering algorithms, bEMADS and gEMADS, are presented based on the Gaussian mixture model. Both summarize data into subclusters and then generate Gaussian mixtures from their data summaries. Their core algorithm, EMADS, is defined on data summaries and approximates the aggregate behavior of each subcluster of data under the Gaussian mixture model. EMADS is provably convergent. Experimental results substantiate that both algorithms can run several orders of magnitude faster than expectation-maximization with little loss of accuracy. 相似文献
19.
The proliferation of GPS-enabled smart mobile devices enables us to collect a large-scale trajectories of moving objects with GPS tags. While the raw trajectories that only consists of positional information have been studied extensively, many recent works have been focusing on enriching the raw trajectories with semantic knowledge. The resulting data, called activity trajectories, embed the information about behaviors of the moving objects and support a variety of applications for better quality of services. In this paper, we propose a Top-k Spatial Keyword (T kSK) query for activity trajectories, with the objective to find a set of trajectories that are not only close geographically but also meet the requirements of the query semantically. Such kind of query can deliver more informative results than existing spatial keyword queries for static objects, since activity trajectories are able to reflect the popularity of user activities and reveal preferable combinations of facilities. However, it is a challenging task to answer this query efficiently due to the inherent difficulties in indexing trajectories as well as the new complexity introduced by the textual dimension. In this work, we provide a comprehensive solution, including the novel similarity function, hybrid indexing structure, efficient search algorithm and further optimizations. Extensive empirical studies on real trajectory set have demonstrated the scalability of our proposed solution. 相似文献
20.
Volumetric datasets with multiple variables on each voxel over multiple time steps are often complex, especially when considering the exponentially large attribute space formed by the variables in combination with the spatial and temporal dimensions. It is intuitive, practical, and thus often desirable, to interactively select a subset of the data from within that high-dimensional value space for efficient visualization. This approach is straightforward to implement if the dataset is small enough to be stored entirely in-core. However, to handle datasets sized at hundreds of gigabytes and beyond, this simplistic approach becomes infeasible and thus, more sophisticated solutions are needed. In this work, we developed a system that supports efficient visualization of an arbitrary subset, selected by range-queries, of a large multivariate time-varying dataset. By employing specialized data structures and schemes of data distribution, our system can leverage a large number of networked computers as parallel data servers, and guarantees a near optimal load-balance. We demonstrate our system of scalable data servers using two large time-varying simulation datasets. 相似文献
|