Similar literature
 Found 18 similar articles (search time: 187 ms)
1.
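To address the problem of Web database queries returning too many results, this paper proposes an automatic categorization method for Web database query results based on an improved decision tree algorithm. In the offline phase, the method analyzes the query histories of all users in the system and aggregates semantically similar queries; based on the aggregated queries, it partitions the original data into multiple tuple clusters, each corresponding to one type of user preference. When a query arrives, the method uses the improved decision tree algorithm, based on the tuple clusters formed offline, to automatically build a labeled hierarchical categorization tree over the query result set, so that users can quickly select and locate the information they need by inspecting the labels. Experimental results show that the proposed categorization method has a low search cost and good categorization quality, and can effectively meet the personalized query needs of different types of users.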

2.
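Ordinary spatial keyword queries often return too many results. To address this problem, this paper proposes a top-k query and ranking method based on the location-text relevance of spatial objects, which retrieves typical spatial objects that are textually relevant and spatially close to a given spatial keyword query. The method consists of two phases: offline processing and online query processing. In the offline phase, the location-text relationship closeness between any pair of spatial objects is measured from their location proximity and text similarity. On this basis, a probability-density-based algorithm for selecting representative spatial objects is proposed, and a corresponding spatial object sequence is built for each representative spatial object according to the location-text relationships between spatial objects. In the online query processing phase, for a given spatial keyword query, cosine similarity is used to compute the relevance between the query conditions and the representative spatial objects, and the threshold algorithm (TA) is then applied to the pre-built spatial object sequences to quickly select the top-k typical spatial objects that satisfy the query. Experimental results show that the proposed top-k query and ranking method for spatial objects effectively meets user query needs and achieves high accuracy, typicality, and execution efficiency.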
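
The abstract above names the threshold algorithm (TA) for top-k selection over pre-sorted sequences but gives no details. The following is a minimal sketch of the generic TA, assuming each input list ranks every object by one score in descending order and that the overall score is the plain sum of the per-list scores. The function name `threshold_algorithm`, the sum aggregation, and the sample scores are illustrative assumptions, not the paper's actual scoring.

```python
import heapq


def threshold_algorithm(sorted_lists, k):
    """Generic threshold algorithm (TA) for top-k by summed score.

    sorted_lists: lists of (object_id, score) pairs, each sorted by score
    in descending order. Assumption: every object appears in every list
    and all lists have the same length.
    """
    # Random-access tables, one per list: object_id -> score.
    lookup = [dict(lst) for lst in sorted_lists]
    seen = set()
    top = []  # min-heap of (total_score, object_id), at most k entries

    for pos in range(len(sorted_lists[0])):
        threshold = 0.0  # best total any still-unseen object could reach
        for lst in sorted_lists:
            obj, score = lst[pos]
            threshold += score
            if obj in seen:
                continue
            seen.add(obj)
            total = sum(table[obj] for table in lookup)
            if len(top) < k:
                heapq.heappush(top, (total, obj))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, obj))
        # Stop once the k-th best total is at least the threshold.
        if len(top) == k and top[0][0] >= threshold:
            break
    return sorted(top, reverse=True)


# Hypothetical per-object scores: text relevance and location proximity.
text_scores = [("a", 0.9), ("b", 0.5), ("c", 0.1)]
location_scores = [("b", 0.8), ("a", 0.6), ("c", 0.2)]
result = threshold_algorithm([text_scores, location_scores], k=2)
print([obj for _, obj in result])
```

In this example the scan stops after two positions, so object `c` is never scored; that early termination is what makes TA attractive when the sequences are built offline.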

3.
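Most existing spatial keyword query processing models support only location proximity and text similarity matching, and cannot return objects that are semantically close to the query but do not match it in form; in addition, current spatial-textual index structures cannot handle the numerical attributes of spatial objects. To address these problems, this paper proposes a spatial keyword query method that supports semantic approximate queries. First, word embedding techniques are used to expand the user's original query, generating a series of query keywords semantically related to the original ones. Then, a hybrid index structure, AIR-Tree, is proposed that supports both text and semantic matching and handles numerical attributes using the Skyline method. Finally, AIR-Tree is used for query matching, returning the top-k ordered spatial objects most relevant to the query conditions. Experimental analysis and results show that, compared with existing methods of the same kind, the proposed method has higher execution efficiency and better user satisfaction; query efficiency with the AIR-Tree index is 3.6% higher than with the IRS-Tree index, and query result accuracy is 10.14% and 16.15% higher than with the IR-Tree and IRS-Tree indexes, respectively.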

4.
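Query language model estimation based on statistical semantic clustering   Cited by: 2 (self-citations: 0, citations by others: 2)
How to generate document clusters effectively and use cluster information to improve retrieval performance is an important research topic in information retrieval. If documents are assumed to contain several latent independent topics, a document can be viewed as the result of these latent independent topics mixing and interacting with noise. Based on this assumption, a semantic clustering technique based on independent component analysis (ICA) is proposed, which exploits the strong topic discrimination ability of ICA to cluster a set of documents in semantic space according to their actual latent topics. Within the language modeling framework, semantic topic clusters are activated by the user's initial query according to a certain metric. The information in the activated semantic clusters is used to estimate a feedback semantic topic model, which is combined with the initial query model to form a new query model. Experimental results on five TREC data sets show that the query model estimated from statistical semantic clusters significantly improves retrieval performance over the traditional query model and other cluster-based language models. The main reason is that the query model is estimated using information from the semantic clusters most similar to the user's query.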

5.
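The results that a search engine returns for a given keyword query can be organized into categories based on semantics, which improves the efficiency of user queries. However, classification methods rely on predefined categories; because the categories may be incomplete or not updated in time, information on the Internet may be missed. This paper proposes an approach that combines classification with clustering to optimize search results: after classification, clustering is used to handle the information that was not assigned to any category. The study shows that this approach balances efficiency and information completeness.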

6.
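An image retrieval clustering method for collaborative tagging systems   Cited by: 2 (self-citations: 0, citations by others: 2)
To make image retrieval more effective, an image retrieval clustering method for Web 2.0 collaborative tagging systems is proposed. The algorithm first addresses the inconsistency in the tag space caused by the diversity of tag expression, mining lexical relationships between tags to perform semantic-level query expansion and obtain an expanded result set of possibly semantically related images. Then, based on a measure of relevance between tags, it selects from the image result set the tags that are highly relevant to the query tag, and applies a top-down heuristic graph partitioning algorithm to automatically classify the set of secondarily relevant tags. Finally, the image result set is clustered according to the tag classification. To validate the method, a large number of real images and their associated tag metadata were randomly downloaded from the tagged-photo sharing site Flickr, and extensive experiments were run on PivotBrowser, an implemented image retrieval prototype system. The results show that the clustering algorithm effectively resolves inconsistent tag expression and tag query ambiguity in the tag space and provides more satisfactory retrieval for users.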

7.
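Sound classification of components is the foundation of, and key to, efficient component retrieval. Because the widely used faceted classification method suffers from subjectivity, components are described by combining faceted classification with full-text retrieval. Based on this component description, a semantics-based component clustering index tree is proposed using cluster analysis and semantic analysis techniques. Experiments verify that the clustering index tree is feasible, effectively overcomes the shortcomings of faceted classification, achieves semantic retrieval of components to a certain extent, and attains high component recall and precision. In addition, users are no longer restricted to a fixed set of terms when describing retrieval conditions, which makes the approach more convenient for ordinary users.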

8.
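An XML approximate query algorithm based on XML document clustering   Cited by: 1 (self-citations: 0, citations by others: 1)
An XML approximate query algorithm based on XML document clustering is proposed. A method for computing the semantics-based distance between XML documents is given; using this semantic distance, a grid-based eight-neighborhood clustering algorithm is proposed to partition the XML database into clusters. The cluster centers obtained during clustering are then used to optimize the approximate query evaluation stage of the static ordered selection algorithm, so that query results satisfying the user's needs can be returned promptly without a full traversal of the XML database. Finally, experiments on intelligent car body shape design show that the algorithm effectively improves the query efficiency of the static ordered selection algorithm.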

9.
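Most current search result clustering algorithms cluster the web page snippets generated for a user query; because snippets are short and of uneven quality, clustering quality is hard to guarantee. A search result clustering algorithm based on frequent word-sense sequences is proposed, which uses WordNet together with syntactic and semantic features to build clusters and labels for search results. Unlike traditional clustering algorithms based on the vector space model, it takes into account the sequential patterns of words in documents. The algorithm first preprocesses the text to generate compact documents that reduce the dimensionality of the text data, builds a generalized suffix tree, mines the maximal frequent itemsets, and then obtains the frequent word-sense sequences. The ordered frequent itemsets obtained from documents better reflect document topics; search results on the same topic are clustered together, and those more relevant to the user query are ranked first. Experiments show that the algorithm produces high-quality, query-relevant clusters and semantics-based cluster labels, achieves higher clustering accuracy and runtime efficiency, and scales well.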

10.
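When the QR-tree handles massive spatial data, its depth and the overlapping area of directory rectangles within its R-trees increase, which reduces query efficiency. To address this problem, the K-means algorithm is used to cluster the indexed objects, new cluster centers are constructed so that it can handle indexed objects of various shapes, and super nodes are introduced into the QR-tree to store the clustering results. A spatial index structure, the QCR-tree, is proposed to improve query efficiency, and insertion, deletion, and query algorithms for the QCR-tree are given. Experimental results show that the QCR-tree outperforms the QR-tree in query performance and is suitable for massive data.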

11.
A knowledge-based approach for retrieving images by content   Cited by: 10 (self-citations: 0, citations by others: 10)
A knowledge-based approach is introduced for retrieving images by content. It supports the answering of conceptual image queries involving similar-to predicates, spatial semantic operators, and references to conceptual terms. Objects of interest in the images are represented by contours segmented from the images. Image content such as shapes and spatial relationships is derived from object contours according to domain-specific image knowledge. A three-layered model is proposed for integrating image representations, extracted image features, and image semantics. With such a model, images can be retrieved based on the features and content specified in the queries. The knowledge-based query processing is based on a query relaxation technique. The image features are classified by an automatic clustering algorithm and represented by Type Abstraction Hierarchies (TAHs) for knowledge-based query processing. Since the features selected for TAH generation are based on context and user profile, and the TAHs can be generated automatically by a clustering algorithm from the feature database, the proposed image retrieval approach is scalable and context sensitive. The performance of the proposed knowledge-based query processing is also discussed.

12.
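Spatial index structures and query techniques play an important role in spatial databases. To address the limitations of existing methods in approximating and organizing complex spatial data objects, a new index structure based on minimum bounding rectangles (MBRs), trapezoids, and circles, the RTC-tree, is proposed. To handle nearest neighbor (NN) queries over complex spatial data objects effectively, an RTC-tree-based nearest neighbor query algorithm (NNRTC) is proposed, which uses pruning rules to reduce node traversals and distance computations. To account for the effect of obstacles on nearest neighbors in a data set, an RTC-tree-based nearest neighbor query algorithm for obstructed environments (BNNRTC) is proposed; BNNRTC first runs the query in the ideal space and then checks the query results. To handle dynamic simple continuous nearest neighbor chain queries effectively, an RTC-tree-based algorithm for such queries (SCNNCRTC) is further given. Experimental results show that, compared with R-tree-based query methods, the proposed methods improve efficiency by 60% to 80% on large data sets of complex spatial objects.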

13.
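A text density clustering algorithm with threshold optimization   Cited by: 1 (self-citations: 0, citations by others: 1)
Because the clustering performance of the DBSCAN algorithm degrades under a single global threshold, a text density clustering algorithm with threshold optimization is proposed. The algorithm sorts objects by k-nearest-neighbor distance, uses quantiles to separate the sequences of differing density, finds the optimized threshold corresponding to each sequence, and clusters the objects with a density-based clustering method using the optimized thresholds. The improved clustering algorithm overcomes the influence of threshold selection on clustering results and improves clustering accuracy and time efficiency. Clusters are stored in a tree structure, which makes them more readable. Experimental results demonstrate the effectiveness of the algorithm.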
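
The first two steps described in the abstract above (sorting objects by k-nearest-neighbor distance and splitting the sorted sequence at a quantile) can be sketched as follows. This is an illustration under stated assumptions only: the abstract does not say how a threshold is derived for each group, so taking the largest k-distance in a group as its candidate radius is a guess, and the names `k_distances` and `split_by_quantile`, the Euclidean metric, and the 2D sample points are all hypothetical.

```python
import math


def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def split_by_quantile(points, k, q=0.5):
    """Sort points by k-distance and cut the order at quantile q.

    Returns ((dense_indices, sparse_indices), [eps_dense, eps_sparse]).
    Assumption: the candidate radius of a group is its largest k-distance.
    """
    kd = k_distances(points, k)
    order = sorted(range(len(points)), key=lambda i: kd[i])
    cut = int(len(order) * q)
    groups = (order[:cut], order[cut:])
    eps = [max(kd[i] for i in g) if g else None for g in groups]
    return groups, eps


# A tight square of points and a much looser one.
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 15), (15, 10), (15, 15)]
groups, eps = split_by_quantile(pts, k=2)
print(groups, eps)
```

Each group could then be passed to a density-based clustering routine with its own radius, which is the general idea of replacing one global threshold with several local ones.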

14.
15.
Recently, several techniques have been proposed to protect the user location privacy for location-based services in the Euclidean space. Applying these techniques directly to the road network environment would lead to privacy leakage and inefficient query processing. In this paper, we propose a new location anonymization algorithm that is designed specifically for the road network environment. Our algorithm relies on the commonly used concept of spatial cloaking, where a user location is cloaked into a set of connected road segments of a minimum total length $\mathcal{L}$ including at least $\mathcal{K}$ users. Our algorithm is "query-aware" as it takes into account the query execution cost at a database server and the query quality, i.e., the number of objects returned to users by the database server, during the location anonymization process. In particular, we develop a new cost function that balances between the query execution cost and the query quality. Then, we introduce two versions of our algorithm, namely, pure greedy and randomized greedy, that aim to minimize the developed cost function and satisfy the user specified privacy requirements. To accommodate intervals with a high workload, we introduce a shared execution paradigm that boosts the scalability of our location anonymization algorithm and the database server to support large numbers of queries received in a short time period. Extensive experimental results show that our algorithms are more efficient and scalable than the state-of-the-art technique, in terms of both query execution cost and query quality. The results also show that our algorithms have very strong resilience to two privacy attacks, namely, the replay attack and the center-of-cloaked-area attack.

16.
With the emergence of location-aware mobile device technologies, communication technologies, and GPS systems, location-based queries have attracted great attention in the database literature. In many user recommendation web services, the spatial preference query is used to suggest objects based on their spatial proximity to facilities. In this paper, we study the problem of the general spatial skyline (GSSKY), which can provide the minimal candidate set of optimal solutions for any monotonic distance-based spatial preference query. An efficient progressive algorithm, called P-GSSKY, is proposed to significantly reduce the number of non-promising objects in the computation. Moreover, we also propose a spatial-join-based algorithm, called J-GSSKY, which can compute GSSKY efficiently in terms of I/O cost. The paper conducts a comprehensive performance study of the proposed techniques based on both real and synthetic data.

17.
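The clustering results of the pSCAN algorithm depend on the density constraint parameter and the similarity threshold parameter. If the clustering produced by the user-supplied parameters does not meet the user's needs, the user can express the clustering requirement through an example cluster. To address the problem of expressing clustering query requirements through example clusters, an example-cluster-driven algorithm for computing graph structural clustering parameters, PART, and its improved version, ImPART, are proposed. First, the effect of the two clustering parameters on the clustering results is analyzed, and the relevant subgraph of the example cluster is extracted. Second, the relevant subgraph is analyzed to obtain the feasible interval of the density constraint parameter, and the nodes in the example cluster are divided into core and non-core nodes according to the current density constraint parameter and the structural similarity between nodes. Finally, the optimal similarity threshold parameter for the current density constraint parameter is computed from the node division, and the resulting parameters are verified and refined on the relevant subgraph until clustering parameters that satisfy the example cluster are obtained. Experimental results on real data sets show that the proposed algorithm returns a set of valid parameters for a user's example cluster, and that the improved algorithm ImPART reduces running time by more than 20% compared with PART, quickly and effectively returning optimal clustering parameters that satisfy the example cluster.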

18.
Buffer queries   Cited by: 2 (self-citations: 0, citations by others: 2)
A class of commonly asked queries in a spatial database is known as buffer queries. An example of such a query is to "find house-power line pairs that are within 50 meters of each other." A buffer query involves two spatial data sets and a distance d. The answer to this query is the set of pairs of objects, one from each input set, that are within distance d of each other. Given nonpoint spatial objects, evaluation of buffer queries could be a costly operation, even when the numbers of objects in the input data sets are relatively small. This paper addresses the problem of how to evaluate this class of queries efficiently. A fundamental problem with buffer query evaluation is to find an efficient algorithm for solving the minimum distance (minDist) problem for lines and regions. An efficient minDist algorithm, which only requires a subsequence of segments from each object to be examined, is derived. Finding a fast minDist algorithm is the first step in evaluating a buffer query efficiently. It is observed that many, and sometimes even most, candidates can be proven to be in the answer without resorting to the relatively expensive minDist operation. A candidate is first evaluated with the least expensive technique, called 0-object filtering. If it fails, a more costly operation, called 1-object filtering, is applied. Finally, if both filterings fail, the most expensive minDist algorithm is invoked. To show the effectiveness of these techniques, they are incorporated into the well-known tree join algorithm and tested with real-life as well as artificial data sets. Extensive experiments show that the proposed algorithm outperforms existing techniques by a wide margin in both execution time and I/O accesses. More importantly, the performance gain improves drastically with the increase of distance values.
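
To make the buffer query definition above concrete, here is a minimal filter-and-refine sketch for 2D polylines. It is not the paper's 0-object or 1-object filtering, nor its minDist algorithm or tree join: it swaps in a plain bounding-box lower bound as the cheap filter, a brute-force minimum distance over all segment pairs as the refinement, and a nested loop in place of an index. All function names and the sample data are hypothetical.

```python
import math


def mbr(poly):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a polyline."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return min(xs), min(ys), max(xs), max(ys)


def mbr_min_dist(a, b):
    """Lower bound on the distance between objects with boxes a and b."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)


def point_seg_dist(p, a, b):
    """Distance from point p to segment ab."""
    vx, vy = b[0] - a[0], b[1] - a[1]
    len2 = vx * vx + vy * vy
    t = 0.0
    if len2 > 0:
        t = max(0.0, min(1.0, ((p[0] - a[0]) * vx + (p[1] - a[1]) * vy) / len2))
    return math.hypot(p[0] - (a[0] + t * vx), p[1] - (a[1] + t * vy))


def orient(a, b, c):
    """Signed area test: > 0 if c lies to the left of the line a -> b."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def seg_dist(a, b, c, d):
    """Minimum distance between segments ab and cd."""
    # Proper crossing: distance is zero. Touching or overlapping cases
    # are caught by the endpoint distances below, which then yield 0.
    if orient(c, d, a) * orient(c, d, b) < 0 and orient(a, b, c) * orient(a, b, d) < 0:
        return 0.0
    return min(point_seg_dist(a, c, d), point_seg_dist(b, c, d),
               point_seg_dist(c, a, b), point_seg_dist(d, a, b))


def min_dist(p1, p2):
    """Brute-force minimum distance between two polylines."""
    return min(seg_dist(p1[i], p1[i + 1], p2[j], p2[j + 1])
               for i in range(len(p1) - 1) for j in range(len(p2) - 1))


def buffer_query(set_a, set_b, d):
    """Index pairs (i, j) with min_dist(set_a[i], set_b[j]) <= d."""
    boxes_b = [mbr(o) for o in set_b]
    result = []
    for i, oa in enumerate(set_a):
        box_a = mbr(oa)
        for j, ob in enumerate(set_b):
            if mbr_min_dist(box_a, boxes_b[j]) > d:
                continue  # filter: the boxes alone rule this pair out
            if min_dist(oa, ob) <= d:  # refine: exact distance
                result.append((i, j))
    return result


houses = [[(0, 0), (1, 0)], [(100, 100), (101, 100)]]
power_lines = [[(0, 3), (5, 3)], [(0, 60), (5, 60)]]
print(buffer_query(houses, power_lines, d=5))
```

The bounding-box filter can only reject pairs, whereas the filters described in the abstract also accept pairs without computing the exact distance; that difference is the main reason this sketch should be read as background rather than as the proposed method.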
