Similar documents
 20 similar documents found (search time: 15 ms)
1.
Data mining can be defined as the process of finding trends and patterns in large data sets. An important technique for extracting useful information, such as regularities, from (usually historical) data is association rule mining. Most data mining research concentrates on the traditional relational data model. The query flocks technique, which extends association rule mining with a 'generate-and-test' model for different kinds of patterns, can also be applied to deductive databases. In this paper, the query flocks technique is extended with view definitions, including recursive views. In our system the technique can be applied to a database schema that includes both the intensional database (IDB), or rules, and the extensional database (EDB), or tabled relations; nevertheless, we have designed an architecture that compiles query flocks from Datalog into SQL so that commercially available database management systems (DBMS) can serve as the underlying engine. Because recursive Datalog views (IDBs) cannot be converted directly into SQL statements, they are materialized before the final compilation step. Optimizations suited to the extended query flocks are also introduced for this architecture. Using a prototype developed on a commercial database environment, the advantages of the new architecture and its optimizations are presented.

2.
Visual spatio-temporal function-based querying (Total citations: 1; self-citations: 0; citations by others: 1)
Visual interfaces are very important for human interaction in cyberworlds, and visual spatio-temporal querying should be one of the basic tools for data mining and retrieval there. In this paper, we propose a novel function-based query model for arbitrary-shape spatio-temporal querying. Queries are defined as geometric shapes that change over time. In our model, data are interpreted geometrically as multidimensional points with a time dimension, or as moving points. Queries are formulated with geometric objects and operations over them, which together form a query solid that changes over time. The proposed model therefore allows arbitrary-shape spatio-temporal range queries to be posed. With this uniform geometric model we integrate visual mining and querying of time-dependent data using 3D visualization tools, which allows an intuitive visual interface to be built from 2D projections of 3D query shapes. Our approach combines visualization of spatio-temporal data with visualization of the range query formulation, using a very compact function-based query model. The implemented visual query system and its visual interface are described, and an example application to the analysis of molecular dynamics simulation results is considered.
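
The function-based query idea in this abstract can be illustrated with a minimal sketch. It assumes, as is common in function-based shape modelling but not confirmed by the abstract, that a query shape is a real-valued function `f(x, y, t)` that is non-negative inside the shape, and that shapes are combined with `min` (intersection) and `max` (union). All names below (`moving_disc`, `time_window`, `range_query`) are hypothetical and are not taken from the described system.

```python
from typing import Callable, Iterable, List, Tuple

Point = Tuple[float, float, float]          # (x, y, t): a 2D point with a time stamp
Shape = Callable[[float, float, float], float]  # >= 0 inside the query solid


def moving_disc(cx0: float, cy0: float, vx: float, vy: float, radius: float) -> Shape:
    """Disc of fixed radius whose centre moves linearly with time (assumed shape)."""
    def f(x: float, y: float, t: float) -> float:
        cx, cy = cx0 + vx * t, cy0 + vy * t
        return radius ** 2 - ((x - cx) ** 2 + (y - cy) ** 2)
    return f


def time_window(t0: float, t1: float) -> Shape:
    """Slab in the time dimension: non-negative only for t0 <= t <= t1."""
    return lambda x, y, t: min(t - t0, t1 - t)


def intersect(f: Shape, g: Shape) -> Shape:
    """Set intersection of two function-defined shapes."""
    return lambda x, y, t: min(f(x, y, t), g(x, y, t))


def union(f: Shape, g: Shape) -> Shape:
    """Set union of two function-defined shapes."""
    return lambda x, y, t: max(f(x, y, t), g(x, y, t))


def range_query(points: Iterable[Point], shape: Shape) -> List[Point]:
    """Return the points that fall inside the time-varying query solid."""
    return [p for p in points if shape(*p) >= 0]
```

For example, `range_query(points, intersect(moving_disc(0, 0, 1, 0, 0.5), time_window(0, 2)))` selects the points that stay within 0.5 units of a centre drifting along the x axis during the first two time units.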

3.
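Based on the practical needs of monitoring systems in the production processes of industrial and mining enterprises, a model of a data storage and query system is established, with emphasis on the concrete implementation of data querying. The results show that the proposed database structure model and data query method meet the current data management needs of monitoring systems in industrial and mining enterprises, and that they also facilitate accident prediction and forecasting as well as cause analysis after an accident has occurred.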

4.
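Finding accompanying vehicles is a practical requirement of police criminal investigation departments when searching massive volumes of vehicle passage records; the goal is to use fuzzy-condition queries to identify vehicles that may be travelling together to commit crimes. In essence, this kind of query can be cast as an association rule mining problem. By analyzing the vehicle passage data collected by an intelligent highway vehicle monitoring and recording system, the accompanying-vehicle query is transformed into association rule mining, data mining techniques are used to analyze the passage-data query problem comprehensively, and an efficient accompanying-vehicle query algorithm, AVD (Accompany Vehicles Discovery), is implemented. Analysis of the algorithm shows that AVD not only returns accurate accompanying-vehicle query results but is also efficient and scalable, and is therefore highly feasible.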
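
The reduction from accompanying-vehicle search to frequent-pattern counting can be sketched as follows. This is not the AVD algorithm, whose details the abstract does not give. It is a simplified illustration that assumes each record is a `(plate, checkpoint, timestamp_seconds)` tuple, treats every fixed time window at a checkpoint as one transaction, and keeps the plate pairs that co-occur in at least `min_support` transactions. The function and parameter names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, int]  # (plate, checkpoint, timestamp in seconds)


def accompanying_pairs(
    records: Iterable[Record], window_seconds: int, min_support: int
) -> Dict[Tuple[str, str], int]:
    """Count plate pairs seen at the same checkpoint in the same time window."""
    # Each (checkpoint, window index) pair acts as one "transaction" of plates.
    transactions = defaultdict(set)
    for plate, checkpoint, ts in records:
        transactions[(checkpoint, ts // window_seconds)].add(plate)

    # Support of a pair = number of transactions that contain both plates.
    counts: Counter = Counter()
    for plates in transactions.values():
        for pair in combinations(sorted(plates), 2):
            counts[pair] += 1

    return {pair: n for pair, n in counts.items() if n >= min_support}
```

Fixed windows are the simplest choice but split vehicles that pass just either side of a window boundary; a sliding window would avoid that at the cost of more bookkeeping.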

5.
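As graph structures with a time dimension, temporal graphs play an increasingly important role in query processing and mining over graph data. Unlike a traditional static graph, the structure of a temporal graph changes over time; that is, its edges are activated by time. Moreover, because every edge of a temporal graph carries a time label, a temporal graph holds far more information than a static graph, so existing query processing methods do not apply well to temporal graphs. How to solve query processing and mining problems on temporal graphs has therefore attracted researchers' attention. This paper surveys existing query processing and mining methods for temporal graphs. It introduces the application background and basic definitions of temporal graphs in detail, reviews existing temporal graph models, and describes and analyzes existing work from three perspectives: graph query processing methods, graph mining methods, and temporal graph management systems. Finally, it discusses possible research directions for temporal graphs as a reference for related research.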
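
Why static-graph query methods do not carry over directly can be seen in a small reachability sketch. It assumes a simple temporal graph model that the abstract does not specify: each directed edge `(u, v, t)` can be used only at time `t` and takes one time unit to traverse. Under that assumption, one pass over the edges in time order yields earliest arrival times. The function name and the model are illustrative only.

```python
from typing import Dict, Hashable, Iterable, Tuple

TemporalEdge = Tuple[Hashable, Hashable, int]  # (source, target, activation time)


def earliest_arrival(
    edges: Iterable[TemporalEdge], source: Hashable, start_time: int = 0
) -> Dict[Hashable, int]:
    """Earliest arrival time at every vertex reachable by a time-respecting path."""
    arrival = {source: start_time}
    # Processing edges by activation time is enough because every traversal
    # takes a positive amount of time (one unit in this sketch).
    for u, v, t in sorted(edges, key=lambda e: e[2]):
        if u in arrival and arrival[u] <= t:
            if t + 1 < arrival.get(v, float("inf")):
                arrival[v] = t + 1
    return arrival
```

With edges `a→b` at time 5 and `b→c` at time 3, vertex `c` is reachable from `a` in the static graph but not in the temporal one, because the second edge fires before `b` is reached.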

6.
Wang Qing (汪晴), Zhuang Weihua (庄卫华). Computer Engineering (《计算机工程》), 2010, 36(21): 78-80
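Suggestion methods based on the TF-IQF model ignore the context of the user's query behaviour and therefore fall short in meeting users' personalized needs. To address this, the method is optimized and improved: each user's query context is used to analyze that user's query preferences, and the queries recommended by the system are re-ranked accordingly. Experimental results show that the improved method provides personalized query suggestions and increases user satisfaction with queries.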

7.
We propose to use a scalable peer-to-peer network to perform content-based image retrieval and mining (P2P-CBIRM). A decentralized unstructured P2P model with certain overheads, namely peer clustering and update procedures, is adopted as a compromise with the structured model while retaining flexible routing control when peers join or leave or when the network fails. The peer CBIRM engine uses multi-instance queries with multiple feature types to reduce network traffic effectively while maintaining high retrieval accuracy, which enhances knowledge discovery and image data mining capability. The proposed P2P-CBIRM system provides a scalable retrieval and mining function in which query scope and retrieval accuracy can be controlled adaptively and progressively. To improve query efficiency (recall rate divided by query scope), it uses both 1) forwarding of the query message (forward phase) to reduce the query scope and 2) transmission of retrieval results (backward phase), such that activated peers keep filtering high-similarity images along the link path toward the query peer. Experiments show that the query efficiency of the scalable retrieval approach is better than that of previous methods, namely the firework query model and breadth-first search. The system provides a scalable knowledge discovery platform for efficient image data mining applications. We also propose an optimal configuration of the P2P-CBIRM system that yields the highest recall rate for a given number of online users. Simulations demonstrate that, with the optimal configuration, recall rates improve by a factor of 2.5 to 3 while the network traffic of each peer is reduced to 30% of the original, for the same number of online users.

8.
Analysis and Application of New-Generation Data Mining Languages (Total citations: 5; self-citations: 0; citations by others: 5)
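The standardization of relational query languages laid the foundation for the development of relational systems; likewise, a good data mining query language will help standardize mining system platforms. Because data mining covers a wide range of analysis tasks, each with its own requirements, developing and designing a complete data mining language is of great significance. This paper introduces the new generation of data mining languages and the state of their application. Standardized data mining tools will benefit customers through lower cost and investment and greater ease of use, and will make data mining an indispensable part of enterprise decision-making systems.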

9.
The steady growth in the size of textual document collections is a key driver of progress for modern information retrieval techniques, whose effectiveness and efficiency are constantly challenged. Given a user query, the number of retrieved documents can be overwhelmingly large, which hampers their efficient use. In addition, retaining only relevant documents in a query answer is of paramount importance for meeting the user's needs effectively. In this situation, query expansion offers an interesting way to obtain a complete answer while preserving the quality of the retained documents. It relies mainly on an accurate choice of the terms added to the initial query. Query expansion takes advantage of large text volumes by extracting statistical information about co-occurrences of index terms and using it to make user queries fit the real information needs better. In this respect, a promising direction is the application of data mining methods to extract dependencies between terms. In this paper, we present a novel approach, based on association rules, for mining knowledge that supports query expansion. The key feature of our approach is a better trade-off between the size of the mining result and the knowledge it conveys. Our association rule mining method uses results from Galois connection theory and compact representations of rule sets to reduce the huge number of potentially useful associations. An experimental study examined the application of our approach to several real collections on which automatic query expansion was performed. The results show a significant improvement in the performance of the information retrieval system, in terms of both recall and precision, as confirmed by significance testing with the Wilcoxon test.

10.
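The key to a well-performing fuzzy control system is its fuzzy control lookup table. Based on a detailed analysis of typical fuzzy control systems, this paper presents the principle and method of mining the fuzzy control lookup table directly from a database of manual operation records using Boolean association rule mining.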

11.
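Frequent itemset mining and query expansion are applied to information retrieval. A new information retrieval system model based on frequent pattern mining and query expansion is proposed together with its algorithm; the design rationale of the model, its overall structure, and the function of each module are given, and a system prototype is implemented. Experimental results show that the retrieval system model effectively improves information retrieval performance.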

12.
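Pseudo-Relevance Feedback Query Expansion Based on Matrix-Weighted Association Rule Mining (Total citations: 13; self-citations: 0; citations by others: 13)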
Huang Mingxuan (黄名选), Yan Xiaowei (严小卫), Zhang Shichao (张师超). Journal of Software (《软件学报》), 2009, 20(7): 1854-1865
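A matrix-weighted association rule mining algorithm for query expansion is proposed, together with the related theorems and their proofs. The algorithm uses four pruning strategies, which greatly improves mining efficiency; experimental results show that its mining time is 87.84% lower than the original average time. To address the shortcomings of existing query expansion, matrix-weighted association rule mining is applied to query expansion, and a new query expansion model and a more reasonable method for computing expansion-term weights are proposed. On this basis, a pseudo-relevance feedback query expansion algorithm based on matrix-weighted association rule mining is proposed. The algorithm automatically mines matrix-weighted association rules related to the original query from the top n initially retrieved documents, builds a rule base, and extracts from it the expansion terms related to the original query, thereby realizing query expansion. Experimental results show that the retrieval performance of the algorithm is indeed improved: compared with existing query expansion algorithms, its average precision is clearly higher at the same recall levels.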
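
The pseudo-relevance feedback pipeline described here (mine rules from the top n retrieved documents, then pick expansion terms from rules whose antecedent is a query term) can be sketched in a deliberately simplified form. The sketch below uses plain, unweighted support and confidence over single-term rules `query_term → candidate`; it does not implement the paper's matrix weighting, pruning strategies, or term-weight formula. Function names and thresholds are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def expansion_terms(
    query_terms: Iterable[str],
    top_docs: Sequence[Sequence[str]],
    min_support: float,
    min_confidence: float,
) -> List[str]:
    """Rank candidate expansion terms by the confidence of rules q -> term."""
    query = set(query_terms)
    doc_sets = [set(doc) for doc in top_docs]  # each document as a set of terms
    n_docs = len(doc_sets)
    scores = {}

    for q in query:
        docs_with_q = [d for d in doc_sets if q in d]
        if not docs_with_q:
            continue
        # Number of feedback documents in which q and each candidate co-occur.
        co_occurrence = Counter(t for d in docs_with_q for t in d if t not in query)
        for term, n in co_occurrence.items():
            support = n / n_docs                 # fraction of all feedback docs
            confidence = n / len(docs_with_q)    # fraction of docs containing q
            if support >= min_support and confidence >= min_confidence:
                scores[term] = max(scores.get(term, 0.0), confidence)

    # Highest confidence first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))
```

In a weighted variant, the document-term counts would be replaced by term weights (for example tf-idf values) when computing support and confidence; that step is left out here because the abstract does not give the formulas.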

13.
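To apply all-weighted association rule mining to query expansion, an algorithm for mining all-weighted inter-term association rules, oriented to query expansion and based on multiple pruning strategies, is proposed; the algorithm greatly improves mining efficiency. A new query expansion model and a method for computing expansion-term weights are proposed, making the weights of expansion terms more reasonable. On this basis, a new query expansion algorithm based on local feedback is proposed. It uses the all-weighted association rule mining algorithm to automatically mine rules related to the original query from the top-ranked initially retrieved documents of the local feedback, builds a rule base, and extracts from it the expansion terms related to the original query, thereby realizing query expansion. Experimental results show that the retrieval performance of the query expansion algorithm is indeed improved: compared with existing query expansion algorithms, its average precision is clearly higher at the same recall levels.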

14.
A recent approach to improving the performance of XML query evaluation is to cache the results of frequent query patterns. Unfortunately, discovering these frequent query patterns is an expensive operation. In this paper, we develop a two-pass mining algorithm, 2PXMiner, that guarantees the discovery of frequent query patterns by scanning the database at most twice. By exploiting a transaction summary data structure and an enumeration tree, we are able to determine upper bounds on the frequencies of the candidate patterns and to prune infrequent patterns quickly. We also design an index to trace the repeating candidate subtrees generated by sibling repetition, thus avoiding redundant computations. Experimental results indicate that 2PXMiner is both efficient and scalable.

15.
Inferring query intent is important in information retrieval tasks. Query subtopic mining aims to find possible subtopics of a given query that represent its potential intents. Subtopic mining is challenging because queries are short. Methods for learning distributed representations of words and word sequences have developed quickly in recent years and have had a great impact on many fields, but it is still not clear whether distributed representations are effective in alleviating the challenges of query subtopic mining. In this paper, we exploit and compare the main semantic composition methods for distributed representations in query subtopic mining. Specifically, we focus on two types of distributed representation: the paragraph vector, which directly represents word sequences of arbitrary length, and word vector composition. We thoroughly investigate the impact of semantic composition strategies and of the types of data used for learning distributed representations. Experiments were conducted on a public dataset offered by the National Institute of Informatics Testbeds and Community for Information Access Research. The empirical results show that distributed semantic representations achieve outstanding performance for query subtopic mining compared with traditional semantic representations. Further insights are reported as well.

16.
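Research on standardizing data mining languages is key to developing the next generation of data mining systems. DMX (Data Mining Extensions) is the data mining query language supported by the OLE DB for DM specification; it allows data mining systems to mine relational databases directly and is a breakthrough in the standardization of data mining primitives. This paper introduces the main steps of data mining under the OLE DB for DM specification and gives a DMX-based implementation method in Microsoft SQL Server Analysis Services.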

17.
18.
Li GuoHui, Sun Ping, Yuan Ling, Wang MingLi, Cheng HongJu. Multimedia Tools and Applications, 2019, 78(21): 30197-30219
Multimedia Tools and Applications - Orthogonal region query has always been an important topic in the field of database query, geographic information system, computer graphics, data mining and...

19.
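Association rule mining in the Semantic Web environment is a new research hotspot in data mining. Targeting the characteristics of SWRL datasets, this paper establishes a new formal context for data mining, applies formal concept analysis (FCA) to the mining of relational association rules, and proposes an association rule mining method based on partitioning the search space. FCA is used as a compressed representation of frequent patterns, and deriving association rules from the generated closed queries effectively limits the production of redundant rules. Partitioning the search space reduces the size of the problem and makes full use of the information in the intermediate results of the mining already carried out, which reduces computation. Because it adopts a divide-and-conquer strategy, the method extends readily to parallel processing of massive Semantic Web data.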

20.
Multidimensional Index Structures in Relational Databases (Total citations: 2; self-citations: 0; citations by others: 2)
Efficient query processing is one of the basic needs of data mining algorithms. Clustering algorithms, association rule mining algorithms and OLAP tools all rely on efficient query processors that can deal with high-dimensional data. Inside such a query processor, multidimensional index structures are used as a basic technique. Because implementing such an index structure is a difficult and time-consuming task, we propose a new approach: implementing the index structure on top of a commercial relational database system. In particular, we map the index structure to a relational database design and simulate its behavior using triggers and stored procedures. This can be done easily for a very large class of multidimensional index structures. To demonstrate feasibility and efficiency, we implemented an X-tree on top of Oracle8. We ran several experiments on large databases and recorded a performance improvement of up to a factor of 11.5 compared with a sequential scan of the database.
