Similar Documents
1.
In this paper, we consider instance selection as an important focusing task in the data preparation phase of knowledge discovery and data mining. Focusing generally covers all issues related to data reduction. First of all, we define a broader perspective on focusing tasks, choose instance selection as one particular focusing task, and outline the specification of concrete evaluation criteria to measure the success of instance selection approaches. Thereafter, we present a unifying framework that covers existing approaches towards solutions for instance selection as instantiations. We describe specific examples of instantiations of this framework and discuss their strengths and weaknesses. Then, we outline an enhanced framework for instance selection, generic sampling, and summarize example evaluation results for several different instantiations of its implementation. Finally, we conclude with open issues and research challenges for instance selection as well as focusing in general.

2.
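Research on Enterprise-Oriented Data Integration Modeling Methods   (Citations: 3; self-citations: 0; by others: 3)
To address the information silos and data discontinuities that are widespread in enterprises today, this paper proposes a model for enterprise data integration, the enterprise unified data model. Applying this model can effectively resolve a chaotic enterprise data environment and establish a unified, efficient enterprise data platform. Drawing on experience in building a unified data model at refining and petrochemical enterprises, the paper also describes in detail the modeling method for the enterprise unified data model.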

3.
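This paper explains the importance of using case studies when teaching a "Data Structures" course, and analyzes and designs sample cases in detail. It describes the complete process of case-based teaching and, finally, applies the case study to actual teaching, presenting an example of teaching with data structure cases in combination with laboratory instruction.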

4.
A New Approach to the Construction of Surfaces from Contour Data   (Citations: 12; self-citations: 0; by others: 12)
This paper presents a new approach to the construction of a surface from a stack of contour slices. Unlike most existing methods, this new approach handles ambiguous conditions consistently without employing an algorithm to establish a correspondence between vertices on one contour and those on the next. It is easy to implement and fast to compute, requiring only basic geometric properties, namely closedness and simplicity, to be available with contour data. The advantages of this new approach have also been demonstrated with solutions to a few classical problems from the literature and some practical problems in medical imaging. It can also be applied to geographical surveying and keyframe animations.

5.
Data Mining and Machine Oriented Modeling: A Granular Computing Approach   (Citations: 8; self-citations: 0; by others: 8)
From the processing point of view, data mining is the machine derivation of properties that are interesting to humans from the stored data. Hence, the notion of machine oriented data modeling is explored: an attribute value, in a relational model, is a meaningful label (a property) of a set of entities (a granule). A model using these granules themselves as attribute values (their bit patterns or lists of members) is called a machine oriented data model. The model provides a good database compaction and data mining environment. For moderate-size databases, finding association rules, decision rules, etc. can be reduced to the easy computation of set-theoretic operations on granules. In the second part, these notions are extended to real-world objects, where the universe is granulated (clustered) into granules by binary relations. Data modeling and mining with such additional semantics are formulated and investigated. In such models, data mining is essentially a machine calculus of granules, that is, granular computing.
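To make the granule idea concrete, here is a minimal Python sketch, assuming a tiny hypothetical relation and using Python integers as bitsets (an illustrative choice, not the paper's implementation). Each (attribute, value) pair is stored as the bit pattern of the entities it labels, so the support and confidence of an association rule reduce to a bitwise AND and a population count.

```python
# Toy relation: each row is one entity; the table contents are hypothetical.
ROWS = [
    {"color": "red", "size": "big"},
    {"color": "red", "size": "big"},
    {"color": "blue", "size": "small"},
    {"color": "red", "size": "small"},
]


def granules(rows):
    """Map each (attribute, value) pair to a bitset of the entities it labels."""
    table = {}
    for i, row in enumerate(rows):
        for attr, value in row.items():
            table[(attr, value)] = table.get((attr, value), 0) | (1 << i)
    return table


def support(bits):
    """Number of entities in a granule (population count of its bit pattern)."""
    return bin(bits).count("1")


def rule_stats(table, lhs, rhs):
    """Support and confidence of the rule lhs -> rhs via granule intersection."""
    both = table[lhs] & table[rhs]
    return support(both), support(both) / support(table[lhs])


G = granules(ROWS)
print(rule_stats(G, ("color", "red"), ("size", "big")))
```

Here the granule for `color=red` is entities {0, 1, 3} and for `size=big` is {0, 1}, so the rule has support 2 and confidence 2/3.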

6.
Procedural models can simulate an object's behavior as well as its appearance. When combined with data flow methods, they provide a useful approach to image composition and animation.

7.
Adaptive Intrusion Detection: A Data Mining Approach   (Citations: 23; self-citations: 0; by others: 23)
In this paper we describe a data mining framework for constructing intrusion detection models. The first key idea is to mine system audit data for consistent and useful patterns of program and user behavior. The other is to use the set of relevant system features presented in the patterns to compute inductively learned classifiers that can recognize anomalies and known intrusions. In order for the classifiers to be effective intrusion detection models, we need to have sufficient audit data for training and also select a set of predictive system features. We propose to use the association rules and frequent episodes computed from audit data as the basis for guiding the audit data gathering and feature selection processes. We modify these two basic algorithms to use axis attribute(s) and reference attribute(s) as forms of item constraints to compute only the relevant patterns. In addition, we use an iterative level-wise approximate mining procedure to uncover the low-frequency but important patterns. We use meta-learning as a mechanism to make intrusion detection models more effective and adaptive. We report our extensive experiments in using our framework on real-world audit data.
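The axis-attribute constraint can be illustrated with a small level-wise (Apriori-style) miner. The Python sketch below is an assumption-laden illustration, not the authors' algorithm: audit records are taken to be dicts of attribute values, the field names and thresholds are hypothetical, and only itemsets containing at least one axis-attribute item are ever generated. Reference attributes, frequent episodes, and the iterative low-frequency mining are not shown.

```python
def axis_itemsets(records, axis, min_support):
    """Level-wise mining of frequent itemsets that contain an axis-attribute item.

    records: list of dicts (attribute -> value); items are (attribute, value) pairs.
    axis: set of attribute names used as axis attributes.
    min_support: minimum fraction of records an itemset must appear in.
    Returns {frozenset(itemset): support}.
    """
    n = len(records)
    rows = [set(r.items()) for r in records]

    def freq(itemset):
        return sum(itemset <= row for row in rows) / n

    singles = {item for row in rows for item in row}
    frequent_singles = {i for i in singles if freq(frozenset([i])) >= min_support}
    # Level 1: only axis items may start a pattern.
    level = {frozenset([i]) for i in frequent_singles if i[0] in axis}
    result = {s: freq(s) for s in level}
    while level:
        candidates = set()
        for s in level:
            used_attrs = {attr for attr, _ in s}
            for item in frequent_singles:
                if item[0] not in used_attrs:  # at most one value per attribute
                    candidates.add(s | {item})
        level = {c for c in candidates if freq(c) >= min_support}
        result.update({c: freq(c) for c in level})
    return result


# Hypothetical connection records from audit data.
AUDIT = [
    {"service": "http", "flag": "SF", "dst": "a"},
    {"service": "http", "flag": "SF", "dst": "b"},
    {"service": "http", "flag": "REJ", "dst": "a"},
    {"service": "ftp", "flag": "SF", "dst": "a"},
]
patterns = axis_itemsets(AUDIT, axis={"service"}, min_support=0.5)
for itemset, sup in sorted(patterns.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), sup)
```

Without the constraint, `{flag=SF, dst=a}` would also be frequent here; with `service` as the axis attribute it is never generated, which is the pruning effect the abstract describes.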

8.
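In programming instruction, the quality of example design affects the teaching outcome of the whole course. Taking Visual Basic (VB) programming as an example, this paper proposes an example design method based on constructivist learning theory that incorporates the teaching concept of situation creation. Using real, familiar software from everyday life as examples, it guides programming instruction progressively from simple to complex while keeping the whole in view. Repeated teaching practice by the teaching and research group shows that examples designed with this method readily stimulate students' interest in learning and desire to explore, and help students actively construct meaning.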

9.
Spatio-temporal databases deal with geometries changing over time. In general, geometries can change not only in discrete steps but also continuously; we then speak of moving objects. If only the position of an object in space is relevant, then the moving point is a basic abstraction; if its extent is also of interest, then the moving region abstraction captures moving as well as growing or shrinking regions. We propose a new line of research in which moving points and moving regions are viewed as 3-D (2-D space + time) or higher-dimensional entities whose structure and behavior are captured by modeling them as abstract data types. Such types can be integrated as base (attribute) data types into relational, object-oriented, or other DBMS data models; they can be implemented as data blades, cartridges, etc. for extensible DBMSs. We expect these spatio-temporal data types to play a role for spatio-temporal databases as fundamental as the one spatial data types have played for spatial databases. The paper explains the approach and discusses several fundamental issues and questions related to it that need to be clarified before delving into specific designs of spatio-temporal algebras.
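One simple way to realize a moving point as an abstract data type is a time-ordered list of (t, x, y) samples with linear interpolation between them. The Python sketch below assumes exactly that representation; the class and method names (`MovingPoint`, `at_instant`) are hypothetical, since the paper deliberately stops short of a concrete design.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class MovingPoint:
    """A 2-D point moving over time, stored as time-sorted (t, x, y) samples."""

    samples: tuple  # ((t0, x0, y0), (t1, x1, y1), ...) with strictly increasing t

    def at_instant(self, t):
        """Position at time t by linear interpolation, or None outside the lifespan."""
        times = [s[0] for s in self.samples]
        if not self.samples or t < times[0] or t > times[-1]:
            return None
        i = bisect_right(times, t) - 1
        if i == len(times) - 1:  # t is exactly the last sample time
            return self.samples[-1][1:]
        (t0, x0, y0), (t1, x1, y1) = self.samples[i], self.samples[i + 1]
        f = (t - t0) / (t1 - t0)
        return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))


ship = MovingPoint(((0, 0.0, 0.0), (10, 10.0, 5.0), (20, 10.0, 15.0)))
print(ship.at_instant(5))  # (5.0, 2.5)
```

A moving region could be modeled the same way with polygons as samples, although interpolating region shapes is considerably harder than interpolating points.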

10.
Enterprise data (the data that is created, used, and shared by a corporation in conducting business) is a critical business resource that must be analyzed, architected, and managed with data quality as a guiding principle. This paper presents results, practical insights, and lessons learned from a large-scale study conducted in the telecommunications industry that synthesizes data quality issues into an architectural and management approach. We describe the real-life case study and show how requirements for data quality were collected, how the data quality metrics were defined, what guidelines were established for intersystem data flows, what COTS (commercial off-the-shelf) technologies were used, and what results were obtained through a prototype effort. Based on the experience gained and lessons learned, we propose a comprehensive data quality approach that combines data quality and data architectures into a single framework with a series of steps, procedures, checklists, and tools. Our approach takes into account technology, process, and people issues and extends the extant literature on data quality.

11.
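How to store and transmit information efficiently has long been an important topic in computer research, and the most common solution to this problem is data compression. The paper first describes the principles and classification of data compression, then uses the Huffman method to implement a lossless compression algorithm and describes this algorithm in detail. The algorithm is suitable not only for compressing text documents but also for compressing image files. Finally, the algorithm is analyzed and conclusions are drawn.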
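The paper's own program is not reproduced here; the Python sketch below shows only the standard Huffman construction it relies on: repeatedly merge the two least frequent subtrees, prefixing their codes with 0 and 1. It works on raw bytes, so it applies equally to text and image files. For readability the bits are kept as a `'0'/'1'` string; a real compressor would pack them into bytes and also store the code table.

```python
import heapq
from collections import Counter


def huffman_codes(data: bytes) -> dict:
    """Build a prefix-free Huffman code table mapping each byte value to a bit string."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:  # degenerate input: a single distinct symbol gets code "0"
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code suffix so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(data: bytes, codes: dict) -> str:
    return "".join(codes[b] for b in data)


def decode(bits: str, codes: dict) -> bytes:
    """Decode by growing a buffer until it matches a code (valid because codes are prefix-free)."""
    reverse = {c: s for s, c in codes.items()}
    out, buf = bytearray(), ""
    for bit in bits:
        buf += bit
        if buf in reverse:
            out.append(reverse[buf])
            buf = ""
    return bytes(out)


data = b"abracadabra"
codes = huffman_codes(data)
bits = encode(data, codes)
print(len(bits), "bits instead of", 8 * len(data))
```

For `b"abracadabra"` any Huffman tree yields 23 bits, compared with 88 bits for the raw 8-bit encoding.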

12.
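Data compression is an important problem in volume data visualization research. As the dimensionality, resolution, and number of variables of volume data keep increasing, the amount of data grows exponentially, making research on volume data compression even more important. This paper surveys existing volume data compression methods from two perspectives, lossy and lossless compression. It compares the advantages and disadvantages of the various lossless methods, describes in detail the general workflow of lossy compression and its key steps, and places particular emphasis on compression methods for time-varying volume data. Finally, it proposes directions for further exploration in this field.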

13.
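A real-time electrocardiogram (ECG) data compression algorithm is proposed that combines an adaptive variable-threshold algorithm with the turning point algorithm. The adaptive variable-threshold algorithm is an improvement on the AZTEC algorithm: it computes several statistical parameters of the ECG signal to determine a variable threshold. The turning point algorithm analyzes the trend of the samples and stores only one of each pair of consecutive samples, retaining the peaks and valleys where the sign of the signal's slope changes. The proposed algorithm combines the advantages of both and reconstructs the ECG signal with little distortion at relatively high compression ratios.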
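The turning point half of the method is simple enough to sketch. The Python version below follows the classic formulation of the turning point algorithm as it is commonly described, not necessarily this paper's exact variant: starting from a reference sample, it looks at each following pair (x1, x2) and keeps x1 if the slope changes sign there (a peak or valley), otherwise x2, which gives a fixed 2:1 reduction. The adaptive AZTEC-based threshold part is not shown.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def turning_point(samples):
    """Turning point compression: keep one sample out of every consecutive pair.

    The first sample is kept as the reference. For each following pair (x1, x2),
    x1 is kept if the slope changes sign at x1, else x2; the kept sample becomes
    the new reference. A trailing unpaired sample is dropped in this sketch.
    """
    if not samples:
        return []
    out = [samples[0]]
    ref = samples[0]
    for i in range(1, len(samples) - 1, 2):
        x1, x2 = samples[i], samples[i + 1]
        keep = x1 if sign(x1 - ref) * sign(x2 - x1) < 0 else x2
        out.append(keep)
        ref = keep
    return out


print(turning_point([0, 1, 2, 1, 0]))  # [0, 2, 0]
```

In the second example of the tests, the spike at 3 falls on the first sample of a pair, yet it survives because the slope changes sign there; that is the property that lets the method preserve R-wave peaks.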

14.
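Migrating instances can move between different servers, which lets them complete tasks flexibly but also introduces more security problems. After analyzing existing security protection techniques, this paper proposes building an immune-inspired security sentinel for migrating instances that inspects the next work location of a migrating instance in advance, so that the instance can sense the risks it is about to face. The paper discusses in detail the construction of the immune bodies and the detection process, and compares the approach with other techniques.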

15.
A new approach to solving some problems of cluster analysis is proposed, which reduces a multi-dimensional problem to a one-dimensional one.

16.
Learning to Match the Schemas of Data Sources: A Multistrategy Approach   (Citations: 5; self-citations: 0; by others: 5)
AnHai Doan, Pedro Domingos, Alon Halevy. Machine Learning, 2003, 50(3): 279-301
The problem of integrating data from multiple data sources (either on the Internet or within enterprises) has received much attention in the database and AI communities. The focus has been on building data integration systems that provide a uniform query interface to the sources. A key bottleneck in building such systems has been the laborious manual construction of semantic mappings between the query interface and the source schemas. Examples of mappings are "element location maps to address" and "price maps to listed-price". We propose a multistrategy learning approach to automatically find such mappings. The approach applies multiple learner modules, where each module exploits a different type of information either in the schemas of the sources or in their data, then combines the predictions of the modules using a meta-learner. Learner modules employ a variety of techniques, ranging from Naive Bayes and nearest-neighbor classification to entity recognition and information retrieval. We describe the LSD system, which employs this approach to find semantic mappings. To further improve matching accuracy, LSD exploits domain integrity constraints, user feedback, and nested structures in XML data. We test LSD experimentally on several real-world domains. The experiments validate the utility of multistrategy learning for data integration and show that LSD proposes semantic mappings with a high degree of accuracy.
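The final step, combining the modules' predictions with a meta-learner, can be pictured as a weighted vote over the candidate mediated-schema labels. In the Python sketch below, the learner names, labels, confidences, and weights are all hypothetical, and the weights are simply given; in LSD they would be learned by the meta-learner from training data, which is not shown.

```python
def combine(predictions, weights):
    """Weighted combination of base-learner predictions for one source schema element.

    predictions: {learner_name: {label: confidence}}
    weights: {learner_name: weight}, assumed here to come from a trained meta-learner.
    Returns (label, score) pairs sorted by combined score, best first.
    """
    scores = {}
    for learner, dist in predictions.items():
        w = weights.get(learner, 0.0)
        for label, conf in dist.items():
            scores[label] = scores.get(label, 0.0) + w * conf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical predictions for the source element "location".
preds = {
    "name_learner": {"address": 0.6, "price": 0.1},
    "naive_bayes": {"address": 0.8, "listed-price": 0.2},
}
weights = {"name_learner": 0.3, "naive_bayes": 0.7}
ranking = combine(preds, weights)
print(ranking[0][0])  # address
```

A linear combination keeps the meta-learner cheap to train and lets learners that are unreliable for a particular label receive low weight.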

17.
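More and more enterprises and organizations use the wide-area, open Internet as their collaboration platform, and business decision makers often need to gather and jointly analyze resource information from multiple departments on the fly in order to make ad hoc decisions. A key problem is how to build, on demand, cross-organizational data views that meet user needs and how to dynamically maintain consistency between the views and their data sources. This paper proposes iViewer, a method for dynamically generating cross-organizational business data views in the Internet environment. iViewer uses data services to encapsulate autonomous, heterogeneous, and dynamically changing data sources, and builds data views dynamically through visual, easy-to-use data service composition operations. It also proposes a polling-based dynamic view update algorithm that maintains consistency between the data sources and the data views, so that a view changes automatically as its sources change. The paper details the principles and process of iViewer and illustrates its effect with a fire emergency response scenario, in which a cross-departmental data view of firefighting and rescue equipment is generated dynamically for the command center.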
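The polling-based update can be illustrated with a small sketch. Here each data service is modeled, as an assumption because the paper's service interface is not given, as a function returning a version number and its current rows; the view polls every service and recomputes itself only when some version has changed. `PolledView`, the version scheme, and the sample departments are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = dict
Source = Callable[[], Tuple[int, List[Row]]]  # returns (version, rows)


class PolledView:
    """A data view over several sources, refreshed by polling their versions."""

    def __init__(self, sources: Dict[str, Source],
                 compose: Callable[[Dict[str, List[Row]]], List[Row]]):
        self.sources = sources
        self.compose = compose
        self.versions: Dict[str, int] = {}
        self.rows: List[Row] = []

    def poll(self) -> bool:
        """Fetch every source once; rebuild the view if any version changed."""
        snapshot, changed = {}, False
        for name, fetch in self.sources.items():
            version, rows = fetch()
            snapshot[name] = rows
            if self.versions.get(name) != version:
                self.versions[name] = version
                changed = True
        if changed:
            self.rows = self.compose(snapshot)
        return changed


# Hypothetical department data services.
fire_dept = {"version": 1, "rows": [{"item": "pump", "qty": 2}]}
police = {"version": 1, "rows": [{"item": "barrier", "qty": 10}]}


def union_view(snapshot):
    """Compose the view as a tagged union of all source rows."""
    return [dict(row, source=name) for name, rows in snapshot.items() for row in rows]


view = PolledView(
    {"fire": lambda: (fire_dept["version"], fire_dept["rows"]),
     "police": lambda: (police["version"], police["rows"])},
    union_view,
)
print(view.poll(), len(view.rows))  # the first poll always rebuilds the view
```

Polling trades some latency and network traffic for simplicity: the sources need no push mechanism, which fits autonomous services that the view's owner does not control.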

18.
Although many more complex learning algorithms exist, k-nearest neighbor is still one of the most successful classifiers in real-world applications. One of the ways of scaling up the k-nearest neighbors classifier to deal with large datasets is instance selection. Due to the constantly growing amount of data in almost any pattern recognition task, we need more efficient instance selection algorithms, which must achieve larger reductions while maintaining the accuracy of the selected subset.
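As a concrete example of instance selection for k-NN, here is a Python sketch of Hart's condensed nearest neighbor (CNN) rule. This is a classical algorithm, not the method of the paper above: it keeps adding training instances that the current subset misclassifies under 1-NN until a full pass adds nothing. Each pass costs O(n · |store|) distance computations and the result depends on the input order, which illustrates why efficiency matters on large datasets.

```python
import math


def nearest_label(store, x):
    """Label of the instance in store closest to point x (1-NN, Euclidean distance)."""
    _, label = min(store, key=lambda inst: math.dist(inst[0], x))
    return label


def condensed_nn(data):
    """Hart's condensed nearest neighbor rule.

    data: list of (point, label) pairs, where point is a tuple of numbers.
    Returns a subset that classifies every training instance correctly under 1-NN,
    assuming no identical points carry conflicting labels.
    """
    store = [data[0]]
    changed = True
    while changed:
        changed = False
        for point, label in data:
            if (point, label) in store:
                continue
            if nearest_label(store, point) != label:
                store.append((point, label))
                changed = True
    return store


train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((10, 10), "b"), ((10, 11), "b")]
subset = condensed_nn(train)
print(subset)
```

On these two well-separated clusters, the five training instances condense to one representative per class.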

19.
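An Effective Method for Data Warehouse Design   (Citations: 6; self-citations: 0; by others: 6)
By comparing data warehouse design with database design, this paper proposes a data warehouse design method based on database design principles, gives the design steps, and introduces source data analysis. The method has achieved satisfactory results in practical applications.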

20.
Modeling Massive RFID Data Sets: A Gateway-Based Movement Graph Approach   (Citations: 1; self-citations: 0; by others: 1)
Massive Radio Frequency Identification (RFID) data sets are expected to become commonplace in supply chain management systems. Warehousing and mining this data is an essential problem with great potential benefits for inventory management, object tracking, and product procurement processes. Since RFID tags can be used to identify each individual item, enormous amounts of location-tracking data are generated. With such data, object movements can be modeled by movement graphs, where nodes correspond to locations and edges record the history of item transitions between locations. In this study, we develop a movement graph model as a compact representation of RFID data sets. Since spatiotemporal as well as item information can be associated with the objects in such a model, the movement graph can be huge, complex, and multidimensional in nature. We show that such a graph can be better organized around gateway nodes, which serve as bridges connecting different regions of the movement graph. A graph-based object movement cube can be constructed by merging and collapsing nodes and edges according to an application-oriented topological structure. Moreover, we propose an efficient cubing algorithm that performs simultaneous aggregation of both spatiotemporal and item dimensions on a partitioned movement graph, guided by such a topological structure.
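The basic movement graph, with locations as nodes and item transitions as edges, can be sketched in a few lines of Python. The sketch assumes readings arrive as `(tag_id, location, timestamp)` tuples and only counts transitions per edge; the sample readings are hypothetical, and the paper's gateway-based partitioning and cube aggregation are not shown.

```python
from collections import Counter, defaultdict


def movement_graph(readings):
    """Build a movement graph from RFID readings.

    readings: iterable of (tag_id, location, timestamp) tuples.
    Returns a Counter mapping (from_location, to_location) -> number of item transitions.
    """
    paths = defaultdict(list)
    for tag, loc, ts in readings:
        paths[tag].append((ts, loc))
    edges = Counter()
    for visits in paths.values():
        visits.sort()  # order each item's readings by timestamp
        locs = [loc for _, loc in visits]
        # Collapse repeated consecutive readings at the same location (one stay).
        stays = [loc for i, loc in enumerate(locs) if i == 0 or loc != locs[i - 1]]
        edges.update(zip(stays, stays[1:]))
    return edges


readings = [
    ("t1", "factory", 1), ("t1", "dock", 5), ("t1", "store", 9),
    ("t2", "factory", 2), ("t2", "factory", 3), ("t2", "dock", 6),
]
g = movement_graph(readings)
print(g)
```

Aggregating per edge rather than storing every raw reading is what makes the graph a compact representation; richer versions would attach duration and item attributes to each edge.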
