20 similar documents found (search time: 15 ms)
1.
Thomas Reinartz 《Data mining and knowledge discovery》2002,6(2):191-210
In this paper, we consider instance selection as an important focusing task in the data preparation phase of knowledge discovery and data mining. Focusing generally covers all issues related to data reduction. First of all, we define a broader perspective on focusing tasks, choose instance selection as one particular focusing task, and outline the specification of concrete evaluation criteria to measure success of instance selection approaches. Thereafter, we present a unifying framework that covers existing approaches towards solutions for instance selection as instantiations. We describe specific examples of instantiations of this framework and discuss their strengths and weaknesses. Then, we outline an enhanced framework for instance selection, generic sampling, and summarize example evaluation results for several different instantiations of its implementation. Finally, we conclude with open issues and research challenges for instance selection as well as focusing in general.
2.
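Research on an Enterprise-Oriented Data Integration Modeling Method  Total citations: 3, self-citations: 0, citations by others: 3
To address the information silos and data gaps that are now widespread in enterprises, this paper proposes a model for enterprise data integration, the enterprise unified data model. Applying this model can bring order to an enterprise's chaotic data environment and establish a unified, efficient enterprise data platform. Drawing on experience building a unified data model in a refining and petrochemical enterprise, the paper also describes in detail the modeling method for the enterprise unified data model.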
3.
4.
A New Approach to the Construction of Surfaces from Contour Data  Total citations: 12, self-citations: 0, citations by others: 12
This paper presents a new approach to the construction of a surface from a stack of contour slices. Unlike most existing methods, this new approach handles ambiguous conditions consistently without employing an algorithm to establish a correspondence between vertices on one contour and those on the next. It is easy to implement and fast to compute, requiring only basic geometric properties, namely closedness and simplicity, to be available with contour data. The advantages of this new approach have also been demonstrated with solutions to a few classical problems from the literature and some practical problems in medical imaging. It can also be applied to geographical surveying and keyframe animations.
5.
Tsau Young Lin 《Applied Intelligence》2000,13(2):113-124
From the processing point of view, data mining is machine derivation of interesting properties (to humans) from the stored data. Hence, the notion of machine oriented data modeling is explored: an attribute value, in a relational model, is a meaningful label (a property) of a set of entities (granule). A model using these granules themselves as attribute values (their bit patterns or lists of members) is called a machine oriented data model. The model provides a good database compaction and data mining environment. For moderate size databases, finding association rules, decision rules, etc., can be reduced to easy computation of set theoretical operations on granules. In the second part, these notions are extended to real world objects, where the universe is granulated (clustered) into granules by binary relations. Data modeling and mining with such additional semantics are formulated and investigated. In such models, data mining is essentially a machine calculus of granules, that is, granular computing.
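The idea of using granules (bit patterns of entity sets) as attribute values can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `build_granules` and `support`, the row layout, and the use of Python integers as bit patterns are all assumptions made for illustration. It shows how the support count needed for association rules reduces to a bitwise AND followed by a bit count.

```python
def build_granules(rows):
    """Map each (attribute, value) pair to a bit pattern of the rows that carry it.

    Hypothetical layout: `rows` is a list of dicts; bit i is set when row i
    has that attribute value.
    """
    granules = {}
    for idx, row in enumerate(rows):
        for attr, value in row.items():
            key = (attr, value)
            granules[key] = granules.get(key, 0) | (1 << idx)
    return granules


def support(granules, items):
    """Count rows containing every (attribute, value) pair in `items`.

    Set intersection of granules becomes a bitwise AND of their bit patterns.
    """
    bits = None
    for item in items:
        g = granules.get(item, 0)
        bits = g if bits is None else bits & g
    return 0 if bits is None else bin(bits).count("1")


rows = [
    {"color": "red", "size": "S"},
    {"color": "red", "size": "L"},
    {"color": "blue", "size": "S"},
    {"color": "red", "size": "S"},
]
granules = build_granules(rows)
print(support(granules, [("color", "red"), ("size", "S")]))
```

Under these assumptions, the confidence of a rule such as color=red implies size=S is the ratio of two such support counts.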
6.
《Computer Graphics and Applications, IEEE》1984,4(1):16-26
Procedural models can simulate an object's behavior as well as its appearance. When combined with data flow methods, they provide a useful approach to image composition and animation.
7.
Adaptive Intrusion Detection: A Data Mining Approach  Total citations: 23, self-citations: 0, citations by others: 23
In this paper we describe a data mining framework for constructing intrusion detection models. The first key idea is to mine system audit data for consistent and useful patterns of program and user behavior. The other is to use the set of relevant system features presented in the patterns to compute inductively learned classifiers that can recognize anomalies and known intrusions. In order for the classifiers to be effective intrusion detection models, we need to have sufficient audit data for training and also select a set of predictive system features. We propose to use the association rules and frequent episodes computed from audit data as the basis for guiding the audit data gathering and feature selection processes. We modify these two basic algorithms to use axis attribute(s) and reference attribute(s) as forms of item constraints to compute only the relevant patterns. In addition, we use an iterative level-wise approximate mining procedure to uncover the low frequency but important patterns. We use meta-learning as a mechanism to make intrusion detection models more effective and adaptive. We report our extensive experiments in using our framework on real-world audit data.
8.
9.
Spatio-Temporal Data Types: An Approach to Modeling and Querying Moving Objects in Databases  Total citations: 13, self-citations: 2, citations by others: 13
Martin Erwig Ralf Hartmut Güting Markus Schneider Michalis Vazirgiannis 《GeoInformatica》1999,3(3):269-296
Spatio-temporal databases deal with geometries changing over time. In general, geometries can change not only in discrete steps but also continuously, in which case we speak of moving objects. If only the position in space of an object is relevant, then moving point is a basic abstraction; if also the extent is of interest, then the moving region abstraction captures moving as well as growing or shrinking regions. We propose a new line of research where moving points and moving regions are viewed as 3-D (2-D space + time) or higher-dimensional entities whose structure and behavior is captured by modeling them as abstract data types. Such types can be integrated as base (attribute) data types into relational, object-oriented, or other DBMS data models; they can be implemented as data blades, cartridges, etc. for extensible DBMSs. We expect these spatio-temporal data types to play a similarly fundamental role for spatio-temporal databases as spatial data types have played for spatial databases. The paper explains the approach and discusses several fundamental issues and questions related to it that need to be clarified before delving into specific designs of spatio-temporal algebras.
10.
Amjad Umar George Karabatis Linda Ness Bruce Horowitz Ahmed Elmagarmid 《Information Systems Frontiers》1999,1(3):279-301
Enterprise data (the data that is created, used and shared by a corporation in conducting business) is a critical business resource that must be analyzed, architected and managed with data quality as a guiding principle. This paper presents results, practical insights, and lessons learned from a large scale study conducted in the telecommunications industry that synthesizes data quality issues into an architectural and management approach. We describe the real life case study and show how requirements for data quality were collected, how the data quality metrics were defined, what guidelines were established for intersystem data flows, what COTS (commercial off-the-shelf) technologies were used, and what results were obtained through a prototype effort. As a result of experience gained and lessons learned, we propose a comprehensive data quality approach that combines data quality and data architectures into a single framework with a series of steps, procedures, checklists, and tools. Our approach takes into account the technology, process, and people issues and extends the extant literature on data quality.
11.
12.
13.
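This paper proposes a real-time electrocardiogram (ECG) data compression algorithm that combines an adaptive variable-threshold algorithm with the turning point algorithm. The adaptive variable-threshold algorithm is an improvement on the AZTEC algorithm: it computes several statistical parameters of the ECG signal to determine a variable threshold. The turning point algorithm analyzes the trend of the sample points and stores only one of each pair of consecutive samples, retaining the peaks and valleys at which the sign of the signal's slope changes. The proposed algorithm combines the advantages of both. At relatively high compression ratios, the reconstructed ECG signal shows little distortion.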
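The turning point step described above can be sketched in a few lines of Python. This is a minimal illustration of the classic turning point rule only (keep one sample from each consecutive pair, preferring the one at which the slope changes sign); it is not the combined algorithm of the paper, and the adaptive variable-threshold part is not shown. The function name `turning_point` and the list-of-numbers input are assumptions.

```python
def turning_point(samples):
    """Reduce a sampled signal roughly 2:1 while keeping peaks and valleys.

    Starting from a reference sample, each following pair (x1, x2) contributes
    exactly one stored sample: x1 if the slope changes sign at x1, else x2.
    A trailing sample without a partner is dropped in this sketch.
    """
    if not samples:
        return []
    kept = [samples[0]]
    ref = samples[0]
    for i in range(1, len(samples) - 1, 2):
        x1, x2 = samples[i], samples[i + 1]
        # Opposite-signed slopes on either side of x1 mark a peak or valley.
        if (x1 - ref) * (x2 - x1) < 0:
            ref = x1
        else:
            ref = x2
        kept.append(ref)
    return kept


print(turning_point([0, 5, 1, 2, 3, 9, 4]))
```

On a monotone stretch the rule simply keeps every second sample, which is where the fixed 2:1 reduction comes from.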
14.
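Migrating instances can move between different servers, which lets them complete tasks flexibly but also introduces more security problems. Based on an analysis of existing security protection techniques, this paper proposes establishing an immune security outpost for migrating instances that inspects an instance's next work location in advance, so that the migrating instance can perceive the dangers it will face. The paper discusses in detail the construction of the immune body and the detection process, and compares the approach with other techniques.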
15.
A new approach to solving some problems of cluster analysis is proposed, which reduces a multi-dimensional problem to a one-dimensional one.
16.
The problem of integrating data from multiple data sources, either on the Internet or within enterprises, has received much attention in the database and AI communities. The focus has been on building data integration systems that provide a uniform query interface to the sources. A key bottleneck in building such systems has been the laborious manual construction of semantic mappings between the query interface and the source schemas. Examples of mappings are "element location maps to address" and "price maps to listed-price". We propose a multistrategy learning approach to automatically find such mappings. The approach applies multiple learner modules, where each module exploits a different type of information either in the schemas of the sources or in their data, then combines the predictions of the modules using a meta-learner. Learner modules employ a variety of techniques, ranging from Naive Bayes and nearest-neighbor classification to entity recognition and information retrieval. We describe the LSD system, which employs this approach to find semantic mappings. To further improve matching accuracy, LSD exploits domain integrity constraints, user feedback, and nested structures in XML data. We test LSD experimentally on several real-world domains. The experiments validate the utility of multistrategy learning for data integration and show that LSD proposes semantic mappings with a high degree of accuracy.
17.
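More and more enterprises and organizations are choosing the wide-area, open Internet as their collaboration platform, and business decision makers often need to gather and jointly analyze resource information from multiple departments in real time in order to make on-the-spot decisions. A key problem is how to build, on demand, cross-organizational data views that meet user needs and how to dynamically maintain consistency between the views and the data sources. This paper proposes iViewer, a method for dynamically generating cross-organizational business data views in an Internet environment. It uses data services to encapsulate autonomous, heterogeneous, and dynamically changing data sources; builds data views dynamically through visual, easy-to-use data service composition operations; and introduces a polling-based dynamic view update algorithm that maintains consistency between data sources and data views, so that the views change autonomously as the sources change. The paper details the principles and process of the iViewer method and illustrates its effect with a fire emergency response scenario in which a cross-department view of fire rescue equipment data is generated dynamically for a command center.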
18.
Although many more complex learning algorithms exist, k-nearest neighbor is still one of the most successful classifiers in real-world applications. One of the ways of scaling up the k-nearest neighbors classifier to deal with large datasets is instance selection. Due to the constantly growing amount of data in almost any pattern recognition task, we need more efficient instance selection algorithms, which must achieve larger reductions while maintaining the accuracy of the selected subset.
19.
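An Effective Method for Data Warehouse Design  Total citations: 6, self-citations: 0, citations by others: 6
By comparing data warehouse design with database design, this paper proposes a data warehouse design method based on database design principles, gives the design steps, and introduces source data analysis. The method has also achieved satisfactory results in practical application.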
20.
Gonzalez Hector Han Jiawei Cheng Hong Li Xiaolei Klabjan Diego Wu Tianyi 《Knowledge and Data Engineering, IEEE Transactions on》2010,22(1):90-104
Massive Radio Frequency Identification (RFID) data sets are expected to become commonplace in supply chain management systems. Warehousing and mining this data is an essential problem with great potential benefits for inventory management, object tracking, and product procurement processes. Since RFID tags can be used to identify each individual item, enormous amounts of location-tracking data are generated. With such data, object movements can be modeled by movement graphs, where nodes correspond to locations and edges record the history of item transitions between locations. In this study, we develop a movement graph model as a compact representation of RFID data sets. Since spatiotemporal as well as item information can be associated with the objects in such a model, the movement graph can be huge, complex, and multidimensional in nature. We show that such a graph can be better organized around gateway nodes, which serve as bridges connecting different regions of the movement graph. A graph-based object movement cube can be constructed by merging and collapsing nodes and edges according to an application-oriented topological structure. Moreover, we propose an efficient cubing algorithm that performs simultaneous aggregation of both spatiotemporal and item dimensions on a partitioned movement graph, guided by such a topological structure.
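As a rough illustration of the movement graph idea (nodes are locations, edges record the history of item transitions), the following Python sketch builds such a graph from raw readings. It is an assumption-laden toy, not the model or cubing algorithm of the paper: the `(item_id, timestamp, location)` reading format and the function name `build_movement_graph` are hypothetical, and gateway nodes and aggregation are not covered.

```python
from collections import defaultdict


def build_movement_graph(readings):
    """Build edges {(from_loc, to_loc): [(item_id, t_from, t_to), ...]}.

    `readings` is an iterable of hypothetical (item_id, timestamp, location)
    tuples. Consecutive readings of one item at the same location are treated
    as a stay and produce no edge.
    """
    by_item = defaultdict(list)
    for item_id, ts, loc in readings:
        by_item[item_id].append((ts, loc))

    edges = defaultdict(list)
    for item_id, visits in by_item.items():
        visits.sort()  # order each item's readings by time
        for (t0, a), (t1, b) in zip(visits, visits[1:]):
            if a != b:
                edges[(a, b)].append((item_id, t0, t1))
    return dict(edges)


readings = [
    ("i1", 1, "factory"), ("i1", 5, "warehouse"), ("i1", 9, "store"),
    ("i2", 2, "factory"), ("i2", 6, "warehouse"), ("i2", 7, "warehouse"),
]
for edge, history in build_movement_graph(readings).items():
    print(edge, history)
```

Keeping the per-edge history as a list is the simplest choice here; a compact representation of the kind the abstract describes would aggregate these entries rather than store each one.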