Similar literature
 20 similar documents found (search time: 0 ms)
1.
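A genetic-algorithm-based method for partitioning a sample set is proposed. In data mining, the method addresses how to split a sample set so as to obtain the best training and test sets. Splitting the data this way not only improves the classification accuracy of the classification model but also minimizes the noise percentage between the training and test sets. Finally, a set of software project sample data is used as an example to demonstrate the method's effectiveness.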

2.
Representing uncertain data: models, properties, and algorithms   Cited by: 1 (self-citations: 0, citations by others: 1)
In general terms, an uncertain relation encodes a set of possible certain relations. There are many ways to represent uncertainty, ranging from alternative values for attributes to rich constraint languages. Among the possible models for uncertain data, there is a tension between simple and intuitive models, which tend to be incomplete, and complete models, which tend to be nonintuitive and more complex than necessary for many applications. We present a space of models for representing uncertain data based on a variety of uncertainty constructs and tuple-existence constraints. We explore a number of properties and results for these models. We study completeness of the models, as well as closure under relational operations, and we give results relating closure and completeness. We then examine whether different models guarantee unique representations of uncertain data, and for those models that do not, we provide complexity results and algorithms for testing equivalence of representations. The next problem we consider is that of minimizing the size of representation of models, showing that minimizing the number of tuples also minimizes the size of constraints. We show that minimization is intractable in general and study the more restricted problem of maintaining minimality incrementally when performing operations. Finally, we present several results on the problem of approximating uncertain data in an insufficiently expressive model.

3.
As the need to analyze big data sets grows dramatically, the role that classification algorithms play in data mining techniques also increases. Big data analysis requires more of the data sets' characteristics to be included, such as data structure, variety of sources, and update frequency. In this paper, we evaluate scenarios that examine which data set characteristics most affect the classification algorithms' performance. It remains difficult to determine how strong or how weak a given algorithm is on a given data set. Thus, our research experimentally examines how data set characteristics affect algorithm performance, both in terms of accuracy and in elapsed time. To do so, we use a multiple regression method to evaluate the causality between data set characteristics as independent variables, and performance metrics as dependent variables. We also examine the role that classification algorithms play as moderator in this causality. All benchmark data sets in a UCI database that are fit to run the classification algorithm are used. Based on the results of the experiment, we discuss the requirements of legacy classification algorithms to address big data analysis in a new business intelligence era.

4.
The implementation of a non-iterative atmospheric correction algorithm is described in detail and the performance of the algorithm is illustrated for several CZCS images. Chlorophyll retrieval is attempted using linear, power and polynomial regression for ratios of corrected images and the best correlation coefficients are in the region of 0.9. The same images are analysed in three spectral bands using the ISOCLS clustering algorithm and ocean areas are stratified into subclass patterns which correlate well with ratios and sea-truth. The monocluster blocks approach is used to extract training statistics for maximum likelihood classification of ocean areas and the results compare favourably with corresponding ratio images.

5.
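Xu Xuesong, Shu Jian. Journal of Computer Applications (《计算机应用》), 2014, 34(8): 2285-2290
To address the long computation time and low model-identification accuracy of traditional regression analysis methods for multi-model data sets, a new heuristic robust regression analysis method is proposed. The method mimics the clustering-learning principle of the immune system, uses a B-cell network as the tool for classifying and storing the data set, and classifies data by judging how well they fit a model, which improves classification accuracy. Model-set extraction is decomposed into a repeated trial process of "clustering", "regression", and "re-clustering", and parallel heuristic search is used to approximate the solution of the model set. Simulation results show that the proposed method takes markedly less time for regression analysis than traditional algorithms and identifies models markedly more accurately. On an 8-model data set, the best traditional algorithm, successive extraction based on RANSAC, achieves an average model-identification accuracy of 90.37% and takes 53.3947 s; traditional algorithms with computation times under 0.5 s are less than 1% accurate; the proposed algorithm takes only 0.5094 s and reaches an accuracy of 98.25%.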

6.
The interprocessor complete exchange communication pattern can be found in many important parallel algorithms. In this paper, we present algorithms for complete exchange on 2D mesh-connected multiprocessors. The unique feature of the proposed algorithms is that they are configurable: the time for message startups can be traded against larger message sizes. At one extreme, the algorithm minimizes the number of message startups at the expense of an increased amount of time spent in message transmission. At the other extreme, the time spent in message transmission is reduced at the expense of an increased number of message startups. The structure of the algorithms is such that intermediate solutions are feasible, i.e., the number of message startups can be increased slightly and the message transmission time is correspondingly reduced. The ability to configure these algorithms enables the algorithm characteristics to be matched with machine characteristics based on specific overheads for message initiation and link speeds to minimize overall execution time. In effect, the algorithms can be configured to strike the right balance between direct and message combining approaches on a specific architecture for a given problem size. We believe these algorithms are distinguished by this ability and contribute to efficient portable implementations of complete exchange algorithms.

7.
Parallel algorithms for several common problems such as sorting and the FFT involve a personalized exchange of data among all the processors. Past work on complete exchange has taken one of two broad approaches: direct exchange or indirect message combining. While combining approaches reduce the number of message startups, direct exchange minimizes the volume of data transmitted. This paper presents a family of hybrid algorithms for wormhole-routed 2D meshes that can effectively utilize the complementary strengths of these two approaches to complete exchange. The performance of hybrid algorithms using Cyclic Exchange and Scott's Direct Exchange is studied using analytical models, simulation, and implementation on a Cray T3D system. The results show that hybrids achieve lower completion times than either pure algorithm for a range of mesh sizes, data block sizes, and message startup costs. It is also demonstrated that barriers may be used to enhance performance by reducing message contention, whether or not the target system provides hardware support for barrier synchronization. The analytical models are shown to be useful in selecting the optimum hybrid for any given combination of system parameters (mesh size, message startup time, flit transfer time, and barrier cost) and the problem parameter (data block size).

8.
Choosing appropriate classification algorithms for a given data set is important and useful in practice but also challenging. In this paper, a method of recommending classification algorithms is proposed. First, the feature vectors of data sets are extracted using a novel method and the performance of classification algorithms on the data sets is evaluated. Then the feature vector of a new data set is extracted, and its k nearest data sets are identified. Afterwards, the classification algorithms of the nearest data sets are recommended to the new data set. The proposed data set feature extraction method uses structural and statistical information to characterize data sets, which is quite different from the existing methods. To evaluate the performance of the proposed classification algorithm recommendation method and the data set feature extraction method, extensive experiments with 17 different types of classification algorithms, three different types of data set characterization methods and all possible numbers of nearest data sets are conducted upon 84 publicly available UCI data sets. The results indicate that the proposed method is effective and can be used in practice.
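
The nearest-data-set step in the abstract above can be illustrated with a minimal sketch. It assumes each data set has already been reduced to a numeric feature vector (the paper's own extraction method is not described here) and that the best-performing algorithm for each known data set is recorded. The names `recommend`, `known_features`, and `best_algorithm`, the Euclidean distance, and the majority vote are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def recommend(
    new_features: Sequence[float],
    known_features: Dict[str, Sequence[float]],
    best_algorithm: Dict[str, str],
    k: int = 3,
) -> List[str]:
    """Rank algorithms by how often they win on the k nearest data sets."""
    # Sort known data sets by Euclidean distance to the new data set.
    nearest = sorted(
        known_features,
        key=lambda name: math.dist(new_features, known_features[name]),
    )[:k]
    # Count which algorithm performed best on each neighbour.
    votes = Counter(best_algorithm[name] for name in nearest)
    return [algorithm for algorithm, _ in votes.most_common()]


# Hypothetical feature vectors and winners, for illustration only.
known_features = {"a": (0.0, 0.0), "b": (0.0, 1.0), "c": (1.0, 0.0), "d": (10.0, 10.0)}
best_algorithm = {"a": "naive_bayes", "b": "decision_tree", "c": "decision_tree", "d": "svm"}
ranking = recommend((0.2, 0.2), known_features, best_algorithm, k=3)
```

Here the three nearest data sets are `a`, `b`, and `c`, so `decision_tree` is ranked ahead of `naive_bayes`. A fuller version might weight votes by distance or by the measured accuracy of each algorithm.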

9.
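Evolutionary algorithms are widely applied to many practical problems because of their strong system-modeling and space-search capabilities. However, the fitness values of individuals are computed repeatedly as the algorithm evolves, and fitness computation consumes a great deal of time, especially for complex problems in real engineering. To address this, the fast access of hash tables is exploited: a hash table stores and retrieves previously computed fitness values, which avoids recomputing fitness during optimization without affecting the optimization result in any way. Simulation experiments verify the effectiveness of the method.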
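
The hash-table idea in the abstract above can be sketched in a few lines. This is a minimal illustration, assuming genomes are sequences of integers and the fitness function is deterministic. The class `CachedFitness` and the toy `onemax` fitness are hypothetical names, not taken from the paper.

```python
from typing import Callable, Dict, Sequence, Tuple

Genome = Tuple[int, ...]


class CachedFitness:
    """Wrap a fitness function with a hash-table cache keyed by genome."""

    def __init__(self, fitness: Callable[[Genome], float]) -> None:
        self._fitness = fitness
        self._cache: Dict[Genome, float] = {}
        self.evaluations = 0  # number of real (uncached) evaluations

    def __call__(self, genome: Sequence[int]) -> float:
        key = tuple(genome)  # tuples are hashable, lists are not
        if key not in self._cache:
            self._cache[key] = self._fitness(key)
            self.evaluations += 1
        return self._cache[key]


def onemax(genome: Genome) -> float:
    """Toy fitness, a stand-in for an expensive simulation."""
    return float(sum(genome))


cached = CachedFitness(onemax)
population = [(1, 0, 1), (1, 1, 1), (1, 0, 1), (1, 0, 1)]
scores = [cached(g) for g in population]
```

The four individuals need only two real evaluations because three of them are identical. Since the cache returns exactly the value the fitness function would have produced, the search trajectory is unchanged, which matches the abstract's claim that results are unaffected.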

10.
The geometric processing of remote sensing images becomes a key issue in multi-source data integration, management and analysis for many geomatic applications. This paper first reviews the source of geometric distortions, compares the different mathematical models being currently used for geometric distortion modelling, details the algorithms, methods and processing steps and finally tracks the error propagation from the input to the final output data.

11.
A reprocessing of 12 years of global data from the Advanced Very High Resolution Radiometers on board the afternoon-viewing NOAA series satellites (NOAA-7, 9, and 11) is taking place as part of the NASA/NOAA Pathfinder project. A Pathfinder AVHRR land data set is being produced which is composed of global, 8 km NDVI with associated reflectances, brightness temperatures, solar and scan geometry, and cloud estimation. This data set is being processed using the best available methods in order to produce a consistent time series of data of unprecedented quality. Methods used in processing include a cross-satellite calibration, navigation using an orbital model and updated ephemerides, and correction for Rayleigh scattering. The data will be available to the community as both daily and composite data, and analysis of this long time series is expected to provide insight into terrestrial processes, seasonal and annual variability, and methods for handling large volume data sets.

12.
Computers & Geosciences, 1987, 13(2): 123-159
This paper presents a microprocessor-controlled counting system in combination with a software package. Together they form a powerful and time-saving tool for faultless handling of microfossil data.

13.
For a graph G = (V, E), a subset D ⊆ V is an r-hop dominating set if every vertex u ∈ V \ D is at most r hops away from D. It is a 2-connected r-hop dominating set if the subgraph of G induced by D is 2-connected. In this paper, we present two approximation algorithms to compute a minimum 2-connected r-hop dominating set. The first one is a greedy algorithm using ear decomposition of 2-connected graphs. This algorithm is applicable to any 2-connected general graph. The second one is a three-phase algorithm which is only applicable to unit disk graphs. For both algorithms, performance ratios are given.
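
The r-hop domination condition defined in the abstract above can be checked with a multi-source breadth-first search. The sketch below only verifies the domination property for a given candidate set. It does not test 2-connectivity and is not either of the paper's approximation algorithms. The function name and the adjacency-list representation are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List

Graph = Dict[Hashable, List[Hashable]]  # every vertex must appear as a key


def is_r_hop_dominating(graph: Graph, dominators: Iterable[Hashable], r: int) -> bool:
    """Return True if every vertex is within r hops of some dominator."""
    dist = {d: 0 for d in dominators}  # dominators are at distance 0
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue  # do not expand beyond r hops
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Every vertex reached within r hops is recorded in dist.
    return len(dist) == len(graph)


# Path graph 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
covers_with_two_hops = is_r_hop_dominating(path, {2}, r=2)
covers_with_one_hop = is_r_hop_dominating(path, {2}, r=1)
```

On the path graph, the middle vertex alone dominates all vertices within two hops but not within one, since the endpoints are two hops away.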

14.
15.
The present paper discusses the important task of noise suppression in images and video, which is encountered in many applications including hydroacoustics. We review spatiotemporal methods and algorithms of noise suppression. We suggest a new technique for the noise suppression problem based on a movement detector and the NLM algorithm (non-local means algorithm). The performance of the algorithms is assessed by filtering images and video. We also study filtering quality as a function of input parameters and we provide recommendations for selecting input parameters. A technique for estimating the filtering quality is suggested that combines metrics such as peak signal-to-noise ratio and the index of structural similarity.
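
Of the two quality metrics named in the abstract above, peak signal-to-noise ratio is simple enough to sketch directly. The block below uses the standard definition, 10 · log10(peak² / MSE), on images given as nested lists. The function name and the default peak of 255 (8-bit images) are assumptions. How the paper combines this value with the structural similarity index is not stated in the abstract and is not shown.

```python
import math
from typing import Sequence

Image = Sequence[Sequence[float]]


def psnr(reference: Image, filtered: Image, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    squared_errors = [
        (a - b) ** 2
        for row_ref, row_flt in zip(reference, filtered)
        for a, b in zip(row_ref, row_flt)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


black = [[0.0, 0.0], [0.0, 0.0]]
white = [[255.0, 255.0], [255.0, 255.0]]
worst_case = psnr(black, white)  # maximal error for 8-bit images
identical = psnr(black, black)
```

Higher values mean the filtered image is closer to the reference. The worst case for 8-bit data is 0 dB, and identical images give infinity.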

16.
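Yuan Hongxing, Zhuo Xuexue, Zhu De, Liu Hui. Control and Decision (《控制与决策》), 2022, 37(6): 1621-1631
The decision-theoretic rough set model is currently one of the most important research branches of rough set theory. However, because data types in real-world environments are complex and diverse and data are updated dynamically, the traditional decision-theoretic rough set model has certain limitations and shortcomings. To address this problem, a neighborhood decision-theoretic rough set model for hybrid information systems is proposed, and a matrix-based incremental update algorithm for neighborhood decision-theoretic rough sets is designed. First, the traditional discrete decision-theoretic rough set model is, in hybrid information systems, ...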

17.
Blood glucose control algorithms have evolved since the beginnings of the artificial pancreas in diabetes treatment. Although the main problem to solve remains the regulation of blood glucose into the healthy physiological range, the schemes have evolved over time from on-off schemes to data-based personalized schemes. The evolution has been in accordance with the understanding of glucose metabolism, the theoretical background to model it, and the availability of sensor technology. The algorithms have allowed the calculation of insulin infusion (sometimes glucagon or glucose), from the highly invasive intravenous route up to schemes based on the minimally invasive subcutaneous route. Solutions have also been proposed to deal with delays in insulin action and glucose measurement, as well as robust schemes to reject disturbances due to meal intake, exercise, non-modeled dynamics, and parametric variations due to inter- and intravariability of metabolism. Another problem that control schemes have addressed is safety in insulin infusion, including the calculation of insulin on board to avoid episodes of hypoglycemia, guaranteeing glucose regulation in normoglycemia, and decreasing the time in hyperglycemia. To summarize the role of control algorithms in the development of the artificial pancreas, this paper presents a historical review of the proposed control algorithms, from the establishment of the paradigm of artificial pancreas to the present date.

18.
In this paper two heuristic algorithms are presented for the weighted set covering problem. The first algorithm uses a simple, polynomial procedure to construct feasible covering solutions. The procedure is shown to possess a worst case performance bound that is a function of the size of the problem. The second algorithm is a solution improvement procedure that attempts to form reduced cost composite solutions from available feasible covering solutions. Computational results are presented for both algorithms on several large set covering problems generated from airline crew scheduling data.
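
To make the idea of a polynomial constructive heuristic for weighted set covering concrete, here is a sketch of the classical ratio-greedy rule: repeatedly pick the subset with the lowest cost per newly covered element. This is a well-known textbook heuristic swapped in for illustration. The abstract does not specify the paper's own procedure, so it should not be read as that method. All names and the example data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def greedy_weighted_set_cover(
    universe: Set[int],
    subsets: Dict[str, FrozenSet[int]],
    cost: Dict[str, float],
) -> List[str]:
    """Build a feasible cover with the cost-per-new-element greedy rule."""
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        # Consider only subsets that still cover something new.
        best = min(
            (name for name in subsets if subsets[name] & uncovered),
            key=lambda name: cost[name] / len(subsets[name] & uncovered),
            default=None,
        )
        if best is None:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# Hypothetical instance: elements could be flight legs, subsets crew pairings.
universe = {1, 2, 3, 4, 5}
subsets = {
    "A": frozenset({1, 2, 3}),
    "B": frozenset({3, 4, 5}),
    "C": frozenset({1, 2, 3, 4, 5}),
    "D": frozenset({4, 5}),
}
cost = {"A": 3.0, "B": 3.0, "C": 10.0, "D": 1.0}
cover = greedy_weighted_set_cover(universe, subsets, cost)
```

On this instance the rule first picks `D` (cost 0.5 per element) and then `A`, for a total cost of 4. An improvement phase, like the second algorithm the abstract mentions, would start from feasible covers such as this one.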

19.
We present optimization models and solution algorithms for the Vanpool Assignment Problem. A vanpool is typically a group of 9-15 passengers who share their commute to a common target location (typically an office building or corporate campus). Commuters in a vanpool drive from their homes to a park-and-ride location where they board a van and ride together to the target location; at the end of the work day they ride together back to the park-and-ride location. The Minimum Cost Vanpool Assignment Model (MCVAM) developed in this study is motivated by a program offered by Gulfstream Aerospace, a large employer in the Dallas/Fort-Worth area, Dallas Area Rapid Transit (DART), and Enterprise Rent-A-Car. Our MCVAM imposes constraints on the capacity of each van and quality-of-service constraints on the cost and travel time involved in joining a vanpool. The goal of the MCVAM is to minimize the total cost of a one-way trip to the target location for all employees (including those employees who opt out of the program and choose not to join a vanpool). To the best of our knowledge, this is the first mathematical programming model proposed for the standard (one-stop) Vanpool Assignment Problem. The MCVAM models the current practice in vanpooling of using one park-and-ride location per vanpool. We also present a Two-Stop MCVAM (TSMCVAM) that offers significant cost savings compared to the MCVAM with little or no increase in trip times for most passengers by allowing vanpools to stop at a second park-and-ride location. We present heuristics for the TSMCVAM which are shown in a computational study to find solutions with optimality gaps ranging from 5% to 10% in CPU times ranging from 1 to 15 min for problem instances with up to 600 employees and 120 potential park-and-ride locations.

20.
In this paper we discuss the successful execution of the LIM+ challenge problems as proposed by Bledsoe. This problem set ranges from a 12-step nonequality proof to a complex 41-step paramodulation proof. Our theorem prover is based on RUE resolution, which incorporates the axioms of equality into the definition of resolution. We apply hyperresolution as a restriction strategy and produce RUE hyper-refutations without the use of paramodulation. We present an extensive treatment of the heuristics applied to find proofs, both standalone and interactive. This work was supported by the National Science Foundation Grant CCR-9024953.
