Similar Literature
20 similar documents found
1.
Two-level supersaturated designs (SSDs) are designs that examine more than n−1 factors in n runs. Although the SSD literature on both construction and analysis is plentiful, the dearth of actual applications suggests that SSDs are still an unproven tool. Whether using forward selection or all-subsets regression, it is easy to select simple models from SSDs that explain a very large percentage of the total variation. Hence, naive p-values can persuade the user that included factors are indeed active. We propose the use of a global model randomization test in conjunction with all-subsets regression (or a shrinkage method) to more appropriately select candidate models of interest. For settings where the large number of factors makes repeated use of all-subsets regression expensive, we propose a short-cut approximation for the p-values. Two state-of-the-art model selection methods that have received considerable attention in recent years, Least Angle Regression and the Dantzig Selector, were likewise supplemented with the global randomization test. Finally, we propose a randomization test for reducing the number of terms in candidate models with small global p-values. Randomization tests effectively emphasize the limitations of SSDs, especially those with a large factor-to-run-size ratio.
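A minimal sketch of the global model randomization test, assuming a greedy forward-selection scorer as the model-selection step; the function names, term count k, and permutation count are illustrative, not the authors' implementation:

```python
# Global randomization test for supersaturated designs (illustrative sketch).
import numpy as np

def best_r2_forward(X, y, k):
    """R^2 of the best k-term model found by greedy forward selection."""
    n, p = X.shape
    chosen, resid = [], y - y.mean()
    for _ in range(k):
        cand = [j for j in range(p) if j not in chosen]
        # pick the column most correlated with the current residual
        scores = [abs(X[:, j] @ resid) / np.linalg.norm(X[:, j]) for j in cand]
        chosen.append(cand[int(np.argmax(scores))])
        beta, *_ = np.linalg.lstsq(X[:, chosen], y, rcond=None)
        resid = y - X[:, chosen] @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

def global_randomization_pvalue(X, y, k=2, n_perm=1000, seed=0):
    """Share of permuted responses whose best model fits as well as the real one."""
    rng = np.random.default_rng(seed)
    observed = best_r2_forward(X, y, k)
    hits = sum(best_r2_forward(X, rng.permutation(y), k) >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)
```

A small global p-value says the observed fit is unlikely to arise from selection bias alone, which is exactly the false reassurance naive p-values fail to guard against.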

2.
Economic forces, driven by the desire to introduce flash into the high-end storage market without changing the existing software base, have resulted in the emergence of solid-state drives (SSDs): flash packaged in HDD form factors and capable of working with device drivers and I/O buses designed for HDDs. Unlike the use of DRAM for caching or buffering, however, certain idiosyncrasies of NAND flash-based SSDs make their integration into hard disk drive (HDD)-based storage systems nontrivial. Flash memory suffers from limits on its reliability, is an order of magnitude more expensive than magnetic HDDs, and can sometimes be as slow as an HDD (due to excessive garbage collection (GC) induced by a high intensity of random writes). Given the complementary properties of HDDs and SSDs in terms of cost, performance, and lifetime, the current consensus among several storage experts is to view SSDs not as a replacement for HDDs but as a complementary device within the high-performance storage hierarchy. Thus, we design and evaluate such a hybrid storage system with HybridPlan, an improved capacity-planning technique that helps administrators operate within cost budgets. HybridPlan is able to find the most cost-effective hybrid storage configuration with different types of SSDs and HDDs.
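A minimal brute-force sketch of the kind of cost-budgeted search the abstract describes: enumerate SSD/HDD mixes, discard those over budget, and keep the best-performing one. The device throughput/cost figures and the additive performance model are illustrative assumptions, not HybridPlan's actual model:

```python
# Hypothetical cost-budgeted capacity-planning search (illustrative sketch).
from itertools import product

# (throughput units, cost units) per device -- made-up numbers for illustration
DEVICES = {"fast_ssd": (75.0, 9.0), "cheap_ssd": (40.0, 4.0), "hdd": (2.0, 1.0)}

def plan(budget, max_each=8):
    best_perf, best_mix = 0.0, None
    for counts in product(range(max_each + 1), repeat=len(DEVICES)):
        perf = sum(n * spec[0] for n, spec in zip(counts, DEVICES.values()))
        cost = sum(n * spec[1] for n, spec in zip(counts, DEVICES.values()))
        if cost <= budget and perf > best_perf:   # feasible and better
            best_perf, best_mix = perf, dict(zip(DEVICES, counts))
    return best_perf, best_mix

print(plan(budget=30.0))   # most cost-effective mix within the budget
```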

3.
Common clustering algorithms applied to intrusion detection systems suffer from two main problems: first, they choose initial cluster centers at random, so different initial values may produce different clustering results; second, their hill-climbing search easily becomes trapped in local optima. To address this, an improved clustering algorithm is proposed: the two mutually farthest points are taken as the first two initial cluster centers, and the remaining initial centers are determined by max-min-distance hierarchical clustering together with the DBI index. This resolves both problems, and simulation experiments verify the feasibility and superiority of the algorithm.
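A minimal sketch of the initial-center selection described above: the two mutually farthest points seed the centers, and the remaining ones follow a max-min-distance rule. The DBI-based refinement is omitted, and the function is illustrative only:

```python
# Deterministic initial-center selection (illustrative sketch).
import numpy as np

def initial_centers(X, k):
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    i, j = np.unravel_index(np.argmax(d), d.shape)              # farthest pair
    centers = [i, j]
    while len(centers) < k:
        nearest = d[:, centers].min(axis=1)   # distance to nearest chosen center
        nearest[centers] = -1.0               # never re-pick a chosen point
        centers.append(int(np.argmax(nearest)))  # max-min-distance point
    return X[centers]
```

Because the seeds are deterministic, repeated runs produce the same clustering, removing the sensitivity to random initialization noted above.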

4.
Identification of active factors in supersaturated designs (SSDs) has been the subject of much recent study. Although several methods have been proposed, a solution to the problem beyond one or two active factors still seems unsatisfactory. The smoothly clipped absolute deviation (SCAD) penalty function for variable selection has nice theoretical properties, but due to its nonconvex nature it poses computational issues in model fitting; as a result, it has so far shown little promise for SSDs. Another source of inefficiency, particularly for SSDs, has been the method used to choose the SCAD sparsity tuning parameter. The selection of this tuning parameter using the AIC and BIC information criteria, generalized cross-validation, and a recently proposed method based on the norm of the error in the solution of systems of linear equations is investigated, in conjunction with a recently developed, more efficient algorithm for implementing the SCAD penalty. The small-sample bias-corrected cAIC is found to yield a model size closer to the true model size. Results of the numerical study and real data analyses reveal that the SCAD is a valuable tool for identifying active factors in SSDs.
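A minimal sketch of the SCAD penalty itself, in Fan and Li's piecewise form with the customary a = 3.7; the tuning-parameter search discussed above would wrap a penalized fit in a loop over the sparsity parameter lambda. This is an illustration, not the paper's algorithm:

```python
# SCAD penalty (piecewise form; a = 3.7 is the customary default).
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    t = np.abs(beta)
    linear = lam * t                                          # |b| <= lam
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # lam < |b| <= a*lam
    const = lam**2 * (a + 1) / 2                              # |b| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))
```

The penalty grows like the lasso near zero but flattens for large coefficients, which is what keeps large effects nearly unbiased while still shrinking small ones to zero.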

5.
The fuzzy c-means (FCM) algorithm is an important clustering method in pattern recognition, and its fuzziness parameter m is a key parameter that can significantly affect the clustering result. A cluster validity index (CVI) is a criterion function for validating clustering results and thereby determining the optimal cluster number of a data set. From the perspective of cluster validation, we propose a novel method to select the optimal value of m in FCM, using four well-known CVIs for fuzzy clustering: XB, VK, VT, and SC. In this method, the optimal value of m is the one at which the CVIs reach their minimum values. Experimental results on four synthetic data sets and four real data sets demonstrate that the workable range of m is [2, 3.5] and the optimal interval is [2.5, 3].
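A minimal sketch of the m-selection procedure, pairing a textbook FCM implementation with the Xie-Beni (XB) index, one of the four CVIs named above; the random data, grid of m values, and cluster count are illustrative assumptions:

```python
# Select the FCM fuzziness parameter m by minimizing the Xie-Beni index.
import numpy as np

def fcm(X, c, m, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X))); U /= U.sum(axis=0)
    for _ in range(iters):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)              # centers
        D = np.linalg.norm(X[None] - V[:, None], axis=-1) + 1e-12
        U = D ** (-2 / (m - 1)); U /= U.sum(axis=0)             # memberships
    return U, V

def xie_beni(X, U, V, m):
    D2 = np.linalg.norm(X[None] - V[:, None], axis=-1) ** 2
    compact = ((U ** m) * D2).sum()                             # compactness
    sep = min(np.sum((V[i] - V[j]) ** 2)                        # separation
              for i in range(len(V)) for j in range(len(V)) if i != j)
    return compact / (len(X) * sep)

X = np.random.default_rng(1).random((200, 2))
best_m = min(np.arange(1.5, 4.01, 0.5),
             key=lambda m: xie_beni(X, *fcm(X, c=3, m=m), m))
```

The study applies the same minimize-the-index idea with VK, VT, and SC as well; the selected m is the grid value where the index is smallest.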

6.
Log-structured merge tree (LSM-tree)-based key-value (KV) stores are widely used in big-data applications and provide high performance. NAND flash-based solid-state disks (SSDs) have become a popular storage alternative to hard disk drives (HDDs) because of their high performance and low power consumption. LSM-tree KV stores are therefore deployed on SSDs in large-scale storage systems, aiming to achieve high performance in the cloud. In this paper, write amplification in LSM-tree KV stores and in the NAND flash memory of SSDs is denoted WA1 and WA2, respectively. The former, attributable to compaction operations in LSM-tree-based KV stores, burdens the I/O bandwidth between the host and the device. The latter, resulting from out-of-place updates in NAND flash memory, blocks user I/O requests between the host and NAND flash memory, thereby degrading SSD performance. Write amplification impairs overall system performance. In this study, we explore this two-level cascaded write amplification, represented as WA, in LSM-tree KV stores with SSDs. Our primary goal is to comprehensively study the two-level cascaded write amplification of host-side LSM-tree KV stores and device-side SSDs, and we quantitatively analyze its impact on overall performance. The cascaded write amplification is 16.44 (WA1 is 16.55; WA2 is 0.99) for SSD-I and 35.51 (WA1 is 16.6; WA2 is 2.14) for SSD-S with LevelDB's default setting under db_bench. Large cascaded write amplification in KV stores harms SSD performance and lifetime: the throughput of SSD-S and SSD-I under an 80%-write workload is approximately 0.28x and 0.31x of that under a 100%-write workload. It is therefore important to design approaches that balance the SSD-lifetime cost of cascaded write amplification against high performance under read-write-mixed workloads. We attempt to reveal the details of cascaded write amplification and hope that this study is useful for developers of LSM-tree-based KV stores and SSD software stacks.
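A minimal arithmetic sketch of the cascaded write amplification defined above, assuming WA is the product of the two levels (consistent with the reported figures up to rounding); the byte counts are illustrative:

```python
# Cascaded write amplification: WA1 (host/KV-store level) times WA2
# (device/flash level). Sample figures reuse the SSD-S numbers quoted above;
# the flash byte count is back-derived and therefore approximate.
def cascaded_wa(user_bytes, ssd_bytes, flash_bytes):
    wa1 = ssd_bytes / user_bytes      # compaction-induced amplification
    wa2 = flash_bytes / ssd_bytes     # GC-induced amplification
    return wa1, wa2, wa1 * wa2        # WA = WA1 * WA2

wa1, wa2, wa = cascaded_wa(user_bytes=1.0, ssd_bytes=16.6,
                           flash_bytes=16.6 * 2.14)
print(f"WA1={wa1:.2f} WA2={wa2:.2f} WA={wa:.2f}")  # ~16.60, 2.14, 35.52
```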

7.
Spatial indexing on flash-based Solid State Drives (SSDs) has become a core concern in spatial database applications and is carried out by flash-aware spatial indices. Although several flash-aware spatial indices have been proposed in the literature, they do not exploit all the benefits of SSDs, leading to losses in efficiency and durability. In this article, we propose eFIND, a new generic and efficient framework for flash-aware spatial indexing. eFIND takes into account the intrinsic characteristics of SSDs by employing (i) a write buffer to avoid expensive random writes, (ii) a flushing algorithm that smartly picks modifications to be flushed in batch to the SSD, (iii) a read buffer to decrease the overhead of random reads, (iv) a temporal control to avoid interleaved reads and writes, and (v) a log-structured approach to provide data durability. Performance tests showed the efficiency of eFIND: compared to the state of the art, eFIND improved the construction of spatial indices by 43% to 77%, and spatial query processing by 4% to 23%.
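A minimal, speculative sketch of components (i) and (ii): index modifications accumulate in a RAM write buffer, and the flushing step picks the index nodes with the most pending changes to write to the SSD in one batch. The class shape, capacities, and policy details are assumptions, not eFIND's actual design:

```python
# Hypothetical write buffer with a "most-modified nodes first" flush policy.
from collections import defaultdict

class WriteBuffer:
    def __init__(self, capacity=1024, batch=32):
        self.mods = defaultdict(list)   # node id -> pending modifications
        self.capacity, self.batch = capacity, batch

    def add(self, node_id, mod, flush_fn):
        self.mods[node_id].append(mod)
        if sum(len(v) for v in self.mods.values()) >= self.capacity:
            # flush the nodes with the most pending modifications in batch,
            # turning many random writes into one sequential batch write
            victims = sorted(self.mods, key=lambda n: len(self.mods[n]),
                             reverse=True)[:self.batch]
            flush_fn({n: self.mods.pop(n) for n in victims})
```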

8.
According to previous studies, the Poisson model and the negative binomial model cannot accurately estimate wafer yield, the mathematical models proposed in past years are very complicated, and neural network models cannot provide an explicit equation for managers to use. The novel contribution of this paper is therefore a new wafer yield model with a handy polynomial, constructed using the group method of data handling (GMDH). In addition to the defect cluster index (CIM), 12 critical electrical test parameters are considered simultaneously. Because GMDH should not be given too many input variables, principal component analysis (PCA) is used to reduce the 12 critical electrical test parameters to a manageable few dimensions without much loss of information. The proposed approach is validated on a case obtained from a DRAM company in Taiwan.
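A minimal sketch of the PCA step, compressing the 12 electrical test parameters to the few components that retain most of the variance before they feed the GMDH polynomial; the 90% variance target and the random stand-in data are illustrative assumptions:

```python
# PCA dimension reduction before GMDH model fitting (illustrative sketch).
import numpy as np

def pca_reduce(X, var_target=0.9):
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (s ** 2) / (s ** 2).sum()
    # smallest k whose cumulative explained variance reaches the target
    k = int(np.searchsorted(np.cumsum(explained), var_target)) + 1
    return Xc @ Vt[:k].T            # scores on the first k components

tests = np.random.default_rng(0).random((500, 12))  # stand-in for the 12 parameters
reduced = pca_reduce(tests)                          # inputs for the GMDH polynomial
```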

9.
The classification of observations into groups is a common procedure in modern research. When searching for homogeneous groups, however, it is difficult to decide whether a classification must be divided further to obtain the desired homogeneous groups. The presented method, combined cluster and discriminant analysis (CCDA), aims to facilitate this decision. CCDA consists of three main steps: (I) a basic grouping procedure; (II) a core cycle in which the goodness of preconceived and random classifications is determined; and (III) an evaluation step in which a decision is made regarding division into sub-groups. These steps were implemented in an R package named ccda. To demonstrate the applicability of the method, a case study on water quality samples from Neusiedler See is presented, in which CCDA classified the 33 original sampling locations into 17 homogeneous groups; this could provide a starting point for a later recalibration of the lake's monitoring network.

10.
Nowadays Video-On-Demand (VOD) caching systems are often equipped with hybrid storage devices, designed to combine the high read speed of Solid State Disks (SSDs) with the large capacity of Hard Disk Drives (HDDs). However, the number of erase cycles of SSDs is limited, so it is important to control the write load on SSDs in real applications. This paper proposes a Feedback-based Adaptive Data Migration (FADM) method, which uses real-time feedback of the SSD write load to adjust the rule for moving data between HDDs and SSDs. More specifically, a video in the HDDs may be moved into the SSDs when its popularity exceeds that of the least popular video in the SSDs by a threshold, and this threshold is adaptively adjusted according to the feedback of the SSD write load. With FADM, the desired lifetime of the SSDs can be guaranteed even under varying user behavior, while good read performance is maintained. Simulations demonstrate the effectiveness of FADM.
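A minimal sketch of the feedback rule described above: the migration threshold rises when the SSD write load exceeds its budget (migrating less, protecting lifetime) and falls otherwise (migrating more, improving hit rate). The gain and budget values are illustrative assumptions:

```python
# Feedback-based threshold adjustment for HDD->SSD migration (illustrative).
def adjust_threshold(threshold, write_load, write_budget, step=0.05):
    if write_load > write_budget:
        return threshold * (1 + step)          # migrate less: save erase cycles
    return max(threshold * (1 - step), 1e-6)   # migrate more: improve read hits

def should_migrate(video_pop, least_popular_ssd_pop, threshold):
    # move an HDD video into the SSDs only if it beats the coldest SSD video
    # by at least the current threshold
    return video_pop > least_popular_ssd_pop + threshold
```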

11.
This paper addresses cluster synchronization of coupled harmonic oscillators over directed fixed and switching topologies, where uncoupled harmonic oscillators in the same cluster have identical node dynamics, while any pair of nodes in different clusters are essentially distinct in their local dynamics. A pinning control protocol is proposed for such coupled harmonic oscillator systems, and graph topology conditions that are easy to verify are established for cluster synchronization. The main contributions of the present investigation are: (i) the cluster synchronization problem of coupled nonidentical harmonic oscillators is addressed over directed topologies; (ii) the desynchronizing motion between harmonic oscillators in different clusters depends not on negative coupling weights but on the leader of each cluster; (iii) the paper determines which harmonic oscillators ought to be pinned in order to reach cluster synchronization. Finally, numerical examples and simulations demonstrate the theoretical results.

12.
COCA: An Automatic Correlation Detection Method Based on the Entropy Correlation Coefficient
王珊, 曹巍, 覃雄派. 《计算机应用》 (Journal of Computer Applications), 2006, 26(9): 2005-2008
Automatic optimization of query plans in database self-management and self-tuning is a current research focus. To guarantee the estimation accuracy of the optimizer, a new statistical algorithm, COCA, is presented for automatically detecting correlations between columns based on the entropy correlation coefficient. The algorithm has the following features: (1) fewer restrictions: it has no frequency requirement like the chi-square test, which is reliable only when at least 80% of the cells in the contingency table have frequencies greater than 5; (2) richer results: the chi-square test (as used in CORDS) only judges whether two columns are correlated, whereas the new method computes the degree of correlation between columns in both directions. Experiments show that the new method is more robust and produces more statistical information, which supports more efficient and accurate histogram construction afterwards.
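A minimal sketch of the entropy correlation coefficient underlying COCA, using the common definition ECC(X, Y) = 2·I(X; Y) / (H(X) + H(Y)) estimated from a contingency table; the paper's exact estimator may differ:

```python
# Entropy correlation coefficient between two columns (illustrative sketch).
import numpy as np

def entropy_correlation(x, y):
    # contingency table with one bin per distinct value (assumes numeric codes)
    joint = np.histogram2d(x, y, bins=(len(set(x)), len(set(y))))[0]
    p = joint / joint.sum()
    px, py = p.sum(axis=1), p.sum(axis=0)

    def H(q):                          # Shannon entropy in bits
        q = q[q > 0]
        return -(q * np.log2(q)).sum()

    mi = H(px) + H(py) - H(p.ravel())  # mutual information I(X;Y)
    return 2 * mi / (H(px) + H(py))    # symmetric, ranges over [0, 1]
```

Unlike a chi-square verdict, the value itself grades the strength of the association, which is what lets COCA report a degree of correlation rather than a yes/no answer.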

13.
Downscaling techniques are used to obtain high-resolution climate projections for assessing the impacts of climate change at a regional scale. This study presents a statistical downscaling tool, SCADS, based on the stepwise cluster analysis method. SCADS uses a cluster tree to represent the complex relationship between large-scale atmospheric variables (predictors) and local surface variables (predictands). It can effectively deal with continuous and discrete variables, as well as nonlinear relations between predictors and predictands. By integrating ancillary functional modules for missing-data detection, correlation analysis, model calibration, and the graphing of cluster trees, SCADS is capable of rapidly developing downscaling scenarios for local weather variables under current and future climate forcing. An application of SCADS is demonstrated by obtaining 10 km daily mean temperature and monthly precipitation projections for Toronto, Canada for 2070-2099. Contemporary reanalysis data derived from NARR are used for model calibration (1981-1990) and validation (1991-2000). The validated cluster trees are then applied to generate future climate projections.

14.
Topical Web crawling is an established technique for domain-specific information retrieval. However, almost all conventional topical Web crawlers are built on classifiers, which require a lot of labeled training data that is very difficult to label manually. This paper presents a novel approach called clustering-based topical Web crawling, which retrieves information on a specific domain based on link context and does not require any labeled training data. To collect domain-specific content units, a bottom-up hierarchical clustering method is used, in which a new data structure, a linked list combined with a CFu-tree, stores the cluster label, feature vector, and content unit. Four metrics are presented for the clustering process. First, comparison variation (CV) is defined to judge whether the closest pair of clusters can be merged. Second, cluster impurity (CIP) evaluates the cluster error. The precision and recall of clustering are also presented to evaluate the accuracy and comprehensiveness of the whole clustering process. A link-context extraction technique is used to expand the feature vector of the anchor text, which greatly improves clustering accuracy. Experimental results show that the proposed method outperforms conventional focused Web crawlers in both harvest rate and target recall.

15.
Clustering properties of hierarchical self-organizing maps
A multilayer hierarchical self-organizing map (HSOM) is discussed as an unsupervised clustering method. The HSOM is shown to form arbitrarily complex clusters, in analogy with multilayer feedforward networks. In addition, the HSOM provides a natural measure for the distance of a point from a cluster that appropriately weighs all the points belonging to the cluster. Experiments with both artificial and real data demonstrate that the multilayer SOM forms clusters that match the desired classes better than direct SOMs, classical k-means, or Isodata algorithms do.

16.
In breast cancer studies, researchers often use clustering algorithms to investigate similarity and dissimilarity among different cancer cases, so the design of the clustering algorithm becomes a key factor in extracting intrinsic disease information. However, traditional algorithms do not simultaneously meet the multiple requirements that breast cancer objects impose. The Variable parameters, Variable densities, Variable weights, and Complicated Objects Clustering Algorithm (V3COCA) presented in this paper handles these problems well. V3COCA (1) accepts either no seed objects or a series of objects as input for disease research and computer-aided diagnosis; (2) proposes an automatic parameter-calculation strategy to create clusters with different densities; (3) recognizes noise and generates arbitrarily shaped clusters; and (4) defines a flexibly weighted distance for measuring the dissimilarity between two complicated medical objects, which emphasizes medically important attributes. Experimental results with 10,000 patient cases from the SEER database show that V3COCA not only meets the various requirements of complicated-object clustering but is also as efficient as traditional clustering algorithms.
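A minimal sketch of feature (4), the flexibly weighted distance: each attribute receives a weight so that medically important fields dominate the dissimilarity. The attributes and weights below are illustrative assumptions, not values from the paper:

```python
# Weighted dissimilarity between two medical records (illustrative sketch).
import numpy as np

def weighted_distance(a, b, weights):
    diffs = np.abs(np.asarray(a, float) - np.asarray(b, float))
    # normalized weighted Euclidean distance
    return np.sqrt((weights * diffs ** 2).sum() / weights.sum())

# e.g., emphasize tumor size and positive node count over patient age
w = np.array([3.0, 2.0, 0.5])
d = weighted_distance([2.1, 4, 61], [1.8, 1, 58], w)
```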

17.
The artificial potential field method's weakness in organizing formations has limited its application to swarm route planning. To address this, a swarm route-planning method based on a dual potential function is proposed: the first potential field forms a feasible path from the swarm to the target, and the second potential field forms the formation, thereby accomplishing swarm route planning. In addition, to remedy shortcomings of the artificial potential field method such as needless collision avoidance and the trap problem, a collision risk degree is introduced to determine the influence distance of obstacles as well as virtual obstacles, and an improved...
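A minimal sketch of the forces the two fields would exert: an attractive field toward the target (first level), a formation-keeping field toward each agent's assigned slot (second level), and the standard repulsive obstacle term whose influence distance the collision risk degree would tune. The gains are illustrative assumptions:

```python
# Dual potential-field forces for swarm route planning (illustrative sketch).
import numpy as np

def goal_force(pos, goal, k_att=1.0):
    return k_att * (goal - pos)            # first field: pull toward the target

def formation_force(pos, slot, k_form=2.0):
    return k_form * (slot - pos)           # second field: hold the formation slot

def repulsive_force(pos, obstacle, influence, k_rep=5.0):
    d = np.linalg.norm(pos - obstacle)
    if d >= influence or d == 0.0:         # outside the influence range: no force
        return np.zeros_like(pos)
    # standard APF repulsion, pushing the agent away from the obstacle
    return k_rep * (1/d - 1/influence) / d**2 * (pos - obstacle) / d
```

Summing the three forces per agent and integrating yields the swarm motion; the improved method would shrink `influence` when the collision risk degree is low, avoiding the needless avoidance maneuvers mentioned above.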

18.
To boost the performance of massive data processing, solid-state drives (SSDs) have been used as a cache in the Hadoop system. However, most existing SSD cache management algorithms are ignorant of the characteristics of upper-level applications. In this paper, we propose a novel SSD cache management algorithm called DSA, which exploits application-level data similarity to improve SSD cache performance in Hadoop. Our algorithm takes both temporal similarity and user similarity in querying behaviors into account. We evaluate the effectiveness of DSA in a small-scale Hadoop cluster, and our experimental results show that it achieves much better performance than other well-known algorithms (e.g., LRU, FIFO). We also point out the underlying tradeoff between cache performance and SSD deployment cost, and identify a number of key factors that affect SSD cache performance. Our findings provide useful guidelines on how to effectively integrate SSDs into Hadoop.
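A minimal, speculative sketch of a similarity-aware eviction policy in the spirit of DSA: cached blocks are scored by recency plus a similarity bonus derived from query behavior, and the lowest-scoring block is evicted. The scoring formula, weights, and interface are assumptions, not the paper's algorithm:

```python
# Hypothetical similarity-aware cache eviction (illustrative, not DSA itself).
class SimilarityCache:
    def __init__(self, capacity, alpha=0.5):
        self.capacity, self.alpha = capacity, alpha
        self.store = {}      # block id -> (last access tick, similarity score)
        self.tick = 0

    def access(self, block, similarity):
        """similarity in [0, 1]: how alike this block is to recent queries."""
        self.tick += 1
        if block not in self.store and len(self.store) >= self.capacity:
            # evict the block with the lowest combined recency+similarity score
            victim = min(self.store,
                         key=lambda b: (1 - self.alpha) * self.store[b][0]
                                       + self.alpha * self.store[b][1] * self.tick)
            del self.store[victim]
        self.store[block] = (self.tick, similarity)
```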

19.
Evolving NAND flash-based Solid State Drives (SSDs) keep getting denser and faster, and they are quickly becoming popular in a wide variety of applications. Flash-based SSDs are composed of dozens of non-volatile flash memories in a multi-channel, multi-way architecture. Due to the physical limits of flash, a Flash Translation Layer (FTL) is employed to manage the translation between host requests and flash operations. Among the FTL's many roles, mapping management is the key to SSD performance. This paper presents the tradeoffs of page-level FTL mapping granularity for meeting target SSD performance. The mapping management is designed with regard to the SSD architecture, such as its multi-channel and multi-way organization. Three mapping tradeoff issues are addressed: static versus dynamic mapping, mapping unit size, and caching. The simulation results show that the choice of page-level FTL mapping granularity has a decisive effect on SSD design, affecting not only performance but also resource management.
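A minimal sketch of the static side of the static-versus-dynamic tradeoff: a purely arithmetic page-level mapping that stripes logical pages across channels and ways so sequential accesses exploit the parallelism. The geometry constants are illustrative assumptions:

```python
# Static page-level address mapping across channels and ways (illustrative).
CHANNELS, WAYS = 8, 4

def static_map(lpn):
    channel = lpn % CHANNELS            # stripe over channels first
    way = (lpn // CHANNELS) % WAYS      # then over ways within a channel
    offset = lpn // (CHANNELS * WAYS)   # page index within the chip
    return channel, way, offset

# A dynamic scheme would instead maintain an lpn -> (channel, way, page)
# table updated on every write, trading mapping-table RAM (and caching
# pressure) for placement flexibility -- the caching issue noted above.
```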

20.
This paper evaluates the ability of small-footprint, multiple-return, pulsed airborne scanner data to classify tree genera hierarchically using stepwise cluster analysis. Leaf-on and leaf-off airborne scanner datasets obtained in the Washington Park Arboretum, Seattle, Washington, USA were used for tree genera classification. Parameters derived from the structure and intensity data of the leaf-on and leaf-off laser scanning datasets were compared to ground truth data. Relative height percentiles and simple crown shapes, using the ratio of crown length to width, were computed as the structure variables. Selected structure variables from the leaf-on dataset yielded a higher classification rate (74.9%) than those from the leaf-off dataset (50.2%) for distinguishing deciduous from coniferous genera using linear discriminant functions. Unsupervised stepwise cluster analysis was conducted to find groupings of similar genera at consecutive steps using the k-medoid algorithm. The three stepwise cluster analyses using different seasonal laser scanning datasets produced different outcomes, implying that genera may be grouped differently depending on the timing of the data collection. When the leaf-on and leaf-off LIDAR datasets were combined, the cluster analysis could separate the deciduous genera from the evergreen coniferous genera and could make further separations among the evergreen coniferous genera. When using the leaf-on LIDAR dataset only, the cluster analysis did not separate deciduous from evergreen genera. The overall results indicate the importance of the timing of laser scanner data acquisition for tree genera separation and suggest the potential of combining the two LIDAR datasets for improved classification.
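A minimal sketch of the structure variables named above, assuming an (x, y, z) point array per crown: relative height percentiles plus the crown length-to-width ratio. The percentile choices are illustrative:

```python
# Structure variables from a per-crown LIDAR point cloud (illustrative sketch).
import numpy as np

def crown_features(points, percentiles=(10, 25, 50, 75, 90)):
    z = points[:, 2]
    rel = (z - z.min()) / (z.max() - z.min() + 1e-12)    # relative heights
    height_pcts = np.percentile(rel, percentiles)        # height percentile profile
    length = z.max() - z.min()                           # crown length
    width = max(np.ptp(points[:, 0]), np.ptp(points[:, 1]))  # crown width
    return height_pcts, length / width                   # simple crown shape ratio
```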
