Similar Documents
1.
The study of confusion data is a well established practice in psychology. Although many types of analytical approaches for confusion data are available, among the most common methods are the extraction of 1 or more subsets of stimuli, the partitioning of the complete stimulus set into distinct groups, and the ordering of the stimulus set. Although standard commercial software packages can sometimes facilitate these types of analyses, they are not guaranteed to produce optimal solutions. The authors present a MATLAB *.m file for preprocessing confusion matrices, which includes fitting of the similarity-choice model. Two additional MATLAB programs are available for optimally clustering stimuli on the basis of confusion data. The authors also developed programs for optimally ordering stimuli and extracting subsets of stimuli using information from confusion matrices. Together, these programs provide several pragmatic alternatives for the applied researcher when analyzing confusion data. Although the programs are described within the context of confusion data, they are also amenable to other types of proximity data. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
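In the similarity-choice model, the probability of responding j to stimulus i is P(j | i) = b_j * eta_ij / sum_k b_k * eta_ik, with response biases b and symmetric similarities eta (eta_ii = 1). A minimal Python sketch of one standard closed-form estimate of the similarities follows; it is not the authors' MATLAB file, and the matrix layout (rows = presented stimulus, columns = response) and function names are assumptions for illustration.

```python
import math


def row_proportions(counts):
    """Turn raw confusion counts (row = stimulus presented, column = response
    given; an assumed layout) into proportions p[i][j] = P(respond j | i)."""
    return [[c / sum(row) for c in row] for row in counts]


def scm_similarity(counts):
    """Closed-form similarity estimates for the similarity-choice model,
    eta[i][j] = sqrt(p_ij * p_ji / (p_ii * p_jj)).
    The bias terms b cancel in this ratio, leaving only similarity.
    Assumes every diagonal (correct-response) cell is non-zero."""
    p = row_proportions(counts)
    n = len(p)
    eta = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                eta[i][j] = math.sqrt(p[i][j] * p[j][i] / (p[i][i] * p[j][j]))
    return eta


def scm_distance(eta):
    """Map similarities to distances d_ij = -ln(eta_ij), one common way to feed
    them to clustering or ordering routines (eta = 0 gives an infinite distance)."""
    return [[math.inf if e == 0 else -math.log(e) for e in row] for row in eta]
```

The resulting symmetric distance matrix is the kind of proximity input that the clustering, ordering, and subset-extraction programs described above operate on.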

2.
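To address the "uniform effect" that arises when the classic K-means algorithm clusters imbalanced data, a clustering algorithm for imbalanced data based on nearest neighbors (CABON) is proposed. CABON first performs an initial clustering of the data objects and uses a defined pending set to identify the objects in the initial result whose cluster membership still needs to be verified. It then applies a dynamic adjustment mechanism to the pending set that uses the nearest-neighbor idea to reassign these objects: proceeding from the edge of the set toward its center, each pending object is placed in the cluster of its nearest neighbor. This yields the final clustering result and prevents the uniform effect from distorting it. CABON was compared experimentally with K-means, the imbalanced K-means clustering method with multiple centers (MC_IK), and coefficient of variation clustering for non-uniform data (CVCN) on artificial and real data sets. The results show that CABON effectively mitigates the uniform effect that K-means produces on imbalanced data, and that its clustering performance is clearly better than that of K-means, MC_IK, and CVCN.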
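The abstract does not define how the pending set is built or how "edge" and "center" are measured, so the following Python sketch fills those gaps with assumptions of our own: a point is pending if it is almost equidistant from its two nearest centroids, and "edge first" means farthest from the mean of the points still pending. It only illustrates the nearest-neighbor reassignment idea and is not the published CABON algorithm; the initial clustering (e.g. K-means) is assumed to have been run already.

```python
import math


def pending_points(points, centroids, ratio=0.8):
    """Hypothetical pending-set rule (an assumption, not from the paper): flag a
    point whose distance to its nearest centroid is at least `ratio` times the
    distance to the second nearest, i.e. a point near a cluster boundary."""
    pending = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, c) for c in centroids)
        if d[0] >= ratio * d[1]:
            pending.append(i)
    return pending


def reassign_pending(points, labels, pending):
    """Reassign pending points from the edge of the pending set toward its
    centre; each takes the label of its nearest settled neighbour and is then
    treated as settled itself. Assumes at least one point is settled."""
    labels = list(labels)
    pending = set(pending)
    settled = [i for i in range(len(points)) if i not in pending]
    dim = len(points[0])
    while pending:
        # centre of the points that are still pending (recomputed every step)
        centre = [sum(points[i][d] for i in pending) / len(pending) for d in range(dim)]
        # "edge first" (assumed meaning): the pending point farthest from that centre
        i = max(pending, key=lambda k: math.dist(points[k], centre))
        nearest = min(settled, key=lambda k: math.dist(points[k], points[i]))
        labels[i] = labels[nearest]
        pending.remove(i)
        settled.append(i)
    return labels
```

Because reassigned points immediately become settled, later pending points can inherit labels through them, which is one way to read the "dynamic adjustment" of the pending set.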

3.
The popular K-means clustering method, as implemented in 3 commercial software packages (SPSS, SYSTAT, and SAS), generally provides solutions that are only locally optimal for a given data set. Because none of these commercial implementations offers a reasonable mechanism for starting the K-means method from alternative starting points, separate routines that can be initialized randomly were written in the MATLAB (MathWorks, 1999) environment (these routines are provided at the end of the online version of this article in the PsycARTICLES database). Through the analysis of 2 empirical data sets and 810 simulated data sets, it is shown that the results provided by commercial packages are most likely only locally optimal. These results suggest the need for some strategy to study the local optima problem for a specific data set, or for methods that identify "good" starting values that might lead to the best possible solutions. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
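The random-restart idea can be sketched in Python as follows. This is not the article's MATLAB code; it is a minimal Lloyd-style K-means, seeded with randomly chosen data points, run many times so that the lowest within-cluster sum of squares (SSE) can be kept and the number of distinct SSE values (distinct local optima) inspected. Function names and defaults are illustrative.

```python
import math
import random


def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's k-means, seeded with k distinct data points chosen at
    random. Returns (sse, centroids, labels)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: a (possibly local) optimum
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return sse, centroids, labels


def kmeans_restarts(points, k, n_starts=50, seed=0):
    """Run k-means from n_starts random starts, keep the lowest-SSE solution,
    and report how many distinct SSE values (local optima) were reached."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_starts)]
    best = min(runs, key=lambda r: r[0])
    distinct = len({round(r[0], 9) for r in runs})
    return best, distinct
```

A `distinct` count greater than 1 on a real data set is a direct sign of the local-optimum problem the article describes.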

4.
A number of important applications require the clustering of binary data sets. Traditional nonhierarchical cluster analysis techniques, such as the popular K-means algorithm, can often be successfully applied to these data sets. However, the presence of masking variables in a data set can impede the ability of the K-means algorithm to recover the true cluster structure. The author presents a heuristic procedure that selects an appropriate subset from among the set of all candidate clustering variables. Specifically, this procedure attempts to select only those variables that contribute to the definition of true cluster structure while eliminating variables that can hide (or mask) that true structure. Experimental testing of the proposed variable-selection procedure reveals that it is extremely successful at accomplishing this goal. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

5.
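To improve on the single support vector machine (SVM) model for predicting furnace temperature in an intelligent control expert system for blast furnace ironmaking, a multiple-SVM model based on fuzzy C-means (FCM) clustering is proposed. The training set is first partitioned by fuzzy C-means clustering; an SVM is then trained on each cluster to build a corresponding sub-model, and each test sample is predicted by every sub-model. These predictions are combined in a weighted sum, using the membership degrees of the test sample's input in each cluster as weights, to give the final predicted value. Analysis of data collected online shows that the FCM-based multiple-SVM model improves prediction performance in several respects over the single SVM model, reaching a hit rate of 86% over 100 consecutive heats.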
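The membership-weighted combination described above can be sketched in Python as follows. The cluster centres are assumed to come from an FCM run already, and the sub-models are hypothetical placeholders standing in for the per-cluster SVMs (any function of the input works). The membership formula is the standard FCM one with fuzzifier m; the abstract does not state m, so m = 2 is only a common default.

```python
import math


def fcm_memberships(x, centres, m=2.0):
    """Fuzzy C-means membership of input x in each cluster,
    u_k = 1 / sum_j (d_k / d_j) ** (2 / (m - 1)), with fuzzifier m > 1."""
    d = [math.dist(x, v) for v in centres]
    hits = [dk == 0.0 for dk in d]
    if any(hits):  # x coincides with a centre: share membership among those centres
        n_hits = sum(hits)
        return [1.0 / n_hits if h else 0.0 for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** power for dj in d) for dk in d]


def combined_prediction(x, centres, submodels, m=2.0):
    """Weighted sum of the per-cluster sub-model predictions, with x's
    memberships as weights (they sum to 1, so this is a weighted mean).
    `submodels` are placeholders for the trained per-cluster SVMs."""
    u = fcm_memberships(x, centres, m)
    return sum(uk * model(x) for uk, model in zip(u, submodels))
```

A sample that lies deep inside one cluster is predicted almost entirely by that cluster's sub-model, while a sample between clusters blends several.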

6.
Proposes a random-effects regression model for analysis of clustered data. Unlike ordinary regression analysis of clustered data, random-effects regression models do not assume that each observation is independent but do assume that data within clusters are dependent to some degree. The degree of this dependency is estimated along with estimates of the usual model parameters, thus adjusting these effects for the dependency resulting from the clustering of the data. A maximum marginal likelihood solution is described, and available statistical software for the model is discussed. An analysis of a dataset in which students are clustered within classrooms and schools is used to illustrate features of random-effects regression analysis, relative to both individual-level analysis that ignores the clustering of the data, and classroom-level analysis that aggregates the individual data. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
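To make "the degree of this dependency" concrete, here is the simplest two-level case, a random-intercept model for student i in cluster j. The notation is ours, not necessarily the article's, and the article's example adds a further level (schools):

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_j + e_{ij},
  \qquad u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2), \\
\operatorname{Cov}(y_{ij}, y_{i'j} \mid x) &= \sigma_u^2 \quad (i \neq i'), \qquad
\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}, \\
\text{design effect} &= 1 + (n - 1)\rho \quad \text{(clusters of equal size } n\text{)}.
\end{aligned}
```

When rho > 0, an individual-level analysis that ignores clustering treats all observations as independent, so the sampling variance of an overall mean is understated by roughly the design effect; the random-effects model estimates sigma_u^2 alongside the betas and so accounts for this.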

7.
The purposes of this paper are to outline seven types of qualitative data analysis techniques, to present step-by-step guidance for conducting these analyses via a computer-assisted qualitative data analysis software program (i.e., NVivo9), and to present screenshots of the data analysis process. Specifically, the following seven analyses are presented: constant comparison analysis, classical content analysis, keyword-in-context, word count, domain analysis, taxonomic analysis, and componential analysis. It is our hope that providing a clear step-by-step process for conducting these analyses with NVivo9 will assist school psychology researchers in increasing the rigor of their qualitative data analysis procedures. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

8.
In many countries, the most widely used method for selecting and implementing signal timing plans is the time-of-day (TOD) method. In TOD mode, a few traffic patterns present in the historical volume data are identified and used to find the signal timing plans needed to achieve optimum intersection performance during the day. Traffic engineers usually determine TOD breakpoints by analyzing 1 or 2 days' worth of traffic data and relying on their engineering judgment. Current statistical methods, such as hierarchical and K-means clustering, can determine TOD breakpoints but introduce a large number of transitions. This paper proposes using the Z-scores of the traffic flow and time variables in K-means clustering to reduce the number of transitions. The optimum number of breakpoints is chosen with a microscopic simulation model using a set of performance measures. Using simulation and the K-means algorithm, five clusters were found to be optimum for a major arterial in Al-Khobar, Saudi Arabia. As an alternative to the simulation-based approach, a subtractive-algorithm-based K-means technique is introduced to determine the optimum number of TOD periods. Simulation showed that both approaches result in almost the same measure-of-effectiveness (MOE) values. The two proposed approaches seem promising for similar studies in other regions, and both can be extended to different types of roads. The paper also suggests a procedure for considering the cyclic nature of daily traffic in the clustering.
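A minimal Python sketch of the feature construction and of the transition count the paper aims to reduce follows. The abstract does not give the exact standardization or cyclic procedure, so the time unit (e.g. minutes since midnight) and the cyclic counting rule below are assumptions; the clustering itself can use any K-means routine on the returned feature vectors.

```python
import statistics


def zscore(values):
    """Standardise to zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def tod_features(times, volumes):
    """One (z(time), z(volume)) feature vector per interval. Standardising puts
    clock time on the same scale as flow, so intervals close in time of day
    tend to fall into the same cluster."""
    return list(zip(zscore(times), zscore(volumes)))


def count_transitions(labels, cyclic=False):
    """Number of timing-plan changes: consecutive intervals whose cluster labels
    differ. With cyclic=True the step from the last interval of the day back to
    the first also counts (an assumed reading of the cyclic treatment)."""
    changes = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    if cyclic and labels and labels[-1] != labels[0]:
        changes += 1
    return changes
```

Comparing `count_transitions` for clusterings built with and without the time feature shows directly whether adding time reduces plan switching.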

9.
McLachlan (2011) and Vermunt (2011) each provided thoughtful replies to our original article (Steinley & Brusco, 2011). This response serves to incorporate some of their comments while simultaneously clarifying our position. We argue that greater caution against overparameterization must be taken when assuming that clusters are highly elliptical in nature. Specifically, users of mixture model clustering techniques should be wary of overreliance on fit indices, and the importance of cross-validation is highlighted. Additionally, we note that K-means clustering is part of a larger family of discrete partitioning algorithms, many of which are designed to solve problems identical to those for which mixture modeling approaches are often touted. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

10.
Ss "reorganize the material so that the recalls differ in sequential properties from those of the original list." When categorized subgroups of words are presented in a random order and Ss in recall put together or cluster such categorized items, the procedure is called category clustering. Associative clustering occurs "when in their recalls the Ss put together in sequence the stimuli and their responses which had been separated at list presentation" (e.g., stimuli such as table and mountain and responses such as chair and hill were presented in random order in word lists to Ss for recall). Results of several investigations are discussed. "When sufficiently prominent, experimenter-provided associational and categorical relations between members of a word pair provide a basis for clustering in free recall alternative to the bases—associational or otherwise—the S will use to effect subjective organization or idiosyncratic pairing. Free recall can tell us something of the way verbal organization is set up but we are largely in the dark as to how this organization acts to bring related items together whatever the basis of their relationship." 14 figures. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

11.
This paper presents a two-step fuzzy clustering method for estimating haulers' travel time. The proposed method provides a generic tool that can be incorporated into models dedicated to estimating earthmoving production. The estimated travel time takes into account acceleration and deceleration in the transition zones. The method uses linear regression and fuzzy subtractive clustering. Seven factors influencing haulers' travel time were first identified, and their significance was then quantified using linear regression. The regression analysis was performed on 180 training cases, generated with commercially available software for different models of haulers. The data were generated randomly to represent a wide range of possible combinations of factors affecting haulers' travel time across different types of road segments. The training data were subsequently used to develop the proposed method. Unoptimized subtractive clustering, optimized Takagi–Sugeno zeroth-order subtractive clustering, and optimized Takagi–Sugeno first-order subtractive clustering were used to estimate haulers' travel time. Their performance was evaluated on 36 test cases, also generated randomly in the same manner as the training cases. The optimized Takagi–Sugeno first-order subtractive clustering model outperformed the other two and was therefore used in the proposed method. A numerical example demonstrates the use of the developed method and illustrates its accuracy.
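The abstract names subtractive clustering without defining it. For reference, a minimal Python sketch of the usual potential-based formulation (credited to Chiu): each point's potential measures how densely other points surround it, the highest-potential point becomes a cluster centre, and potentials near that centre are then reduced. The radius, the 1.5 squash factor, and the simplified stopping rule are typical illustrative choices, not the paper's settings, and the Takagi–Sugeno modelling and optimization steps are not shown.

```python
import math


def subtractive_clustering(points, ra=0.5, squash=1.5, reject=0.15):
    """Simplified subtractive clustering.
    points : data already scaled to comparable ranges (e.g. the unit hypercube)
    ra     : neighbourhood radius; rb = squash * ra is the penalty radius
    reject : stop once the best remaining potential < reject * first potential
    (the full accept/reject "grey zone" test is omitted for brevity)."""
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (squash * ra) ** 2
    # potential of each point: a density measure over all points
    pot = [sum(math.exp(-alpha * math.dist(p, q) ** 2) for q in points) for p in points]
    centres = []
    first = None
    for _ in range(len(points)):  # at most one new centre per point
        k = max(range(len(points)), key=lambda i: pot[i])
        pk = pot[k]
        if first is None:
            first = pk
        elif pk < reject * first:
            break
        centres.append(points[k])
        # subtract the new centre's influence so that points near it are
        # unlikely to be chosen as later centres
        pot = [pi - pk * math.exp(-beta * math.dist(p, points[k]) ** 2)
               for pi, p in zip(pot, points)]
    return centres
```

Unlike K-means, the number of centres is not fixed in advance; it follows from the radius and stopping threshold, which is why subtractive clustering is also used to suggest the number of clusters.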

12.
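For categorical data, a new intra-cluster similarity based on the concentration of attribute values (CONC) is defined from how concentrated the data objects are on each attribute value; it measures the similarity among the data objects within a cluster of a clustering result. A new inter-cluster dissimilarity based on the discrepancy of strength vectors (SVs) (DCRP) is defined from the degree of difference between the characteristic attribute values of different clusters; it measures the dissimilarity between two clusters. Based on CONC and DCRP, a new internal validity index for categorical data clustering, clustering validation based on concentration of attribute values (CVC), is proposed. It has three features: (1) when evaluating the intra-cluster similarity of each cluster, it relies not only on the characteristics of the objects within the cluster but also on information from the whole data set; (2) it evaluates the dissimilarity between two clusters from the differences in several characteristic attribute values, which ensures that no useful clustering information is lost while also removing the influence of noise; (3) it removes the influence of the number of data objects when evaluating intra-cluster similarity and inter-cluster dissimilarity. Experiments were conducted on data sets from the University of California, Irvine (UCI) Machine Learning Repository, comparing CVC with other internal indices: category utility (CU), categorical data clustering with subjective factors (CDCS), and the information-entropy-based internal index (IE). The effectiveness of each internal index was verified with the external index normalized mutual information (NMI). The experiments show that CVC evaluates clustering results more effectively than the other internal indices. Moreover, unlike NMI, CVC requires no information beyond the data set itself, which makes it more practical.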
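The CVC formulas are not given in the abstract, but NMI, the external index used to validate it, is standard. A minimal Python sketch comparing a clustering with reference class labels follows; the geometric-mean normalization is one of several in use and is an assumption here, since the abstract does not say which variant the authors used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a labelling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labellings of the same objects."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum((nij / n) * math.log(n * nij / (count_a[i] * count_b[j]))
               for (i, j), nij in joint.items())


def nmi(a, b):
    """NMI with the (assumed) geometric-mean normalisation I(A;B) / sqrt(H(A) H(B)).
    Convention here: two single-cluster labellings score 1, one alone scores 0."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0 or hb == 0:
        return 1.0 if ha == hb else 0.0
    return mutual_information(a, b) / math.sqrt(ha * hb)
```

NMI ignores how clusters are named, so a perfect clustering with permuted labels still scores 1; that reliance on reference labels is exactly what an internal index such as CVC avoids.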

13.
This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magidson, 2002). Focus is given to the multivariate normal distribution, and 9 separate decompositions (i.e., class structures) of the covariance matrix are investigated. To provide a link to the current literature, comparisons are made with K-means clustering in 3 detailed Monte Carlo studies. The findings have implications for applied researchers in that mixture-model clustering techniques performed best when the covariance structure and number of clusters were known. However, as the information about the shape and number of clusters became unknown, degraded performance was observed for both K-means clustering and mixture-model clustering. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

14.
Wagenmakers, Wetzels, Borsboom, and van der Maas (2011) argued that psychologists should replace the familiar “frequentist” statistical analyses of their data with Bayesian analyses. To illustrate their argument, they reanalyzed a set of psi experiments published recently in this journal by Bem (2011), maintaining that, contrary to his conclusion, his data do not yield evidence in favor of the psi hypothesis. We argue that they have incorrectly selected an unrealistic prior distribution for their analysis and that a Bayesian analysis using a more reasonable distribution yields strong evidence in favor of the psi hypothesis. More generally, we argue that there are advantages to Bayesian analyses that merit their increased use in the future. However, as Wagenmakers et al.'s analysis inadvertently revealed, they contain hidden traps that must be better understood before being more widely substituted for the familiar frequentist analyses currently employed by most research psychologists. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

15.
Infrastructure systems of many U.S. cities are in poor condition, with many assets reaching the end of their service life and requiring significant capital investments. One primary requirement for optimizing the allocation of investments in such systems is an effective assessment of the physical condition of assets. This paper addresses the physical condition assessment of drinking water distribution systems by analyzing pipe breakage data as the main source of evidence about the current physical condition of water distribution pipes over space. From this spatial perspective, the primary questions are whether data sets present unexpected clustering of pipe breaks and, if such clusters exist, where they are located. This paper presents a novel approach that aims to detect and locate clusters of break points in a water distribution network. The proposed approach extends existing spatial scan statistic approaches, which are commonly used to detect disease outbreaks in a two-dimensional spatial framework, to data collected from networked infrastructure systems. The approach is described and tested on a data set of 491 breaks that occurred over six years in a 160-mi water distribution network. The results indicate that the adapted spatial scan statistic approach applied to points in physical networks is able to detect clusters of noncompact shapes, and that these clusters show significantly higher-than-expected breakage rates even after accounting for pipe age and diameter. Several possible hypotheses are explored for potential causes of these clusters.
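The core of a spatial scan statistic is a likelihood-ratio score computed for every candidate zone. A minimal Python sketch of the commonly used Poisson form (Kulldorff's) follows, with expected breaks taken as proportional to pipe length. How candidate zones are grown along the network, the adjustment for pipe age and diameter, and the Monte Carlo significance test are not shown; the segment ids and data structures are hypothetical.

```python
import math


def poisson_llr(c, e, total):
    """Poisson log-likelihood ratio for a zone with c observed and e expected
    events out of `total`; only excess risk (c > e) gets a positive score."""
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    outside = 0.0 if c == total else (total - c) * math.log((total - c) / (total - e))
    return inside + outside


def best_zone(breaks, length, zones):
    """breaks / length: dicts keyed by (hypothetical) pipe-segment id.
    zones: candidate lists of connected segment ids (their construction along
    the network is assumed, not shown). Expected breaks scale with length."""
    total_breaks = sum(breaks.values())
    total_length = sum(length.values())
    scored = []
    for zone in zones:
        c = sum(breaks[s] for s in zone)
        e = total_breaks * sum(length[s] for s in zone) / total_length
        scored.append((poisson_llr(c, e, total_breaks), zone))
    return max(scored, key=lambda t: t[0])
```

Because zones are sets of connected segments rather than circles, the same score can flag the noncompact, pipe-following clusters that the abstract reports.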

16.
This article discusses the use of cluster analysis in family psychology research. It provides an overview of potential clustering methods, the steps involved in cluster analysis, hierarchical and nonhierarchical clustering methods, and validation and interpretation of cluster solutions. The article also reviews 5 uses of clustering in family psychology research: (a) deriving family types, (b) studying families over time, (c) as an interface between qualitative and quantitative methods, (d) as an alternative to multivariate interactions in linear models, and (e) as a data reduction technique for small samples. The article concludes with some cautions for using clustering in family psychology research. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

17.
Wilson and Kihlstrom (1986) reported that hypnotic amnesia was not associated with reduced clustering when subjects had initially learned a long (16-item) list of categorized words. However, Wilson and Kihlstrom inappropriately substituted hypnotic susceptibility for level of amnesia in their analyses and, consequently, they failed to test for an association between reduced clustering and amnesia. Furthermore, supplementary correlational findings provided by Wilson and Kihlstrom strongly suggest that a disorganized clustering effect occurred but went unrecognized in their data. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

18.
Lu Jie, Yan Bingji, Zhao Wei, Li Peng, Chen Dong, Guo Hongwei. Chinese Journal of Engineering (工程科学学报), 2022, 44(12): 2081-2089
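The operating furnace profile of a blast furnace is closely related to furnace operation and to its technical and economic indicators; a reasonable operating profile helps ensure high-quality, low-consumption, high-output, and long-campaign production. Cluster analysis of cooling-stave temperatures can effectively and reasonably characterize changes in the operating profile and provides important guidance for blast furnace production. K-Means and TwoStep were each used to cluster the data set. Based on the principles of the two algorithms, the clustering results were evaluated with the Davies-Bouldin index (DBI) and the Dunn index (DI), and the differences between the algorithms were analyzed. The study concludes that, for the selected sample data and data features, K-Means produces better clustering results. This work can serve as a useful reference for choosing clustering algorithms in big-data analysis of blast furnace ironmaking.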
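The two evaluation indices named above can be sketched in Python as follows. This assumes Euclidean distance and at least two clusters; both indices have several published variants (e.g. how cluster scatter or diameter is measured), and the study's exact settings are not given, so the definitions below are common textbook forms rather than the authors' implementation.

```python
import math
from itertools import combinations


def _clusters(points, labels):
    """Group points by label (dict preserves first-seen label order)."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return list(groups.values())


def _centroid(members):
    return [sum(col) / len(members) for col in zip(*members)]


def davies_bouldin(points, labels):
    """DBI, lower is better: mean over clusters of the worst-case ratio
    (scatter_i + scatter_j) / distance(centroid_i, centroid_j), where scatter is
    the mean distance of a cluster's points to its centroid."""
    groups = _clusters(points, labels)
    cents = [_centroid(g) for g in groups]
    scatter = [sum(math.dist(p, c) for p in g) / len(g) for g, c in zip(groups, cents)]
    k = len(groups)
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k


def dunn(points, labels):
    """Dunn index, higher is better: smallest distance between points in
    different clusters divided by the largest within-cluster diameter."""
    groups = _clusters(points, labels)
    separation = min(math.dist(p, q) for a, b in combinations(groups, 2)
                     for p in a for q in b)
    diameter = max((math.dist(p, q) for g in groups for p, q in combinations(g, 2)),
                   default=0.0)
    return math.inf if diameter == 0 else separation / diameter
```

Since DBI rewards low values and the Dunn index high values, a clustering judged "better" should ideally move both indices in their favorable directions, as in the K-Means versus TwoStep comparison.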

19.
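Treating geochemical sampling points as data objects and the 16 measured elements as their attributes, the sampling points were grouped using cluster analysis, a data mining technique, to study and analyze the distribution of geochemical elements in the survey area. The results show that the clusters correspond clearly to stratigraphic lithology and effectively reflect the distribution of geochemical elements in different geological units.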

20.
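The algorithms and application of an online computer system for sorting steel pipes by material grade are introduced. Using production data, the sorting results of statistical analysis and of fuzzy diagnosis based on Kohonen neural network clustering are analyzed and compared, and the issues involved in intelligent sorting methods and their application prospects are discussed.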


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417