1.
Orthogonal arrays of mixed levels have been found useful in setting up highly fractional factorial experiments. When the responses from such experiments are binary, logistic regression analysis is used to analyze such data. Usually, the maximum likelihood estimates of parameters under the logistic regression model need to be found by an iterative procedure in order to test the importance of factors. In this article, a simple approximate procedure is proposed and compared to the maximum likelihood method with an example.
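For readers unfamiliar with the "iterative procedure" mentioned in this abstract, the following is a minimal sketch of Newton-Raphson maximum likelihood fitting for a logistic regression with an intercept and a single predictor. It is not the approximate procedure the article proposes; the function name `fit_logistic` and the toy data are assumptions made for illustration only.

```python
import math


def fit_logistic(xs, ys, max_iter=25, tol=1e-10):
    """Fit P(y=1|x) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.

    Hypothetical helper for illustration; assumes the data are not
    perfectly separable, so that a finite maximum likelihood estimate exists.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of the information matrix
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system H * d = g for the Newton step d.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Toy binary responses (made up for this sketch).
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
```

Each pass recomputes the fitted probabilities and solves a small linear system, which is the repeated work that a one-step approximate procedure would try to avoid.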
2.
We propose a new approach, the forward functional testing (FFT) procedure, to cluster number selection for functional data clustering. We present a framework of subspace projected functional data clustering based on the functional multiplicative random-effects model, and propose to perform functional hypothesis tests on equivalence of cluster structures to identify the number of clusters. The aim is to find the maximum number of distinctive clusters while retaining significant differences between cluster structures. The null hypotheses comprise equalities between the cluster mean functions and between the sets of cluster eigenfunctions of the covariance kernels. Bootstrap resampling methods are developed to construct reference distributions of the derived test statistics. We compare several other cluster number selection criteria, extended from methods of multivariate data, with the proposed FFT procedure. The performance of the proposed approaches is examined by simulation studies, with applications to clustering gene expression profiles.
3.
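Typhoons are highly destructive weather systems, and accurate forecasts of typhoon track and intensity are key to disaster prevention and mitigation. In addition to climatological factors, typhoon persistence factors, and environmental background-field factors, this article considers how typhoon intensity evolves under the influence of land when a storm is near the coast, and introduces a new variable, the sea-land ratio. All typhoon samples from the western North Pacific for 2000 to 2014 are divided into ocean-basin samples and offshore samples, and their intensity evolution is studied at 12-, 24-, 36-, and 48-hour intervals. The study uses the 1°×1° FNL (Final Operational Global Analysis) global reanalysis data provided by the US National Centers for Environmental Prediction/National Center for Atmospheric Research, builds multivariate statistical regression models based on stepwise regression and on principal component analysis to predict typhoon intensity, and compares the performance of the two models. Taken together, the prediction results for deep-ocean-basin and offshore typhoons indicate that the offshore typhoon intensity forecasting method presented here has greater practical value for typhoon disaster prevention and mitigation than other studies in China and abroad.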
4.
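In industrial processes, many important variables cannot be measured online and are usually estimated by soft-sensing methods, of which principal component regression is a common one. Compared with principal components, factors are more general and better reflect the essential characteristics of the data. On this basis, a soft-sensing method based on a factor regression model is proposed. Factor analysis is first applied to routine process operating data to build a factor generation model and extract the factor information, and a factor regression model between the factors and the key variable is then built. In online use, the measurable variables are first substituted into the generation model to obtain the factor variables, and the factors are then substituted into the factor regression model to estimate the key variable. The method is applied to a chemical adsorption separation process, and the soft-sensing performance of the factor regression model is compared with that of the principal component regression model; the results show that the former outperforms the latter.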
5.
Functional PLS logit regression model (total citations: 1; self-citations: 0; citations by others: 1)
M. Escabias A.M. Aguilera M.J. Valderrama 《Computational statistics & data analysis》2007,51(10):4891-4902
Functional logistic regression has been developed to forecast a binary response variable from a functional predictor. To fit this model, it is usual to assume that the functional observations and the parameter function of the model belong to the same finite-dimensional space generated by a basis of functions. This assumption turns the functional model into a multiple logit model whose design matrix is the product of the matrix of basis coefficients of the sample paths and the matrix of inner products between basis functions. The likelihood estimation of the parameter function of this model is very inaccurate because of the strong dependence structure of the resulting design matrix (multicollinearity). To overcome this drawback, several approaches have been proposed that apply standard multivariate data analysis methods to the design matrix. The functional principal component logistic regression model is one such approach. As an alternative, a functional partial least squares logit regression model is proposed, whose covariates are a set of partial least squares components of the design matrix of the multiple logit model associated with the functional one.
6.
We present a nonparametric method to forecast a seasonal univariate time series, and propose four dynamic updating methods to improve point forecast accuracy. Our methods consider a seasonal univariate time series as a functional time series. We propose first to reduce the dimensionality by applying functional principal component analysis to the historical observations, and then to use univariate time series forecasting and functional principal component regression techniques. When data in the most recent year are partially observed, we improve point forecast accuracy by using dynamic updating methods. We also introduce a nonparametric approach to construct prediction intervals of updated forecasts, and compare the empirical coverage probability with an existing parametric method. Our approaches are data-driven and computationally fast, and hence feasible for real-time, high-frequency dynamic updating. The methods are demonstrated using monthly sea surface temperatures from 1950 to 2008.
7.
Forecasting analysis by using fuzzy grey regression model for solving limited time series data (total citations: 1; self-citations: 1; citations by others: 1)
Ruey-Chyn Tsaur 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2008,12(11):1105-1113
The grey model GM(1,1) is a popular forecasting method for limited time series data and has been successfully applied in management and engineering. However, the reliability and validity of the grey model GM(1,1) have never been discussed. First, when other causes are not considered with limited time series data, the forecasts of the grey model GM(1,1) are unreliable and provide insufficient information to a decision maker. Therefore, for the sake of reliability, fuzzy set theory was hybridized into the grey model GM(1,1). This resulted in the fuzzy grey regression model, which granulates a concept into a set with a membership function, thereby obtaining a possible interval extrapolation. Second, for a newly developed product or system, the data collected are limited and rather vague, so the grey model GM(1,1) is of no use for problems with vague or fuzzy input values. In this paper, the fuzzy grey regression model is verified to show its validity for crisp-input and fuzzy-input data with limited time series data. Finally, two examples of LCD TV demand are illustrated using the proposed models.
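As background for the abstract above, here is a minimal sketch of the classical crisp GM(1,1) grey model that the paper builds on. It does not implement the fuzzy grey regression extension described in the abstract. The function names `gm11_fit` and `gm11_forecast` and the sample series are assumptions made for illustration.

```python
import math


def gm11_fit(x0):
    """Estimate the GM(1,1) development coefficient a and grey input b.

    Hypothetical helper: fits x0[k] = -a * z[k] + b by ordinary least
    squares, where z is the mean of consecutive cumulative sums.
    """
    if len(x0) < 4:
        raise ValueError("GM(1,1) is usually applied to at least 4 points")
    # Accumulated generating operation (cumulative sum).
    x1 = []
    total = 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values: means of neighbouring cumulative sums.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = x0[1:]
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(zi * zi for zi in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    intercept = (sy - slope * sz) / m
    return -slope, intercept


def gm11_forecast(x0, steps=1):
    """Forecast the next `steps` values of the original series x0."""
    a, b = gm11_fit(x0)
    if abs(a) < 1e-12:
        raise ValueError("development coefficient is too close to zero")

    def x1_hat(i):
        # Time response of the whitened equation at 0-based index i.
        return (x0[0] - b / a) * math.exp(-a * i) + b / a

    n = len(x0)
    # Inverse accumulation: difference consecutive cumulative estimates.
    return [x1_hat(i) - x1_hat(i - 1) for i in range(n, n + steps)]


# Short growth series (made up for this sketch): 10% growth per step.
series = [100.0, 110.0, 121.0, 133.1, 146.41]
next_value = gm11_forecast(series, steps=1)[0]
```

For this geometric series the one-step forecast should land near the true continuation, 161.051, which illustrates why GM(1,1) is attractive when only a handful of observations is available.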
8.
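First, to predict consumer sentiment polarity from online review texts in the catering industry, two prediction models, Lasso-Logistic and Lasso-PCA, are built. By comparison, the Lasso-PCA model integrates more variable information and predicts the sentiment polarity of the texts better; however, it is weaker at interpreting variables, and when the explanatory variables are high-dimensional in particular, it is difficult to use the Lasso-PCA model to analyze how the explanatory variables affect the response variable. Second, further analysis of the variable selection results of the Lasso-Logistic model shows that signature dishes, service attitude, environment, and "minor shortcomings" are significant factors affecting consumer sentiment polarity.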
9.
Irina Arhipova Gundars Berzins Edgars Brekis Juris Binde Martins Opmanis Aldis Erglis Evija Ansonska 《Expert Systems》2020,37(5):e12530
Various studies demonstrate that data on mobile phone use are useful when analysing problems in the fields of human activity or population dynamics, including tourism, transportation planning, public administration, etc. However, one of the biggest challenges is related to the restrictions contained in the General Data Protection Regulation that force the use of statistics about mobile operator client activities instead of allowing the analysis of mobile operator data. Therefore, a data analytics approach that does not involve information on the mobility of particular persons was developed, providing economically relevant data on aggregate mobility while protecting personal data. The activity data aggregation was conducted at 15-min intervals in the area of each cellular base station; “activity” is defined as the number of outgoing and incoming calls and sent and received text messages (short message service or SMS) and, in some instances, as the count of unique users. The case study examines all of Latvia's municipalities, analysing the economic activity level in each municipality in comparison to the mobile phone activity in three periods: 2015–2016, 2017, and 2018. It was concluded that the economic activity in municipalities can be estimated, and positive dynamics of regional development have been detected. Such data and the data analytics method, which provides an understanding of how economic activities evolve in real time in particular locations and economic activity centres, can improve regional development planning and plan implementation. In order to assess which are the centres of economic activity in each municipality and its sphere of influence, the patterns of human commuting and fluctuations of internal activity on workdays and weekends/holidays in 2017–2018 were determined. 
In general, there is a shortage of reliable data on human commuting within Latvia and its specific regions; therefore, the method described here provides a practical tool for regional governments to keep track of strategy implementation and for strategic gap analysis.
10.
Interpreting measurement data to extract meaningful information for damage detection is a challenge for continuous monitoring of structures. This paper presents an evaluation of two model-free data interpretation methods that have previously been identified as attractive for applications in structural engineering: moving principal component analysis (MPCA) and robust regression analysis (RRA). The effects of three factors are evaluated: (a) sensor-damage location, (b) traffic loading intensity and (c) damage level, using two criteria: damage detectability and the time to damage detection. In addition, the effects of these three factors are studied for the first time in situations with and without removing seasonal variations through use of a moving average filter and an ideal low-pass filter. For this purpose, a parametric study is performed using a numerical model of a railway truss bridge. Results show that MPCA has higher damage detectability than RRA. On the other hand, RRA detects damage faster than MPCA. Seasonal variation removal reduces the time to damage detection of MPCA in some cases, while the benefits are consistently modest for RRA.
11.
Juan Antonio Cuesta-Albertos Ricardo Fraiman 《Computational statistics & data analysis》2007,51(10):4864-4877
A robust cluster procedure for functional data is introduced. It is based on the notion of impartial trimming. Existence and consistency results are obtained. Furthermore, a feasible algorithm is proposed and implemented in a real data example, where patterns of electrical power consumers are observed.
12.
A number of data models have been proposed in the literature to improve the usability of database systems. Most of these proposals have not been implemented. It is felt that a working implementation is a necessary prerequisite for a serious study of these proposals. So far, implementing data models has been a daunting task, requiring a large team. We report here on a simple implementation technique using the persistent algorithmic language PS-algol, which promises to make the task of implementing data models a manageable one. The feasibility of the approach is demonstrated by implementing an entity-based functional data model (extended functional data model) using PS-algol.
13.
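Simultaneous determination of homologues in calixarene analytical systems by the PCR method (total citations: 1; self-citations: 0; citations by others: 1)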
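Calixarenes are important supramolecular reagents. The absorption spectra of their homologues overlap severely, which makes direct simultaneous determination difficult. In this paper, principal component regression is applied to the direct simultaneous determination of each component in a mixture of three macromolecular homologues: p-tert-butylcalix[4]arene, p-tert-butylcalix[6]arene, and p-tert-butylcalix[8]arene. The recoveries range from 94% to 111%, indicating good accuracy; the procedure is simple and the computation fast, providing a new route for the simultaneous determination of homologue mixtures.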
14.
Pablo Martínez-Camblor Norberto Corral 《Computational statistics & data analysis》2011,55(12):3244-3256
Most of the traditional statistical methods are being adapted to the Functional Data Analysis (FDA) context. The repeated measures analysis which deals with the k-sample problem when the data are from the same subjects is investigated. Both the parametric and the nonparametric approaches are considered. Asymptotic, permutation and bootstrap approximations for the statistic distribution are developed. In order to explore the statistical power of the proposed methods in different scenarios, a Monte Carlo simulation study is carried out. The results suggest that the studied methodology can detect small differences between curves even with small sample sizes.
15.
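A study of the factors influencing the dipole moments of small organic molecules (total citations: 1; self-citations: 0; citations by others: 1)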
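An uneven distribution of charge within a molecule gives the molecule polarity. Molecular polarity in turn affects intermolecular interactions such as hydrogen bonding and van der Waals forces, and thereby the chemical properties of bulk substances, such as solubility, the n-octanol/water partition coefficient, and liquid chromatographic retention behavior. Chemical properties originate in the microscopic charge state of the molecule, and the dipole moment is an important physical quantity reflecting the charge distribution within a molecule, so estimating molecular dipole moments is important. In this work, 107 molecules of 8 structural types were selected, 929 molecular parameters were calculated for each molecule with the Dragon software, and multiple linear regression and principal component analysis were used to analyze the influence of the parameters on the dipole moment. The analysis shows that, compared with principal component analysis, stepwise multiple linear regression is the best modeling method. The model has good statistical results (R² = 0.878, SEC = 0.330, F = 65.697) and good predictive results (R² = 0.792, SEP = 0.665). The results indicate that the mean atomic electronegativity of the molecule, distance matrix parameters, molecular composition, symmetry of adjacent atoms, valence connectivity indices, and electrotopological indices of the molecule are the main factors influencing the dipole moment. The best regression model predicts the dipole moments of small organic molecules well and is of some help in studying the macroscopic chemical properties of substances.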
16.
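The aim is to build a quantitative model for predicting the activity of flavonoid compounds in inhibiting Plasmodium falciparum strains, and to identify the main factors affecting that activity. Thirty-eight structurally different flavonoid compounds were selected as the data set, and 220 molecular parameters of each compound were analyzed by multiple linear regression and principal component analysis to build the best prediction model. Comparison of the models built by different methods showed that backward selection with the logP parameter was the best method. The resulting model has good statistics (training set correlation coefficient R² = 0.81, standard error of estimation SEE = 0.27), and the results are also satisfactory when the test set data are applied to the model (test set correlation coefficient R² = 0.83, standard error of prediction SEP = 0.39), indicating good reliability and predictive ability. The logarithm of the lipid-water partition coefficient, logP, is an important parameter in the model. Building the model and identifying the influencing factors helps in screening and developing new flavonoid antimalarial drugs.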
17.
Handan Ankarali Camdeviren Ayse Canan Yazici Zeki Akkus Resul Bugdayci Mehmet Ali Sungur 《Expert systems with applications》2007,32(4):987-994
This study compares the logistic regression model with the classification tree method in determining the socio-demographic risk factors affecting the depression status of 1447 women in separate postpartum periods. Data from a prevalence study of postpartum depression were used to determine the risk factors. A cut-off value of 13 was used for the calculated postpartum depression scores. Social and demographic risk factors were identified with the classification tree and the logistic regression model. The optimal classification tree identified a total of six risk factors, whereas in the logistic regression model the effects of three of them were found to be significant. In addition, while the tree structure evaluates the relations among risk factors, the logistic regression model yields adjusted main effects of the risk factors. Although the classification success of the maximal tree was better than that of both the optimal tree and the logistic regression model, using that tree structure in practice is very difficult. The logistic regression model and the optimal tree had lower sensitivity, possibly because the numbers of individuals in the two groups were unequal and clinical risk factors were not considered in this study. By evaluating many risk factors together, the classification tree method gives more detailed diagnostic information than the logistic regression model. However, choosing correctly among the constructed tree structures is very important for improving the results and for obtaining information that can provide appropriate explanations.
18.
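To address the sparse data matrices that often arise when analyzing informal text from social media, sparse principal component analysis, a feature dimension-reduction method, is used on top of text analysis tools to analyze the cognitive-pragmatic features of informal words in real-world chat text and to describe the relationship between informal words and personality. A short-text topic model, a psychological distance questionnaire, and a Big Five personality questionnaire are used to measure personality and background variables; a computerized text analysis tool is used to count word frequencies in the instant-messaging chat text provided by the participants; and the Simplified Chinese version of the Linguistic Inquiry and Word Count dictionary, together with cognitive pragmatics, is used to characterize the informal-word dimensions obtained by sparse principal component analysis. For dimension reduction of informal words, sparse principal component analysis is more stable than principal component analysis in the number of factor loadings and also somewhat better in cumulative explained variance (24.54% > 23.40%). Among the 6 factors obtained, "subjective evaluation" is positively correlated with agreeableness (r = .16, p = .03 < 0.05), "casual socializing" is negatively correlated with agreeableness (r = -.16, p = .03 < 0.05), and "cognitive pleasure" is significantly positively correlated with gender (r = .43, p < 0.001). Sparse principal component analysis reduces the dimension of informal words well; moreover, when the informal-word dimensions of the Simplified Chinese Linguistic Inquiry and Word Count dictionary are compared with those obtained after dimension reduction, the two agree in their correlations with personality, and the latter reveals more information.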
19.
20.
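To improve the monitoring performance of principal component analysis on noisy processes, this paper combines the denoising capability of wavelet packet analysis with the ability of principal component analysis to extract correlations among variables, and proposes a wavelet packet principal component analysis method. An algorithm for process monitoring based on wavelet packet principal component analysis is given. On this basis, monitoring performance is simulated on the TE process. The results show that the wavelet packet principal component analysis method has good monitoring performance.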