Found 20 similar documents; search took 15 ms
1.
Recently, many methods have been proposed for microarray data analysis. One of the challenges for microarray applications is to select a proper number of the most relevant genes for data analysis. In this paper, we propose a novel hybrid method for feature selection in microarray data analysis. This method first uses a genetic algorithm with dynamic parameter setting (GADP) to generate a number of subsets of genes and to rank the genes according to their occurrence frequencies in the gene subsets. Then, it uses the χ² test of homogeneity to select a proper number of the top-ranked genes for data analysis. We use a support vector machine (SVM) to verify the efficiency of the selected genes. Six different microarray datasets are used to compare the performance of the GADP method with existing methods. The experimental results show that the GADP method outperforms the existing methods in terms of both the number of selected genes and the prediction accuracy.
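The frequency-ranking step in entry 1 can be sketched minimally as follows; the genetic-algorithm search, the χ² cutoff, and the SVM verification are omitted, and the gene names are purely illustrative:

```python
from collections import Counter

def rank_genes_by_frequency(subsets):
    """Rank genes by how often they occur across candidate subsets
    (e.g. subsets produced by repeated genetic-algorithm runs)."""
    counts = Counter(g for s in subsets for g in s)
    # Highest occurrence frequency first; ties broken alphabetically.
    return sorted(counts, key=lambda g: (-counts[g], g))

subsets = [{"TP53", "BRCA1", "EGFR"}, {"TP53", "EGFR"}, {"TP53", "KRAS"}]
print(rank_genes_by_frequency(subsets))
# ['TP53', 'EGFR', 'BRCA1', 'KRAS']
```

In the paper's method, a statistical test is then applied to decide how many of the top-ranked genes to keep; here one would simply slice the returned list.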
2.
Better understanding of the complex links between urban transportation, land use, air quality, and population exposure is needed to improve urban sustainability. A goal of this study was to develop an exposure modeling framework that integrates agent-based activity and travel simulation with air pollution modeling for Tampa, Florida. We aimed to characterize exposure and exposure inequality for traffic-related air pollution, and to investigate the impacts of high-resolution information on estimated exposure. To this end, we developed and applied a modeling framework that combines the DaySim activity-based travel demand model, the MATSim dynamic traffic assignment model, the MOVES mobile source emissions estimator, and the R-LINE dispersion model. The resulting spatiotemporal distributions of daily individual human activity and pollutant concentration were matched to analyze population and subgroup exposure to oxides of nitrogen (NOx) from passenger car travel for an average winter day in 2010. Four scenarios using data with different spatiotemporal resolutions were considered: a) high resolution for both activities and concentrations, b) low resolution for both activities and concentrations, c) high resolution for activities but low resolution for concentrations, and d) vice versa. For the high-resolution scenario, the mean daily population exposure concentration of NOx from passenger cars was 10.2 μg/m3; individual exposure concentrations ranged from 0.2 to 145 μg/m3. Subgroup mean exposure was higher than the population mean for individuals living below poverty (by ~16%), those with daily travel time over one hour (8%), adults aged 19–45 (7%), blacks (6%), Hispanics (4%), Asians (2%), combined other non-white races (2%), people from middle-income households (2%), and residents of urban areas (2%). The subgroup inequality index (a measure of disparity) largely increased with concentration up to the 90th percentile level for these groups. At higher levels, disparities increased sharply for individuals from below-poverty households, blacks, and Hispanics. Low-resolution simulation of both activities and concentrations decreased the exposure estimates by 10% on average, with differences ranging from eight times higher to ~90% lower.
3.
4.
5.
CRISP-DM is the standard process model for developing Data Mining projects. It defines the processes and tasks to be carried out in a Data Mining project. One of the tasks proposed by CRISP-DM is the cost estimation of the Data Mining project.
6.
Although existing works in the literature highlight the monitoring, characterization, and analysis of both air and noise pollution, they mainly focus on the two environmental pollutants independently. In this paper, we develop a system framework that includes sensing and allows the combined processing of air and noise samples to design micro-services. The few existing works that studied the combined effect of the two environmental stressors merely calculated correlation values without inferring further contextual information. In contrast, our work aims to draw further inferences about the demographic, traffic, and spatio-temporal aspects of a location, and thus identifies the context in which the samples are collected. To achieve this goal, a system framework, CoAN, is developed, under which we performed in-house data collection over an approx. 820 km trail covering an approx. 10 km road segment in Durgapur, a sub-urban city in India. We used a commercially available 'Flow' device and developed an Android-based application, 'AudREC', for air and noise sampling, respectively. An unsupervised K-means algorithm is used to segregate the combined samples into disjoint clusters for analysis. In addition, feature selection, model training, and cluster interpretation using the LIME model are performed to draw inferences about the sample data space. Several supervised models, including Decision Tree, Random Forest, Logistic Regression, SVM, and Kernel-SVM, are used for training the system. Results show that Logistic Regression performs best, achieving 99% accuracy. Furthermore, as a micro-service, a healthier-route recommendation system is designed to avoid pollution exposure by taking into account both air and noise pollution exposure volumes. A sample result shows that our recommended route yields almost 12% lower pollution exposure than all other available routes suggested by Google Maps for the same source and destination.
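The clustering step described in entry 6 can be sketched with a minimal K-means over (air, noise) sample pairs. This is a generic stand-in rather than the CoAN pipeline, and the sample values (PM concentration in µg/m³, noise in dB) are invented:

```python
import random

def kmeans(points, k, iters=25, seed=0):
    """Minimal 2-D K-means: partition (air, noise) sample pairs into k clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each sample to its nearest centroid (squared Euclidean distance).
            j = min(range(k),
                    key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2)
            clusters[j].append(p)
        # Recompute centroids; keep the old centroid if a cluster went empty.
        centers = [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
                   if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers, clusters

samples = [(12, 55), (14, 58), (80, 95), (85, 92)]  # invented (air, noise) pairs
centers, clusters = kmeans(samples, 2, seed=0)
print(sorted(len(c) for c in clusters))  # [2, 2]
```

In practice a library implementation (e.g. scikit-learn's KMeans) would be used; the point here is only the segregate-then-interpret workflow the abstract describes.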
7.
《Environmental Modelling & Software》2002,17(1):3-9
The improvement of the structure function method for retrieving aerosol optical depth (AOD) from SPOT HRV data, and its application in air quality monitoring, are highlighted in this paper. Generally speaking, estimation of the aerosol optical depth is affected by temporal changes of the surface canopy, observation geometry, and terrain effects when the contrast reduction method is applied to a multi-temporal satellite image set. To reduce the errors induced by such effects, the single-directional structure function is replaced by a multi-directional one, which describes the real characteristics of the surface structure more completely. Comparison of the results with in-situ observations shows a significant improvement in the accuracy of the retrieved AOD. Furthermore, due to the linear relationship between aerosol optical depth and the turbidity coefficient, satellite images can be employed for monitoring air quality. Application of the method is demonstrated with a case study of the northern Taiwan area.
8.
Al-Janabi, Samaher; Alkaim, Ayad; Al-Janabi, Ehab; Aljeboree, Aseel Mustafa M. 《Neural computing & applications》2021,33(21):14199-14229
Neural Computing and Applications - Upgrading health reality is the responsibility of all; it is necessary to design a smart system based on modern technologies to reduce the...
9.
Bekir Parlak 《Computational Intelligence》2023,39(5):900-926
Text classification (TC) is a crucial task in this century of high-volume text datasets. Feature selection (FS) is one of the most important stages in TC studies, and numerous FS methods have been recommended for TC in the literature. In the TC domain, filter-based FS methods are commonly used to select a more informative feature subset. Each method orders the features with a scoring system based on its own algorithm, and classification is then carried out using the top-N features. However, each method's feature order is distinct from the others: a method assigns high scores to the qualities its algorithm considers critical, but does not necessarily assign low scores to unimportant features. In this paper, we propose a novel filter-based FS method, the brilliant probabilistic feature selector (BPFS), to assign fair scores and select informative features. While the BPFS method selects unique features, it also aims to select sparse features by assigning them higher scores than common features. Extensive experimental studies with three effective classifiers, decision tree (DT), support vector machines (SVM), and multinomial naive Bayes (MNB), on four widely used datasets with different characteristics, Reuters-21578, 20Newsgroups, Enron1, and Polarity, demonstrate the success of the BPFS method. Feature dimensions of 20, 50, 100, 200, 500, and 1000 were used. The experimental results on these benchmark datasets show that the BPFS method is more successful than well-known and recent FS methods according to Micro-F1 and Macro-F1 scores.
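The score-then-take-top-N workflow common to the filter methods in entry 9 can be sketched as below. The scoring function here is plain document frequency, a simple stand-in: BPFS's probabilistic score is not reproduced from the paper, and the toy documents are invented:

```python
from collections import Counter

def top_n_features(documents, n):
    """Filter-style feature selection: score every term, keep the top-N.
    Here the score is document frequency (how many documents contain the
    term), standing in for any filter method's scoring system."""
    df = Counter()
    for doc in documents:
        df.update(set(doc))  # count each term at most once per document
    return [term for term, _ in df.most_common(n)]

docs = [["cheap", "viagra", "now"],
        ["meeting", "agenda", "now"],
        ["cheap", "deal"]]
print(sorted(top_n_features(docs, 2)))  # ['cheap', 'now']
```

Classification would then proceed on documents re-represented using only the selected terms; different scoring functions plugged into the same skeleton yield the different feature orders the abstract discusses.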
10.
《International Journal of Computer Mathematics》2012,89(10):2178-2198
In this paper, we propose a characteristic centred finite difference method on non-uniform grids to solve an air pollution model problem. Numerical solutions and error estimates for the air pollution concentration and its first-order spatial derivatives are obtained. The computational cost of the method is the same as that of the characteristic difference method based on linear interpolation, while the error order of the numerical solutions matches that of the characteristic difference method based on quadratic interpolation. Finally, numerical examples illustrate the feasibility and efficiency of the method.
11.
Yi Liu 《Pattern recognition》2006,39(7):1333-1345
In many pattern recognition applications, high-dimensional feature vectors impose a high computational cost as well as the risk of "overfitting". Feature selection addresses the dimensionality reduction problem by determining a subset of available features which is most essential for classification. This paper presents a novel feature selection method named filtered and supported sequential forward search (FS_SFS) in the context of support vector machines (SVM). In comparison with conventional wrapper methods that employ the SFS strategy, FS_SFS has two important properties that reduce the time of computation. First, it dynamically maintains a subset of samples for the training of SVM. Because not all the available samples participate in the training process, the computational cost to obtain a single SVM classifier is decreased. Second, a new criterion, which takes into consideration both the discriminant ability of individual features and the correlation between them, is proposed to effectively filter out nonessential features. As a result, the total amount of training is significantly reduced and the overfitting problem is alleviated. The proposed approach is tested on both synthetic and real data to demonstrate its effectiveness and efficiency.
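The SFS strategy that entry 11 builds on can be sketched as a greedy loop. The `evaluate` callback stands in for the SVM-based criterion (in a wrapper method it would be, e.g., cross-validated accuracy); the toy weights are invented for illustration:

```python
def sequential_forward_search(features, evaluate, target_size):
    """Wrapper-style sequential forward search: greedily add the feature
    that most improves the subset score, stopping early if nothing helps."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < target_size:
        best = max(remaining, key=lambda f: evaluate(selected + [f]))
        if evaluate(selected + [best]) <= evaluate(selected):
            break  # no candidate improves the score
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy criterion: each feature contributes an independent, known score.
weights = {"f1": 0.9, "f2": 0.5, "f3": 0.1}
score = lambda subset: sum(weights[f] for f in subset)
print(sequential_forward_search(weights, score, 2))  # ['f1', 'f2']
```

FS_SFS's two speed-ups (training on a maintained sample subset, and pre-filtering nonessential features) would slot into the `evaluate` step and the `remaining` pool, respectively.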
12.
MRCCA: A novel CCA based method and its application in feature extraction and fusion for matrix data
Multiset features extracted from the same pattern usually represent different characteristics of the data; meanwhile, matrices, or 2nd-order tensors, are a common form of data in real applications. Hence, how to extract multiset features from matrix data is an important research topic in pattern recognition. In this paper, by analyzing the relationship between CCA and 2D-CCA, a novel feature extraction method called multiple rank canonical correlation analysis (MRCCA) is proposed as an extension of 2D-CCA. Unlike CCA and 2D-CCA, MRCCA seeks k pairs of left transforms and k pairs of right transforms to maximize correlation. Besides, the multiset version of MRCCA, termed multiple rank multiset canonical correlation analysis (MRMCCA), is also developed. Experimental results on five real-world data sets demonstrate the viability of the formulation; they also show that the recognition rate of our method is higher than that of other methods while its computing time remains competitive.
13.
Multimedia Tools and Applications - We propose a novel method for 2D image compression-encryption whose quality is demonstrated through accurate 2D image reconstruction at higher compression...
14.
Model developments to assess different air pollution exposures within cities are still a key challenge in environmental epidemiology. Background air pollution is a persistent, low-level concentration component that is difficult to quantify and to which the population is chronically exposed. In this study, hourly time series of four key air pollutants were analysed using Hidden Markov Models to estimate the exposure to background pollution in Madrid from 2001 to 2017. Using these estimates, its spatial distribution was then analysed by combining the interpolation results of ordinary kriging and inverse distance weighting. The ratio of ambient to background pollution differs according to the pollutant studied but is estimated to be, on average, about six to one. This methodology is proposed not only to describe the temporal and spatial variability of this complex exposure, but also to serve as input to new modelling approaches for air pollution in urban areas.
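Of the two interpolators combined in entry 14, inverse distance weighting is the simpler and can be sketched directly (ordinary kriging is omitted here). The station coordinates and values are hypothetical:

```python
def idw(samples, x, y, power=2):
    """Inverse-distance-weighted estimate at (x, y) from (xi, yi, value)
    samples: nearer stations get proportionally larger weights."""
    num = den = 0.0
    for xi, yi, v in samples:
        d2 = (xi - x) ** 2 + (yi - y) ** 2
        if d2 == 0.0:
            return v  # query point coincides with a monitoring station
        w = d2 ** (-power / 2)
        num += w * v
        den += w
    return num / den

# Hypothetical background-pollution values at two stations (µg/m³).
stations = [(0.0, 0.0, 20.0), (2.0, 0.0, 40.0)]
print(idw(stations, 1.0, 0.0))  # equidistant from both stations: 30.0
```

A gridded surface like the one interpolated in the study would be produced by evaluating `idw` over a mesh of query points and then blending with the kriging surface.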
15.
An important and yet unsolved problem in unsupervised data clustering is how to determine the number of clusters. The proposed slope statistic is a non-parametric, data-driven approach for estimating the number of clusters in a dataset. This technique uses the output of any clustering algorithm and identifies the maximum number of groups that breaks down the structure of the dataset. Intensive Monte Carlo simulation studies show that the slope statistic outperforms (for the considered examples) some popular methods that have been proposed in the literature. Applications to graph clustering and to the iris and breast cancer datasets are shown.
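The general shape of such number-of-clusters criteria can be illustrated with a crude elbow heuristic; this is a generic stand-in and not the paper's slope statistic, and the dispersion values are invented:

```python
def pick_k_by_largest_drop(dispersion):
    """Given within-cluster dispersion for k = 1..K (a decreasing list,
    e.g. collected from repeated clustering runs), return the k whose
    relative improvement over k-1 is largest. A crude elbow heuristic."""
    drops = [(dispersion[i - 1] - dispersion[i]) / dispersion[i - 1]
             for i in range(1, len(dispersion))]
    return drops.index(max(drops)) + 2  # drops[0] is the k=1 -> k=2 step

print(pick_k_by_largest_drop([100.0, 35.0, 30.0, 28.0]))  # 2
```

Like the slope statistic, this consumes only the output of a clustering algorithm, so it is agnostic to which algorithm produced the partitions.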
16.
17.
S.P. Moustakidis 《Pattern recognition》2010,43(11):3712-3729
An efficient filter feature selection (FS) method is proposed in this paper, the SVM-FuzCoC approach, achieving a satisfactory trade-off between classification accuracy and dimensionality reduction. Additionally, the method has reasonably low computational requirements, even in high-dimensional feature spaces. To assess the quality of features, we introduce a local fuzzy evaluation measure with respect to patterns that embraces fuzzy membership degrees of every pattern in their classes. Accordingly, the above measure reveals the adequacy of data coverage provided by each feature. The required membership grades are determined via a novel fuzzy output kernel-based support vector machine, applied on single features. Based on a fuzzy complementary criterion (FuzCoC), the FS procedure iteratively selects features with maximum additional contribution in regard to the information content provided by previously selected features. This search strategy leads to small subsets of powerful and complementary features, alleviating the feature redundancy problem. We also devise different SVM-FuzCoC variants by employing seven other methods to derive fuzzy degrees from SVM outputs, based on probabilistic or fuzzy criteria. Our method is compared with a set of existing FS methods, in terms of performance capability, dimensionality reduction, and computational speed, via a comprehensive experimental setup, including synthetic and real-world datasets.
18.
Crowdsourcing is an emerging method for collecting labels for datasets. Although it is inexpensive, it faces the problem that label quality cannot be guaranteed; in particular, when objective factors cause crowd workers to perform poorly, the resulting labels become even less reliable. We therefore propose a method called the feature-augmentation-based crowdsourcing quality improvement method (FA-method). Its basic idea is as follows: first, experts annotate a small portion of the data; a model is then trained on the crowd-labelled dataset and used to predict labels for the expert set, and the predictions are added as a new feature of the expert dataset. A model trained on the feature-augmented expert set is then used for prediction, computing each instance's probability of being noise as well as an upper bound on the number of noisy instances, in order to filter out data with potentially noisy labels. Similarly, the feature-augmentation step is applied again to the filtered high-quality set to further correct noise. Validation on eight UCI datasets shows that, compared with existing crowdsourced-labelling methods that combine noise identification and correction, the proposed method performs well even when repeated labels are few or annotation quality is low.
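The feature-augmentation step of entry 18 can be sketched minimally. A tiny 1-NN classifier stands in for whatever model is trained on the crowd labels, and the one-dimensional data points are invented; the noise-probability filtering and correction stages are omitted:

```python
def one_nn(train_X, train_y):
    """Tiny 1-nearest-neighbour stand-in for the crowd-trained model."""
    def classify(x):
        j = min(range(len(train_X)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
        return train_y[j]
    return classify

def augment_expert_set(expert_X, crowd_X, crowd_y):
    """Append the crowd-trained model's prediction as a new feature of
    each expert-labelled instance (the expansion step of the FA-method)."""
    clf = one_nn(crowd_X, crowd_y)
    return [list(x) + [clf(x)] for x in expert_X]

crowd_X, crowd_y = [[0.0], [1.0]], [0, 1]
print(augment_expert_set([[0.1], [0.9]], crowd_X, crowd_y))
# [[0.1, 0], [0.9, 1]]
```

A model trained on these widened expert instances can then score each crowd-labelled instance's likelihood of being noise, as the abstract describes.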
19.
ZHU Qiang, CHEN XiuWan, FAN QiXiang, JIN HePing & LI JiRen (China Three Gorges Corporation, Yichang, China; School of Earth and Space Science, Peking University, Beijing, China; Institute of Water Resources and Hydropower Research, Beijing) 《Science China: Information Sciences (English Edition)》2011,(9)
Soil erosion by water is the most important land degradation problem worldwide. In this paper, a new procedure was developed to estimate the rainfall-runoff erosivity factor (R) based on Tropical Rainfall Measuring Mission (TRMM) satellite-estimated precipitation data, which consist of 3-h rainfall intensity data. In this method, R was calculated as the product of the maximum 180-min rainfall intensity and the rainfall energy. This procedure was applied to the Daling River basin in Liaoning Province, China; R in terms...
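The product formulation in entry 19 can be sketched as below. The per-millimetre energy constant and the unit conventions are placeholder assumptions for illustration, not values taken from the paper:

```python
def erosivity_r(intensity_3h, energy_per_mm=0.29):
    """R as the product of the maximum 180-min rainfall intensity and the
    total rainfall energy, computed from a series of 3-h intensities
    (mm/h). The energy model (a flat constant per mm of rain) is a
    placeholder, not the paper's energy equation."""
    i180 = max(intensity_3h)                    # peak 180-min intensity, mm/h
    depth = sum(i * 3.0 for i in intensity_3h)  # rainfall depth: mm/h over 3-h bins
    return i180 * energy_per_mm * depth

# One storm: three 3-h bins at 2, 10 and 4 mm/h (48 mm total, peak 10 mm/h).
print(erosivity_r([2.0, 10.0, 4.0]))
```

In the study this calculation would be repeated per storm event from the TRMM series and accumulated over the basin.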
20.
《Information & Management》2022,59(5):103231
Under rapid urbanization, cities are facing many societal challenges that impede sustainability. Big data analytics (BDA) gives cities unprecedented potential to address these issues. As BDA is still a new concept, there is limited knowledge on how to apply BDA in a sustainability context. Thus, this study investigates a case using BDA for sustainability, adopting the resource orchestration perspective. A process model is generated, which provides novel insights into three aspects: data resource orchestration, BDA capability development, and big data value creation. This study benefits both researchers and practitioners by contributing to theoretical developments as well as by providing practical insights.