With the proliferation of extremely high-dimensional data, feature selection algorithms have become indispensable components
of the learning process. Strangely, despite extensive work on the stability of learning algorithms, the stability of feature
selection algorithms has been relatively neglected. This study is an attempt to fill that gap by quantifying the sensitivity
of feature selection algorithms to variations in the training set. We assess the stability of feature selection algorithms
based on the stability of the feature preferences that they express in the form of weights-scores, ranks, or a selected feature
subset. We examine a number of measures to quantify the stability of feature preferences and propose an empirical way to estimate
them. We perform a series of experiments with several feature selection algorithms on a set of proteomics datasets. The experiments
allow us to explore the merits of each stability measure and create stability profiles of the feature selection algorithms.
Finally, we show how stability profiles can support the choice of a feature selection algorithm.
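The empirical estimation described above can be illustrated for the feature-subset case with a minimal sketch. It assumes the Tanimoto (Jaccard) similarity as the subset measure and random subsampling of the training set as the source of variation; both are illustrative choices rather than the study's confirmed protocol. The names `tanimoto`, `subset_stability`, `estimate_stability`, and the `select_features` callable are hypothetical.

```python
import random
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature subsets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(a | b)


def subset_stability(subsets):
    """Average similarity over all pairs of selected feature subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


def estimate_stability(data, select_features, n_runs=10, frac=0.9, seed=0):
    """Run a selector on random subsamples and score agreement of its outputs.

    `select_features` is a hypothetical callable that maps a list of
    training examples to an iterable of selected feature indices.
    """
    rng = random.Random(seed)
    k = max(1, int(frac * len(data)))
    subsets = [select_features(rng.sample(data, k)) for _ in range(n_runs)]
    return subset_stability(subsets)


# Hypothetical subsets returned by three runs of some selector.
runs = [{0, 1, 2}, {0, 1, 3}, {0, 1, 2}]
print(subset_stability(runs))
```

A value of 1 means every run selected the same features; values near 0 mean the selections barely overlap. Weight and rank outputs would need a different similarity, for example a correlation coefficient, in place of `tanimoto`.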
Alexandros Kalousis received the B.Sc. degree in computer science, in 1994, and the M.Sc. degree in advanced information systems, in 1997, both
from the University of Athens, Greece. He received the Ph.D. degree in meta-learning for classification algorithm selection
from the University of Geneva, Department of Computer Science, Geneva, in 2002. Since then he has been a Senior Researcher at the
same university. His research interests include relational learning with kernels and distances, stability of feature selection
algorithms, and feature extraction from spectral data.
Julien Prados is a Ph.D. student at the University of Geneva, Switzerland. In 1999 and 2001, he received the B.Sc. and M.Sc. degrees in
computer science from the University Joseph Fourier (Grenoble, France). After a year of work in industry, he joined the Geneva
Artificial Intelligence Laboratory, where he is working on bioinformatics and data mining tools for mass spectrometry data
analysis.
Melanie Hilario has a Ph.D. in computer science from the University of Paris VI and currently works at the University of Geneva’s Artificial
Intelligence Laboratory. She has initiated and participated in several European research projects on neuro-symbolic integration,
meta-learning, and biological text mining. She has served on the program committees of many conferences and workshops in machine
learning, data mining, and artificial intelligence. She is currently an Associate Editor of the International Journal on Artificial Intelligence Tools and a member of the Editorial Board of the Intelligent Data Analysis journal.
The reconfiguration of the home-work boundary that at-home telework entails has significant implications for gender issues and the use of ubiquitous information and communications technologies (ICTs). By presenting a Marx-inspired dialectical analysis of the family and home as both ‘haven and hell’, we develop a critique of proposed advantages for women home workers. Not only do we question the ability of ICTs to deliver the promises made on their behalf; we also show how this socio-technical innovation may in fact contribute to compounding the double burden of work associated with gender roles within the home. Contemporary critical understanding of the e-society should incorporate the influence of at-home teleworking because of its implications for the use of ubiquitous ICTs in the home environment, the shaping of work relations and its impact on gender issues. This increasing use of ICTs outside of the workplace is matched by the growing consensus within the European Union on the desirability of flexible working coupled with family-friendly policies. This paper explores some of the rhetoric and research surrounding the proposed benefits of at-home ‘telework’ and the likely cost-benefits, from an employee's perspective, in terms of increased freedom, reduced burden and ‘flexibility’.
We present a novel application of the Zobrist hashing method, known in the computer chess literature, to the simulation of diffusional phase transformations in metal alloys. A history of previously visited states can be easily maintained, allowing very fast lookup of energies and transition rates calculated earlier in the simulation. The method has been applied to the simulation of a Fe-1at.%Cu system, with simple potentials and a transition rate for diffusional events approximated from the difference in internal energy between trial states. In this simple model at a temperature of 1073 K we find that 61.2% of the states considered during the simulation have been seen previously, and that this proportion rises to 85.1% at 773 K and even to 99.9% at 373 K. Rapid recall of these states reduces the computational time taken for the same sequence of atom-vacancy exchange moves by a factor of 6.3 at 773 K, rising to over 100 at 373 K. We suggest that a similar speedup factor will be found using more sophisticated models of diffusion and that the method can, with small modifications, be applied to a wide range of kinetic Monte Carlo simulations of atomistic diffusion processes.
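The lookup idea in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the lattice is a flat list of site occupancies (0 = Fe, 1 = Cu, 2 = vacancy), the energy function is a toy stand-in, and the names `make_zobrist_table`, `full_hash`, `swap_hash`, `EnergyCache`, and `toy_energy` are hypothetical. Distinct states can in principle collide on a 64-bit key, which a production code would need to consider.

```python
import random


def make_zobrist_table(n_sites, n_species, seed=0):
    """One random 64-bit key per (site, species) pair."""
    rng = random.Random(seed)
    return [[rng.getrandbits(64) for _ in range(n_species)]
            for _ in range(n_sites)]


def full_hash(state, table):
    """Hash a whole configuration by XOR-ing the keys of all occupied sites."""
    h = 0
    for site, species in enumerate(state):
        h ^= table[site][species]
    return h


def swap_hash(h, state, i, j, table):
    """Hash after exchanging the occupants of sites i and j, in O(1).

    `state` is the configuration before the exchange.
    """
    si, sj = state[i], state[j]
    h ^= table[i][si] ^ table[j][sj]  # remove the old occupants
    h ^= table[i][sj] ^ table[j][si]  # add the exchanged occupants
    return h


class EnergyCache:
    """Remember energies of visited states, keyed by their Zobrist hash."""

    def __init__(self, energy_fn):
        self.energy_fn = energy_fn
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def energy(self, h, state):
        if h in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            self.cache[h] = self.energy_fn(state)
        return self.cache[h]


def toy_energy(state):
    """Toy stand-in for a real potential: count unlike neighbouring pairs."""
    return sum(1 for a, b in zip(state, state[1:]) if a != b)


table = make_zobrist_table(n_sites=5, n_species=3, seed=1)
state = [0, 1, 0, 2, 0]           # Fe, Cu, Fe, vacancy, Fe
h = full_hash(state, table)
cache = EnergyCache(toy_energy)
cache.energy(h, state)            # first visit: computed and stored
cache.energy(h, state)            # second visit: served from the cache
print(cache.hits, cache.misses)
```

Because XOR is its own inverse, an atom-vacancy exchange updates the key by touching only the two affected sites, so revisited states are recognised without rehashing the lattice.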
Structural and Multidisciplinary Optimization - Controlling the composition of individual voxels allows for a co-optimization of 3D-printed part properties such as color and mechanical ones. As a...
The leaf area index (LAI) product from the Moderate Resolution Imaging Spectroradiometer (MODIS) is important for monitoring and modelling global change and terrestrial dynamics at many scales. The algorithm relies on spectral reflectances and a six biome land cover classification. Evaluation of the specific behaviour and performance of the product for regions of the globe such as Australia is needed to assist with product refinement and validation. We made an assessment of Collection 4 of the MODIS LAI product using four approaches: (a) assessment against a continental scale Structural Classification of Australian Vegetation (SCAV); (b) assessment against a continental scale land use classification (LUC); (c) assessment against historical field-based measurement of LAI collected prior to the Terra Mission; and (d) direct comparison of MODIS LAI with coincident field measurements of LAI, mostly from hemispherical photography. The MODIS LAI product produced a wide variety of geographically and structurally specific temporal response profiles between different classes and even for sub-groups within classes of the SCAV. Historical and concurrent field measurements indicated that MODIS LAI was giving reasonable estimates for LAI for most cover types and land use types, but that major overestimation of LAI occurs in some eastern Australian open forests and woodlands. The six biome structural land cover classification showed some significant deviations in class allocation compared to the SCAV, particularly where grasslands are allocated to shrubland, savanna woodlands are allocated to shrubland, savanna and broadleaf forest, and open forests are allocated to savanna and broadleaf forest. The land cover and LAI products could benefit from some additional examination of Australian data addressing the structural representation of Eucalypt canopies in the “space of canopy realisation” for savanna and broadleaf forest classes.
The "E-Mail made in Germany" project was initiated in response to Edward Snowden's revelations. Its stated goal is to offer e-mail users in Germany a high standard of security and data protection. The product was promoted with a broad advertising campaign. At the center of the campaign was a 30-second TV spot addressing the need for secure e-mail communication. Critics object, however, that the spot conveys a false understanding of the security that "E-Mail made in Germany" provides. This criticism was investigated in a laboratory study.
We present an approach for the visualization and interactive analysis of dynamic graphs that contain a large number of time steps. A specific focus is put on the support of analyzing temporal aspects in the data. Central to our approach is a static, volumetric representation of the dynamic graph based on the concept of space-time cubes that we create by stacking the adjacency matrices of all time steps. The use of GPU-accelerated volume rendering techniques allows us to render this representation interactively. We identified four classes of analytics methods as being important for the analysis of large and complex graph data, which we discuss in detail: data views, aggregation and filtering, comparison, and evolution provenance. Implementations of the respective methods are presented in an integrated application, enabling interactive exploration and analysis of large graphs. We demonstrate the applicability, usefulness, and scalability of our approach by presenting two examples for analyzing dynamic graphs. Furthermore, we let visualization experts evaluate our analytics approach.
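The space-time cube described in this abstract can be illustrated with a small sketch. It assumes an undirected, unweighted graph given as one edge list per time step, and it uses plain nested lists rather than a GPU volume; the function names `space_time_cube` and `aggregate_over_time` are hypothetical.

```python
def space_time_cube(n_nodes, edge_lists):
    """Stack one adjacency matrix per time step into cube[t][i][j]."""
    cube = []
    for edges in edge_lists:
        adj = [[0] * n_nodes for _ in range(n_nodes)]
        for u, v in edges:
            adj[u][v] = 1
            adj[v][u] = 1  # undirected graph: mirror each edge (assumption)
        cube.append(adj)
    return cube


def aggregate_over_time(cube):
    """Collapse the time axis: count in how many steps each edge is present."""
    n = len(cube[0])
    return [[sum(layer[i][j] for layer in cube) for j in range(n)]
            for i in range(n)]


# Three hypothetical time steps of a three-node graph.
steps = [[(0, 1)], [(0, 1), (1, 2)], [(1, 2)]]
cube = space_time_cube(3, steps)
print(aggregate_over_time(cube))
```

Summing along the time axis is one simple instance of the aggregation methods the abstract mentions; slicing `cube[t]` recovers the graph at a single time step.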
Cardiovascular (CV) disease is the single most significant cause of morbidity and mortality worldwide. The emerging global impact of CV disease means that the goals of early diagnosis and a wider range of treatment options are now increasingly pertinent. As such, there is a greater need to understand the molecular mechanisms involved and potential targets for intervention. Mitochondrial function is important for physiological maintenance of the cell, and when this function is altered, the cell can begin to suffer. Given the broad range and significant impacts of the cellular processes regulated by the mitochondria, it becomes important to understand the roles of the proteins associated with this organelle. Proteomic investigations of the mitochondria are hampered by the intrinsic properties of the organelle, including hydrophobic mitochondrial membranes; high proportion of basic proteins (pI greater than 8.0); and the relative dynamic range issues of the mitochondria. For these reasons, many proteomic studies investigate the mitochondria as a discrete subproteome. Once this has been achieved, the alterations that result in functional changes with CV disease can be observed. Those alterations that lead to changes in mitochondrial function, signaling and morphology, which have significant implications for the cardiomyocyte in the development of CV disease, are discussed.
A drawback of structured prediction methods is that parameter estimation requires repeated inference, which is intractable for general structures. In this paper, we present an approximate training algorithm called piecewise training (PW) that divides the factors into tractable subgraphs, which we call pieces, that are trained independently. Piecewise training can be interpreted as approximating the exact likelihood using belief propagation, and different ways of making this interpretation yield different insights into the method. We also present an extension to piecewise training, called piecewise pseudolikelihood (PWPL), designed for when variables have large cardinality. On several real-world natural language processing tasks, piecewise training outperforms Besag’s pseudolikelihood and sometimes performs comparably to exact maximum likelihood. In addition, PWPL performs similarly to PW and outperforms standard pseudolikelihood, but is five to ten times more computationally efficient than batch maximum likelihood training.