1. Scalable Variable Selection for Two-View Learning Tasks Using Projection Operator (arXiv)
Authors: Sandor Szedmak, Riikka Huusari, Tat Hon Duong Le, Juho Rousu
Abstract: In this paper, we propose a new variable selection method for two-view settings, or vector-valued supervised learning problems. Our framework can handle very large selection tasks, potentially with millions of data samples. Briefly, our method performs variable selection by iteratively selecting variables that are highly correlated with the output variables but uncorrelated with previously selected variables. To measure correlation, our method uses projection operators and their algebra. Projection operators also allow for non-linear correlation models, since the relationship (the correlation) between the sets of input and output variables can also be expressed by kernel functions. We experimentally validate our approach and demonstrate its scalability and the relevance of the selected features on both synthetic and real data.
Keywords: supervised variable selection, vector-valued learning, projection-valued measure, reproducing kernel Hilbert space
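The greedy idea summarized in the abstract above (pick a variable that aligns well with the outputs, then discount whatever the picked variables already explain) can be illustrated with a small linear sketch. This is not the authors' algorithm: the function name `greedy_projection_selection`, the scoring rule, and the toy data are assumptions made for illustration, and the kernel (non-linear) variant is not covered.

```python
import numpy as np

def greedy_projection_selection(X, Y, n_select, tol=1e-10):
    """Illustrative linear sketch (hypothetical helper, not the paper's method).

    Repeatedly picks the input column whose not-yet-explained part lies
    most strongly in the span of the outputs, then projects all columns
    onto the orthogonal complement of the picked direction.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]
    # Center both views so inner products behave like covariances.
    R = X - X.mean(axis=0)                      # residual inputs, deflated per pick
    Q, _ = np.linalg.qr(Y - Y.mean(axis=0))     # orthonormal basis for span(Y)
    col_norm2 = np.maximum(np.sum(R ** 2, axis=0), tol)  # original column energies
    available = np.ones(X.shape[1], dtype=bool)
    selected = []
    for _ in range(min(n_select, X.shape[1])):
        # Share of each original column that is both unexplained by the
        # selected variables and contained in span(Y).
        scores = np.sum((Q.T @ R) ** 2, axis=0) / col_norm2
        scores[~available] = -np.inf
        j = int(np.argmax(scores))
        r_norm = np.linalg.norm(R[:, j])
        if r_norm < tol:                        # nothing informative is left
            break
        selected.append(j)
        available[j] = False
        u = R[:, j] / r_norm
        R = R - np.outer(u, u @ R)              # remove the picked direction
    return selected

# Toy data (assumed): column 2 nearly duplicates column 0,
# and the outputs depend on columns 0 and 1 only.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 6))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=n)
Y = X[:, [0, 1]] + 0.1 * rng.normal(size=(n, 2))
selected = greedy_projection_selection(X, Y, n_select=2)
print(selected)
```

Dividing by the original column energy, rather than the residual energy, is a design choice of this sketch: a column that is almost fully explained by earlier picks keeps only a tiny residual, so its score collapses and near-duplicates are skipped.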
2. Bayesian variable selection using informed reversible jumps in imaging genetics: application to schizophrenia (arXiv)
Authors: Zidenou Moncho, Diane Zuanetti, Thierry Checuo, Luis Milan
Abstract: Modern attempts to provide risk prediction for complex diseases such as schizophrenia integrate genetic and brain information in what is known as imaging genetics. This study associates the presence of schizophrenia with genetic and imaging features and proposes inference and prediction methods to estimate the risk for new individuals. Given functional magnetic resonance imaging and single nucleotide polymorphism information from healthy individuals and individuals diagnosed with schizophrenia, a Bayesian probit model is used to select discriminating variables and to estimate predictive risk. The most promising models are then combined using a Bayesian model averaging scheme. For these purposes, we propose an informed reversible jump Markov chain Monte Carlo method, called data-driven or informed reversible jump. It scales to high-dimensional data where the number of covariates is much larger than the sample size.
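As a rough illustration of the model averaging step mentioned in the abstract above, the sketch below combines the predicted risks of a few probit models, weighting each by its posterior model probability. Everything concrete here is an assumption for illustration: the helper names `probit_prob` and `bma_risk`, the variable subsets, the coefficients, and the weights are invented, and a full analysis would average over posterior draws of the coefficients rather than plug in point values.

```python
import math

def probit_prob(x, beta, variables):
    """P(y = 1 | x) under a probit model that uses only the given variable indices."""
    z = sum(b * x[j] for b, j in zip(beta, variables))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z

def bma_risk(x, models):
    """Average per-model risks, weighted by normalized posterior model weights."""
    total = sum(m["weight"] for m in models)
    return sum(
        m["weight"] / total * probit_prob(x, m["beta"], m["variables"])
        for m in models
    )

# Hypothetical models, e.g. the most frequently visited ones in a sampler run.
models = [
    {"variables": [0, 2], "beta": [0.8, -0.5], "weight": 0.6},
    {"variables": [0], "beta": [1.0], "weight": 0.3},
    {"variables": [1, 2], "beta": [0.4, -0.2], "weight": 0.1},
]
x_new = [1.0, 0.0, 2.0]  # covariates of a new individual (made up)
risk = bma_risk(x_new, models)
print(round(risk, 3))
```

In a reversible jump sampler, the weights could be approximated by how often each model is visited, which is why the sketch normalizes them instead of assuming they sum to one.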
