Robust Penalized Logistic Regression through Maximum Trimmed Likelihood Estimator

Penalized logistic regression is used to identify genetic markers in many high-dimensional settings, such as gene expression, genome-wide association (GWAS), and DNA methylation studies. However, outliers can arise from missed diagnoses or misdiagnoses of subjects, heterogeneity of samples, technical problems in experiments, or other sources, and they can strongly distort penalized logistic regression estimates. Few studies have examined the robustness of penalized methods when the response variable is categorical, as is common in medical research. This study proposes a robust LASSO-type penalized logistic regression based on the maximum trimmed likelihood (MTL-LASSO). A definition of the breakdown point (BDP) for penalized logistic regression is given, and the BDP property of the proposed method is proved. The estimator is computed with a modification of the FAST-LTS algorithm, and a reweighting step is added to improve performance while preserving robustness. A simulation study shows that the proposed method is resistant to outliers. A real dataset of gene expression profiles from multiple sclerosis patients and healthy controls is also analyzed. The outliers in the control group identified by the reweighted MTL-LASSO behave differently from the other controls, suggesting possible heterogeneity within the control group. A much better fit is obtained after these outliers are removed.
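To make the trimmed-likelihood idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the MTL-LASSO objective minimizes, over coefficients and over subsets H of h observations, the sum of per-observation negative log-likelihoods on H plus h·λ·||β||_1 with an unpenalized intercept. The FAST-LTS-style search replaces squared residuals with negative log-likelihoods in the concentration steps (C-steps). The helper names (`lasso_logistic`, `c_steps`, `mtl_lasso`), the trimming fraction `alpha=0.75`, the starting subsets of two observations per class, and the proximal-gradient (ISTA) solver are all hypothetical choices for illustration. The reweighting step is omitted.

```python
import numpy as np

# Hypothetical sketch of a trimmed-likelihood LASSO logistic fit (not the paper's code).

def neg_loglik(X, y, b0, b):
    """Per-observation negative log-likelihood of a logistic model."""
    eta = b0 + X @ b
    return np.logaddexp(0.0, eta) - y * eta  # log(1 + e^eta) - y*eta, overflow-safe

def lasso_logistic(X, y, lam, n_iter=300):
    """L1-penalized logistic fit by proximal gradient (ISTA), an assumed solver.
    Minimizes mean NLL + lam * ||b||_1; the intercept b0 is not penalized."""
    n, p = X.shape
    Xa = np.column_stack([np.ones(n), X])
    step = 4.0 * n / np.linalg.norm(Xa, 2) ** 2  # 1 / Lipschitz constant of the gradient
    b0, b = 0.0, np.zeros(p)
    for _ in range(n_iter):
        prob = 0.5 * (1.0 + np.tanh(0.5 * (b0 + X @ b)))  # numerically stable sigmoid
        r = prob - y
        b0 -= step * r.mean()
        z = b - step * (X.T @ r) / n
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding
    return b0, b

def c_steps(X, y, h, lam, subset, max_iter=20):
    """Concentration steps: refit on the subset, then keep the h best-fitted points."""
    for _ in range(max_iter):
        b0, b = lasso_logistic(X[subset], y[subset], lam)
        loss = neg_loglik(X, y, b0, b)
        new = np.sort(np.argsort(loss)[:h])
        if np.array_equal(new, subset):
            break  # subset is a fixed point: the C-steps have converged
        subset = new
    # Trimmed objective: sum of the h smallest NLLs + h * lam * ||b||_1
    obj = np.sort(loss)[:h].sum() + h * lam * np.abs(b).sum()
    return obj, b0, b, subset

def mtl_lasso(X, y, lam, alpha=0.75, n_starts=20, n_keep=5, seed=0):
    """FAST-LTS-style search: many cheap starts, then full C-steps on the best few."""
    rng = np.random.default_rng(seed)
    h = int(alpha * len(y))  # number of observations kept (assumed trimming fraction)
    idx0, idx1 = np.flatnonzero(y == 0), np.flatnonzero(y == 1)
    cands = []
    for _ in range(n_starts):
        # Hypothetical start: two observations drawn from each class
        start = np.concatenate([rng.choice(idx0, 2, replace=False),
                                rng.choice(idx1, 2, replace=False)])
        b0, b = lasso_logistic(X[start], y[start], lam)
        subset = np.sort(np.argsort(neg_loglik(X, y, b0, b))[:h])
        cands.append(c_steps(X, y, h, lam, subset, max_iter=2))  # two quick C-steps
    cands.sort(key=lambda c: c[0])
    refined = [c_steps(X, y, h, lam, c[3]) for c in cands[:n_keep]]
    return min(refined, key=lambda c: c[0])
```

Calling `mtl_lasso(X, y, lam=0.02)` on a 0/1 response returns the trimmed objective, intercept, coefficient vector, and indices of the h retained observations. Observations left out of the retained set are the candidate outliers that a reweighting step would then examine.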