Browsing by Author "Chipman, Hugh A."
- An adaptive method for statistical detection with applications to drug discovery (2003)
  Zhu, Mu; Chipman, Hugh A.; Su, Wanhua
  Researchers have tried to tackle various statistical detection problems using state-of-the-art classification techniques but are often disappointed at the results. The reason is two-fold. First of all, as classification problems, these statistical detection problems are heavily unbalanced: the class of interest is rare in the training data; an overwhelming majority of the training data belong to what can be called a background class. A primary example is drug discovery, where most of the chemical compounds in the data set are inactive whereas the goal is to detect a small number of active compounds. Secondly, the goal of statistical detection is fundamentally different from that of classification, making misclassification rate the wrong criterion to focus on. In this article, we develop an adaptive method for statistical detection and demonstrate that it can be an effective tool for drug discovery.
- LAGO: a computationally efficient approach for statistical detection (2006)
  Su, Wanhua; Zhu, Mu; Chipman, Hugh A.
  We study a general class of statistical detection problems where the underlying objective is to detect items belonging to a rare class from a very large database. We propose a computationally efficient method to achieve this goal. Our method consists of two steps. In the first step we estimate the density function of the rare class alone with an adaptive bandwidth kernel density estimator. The adaptive choice of the bandwidth is inspired by the ancient Chinese board game known today as Go. In the second step we adjust this density locally depending on the density of the background class nearby. We show that the amount of adjustment needed in the second step is approximately equal to the adaptive bandwidth from the first step, which gives us additional computational savings. We name the resulting method LAGO, for “locally adjusted Go-kernel density estimator.” We then apply LAGO to a real drug discovery dataset and compare its performance with a number of existing and popular methods.
- Pseudo-likelihood inference underestimates model uncertainty: evidence from Bayesian nearest neighbours (2011)
  Su, Wanhua; Chipman, Hugh A.; Zhu, Mu
  When using the K-nearest neighbours (KNN) method, one often ignores the uncertainty in the choice of K. To account for such uncertainty, Bayesian KNN (BKNN) has been proposed and studied (Holmes and Adams 2002; Cucala et al. 2009). We present some evidence to show that the pseudo-likelihood approach for BKNN, even after being corrected by Cucala et al. (2009), still significantly underestimates model uncertainty.
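
The two-step idea in the LAGO abstract above can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not give formulas, so the bandwidth rule (average distance from each rare-class point to its `k` nearest background points), the Gaussian kernel, the multiplicative form of the local adjustment, and the names `adaptive_bandwidths`, `detection_score`, `k`, and `alpha` are all assumptions made for illustration.

```python
import numpy as np


def adaptive_bandwidths(rare, background, k=3):
    """Step 1 (assumed rule): one bandwidth per rare-class point, taken as the
    average distance to its k nearest background-class points."""
    # Pairwise Euclidean distances, shape (n_rare, n_background).
    dists = np.linalg.norm(rare[:, None, :] - background[None, :, :], axis=2)
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)


def detection_score(queries, rare, bandwidths, alpha=1.0):
    """Score query points with a kernel density estimate built on the rare
    class alone, using a separate bandwidth for every rare-class point."""
    h = alpha * bandwidths  # scaled bandwidths, shape (n_rare,)
    dim = rare.shape[1]
    # Squared distances from each query to each rare point, shape (n_queries, n_rare).
    sq = np.sum((queries[:, None, :] - rare[None, :, :]) ** 2, axis=2)
    # Isotropic Gaussian kernel (an assumption; the abstract names no kernel).
    kern = np.exp(-0.5 * sq / h**2) / (np.sqrt(2.0 * np.pi) * h) ** dim
    # Step 2 (assumed form): the abstract says the local adjustment is
    # approximately the step-1 bandwidth, so each kernel is weighted by it.
    return (kern * bandwidths).mean(axis=1)


# Toy data: a small rare cluster near the origin, background on a distant grid.
rare = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
background = np.array([[x, y] for x in range(2, 6) for y in range(2, 6)], dtype=float)
queries = np.array([[0.05, 0.05], [4.0, 4.0]])

bandwidths = adaptive_bandwidths(rare, background, k=3)
scores = detection_score(queries, rare, bandwidths)
```

In this toy setup the query next to the rare cluster receives a higher score than the query inside the background grid, so ranking items by `scores` places likely rare-class items first. Reusing the step-1 bandwidths as the step-2 weights means no second density estimate is needed, which is the kind of computational saving the abstract mentions.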