Browsing by Author "Zhu, Mu"
Item: An adaptive method for statistical detection with applications to drug discovery (2003)
Zhu, Mu; Chipman, Hugh A.; Su, Wanhua
Researchers have tried to tackle various statistical detection problems using state-of-the-art classification techniques but are often disappointed at the results. The reason is two-fold. First of all, as classification problems, these statistical detection problems are heavily unbalanced: the class of interest is rare in the training data; an overwhelming majority of the training data belong to what can be called a background class. A primary example is drug discovery, where most of the chemical compounds in the data set are inactive whereas the goal is to detect a small number of active compounds. Secondly, the goal of statistical detection is fundamentally different from that of classification, making misclassification rate the wrong criterion to focus on. In this article, we develop an adaptive method for statistical detection and demonstrate that it can be an effective tool for drug discovery.

Item: LAGO: a computationally efficient approach for statistical detection (2006)
Su, Wanhua; Zhu, Mu; Chipman, Hugh A.
We study a general class of statistical detection problems where the underlying objective is to detect items belonging to a rare class from a very large database. We propose a computationally efficient method to achieve this goal. Our method consists of two steps. In the first step we estimate the density function of the rare class alone with an adaptive bandwidth kernel density estimator. The adaptive choice of the bandwidth is inspired by the ancient Chinese board game known today as Go. In the second step we adjust this density locally depending on the density of the background class nearby. We show that the amount of adjustment needed in the second step is approximately equal to the adaptive bandwidth from the first step, which gives us additional computational savings.
We name the resulting method LAGO, for “locally adjusted Go-kernel density estimator.” We then apply LAGO to a real drug discovery dataset and compare its performance with a number of existing and popular methods.

Item: Pseudo-likelihood inference underestimates model uncertainty: evidence from Bayesian nearest neighbours (2011)
Su, Wanhua; Chipman, Hugh A.; Zhu, Mu
When using the K-nearest neighbours (KNN) method, one often ignores the uncertainty in the choice of K. To account for such uncertainty, Bayesian KNN (BKNN) has been proposed and studied (Holmes and Adams 2002; Cucala et al. 2009). We present some evidence to show that the pseudo-likelihood approach for BKNN, even after being corrected by Cucala et al. (2009), still significantly underestimates model uncertainty.

Item: Threshold-free measures for assessing the performance of medical screening tests (2015)
Yuan, Yan; Su, Wanhua; Zhu, Mu
Background: The area under the receiver operating characteristic curve (AUC) is frequently used as a performance measure for medical tests. It is a threshold-free measure that is independent of the disease prevalence rate. We evaluate the utility of the AUC against an alternate measure called the average positive predictive value (AP), in the setting of many medical screening programs where the disease has a low prevalence rate. Methods: We define the two measures using a common notation system and show that both measures can be expressed as a weighted average of the density function of the diseased subjects. The weights for the AP include prevalence in some form, but those for the AUC do not. These measures are compared using two screening test examples under rare and common disease prevalence rates. Results: The AP measures the predictive power of a test, which varies when the prevalence rate changes, unlike the AUC, which is prevalence independent. The relationship between the AP and the prevalence rate depends on the underlying screening/diagnostic test.
Therefore, the AP provides relevant information to clinical researchers and regulators about how a test is likely to perform in a screening population. Conclusion: The AP is an attractive alternative to the AUC for the evaluation and comparison of medical screening tests. It could improve the effectiveness of screening programs during the planning stage.
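
The contrast drawn in the last abstract can be illustrated with a small sketch. This is not the authors' code or notation: it assumes the usual rank-based definitions of the AUC (the fraction of diseased/healthy pairs that the test ranks correctly) and of the AP (the mean precision at the rank of each diseased subject), and the scores below are made-up toy values. Adding healthy subjects with similar scores lowers the prevalence; the AUC is unchanged while the AP drops.

```python
def auc(scores, labels):
    """Fraction of (diseased, healthy) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean precision at the rank of each diseased subject (descending score)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` subjects
    return total / hits


# Toy data (hypothetical): 2 diseased subjects, label 1; healthy subjects, label 0.
# "Common" setting: prevalence 2/4.
common_scores = [0.9, 0.6, 0.8, 0.3]
common_labels = [1, 1, 0, 0]

# "Rare" setting: same diseased subjects, twice as many healthy ones
# with nearly identical scores, so prevalence falls to 2/6.
rare_scores = [0.9, 0.6, 0.8, 0.79, 0.3, 0.29]
rare_labels = [1, 1, 0, 0, 0, 0]

auc_common = auc(common_scores, common_labels)
auc_rare = auc(rare_scores, rare_labels)
ap_common = average_precision(common_scores, common_labels)
ap_rare = average_precision(rare_scores, rare_labels)

print(auc_common, auc_rare)                    # 0.75 0.75
print(round(ap_common, 4), round(ap_rare, 4))  # 0.8333 0.75
```

In this toy case the AUC is 0.75 in both settings, whereas the AP falls from about 0.83 to 0.75 as the prevalence falls, which is the kind of prevalence dependence the abstract describes.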