### Browsing by Author "Su, Wanhua"

Now showing 1 - 9 of 9

An adaptive method for statistical detection with applications to drug discovery (2003). Zhu, Mu; Chipman, Hugh A.; Su, Wanhua

Researchers have tried to tackle various statistical detection problems using state-of-the-art classification techniques but are often disappointed at the results. The reason is two-fold. First of all, as classification problems, these statistical detection problems are heavily unbalanced: the class of interest is rare in the training data; an overwhelming majority of the training data belong to what can be called a background class. A primary example is drug discovery, where most of the chemical compounds in the data set are inactive whereas the goal is to detect a small number of active compounds. Secondly, the goal of statistical detection is fundamentally different from that of classification, making misclassification rate the wrong criterion to focus on. In this article, we develop an adaptive method for statistical detection and demonstrate that it can be an effective tool for drug discovery.

Introduction to applied statistics: open textbook series in statistics (2024). Su, Wanhua; Miller, Dylan; Mewhort, Clarissa; Chipman, Hugh; Fedoruk, John

This book aims to provide students taking the first course in introductory statistics with open learning materials to master basic statistical concepts and techniques and to give demonstrations on conducting fundamental statistical analysis using the free statistical software R Commander. Each chapter generally includes a statement of learning outcomes, course notes, review exercises, self-assessment quiz, and homework assignment questions. The book is based on instructor course notes for STAT 151 (Introduction to Applied Statistics) at MacEwan University. In December 2015, the online version of STAT 151, including module notes, quizzes, homework assignment questions and marking rubrics, and a lab manual in R Commander was developed, leading to the creation of this textbook.
Each homework assignment has two parts; students must complete Part A by hand and Part B with R Commander. Most data sets for the assignment, assignment questions, and quiz questions are adapted from popular introductory statistics textbooks such as Introductory Statistics by Neil Weiss and Intro STATS by Richard D. De Veaux, Paul F. Velleman, David E. Bock, and Paul D. Velleman. The online STAT 151 was completed and offered for the first time in Spring 2018. This open textbook is the revised and enriched version of that online course. The only prerequisite of this book is high school mathematics; most students take STAT 151 in the first year of their post-secondary education. R Commander is taught instead of R/RStudio as the software for the lab component to avoid focusing on the programming component needed for R/RStudio. In a future edition, there are plans to include a lab manual with command lines in R/RStudio. This book introduces one-sided confidence intervals to help students understand the computer output of hypothesis testing in R Commander.

LAGO: a computationally efficient approach for statistical detection (2006). Su, Wanhua; Zhu, Mu; Chipman, Hugh A.

We study a general class of statistical detection problems where the underlying objective is to detect items belonging to a rare class from a very large database. We propose a computationally efficient method to achieve this goal. Our method consists of two steps. In the first step we estimate the density function of the rare class alone with an adaptive bandwidth kernel density estimator. The adaptive choice of the bandwidth is inspired by the ancient Chinese board game known today as Go. In the second step we adjust this density locally depending on the density of the background class nearby. We show that the amount of adjustment needed in the second step is approximately equal to the adaptive bandwidth from the first step, which gives us additional computational savings.
We name the resulting method LAGO, for “locally adjusted Go-kernel density estimator.” We then apply LAGO to a real drug discovery dataset and compare its performance with a number of existing and popular methods.

Neural network and logistic regression diagnostic prediction models for giant cell arteritis: development and validation (2019). Ing, Edsel B.; Miller, Neil R.; Nguyen, Angeline; Su, Wanhua; Bursztyn, Lulu L.; Poole, Meredith

Purpose: To develop and validate neural network (NN) vs logistic regression (LR) diagnostic prediction models in patients with suspected giant cell arteritis (GCA). Design: Multicenter retrospective chart review. Methods: An audit of consecutive patients undergoing temporal artery biopsy (TABx) for suspected GCA was conducted at 14 international medical centers. The outcome variable was biopsy-proven GCA. The predictor variables were age, gender, headache, clinical temporal artery abnormality, jaw claudication, vision loss, diplopia, erythrocyte sedimentation rate, C-reactive protein, and platelet level. The data were divided into three groups to train, validate, and test the models. The NN model with the lowest false-negative rate was chosen. Internal and external validations were performed. Results: Of 1,833 patients who underwent TABx, there was complete information on 1,201 patients, 300 (25%) of whom had a positive TABx. On multivariable LR, age, platelets, jaw claudication, vision loss, log C-reactive protein, log erythrocyte sedimentation rate, headache, and clinical temporal artery abnormality were statistically significant predictors of a positive TABx (P≤0.05). The area under the receiver operating characteristic curve/Hosmer–Lemeshow P for LR was 0.867 (95% CI, 0.794, 0.917)/0.119 vs NN 0.860 (95% CI, 0.786, 0.911)/0.805, with no statistically significant difference of the area under the curves (P=0.316). The misclassification rate/false-negative rate of LR was 20.6%/47.5% vs 18.1%/30.5% for NN.
Missing data analysis did not change the results. Conclusion: Statistical models can aid in the triage of patients with suspected GCA. Misclassification remains a concern, but cutoff values for 95% and 99% sensitivities are provided.

Pseudo-likelihood inference underestimates model uncertainty: evidence from Bayesian nearest neighbours (2011). Su, Wanhua; Chipman, Hugh A.; Zhu, Mu

When using the K-nearest neighbours (KNN) method, one often ignores the uncertainty in the choice of K. To account for such uncertainty, Bayesian KNN (BKNN) has been proposed and studied (Holmes and Adams 2002; Cucala et al. 2009). We present some evidence to show that the pseudo-likelihood approach for BKNN, even after being corrected by Cucala et al. (2009), still significantly underestimates model uncertainty.

Revisioning the possible: aligning blended IL instruction with principles of EBP for meaningful nursing instruction (2021). Nelson, Jody; Foster, Alison; Asirifi, Mary; Gates, Melanie; Su, Wanhua; Velupillai, Nirudika

The MacEwan BScN program supports development of skills and attributes in the domain of clinical practice, including information literacy (IL) interventions in Year 2. Addressing a noticeable trend in 2018 of fewer students making connections between IL and evidence-based practice (EBP), librarians and instructors collaborated on an IL redesign, integrating IL and EBP in a blended learning (BL) context. The redesigned IL intervention, which pulls from best practices in online EBP instruction in nursing (Kelly et al., 2016), was implemented in 2019 with revised learning outcomes. Literature on IL instruction and EBP learning points to similarities, synergies, and value of a more fulsome integration in teaching (Adams, 2012; Amit-Aharon et al., 2020). While Adams (2012) emphasizes the importance of teaching IL concepts through a disciplinary lens, Amit-Aharon et al.
(2020) note the significant positive correlation between IL self-efficacy, EBP attitudes and knowledge, and future EBP implementation in practice. Purpose: This Scholarship of Teaching and Learning (SoTL) research investigates the impact of the redesigned BL IL intervention on YR 2 nursing students’ perceived EBP confidence, attitudes, and ability, using an adapted Student EBP Questionnaire (S-EBPQ) (Upton et al., 2016).

Scaffolding IL learning and EBP exploration in a semester-long journal club: impact on nursing student self-efficacy (2023). Nelson, Jody; Croxen, Hanneke; McKendrick-Calder, Lisa; Ha, Lam; Su, Wanhua

Nursing students require essential information literacy (IL) skills: locate research articles, assess for quality, and apply to practice-based scenarios. Understanding research remains a common challenge, with one study finding 40% of 2nd year nursing students have difficulty reading journal articles, yet stand-alone IL workshops rarely allow time needed to develop critical reading, assessment, and reflection practices. Our discovery-based, scaffolded IL learning approach is modeled on the student journal club, which has been found to positively impact students’ application of research in clinical contexts. By embedding IL instruction strategically throughout a 1st year nursing course we hoped to enhance understanding, mindset, retention, and transferability of IL. This study sought to identify the impact of the journal club on nursing student IL self-efficacy, as measured through the validated Information Literacy Self-Efficacy Scale.

Statistical inference on recall, precision and average precision under random selection (2012). Su, Wanhua; Zhang, P.

The objective of a rare target detection problem is to identify the rare targets as early as possible. Recall, precision and average precision are three popular performance measures for evaluating different detection methods.
However, there is little literature on the statistical properties of these three measures. We develop a framework for conducting statistical inference on recall, precision and average precision through establishing their asymptotic properties. Simulations are used to illustrate the idea. The proposed methods can also be applied in other areas where ranking systems need to be evaluated, such as information retrieval.

Threshold-free measures for assessing the performance of medical screening tests (2015). Yuan, Yan; Su, Wanhua; Zhu, Mu

Background: The area under the receiver operating characteristic curve (AUC) is frequently used as a performance measure for medical tests. It is a threshold-free measure that is independent of the disease prevalence rate. We evaluate the utility of the AUC against an alternate measure called the average positive predictive value (AP), in the setting of many medical screening programs where the disease has a low prevalence rate. Methods: We define the two measures using a common notation system and show that both measures can be expressed as a weighted average of the density function of the diseased subjects. The weights for the AP include prevalence in some form, but those for the AUC do not. These measures are compared using two screening test examples under rare and common disease prevalence rates. Results: The AP measures the predictive power of a test, which varies when the prevalence rate changes, unlike the AUC, which is prevalence independent. The relationship between the AP and the prevalence rate depends on the underlying screening/diagnostic test. Therefore, the AP provides relevant information to clinical researchers and regulators about how a test is likely to perform in a screening population. Conclusion: The AP is an attractive alternative to the AUC for the evaluation and comparison of medical screening tests. It could improve the effectiveness of screening programs during the planning stage.
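
The contrast between the AUC and the AP described in the last abstract can be illustrated with a small sketch. This is not the authors' code or notation: it assumes the common empirical definitions (AUC as the fraction of diseased/healthy pairs ranked correctly, AP as the mean precision at the rank of each diseased subject), and the scores and the helper `make_population` are made up for illustration.

```python
def auc(scores, labels):
    """Empirical AUC: share of (diseased, healthy) pairs where the diseased
    subject has the higher score; ties count as half a correct pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values taken at the rank of each diseased subject.
    Subjects are ranked by descending score; ties keep their input order."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precisions = []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


def make_population(neg_copies):
    """Hypothetical data: two diseased subjects and `neg_copies` copies of the
    same two healthy scores, so only the prevalence changes."""
    pos_scores = [0.9, 0.6]
    neg_scores = [0.8, 0.3] * neg_copies
    scores = pos_scores + neg_scores
    labels = [1] * len(pos_scores) + [0] * len(neg_scores)
    return scores, labels


if __name__ == "__main__":
    for copies in (1, 10):
        scores, labels = make_population(copies)
        prevalence = sum(labels) / len(labels)
        print(f"prevalence={prevalence:.3f}  "
              f"AUC={auc(scores, labels):.3f}  "
              f"AP={average_precision(scores, labels):.3f}")
```

In this toy setting the score distributions of the two groups are held fixed and only the number of healthy subjects grows, so the AUC stays the same while the AP drops as the disease becomes rarer.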