Statistics - Student Works

Recent Submissions

Now showing 1 - 10 of 10
  • Item
    Variable selection for clustering and classification of data with missing values
    (2024) O'Connell, Brynn; Franczak, Brian C.
    This poster presentation embarks on a comprehensive exploration of explicit variable selection procedures in model-based classification, where classification aims to assign labels to unlabelled observations. Delving into existing methodologies, we will dissect the intricacies of variable selection, setting the stage for an extensive examination of an approach aimed at minimizing within-group variance while maximizing between-group variance, known as Variable Selection for Clustering and Classification (VSCC). With a focus on enhancing classification accuracy and interpretability, we will unveil the details of VSCC, elucidating its significance in model-based classification frameworks. Furthermore, we will investigate how this approach performs when applied to simulated and real data sets with missing values. Through meticulous evaluation and analysis, we will scrutinize the performance and robustness of the variable selection approach in handling the challenges posed by incomplete data. Our findings will be synthesized into a comprehensive discussion, shedding light on the implications of the results and offering valuable insights for future research directions and refinements in variable selection methodologies within model-based classification.
  • Item
    Scaffolding IL learning and EBP exploration in a semester-long journal club: impact on nursing student self-efficacy
    (2023) Nelson, Jody; Croxen, Hanneke; McKendrick-Calder, Lisa; Ha, Lam; Su, Wanhua
    Nursing students require essential information literacy (IL) skills: locate research articles, assess for quality, and apply to practice-based scenarios. Understanding research remains a common challenge, with one study finding 40% of 2nd year nursing students have difficulty reading journal articles, yet stand-alone IL workshops rarely allow time needed to develop critical reading, assessment, and reflection practices. Our discovery-based, scaffolded IL learning approach is modeled on the student journal club, which has been found to positively impact students’ application of research in clinical contexts. By embedding IL instruction strategically throughout a 1st year nursing course we hoped to enhance understanding, mindset, retention, and transferability of IL. This study sought to identify the impact of the journal club on nursing student IL self-efficacy, as measured through the validated Information Literacy Self-Efficacy Scale.
  • Item
    Forecasting CAD/USD exchange rate
    (2023) Wu, Joyce; Anton, Cristina
    The Canadian dollar's exchange rate has been closely tied to the US dollar over the past decades. The last time the Canadian dollar was worth more than the US dollar was in July 2011. It then experienced its fastest decline in modern history as commodity prices rapidly deteriorated. We use time series analysis to study the variation of the CAD/USD exchange rate since 2010. We fit an ARIMA model and analyze how different economic and social policies in both countries affect the exchange rate.
  • Item
    Measuring the activity of Saccharomyces cerevisiae in relation to home-based additives by measured net weight loss
    (2022) Mainwaring, Shaun
    This research study measures the activity of Saccharomyces cerevisiae in the presence of selected additives introduced during the hydration step of making bread dough. Saccharomyces cerevisiae is sensitive to sugars (Mazzoleni, S. et al., 2015), and by using several additives that can be found at home, we can compare which ones give a healthier yeast and therefore a better rise to the dough. As Saccharomyces cerevisiae ferments, it consumes the sugars naturally present in the dough, creates an acidic environment to maintain its growth, and produces CO2 as a product of this reaction, which causes the dough to rise. Yeast activity can be tracked through mean weight loss: we measure the weight loss of the three separate batches and compare the results through a Multiple Comparisons of Means: Tukey Contrasts test to see whether the additive has a significant effect on the fermentation process of the yeast. The test indicates that easily soluble sugars are the best choices for promoting the health of Saccharomyces cerevisiae, with F(9,20)=14.49, p<0.0001.
  • Item
    Measuring the activity of Saccharomyces cerevisiae in relation to home-based additives by measured net weight loss
    (2022) Mainwaring, Shaun; Buro, Karen
    This research study measures the activity of Saccharomyces cerevisiae in the presence of selected additives introduced during the hydration step of making bread dough. Saccharomyces cerevisiae is sensitive to sugars (Mazzoleni, S. et al., 2015), and by using several additives that can be found at home, we can compare which ones give a healthier yeast and therefore a better rise to the dough. As Saccharomyces cerevisiae ferments, it consumes the sugars naturally present in the dough, creates an acidic environment to maintain its growth, and produces CO2 as a product of this reaction, which causes the dough to rise. Yeast activity can be tracked through mean weight loss: we measure the weight loss of the three separate batches and compare the results through a Multiple Comparisons of Means: Tukey Contrasts test to see whether the additive has a significant effect on the fermentation process of the yeast. The test indicates that easily soluble sugars are the best choices for promoting the health of Saccharomyces cerevisiae, with F(9,20)=14.49, p<0.0001.
  • Item
    Measuring the activity of Saccharomyces cerevisiae in relation to home-based additives by measured net weight loss
    (2022) Mainwaring, Shaun; Buro, Karen
    This research study measures the activity of Saccharomyces cerevisiae in the presence of selected additives introduced during the hydration step of making bread dough. Saccharomyces cerevisiae is sensitive to sugars (Mazzoleni, S. et al., 2015), and by using several additives that can be found at home, we can compare which ones give a healthier yeast and therefore a better rise to the dough. As Saccharomyces cerevisiae ferments, it consumes the sugars naturally present in the dough, creates an acidic environment to maintain its growth, and produces CO2 as a product of this reaction, which causes the dough to rise. Yeast activity can be tracked through mean weight loss: we measure the weight loss of the three separate batches and compare the results through a Multiple Comparisons of Means: Tukey Contrasts test to see whether the additive has a significant effect on the fermentation process of the yeast. The test indicates that easily soluble sugars are the best choices for promoting the health of Saccharomyces cerevisiae, with F(9,20)=14.49, p<0.0001.
  • Item
    Kawhi Leonard’s impact on the Toronto Raptors’ 2019 playoff run as a Markov chain
    (2020) Lupul, Nicholas
    In the summer of 2018, the Toronto Raptors engineered a trade that would forever change the history of their franchise. The blockbuster trade saw NBA superstar Kawhi Leonard in a Raptors uniform in exchange for then franchise cornerstone DeMar DeRozan. The trade was heavily criticized, with fans and analysts alike claiming the organization gave up its future for a small chance at a championship. The Raptors went on to win the championship with Kawhi as their centerpiece. By studying their performance in the playoffs as two separate Markov chains, one for when Kawhi was playing and one for when he was resting, his contribution can be analyzed. It was assumed that his presence would account for more defensive stops and a more efficient offense. Upon analyzing the collected data, it was seen that his presence accounts for more points per game, more offensive rebounds per game, and a decreased number of defensive stops. In the future, this type of analysis can be applied to data from any team at any level where relevant statistics are tracked. By analyzing one player’s impact on games, organizations will have a better idea of which players to trade away or trade for, as well as how to distribute minutes.
  • Item
    Lennon v. McCartney
    (2020) Kroetch, Kimberly
    In this analysis, the chord progressions used in songs by the Beatles are modelled as Markov chains to identify potential differences between songs for which John Lennon had more influence and those for which Paul McCartney had more influence. A preliminary comparison of random samples of songs from each artist did not identify noteworthy differences between Lennon and McCartney; most pieces resulted in regular Markov chains. This analysis then focusses on two songs from the Beatles – “Norwegian Wood”, primarily written by John Lennon, and “Good Day Sunshine”, primarily written by Paul McCartney – which deviated from this pattern. Similar patterns were found between the two songs despite major differences in the chords that made up each state space. In general, however, McCartney’s song had more variety in terms of the number of chords used and the paths taken between tonic chords.
  • Item
    Clustering of time series cytotoxicity data
    (2020) Richard, Dan; Anton, Cristina
    To study the effect of various toxicants on cell growth, the Alberta Centre for Toxicology conducted several in vitro experiments, and concentration response curves (TCRCs) were generated. Each TCRC represents a time series that gives the temporal evolution of the number of cells after exposure to a chemical at a certain concentration. Here we use the wavelet transform to extract important features from the original TCRC data, and we apply self-organizing maps to classify the toxicants according to their adverse biological response.
  • Item
    Parallelization of the mixtures of multivariate t-factor analyzers software
    (2015) Chalifour, Mathieu R.; Andrews, Jeffrey L.
    Mixtures of modified t-factor analyzers (MMtFA) are a family of statistical models that are used to find inherent groups in data. The significance of this family of models is that, unlike many other methods currently used, they are robust with respect to outliers and can be applied to high-dimensional data. Although the field of clustering has been around for some time, it is only in the last decade that the methodology has expanded due to modern computer capabilities. However, the MMtFA models take a significant amount of time to fit even on modern computers. This is a problem in the sense that applied scientists may be unwilling to adopt this approach to cluster analysis purely due to the time costs. The main goal of the research was to optimize the software for the MMtFA family, which is being developed for the R computing platform (free, open-source, compatible with nearly all operating systems). This was done in two ways: parallelization and general efficiency fixes. Parallelization allows each model in the MMtFA family software to be estimated independently using different processors within a computer (most desktops have at least four processor cores), instead of a single processor. This greatly reduces the time expense for running the algorithm. Since time is not the only factor of great importance, we will also provide comparisons with other clustering algorithms across several data sets to measure performance.
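
The parallelization idea in the last abstract above (estimating each member of a model family independently on a separate processor core) can be illustrated with a minimal Python sketch. This is not the authors' R software and not the MMtFA fitting algorithm: `fit_model`, `fit_family`, the toy data, and the penalised score are all hypothetical stand-ins, chosen only to show the pattern of mapping independent fits over a pool of workers and then keeping the best-scoring fit.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import math

# Toy one-dimensional data with two visible groups (made up for illustration).
DATA = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]


def fit_model(n_groups):
    """Hypothetical stand-in for fitting one member of a model family.

    Splits the sorted data into n_groups contiguous chunks and returns a
    penalised within-group sum of squares (lower is better). This is only
    a placeholder workload, NOT the MMtFA estimation procedure.
    Assumes 1 <= n_groups <= len(DATA) so that no chunk is empty.
    """
    xs = sorted(DATA)
    n = len(xs)
    chunks = [xs[i * n // n_groups:(i + 1) * n // n_groups]
              for i in range(n_groups)]
    wss = 0.0
    for chunk in chunks:
        mean = sum(chunk) / len(chunk)
        wss += sum((x - mean) ** 2 for x in chunk)
    # Simple complexity penalty so that more groups are not always better.
    penalty = n_groups * math.log(n)
    return {"n_groups": n_groups, "score": wss + penalty}


def fit_family(candidates, executor_cls=ProcessPoolExecutor, max_workers=4):
    """Fit every candidate independently on a worker pool; return the best fit.

    Each call to fit_model depends only on its own argument, so the fits
    can run concurrently. ProcessPoolExecutor uses separate processes (and
    therefore separate cores); ThreadPoolExecutor can be passed instead
    when process-based workers are not available.
    """
    with executor_cls(max_workers=max_workers) as pool:
        fits = list(pool.map(fit_model, candidates))
    return min(fits, key=lambda f: f["score"])
```

Calling `fit_family(range(1, 5))` fits four candidate models and returns the one with the lowest score. On platforms that start worker processes by spawning a fresh interpreter, the call should sit under an `if __name__ == "__main__":` guard. The speed-up from this pattern is bounded by the number of cores and by the slowest single fit, which is why the abstract pairs parallelization with general efficiency fixes.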