Browsing by Author "El-Hajj, Mohamad"
Analysis of hockey forward line Corsi: should the focus be on forward pairs? (2024)
Brownlee, Samuel; Khan, Ayesha; Vanderzyl, Barnaby; El-Hajj, Mohamad
Professional ice hockey is a popular sport in North America, and multiple previous analyses have provided insights into teams. Most of this research has analyzed pairs of players on the same team who work well together. This study examined whether the three players on a forward line perform well together, an area that has received too little research attention. Our goal was to determine whether a third player changes the performance of a duo and to identify the key factors that explain this change. We analyzed more than 14 years’ worth of data. The data initially had more than 100 dimensions, of which 35 were selected for analysis. We used three methods: K-means clustering, random forests, and support vector machines. A single-variate random forest was used to analyze which variables affected Corsi percentage. The K-means clustering results, combined with the single-variate random forest results, were used to determine whether substituting the third player on a line changes the line’s overall performance. A support vector machine was used to corroborate the clusters obtained from K-means clustering. Our study found that adding a third player has a positive effect when that player consistently plays with the other two and the three players contribute more effectively on defence. These findings could help teams plan how they form forward lines to achieve good game results.

An analysis of rock climbing sport regarding performance, sponsorship, and health (2018)
Huynh, Huy; Sobek, Elliott; El-Hajj, Mohamad; Atwal, Sunny
The basis of this work was to extract knowledge, using data mining techniques, from over 2 million records of rock climbing competitions around the world.
The first phase of the project involved extensive data cleaning and preprocessing to prepare the data for the mining models. We then explored three main questions: which factors predict a competitor’s performance, which factors are likely to lead to a competitor being sponsored, and whether a competitor’s health can be predicted. This research can help rock climbers gain significant insights into their performance and help sponsors choose the best candidates to represent their brands.

Analyzing factors impacting COVID-19 vaccination rates (2023)
Cho, Dongseok; Driedger, Mitchell; Han, Sera; Khan, Noman; Elmorsy, Mohammed; El-Hajj, Mohamad
Since the approval of COVID-19 vaccines in late 2020, vaccination rates have varied around the globe. Access to vaccine supply, vaccination mandates, and vaccine hesitancy all contribute to these rates. This study used COVID-19 vaccination data from Our World in Data and the Multilateral Leaders Task Force on COVID-19 to create two vaccination indices. The first, the Vaccine Utilization Index (VUI), measures how effectively each country used its vaccine supply to vaccinate its population with two doses. The second, the Vaccination Acceleration Index (VAI), evaluates how efficiently each country vaccinated its population within the first 150 days. Pearson correlations were computed between these indices and country indicators obtained from the World Bank. These correlations show that countries with stronger health indicators, such as lower mortality rates, lower age-dependency ratios, and higher immunization rates for other diseases, have higher VUI and VAI scores than countries with weaker indicators. VAI scores are also positively correlated with governance and economic indicators, such as regulatory quality, control of corruption, and GDP per capita.
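The Pearson correlation step described above can be illustrated with a minimal sketch, assuming each index and each indicator is stored as one value per country. The country values below are hypothetical placeholders rather than data from the study, and `pearson_r` is an illustrative helper, not the authors’ code.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values: one index score and one indicator per country
index_scores = [0.91, 0.74, 0.62, 0.45, 0.30]
gdp_per_capita = [52000, 41000, 23000, 9000, 2500]
print(round(pearson_r(index_scores, gdp_per_capita), 3))
```

Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient and is a reasonable choice when no custom handling is needed.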
As measured by the VUI, effective use of the COVID-19 vaccine supply is observed in countries with strong health practices. A country’s drive to accelerate vaccination within its first 150 days, as measured by the VAI, was largely a product of its government’s effectiveness and economic status, as well as overall strength in health practices.

Analyzing factors that lead to NBA regular season success (2024)
El-Hajj, Mohamad; Steed, Jackson; Gore, Victor; Infante, Craeg; Flores, Raniel; Wakista, Danindu; Elmorsy, Mohammed
The National Basketball Association (NBA) values regular-season success and recognizes the crucial role of roster composition in a team’s overall performance. This study uses machine learning techniques, specifically unsupervised clustering and decision tree models, to predict the composition of a winning roster. We identified three distinct clusters based on win percentage and the distribution of players across skill levels. Successful teams typically have more top-tier players and a significant representation of players at the lowest skill level. In contrast, teams that spread their talent across the entire roster are less successful. Players with average to above-average skills are notably affected by excessive playing time in the previous game, which leads to decreased performance and potential losses in the next game. Considering the time of year and the gap between games, we recommend prioritizing the rest and recovery of top players, especially in the latter half of the season. It is also crucial that players who are less skilled than the top players but still make significant contributions maintain consistent performance, especially during the first half of the season. Our analysis of height’s impact on player performance yields practical insights for coaches and management. We found that the shortest and tallest players often perform worse than players of average height.
Most top NBA performers have heights close to the average. However, for players who frequently operate near the basket and encounter many rebounding opportunities, average-height or taller players generally perform slightly better overall than below-average-height players. Teams can use these insights to improve roster construction, and coaches can use them to maximize player utilization from one game to the next. This research provides practical strategies that can be implemented immediately to improve team performance.

Analyzing patterns of car speeding in an urban environment using multivariate functional data clustering (2023)
Smith, Iain; Dobosz, Dominic; El-Hajj, Mohamad
Traffic flow and speed differences between cars are important indicators of the likelihood and danger of collisions. A vital part of intelligent transportation systems is identifying important locations at which to monitor and ticket speeding vehicles. To find these locations, we study data from a low-density city. We identify three critical road groups that indicate risk levels based on car speed differences and weather conditions. These groups have differing weekly trends, which give traffic enforcement time to move between locations and cover each group. We develop an analysis that an intelligent transportation system could automate to reduce risk on these roads and save city resources on enforcement.

An association analysis of breast cancer with carotenoids (2023)
Neumann, Samuel; El-Hajj, Mohamad
The environment and the exposures individuals accumulate throughout their lifetime can have diverse effects on their health. This paper applies association analysis to determine relationships between carcinogenesis and the human exposome. Human exposome data from the World Health Organization were analyzed to identify associations between human exposures and breast cancer.
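Association analysis typically scores a candidate rule of the form A -> B by its support, confidence, and lift. The sketch below computes these three metrics for a single rule over a toy set of records; the abstract does not name the mining algorithm or thresholds used, so the item labels (`exposure_a`, `exposure_b`, `outcome`) and the helper `rule_metrics` are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Each transaction is a set of items; antecedent and consequent are sets.
    """
    n = len(transactions)
    both = antecedent | consequent
    count_a = sum(1 for t in transactions if antecedent <= t)
    count_c = sum(1 for t in transactions if consequent <= t)
    count_both = sum(1 for t in transactions if both <= t)
    # Support: fraction of all records containing both sides of the rule
    support = count_both / n
    # Confidence: fraction of records with the antecedent that also have the consequent
    confidence = count_both / count_a if count_a else 0.0
    # Lift > 1: the two sides co-occur more often than expected under independence
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift

# Hypothetical records: each set lists exposure flags and an outcome label
records = [
    {"exposure_a", "outcome"},
    {"exposure_a", "outcome"},
    {"exposure_a"},
    {"outcome"},
    {"exposure_b"},
]
print(rule_metrics(records, {"exposure_a"}, {"outcome"}))
```

In practice, an algorithm such as Apriori enumerates frequent itemsets first and then scores only the rules that clear minimum support and confidence thresholds.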
The discovered associations point to specific factors that may be linked to the prevention or causation of breast cancer. We found an association between biomarkers in specific biospecimens and breast cancer: xanthophylls, measured in two different biospecimens, were associated with American breast cancer patients. These associations may be useful in future cancer studies. This result is particularly interesting because of the relationship between xanthophylls and retinol, which inhibits oncogenesis. Providing support and data for such associations should encourage more research on the exposome’s effect on breast cancer and other conditions.

The effects of neighbourhood characteristics on crime incidence (2018)
Letourneau, Steven; Ell, Nathan; Cheung, Peter; McCaskill, Jordan; El-Hajj, Mohamad
Using data from the City of Edmonton (Canada) Open Data Portal, we apply data mining techniques to detect previously unseen relationships between tangible spatial characteristics of neighbourhoods and non-tangible crime incidence. These findings can help law enforcement and city planners make empirically based decisions and avoid misallocating public resources. Frequent pattern analysis of the neighbourhood attributes that occur alongside crime provides insight into why crime occurs; the techniques used include clustering, classification, and association algorithms. The analysis of neighbourhood spatial characteristics indicates that dwelling structure type and tree density relate to the incidence of neighbourhood crime, while other spatial characteristics bear no relationship. Among intangible neighbourhood characteristics, the distribution of yearly household income, employment levels, and school enrollment levels relate to the incidence of neighbourhood crime.
The distribution of yearly household income also relates to crime type, specifically violent versus non-violent crime.

Enhancing patient care: machine learning’s role in reducing wait times for medical procedures (2025)
El-Hajj, Mohamad; Collins, Liam; Steed, Jackson; Heß, Claudia; Kunz, Sibylle
The healthcare system faces a critical challenge with extended wait times for medical procedures, which significantly affect both patients and healthcare professionals. While increasing funding and hiring more doctors may seem like effective solutions, these approaches are often impractical because of various constraints. This research examines the factors driving medical procedure wait times in Canada, specifically in British Columbia, Nova Scotia, and Quebec, highlighting the urgent need to address delays caused by resource limitations. Using machine learning techniques (random forests, k-means clustering, and linear regression) alongside statistical tools (bar graphs, correlation matrices, and z-score normalization), implemented in Python and RStudio, the study identifies key contributors to these delays. Based on the findings, it proposes a strategic approach to physician hiring that optimizes seniority levels. Specifically, the study recommends capping the hiring of entry-level doctors at 18% and senior-level doctors at 5%, while increasing the absolute number of entry-level physicians by 27% and reducing the physician-to-100,000-population ratio by 2%, which could reduce wait times by 15%. By addressing the complexities of medical procedure delays, this research aims to improve the efficiency and fairness of surgical care delivery.

Fitting and filtering functional data for use in video data analysis (2024)
Smith, Iain; El-Hajj, Mohamad
Our research focuses on advancing machine learning applications that analyze video data.
To achieve this, we created a novel method for integrating functional data into video. Our approach applies convolutional filters directly to functional data and introduces new filters based on derivatives, which represent a promising avenue for further exploration. To validate the approach, we conducted experiments on both synthetic and real-world datasets, which established the method’s potential in practical scenarios. We propose a specific parameter ratio for incorporating functional data into the original input frames; this ratio was shown to require less information while leaving substantial room for further exploration in machine learning applications for video data. We also found that additional operations on functions, such as derivatives, yield valuable information that can be used to improve machine learning applications involving video data. This opens up possibilities for leveraging the richness of functional data in video analysis.

Leveraging machine learning to predict factors that drive successful basketball team formation (2025)
El-Hajj, Mohamad; Kwon, Benjamin; Jethro Infante, Craeg; Steed, Jackson; Gore, Victor; Phan, Nhi; Elmorsy, Mohammed; Pang, Xiaodan
This study examines the key factors affecting the likelihood that NCAA basketball players are drafted into the NBA. Using an unsupervised clustering model and a supervised decision tree model, it highlights the importance of offensive metrics, such as points scored and offensive rating, in predicting an NCAA player’s chances of being drafted. This underscores the significance of offensive statistics in a player’s skill set and suggests that players and coaches should prioritize improving these metrics to enhance a player’s draft potential.
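A decision tree chooses each split by how much it reduces impurity in the resulting groups. The sketch below scores every candidate threshold on a single offensive metric using Gini impurity; the abstract does not state the split criterion or library the study used, and the player values and the helpers `gini` and `best_threshold` are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = drafted)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split on one feature.

    Players with value <= threshold go left, the rest go right.
    """
    n = len(values)
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        if not left or not right:
            continue  # a split must leave players on both sides
        # Weight each side's impurity by its share of the players
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical points per game and draft outcome for six players
points = [8.5, 11.0, 14.2, 17.8, 19.5, 22.1]
drafted = [0, 0, 0, 1, 1, 1]
print(best_threshold(points, drafted))
```

A full tree repeats this search over every feature at every node; comparing the impurity reduction achieved by offensive versus defensive features is one way such a model can indicate which metrics matter more.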
The study found that defensive metrics, such as defensive rating and blocks, have less impact on overall draft potential than offensive metrics. Notably, a team’s success often relies on having its top players actively participating on the court. This research enhances our understanding of the factors influencing the draft prospects of NCAA basketball players, contributes to the advancement of basketball analytics, and paves the way for further research on player performance metrics and their influence on the scouting and selection of professional athletes.

Mining COVID-19 data to predict the effect of policies on severity of outbreaks (2023)
El-Hajj, Mohamad; Anton, Calin; Anton, Cristina; Dobosz, Dominic; Smith, Iain; Deiab, Fattima; Saleh, Nagam
From 2020 through 2021 and into 2022, the COVID-19 virus spread rampantly across the globe, with devastating effects. Using data mining techniques, we explored factors linked to severe cases of COVID-19 and sought to identify the effect of different government policies on how the severity of infections evolved. Four countries were selected, using data from 2021, to investigate each region’s efforts in vaccine distribution and the specific policies enacted to suppress COVID-19. Pearson correlation coefficients were used to establish initial relationships between policies, vaccines, and severe cases. We then used the identified factors to predict the number of new COVID-19 cases and ICU admissions, including data for all countries from Our World in Data (OWID) in this phase. Our investigation indicates that, given enough data, long-range trend predictions can be obtained using random forest regressors. A trained random forest model can readily explain which factors effectively slow the spread of COVID-19.
With proposed policies as input, the model can return the expected number of cases, informing policy without spending multiple weeks tracking results.

Power profiling of smart grid users using dynamic time warping (2025)
Kim, Minchang; Daghmehchi Firoozjaei, Mahdi; Kim, Hyoungshick; El-Hajj, Mohamad
Power consumption data play a crucial role in demand management and anomaly detection in smart grids. Despite these management benefits, analyzing power consumption data also enables consumer profiling, which raises privacy issues. To demonstrate this, we present a power profiling model for smart grid consumers based on real-time load data acquired from smart meters. The model profiles consumers’ power consumption behavior using the daily load factor and a dynamic time warping (DTW) clustering algorithm. Because DTW tolerates warping along the time axis, time-misaligned load data can be profiled and consumption features extracted. Using this model, two load types are defined and their load patterns are extracted to classify consumption behavior with DTW. The classification methodology is discussed in detail. To evaluate the proposed profiling model, we analyze time-series load data measured by a smart meter in a real case. The results demonstrate the effectiveness of the proposed method, which achieves an F-score of 0.8372 for load type clustering in the best case and an overall accuracy of 77.17% for power profiling.

Preliminary results on using clustering of functional data to identify patients with Alzheimer’s disease by analyzing brain MRI scans (2025)
Anton, Calin; Anton, Cristina; El-Hajj, Mohamad; Craner, Matthew; Lui, Richard
This study evaluates the effectiveness of funWeightClust, a model-based clustering technique that uses functional linear regression models, in identifying patients diagnosed with Alzheimer’s disease.
We analyzed voxelwise fractional anisotropy data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, focusing on the cingulum and corpus callosum, regions that are critical to understanding the disease’s impact on brain structure. Through a series of experiments, we established that funWeightClust effectively distinguishes patients with Alzheimer’s disease from healthy control subjects. Notably, the clustering model produced even more accurate results when the analysis focused on specific brain regions, such as the left hippocampus and the splenium. We hypothesize that integrating additional biomarkers could further improve the accuracy and reliability of funWeightClust in identifying patients who show signs of Alzheimer’s disease.