Data Consultant, Speaker, Trainer and STEM Ambassador with an MSc in Data Science. Experienced across a wide range of data skills, with a focus on helping clients get more value from their data. Currently working on AI, NLP and Power BI projects.
The aim of this project in R is to classify potentially hazardous asteroids and comets according to their risk of impacting Earth.
Data is obtained from NASA's API for Near Earth Objects (NEOs) and saved in JSON format. Two classifiers, Support Vector Machines and Random Forest, are compared.
A database designed and created in MySQL to hold fictional travel data for multiple families, with tables for families, activities, destinations, trips and expenses. The database schema, showing the primary and foreign keys, is included. The exported database in the repository can be downloaded and run.
The scripts demonstrate table creation, data insertion, joins, a stored function, a subquery, a view, GROUP BY and a trigger.
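As a rough illustration of a few of these constructs (joins, a view, GROUP BY and a trigger), here is a minimal sketch. It uses Python's built-in sqlite3 module instead of MySQL so that it runs without a server, and every table and column name in it is hypothetical rather than taken from the repository. SQLite syntax also differs from MySQL in places, so treat it as a sketch of the ideas only.

```python
import sqlite3

# In-memory SQLite database; the actual project uses MySQL.
# All table and column names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE families (family_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE trips (
    trip_id INTEGER PRIMARY KEY,
    family_id INTEGER NOT NULL REFERENCES families(family_id),
    destination TEXT NOT NULL
);
CREATE TABLE expenses (
    expense_id INTEGER PRIMARY KEY,
    trip_id INTEGER NOT NULL REFERENCES trips(trip_id),
    amount REAL NOT NULL
);
CREATE TABLE expense_log (expense_id INTEGER, note TEXT);

-- Trigger: record every newly inserted expense in a log table.
CREATE TRIGGER log_expense AFTER INSERT ON expenses
BEGIN
    INSERT INTO expense_log VALUES (NEW.expense_id, 'inserted');
END;

-- View: total spend per family, built from joins and GROUP BY.
CREATE VIEW family_spend AS
SELECT f.name AS family, SUM(e.amount) AS total
FROM families f
JOIN trips t ON t.family_id = f.family_id
JOIN expenses e ON e.trip_id = t.trip_id
GROUP BY f.family_id, f.name;

INSERT INTO families VALUES (1, 'Smith'), (2, 'Jones');
INSERT INTO trips VALUES (1, 1, 'Rome'), (2, 2, 'Oslo');
INSERT INTO expenses (trip_id, amount) VALUES (1, 120.0), (1, 80.0), (2, 50.0);
""")

# Query the view and check how many rows the trigger wrote.
spend = conn.execute(
    "SELECT family, total FROM family_spend ORDER BY family"
).fetchall()
log_rows = conn.execute("SELECT COUNT(*) FROM expense_log").fetchone()[0]
print(spend)
print(log_rows)
```

The trigger fires once per inserted row, so the log table ends up with one entry for each expense.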
The aim of this project in Python is to classify face images to help enforce COVID mask rules. It checks whether a detected face is correctly masked, unmasked or incorrectly masked. The dataset contains good variation in age, ethnicity and gender.
For comparison, four types of pre-processing are applied to the four chosen classifiers (linear SVC, AdaBoost, Random Forest and a convolutional neural network (CNN)). CDSMOTE, a hybrid approach to correcting class imbalance, is applied as well.
This portfolio project in Power BI is designed for the sales data of a fictional toy store in Scotland with multiple branches. Once profiled and cleaned, the data is modelled as a star schema.
The published interactive report presents KPIs using card visuals, slicers and a date hierarchy.
This data mining project in R involves several tasks related to exploring and analysing an unfamiliar dataset. It covers data exploration, preprocessing, classification and clustering, along with additional insights drawn from the dataset.
This project in Python is a baseline version of a larger project that cannot be shared for proprietary reasons. It uses the Stanford Natural Language Inference (SNLI) Corpus, a collection of over 570,000 sentence pairs (each a premise and a hypothesis) with a label assigned to each pair.
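To make the shape of the data concrete, the sketch below models a labelled premise-hypothesis pair and counts labels across a tiny sample. The class name `NLIExample` and the three sentences are made up for illustration; they are not taken from the corpus or from the project code. The three label names are the standard SNLI classes.

```python
from collections import Counter
from dataclasses import dataclass

# The three SNLI class labels. Pairs without a consensus label
# ('-' in the released files) would need filtering out first.
LABELS = ("entailment", "contradiction", "neutral")


@dataclass(frozen=True)
class NLIExample:
    """One premise-hypothesis pair with its label (hypothetical class)."""
    premise: str
    hypothesis: str
    label: str

    def __post_init__(self):
        # Reject anything that is not one of the three class labels.
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


# Made-up examples for illustration; not taken from the corpus.
examples = [
    NLIExample("A man is playing a guitar on stage.",
               "A person is making music.", "entailment"),
    NLIExample("A man is playing a guitar on stage.",
               "The man is asleep at home.", "contradiction"),
    NLIExample("A man is playing a guitar on stage.",
               "The man is a famous musician.", "neutral"),
]

# Label distribution, a common first check before training a baseline.
label_counts = Counter(ex.label for ex in examples)
print(label_counts)
```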
This portfolio project in R explores NHS hospital admissions due to unintentional injuries in Scotland, combined with Council Area and Health Board data. A visualisation is created for a specific scenario.
This project is implemented in Microsoft SSIS using sales data for a fictitious company. An ETL (Extract, Transform, Load) process is used to model, extract, transform and load the data from Excel. The transaction data is then analysed using DAX and pivot tables, answering questions such as month-to-date (MTD) profit and product brand rank.
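For readers unfamiliar with the two example measures, the sketch below shows what MTD profit and a brand rank compute. It is written in plain Python rather than DAX, and the transactions, brand names and function names are all hypothetical. Ties in the ranking are broken alphabetically here, which is an arbitrary choice for the sketch.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (date, brand, profit).
transactions = [
    (date(2024, 3, 1), "Acme", 100.0),
    (date(2024, 3, 5), "Bolt", 250.0),
    (date(2024, 3, 9), "Acme", 50.0),
    (date(2024, 3, 20), "Core", 75.0),
    (date(2024, 2, 27), "Bolt", 500.0),
]


def mtd_profit(rows, as_of):
    """Sum profit from the first day of as_of's month up to and including as_of."""
    start = as_of.replace(day=1)
    return sum(profit for day, _, profit in rows if start <= day <= as_of)


def brand_rank(rows):
    """Rank brands by total profit (1 = highest); ties broken by brand name."""
    totals = defaultdict(float)
    for _, brand, profit in rows:
        totals[brand] += profit
    ordered = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return {brand: rank for rank, (brand, _) in enumerate(ordered, start=1)}


mtd = mtd_profit(transactions, date(2024, 3, 10))
ranks = brand_rank(transactions)
print(mtd)
print(ranks)
```

With an as-of date of 10 March, only the three March transactions up to that date count towards MTD profit; the February and 20 March rows are excluded.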