Surya's Portfolio

NASA NEOs Classification

The aim of this project in R is to classify potentially hazardous asteroids and comets according to their risk of impacting Earth.

Data is obtained from NASA's API for Near Earth Objects (NEOs) and saved in JSON format. Two classifiers, Support Vector Machines and Random Forest, are compared.
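As a rough illustration of the step from saved JSON to classifier input, the sketch below flattens one NEO record into a feature row and a hazard label. The project itself is written in R, so this Python version is only an analogue. The field names are an assumption about the shape of NASA's NEO responses, the sample values are made up, and `neo_to_row` is a hypothetical helper.

```python
import json

# Illustrative record only: field names are assumed to follow NASA's NEO
# response shape, and every value here is made up.
SAMPLE = """
{
  "name": "(hypothetical) 2020 AB",
  "absolute_magnitude_h": 21.3,
  "estimated_diameter": {
    "kilometers": {"estimated_diameter_min": 0.12, "estimated_diameter_max": 0.28}
  },
  "is_potentially_hazardous_asteroid": true,
  "close_approach_data": [
    {
      "relative_velocity": {"kilometers_per_second": "13.5"},
      "miss_distance": {"kilometers": "4500000"}
    }
  ]
}
"""


def neo_to_row(neo):
    """Flatten one NEO record into (features, label) for a classifier."""
    diameter = neo["estimated_diameter"]["kilometers"]
    approach = neo["close_approach_data"][0]  # first listed close approach
    features = {
        "absolute_magnitude": float(neo["absolute_magnitude_h"]),
        "diameter_mean_km": (
            diameter["estimated_diameter_min"] + diameter["estimated_diameter_max"]
        ) / 2,
        # Assumed to arrive as strings, so they are converted here.
        "velocity_km_s": float(approach["relative_velocity"]["kilometers_per_second"]),
        "miss_distance_km": float(approach["miss_distance"]["kilometers"]),
    }
    label = bool(neo["is_potentially_hazardous_asteroid"])
    return features, label


features, label = neo_to_row(json.loads(SAMPLE))
print(features, label)
```

Rows produced this way could then be passed to either classifier; which features the project actually uses is not stated here.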

GitHub for this project

Database creation - Clan Adventures

This database was designed and created in MySQL to hold fictional travel data for multiple families, with tables for families, activities, destinations, trips and expenses. The schema, showing the primary and foreign keys, is included. The exported database in the repository can be downloaded and run.

The scripts demonstrate table creation, data insertion, joins, a stored function, a subquery, a view, GROUP BY and a trigger.
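A minimal sketch of several of these features follows, using Python's built-in `sqlite3` module in place of MySQL so that it runs without a server. The table layout, column names, family names and amounts are all hypothetical and are not taken from the repository. The stored function is left out because SQLite has no direct equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical, simplified schema: only three of the tables are sketched.
conn.executescript("""
CREATE TABLE families (
    family_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE trips (
    trip_id     INTEGER PRIMARY KEY,
    family_id   INTEGER NOT NULL REFERENCES families(family_id),
    destination TEXT NOT NULL
);
CREATE TABLE expenses (
    expense_id INTEGER PRIMARY KEY,
    trip_id    INTEGER NOT NULL REFERENCES trips(trip_id),
    amount     REAL NOT NULL
);
CREATE TABLE expense_log (
    expense_id INTEGER,
    note       TEXT
);

-- Trigger: record every new expense in a log table.
CREATE TRIGGER log_expense AFTER INSERT ON expenses
BEGIN
    INSERT INTO expense_log VALUES (NEW.expense_id, 'expense added');
END;

-- View: total spend per family, built from joins and GROUP BY.
CREATE VIEW family_spend AS
SELECT f.name AS family, SUM(e.amount) AS total
FROM families f
JOIN trips t    ON t.family_id = f.family_id
JOIN expenses e ON e.trip_id = t.trip_id
GROUP BY f.family_id, f.name;
""")

# Made-up rows for illustration.
conn.executemany("INSERT INTO families VALUES (?, ?)",
                 [(1, "MacLeod"), (2, "Fraser")])
conn.executemany("INSERT INTO trips VALUES (?, ?, ?)",
                 [(1, 1, "Skye"), (2, 2, "Oban"), (3, 1, "Perth")])
conn.executemany("INSERT INTO expenses VALUES (?, ?, ?)",
                 [(1, 1, 120.0), (2, 2, 80.0), (3, 3, 50.0)])

totals = conn.execute(
    "SELECT family, total FROM family_spend ORDER BY family"
).fetchall()

# Subquery: families whose total spend is above the average family total.
above_avg = conn.execute("""
    SELECT family FROM family_spend
    WHERE total > (SELECT AVG(total) FROM family_spend)
""").fetchall()

log_count = conn.execute("SELECT COUNT(*) FROM expense_log").fetchone()[0]
print(totals, above_avg, log_count)
```

The MySQL scripts in the repository will differ in syntax and detail; the sketch only shows how the pieces relate.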

GitHub for this project

Image Classification to aid Mask Enforcement

The aim of this project in Python is to classify facial images to help enforce COVID mask rules. It checks whether a detected face is correctly masked, incorrectly masked or unmasked. The dataset contains good variation in age, ethnicity and gender.

For comparison, four types of pre-processing are applied with each of the four chosen classifiers: linear SVC, AdaBoost, Random Forest and a convolutional neural network (CNN). CDSMOTE, a hybrid approach to correcting class imbalance, is applied as well.

GitHub for this project

Tartan Toy Emporium

This portfolio project in Power BI is built on the sales data of a fictional toy store in Scotland with multiple branches. Once profiled and cleaned, the data is modelled as a star schema.

The published interactive KPI report uses card visuals, slicers and a date hierarchy.

GitHub for this project

Data Mining and Analysis

This data mining project in R works through several tasks in exploring and analysing an unfamiliar dataset. It covers data exploration, preprocessing, classification and clustering, and draws additional insights from the data.

GitHub for this project

Natural Language Processing

This project in Python is a baseline version of a larger project that cannot be shared for proprietary reasons. It uses the Stanford Natural Language Inference (SNLI) Corpus, a collection of over 570,000 sentence pairs (each a premise and a hypothesis) with a label assigned to each pair.
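To make the data shape concrete, here is a small sketch that reads premise-hypothesis pairs from JSON lines and counts the labels. It assumes the `sentence1`, `sentence2` and `gold_label` fields of the corpus's JSONL distribution, with `-` marking pairs that have no agreed label. The rows are made up rather than taken from the corpus, and `load_pairs` is a hypothetical helper, not the project's own loader.

```python
import json
from collections import Counter

# Made-up rows in the style of the SNLI JSONL files (not from the corpus).
LINES = [
    '{"sentence1": "A dog runs across a field.", "sentence2": "An animal is outside.", "gold_label": "entailment"}',
    '{"sentence1": "A dog runs across a field.", "sentence2": "A cat sleeps indoors.", "gold_label": "contradiction"}',
    '{"sentence1": "A dog runs across a field.", "sentence2": "The dog is chasing a ball.", "gold_label": "neutral"}',
    '{"sentence1": "A man reads.", "sentence2": "A man waits.", "gold_label": "-"}',
]

VALID_LABELS = {"entailment", "contradiction", "neutral"}


def load_pairs(lines):
    """Return (premise, hypothesis, label) tuples, skipping unlabelled pairs."""
    pairs = []
    for line in lines:
        row = json.loads(line)
        if row["gold_label"] not in VALID_LABELS:
            continue  # no agreed gold label for this pair
        pairs.append((row["sentence1"], row["sentence2"], row["gold_label"]))
    return pairs


pairs = load_pairs(LINES)
label_counts = Counter(label for _, _, label in pairs)
print(len(pairs), dict(label_counts))
```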

GitHub for this project

Hospital Admission due to Unintentional Injuries

This portfolio project in R explores NHS hospital admissions due to unintentional injuries in Scotland, combined with Council Area and Health Board data. A visualisation is created for a specific scenario.

GitHub for this project

Data Warehousing and Analysis

This project is implemented in Microsoft SSIS using sales data for a fictitious company. An ETL (Extract, Transform, Load) process is used to model the data and to extract, transform and load it from an Excel source. The transaction data is then analysed using DAX and PivotTables, answering questions such as month-to-date (MTD) profit and product brand rank.
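The two example questions can be sketched in plain Python to show the arithmetic behind them. This is an analogue, not the project's DAX. The transactions, brand names and the helpers `mtd_profit` and `brand_rank` are hypothetical, and ties between brands are not given special handling.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (date, brand, profit). All values are made up.
SALES = [
    (date(2023, 5, 2), "Acme", 100.0),
    (date(2023, 5, 10), "Bolt", 250.0),
    (date(2023, 5, 20), "Acme", 75.0),
    (date(2023, 4, 28), "Bolt", 40.0),
    (date(2023, 5, 12), "Crest", 90.0),
]


def mtd_profit(sales, as_of):
    """Sum profit from the first day of as_of's month up to as_of inclusive."""
    start = as_of.replace(day=1)
    return sum(profit for day, _, profit in sales if start <= day <= as_of)


def brand_rank(sales):
    """Rank brands by total profit, with rank 1 the most profitable."""
    totals = defaultdict(float)
    for _, brand, profit in sales:
        totals[brand] += profit
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(rank, brand, total)
            for rank, (brand, total) in enumerate(ordered, start=1)]


mtd = mtd_profit(SALES, date(2023, 5, 15))
ranking = brand_rank(SALES)
print(mtd, ranking)
```

With 15 May as the cut-off, the April sale and the 20 May sale both fall outside the month-to-date window.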

GitHub for this project

More Power BI

Coming Soon.

Surya L Ramesh