I am a data scientist with over two years of experience applying machine learning and natural language processing techniques to real-world challenges in marketing, economics, and finance. With a strong foundation in mathematics and economic research, I bring analytical depth to data-driven projects.
My recent work has focused on developing predictive models and text-based solutions to support decision-making and business strategy. I hold a Master’s degree in Data Science with a specialization in Computational Intelligence from Coventry University.
Issued by Microsoft – May 2025
Issued by IBM – June 2024
Issued by Google – November 2024
Supervised ML: Regression and Classification
Advanced Learning Algorithms
Issued by DeepLearning.AI – May 2025
Deep Learning with PyTorch: Image Segmentation
PyTorch Essential Training: Deep Learning
I built an obesity level prediction system using machine learning algorithms in PySpark. I explored data distributions in Tableau, applied z-score normalization, and tuned hyperparameters with grid search and 5-fold cross-validation. Logistic regression performed best, with an F1-score and accuracy of 0.944, followed by random forest (0.939) and decision tree (0.904); all three models outperformed the baseline.
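Here is a minimal stdlib Python sketch of the tuning loop described above. It is not the PySpark code itself. The z-score statistics are fitted on each training fold only, and every combination in a parameter grid is scored by 5-fold cross-validation. The function names and the `train_and_score` callable are hypothetical. In PySpark, the same idea would usually be expressed with `StandardScaler(withMean=True, withStd=True)` inside a `Pipeline`, tuned by `CrossValidator(numFolds=5)` over a `ParamGridBuilder` grid.

```python
import statistics
from itertools import product


def zscore_fit(rows):
    """Per-column mean and population standard deviation of the training rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def zscore_apply(rows, means, stds):
    """Scale each value to (x - mean) / std using training-fold statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def k_fold_indices(n, k=5):
    """Yield (train, validation) index lists for k contiguous folds.
    Shuffle the rows beforehand if they are ordered."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        val = list(range(start, start + size))
        train = [i for i in range(n) if not start <= i < start + size]
        yield train, val
        start += size


def grid_search_cv(X, y, param_grid, train_and_score, k=5):
    """Return (best_params, mean_score) over every combination in param_grid.
    train_and_score(X_train, y_train, X_val, y_val, params) is a hypothetical
    callable that fits a model and returns a validation score (higher is better)."""
    names = sorted(param_grid)
    best = None
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for train, val in k_fold_indices(len(X), k):
            # Fit the scaler on the training fold only, then apply it to both folds.
            means, stds = zscore_fit([X[i] for i in train])
            X_train = zscore_apply([X[i] for i in train], means, stds)
            X_val = zscore_apply([X[i] for i in val], means, stds)
            scores.append(train_and_score(X_train, [y[i] for i in train],
                                          X_val, [y[i] for i in val], params))
        mean_score = statistics.fmean(scores)
        if best is None or mean_score > best[1]:
            best = (params, mean_score)
    return best
```

Fitting the scaler inside each fold keeps validation statistics out of the training transform. This prevents the cross-validated score from being optimistically biased.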
I built a credit risk prediction system that flags consumers likely to default on their credit card payments. To address class overlap and data imbalance, I engineered payment-consistency and payment-delay features and applied SMOTE within stratified 5-fold cross-validation. After tuning five machine learning methods with randomized search, I found that XGBoost achieved the highest validation ROC-AUC (0.77) and the strongest test performance (0.73 recall, 0.72 F1-score), followed by SVM and KNN; all models outperformed the baselines.
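The key detail above is that SMOTE runs inside each fold, on training rows only, so synthetic points never leak into validation. The plain-Python sketch below illustrates that ordering. It uses a simplified SMOTE that interpolates towards one of the k nearest minority-class neighbours, and a hypothetical `fit_and_score` callable that stands in for the five tuned models. In practice, imbalanced-learn's `SMOTE` inside an `imblearn.pipeline.Pipeline`, evaluated with scikit-learn's `StratifiedKFold`, achieves the same effect.

```python
import math
import random


def stratified_folds(labels, k=5, seed=0):
    """Assign each index to one of k folds, preserving class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return folds


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points, each on the segment between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def oversampled_training_set(X, y, train_idx, minority_label=1, k=5, rng=None):
    """Balance a binary training fold with SMOTE; validation rows are untouched."""
    X_tr = [X[i] for i in train_idx]
    y_tr = [y[i] for i in train_idx]
    minority = [x for x, label in zip(X_tr, y_tr) if label == minority_label]
    n_new = (len(y_tr) - len(minority)) - len(minority)  # majority - minority
    synthetic = smote(minority, n_new, k, rng) if n_new > 0 else []
    return X_tr + synthetic, y_tr + [minority_label] * len(synthetic)


def cross_validate(X, y, fit_and_score, k=5):
    """Stratified k-fold CV with oversampling applied to each training fold only."""
    folds = stratified_folds(y, k)
    scores = []
    for f, val_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        X_bal, y_bal = oversampled_training_set(X, y, train_idx)
        scores.append(fit_and_score(X_bal, y_bal,
                                    [X[i] for i in val_idx], [y[i] for i in val_idx]))
    return scores
```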
I built an AI-powered web application that makes long-form text easier to digest by generating concise summaries and highlighting key entities such as people, organisations, and locations. The app combines two pretrained NLP models: Facebook's bart-large-cnn, which produces clear, informative summaries, and dslim's bert-base-NER, which identifies named entities.
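This sketch shows the glue code around the two models. It assumes the models are loaded through the Hugging Face `transformers` `pipeline` API; the calls in the comment describe that assumed setup and are not executed here. Long inputs are split into sentence-aligned chunks because bart-large-cnn accepts at most 1,024 tokens. The word budget is a rough, hypothetical stand-in for a real token count. Entities are grouped using the `entity_group`, `word`, and `score` fields that the NER pipeline returns with `aggregation_strategy="simple"`.

```python
import re
from collections import defaultdict

# Assumed setup (not run here; it downloads the models):
#   from transformers import pipeline
#   summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
#   ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

# CoNLL-style groups emitted by bert-base-NER; MISC is deliberately ignored.
LABELS = {"PER": "people", "ORG": "organisations", "LOC": "locations"}


def chunk_text(text, max_words=400):
    """Split text into sentence-aligned chunks of at most ~max_words words.
    A single sentence longer than max_words still becomes its own chunk."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        words = sentence.split()
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


def group_entities(entities, min_score=0.8):
    """Group aggregated NER output by category, dropping low-confidence
    predictions and duplicate mentions."""
    grouped = defaultdict(list)
    for ent in entities:
        label = LABELS.get(ent["entity_group"])
        if label and ent["score"] >= min_score and ent["word"] not in grouped[label]:
            grouped[label].append(ent["word"])
    return dict(grouped)
```

Each chunk would be summarized separately and the partial summaries joined. The NER output for the full text is then passed through `group_entities` before display.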
I built a machine learning pipeline to predict benzene concentrations using Gaussian Process Regression (GPR) alongside other models. After cleaning data that was skewed by missing values encoded as -200, I extracted temporal features and compared four models. GPR performed best (RMSE: 2.33, R²: 0.883) and also provided uncertainty estimates to support better-informed environmental decisions.
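This compact, dependency-free sketch covers the three steps described above under a few assumptions. Missing readings use the -200 sentinel, timestamps are ISO-formatted (the real column layout may differ), and the GPR uses a single RBF kernel with hand-set hyperparameters instead of fitted ones. The regressor follows the standard Cholesky-based formulation, and its posterior variance supplies the uncertainty estimate. scikit-learn's `GaussianProcessRegressor` exposes the same quantity through `predict(X, return_std=True)`.

```python
import math
from datetime import datetime

MISSING = -200  # sentinel the source data uses for missing readings


def clean_sentinel(values, sentinel=MISSING):
    """Replace sentinel codes with None so they no longer skew statistics."""
    return [None if v == sentinel else v for v in values]


def temporal_features(timestamp):
    """Hour, weekday (Monday=0) and month from an ISO-format timestamp."""
    dt = datetime.fromisoformat(timestamp)
    return {"hour": dt.hour, "weekday": dt.weekday(), "month": dt.month}


def rbf(a, b, length_scale, variance):
    """Squared-exponential (RBF) kernel."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return variance * math.exp(-0.5 * sq_dist / length_scale ** 2)


def cholesky(K):
    """Lower-triangular L with L @ L.T == K (K must be positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][m] * x[m] for m in range(i))) / L[i][i])
    return x


def solve_upper_t(L, b):
    """Solve L^T x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[m][i] * x[m] for m in range(i + 1, n))) / L[i][i]
    return x


class GPRegressor:
    """Zero-mean GP on the centred target, RBF kernel, Gaussian noise variance."""

    def __init__(self, length_scale=1.0, variance=1.0, noise=1e-4):
        self.length_scale, self.variance, self.noise = length_scale, variance, noise

    def _k(self, a, b):
        return rbf(a, b, self.length_scale, self.variance)

    def fit(self, X, y):
        self.X = X
        self.y_mean = sum(y) / len(y)
        K = [[self._k(a, b) + (self.noise if i == j else 0.0)
              for j, b in enumerate(X)] for i, a in enumerate(X)]
        self.L = cholesky(K)
        # alpha = K^{-1} (y - mean), computed with two triangular solves
        self.alpha = solve_upper_t(self.L, solve_lower(self.L, [v - self.y_mean for v in y]))
        return self

    def predict(self, x):
        """Return the posterior mean and variance at a single input x."""
        k_star = [self._k(x, a) for a in self.X]
        mean = self.y_mean + sum(k * a for k, a in zip(k_star, self.alpha))
        v = solve_lower(self.L, k_star)
        var = self._k(x, x) - sum(vi * vi for vi in v)
        return mean, max(var, 0.0)
```

Far from the training inputs, the posterior mean reverts to the training mean and the variance grows back to the kernel variance. This is how the model signals low confidence.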
This project tackles the SemEval-2017 Task 4A challenge: sentiment classification of English-language tweets. The goal is to classify each tweet as positive, neutral, or negative using supervised machine learning models.
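The paragraph does not name the models used, so the following is only an illustrative baseline. It uses tweet-specific tokenization, in which replacing URLs and @-mentions with placeholder tokens is an assumption. The classifier is multinomial Naive Bayes with add-one smoothing. The evaluation metric is macro-averaged recall, the primary measure SemEval-2017 Task 4 used to rank Subtask A systems.

```python
import math
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z']+|[!?]")


def tokenize(tweet):
    """Lowercase, replace URLs and @-mentions with placeholders, keep words and !/?."""
    text = tweet.lower()
    text = URL_RE.sub(" url ", text)
    text = MENTION_RE.sub(" user ", text)
    return TOKEN_RE.findall(text)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.priors = Counter(labels)
        self.n_docs = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_score(self, tokens, c):
        """Log prior plus smoothed log likelihood of the tokens under class c."""
        score = math.log(self.priors[c] / self.n_docs)
        for tok in tokens:
            if tok in self.vocab:  # ignore words never seen in training
                score += math.log((self.counts[c][tok] + 1) / (self.totals[c] + len(self.vocab)))
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(self.classes, key=lambda c: self.log_score(tokens, c))


def macro_recall(y_true, y_pred, classes=("negative", "neutral", "positive")):
    """Average of per-class recall over the classes present in y_true."""
    recalls = []
    for c in classes:
        idx = [i for i, label in enumerate(y_true) if label == c]
        if idx:
            recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)
```

Macro-averaged recall weights each class equally. A classifier that simply predicts the majority class therefore scores poorly even when one sentiment dominates the data.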
In this project, we predict customer churn from features such as payment method, contract type, and whether a customer streams TV or movies. Accurate churn prediction helps companies identify likely 'churners' early and take action to retain them.
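This small plain-Python sketch shows two common first steps for tabular churn data like this: one-hot encoding the categorical features mentioned above and computing the churn rate for each category. The column names (`Contract`, `PaymentMethod`, `Churn`) and the "Yes"/"No" label encoding are hypothetical. The paragraph also does not say which model was trained on the encoded features.

```python
from collections import Counter


def one_hot(rows, columns):
    """One-hot encode categorical columns; returns (feature_names, matrix)."""
    categories = {col: sorted({row[col] for row in rows}) for col in columns}
    names = [f"{col}={value}" for col in columns for value in categories[col]]
    matrix = [
        [1 if row[col] == value else 0 for col in columns for value in categories[col]]
        for row in rows
    ]
    return names, matrix


def churn_rate_by(rows, column, target="Churn", positive="Yes"):
    """Share of customers in each category of `column` whose target equals `positive`."""
    totals, churned = Counter(), Counter()
    for row in rows:
        totals[row[column]] += 1
        churned[row[column]] += row[target] == positive
    return {category: churned[category] / totals[category] for category in totals}
```

Per-category churn rates give a quick view of which contract types or payment methods are associated with leaving. The one-hot matrix is the input format most classifiers expect.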