Projects


Google Business Intelligence Capstone Project

Create a business intelligence solution by documenting stakeholder and project requirements and creating a strategy document, mockups, an ETL process, and a Tableau dashboard.

Detection of Click Fraud

Use data mining techniques for detection of advertising click fraud.

Motivational Meme Generator

A Python project that creates memes from motivational quotes and pet images.

Classification of Job Listings

Use Natural Language Processing (NLP) to classify online Data Science job postings.

Google Data Analytics Capstone Project

Performed a data analysis of fitness tracker data following the six phases of the data analysis life cycle: Ask, Prepare, Process, Analyze, Share, and Act. The goal was to draw useful insights from the data to help stakeholders develop a marketing strategy.

Market Basket Analysis on US Census Data

Perform Market Basket Analysis on United States Census data to extract patterns and associations from New York State demographic data.

ANOVA for Health Care and ESG Stocks

Use Analysis of Variance (ANOVA) to determine whether Health Care stocks and ESG stocks should be considered together as a diversification strategy.

Learning Regression in New York State

Analysis of learning regression among school-age children in New York State during COVID-19. Learning regression is defined as a decline in student learning, as measured by the graduation rate among school-age children.

NEO Project

Binary classification of Near-Earth Objects (asteroids) using NASA data. Technologies include Python, Apache Spark, AWS Postgres, Jupyter notebooks, Machine Learning, and Tableau.

A fast-paced, dynamic program that covers specialized skills in data analytics, including intermediate Excel, Python, JavaScript, HTML/CSS, API interactions, SQL, Tableau, fundamental statistics, machine learning, R, and Git/GitHub.

Data Engineering Projects

A repository of data engineering projects covering HDFS, Apache Kafka, Apache Airflow, and PySpark (the Python API for Apache Spark).

ETL Data Pipelines for COVID-19 Data Analysis

ETL data pipelines orchestrated in Apache Airflow to load worldwide demographic and COVID-19 case data into an Amazon Redshift data warehouse.

ETL Process in Python

An ETL (Extract, Transform, Load) process for loading JSON song and artist data into a Postgres database.

PISA Data Analysis

PISA (Program for International Student Assessment) is a survey assessment of reading, mathematics, and science; its 2012 round represents about 28 million 15-year-olds globally. This project attempts to identify similarities and differences between the United States and the two top-performing countries in math literacy, China and Singapore.

A/B Testing

Analyze the results of an A/B test run by an e-commerce website to help the company decide whether to adopt the new landing page, keep the old page, or run the experiment for a longer period.

Data Wrangling Project

Cleaning and analysis of an X (formerly Twitter) data feed.
