Big Data - Queen's University 2024

This project analyzed customer reviews on Amazon.com, using data analysis and machine learning to identify what makes a review helpful. Working in Apache Spark on Databricks, the team first explored trends in the data to find patterns associated with review helpfulness. The work then extended into a Kaggle competition, where the challenge was to build a machine learning model that accurately predicts review helpfulness from both existing and engineered features. The team iterated on its code and models, using ChatGPT or other large language models (LLMs) to help optimize the approach and make it more efficient. Two insights stood out: feature engineering plays a critical role in model accuracy, and more powerful computational resources shorten model execution time. The result was a comprehensive strategy showing how data-driven analysis can reveal the factors that make online product reviews helpful, which in turn can support customer satisfaction and informed purchasing decisions on one of the world's largest online retail platforms.
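
To make the idea of "engineered features" concrete, the following is a minimal sketch in plain Python of how a few simple features might be derived from a single review. The function name, the input fields (review text and star rating), and the specific features are all illustrative assumptions; the summary above does not say which features the team actually built, and the team's own work ran on Spark rather than plain Python.

```python
import re


def engineer_features(review_text: str, rating: float) -> dict:
    """Derive a few hypothetical features from one review.

    These features are illustrative assumptions, not the team's actual set.
    """
    # Treat runs of letters and apostrophes as words.
    words = re.findall(r"[A-Za-z']+", review_text)
    n_words = len(words)

    # Rough sentence count: split on runs of terminal punctuation
    # and ignore empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", review_text) if s.strip()]

    return {
        "char_count": len(review_text),
        "word_count": n_words,
        "sentence_count": len(sentences),
        "avg_word_length": (sum(len(w) for w in words) / n_words) if n_words else 0.0,
        "exclamation_count": review_text.count("!"),
        # Distance from a neutral 3-star rating (assumes a 1-5 star scale).
        "rating_extremity": abs(rating - 3.0),
    }


# Example review text and rating are made up for illustration.
features = engineer_features("Great blender! Crushes ice easily.", 5.0)
print(features)
```

Features like these would then sit alongside the existing columns as model inputs; whether any of them helps prediction is something only experimentation on the actual data could show.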