
Summary
I built a book rating prediction algorithm in Python, using collaborative filtering to identify users with similar tastes and generate predictions for how a user might rate a given book. This was a super exciting project for me as the algorithm I developed works similarly to how the algorithms behind social media do. I wanted to understand more about how companies create absurdly addictive content recommendation systems. The reason I created a book rating prediction as opposed to a recommendation system is it’s a lot easier to test, while still having the essence of a recommendation system. I compared the predicted score to the actual score a user gave to a book to assess how effective my algorithm was.
The project also introduced me to handling larger datasets efficiently. While the data could be filtered to fit into memory, I used Dask to parallelise computations and make the algorithm scalable for even larger datasets. I created an algorithm that had a time complexity of O(n), so it scaled linearly with problem size. There were a few flaws with my design which I outlined in the “Critique of Design and Project” section of my report, but overall I learnt a lot about how recommendation systems work.
Key Skills
-
Python
-
Sklearn, Dask
-
Parallel computing
-
​Cleaning, analysing and visualising data from a large dataset​
-
Solo project time management

