
From Contextual Recommender Systems to a Transformer-Based Architecture


At Peloton, there are two major avenues for Members to find their workout classes: the on-demand library and the home screen. The on-demand library is a catalog view of all classes available across the Peloton ecosystem, and users can apply various filters and sorting or perform a free-form text search to find classes. The home screen, much like in other content-streaming services, is the first screen that Members see upon logging in and serves as the gateway to the Peloton experience. Over the last few years, the variety of our platform’s classes has grown significantly. We now boast a total of 53K instructor-led classes, 120+ class types, 50+ instructors, and 10+ fitness disciplines. With such diversity, it can sometimes feel overwhelming for users to navigate through all the offerings.

This is why, since its inception, the Personalization team has been focused on providing our users with the best possible home screen recommendations.

We have gone through a multi-stage journey with the goal of providing the best recommendations to our users and enhancing their fitness experience. We began in 2022 with our earliest recommendation model, LSTMNet. This model applied sequential modeling to our Members’ workout history to get a sense of their progression over time. We then improved on LSTMNet by moving to Contextually Aware Recommender Systems (CARS). With CARS, we enriched our home screen recommendations with contextual features like the amount of time the user has to work out, the instructors and class types a user gravitates toward, and the user’s recent fitness state.

However, the CARS model had some shortcomings:

  1. The training data did not include visibility into what Members saw in their recommendations; it only had visibility into their workouts. This was a problem because if the model kept recommending the same class every day and the Member kept ignoring it every day, the model would never receive that implicit negative feedback. 
  2. When we moved from LSTMNet to CARS, we gained the ability to use complex features in the model, which led to improvements, but at the same time we lost a clean framework for making the model understand the sequential nature of historical workouts. A model architecture that can do both should ideally perform better.
  3. Finally, the framework used to train our models had a myopic view of the world. The model is provided with labeled training data from a fixed period and is tasked with learning it as well as it can. It does not remember what happened before that window, nor does it persist an understanding of user behavior that could help reduce uncertainty in the future. 

Addressing all of the shortcomings listed above requires solving a multitude of interdependent ML and engineering problems, which can get complicated quickly. It is useful, in such cases, to organize the work around a firm central framework. We chose a contextual bandit built on an online learner to provide that central statistical underpinning. A contextual bandit framework allows us to understand how the context - such as the user and class data at a specific point in time - influences the model’s selection of recommendations (the actions) and how these actions translate into the rewards received. To bring back the model’s ability to understand the sequential nature of workouts, we decided to use a transformer. Basing the model on an online learning technique freed us from the restriction of a fixed window of training data and let us maintain a long-term understanding of user behavior.
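To make the framing concrete, here is a minimal Python sketch of a contextual bandit loop of this shape. The model object and its score and partial_fit methods are hypothetical stand-ins for illustration, not our production API:

```python
# Minimal sketch of the contextual bandit loop: context in, ranked
# actions out, observed rewards fed back into an incremental update.
# `model.score` and `model.partial_fit` are hypothetical stand-ins.

def recommend(model, user_context, candidate_classes, k=10):
    # Context: user and class data at this point in time.
    scored = [(c, model.score(user_context, c)) for c in candidate_classes]
    # Actions: the top-k classes we surface on the home screen.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [c for c, _ in scored[:k]]

def observe(model, user_context, shown_class, converted):
    # Reward: 1 if the impression converted to a workout, else 0.
    model.partial_fit(user_context, shown_class,
                      reward=1.0 if converted else 0.0)
```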

Use of impressions to generate training data

An impression I is defined as the point in time when User u sees Class c on the home screen:

I_u,c,t → User u sees Class c at Time t on the home screen

Similarly, a workout W is defined as the point in time when User u completes Class c:

W_u,c,t → User u completes Class c at Time t

Completing a workout, i.e., taking a class, is a strong implicit positive signal that a user enjoys that class. Not taking a class shown on the home screen is likewise an implicit negative signal that the user is not interested in it. As mentioned above, CARS didn’t have visibility into what Members saw in their recommendations, which means it never received implicit negative signals from classes a user saw but did not like. To solve this problem, we decided to incorporate impressions into model training. Using events fired from the home screen, we were able to identify the classes a user saw there. Then, by cross-referencing those impressions with the workouts the user completed, we could label each class as a positive or a negative sample.

Mathematically, this can be represented as:

(I_u,t ∩ W_u,t) ∪ (W_u,t - (I_u,t ∩ W_u,t)) = P_u

I_u,t - (I_u,t ∩ W_u,t) = N_u

Where 

u = User

t = point in time 

I_u,t are the impressions seen by User u at Time t

W_u,t are the classes taken by User u at Time t

P_u are the positive samples we obtain for User u. These are the classes that the user saw on the home screen and converted to a workout, plus any classes the user completed even if they were not recommended on the home screen (e.g., found directly via Search or the On-Demand Library).

N_u are the negative samples we obtain for User u. These are the set of classes that the user saw on the home screen but did not convert to a workout.
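As a concrete illustration, the labeling above reduces to a few set operations. Here is a minimal Python sketch, assuming impressions and workouts are available as sets of class IDs per user over the training window (a hypothetical data layout):

```python
# Minimal sketch of the labeling logic using Python sets.
# `impressions` and `workouts` are sets of class IDs for one user
# over the training window (a hypothetical data layout).

def label_samples(impressions: set, workouts: set):
    converted = impressions & workouts      # I ∩ W: seen and taken
    off_surface = workouts - converted      # W - (I ∩ W): taken via Search / On Demand
    positives = converted | off_surface     # P_u (note this simplifies to `workouts`)
    negatives = impressions - converted     # N_u: seen but never taken
    return positives, negatives

impressions = {"yoga_10", "strength_20", "cycling_30"}
workouts = {"strength_20", "cycling_45"}    # cycling_45 was found via Search
positives, negatives = label_samples(impressions, workouts)
# positives == {"strength_20", "cycling_45"}
# negatives == {"yoga_10", "cycling_30"}
```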

This shift away from the random negative sampling we had used previously massively improved how the model learns.

Workout history

The second problem we mentioned above was that we lost the ability to model the sequential nature of historical workouts when we shifted from LSTMNet to CARS.

The Personalization team has invested in a feature store that enables us to compute new types of dynamic (time-dependent) features. The best way to use these features in a model is to contextualize them in the sequence of historical workouts. We call this feature the user workout history. The intuition behind using the workout history is to get a sense of how the model would rank a class based on the user’s past workouts. 

As an example, consider a user whose workout history shows an affinity for Yoga, Strength, and Cycling. The user takes 10-minute Yoga classes and 30- or 45-minute Strength and Cycling classes. We also see that the user favors certain instructors, like Alex Toussaint, Adrian Williams, and Aditi Shah, and likes to alternate between Strength and Cycling while intermittently taking a Yoga class to relax. The question we want the model to answer is: given this workout history, what is the probability that the user takes a 20-minute Strength class by Olivia Amato next?

To build this intuition into our model we needed to embed the workout history into something an ML model can understand. A transformer model is a neural network architecture designed to process sequential data using a self-attention mechanism, which captures contextual relationships between elements in a sequence. This is why we decided to use a transformer to encode the workout history and use the output embeddings when ranking a class for the user.
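A minimal PyTorch sketch of this idea follows. The dimensions, the learned positional embedding, and the mean pooling are illustrative assumptions, not our production configuration:

```python
import torch
import torch.nn as nn

class WorkoutHistoryEncoder(nn.Module):
    """Encodes a sequence of past workouts into a single user embedding.
    All hyperparameters here are illustrative, not production values."""

    def __init__(self, n_features: int, d_model: int = 64, n_heads: int = 4,
                 n_layers: int = 2, max_len: int = 100):
        super().__init__()
        self.input_proj = nn.Linear(n_features, d_model)
        # Learned positional embedding so the model sees workout order.
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, seq_len, n_features), one row per past workout
        # (e.g. discipline, duration, instructor, recency features).
        h = self.input_proj(history) + self.pos[:, : history.size(1)]
        h = self.encoder(h)     # self-attention over the workout sequence
        return h.mean(dim=1)    # pool into one history embedding

# The history embedding is then combined with candidate-class features
# when ranking, e.g. to score a 20-minute Strength class for this user.
encoder = WorkoutHistoryEncoder(n_features=16)
user_embedding = encoder(torch.randn(1, 30, 16))  # batch of 1, last 30 workouts
```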

Online learning with cost savings

The third shortcoming of CARS was its myopic view of the world. We used to train the CARS model on data going back about 1 year. However, we wanted to implement not just a smarter model but also a cost-effective way to train our model. We implemented an online learning technique to achieve this. We split our training data by date partitions and fine-tuned the most recent model on the data changes (delta) between the last training session and the current date.

This can be mathematically represented as 

M_d = f(M_d-1, D_d)

where M_1 = f'(D_0, ..., D_n)

Here, M_d is the model trained on day d and D_d is the dataset for day d. The first model we train, M_1, is initialized with a cold-start training run on data from day 0 to day n, where n is a hyperparameter we tuned.
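In code, the daily recurrence looks something like the following sketch. load_data, load_model, cold_start_train, fine_tune, and save_model are hypothetical helpers standing in for the actual training pipeline:

```python
# Sketch of the daily incremental-training recurrence M_d = f(M_d-1, D_d).
# All helper functions here are hypothetical stand-ins for the pipeline.

def train_for_day(d: int, n: int):
    if d == 1:
        # Cold start: train from scratch on days 0..n
        # (n is the tuned hyperparameter from the text).
        model = cold_start_train(load_data(first_day=0, last_day=n))
    else:
        # Warm start: load yesterday's model and fine-tune it on
        # just the one-day delta of new impressions and workouts.
        model = load_model(day=d - 1)
        model = fine_tune(model, load_data(first_day=d, last_day=d))
    save_model(model, day=d)
    return model
```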

Using online learning, we are now able to maintain a long-running understanding of a user’s behavior. This also significantly reduced the time and cost of training our models, since we now fine-tune on a single day of data rather than retraining on a massive dataset.

Conclusion and looking ahead

We successfully demonstrated an approximately 5% improvement in home screen click-through rate using this new framework via an A/B test. We also launched this new model on the Peloton Bike, Peloton Tread, and Peloton App while reducing the cost to train our model by approximately 128x and the time to train our models by approximately 48x.

We want to thank our team members Neel Talukder and Munesh Bandaru, who were instrumental in building out our feature store and several features, including the user workout history, which was key to our model.

Though we currently do not have any improvements planned on this model, we would like to explore the possibility of real-time inference in the future in order to dynamically generate recommendations.

We hope that the learnings shared in this blog can help other teams looking to build out recommender systems. We have an exciting roadmap ahead as we keep tackling the very rewarding and unique challenges we face at the intersection of recommender systems and fitness. If you or anyone you know is interested in working with us, please visit our careers page and join the ride!

Glossary

Class: A class is defined as an instance of an on-demand video that our members use to exercise. Classes belong to a wide range of fitness disciplines, are filmed by instructors, and have a certain duration.

Online Learning: Online machine learning is a method of machine learning in which data becomes available in a sequential order and is used to update the best predictor for future data at each step. More information on the Wikipedia page.

Workout: A workout is an instance of a user completing a class at a point in time.

 

Written by Mohit Jeste & Nganba Meetei
