Warm Recommendations For The AI Cold-Start Problem

May 23, 2024

In our previous post, we explored solutions to kickstart an AI-driven product even if the initial training data is sparse. We discussed how you and your team might start collecting user data or sourcing existing internal or external training data. Based on this initial data, let’s assume that the machine learning algorithm was successfully trained and tuned, and it performs well when the user's interaction history is available. However, what happens when a new user starts using your product?

Today, we'll delve into the strategies for providing personalized recommendations, even for users who are interacting with your AI-driven product for the first time.

Need for Data, Part 2

Let's take the example of the 'classic' recommendation systems, given their widespread use. Imagine you are the Product Manager of a recently launched ecommerce product with such an ML-driven recommendation engine. You have noticed that the algorithm performs well when users' online purchase history is available, and it is quickly becoming a real value driver for your loyal customer base.

At the same time, the ecommerce site is acquiring new users rapidly, and you want to offer this key differentiating feature to them from the get-go. However, your Data Scientist warns that the current model isn't accurate enough until a user has completed at least 4 or 5 orders. That means it could take over a month for even the most frequent, weekly active users (such as regular online grocery shoppers) to experience this 'wow' feature, let alone users who engage with your product only once or twice a month.

So what would you, as the Product Manager, do in such a situation: wait for weeks to delight your fresh users with this cornerstone feature, risking that they might churn in the meantime, or hunt for data once again?

The Cold Start Problem

Now, you might wonder: why would you need data to run an algorithm that is already trained? The 'data is the fuel for your AI system' analogy continues here as well: cold starting certain (recommendation) engines will not be as smooth and efficient as running a 'warmed up' engine.

So in the above ecommerce example, if the recommendation system doesn't have sufficient information about a new user's preferences, the supposedly personalized recommendations produced by the algorithm might be simply inaccurate or just too generic. In some extreme cases, it can be better to show no results at all than to show results that disappoint the customer, who may never return to the feature or, worse, to the platform itself. Think of recommending meat products to a vegan or vegetarian user just because meat is popular amongst most users.

But fear not, sourcing the right data can help you overcome this challenge as well. I will share a unique approach from my professional experience later on, so keep reading.

While this article focuses mainly on the cold start challenge for new users, similar situations can arise with new products within recommendation and search systems. The impact and the mitigation also depend heavily on the algorithms used in your product. To learn more about these cases and the challenges tied to specific algorithms, consult the research literature on cold-start recommendations or have a chat with the Data Scientist of your product team.

Solution 1: Explore Preferences During Onboarding

A widely used solution to "warm up" recommendation engines is asking users explicit questions about their goals, preferences, and interests during registration and their first few interactions with the product. These inputs collected during onboarding can help not only the product recommender system, but also other personalization features to tailor the user experience.

Think about how your favorite streaming services ask new users about the movies or genres of music they like. Similarly, a grocery shopping site may ask fresh sign-ups about their dietary habits, food intolerances, and allergies in order to provide more accurate, safe, and satisfying recommendations, even to brand new users. Custom AI songs can also use similar personalization to craft music tailored to individual preferences, creating a more immersive and unique experience.

The data collected in this early phase of user engagement can then be used as a starting point for segmentation or group recommendations. It's important to carefully balance the explorative nature of asking for preferences with the need to quickly demonstrate the value of the product.
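As a rough illustration of how onboarding answers could seed recommendations, here is a minimal Python sketch. It assumes a hypothetical catalog where each product carries tags and a popularity count, and hypothetical onboarding fields (`excluded_tags`, `liked_categories`); none of these names come from a specific product or library. It filters out items the user has ruled out, then ranks the rest by stated interest and overall popularity.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    name: str
    category: str
    tags: set[str] = field(default_factory=set)
    popularity: int = 0  # e.g. number of orders across all users (assumed metric)


@dataclass
class OnboardingAnswers:
    # Hypothetical fields; real onboarding questions will differ per product.
    excluded_tags: set[str] = field(default_factory=set)     # e.g. {"meat"}
    liked_categories: set[str] = field(default_factory=set)  # e.g. {"produce"}


def cold_start_recommendations(catalog, answers, k=3):
    """Drop excluded items, then rank liked categories first and popular items next."""
    allowed = [p for p in catalog if not (p.tags & answers.excluded_tags)]
    ranked = sorted(
        allowed,
        key=lambda p: (p.category in answers.liked_categories, p.popularity),
        reverse=True,
    )
    return [p.name for p in ranked[:k]]


catalog = [
    Product("Chicken breast", "meat", {"meat"}, popularity=950),
    Product("Bananas", "produce", {"vegan"}, popularity=900),
    Product("Oat milk", "dairy alternatives", {"vegan"}, popularity=400),
    Product("Tofu", "plant protein", {"vegan"}, popularity=300),
]
vegan_user = OnboardingAnswers(
    excluded_tags={"meat"}, liked_categories={"plant protein"}
)
recs = cold_start_recommendations(catalog, vegan_user)
print(recs)
```

Even this simple rule avoids the "meat for a vegan user" failure mode described earlier: the most popular item overall is excluded because the user said so during onboarding.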

Solution 2: Enrich Profiles for Better Collaborative Filtering

By using the metadata gathered during registration and onboarding, the recommender systems in your product may already be able to suggest products that users with similar profiles like, or that are popular near the user's location. This approach, called collaborative filtering, uses similarities between users and items simultaneously to provide recommendations: the system can recommend an item to user A based on the interests of a similar user B.

You can also potentially source additional online information about your users from data brokers to enrich user profiles for better recommendations. However, this approach needs to carefully balance convenience and personalization against user privacy concerns. Be mindful to stay legal, ethical, and empathetic towards your users. Should you choose this path, Airbyte can help bring in data from various sources, including data brokers, to improve the accuracy of collaborative filtering.
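The "similar user B" idea can be sketched in a few lines of Python. This is a deliberately simplified, neighborhood-style illustration rather than a production collaborative filtering algorithm: it assumes hypothetical profile attributes stored as sets, measures profile similarity with the Jaccard index, and scores items by the summed similarity of the nearest existing users who bought them. All user names, attributes, and items are made up.

```python
from collections import Counter


def jaccard(a, b):
    """Similarity of two attribute sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_from_similar_users(new_profile, profiles, purchases, n_neighbors=2, k=3):
    """Recommend items bought by the existing users whose profiles are most similar."""
    neighbors = sorted(
        profiles,
        key=lambda user: jaccard(new_profile, profiles[user]),
        reverse=True,
    )[:n_neighbors]
    scores = Counter()
    for user in neighbors:
        weight = jaccard(new_profile, profiles[user])
        for item in purchases.get(user, ()):
            scores[item] += weight  # closer neighbors contribute more
    return [item for item, _ in scores.most_common(k)]


# Hypothetical onboarding metadata and purchase histories of existing users.
profiles = {
    "user_b": {"vegetarian", "berlin", "family"},
    "user_c": {"vegetarian", "berlin", "single"},
    "user_d": {"omnivore", "munich", "single"},
}
purchases = {
    "user_b": ["tofu", "oat milk"],
    "user_c": ["tofu", "lentils"],
    "user_d": ["steak", "beer"],
}
new_user = {"vegetarian", "berlin", "family"}
suggestions = recommend_from_similar_users(new_user, profiles, purchases)
print(suggestions)
```

The new user has no purchase history at all, yet still receives suggestions drawn from the two most similar existing profiles.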

Solution 3: Exploit Existing Data

This approach works in the special cases where your product has no history for a specific new user, but the user does have history with another product in a connected ecosystem (see the recent data-sharing questions around Google and Meta products triggered by the EU DMA) or, more commonly, where related business entities share knowledge through a loyalty points program.

For example, let's say your team is responsible for personalized user experiences, including product recommendations, for a grocery chain's new online offering. When a user signs up, you may provide the option to enter their customer loyalty card number. While the user is confirming their email, your system could already be ingesting their offline, in-store purchase history. Such data ingestion during registration could provide the product preference information your recommender system needs for more accurate recommendations from day one.
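The loyalty card scenario could look roughly like the following Python sketch. It assumes a hypothetical `loyalty_store` lookup that maps card numbers to in-store purchases as `(product, category)` pairs, and a hypothetical online catalog of dictionaries; in practice the history would arrive through a data pipeline rather than an in-memory dictionary. The sketch derives the user's most purchased categories and falls back to overall popularity when no history is found.

```python
from collections import Counter


def preferences_from_history(in_store_purchases, top_n=2):
    """Return the most frequently purchased categories from in-store history."""
    counts = Counter(category for _, category in in_store_purchases)
    return [category for category, _ in counts.most_common(top_n)]


def warm_start(loyalty_card_number, loyalty_store, catalog, k=3):
    """Recommend items from preferred categories, or popular items if no history exists."""
    history = loyalty_store.get(loyalty_card_number, [])
    preferred = preferences_from_history(history)
    by_popularity = sorted(catalog, key=lambda item: item["popularity"], reverse=True)
    if not preferred:
        # No offline history: fall back to generic, popularity-based results.
        return [item["name"] for item in by_popularity[:k]]
    matches = [item["name"] for item in by_popularity if item["category"] in preferred]
    return matches[:k]


# Hypothetical in-store history keyed by loyalty card number.
loyalty_store = {
    "LC-1001": [
        ("oat milk", "dairy alternatives"),
        ("soy milk", "dairy alternatives"),
        ("apples", "produce"),
        ("pears", "produce"),
        ("bananas", "produce"),
        ("bread", "bakery"),
    ]
}
catalog = [
    {"name": "Steak", "category": "meat", "popularity": 950},
    {"name": "Bananas", "category": "produce", "popularity": 900},
    {"name": "Croissant", "category": "bakery", "popularity": 700},
    {"name": "Almond milk", "category": "dairy alternatives", "popularity": 400},
]
with_card = warm_start("LC-1001", loyalty_store, catalog)
without_card = warm_start("unknown", loyalty_store, catalog)
print(with_card, without_card)
```

The contrast between the two calls shows the value of the ingested history: the user with a known card gets category-matched items, while the unknown user only gets the generic bestsellers.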

Tools like Airbyte can help break data silos and ingest first-party, second-party, and even third-party data to overcome the cold start challenge. This approach can be particularly effective for recommendation systems, where the quality of product information and the depth of the user's history engaging with similar products are key factors.

Conclusion

By leveraging these additional data points, your recommendation algorithm(s) will likely be able to provide a level of personalization even on a user's first interaction with your product. Over time, as your users engage more, you should use that new usage data to further optimize and keep the recommendation engine up-to-date based on their personal history.

Airbyte can help build the ongoing data pipeline to keep your models fresh with the latest data. Coincidentally, building an effective data pipeline is the topic of the next article in this series.

About the Author

Ferenc is an AI Product Leader with years of experience leading product teams developing AI / ML-powered software solutions with high business impact at a global B2C - B2B SaaS platform. With over a decade in product management from start-ups to enterprises, he led cross-functional teams of engineers, UX designers, researchers, data scientists, and data analysts building a wide range of products.

Disclaimer: opinions are my own, examples are for illustrative purposes only.