On June 10, 2021, Google hosted an Applied ML seminar with an opening keynote by Tony Jebara, VP of Engineering and Head of Machine Learning at Spotify. Jebara presented some fascinating insights into Spotify’s current strategy for machine learning and its growing use of reinforcement learning.

To preface, it was refreshing to hear someone in a senior role speak so openly about their company’s approach to using machine learning. Whilst I had concerns about some of the language and assumptions used with respect to behaviour, overall it was a very insightful talk, and some of the phrases hint at the next era of machine learning (ML) for practical applications, including a much-needed truce between quantitative and qualitative methods.

The talk is embedded below (starts at approx 2:50).

And the live Q&A that followed:

Before going into some of the soundbites, a note on terminology: ML here refers to classic ML. With classic ML, algorithms are first developed on training data, usually pre-labelled, and then applied to new data. Such algorithms tend to reach a point where performance flattens – adding further data to the training process does not significantly improve results.

Deep learning is a subset of ML that utilises neural networks, loosely modelled on our understanding of some functions of the human brain. With deep learning, the rule of thumb is that more data generally keeps leading to better results.

Reinforcement learning (RL) is a form of machine learning, usually combined with deep learning, where, instead of being given a training dataset to learn from before making predictions, the machine starts with no data and learns through trial and error – often rapid simulations on a massive scale – to maximise some defined reward. This differentiates it from classic ML. Some of the most impressive breakthroughs in recent AI demonstrations have utilised RL.

The following soundbites are notes taken whilst listening to the live transmission and should not be assumed to be verbatim. To use any quotes, please refer to the recording embedded above.

‘algotorial’ – combining algorithms with editorial curation of content to improve knowledge discovery

Tony Jebara, VP of Engineering, Spotify

Spotify’s strategy is to apply ML to “optimise for long-term and delayed rewards, leading to trust and a meaningful relationship between listeners and the audio they love” based on a belief that “life is richer with a personalised soundtrack with familiar wants and discovery needs.” The target outcome is to provide a ‘fulfilling content diet’, focusing on ‘a lifetime journey through audio content’ and optimising for future engagement rather than the next click.

Whether or not you agree with that belief and target outcome, it grounds the approach: to provide a mix of results that are partially based on perceived wants – music that a listener is already familiar with – and partially based on predicted needs – discovery of new music – with the goal of maximising engagement with the platform. I’d prefer more straightforward language in the vision. But, as someone (or some marketing course) once reminded me, ‘cold raw fish’ sounds a lot less appetising than ‘sushi’ 🙂

I felt uncomfortable with some of the articulation about how we consume music. A slide presented in the talk shows the use of reinforcement learning to provide instant gratification – satisfying the want – and delayed gratification – a need. Is that satisfying a need, or generating one? It’s hard to tell. And I’m not sure that wants and needs are so easily separated into instant and delayed behaviours. The talk goes on to describe how we move in an audio landscape, and the objective is to take the listener on a journey to discover new music, to ‘elevate their relationship with music to a higher altitude.’

Perhaps my discomfort is because I personally have a deep relationship with music and my tastes are all over the place in genre and era. I love discovering new tracks, but they often have little relation to my current preferences. It is usually a serendipitous find (thanks to Shazam), such as noticing the background music of an advert or movie, hearing something being played somewhere, sharing tracks in conversation, or simply a random moment listening to the radio. I’d be interested to have suggestions from an AI, but I don’t want to completely cede control over how I discover and consume music.

Whilst there was much talk about using ML for long-term predictions and targets, it was less clear how this was being achieved. There was talk of ‘context bandits’ (contextual bandits, in the usual terminology) used to provide short-term predictions, and of applying RL to continuously learn how successful these bandits are at making recommendations. The idea seems to be to achieve a long-term objective through frequent incremental adjustments rather than periodic major changes.
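For readers unfamiliar with the term, here is a minimal sketch of a contextual bandit. It is not Spotify’s implementation: the epsilon-greedy strategy, the contexts (‘morning’, ‘evening’), the two arms (‘familiar’, ‘discovery’) and the engagement probabilities are all assumptions made up for illustration. What it does show is the ‘frequent incremental adjustments’ idea: every observed reward nudges a running estimate slightly.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Toy contextual bandit: one running-mean reward estimate per (context, arm)."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, arm) -> number of pulls
        self.values = defaultdict(float)  # (context, arm) -> mean observed reward

    def choose(self, context):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[(context, arm)])

    def update(self, context, arm, reward):
        # Incremental mean: each observation adjusts the estimate a little.
        key = (context, arm)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


# Hypothetical environment: chance that a listener engages, by context and arm.
TRUE_ENGAGEMENT = {
    ("morning", "familiar"): 0.7, ("morning", "discovery"): 0.3,
    ("evening", "familiar"): 0.4, ("evening", "discovery"): 0.6,
}


def simulate(rounds=5000, seed=1):
    env = random.Random(seed)
    bandit = EpsilonGreedyContextualBandit(["familiar", "discovery"], seed=seed)
    for _ in range(rounds):
        context = env.choice(["morning", "evening"])
        arm = bandit.choose(context)
        engaged = env.random() < TRUE_ENGAGEMENT[(context, arm)]
        bandit.update(context, arm, 1.0 if engaged else 0.0)
    return bandit


bandit = simulate()
```

After the simulation, `bandit.values` holds the learned engagement estimate for each context and arm, which can be compared against the made-up `TRUE_ENGAGEMENT` table. Note that this only optimises the immediate reward; the long-term, delayed rewards discussed in the talk would need more than this sketch.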

One area we are investing in is the use of reinforcement learning to create new ‘recommendation surfaces’

Tony Jebara

There were some bold claims about being able to identify causal relationships, as opposed to pure correlations, using inverse propensity scoring (IPS). There were some fascinating insights into the use of RL to create new ‘recommendation surfaces’, and into building platforms for randomising systems and running simulations of what-if scenarios.
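As a rough illustration of what IPS does, the sketch below estimates how a new recommendation policy would have performed using only logs collected under a different, randomised policy. Everything here is an assumption for illustration (the log format, the uniform-random logging policy, the reward probabilities, the `always_discovery` policy); the talk did not describe how Spotify applies IPS in practice. The estimate is only trustworthy when the logged propensities are correct and non-zero for every action the new policy might take.

```python
import random


def ips_estimate(logs, target_policy):
    """Inverse propensity scoring estimate of a target policy's mean reward.

    logs: list of (context, action, reward, propensity) tuples, where propensity
          is the probability the logging policy gave to the action it took.
    target_policy: function (context, action) -> probability under the new policy.
    """
    total = 0.0
    for context, action, reward, propensity in logs:
        # Re-weight each logged reward by how much more (or less) likely
        # the new policy is to take the logged action.
        total += target_policy(context, action) / propensity * reward
    return total / len(logs)


# Hypothetical ground truth, used only to generate synthetic logs.
TRUE_REWARD = {"familiar": 0.4, "discovery": 0.6}


def make_logs(n=20000, seed=0):
    rng = random.Random(seed)
    logs = []
    for _ in range(n):
        action = rng.choice(["familiar", "discovery"])  # uniform logging policy
        reward = 1.0 if rng.random() < TRUE_REWARD[action] else 0.0
        logs.append(("any", action, reward, 0.5))       # propensity is 0.5
    return logs


def always_discovery(context, action):
    # Candidate policy to evaluate offline: always recommend discovery.
    return 1.0 if action == "discovery" else 0.0


logs = make_logs()
estimate = ips_estimate(logs, always_discovery)
```

Here `estimate` should land close to the made-up discovery reward rate of 0.6, even though the logged data came from a policy that recommended discovery only half the time. That is the what-if flavour of the technique: it answers ‘what would have happened under a different policy?’ from randomised logs.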

An interesting and relevant concept was the idea of taking an ‘algotorial’ approach: combining machine learning algorithms with old-fashioned editorial curation of content, particularly when entering new markets and countries where music tastes may be vastly different. It was an acknowledgement that the quantitative data-intensive approach only gets you so far and some knowledge can only be captured and applied in qualitative ways. The goal is to combine the two for scale.

Another interesting answer referenced in-house investments in ML tools that log both the information shown and the counter-factual information the systems would have shown in different circumstances. I’ve been doing research into how the use of counter-factuals might help improve explainability of ML algorithms, based on ideas that have been developing in physics as a way of uniting those two thumper theories of general relativity and quantum mechanics. It may also be a key piece in the puzzle of achieving generalisable AI. But that’s a subject for another post…

To close out, despite some reservations about the choice of terminology at times, it was a great and insightful talk! Much appreciated and highly recommended for anyone interested in how to strategically apply machine learning to improve performance.


Featured image: random photo from the author’s phone 🙂 Taken whilst walking around Greifensee near Zürich

view across Greifensee towards the mountains
Greifensee, 30 May 2021.