I feel enlightened now that you called out the self-reinforcing nature of the algorithms. It makes sense that an RL agent solving the bandit problem would create its own bubbles out of laziness.
Maybe we can take advantage of that laziness to incept critical thinking back into social media, or at least have it eat itself.
Thanks!
> I feel enlightened now that you called out the self-reinforcing nature of the algorithms. It makes sense that an RL agent solving the bandit problem would create its own bubbles out of laziness.
You’re totally right that it’s like a multi-armed bandit problem, though maybe one with so many arms that searching is prohibitively expensive, since the space of options is far larger than anything humans can consume at the rate they watch content. In one way, though, it’s dissimilar: the agent’s reward depends on its past choices (people watch more of what they’re recommended). It would be really interesting to know if anyone has modeled a multi-armed bandit problem with this kind of self-dependency. I bet the exploration behavior in that case is pretty chaotic. @abucci this seems like something you might just know off the top of your head!
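Here’s roughly the kind of toy model I have in mind (all of it made up, just to watch the feedback loop): an epsilon-greedy recommender where being shown a category makes that category a little more engaging the next time.

```python
# Totally made-up toy model: an epsilon-greedy bandit where pulling an arm
# nudges that arm's true payoff upward, standing in for "people watch more
# of what they're recommended".
import random

K = 10           # content categories ("arms")
EPSILON = 0.05   # how often the recommender explores at random
BOOST = 0.01     # how much a recommendation reinforces future engagement
STEPS = 5000

true_mean = [0.5] * K    # every category starts equally appealing
estimate = [0.0] * K     # recommender's running estimate of each arm
pulls = [0] * K

for t in range(STEPS):
    if t < K:
        arm = t                                         # try each arm once
    elif random.random() < EPSILON:
        arm = random.randrange(K)                       # explore
    else:
        arm = max(range(K), key=lambda a: estimate[a])  # exploit
    reward = 1.0 if random.random() < true_mean[arm] else 0.0
    pulls[arm] += 1
    estimate[arm] += (reward - estimate[arm]) / pulls[arm]  # running average
    # the self-dependency: being shown makes the category more engaging later
    true_mean[arm] = min(0.99, true_mean[arm] + BOOST)

print("pulls per arm:", pulls)  # pulls usually pile up on one or two arms
```

Every category starts out identical, but the pulls usually end up piled onto whichever arm happened to get reinforced early, which is basically the bubble.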
> Maybe we can take advantage of that laziness to incept critical thinking back into social media, or at least have it eat itself.
If you have any ideas for how to turn social media against itself, I’d love to hear them. I spent an unusually long time on this post for a lot of reasons, but one of them was trying to think of a counter-strategy. I came up with nothing, though!
If what the algorithm does can be approximated that way (as a reward-maximizing player in a multi-ply game that chooses what category of content to show a user at each turn), then you can get partway towards understanding how it works functionally by understanding how its reward function strikes the tradeoffs between monetization, data gathering, and maximizing surprisal (learning). I suspect that splitting the bins/categories more and more finely sometimes makes those tradeoffs look better, which might explain why social media companies tend to do this: if you have one bin of stuff with red and blue objects, and people choose randomly from it, they’ll be less happy on average than if you have a bin of red objects and a bin of blue objects and are able to direct red-preferring and blue-preferring users to the appropriate bin better than a coin flip would.
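For what it’s worth, the red/blue example works out as a few lines of back-of-the-envelope arithmetic (numbers invented): the split only pays off to the extent the routing beats a coin flip.

```python
# Back-of-the-envelope version of the red/blue bin example (invented numbers).
# A user is "happy" only when they get the color they prefer.

# One mixed bin, half red and half blue, items drawn at random:
# every user gets their preferred color half the time.
happy_one_bin = 0.5

def happy_two_bins(router_accuracy):
    # routed to the right bin -> always happy; wrong bin -> never happy
    return router_accuracy * 1.0 + (1 - router_accuracy) * 0.0

for acc in (0.5, 0.6, 0.8, 0.95):
    print(f"router accuracy {acc:.2f}: avg happiness {happy_two_bins(acc):.2f}"
          f" (vs {happy_one_bin:.2f} with one mixed bin)")
```

At coin-flip accuracy the two bins buy you nothing, which is why better-than-chance routing is the whole game.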
People are not static utility maximizers, but these types of algorithms assume we are. So I think they tend to get stuck in corners both because of how they strike tradeoffs (you get manosphere content because that’s what’s most monetizable) and because people’s preferences aren’t expressed consistently in their actions and change over time (you keep getting shown sci-fi content because you looked at a few sci-fi videos in a row a while ago when you were feeling nostalgic, but you don’t usually prefer it).
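To make that second point concrete, here’s a toy version of the getting-stuck dynamic (entirely my own construction): a purely greedy recommender with two categories, where the user binges sci-fi early on, then loses interest, but the recommender’s all-history average is slow to notice.

```python
# Toy version of getting stuck in a corner: a purely greedy recommender with
# two categories, where the user binges sci-fi early on and then loses
# interest, but the recommender's all-history average is slow to notice.
import random

def true_interest(category, t):
    if category == "scifi":
        return 0.9 if t < 60 else 0.1   # the nostalgia wears off at t = 60
    return 0.5                          # "everything else" stays steady

estimate = {"scifi": 0.0, "other": 0.0}
pulls = {"scifi": 0, "other": 0}
shown_after_binge = 0

for t in range(400):
    # short warm-up so both categories get an estimate, then pure greed
    if t < 20:
        choice = "scifi" if t % 2 == 0 else "other"
    else:
        choice = max(estimate, key=estimate.get)
    reward = 1.0 if random.random() < true_interest(choice, t) else 0.0
    pulls[choice] += 1
    estimate[choice] += (reward - estimate[choice]) / pulls[choice]
    if choice == "scifi" and t >= 60:
        shown_after_binge += 1          # sci-fi pushed after interest faded

print("sci-fi recommendations after the binge ended:", shown_after_binge)
```

It typically takes tens of turns for the sci-fi estimate to drop back below the alternative, so the user keeps getting sci-fi long after the nostalgia wore off.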
That’s what I have for now. Sorry for length.