Publishing ranking and filtering algorithms

By Bjørn Borud, 2022-04-19

With Elon Musk trying to acquire Twitter, one discussion I’ve seen pop up every now and then is that some people, including Elon Musk, think Twitter should make its algorithm public. Whatever that means. So of course, this week everyone is an expert on this topic.

From context I’m assuming they mean the ranking algorithms that decide how people’s feeds are prioritised on Twitter: what bubbles to the top, and what is buried several screenfuls down.

This is an interesting discussion to have because people might not be aware that anything involving ranking systems or filtering of unwanted content tends to be an arms race: an arms race between service providers who try to provide a good product and parties who want to gain an advantage by doing things that tend to degrade the user experience.

And as usual, the simpleminded folk look for conspiracies. Because why not attribute malice to that which can adequately be explained by stupidity.

Example: spam

Let’s engage in a thought experiment to understand how this arms race manifests. Let’s imagine every mail provider were obligated to share how they do spam filtering by publishing the code and configuration that is running at any given time, for instance by hosting the code publicly on GitHub along with the required data.

Imagine I’m a spammer. Sending out massive amounts of email is dirt cheap, so even if my success rate is 0.001% that’s still 10 people for every million emails I send. That would be a very good yield indeed. Especially if I send out tens of millions of spam messages per day, every day. If I get a chance to trick a few hundred people every day, I can probably make a comfortable living out of it.
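The arithmetic behind that yield is easy to check. A success rate of 0.001% is one success per 100,000 messages, so the expected number of victims is simply the send volume divided by 100,000. The small Python sketch below uses the volumes mentioned above; the 30 million figure is an illustrative stand-in for “tens of millions”, not a number from any real campaign.

```python
# Back-of-the-envelope spam yield using the figures from the text.
# 0.001 % = 0.00001 = 1 success per 100,000 emails sent.
EMAILS_PER_VICTIM = 100_000


def expected_victims(emails_sent: int) -> float:
    """Expected number of recipients who fall for the spam."""
    return emails_sent / EMAILS_PER_VICTIM


if __name__ == "__main__":
    # 30 million is a hypothetical stand-in for "tens of millions per day".
    for daily_volume in (1_000_000, 10_000_000, 30_000_000):
        victims = expected_victims(daily_volume)
        print(f"{daily_volume:>10,} emails/day -> ~{victims:.0f} victims/day")
```

At 30 million messages a day this comes out at roughly 300 victims per day, which is the “few hundred” in the text.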

And as a spammer, I wouldn’t care if I made the service a tiny bit worse for millions of people. Of course, there will be many spammers besides me, and in aggregate we make the user experience a lot more than a tiny bit worse.

If your email provider offers you a “Junk” folder, go and have a look at it now. Then realize that what you are seeing is only the small subset of spam that even made it that far. You probably think you are getting a lot of junk mail in your inbox, but most of that is actually legitimate in the sense that it comes from parties you have had some interaction with: parties that may have gotten you to agree to receive email from them, whether you were aware of it or not.

An increase in junk mail worsens your user experience, which makes you unhappy with the service. That means the service provider has to do something to counter the drop in customer satisfaction.

This is why, for decades, most service providers above a certain size have had whole departments dedicated to this, and a lot more highly qualified manpower goes into it than anyone is willing to admit.

Unfair advantage

If I were a spammer, and if email services were obliged to give me the exact recipe for how they perform spam filtering, this would hand me a huge advantage. I get access to the tools of my adversary.
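A deliberately naive sketch makes the asymmetry concrete. Assume, hypothetically, that a provider published a keyword-weighted spam scorer, so every rule weight and the threshold are public. A spammer can then run candidate messages against the published rules offline and tweak them until they slip under the threshold, without sending a single test email. Real spam filters are far more sophisticated than this (statistical models, sender reputation and much more); the rules, weights, threshold and look-alike spellings here are all invented for illustration.

```python
import re

# Hypothetical *published* filter: every rule weight and the threshold are public.
PUBLISHED_RULES = {"free": 2.0, "winner": 3.0, "viagra": 5.0, "click here": 2.5}
SPAM_THRESHOLD = 5.0


def spam_score(message: str) -> float:
    """Sum the weights of every published rule whose phrase appears in the message."""
    text = message.lower()
    return sum(weight for phrase, weight in PUBLISHED_RULES.items() if phrase in text)


def is_spam(message: str) -> bool:
    return spam_score(message) >= SPAM_THRESHOLD


# Hypothetical look-alike spellings a spammer might try.
OBFUSCATIONS = {"free": "fr3e", "winner": "w1nner", "click here": "click h e r e"}


def evade(message: str) -> str:
    """Spammer side: apply one substitution at a time, stopping as soon as the
    message passes the public filter. All of this runs locally, for free."""
    for phrase, disguise in OBFUSCATIONS.items():
        if not is_spam(message):
            break
        message = re.sub(re.escape(phrase), disguise, message, flags=re.IGNORECASE)
    return message


if __name__ == "__main__":
    msg = "You are a WINNER! Click here for your FREE prize"
    print(spam_score(msg), is_spam(msg))
    disguised = evade(msg)
    print(disguised, spam_score(disguised), is_spam(disguised))
```

With the rules public, the evasion loop costs the spammer nothing. Without them, the spammer has to learn what the filter does by trial and error against the live system, which is slower and gives the provider a chance to notice.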

You find these kinds of mechanisms all over the place. In filtering out unwanted content, in ranking search results, in figuring out if your credit card is being misused, etc. These types of systems are in a constant arms race. Every single day people who want to game these countermeasures become a bit better at what they do.

I can understand why people think it would be good to publish methods that are being used to filter and rank data by large social platforms, search services or even email services. It sounds like that would be the fair and transparent thing to do. But that doesn’t mean it produces the kinds of outcomes you want.

So why is Elon wrong about this?

To understand why someone like Elon would be naive about this, it helps to look at how he achieves success.

He tends to make initial assumptions and then put them to the test quickly to see what fails. Then he addresses what failed, and tries again. Over and over until things stop failing, or he gives up and tries something else. With this method you can start off with a bunch of wrong assumptions and still find your way to solutions. Because you are constantly learning.

Elon Musk is quite often wrong. But he is good at figuring out when that happens and then, eventually, at not being wrong. Look at the spaceships he has designed. He regularly abandons design strategies that turn out to be unfruitful. This is in direct contrast to the way most, if not all, of his competitors operate. They don’t even get out of bed in the morning without a plan they feel is bulletproof, which means they have a very high threshold for changing their minds.

Right now, Elon Musk is wrong because he hasn’t had the opportunity to understand the problem.

From a purely professional point of view, I would love for Elon Musk to acquire Twitter and do this experiment. Yes, it would cause mayhem, and yes, it might make things a lot worse before probably circling back to where we are today, but we might learn something. And I’m sure Musk would learn a lot and, in the long run, probably do a lot better than Twitter does today, since he is a lot less risk-averse than most people.

Alternatives

In the old days of USENET I used a news reader called Gnus. Gnus had a built-in ranking and filtering system that you had full control over, and you could even extend it with your own code. So I could have rules that specified “I’m not interested in what Alice has to say, except for when she posts about fishing or bees”, or “I want to see any responses to my postings first”. The system would apply cascades of rules, each adding to or subtracting from an article’s score. Finally, it would sort by score and throw away anything below the “bullshit threshold”.
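The cascade described above is simple enough to sketch. The following is a toy Python approximation of the idea, not how Gnus actually implements scoring (Gnus is written in Emacs Lisp and keeps its rules in score files). The `Post` and `Rule` types, the user name `me`, the score deltas and the threshold of 0 are all hypothetical choices that encode the two example rules.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Post:
    author: str
    subject: str
    in_reply_to: Optional[str] = None  # author of the post being replied to, if any


@dataclass
class Rule:
    matches: Callable[[Post], bool]
    delta: int  # added to the score when the rule matches (negative to subtract)


ME = "me"       # hypothetical user name
THRESHOLD = 0   # hypothetical "bullshit threshold": posts scoring below it are dropped

RULES: List[Rule] = [
    # "I'm not interested in what Alice has to say..."
    Rule(lambda p: p.author == "alice", -100),
    # "...except for when she posts about fishing or bees."
    Rule(lambda p: p.author == "alice"
         and any(word in p.subject.lower() for word in ("fishing", "bees")), +150),
    # "I want to see any responses to my postings first."
    Rule(lambda p: p.in_reply_to == ME, +1000),
]


def score(post: Post) -> int:
    """Run the whole cascade: every matching rule adjusts the score."""
    return sum(rule.delta for rule in RULES if rule.matches(post))


def rank_and_filter(posts: List[Post]) -> List[Post]:
    """Drop posts below the threshold, then sort the rest by descending score."""
    kept = [p for p in posts if score(p) >= THRESHOLD]
    return sorted(kept, key=score, reverse=True)


if __name__ == "__main__":
    feed = [
        Post("alice", "Thoughts on tax policy"),
        Post("alice", "Best bees for a small garden"),
        Post("carol", "Re: my question", in_reply_to=ME),
        Post("dave", "Weekend plans"),
    ]
    for post in rank_and_filter(feed):
        print(score(post), post.author, post.subject)
```

Alice’s post on tax policy scores -100 and is dropped, her post about bees scores +50 and survives, and the reply to my own posting (+1000) sorts to the top. Because each rule only adds or subtracts, new rules can be layered on without touching the existing ones.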

I really miss being able to use my own ranking and filtering tools on content. And I am reminded of this whenever I encounter streaming services: I could do a much better job at organizing my view of Netflix than Netflix can. (That doesn’t imply I can easily make something that works for you, but that’s okay.) The problem is that there is no convenient way to achieve this since Netflix are really stingy when it comes to APIs.

This would be extremely useful on social media. I know plenty of people online who I want to discuss certain things with, but who also produce lots of nonsense I don’t want to waste my time on. Then there are people who almost always say something I want to read. The filtering mechanisms Facebook and Twitter offer are very coarse-grained, and I have zero direct control over how content is ranked in the feed I see.

Opening up Twitter a bit more so you can, to a greater degree, tailor your experience would be a better approach. Preferably by being able to write your own clients that can get access to more data, and do better ranking and filtering.

Of course, this comes with its own non-trivial risks. And for all I know, they may be far worse than what we have today. People would essentially be able to craft their own reality where their biases and prejudices need never be challenged by someone else’s reality.

It is entirely possible that people, organizations and companies might offer ready-made filters that people can adopt. I’m sure most hate groups, churches, political activists etc. would love the ability to manipulate what their members see, and people would be stupid enough to let them.

Imagine the titanium-shelled filter bubbles you could end up with, powered by people’s unhealthy obsession with not being offended.

Final words

Just don’t make the mistake of thinking Elon Musk knows what he is talking about in this case. It doesn’t seem like he does. Not everything he utters impulsively is the result of careful consideration, and people shouldn’t take everything he says seriously.