YouTube f*cked up big time: A/B test feature is completely broken. (mega thread)
Let’s play a game: 1) Check the picture below 2) Read the title 3) Pick the thumbnail you think fits it best.
The only thumbnail that truly shows what the video is about is C, and I bet that's what most of you chose. Well, that’s not what YouTube’s A/B feature (or Test & Compare, as they call it) found.
@enesyilmazer This video ended up ranking 8th out of 10 on the channel, despite using the A/B feature right from the start and with no further changes to its packaging.
I also ran the same test with my own A/B simulation tool, the Viral Economy, which is built to reach new viewers. Here’s what it found:
So, why didn’t YouTube catch this obvious result? How did the only thumbnail that matched the title lose out to ones where the house is barely visible? I’ll explain why in a moment (spoiler: it’s not just a CTR or watch time issue, the problem is at a fundamental level).
The example above is far from an exception; it’s actually the rule with YouTube’s A/B test. We checked about 70 different videos from completely different niches, and according to the Viral Economy, YouTube was wrong about 80% of the time.
So who’s right? A billion-dollar company or a tool built by an anonymous individual on Twitter with a cringe Prison Break avatar? 😀 Jokes aside, this is a serious issue. With the A/B feature, everyone is losing, YouTube included.
Here are a few more examples from our data that speak for themselves. I’ll let you judge for yourself. 𝘕𝘰𝘵𝘦: 𝘛𝘩𝘦 𝘱𝘢𝘤𝘬𝘢𝘨𝘪𝘯𝘨 𝘩𝘪𝘨𝘩𝘭𝘪𝘨𝘩𝘵𝘦𝘥 𝘪𝘯 𝘳𝘦𝘥 𝘪𝘴 𝘸𝘩𝘢𝘵 𝘳𝘦𝘮𝘢𝘪𝘯𝘦𝘥 𝘰𝘯 𝘠𝘰𝘶𝘛𝘶𝘣𝘦 𝘢𝘧𝘵𝘦𝘳 𝘵𝘩𝘦 𝘈/𝘉 𝘵𝘦𝘴𝘵. 𝘐 𝘥𝘰𝘯’𝘵 𝘬𝘯𝘰𝘸
And the list goes on and on. Now let’s dive deeper into what causes this issue. According to YouTube, the tool is built to “get you the highest amount of viewer engagement”, which they translate into watch time.
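To make "translated into watch time" concrete, here is a minimal sketch of a watch-time-based pick. The numbers are hypothetical, and the rule itself (the variant with the largest share of total watch time wins) is my simplifying assumption, not YouTube's actual code.

```python
# Hypothetical numbers: hours of watch time collected per thumbnail during a test.
watch_time_hours = {"A": 120.0, "B": 95.0, "C": 60.0}


def watch_time_share(watch_time):
    """Return each variant's share of the total watch time (values sum to 1)."""
    total = sum(watch_time.values())
    return {name: hours / total for name, hours in watch_time.items()}


def pick_winner(watch_time):
    """Pick the variant with the largest watch time share (assumed rule)."""
    shares = watch_time_share(watch_time)
    return max(shares, key=shares.get)


shares = watch_time_share(watch_time_hours)
winner = pick_winner(watch_time_hours)
print(winner, round(shares[winner], 3))
```

Notice that nothing in this rule knows who produced the watch time, only how much of it there was.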
As it involves a lot of complexity, I’ll try to simplify so this thread can be enlightening and actionable to you. For those who follow me, you might remember my feedback thread for YouTube about CTR & AVD. The A/B feature has a similar problem except it’s even worse because
As we’ve seen, this is how the current A/B feature on YouTube works:
Although using watch time as a metric is the right decision, the way YouTube built it is fucked up. And bad news for YouTube: this problem can’t be fixed in place. The feature has to be completely rebuilt from scratch. Let’s see why:
𝟏) 𝐓𝐇𝐄 𝐍𝐄𝐂𝐄𝐒𝐒𝐀𝐑𝐘 𝐁𝐀𝐒𝐈𝐂𝐒
Contrary to what most people think, it’s not:
- The algorithm looks for viewers for videos
It’s the opposite:
- The algorithm looks for videos for viewers
I won’t explore this further because it’s very complex, and that’s not the point of this thread. But it’s an important element to understand why the A/B test feature is broken.
Speaking of videos, every piece of content falls somewhere within the following spectrum:
- Niche: Requires prior context to spark interest
- Reach: No prior context is required to spark interest
Most of the time, it’s somewhere in between.
For viewers, however, it’s another story, as they are infinitely more complex. The simplest way to illustrate their interests is a conical spectrum:
So the audience of a YouTube video would look like this:
Important takeaways so far:
- The algorithm is seeking videos for viewers, not viewers for videos.
- A piece of content is either “niche”, “reach” or (most of the time) somewhere in between.
Now that you have the basics, let’s get to the good stuff.
𝟐) 𝐇𝐎𝐖 𝐓𝐇𝐄 𝐀/𝐁 𝐅𝐄𝐀𝐓𝐔𝐑𝐄 𝐌𝐈𝐒𝐋𝐄𝐀𝐃𝐒 𝐓𝐇𝐄 𝐀𝐋𝐆𝐎𝐑𝐈𝐓𝐇𝐌
⚠️ To keep it accessible to everyone, the following explanation is a simplified version. It’s far (far) more complex than that in reality.
What weighs more, 100 sardines or 1 whale? 1 whale.
Which are more numerous in the ocean, sardines or whales? Sardines.
The same logic applies to YouTube:
Which viewers produce more watch time each? Those familiar with the topic/content, or “niche” viewers.
Which are more numerous? Reach viewers.
This is the first major flaw in the A/B feature: it’s blind to this distinction. Just like with the CTR & AVD problem, YouTube assumes all viewers are the same. They are not.
There’s a huge asymmetry in how viewers produce watch time. “Niche” viewers are like the whales in our example: individually, each one produces more watch time on average than a “reach” viewer.
But because “reach” viewers are far more numerous (just like sardines), as a group they produce way more watch time.
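Here is the whale/sardine asymmetry as quick arithmetic. Every number below is made up purely for illustration (group sizes and average minutes are assumptions, not data from our tests). Only the shape of the result matters.

```python
# All numbers are hypothetical, chosen only to illustrate the asymmetry.
niche = {"viewers": 1_000, "avg_minutes": 12.0}    # the "whales"
reach = {"viewers": 50_000, "avg_minutes": 3.0}    # the "sardines"


def total_minutes(group):
    """Total watch time of a group = number of viewers x average minutes each."""
    return group["viewers"] * group["avg_minutes"]


niche_total = total_minutes(niche)
reach_total = total_minutes(reach)

# Per viewer, niche wins. As a group, reach wins.
print("per viewer:", niche["avg_minutes"], "vs", reach["avg_minutes"])
print("as a group:", niche_total, "vs", reach_total)
```

With these assumed figures, a niche viewer watches four times longer than a reach viewer, yet the reach group still produces far more total watch time.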
As we've seen, a video is fixed on the niche/reach spectrum and can’t move. The same is true for thumbnails:
Viewers, however, are far more complex. Their interests are constantly shifting, but if we could freeze time, they would look like this:
Now if we combine viewers and thumbnails we get this:
Depending on which thumbnail is recommended, viewers will not act the same, because their interests differ:
- The NICHE viewer would click on A & B but not C
- The NICHE-REACH viewer would click on B but not A & C
- The REACH viewer would click on C but not A & B
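The click pattern above can be written as a tiny simulation. The click table comes straight from the list. Everything else is an assumption for illustration: the minutes per viewer type, the two audience mixes, and the idea that every viewer who would click does click. It's a sketch of the logic, not a model of YouTube's system.

```python
# Click behaviour from the list above: which thumbnails each viewer type clicks.
CLICKS = {
    "niche": {"A", "B"},
    "niche-reach": {"B"},
    "reach": {"C"},
}

# Hypothetical average minutes watched per click, per viewer type.
AVG_MINUTES = {"niche": 12.0, "niche-reach": 6.0, "reach": 3.0}


def watch_time_by_thumbnail(audience):
    """audience maps viewer type -> number of viewers shown each thumbnail."""
    totals = {"A": 0.0, "B": 0.0, "C": 0.0}
    for viewer_type, count in audience.items():
        for thumb in CLICKS[viewer_type]:
            totals[thumb] += count * AVG_MINUTES[viewer_type]
    return totals


def winner(audience):
    """Thumbnail with the most total watch time for this audience mix."""
    totals = watch_time_by_thumbnail(audience)
    return max(totals, key=totals.get)


# Two hypothetical audience mixes.
mostly_niche = {"niche": 600, "niche-reach": 300, "reach": 100}
mostly_reach = {"niche": 100, "niche-reach": 300, "reach": 5000}

print(winner(mostly_niche), winner(mostly_reach))
```

Same thumbnails, same video, different "winner": under these assumptions the result depends entirely on who was in the sample.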
