Beyond the Scroll: How Social Media Algorithms Shape Your Reality

Machine Learning


that your social media feed may know you too well.

When you browse social media, you notice a very typical behavior: you watch one video, and suddenly your timeline is flooded with more of the same. 5 years ago, it felt a bit like magic. But today, we talk about “the algorithm” as if it were a mysterious entity pulling strings in some Silicon Valley basement. The truth is much less dramatic, and much more interesting.

The algorithm isn’t inherently evil, it doesn’t sit there plotting your radicalisation. It’s just a chunk of code running cosine similarities and weighted averages, trying to predict what you’ll click on next. The trouble is what we interact with creates engagement. And the surest way to keep humans engaged turns out to be the worst way to keep them informed (rage-baits, fake news, or worse).

This post is about how recommendation engines work, why they tilt us toward echo chambers, and, because reading about a thing is never the same as seeing it, we’ll build one from scratch, point it at real news data, and watch the bubble form.


The Engagement Engine: How Recommenders Work

A social media algorithm is, at its heart, a curator. Its job is to sift through millions of posts and serve you the ones you’re most likely to engage with: click, watch, like, share, rage-comment on. It does this based on one word: data.

Every action you take is a clue:

  • Which posts you linger on (even without clicking)
  • Which videos you watch, and for how long
  • Which accounts you follow, mute, or block
  • Which topics you search for at 1 a.m.

Using machine learning, the algorithm spots patterns in this firehose of behaviour. It’s constantly asking the same question: what keeps this person on the platform longer? Remember that this is the largest goal of any social media company: keeping you on the platform longer.

Two classic techniques sit underneath most recommender systems:

  • Collaborative filtering finds users who behave like you and recommends what they liked. If Alice and Bob both loved The Matrix and Inception, and Alice also loved Interstellar, the system nudges Interstellar to Bob. Pretty easy to understand.
  • Content-based filtering looks at the characteristics of what you’ve liked and finds similar things. If you watch a lot of cooking videos, it surfaces more videos tagged “cooking”, “recipe”, or “knife skills”, they resemble what you already enjoyed.

Real platforms blend these methods with hundreds of other signals. But the core idea is the same: learn from your behaviour, predict what else might grab you.

The algorithm doesn’t intend to show you bad or false content. It optimises for engagement. And one of the surest ways to keep humans engaged is to tap into our emotions, especially the strong, negative ones. Or videos of cats.


Building a News Recommender on Real Data

Let’s stop talking about this abstractly and build one. We will use real anonymised click logs from Microsoft News. The dataset is called MIND (Microsoft News Dataset), published for academic research by Microsoft Research. This sample contains 50,000 users, over 51,000 English news articles across 17 categories (news, sports, finance, lifestyle, health, travel, and more), and 156,000+ real impression sessions, each recording what a user was shown and what they clicked on. The whole thing fits in about 30 lines of Python, although you don’t really need to now this inde detail:

import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Build a sparse user × article matrix (1 = clicked, 0 = didn't)
matrix = csr_matrix((np.ones(len(clicks)), (user_rows, article_cols)),
                    shape=(n_users, n_articles))
def recommend(user_id, matrix, top_n=15, n_neighbors=50):
    """Find 50 most similar users and rank the articles
    they clicked that our user hasn't seen yet."""
    u = user_idx[user_id]
    
    # Cosine similarity between this user and everyone else
    sims = cosine_similarity(matrix[u], matrix).flatten()
    sims[u] = 0  # don't recommend to yourself
    
    # Take the top 50 most similar users
    top_neighbors = np.argsort(sims)[-n_neighbors:][::-1]
    weights = sims[top_neighbors]
    
    # Score articles by weighted sum of neighbour clicks
    scores = np.asarray(matrix[top_neighbors].T.dot(weights)).flatten()
    
    # Zero out articles the user already clicked
    scores[matrix[u].toarray().flatten() > 0] = 0
    
    # Return the top-scoring articles
    top_articles = np.argsort(scores)[-top_n:][::-1]
    return top_articles

Cosine similarity finds your fifty closest neighbours, people who click on the same kinds of articles you do. We take the articles they clicked, weight them by how similar each neighbour is to you, and serve the top fifteen. This is the base of what powers a billion-dollar industry.


Coswhat similarity?

Cosine similarity might sound like something out of a math textbook, but bear with me, it’s easier than it looks. To show you how it works, let’s take a quick detour.

Imagine the following data points scattered across two axes, mechanical vs. biological, and cuteness:

Emoji Similarity Mapper — Cat and Dog – Image by Author

Cosine similarity measures the angle between two arrows, each one starting from the origin (0,0) and pointing toward one of our data points. The smaller the angle between them, the more similar the two items are.

Think of it this way: if two arrows are almost pointing in the same direction, the items they represent share similar characteristics. Take cats and dogs as an example. Both score high on ‘biological’ and high on ‘cuteness’, so their arrows point in nearly the same direction and cosine similarity returns a value close to 1 (its maximum).

But if we compare cats with teddy bears, although they are similar on the cute dimension, they are different on the biological axis:

Emoji Similarity Mapper — Cat and Teddy Bear – Image by Author

If we compare cats with teddy bears, although they are similar on the cute dimension, they are different on the biological axis, a cat is fully biological, while a teddy bear scores zero.

This pulls their arrows apart.The angle between them widens, and cosine similarity returns a lower value, reflecting that despite sharing one trait, these two objects occupy very different regions of our space.

And, of course, comparing cats to cars, give almost no similarity as the arrows between both point in different directions:

Emoji Similarity Mapper — Cat and Car – Image by Author

AI models use this kind of information to recommend content that is likely to trigger a similar response in you. Imagine a two-dimensional space where one axis captures how a video makes you feel (calm, entertained, outraged) and the other captures its topic. Every video gets plotted somewhere in that space.

If you click on a political video that makes you angry, and you watch it all the way through. The platform registers both dimensions: the topic and the emotional response. Using cosine similarity, it finds other videos whose ‘arrow’ points in the same direction (rage-baiting political videos) and serves them to you next. The more you engage, the more confidently the algorithm learns which corner of that space keeps you watching.


Meet User U92876 (let’s call it Joe): The Sports Fan

I picked a user from the MIND dataset whose reading history is pure sports, NFL power rankings, NBA trade rumours, MLB bans. It read twenty-five articles, all sport.

Let’s ask the recommender what to serve them:

Joe’s recommended reading list – Image by Author

The category breakdown:

  • 40% sports
  • 13% news
  • 13% autos
  • 34% scattering of everything else.

The algorithm recognises this person’s sports habit and feeds it back, but it also serves a reasonably varied diet. There’s politics, entertainment, lifestyle, finance. Not bad, right?
Now watch what happens.


The Moment of Curiosity

I simulated something far more common than a massive rabbit-hole binge: a moment of idle curiosity.

Our sports fan didn’t spend hours reading politics. Looking at their initial feed, they simply clicked on three items that caught their eye:

  • The news story about Joe Biden.
  • The news story about Mitch McConnell.
  • The video about Trump’s attacks.

Just three clicks in less than ten minutes of reading and watching. Three tiny breadcrumbs left for the algorithm and Joe goes on with his life during the rest of the day.

Now, if we ran those clicks through the basic 30 lines of Python code we wrote earlier, nothing much would happen. Mathematically, 25 historical sports clicks would still overpower 3 new political clicks. The algorithm would still see a user who is 89% interested in sports, and the feed would barely budge.

But here is the very important secret sauce of modern social media: Recency Weighting (or Time Decay).

Real algorithms don’t treat all your clicks equally as a click you made three years ago is practically ancient history; a click you made three minutes ago is gold. To keep you hooked in the current session, platforms apply a heavy multiplier to whatever you are doing recently.

A single line of code implements this in the algorithm we saw earlier. If we decide that the most recent clicks should carry up to 100 times more weight than older ones, we could write something like this:

time_decay_weights = np.array([0.1 if historical_click else 10.0 for click in user_history])

If we do this, let’s run the recommendations again:

Joe’s new reading recommendation list – Image by Author

Here’s the damage that just 3 clicks have done to our Time Weighting recommendation system:

Feed Category Before – Image by Author
Feed Category Before – Image bu Author

Political news went from 13% to 40% of the feed. A 3x increase. From one evening of clicking and reading three pieces of news. Sports (the thing this person has read for years) can drop from the dominant category to second place. The algorithm didn’t pause to think “hold on, this person has 25 sports articles in their history, and one evening of politics doesn’t define them.”

It doesn’t think, it just recalculates time weighted similarity matrices, found a new set of neighbours and served what other users that clicked on this may enjoy.

Two things jump out:

  • The speed. It may take one evening to flip a user’s entire feed composition. Real platforms recalculate faster than this demo as they update in real time. You probably notice this in your feed with ads related to products you’ve been searching lately.
  • What disappears. This isn’t just about what the algorithm adds, but also about what it removes. The user’s informational diet didn’t just get more political, it got narrower. And narrower is the real danger here

Note: real platforms don’t publish their decay constants, so this is illustrative, not a measurement, but the mechanism is real and the direction is what matters. My 100x example is possibly an exaggeration of the recency bias.


What research tells us

You now know how clicks influence what the math of what the algorithm shows you next.

But this gets worse — content that makes us angry, fearful, or shocked glues us to the screen far better than content that makes us feel good or informed. Social media companies didn’t engineer this consciously, their algorithms simply discovered it.

A massive 2025 study analyzing the digital trace data of 25,000 SmartNews users found that humans possess a trait-level “negativity bias” when selecting news. Evolutionarily, we are hardwired to pay attention to threats, avoiding danger was critical to our ancestors’ survival. What happens when this ancient instinct meets modern machine learning? The study confirmed that personalized recommendation feeds take our inherent negativity bias and actively augment it.

Furthermore, data from researchers analyzing hundreds of millions of posts on platforms like Facebook and X (formerly Twitter) reveals that social media users are roughly 1.91 times more likely to share negative news links than positive ones. Negativity equals virality, and the outrage loop is born.


The Cognitive Toll: It’s Not Just What You Think, It’s How You Think

The impact of these algorithmic loops isn’t just about the type of content we consume; it’s about how it fundamentally alters our brains. A recent 2025 systematic review analyzing 71 studies and 98,299 participants of short-form video feeds (like TikTok, Instagram Reels, and YouTube Shorts) found profound cognitive consequences.
Increased engagement with these endless-scroll platforms is associated with poorer cognitive performance, specifically impacting our sustained attention and inhibitory control.

Psychologists point to a dual process of habituation and sensitization to explain this phenomenon. The rapid, high-stimulation nature of short videos desensitizes us to slower, more effortful tasks like reading a book or deep problem-solving. At the same time, the algorithm’s instant delivery of curated content sensitizes our brain’s reward system, reinforcing impulsive engagement patterns and encouraging the habitual seeking of instant gratification.

Heavy users of these platforms exhibit reduced electrophysiological activity during attention-demanding tasks. Some researchers even point to structural differences in key cognitive control regions, including the prefrontal cortex and striatal reward circuits, linked to this constant bombardment of highly rewarding algorithmic stimuli.


The Societal Cost

Due to mathematical matrices, each of us in our own personalised bubble of information.

In the short term, it’s annoying for most people. But zoom out and the picture darkens. When algorithms feed us content that confirms what we already believe, we experience confirmation bias on steroids.

These filter bubbles deepen the social divide we have today. We will continue to be extremely divided and there is no end in sight for this chasm.

Misinformation thrives in closed loops because false stories don’t get exposed to scrutiny outside the bubble. By the time a fact-check goes out, the original lie has done a lap around the platform and built a small army of believers.

And democracy, which depends on a shared baseline of reality and some willingness to argue in public, takes a hit when citizens occupy entirely different reality bubbles.


Reclaiming Your Feed

You’re not powerless here. The algorithm is responsive but there a couple of things you can do, that, although annoying, may take you out of your bubble.

The same mechanism that built your bubble can be used to widen it. Some practical move:

  • Diversify the inputs. Actively follow a few sources outside your comfort zone. If you lean one way politically, follow some thoughtful voices from the other side.
  • Reset periodically. Clear your watch history. Use “Not Interested” on suggestions that keep haunting you. Try the platform logged out, or in incognito, and notice how different the world looks without your data.
  • Use chronological feeds. Most platforms still let you switch off the algorithmic ranking and just see posts from people you follow, in order.
  • Pause before sharing. Every like, comment, and share is a vote for “more of this, please.” If something makes you furious, that’s exactly when the algorithm is most likely to be exploiting you.
  • Limit the time. Set screen-time limits. Schedule offline hours. The less you depend on the feed for your information diet, the less power it has to shape what you believe.

Beyond the Bubble

I hope this blog post informed you on how these recommendation systems bubbles work. We built a recommender on real news data, and it took three clicks to flip a sports fan’s feed from 40% sports to 53% politics.

The first step to breaking free is simply being aware. Next time you find yourself in an online frenzy, take a breath and ask: Why am I seeing this? Who benefits from me reacting this way? The answer usually traces back to an algorithm doing its job, and that job is rarely “informing you.”

Stay informed, stay open-minded,
— Ivo



Source link