Twitter recently open-sourced several components of its system for recommending tweets for a user's Twitter timeline. The release includes code for several services and jobs that run the algorithm, as well as code for training machine learning models for embedding and ranking tweets.
Details of the system are explained in a Twitter engineering blog post. The process consists of three main steps: sourcing candidate tweets, ranking tweets, and applying heuristics and filters. The release includes code for the system's component services, such as the machine learning model server and streaming event processor, as well as code for extracting features and building models from raw data about tweets, users, and engagement (such as likes and retweets). Twitter framed the release as part of an effort to improve transparency, but noted that it had excluded some parts of its algorithm from the release, specifically code that drives ad recommendations or that could compromise users' privacy. According to Twitter:
The goal of our open source efforts is to provide you, the user, with complete transparency into how our system works. We have released the code that powers our recommendations so that you can understand the algorithm in more detail. We are also working on some features to increase transparency within the app.
The goal of the recommendation pipeline is to create a user's "For You" timeline page. The process begins by selecting a set of 1,500 candidate tweets from both in-network sources (i.e., accounts the user follows) and out-of-network sources. On average, timelines contain roughly the same number of tweets from each.
Twitter recommendation system diagram. Image Source: https://github.com/twitter/the-algorithm
In-network tweets are ranked using a logistic regression model, trained using Twitter's RealGraph algorithm, that attempts to predict the "probability of engagement between two users." The higher the probability, the more tweets from that user will be included.
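To make the idea concrete, the following is a minimal sketch of how a logistic regression score could turn interaction features into an engagement probability and then order authors by it. The feature names, weights, and bias are hypothetical values chosen for illustration; they are not taken from Twitter's RealGraph code.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights and bias; a real model would learn these from data.
WEIGHTS = {"likes_last_30d": 0.08, "replies_last_30d": 0.15, "profile_visits": 0.05}
BIAS = -3.0


def engagement_probability(features: dict, weights: dict = WEIGHTS, bias: float = BIAS) -> float:
    """Logistic regression: sigmoid of a weighted sum of the features."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return sigmoid(z)


def rank_authors(features_by_author: dict) -> list:
    """Return (author, probability) pairs, most likely engagement first."""
    scored = {
        author: engagement_probability(features)
        for author, features in features_by_author.items()
    }
    return sorted(scored.items(), key=lambda pair: pair[1], reverse=True)
```

Under these assumed weights, an author the viewer frequently likes and replies to receives a higher probability than one the viewer rarely interacts with, so more of that author's tweets would be eligible for inclusion.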
Out-of-network tweets are selected from two sources. First, Twitter uses the social graph to find tweets that people the user follows have engaged with; these are ranked by a logistic regression model. Other out-of-network tweets are selected using an embedding space called SimClusters, which uses a matrix factorization algorithm to identify 145,000 virtual communities of users. Tweets are associated with a community based on how many users in the community liked the tweet.
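The last step, associating tweets with communities by counting likes, can be sketched as follows. This example assumes the community assignments have already been computed (the matrix factorization step is not shown) and, as a simplification, gives each user exactly one community. The user IDs, community names, and function names are hypothetical.

```python
from collections import Counter

# Hypothetical, precomputed user-to-community assignments.
USER_COMMUNITY = {"u1": "ml", "u2": "ml", "u3": "sports", "u4": "news"}


def tweet_community_counts(likes, user_community=USER_COMMUNITY):
    """Count, per tweet, how many likes came from each community.

    `likes` is an iterable of (user_id, tweet_id) pairs.
    Likes from users without a known community are ignored.
    """
    counts = {}
    for user_id, tweet_id in likes:
        community = user_community.get(user_id)
        if community is None:
            continue
        counts.setdefault(tweet_id, Counter())[community] += 1
    return counts


def top_tweets_for_community(counts, community, k=3):
    """Return up to k (tweet_id, like_count) pairs for one community."""
    scored = [
        (tweet_id, per_community[community])
        for tweet_id, per_community in counts.items()
        if per_community[community] > 0
    ]
    # Highest like count first; ties broken by tweet ID for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]
```

A user who belongs to a community could then be shown the tweets that community liked most, even when the authors are accounts the user does not follow.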
After candidate tweets are selected, they are ranked using a 48M-parameter neural network model based on MaskNet. Finally, Twitter applies heuristics and filters to "create a balanced and diverse feed." This includes balancing the resulting in-network and out-of-network tweets, threading replies into conversations, and removing NSFW content.
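A minimal sketch of this final stage is shown below: it drops NSFW candidates and then interleaves in-network and out-of-network tweets in score order. The `Candidate` type, the `build_feed` function, and the strict alternation rule are assumptions made for illustration; the source states only that the two kinds of tweets are balanced, not how.

```python
from dataclasses import dataclass
from itertools import zip_longest


@dataclass(frozen=True)
class Candidate:
    """Hypothetical ranked candidate; `score` stands in for the ranker's output."""
    tweet_id: str
    score: float
    in_network: bool
    nsfw: bool = False


def build_feed(candidates, limit):
    """Filter NSFW tweets, then alternate in-network and out-of-network tweets."""
    safe = [c for c in candidates if not c.nsfw]
    by_score = sorted(safe, key=lambda c: c.score, reverse=True)
    in_net = [c for c in by_score if c.in_network]
    out_net = [c for c in by_score if not c.in_network]

    feed = []
    # Alternate between the two sources; when one runs out, keep the rest of the other.
    for a, b in zip_longest(in_net, out_net):
        for candidate in (a, b):
            if candidate is not None:
                feed.append(candidate)
    return feed[:limit]
```

Alternation is only one simple way to keep the two sources in rough balance; a production system could just as well use quotas or score adjustments.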
Twitter's release of its algorithm has sparked a lively debate online. In a thread on Hacker News, several users pointed out that the source code for extracting features from tweets included features such as author_is_elon, author_is_power_user, author_is_democrat, and author_is_republican, along with comments in the code stating that these are used for A/B testing changes to the algorithm.
On Twitter, machine learning engineer Vicki Boykis talked about the release:
We plan to use Twitter's codebase for a long time, and this is truly a gift to recsys [recommendation systems] geeks. From what I've browsed so far, it seems to be a very standard construction of recsys-y components: rankers, filters, generators, Kafka log collection, and timelines.
In a Reddit discussion about this release, one user wrote:
Setting aside the political background behind many people's desire to see "the algorithm" published, this is amazing educational content for ML professionals. Here you'll find a world-class, complex recommendation and ranking system that everyone can read and learn from. This is a true goldmine of educational resources.
Twitter’s recommendation system code is available on GitHub, as is the code for training the two ML models.