Source: Digg Blog

Rethinking Notifications with Data Science

This post was written by Betaworks lead scientist Suman Deb Roy and was originally posted on Medium.

How Digg Bot finds stories for your favorite topics

A year and a half ago, the Notifications Summit was held at Betaworks to deliberate on many key ideas: the push and the pull, notifications as a primary interface, notifications as a meta-app, the utility of the lock screen, deep linking, filters, etc. There was growing consensus that notifications could become an operating system for the information age, a beacon in the attention economy.

The attention economy has transformed many industries, but none more severely than news media, where a clear oversupply of information has overwhelmed consumers. The larger an information landscape becomes, the more pressing the demand for actionable and relevant content. This hyper-relevancy is the principal challenge notification systems face.

Somewhat counter-intuitively, though, it is only by monitoring and analyzing this entire information landscape that great notifications can be created, because only then can relevance, an elusive attribute of actionable notifications, be calculated as a synergy between the world and the user.

Digg Bot notification for the topic "bitcoin". If you subscribed to this topic, notifications about it might appear on the lock screen (left) and in Facebook Messenger (right) when you open it.

Luckily, Digg has data on the entire information landscape. Each day, Digg aggregates almost 7.5 million unique URLs through its various products: Digg Reader, which tracks 8 million RSS feeds; Digg Deeper, which listens to 2-3 million Twitter users; and Digg Channels, which comprises focused topic pages.
This means Digg observes a comprehensive chunk of the media produced on the Web every single day, giving it unique potential in notification technology.

In this post, I'll explain how we are thinking about notifications at Digg using our messaging services, including topic subscriptions in the news bot, the algorithms and heuristics that generate notifications, and some results/data we are seeing from this feature.

Digg Bot's Notification Feature

We soft-launched Digg alerts on our Facebook Messenger bot on August 2nd, 2016. Since then, Digg Bot has sent over 34,037 notifications for hundreds of unique topics or keywords to users.

Subscribing to a topic in Digg Bot is relatively easy. Just search for any word/phrase and the last card in the carousel will let you subscribe to it. Alternatively, you can add/edit/remove topics from your subscriptions at any time by typing manage subscriptions. When you add/follow a topic, you might receive push notifications comprising important stories in the topic.

While you can follow traditional beats like politics or technology, the real value of a notification system is in more granular topics, which could range from obsessions like climate change to entities like beyonce or tesla. As an example, I subscribe to artificial intelligence news and these are some notifications Digg Bot sent me.

Notifications for "Artificial Intelligence". If an important story in your topic breaks after 9pm or before 8am, we might send it as a silent push.

You can also subscribe to even finer sub-topics within concepts like artificial intelligence, e.g. deep learning. Feel free to track specific entities related to sub-topics as well, such as the company DeepMind, which is related to AI.
Digg Bot's algorithm adjusts itself based on the topic's generality and the volume and velocity of stories associated with it, and sends relevant pushes featuring a representative link related to the topic.

The coolest thing about a notification system is the ability to set up granular alerts about sub-topics. Instead of subscribing to all NBA news from ESPN, you could just get notifications about the Golden State Warriors. Instead of being bombarded with financial news from one publisher, you could configure Digg to notify you about certain companies only.

Digg's Notification Algorithm

To generate relevant notifications, we must first calculate how pertinent a story is to the user at that moment. This depends on three factors: (1) how important the story is globally, (2) how important the story is in the user's own world, and (3) the timing and attention-impeding capacity of an alert. While the first factor can be handled by editors efficiently, in reality, people don't always care about everything newsrooms want them to care about at that very moment, because urgency is a deeply personal thing. Thus, factors 2 and 3 are hard to balance without intelligent technology.

Time is an inescapable attribute of intelligent notifications. Unfortunately, many popular machine learning solutions begin to wobble when we introduce this exact criterion, time, into the equation. Features that appear paramount in static analysis of systems can get eroded when the same system is observed dynamically.

A singular ML framework can be hard to personalize in this regard, because the algorithm needs sophistication to model temporal variations of human attentiveness to news and information. Thus, there are three key algorithmic ensembles we employ to address this:

1. The Trending Ensemble: A group of algorithms that determine the trending nature of a story, characterized by how much attention it is receiving in the social and news media.
It is optimized for multi-modal signal monitoring and early detection, and considers accumulative opportunity cost plus seasonality. The result is that every article ingested gets a DiggRank, indicating its trending nature in the world. You can check the current trending articles in Digg Bot.

2. The Clustering Ensemble: Multiple learning algorithms that determine if two separate news articles are part of the same story/event. This addresses a regular irritation with news alerts: duplicate pushes from different outlets about the same story. The clustering ensemble is optimized for detecting consolidated media coverage, diversity and syndicated associations. The result is that all links covering the same story are grouped together in a cluster.

When news about YouTube's live-TV service broke, about 10-11 media outlets covered it. This gif shows how all those related stories from different publishers were clustered and displayed in Digg's technology channel.

The clustering ensemble also manages three important situations:

Story Development: As more media outlets write about a story and it develops, the semantics of article titles and descriptions change (if there is new information), causing the cluster to split. The algorithm determines if the fresh articles in the news cycle are different enough to represent a story update and big enough to be pushed eventually.

Unverified Trends: This addresses a significant hassle in the age of breaking social news: the popular yet unverified story. Recall that last year, a single fake news story triggered safety alerts on Facebook. Even some of the best information systems might be vulnerable to media hacking. Thus, consolidated media coverage (via clustering) serves as a heuristic for verifying potential hoax stories.

Editorial Expertise: The algorithm has to select one article from the cluster of similar links to be featured in the push notification.
If there is a link in the cluster that Digg editors have featured on the front page, it could be prioritized as the representative article of the notification.

3. The Info-Sphere Ensemble: Just because a story is alert-worthy does not mean it needs to be pushed now. Untimely pushes create ambiguity and a wrong sense of urgency. The final ensemble is a policy network whose job is to determine whether we actually push the story to the user right now or defer it to a later time, given the story's importance.

The info-sphere ensemble attempts to simulate the information sphere of the user. A user can be subscribed to multiple topics of different granularity; on average, an individual subscribes to 4-5 topics. Since the volume and velocity of incoming news differ for every topic, notifications must be modulated. Has the user recently received an alert about this topic? How many total notifications has she received in the last x hours? How surprising is it for stories in this topic to gain this much traction? These questions are critical in assuring relevant yet non-invasive notifications.

Using these ensembles, Digg Bot has been flagging ~200 stories each day as alert-worthy, although we are noticing the aggregate number rise as more people keep subscribing to newer topics.

The overall number of notification alerts that Digg Bot flags each day. The algorithm went through a tuning spell from Aug 04-10, 2016 right after launch, which is why there was a huge spike and then a trough. Tuning involves calculating the right thresholds and parameters once a system goes live, based on the volume and velocity of incoming topical stories.

These 3 ensembles collectively give rise to some interesting flavors of notifications, depending on the topic categories you subscribe to.

Flavors of Digg Notifications:

(1) Mix of Breaking, Note-worthy, and Catch-up stories

We cannot emphasize enough how much the time horizon of predictions or pushes determines whether alerts are useful.
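The three questions the info-sphere ensemble asks (a recent alert on the same topic, total notifications in the last x hours, and how surprising the traction is) can be illustrated with a small rule-based sketch. This is not Digg's policy network: the post does not describe its internals, so plain threshold rules stand in for it here. The names `UserState`, `surprise` and `decide`, the z-score definition of surprise, and every default threshold are assumptions made for illustration. Only the 9pm-8am silent-push window comes from the post.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum
from statistics import mean, pstdev


class Action(Enum):
    PUSH = "push"                # deliver now, audibly
    SILENT_PUSH = "silent_push"  # deliver now, without sound
    DEFER = "defer"              # hold back, e.g. for a later digest


@dataclass
class UserState:
    # Hypothetical per-user log: topic -> times an alert was already sent.
    sent: dict[str, list[datetime]] = field(default_factory=dict)


def surprise(traction: float, past_traction: list[float]) -> float:
    """Assumed surprise measure: z-score of traction against topic history."""
    if len(past_traction) < 2:
        return 0.0
    spread = pstdev(past_traction)
    if spread == 0:
        return 0.0  # no variation in history, so no usable baseline
    return (traction - mean(past_traction)) / spread


def decide(topic: str, traction: float, past_traction: list[float],
           state: UserState, now: datetime,
           topic_cooldown: timedelta = timedelta(hours=6),   # assumed
           window: timedelta = timedelta(hours=12),          # the "x hours"
           max_in_window: int = 3,                           # assumed
           min_surprise: float = 2.0) -> Action:             # assumed
    # Has the user recently received an alert about this topic?
    if any(now - t < topic_cooldown for t in state.sent.get(topic, [])):
        return Action.DEFER
    # How many total notifications has she received in the last x hours?
    recent = sum(1 for times in state.sent.values()
                 for t in times if now - t < window)
    if recent >= max_in_window:
        return Action.DEFER
    # How surprising is this much traction for stories in this topic?
    if surprise(traction, past_traction) < min_surprise:
        return Action.DEFER
    # From the post: after 9pm or before 8am, send as a silent push.
    if now.hour >= 21 or now.hour < 8:
        return Action.SILENT_PUSH
    return Action.PUSH
```

Calling `decide("bitcoin", traction, history, state, now)` returns one of the three actions. A learned policy would replace the fixed thresholds with parameters tuned on live data, much like the post-launch tuning spell described above.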
Our priority isn't necessarily to make notifications breaking, unless absolutely necessary. Instant is not always best. Thus, the algorithm also calculates whether some topic stories are important but not big enough to push on their own, so you can catch up with them in your "time-out" hours. We call this the Digest.

The Digest comprises top-ranked stories from a subset of your topic subscriptions. The topics chosen for push depend on the popularity of the stories within the topic and the frequency of alerts in that topic. For example, if you subscribed to Westworld (the TV show), these are some notifications (separate and digest) you would have received.

One of the many algorithmic tunings is to determine when something is breaking vs. socially popular vs. suitable for a digest. We understand that normal capability for media consumption (even for
