Data Driven Journalism – 3 Lessons from #Netflix #TextAnalytics

May 23rd, 2016

Author: Content Team

Categories: Sales Forecasting

Netflix knows what we like to watch, when, for how long, and a whole lot more. Whenever we select a program, the system recalibrates its data to personalize our experience, and it does so again with each session.

To do this, Netflix applies a myriad of cool data techniques, and the team regularly explains the challenges and decisions behind these processes on the Netflix Tech Blog.

We looked at three lessons that data journalists can take from the Netflix experience:

1. Text placement on images

Where is the best place to insert text on an image? Text mustn’t obscure the image, but it also needs to be prominent enough to grab the audience’s attention. Journalists constructing visualizations face similar concerns.

To optimize text placement, Netflix uses a text detection algorithm to find any text that is already in a frame and prevent overlaps. Doing so, however, carries the risk of false positives. To limit these, the team applies a number of image transformations, outlined in the diagram below, and runs each transformed image through the text detection algorithm. With more versions of the image, each highlighting different features, the algorithm has a larger corpus of data on the image properties and can harness this to place text in the best position.

[Diagram: image transformations applied before text detection. Image: Netflix Tech Blog.]
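
The false-positive check described above can be sketched in a few lines of Python. This is only an illustration, not Netflix's implementation: the source does not say how results from the different transformations are combined, so the voting rule here (keep a text region only if enough transformed versions agree) is an assumption, and `Box`, `confirmed_text_boxes`, `first_free_slot` and the sample coordinates are all hypothetical.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates (hypothetical helper type)."""
    left: int
    top: int
    right: int
    bottom: int


def overlaps(a: Box, b: Box) -> bool:
    """Return True when the two boxes share any area."""
    return (a.left < b.right and b.left < a.right
            and a.top < b.bottom and b.top < a.bottom)


def confirmed_text_boxes(detections_per_transform: list[list[Box]],
                         min_votes: int) -> list[Box]:
    """Keep a detected box only if at least `min_votes` transformed versions
    of the image report text in an overlapping region (assumed voting rule)."""
    confirmed: list[Box] = []
    for boxes in detections_per_transform:
        for box in boxes:
            votes = sum(
                any(overlaps(box, other) for other in other_boxes)
                for other_boxes in detections_per_transform
            )
            already_kept = any(overlaps(box, kept) for kept in confirmed)
            if votes >= min_votes and not already_kept:
                confirmed.append(box)
    return confirmed


def first_free_slot(candidates: list[Box], text_boxes: list[Box]) -> Optional[Box]:
    """Return the first candidate slot that does not cover existing text."""
    for slot in candidates:
        if not any(overlaps(slot, text) for text in text_boxes):
            return slot
    return None


# Made-up detector output for three transformed versions of one frame.
detections = [
    [Box(10, 10, 100, 40)],
    [Box(12, 11, 98, 42)],
    [Box(12, 11, 98, 42), Box(300, 200, 340, 220)],  # second box seen only once
]
existing_text = confirmed_text_boxes(detections, min_votes=2)
slot = first_free_slot([Box(0, 0, 200, 50), Box(0, 300, 200, 350)], existing_text)
```

With these sample values, the region reported by only one transformation is dropped as a likely false positive, and the overlay lands in the second candidate slot because the first one covers confirmed text.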

2. Language-receptive searching

Netflix is available in 21 languages and, consequently, its instant search algorithm needs to be able to point users to relevant content in their local language. One important goal of the Netflix search is retrieving content with as few interactions as possible. To this end, the team works on optimizing title indexes to reflect language-specific interaction patterns.

Using the example of the Korean Hangul alphabet, they show how language-specific title indexing can improve the algorithm’s efficiency:

To search for “올드보이” (Oldboy), in the worst possible case, a member would have to enter nine characters: “ㅇ ㅗ ㄹ ㄷ ㅡ ㅂ ㅗ ㅇ ㅣ”. Using a basic indexing for the video title, in the best case a member would still need to type three characters: “ㅇ ㅗ ㄹ”, which would be collapsed in the first syllable of that title: “올”. In a Hangul-specific indexing, a member would need to write as little as one character: “ㅇ”.
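
To make the Hangul example concrete, here is a minimal Python sketch of a jamo-level prefix index. It relies on the standard Unicode arithmetic for decomposing precomposed Hangul syllables; the function names, the dictionary-based index and the sample title list are assumptions for illustration, not Netflix's actual indexing code. It also simplifies real typing behavior: compound vowels and double final consonants are kept as single jamo rather than split into keystrokes.

```python
# Compatibility jamo in Unicode decomposition order.
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"            # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"       # 21 vowels
JONGSEONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
             "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
             "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]  # 27 finals + "none"

SYLLABLE_BASE = 0xAC00    # first precomposed syllable, "가"
SYLLABLE_COUNT = 11172    # 19 * 21 * 28


def to_jamo(text: str) -> str:
    """Split each precomposed Hangul syllable into its jamo.

    Characters outside the syllable block (spaces, Latin letters,
    jamo typed on their own) pass through unchanged.
    """
    out = []
    for ch in text:
        offset = ord(ch) - SYLLABLE_BASE
        if 0 <= offset < SYLLABLE_COUNT:
            out.append(CHOSEONG[offset // 588])           # 588 = 21 * 28
            out.append(JUNGSEONG[(offset % 588) // 28])
            out.append(JONGSEONG[offset % 28])
        else:
            out.append(ch)
    return "".join(out)


def build_prefix_index(titles: list[str]) -> dict[str, list[str]]:
    """Map every jamo-level prefix of every title to the matching titles."""
    index: dict[str, list[str]] = {}
    for title in titles:
        jamo = to_jamo(title)
        for end in range(1, len(jamo) + 1):
            index.setdefault(jamo[:end], []).append(title)
    return index


def search(index: dict[str, list[str]], typed: str) -> list[str]:
    """Look up whatever the member has typed so far, decomposed to jamo."""
    return index.get(to_jamo(typed), [])


# Illustrative catalogue: Oldboy and House of Cards in Korean.
index = build_prefix_index(["올드보이", "하우스 오브 카드"])
one_keystroke = search(index, "ㅇ")     # first jamo only
one_syllable = search(index, "올")      # first full syllable
```

Because both the titles and the query are decomposed the same way, a single typed “ㅇ” already matches “올드보이”, while a partially composed query such as “올ㄷ” still resolves to the same jamo prefix.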

3. Harnessing historical data

To make user recommendations, Netflix uses machine learning models trained on historical input data. Using Apache Spark, the team created a “time machine” that takes data snapshots of viewing history, lists, and predicted ratings at various points in time, building up a comprehensive database across time.

Data for any given destination time is then fetched via APIs, like the following sample that would retrieve viewing history data snapshots:

[Code sample: API call retrieving viewing history data snapshots. Image: Netflix Tech Blog.]
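
The idea of fetching data for a destination time can be illustrated with a small stand-in. The sketch below is plain Python, not Spark and not Netflix's API: `SnapshotStore`, `add` and `as_of` are hypothetical names, and the rule of returning the most recent snapshot taken at or before the requested time is an assumption about what a "time travel" lookup would reasonably do.

```python
import bisect
from datetime import datetime
from typing import Any, Optional


class SnapshotStore:
    """Keeps snapshots sorted by the time they were taken (illustrative only)."""

    def __init__(self) -> None:
        self._times: list[datetime] = []
        self._snapshots: list[Any] = []

    def add(self, taken_at: datetime, data: Any) -> None:
        """Insert a snapshot while keeping both lists in time order."""
        pos = bisect.bisect_right(self._times, taken_at)
        self._times.insert(pos, taken_at)
        self._snapshots.insert(pos, data)

    def as_of(self, destination_time: datetime) -> Optional[Any]:
        """Return the latest snapshot taken at or before `destination_time`,
        or None if no snapshot existed yet."""
        pos = bisect.bisect_right(self._times, destination_time)
        if pos == 0:
            return None
        return self._snapshots[pos - 1]


# Made-up viewing-history snapshots for one member.
store = SnapshotStore()
store.add(datetime(2016, 2, 1), {"viewing_history": ["title_a", "title_b"]})
store.add(datetime(2016, 1, 1), {"viewing_history": ["title_a"]})

mid_january = store.as_of(datetime(2016, 1, 15))
before_any = store.as_of(datetime(2015, 12, 31))
```

Looking up mid-January returns the January 1st snapshot, which is what a model trained "as of" that date would have been able to see; a date before the first snapshot returns nothing.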

However, as with all machine learning approaches, the team acknowledges that there are many ways to improve the system.

“Instead of batch snapshotting on a periodic cadence, we can drive the snapshots based on events, for example at a time when a particular member visits our service. To avoid duplicate data collection, we can also capture data changes instead of taking full snapshots each time,” they explain.
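
The two improvements in that quote, event-driven snapshots and capturing changes rather than full copies, might look roughly like this in Python. This is a sketch under stated assumptions: `ChangeLog`, `on_member_visit`, `replay` and the dictionary layout of a member's data are all hypothetical, and nothing here reflects how Netflix actually stores its data.

```python
from typing import Any, Optional


def diff(previous: dict[str, Any], current: dict[str, Any]) -> dict[str, Any]:
    """Describe how `current` differs from `previous`."""
    changed = {
        key: value for key, value in current.items()
        if key not in previous or previous[key] != value
    }
    removed = [key for key in previous if key not in current]
    return {"changed": changed, "removed": removed}


def apply_delta(state: dict[str, Any], delta: dict[str, Any]) -> dict[str, Any]:
    """Return a new state with one delta applied."""
    new_state = {k: v for k, v in state.items() if k not in delta["removed"]}
    new_state.update(delta["changed"])
    return new_state


class ChangeLog:
    """Stores only what changed, and only when a member-visit event arrives."""

    def __init__(self) -> None:
        self._state: dict[str, Any] = {}
        self.deltas: list[dict[str, Any]] = []

    def on_member_visit(self, current: dict[str, Any]) -> None:
        """Event hook: record a delta if anything changed since the last visit."""
        delta = diff(self._state, current)
        if delta["changed"] or delta["removed"]:
            self.deltas.append(delta)
            self._state = dict(current)

    def replay(self, upto: Optional[int] = None) -> dict[str, Any]:
        """Rebuild the state after the first `upto` deltas (all if None)."""
        state: dict[str, Any] = {}
        for delta in self.deltas[:upto]:
            state = apply_delta(state, delta)
        return state


# Three visits by one member; the second visit changes nothing.
log = ChangeLog()
log.on_member_visit({"history": ["a"], "list": ["x"]})
log.on_member_visit({"history": ["a"], "list": ["x"]})
log.on_member_visit({"history": ["a", "b"], "list": ["x"]})
```

In this example the unchanged second visit stores nothing, and the third visit stores only the updated viewing history, yet any earlier state can still be rebuilt by replaying the deltas in order.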
