Introducing FBLearner Flow: Facebook’s AI backbone #machinelearning

Link to Jeffrey Dun’s post on the Facebook Blog: Introducing FBLearner Flow: Facebook’s AI backbone

Many of the experiences and interactions people have on Facebook today are made possible with AI. When you log in to Facebook, we use the power of machine learning to provide you with unique, personalized experiences. Machine learning models are part of ranking and personalizing News Feed stories, filtering out offensive content, highlighting trending topics, ranking search results, and much more. There are numerous other experiences on Facebook that could benefit from machine learning models, but until recently it’s been challenging for engineers without a strong machine learning background to take advantage of our ML infrastructure. In late 2014, we set out to redefine machine learning platforms at Facebook from the ground up, and to put state-of-the-art algorithms in AI and ML at the fingertips of every Facebook engineer.

In some of our earliest work to leverage AI and ML — such as delivering the most relevant content to each person — we noticed that the largest improvements in accuracy often came from quick experiments, feature engineering, and model tuning rather than applying fundamentally different algorithms. An engineer may need to attempt hundreds of experiments before finding a successful new feature or set of hyperparameters. Traditional pipeline systems that we evaluated did not appear to be a good fit for our uses — most solutions did not provide a way to rerun pipelines with different inputs, mechanisms to explicitly capture outputs and/or side effects, visualization of outputs, and conditional steps for tasks like parameter sweeps.

To address these points, we wanted a platform with the following properties:

  • Each machine learning algorithm should be implemented once in a reusable manner.
  • Engineers should be able to write a training pipeline that parallelizes over many machines and can be reused by many engineers.
  • Training a model should be easy for engineers of varying ML experience, and nearly every step should be fully automated.
  • Everybody should be able to easily search past experiments, view results, share with others, and start new variants of a given experiment.

We decided to build a brand-new platform, FBLearner Flow, capable of easily reusing algorithms in different products, scaling to run thousands of simultaneous custom experiments, and managing experiments with ease. This platform provides innovative functionality, like automatic generation of UI experiences from pipeline definitions and automatic parallelization of Python code using futures. FBLearner Flow is used by more than 25 percent of Facebook’s engineering team. Since its inception, more than a million models have been trained, and our prediction service has grown to make more than 6 million predictions per second.

Eliminating manual work required for experimentation allows machine learning engineers to spend more time on feature engineering, which in turn can produce greater accuracy improvements. Engineers are able to have an impact at scale — there are fewer than 150 workflow authors, and their efforts have an impact beyond themselves and their team. FBLearner Flow provides the platform and tools to enable engineers to run thousands of experiments every day.

Core concepts and components
Before we take a closer look at the system, there are a few key concepts to understand:

Workflows: A workflow is a single pipeline defined within FBLearner Flow and is the entry point for all machine learning tasks. Each workflow performs a specific job, such as training and evaluation of a specific model. Workflows are defined in terms of operators and can be parallelized.

Operators: Operators are the building blocks of workflows. Conceptually, you can think of an operator like a function within a program. In FBLearner Flow, operators are the smallest unit of execution and run on a single machine.

Channels: Channels represent inputs and outputs, which flow between operators within a workflow. All channels are typed using a custom type system that we have defined.

The platform consists of three core components: an authorship and execution environment for custom distributed workflows, an experimentation management UI for launching experiments and viewing results, and numerous predefined pipelines for training the most commonly used machine learning algorithms at Facebook.

Read the rest here:

stay connected