Modernizing Twitter's ad engagement analytics platform

Project phase: Adopt
Themes: Cloud Everywhere
Value chain: Service, Technology
Innovation sector: Culture and Media
SDGs: 9. Industry, innovation and infrastructure

Project Background

Over the past decade, Twitter has developed powerful data transformation pipelines to handle the load of its ever-growing user base worldwide. The first deployments of those pipelines all ran in Twitter's own data centers. Input data streamed from various sources into the Hadoop Distributed File System (HDFS) as LZO-compressed Thrift files in an Elephant Bird container format. Scalding data transformation pipelines then processed and aggregated the data in batches, and the aggregation results were written to Manhattan, Twitter's homegrown distributed key-value store, for serving. In addition, a streaming system built on Twitter's homegrown Eventbus (a messaging tool built on top of DistributedLog), Heron (a stream processing engine), and Nighthawk (a sharded Redis deployment) powered the real-time analytics that Twitter had to provide, filling the gap between the last batch run and the current time.
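To make the split between the batch and streaming layers concrete, here is a minimal Python sketch of how a serving layer might combine the two. It is an illustration only, not Twitter's implementation: the class names `BatchStore` and `RealtimeStore`, the function `engagement_count`, the watermark convention, and the in-memory dictionaries standing in for Manhattan and Nighthawk are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class BatchStore:
    """Hypothetical stand-in for the batch-serving key-value store."""
    watermark: int = 0  # events with timestamp < watermark are covered by batch
    totals: dict = field(default_factory=dict)

    def load(self, events, watermark):
        """Recompute per-ad totals from all events older than the watermark."""
        totals = defaultdict(int)
        for ad_id, ts in events:
            if ts < watermark:
                totals[ad_id] += 1
        self.totals = dict(totals)
        self.watermark = watermark


@dataclass
class RealtimeStore:
    """Hypothetical stand-in for the streaming-side cache."""
    counts: dict = field(default_factory=lambda: defaultdict(int))

    def record(self, ad_id, ts):
        """Count one engagement event as it arrives on the stream."""
        self.counts[(ad_id, ts)] += 1

    def count_since(self, ad_id, start_ts):
        """Sum events for ad_id with timestamp >= start_ts."""
        return sum(
            n for (a, ts), n in self.counts.items()
            if a == ad_id and ts >= start_ts
        )


def engagement_count(ad_id, batch, realtime):
    """Batch total up to the watermark plus real-time events after it."""
    return batch.totals.get(ad_id, 0) + realtime.count_since(ad_id, batch.watermark)


# Example: the last batch run covered timestamps below 30.
events = [("ad-1", 10), ("ad-1", 20), ("ad-2", 25), ("ad-1", 35)]
realtime = RealtimeStore()
for ad_id, ts in events:
    realtime.record(ad_id, ts)
batch = BatchStore()
batch.load(events, watermark=30)
print(engagement_count("ad-1", batch, realtime))  # 2 from batch + 1 real-time
```

The point of the sketch is that the real-time layer only needs to answer for the window after the last batch run; everything older is read from the batch aggregates.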

Project Problem Statement

While this system consistently sustained massive scale, its original design and implementation were starting to reach their limits. In particular, some parts of the system that had grown organically over the years were difficult to configure and extend with new features. Some intricate, long-running jobs were also unreliable, leading to sporadic failures. The legacy end-user serving system was very expensive to run and couldn’t support large queries.

Technological Innovations

Cloud, Dataflow

Project Objective

To accommodate the projected growth in user engagement over the next few years and streamline the development of new features, the Twitter Revenue Data Platform engineering team decided to rethink the architecture and deploy a more flexible and scalable system in Google Cloud.

Technology Providers

Google
