Streaming Systems - Tyler Akidau

Description

Streaming data is a big deal in big data these days. As more and more businesses seek to tame the massive unbounded data sets that pervade our world, streaming systems have finally reached a level of maturity sufficient for mainstream adoption. With this practical guide, data engineers, data scientists, and developers will learn how to work with streaming data in a conceptual and platform-agnostic way.

Expanded from Tyler Akidau's popular blog posts "Streaming 101" and "Streaming 102", this book takes you from an introductory level to a nuanced understanding of the what, where, when, and how of processing real-time data streams. You'll also dive deep into watermarks and exactly-once processing with co-authors Slava Chernyak and Reuven Lax.

You'll explore:
  • How streaming and batch data processing patterns compare
  • The core principles and concepts behind robust out-of-order data processing
  • How watermarks track progress and completeness in infinite datasets
  • How exactly-once data processing techniques ensure correctness
  • How the concepts of streams and tables form the foundations of both batch and streaming data processing
  • The practical motivations behind a powerful persistent state mechanism, driven by a real-world example
  • How time-varying relations provide a link between stream processing and the world of SQL and relational algebra

Table of contents:

Preface Or: What Are You Getting Yourself Into Here?
  • Navigating This Book
  • Takeaways
  • Conventions Used in This Book
  • Online Resources
  • Figures
  • Code Snippets
  • O'Reilly Safari
  • How to Contact Us
  • Acknowledgments

I. The Beam Model

1. Streaming 101
  • Terminology: What Is Streaming?
  • On the Greatly Exaggerated Limitations of Streaming
  • Event Time Versus Processing Time
  • Data Processing Patterns
  • Bounded Data
  • Unbounded Data: Batch
  • Fixed windows
  • Sessions
  • Unbounded Data: Streaming
  • Time-agnostic
  • Filtering
  • Inner joins
  • Approximation algorithms
  • Windowing
  • Windowing by processing time
  • Windowing by event time
  • Summary

2. The What, Where, When, and How of Data Processing
  • Roadmap
  • Batch Foundations: What and Where
  • What: Transformations
  • Where: Windowing
  • Going Streaming: When and How
  • When: The Wonderful Thing About Triggers Is Triggers Are Wonderful Things!
  • When: Watermarks
  • When: Early/On-Time/Late Triggers FTW!
  • When: Allowed Lateness (i.e., Garbage Collection)
  • How: Accumulation
  • Summary

3. Watermarks
  • Definition
  • Source Watermark Creation
  • Perfect Watermark Creation
  • Heuristic Watermark Creation
  • Watermark Propagation
  • Understanding Watermark Propagation
  • Watermark Propagation and Output Timestamps
  • The Tricky Case of Overlapping Windows
  • Percentile Watermarks
  • Processing-Time Watermarks
  • Case Studies
  • Case Study: Watermarks in Google Cloud Dataflow
  • Case Study: Watermarks in Apache Flink
  • Case Study: Source Watermarks for Google Cloud Pub/Sub
  • Summary

4. Advanced Windowing
  • When/Where: Processing-Time Windows
  • Event-Time Windowing
  • Processing-Time Windowing via Triggers
  • Processing-Time Windowing via Ingress Time
  • Where: Session Windows
  • Where: Custom Windowing
  • Variations on Fixed Windows
  • Unaligned fixed windows
  • Per-element/key fixed windows
  • Variations on Session Windows
  • Bounded sessions
  • One Size Does Not Fit All
  • Summary

5. Exactly-Once and Side Effects
  • Why Exactly Once Matters
  • Accuracy Versus Completeness
  • Side Effects
  • Problem Definition
  • Ensuring Exactly Once in Shuffle
  • Addressing Determinism
  • Performance
  • Graph Optimization
  • Bloom Filters
  • Garbage Collection
  • Exactly Once in Sources
  • Exactly Once in Sinks
  • Use Cases
  • Example Source: Cloud Pub/Sub
  • Example Sink: Files
  • Example Sink: Google BigQuery
  • Other Systems
  • Apache Spark Streaming
  • Apache Flink
  • Summary

II. Streams and Tables

6. Streams and Tables
  • Stream-and-Table Basics Or: a Special Theory of Stream and Table Relativity
  • Toward a General Theory of Stream and Table Relativity
  • Batch Processing Versus Streams and Tables
  • A Streams and Tables Analysis of MapReduce
  • Map as streams/tables
  • Reduce as streams/tables
  • Reconciling with Batch Processing
  • What, Where, When, and How in a Streams and Tables World
  • What: Transformations
  • Where: Windowing
  • Window merging
  • When: Triggers
  • How: Accumulation
  • A Holistic View of Streams and Tables in the Beam Model
  • A General Theory of Stream and Table Relativity
  • Summary

7. The Practicalities of Persistent State
  • Motivation
  • The Inevitability of Failure
  • Correctness and Efficiency
  • Implicit State
  • Raw Grouping
  • Incremental Combining
  • Generalized State
  • Case Study: Conversion Attribution
  • Conversion Attribution with Apache Beam
  • Summary

8. Streaming SQL
  • What Is Streaming SQL?
  • Relational Algebra
  • Time-Varying Relations
  • Streams and Tables
  • Looking Backward: Stream and Table Biases
  • The Beam Model: A Stream-Biased Approach
  • The SQL Model: A Table-Biased Approach
  • Materialized views
  • Looking Forward: Toward Robust Streaming SQL
  • Stream and Table Selection
  • Temporal Operators
  • Where: windowing
  • When: triggers
  • A SQL-ish default: per-record triggers
  • Watermark triggers
  • Repeated delay triggers
  • Data-driven triggers
  • How: accumulation
  • Retractions in a SQL world
  • Discarding mode, or lack thereof
  • Summary

9. Streaming Joins
  • All Your Joins Are Belong to Streaming
  • Unwindowed Joins
  • FULL OUTER
  • LEFT OUTER
  • RIGHT OUTER
  • INNER
  • ANTI
  • SEMI
  • Windowed Joins
  • Fixed Windows
  • Temporal Validity
  • Temporal validity windows
  • Temporal validity joins
  • Watermarks and temporal validity joins
  • Summary

10. The Evolution of Large-Scale Data Processing
  • MapReduce
  • Hadoop
  • Flume
  • Storm
  • Spark
  • MillWheel
  • Kafka
  • Cloud Dataflow
  • Flink
  • Beam
  • Summary

Index

Specification

Basic information

Authors
  • Tyler Akidau
  • Slava Chernyak
  • Reuven Lax
Publisher
  • O'Reilly Media