AI Engineering



Description

Recent breakthroughs in AI have not only increased demand for AI products, they've also lowered the barriers to entry for those who want to build AI products. The model-as-a-service approach has transformed AI from an esoteric discipline into a powerful development tool that anyone can use. Everyone, including those with minimal or no prior AI experience, can now leverage AI models to build applications. In this book, author Chip Huyen discusses AI engineering: the process of building applications with readily available foundation models.

The book starts with an overview of AI engineering, explaining how it differs from traditional ML engineering and discussing the new AI stack. The more AI is used, the more opportunities there are for catastrophic failures, and therefore, the more important evaluation becomes. This book discusses different approaches to evaluating open-ended models, including the rapidly growing AI-as-a-judge approach.

AI application developers will discover how to navigate the AI landscape, including models, datasets, evaluation benchmarks, and the seemingly infinite number of use cases and application patterns. You'll learn a framework for developing an AI application, starting with simple techniques and progressing toward more sophisticated methods, and discover how to efficiently deploy these applications.

  • Understand what AI engineering is and how it differs from traditional machine learning engineering
  • Learn the process for developing an AI application, the challenges at each step, and approaches to address them
  • Explore various model adaptation techniques, including prompt engineering, RAG, fine-tuning, agents, and dataset engineering, and understand how and why they work
  • Examine the bottlenecks for latency and cost when serving foundation models and learn how to overcome them
  • Choose the right model, dataset, evaluation benchmarks, and metrics for your needs

Chip Huyen works to accelerate data analytics on GPUs at Voltron Data.
Previously, she was with Snorkel AI and NVIDIA, founded an AI infrastructure startup, and taught Machine Learning Systems Design at Stanford. She's the author of the book Designing Machine Learning Systems, an Amazon bestseller in AI. AI Engineering builds upon and is complementary to Designing Machine Learning Systems (O'Reilly).

Table of contents:

Preface: What This Book Is About · What This Book Is Not · Who This Book Is For · Navigating This Book · Conventions Used in This Book · Using Code Examples · O'Reilly Online Learning · How to Contact Us · Acknowledgments

1. Introduction to Building AI Applications with Foundation Models: The Rise of AI Engineering · From Language Models to Large Language Models · Language models · Self-supervision · From Large Language Models to Foundation Models · From Foundation Models to AI Engineering · Foundation Model Use Cases · Coding · Image and Video Production · Writing · Education · Conversational Bots · Information Aggregation · Data Organization · Workflow Automation · Planning AI Applications · Use Case Evaluation · The role of AI and humans in the application · AI product defensibility · Setting Expectations · Milestone Planning · Maintenance · The AI Engineering Stack · Three Layers of the AI Stack · AI Engineering Versus ML Engineering · Model development · Modeling and training · Dataset engineering · Inference optimization · Application development · Evaluation · Prompt engineering and context construction · AI interface · AI Engineering Versus Full-Stack Engineering · Summary

2. Understanding Foundation Models: Training Data · Multilingual Models · Domain-Specific Models · Modeling · Model Architecture · Transformer architecture · Attention mechanism · Transformer block · Other model architectures · Model Size · Scaling law: Building compute-optimal models · Scaling extrapolation · Scaling bottlenecks · Post-Training · Supervised Finetuning · Preference Finetuning · Reward model · Finetuning using the reward model · Sampling · Sampling Fundamentals · Sampling Strategies · Temperature · Top-k · Top-p · Stopping condition · Test Time Compute · Structured Outputs · Prompting · Post-processing · Constrained sampling · Finetuning · The Probabilistic Nature of AI · Inconsistency · Hallucination · Summary

3. Evaluation Methodology: Challenges of Evaluating Foundation Models · Understanding Language Modeling Metrics · Entropy · Cross Entropy · Bits-per-Character and Bits-per-Byte · Perplexity · Perplexity Interpretation and Use Cases · Exact Evaluation · Functional Correctness · Similarity Measurements Against Reference Data · Exact match · Lexical similarity · Semantic similarity · Introduction to Embedding · AI as a Judge · Why AI as a Judge? · How to Use AI as a Judge · Limitations of AI as a Judge · Inconsistency · Criteria ambiguity · Increased costs and latency · Biases of AI as a judge · What Models Can Act as Judges? · Ranking Models with Comparative Evaluation · Challenges of Comparative Evaluation · Scalability bottlenecks · Lack of standardization and quality control · From comparative performance to absolute performance · The Future of Comparative Evaluation · Summary

4. Evaluate AI Systems: Evaluation Criteria · Domain-Specific Capability · Generation Capability · Factual consistency · Safety · Instruction-Following Capability · Instruction-following criteria · Roleplaying · Cost and Latency · Model Selection · Model Selection Workflow · Model Build Versus Buy · Open source, open weight, and model licenses · Open source models versus model APIs · Data privacy · Data lineage and copyright · Performance · Functionality · API cost versus engineering cost · Control, access, and transparency · On-device deployment · Navigate Public Benchmarks · Benchmark selection and aggregation · Public leaderboards · Custom leaderboards with public benchmarks · Data contamination with public benchmarks · How data contamination happens · Handling data contamination · Design Your Evaluation Pipeline · Step 1. Evaluate All Components in a System · Step 2. Create an Evaluation Guideline · Define evaluation criteria · Create scoring rubrics with examples · Tie evaluation metrics to business metrics · Step 3. Define Evaluation Methods and Data · Select evaluation methods · Annotate evaluation data · Evaluate your evaluation pipeline · Iterate · Summary

5. Prompt Engineering: Introduction to Prompting · In-Context Learning: Zero-Shot and Few-Shot · System Prompt and User Prompt · Context Length and Context Efficiency · Prompt Engineering Best Practices · Write Clear and Explicit Instructions · Explain, without ambiguity, what you want the model to do · Ask the model to adopt a persona · Provide examples · Specify the output format · Provide Sufficient Context · Break Complex Tasks into Simpler Subtasks · Give the Model Time to Think · Iterate on Your Prompts · Evaluate Prompt Engineering Tools · Organize and Version Prompts · Defensive Prompt Engineering · Proprietary Prompts and Reverse Prompt Engineering · Jailbreaking and Prompt Injection · Direct manual prompt hacking · Automated attacks · Indirect prompt injection · Information Extraction · Defenses Against Prompt Attacks · Model-level defense · Prompt-level defense · System-level defense · Summary

6. RAG and Agents: RAG · RAG Architecture · Retrieval Algorithms · Term-based retrieval · Embedding-based retrieval · Comparing retrieval algorithms · Combining retrieval algorithms · Retrieval Optimization · Chunking strategy · Reranking · Query rewriting · Contextual retrieval · RAG Beyond Texts · Multimodal RAG · RAG with tabular data · Agents · Agent Overview · Tools · Knowledge augmentation · Capability extension · Write actions · Planning · Planning overview · Foundation models as planners · Plan generation · Function calling · Planning granularity · Complex plans · Reflection and error correction · Tool selection · Agent Failure Modes and Evaluation · Planning failures · Tool failures · Efficiency · Memory · Summary

7. Finetuning: Finetuning Overview · When to Finetune · Reasons to Finetune · Reasons Not to Finetune · Finetuning and RAG · Memory Bottlenecks · Backpropagation and Trainable Parameters · Memory Math · Memory needed for inference · Memory needed for training · Numerical Representations · Quantization · Inference quantization · Training quantization · Finetuning Techniques · Parameter-Efficient Finetuning · PEFT techniques · LoRA · Why does LoRA work? · LoRA configurations · Serving LoRA adapters · Quantized LoRA · Model Merging and Multi-Task Finetuning · Summing · Linear combination · Spherical linear interpolation (SLERP) · Pruning redundant task-specific parameters · Layer stacking · Concatenation · Finetuning Tactics · Finetuning frameworks and base models · Base models · Finetuning methods · Finetuning frameworks · Finetuning hyperparameters · Learning rate · Batch size · Number of epochs · Prompt loss weight · Summary

8. Dataset Engineering: Data Curation · Data Quality · Data Coverage · Data Quantity · Data Acquisition and Annotation · Data Augmentation and Synthesis · Why Data Synthesis · Traditional Data Synthesis Techniques · Rule-based data synthesis · Simulation · AI-Powered Data Synthesis · Instruction data synthesis · Data verification · Limitations to AI-generated data · Quality control · Superficial imitation · Potential model collapse · Obscure data lineage · Model Distillation · Data Processing · Inspect Data · Deduplicate Data · Clean and Filter Data · Format Data · Summary

9. Inference Optimization: Understanding Inference Optimization · Inference Overview · Computational bottlenecks · Online and batch inference APIs · Inference Performance Metrics · Latency, TTFT, and TPOT · Throughput and goodput · Utilization, MFU, and MBU · AI Accelerators · What's an accelerator? · Computational capabilities · Memory size and bandwidth · Power consumption · Inference Optimization · Model Optimization · Model compression · Overcoming the autoregressive decoding bottleneck · Speculative decoding · Inference with reference · Parallel decoding · Attention mechanism optimization · Redesigning the attention mechanism · Optimizing the KV cache size · Writing kernels for attention computation · Kernels and compilers · Inference Service Optimization · Batching · Decoupling prefill and decode · Prompt caching · Parallelism · Summary

10. AI Engineering Architecture and User Feedback: AI Engineering Architecture · Step 1. Enhance Context · Step 2. Put in Guardrails · Input guardrails · Output guardrails · Guardrail implementation · Step 3. Add Model Router and Gateway · Router · Gateway · Step 4. Reduce Latency with Caches · Exact caching · Semantic caching · Step 5. Add Agent Patterns · Monitoring and Observability · Metrics · Logs and traces · Drift detection · AI Pipeline Orchestration · User Feedback · Extracting Conversational Feedback · Natural language feedback · Early termination · Error correction · Complaints · Sentiment · Other conversational feedback · Regeneration · Conversation organization · Conversation length · Dialogue diversity · Feedback Design · When to collect feedback · In the beginning · When something bad happens · When the model has low confidence · How to collect feedback · Feedback Limitations · Biases · Degenerate feedback loop · Summary

Epilogue · Index

About the author: Chip Huyen has built and deployed ML systems for companies such as NVIDIA, Netflix, and Snorkel AI. She also helped design Claypot AI, a real-time machine learning platform. She is the author of CS 329S, the machine learning systems design course taught at Stanford University.

Specification

Basic information

Author
  • Chip Huyen