Deep Reinforcement Learning Hands-On (e-book)

Price: from 125.10

Number of offers: 1

Description

This practical guide will teach you how deep learning (DL) can be used to solve complex real-world problems.

Key Features
  • Explore deep reinforcement learning (RL), from first principles to the latest algorithms
  • Evaluate high-profile RL methods, including value iteration, deep Q-networks, policy gradients, TRPO, PPO, DDPG, D4PG, evolution strategies and genetic algorithms
  • Keep up with the very latest industry developments, including AI-driven chatbots

Book Description
Recent developments in reinforcement learning (RL), combined with deep learning (DL), have seen unprecedented progress made towards training agents to solve complex problems in a human-like way. Google's use of algorithms to play and defeat the well-known Atari arcade games has propelled the field to prominence, and researchers are generating new ideas at a rapid pace.

Deep Reinforcement Learning Hands-On is a comprehensive guide to the very latest DL tools and their limitations. You will evaluate methods including cross-entropy and policy gradients before applying them to real-world environments. Take on both the Atari set of virtual games and family favorites such as Connect4. The book provides an introduction to the basics of RL, giving you the know-how to code intelligent learning agents to take on a formidable array of practical tasks. Discover how to implement Q-learning on "grid world" environments (a small tabular Q-learning sketch appears after the table of contents below), teach your agent to buy and trade stocks, and find out how natural language models are driving the boom in chatbots.

What you will learn
  • Understand the DL context of RL and implement complex DL models
  • Learn the foundation of RL: Markov decision processes
  • Evaluate RL methods including cross-entropy, DQN, Actor-Critic, TRPO, PPO, DDPG, D4PG and others
  • Discover how to deal with discrete and continuous action spaces in various environments
  • Defeat Atari arcade games using the value iteration method
  • Create your own OpenAI Gym environment to train a stock trading agent (see the Gym interaction sketch after the table of contents below)
  • Teach your agent to play Connect4 using AlphaGo Zero
  • Explore the very latest deep RL research on topics including AI-driven chatbots

Who this book is for
Some fluency in Python is assumed. Basic deep learning (DL) approaches should be familiar to readers, and some practical experience in DL will be helpful. This book is an introduction to deep reinforcement learning (RL) and requires no background in RL.

Table of Contents
  • Deep Reinforcement Learning Hands-On (Why subscribe?; PacktPub.com; Contributors; About the author; About the reviewers; Packt is Searching for Authors Like You)
  • Preface (Who this book is for; What this book covers; To get the most out of this book; Download the example code files; Download the color images; Conventions used; Get in touch; Reviews)
  • 1. What is Reinforcement Learning? (Learning: supervised, unsupervised, and reinforcement; RL formalisms and relations; Reward; The agent; The environment; Actions; Observations; Markov decision processes; Markov process; Markov reward process; Markov decision process; Summary)
  • 2. OpenAI Gym (The anatomy of the agent; Hardware and software requirements; OpenAI Gym API; Action space; Observation space; The environment; Creation of the environment; The CartPole session; The random CartPole agent; The extra Gym functionality: wrappers and monitors; Wrappers; Monitor; Summary)
  • 3. Deep Learning with PyTorch (Tensors; Creation of tensors; Scalar tensors; Tensor operations; GPU tensors; Gradients; Tensors and gradients; NN building blocks; Custom layers; Final glue: loss functions and optimizers; Loss functions; Optimizers; Monitoring with TensorBoard; TensorBoard 101; Plotting stuff; Example: GAN on Atari images; Summary)
  • 4. The Cross-Entropy Method (Taxonomy of RL methods; Practical cross-entropy; Cross-entropy on CartPole; Cross-entropy on FrozenLake; Theoretical background of the cross-entropy method; Summary)
  • 5. Tabular Learning and the Bellman Equation (Value, state, and optimality; The Bellman equation of optimality; Value of action; The value iteration method; Value iteration in practice; Q-learning for FrozenLake; Summary)
  • 6. Deep Q-Networks (Real-life value iteration; Tabular Q-learning; Deep Q-learning; Interaction with the environment; SGD optimization; Correlation between steps; The Markov property; The final form of DQN training; DQN on Pong; Wrappers; DQN model; Training; Running and performance; Your model in action; Summary)
  • 7. DQN Extensions (The PyTorch Agent Net library; Agent; Agent's experience; Experience buffer; Gym env wrappers; Basic DQN; N-step DQN; Implementation; Double DQN; Implementation; Results; Noisy networks; Implementation; Results; Prioritized replay buffer; Implementation; Results; Dueling DQN; Implementation; Results; Categorical DQN; Implementation; Results; Combining everything; Implementation; Results; Summary; References)
  • 8. Stocks Trading Using RL (Trading; Data; Problem statements and key decisions; The trading environment; Models; Training code; Results; The feed-forward model; The convolution model; Things to try; Summary)
  • 9. Policy Gradients: An Alternative (Values and policy; Why policy?; Policy representation; Policy gradients; The REINFORCE method; The CartPole example; Results; Policy-based versus value-based methods; REINFORCE issues; Full episodes are required; High gradients variance; Exploration; Correlation between samples; PG on CartPole; Results; PG on Pong; Results; Summary)
  • 10. The Actor-Critic Method (Variance reduction; CartPole variance; Actor-critic; A2C on Pong; A2C on Pong results; Tuning hyperparameters; Learning rate; Entropy beta; Count of environments; Batch size; Summary)
  • 11. Asynchronous Advantage Actor-Critic (Correlation and sample efficiency; Adding an extra A to A2C; Multiprocessing in Python; A3C: data parallelism; Results; A3C: gradients parallelism; Results; Summary)
  • 12. Chatbots Training with RL (Chatbots overview; Deep NLP basics; Recurrent Neural Networks; Embeddings; Encoder-Decoder; Training of seq2seq; Log-likelihood training; Bilingual evaluation understudy (BLEU) score; RL in seq2seq; Self-critical sequence training; The chatbot example; The example structure; Modules: cornell.py and data.py; BLEU score and utils.py; Model; Training: cross-entropy; Running the training; Checking the data; Testing the trained model; Training: SCST; Running the SCST training; Results; Telegram bot; Summary)
  • 13. Web Navigation (Web navigation; Browser automation and RL; Mini World of Bits benchmark; OpenAI Universe; Installation; Actions and observations; Environment creation; MiniWoB stability; Simple clicking approach; Grid actions; Example overview; Model; Training code; Starting containers; Training process; Checking the learned policy; Issues with simple clicking; Human demonstrations; Recording the demonstrations; Recording format; Training using demonstrations; Results; TicTacToe problem; Adding text description; Results; Things to try; Summary)
  • 14. Continuous Action Space (Why a continuous space?; Action space; Environments; The Actor-Critic (A2C) method; Implementation; Results; Using models and recording videos; Deterministic policy gradients; Exploration; Implementation; Results; Recording videos; Distributional policy gradients; Architecture; Implementation; Results; Things to try; Summary)
  • 15. Trust Regions: TRPO, PPO, and ACKTR (Introduction; Roboschool; A2C baseline; Results; Videos recording; Proximal Policy Optimization; Implementation; Results; Trust Region Policy Optimization; Implementation; Results; A2C using ACKTR; Implementation; Results; Summary)
  • 16. Black-Box Optimization in RL (Black-box methods; Evolution strategies; ES on CartPole; Results; ES on HalfCheetah; Results; Genetic algorithms; GA on CartPole; Results; GA tweaks; Deep GA; Novelty search; GA on Cheetah; Results; Summary; References)
  • 17. Beyond Model-Free: Imagination (Model-based versus model-free; Model imperfections; Imagination-augmented agent; The environment model; The rollout policy; The rollout encoder; Paper results; I2A on Atari Breakout; The baseline A2C agent; EM training; The imagination agent; The I2A model; The Rollout encoder; Training of I2A; Experiment results; The baseline agent; Training EM weights; Training with the I2A model; Summary; References)
  • 18. AlphaGo Zero (Board games; The AlphaGo Zero method; Overview; Monte-Carlo Tree Search; Self-play; Training and evaluation; Connect4 bot; Game model; Implementing MCTS; Model; Training; Testing and comparison; Connect4 results; Summary; References)
  • Book summary
  • Other Books You May Enjoy (Leave a review: let other readers know what you think)
  • Index
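
For readers who want a concrete feel for the workflow the book starts from, here is a minimal random-agent loop on CartPole using the OpenAI Gym API covered in chapter 2. This is an illustrative sketch rather than code taken from the book, and it assumes the classic pre-0.26 gym interface (reset() returning an observation and step() returning a 4-tuple) that was current when the book was published in 2018.

import gym

# Create the CartPole environment and run one episode with random actions.
env = gym.make("CartPole-v0")
total_reward = 0.0
total_steps = 0
obs = env.reset()

while True:
    # Sample a random action; a trained agent would instead query its policy.
    action = env.action_space.sample()
    obs, reward, done, _ = env.step(action)
    total_reward += reward
    total_steps += 1
    if done:
        break

print("Episode done in %d steps, total reward %.2f" % (total_steps, total_reward))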

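The "grid world" Q-learning mentioned in the description is typically demonstrated on FrozenLake. The sketch below shows a plain tabular Q-learning update on that environment; it is not code from the book (chapter 5 develops its own value iteration and Q-learning agents), and the environment id, hyperparameters, and episode count are arbitrary illustrative choices, again assuming the classic gym API.

import gym
import numpy as np

env = gym.make("FrozenLake-v0")
# One Q-value per (state, action) pair of the discrete grid world.
q_table = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.95, 0.1  # learning rate, discount factor, exploration rate

for _ in range(5000):
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection.
        if np.random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))
        next_state, reward, done, _ = env.step(action)
        # Tabular Bellman update towards the bootstrapped target.
        target = reward + gamma * np.max(q_table[next_state])
        q_table[state, action] += alpha * (target - q_table[state, action])
        state = next_state

With enough episodes (and, in practice, a decaying exploration rate), the table gradually approximates the action values that the book's tabular methods compute more carefully.
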
Specification

Basic information

Author
  • Maxim Lapan
Year of publication
  • 2018
Format
  • PDF
  • MOBI
  • EPUB
Number of pages
  • 547
Categories
  • Programming
Publisher
  • Packt Publishing