Natural Language Processing with Spark NLP. Learning to Understand Text at Scale (ebook) Chorzów

From 203,15 Nearest: 21 km

Number of offers: 2

Store offer

Description

If you want to build an enterprise-quality application that uses natural language text but aren't sure where to begin or what tools to use, this practical guide will help get you started. Alex Thomas, principal data scientist at Wisecube, shows software engineers and data scientists how to build scalable natural language processing (NLP) applications using deep learning and the Apache Spark NLP library.

Through concrete examples, practical and theoretical explanations, and hands-on exercises for using NLP on the Spark processing framework, this book teaches you everything from basic linguistics and writing systems to sentiment analysis and search engines. You'll also explore special concerns for developing text-based applications, such as performance.

In four sections, you'll learn NLP basics and building blocks before diving into application and system building:
  • Basics: Understand the fundamentals of natural language processing, NLP on Apache Spark, and deep learning
  • Building blocks: Learn techniques for building NLP applications, including tokenization, sentence segmentation, and named-entity recognition, and discover how and why they work
  • Applications: Explore the design, development, and experimentation process for building your own NLP applications
  • Building NLP systems: Consider options for productionizing and deploying NLP models, including which human languages to support

Table of contents:

Preface
  Why Natural Language Processing Is Important and Difficult Background Philosophy Conventions Used in This Book Using Code Examples O'Reilly Online Learning How to Contact Us Acknowledgments

I. Basics

1. Getting Started
  Introduction Other Tools Setting Up Your Environment Prerequisites Starting Apache Spark Checking Out the Code Getting Familiar with Apache Spark Starting Apache Spark with Spark NLP Loading and Viewing Data in Apache Spark Hello World with Spark NLP

2. Natural Language Basics
  What Is Natural Language? Origins of Language Spoken Language Versus Written Language Linguistics Phonetics and Phonology Morphology Syntax Semantics Sociolinguistics: Dialects, Registers, and Other Varieties Formality Context Pragmatics Roman Jakobson How To Use Pragmatics Writing Systems Origins Alphabets Abjads Abugidas Syllabaries Logographs Encodings ASCII Unicode UTF-8 Exercises: Tokenizing Tokenize English Tokenize Greek Tokenize Ge'ez (Amharic) Resources

3. NLP on Apache Spark
  Parallelism, Concurrency, Distributing Computation Parallelization Before Apache Hadoop MapReduce and Apache Hadoop Apache Spark Architecture of Apache Spark Physical Architecture Logical Architecture RDDs Partitioning Serialization Ordering Output and logging Spark jobs Persisting Python and R Spark SQL and Spark MLlib Transformers SQLTransformer Binarizer VectorAssembler Estimators and Models MinMaxScaler StringIndexer Evaluators Pipelines Cross validation Serialization of models NLP Libraries Functionality Libraries Annotation Libraries NLP in Other Libraries Spark NLP Annotation Library Stages Transformers DocumentAssembler Annotators SentenceDetector Tokenizer Lemmatizer POS tagger Pretrained Pipelines Explain document ML pipeline Finisher Exercises: Build a Topic Model Resources

4. Deep Learning Basics
  Gradient Descent Backpropagation Convolutional Neural Networks Filters Pooling Recurrent Neural Networks Backpropagation Through Time Elman Nets LSTMs Exercise 1 Exercise 2 Resources

II. Building Blocks

5. Processing Words
  Tokenization Vocabulary Reduction Stemming Lemmatization Stemming Versus Lemmatization Spelling Correction Normalization Bag-of-Words CountVectorizer N-Gram Visualizing: Word and Document Distributions Exercises Resources

6. Information Retrieval
  Inverted Indices Building an Inverted Index Step 1 Step 2 Step 3 Step 4 Vector Space Model Stop-Word Removal Inverse Document Frequency In Spark Exercises Resources

7. Classification and Regression
  Bag-of-Words Features Regular Expression Features Feature Selection Modeling Naïve Bayes Linear Models Decision/Regression Trees Deep Learning Algorithms Iteration Exercises

8. Sequence Modeling with Keras
  Sentence Segmentation (Hidden) Markov Models Section Segmentation Part-of-Speech Tagging Conditional Random Field Chunking and Syntactic Parsing Language Models Recurrent Neural Networks Exercise: Character N-Grams Exercise: Word Language Model Resources

9. Information Extraction
  Named-Entity Recognition Coreference Resolution Assertion Status Detection Relationship Extraction Summary Exercises

10. Topic Modeling
  K-Means Latent Semantic Indexing Nonnegative Matrix Factorization Latent Dirichlet Allocation Exercises

11. Word Embeddings
  Word2vec GloVe fastText Transformers ELMo, BERT, and XLNet doc2vec Exercises

III. Applications

12. Sentiment Analysis and Emotion Detection
  Problem Statement and Constraints Plan the Project Design the Solution Implement the Solution Test and Measure the Solution Business Metrics Model-Centric Metrics Infrastructure Metrics Process Metrics Offline Versus Online Model Measurement Review Initial Deployment Fallback Plans Next Steps Conclusion

13. Building Knowledge Bases
  Problem Statement and Constraints Plan the Project Design the Solution Implement the Solution Test and Measure the Solution Business Metrics Model-Centric Metrics Infrastructure Metrics Process Metrics Review Conclusion

14. Search Engine
  Problem Statement and Constraints Plan the Project Design the Solution Implement the Solution Test and Measure the Solution Business Metrics Model-Centric Metrics Review Conclusion

15. Chatbot
  Problem Statement and Constraints Plan the Project Design the Solution Implement the Solution Test and Measure the Solution Business Metrics Model-Centric Metrics Review Conclusion

16. Object Character Recognition
  Kinds of OCR Tasks Images of Printed Text and PDFs to Text Images of Handwritten Text to Text Images of Text in Environment to Text Images of Text to Target Note on Different Writing Systems Problem Statement and Constraints Plan the Project Implement the Solution Test and Measure the Solution Model-Centric Metrics Review Conclusion

IV. Building NLP Systems

17. Supporting Multiple Languages
  Language Typology Scenario: Academic Paper Classification Text Processing in Different Languages Compound Words Morphological Complexity Transfer Learning and Multilingual Deep Learning Search Across Languages Checklist Conclusion

18. Human Labeling
  Guidelines Scenario: Academic Paper Classification Inter-Labeler Agreement Iterative Labeling Labeling Text Classification Tagging Checklist Conclusion

19. Productionizing NLP Applications
  Spark NLP Model Cache Spark NLP and TensorFlow Integration Spark Optimization Basics Design-Level Optimization Profiling Tools Monitoring Managing Data Resources Testing NLP-Based Applications Unit Tests Integration Tests Smoke and Sanity Tests Performance Tests Usability Tests Demoing NLP-Based Applications Checklists Model Deployment Checklist Scaling and Performance Checklist Testing Checklist Conclusion

Glossary

Index

Specifications

Basic information

Author
  • Alex Thomas
Year of publication
  • 2020
Number of pages
  • 366
Format
  • MOBI
  • EPUB