Applied Unsupervised Learning with R (e-book)

Description

Design clever algorithms that discover hidden patterns and draw responses from unstructured, unlabeled data.

Key Features

Build state-of-the-art algorithms that can solve your business' problems
Learn how to find hidden patterns in your data
Revise key concepts with hands-on exercises using real-world datasets

Book Description

Starting with the basics, Applied Unsupervised Learning with R explains clustering methods, distribution analysis, data encoders, and features of R that enable you to understand your data better and get answers to your most pressing business questions.

This book begins with the most important and commonly used method for unsupervised learning - clustering - and explains the three main clustering algorithms - k-means, divisive, and agglomerative. Following this, you'll study market basket analysis, kernel density estimation, principal component analysis, and anomaly detection. You'll be introduced to these methods using code written in R, with further instructions on how to work with, edit, and improve R code. To help you gain a practical understanding, the book also features useful tips on applying these methods to real business problems, including market segmentation and fraud detection. By working through interesting activities, you'll explore data encoders and latent variable models.

By the end of this book, you will have a better understanding of different anomaly detection methods, such as outlier detection, Mahalanobis distances, and contextual and collective anomaly detection.

What you will learn

Implement clustering methods such as k-means, agglomerative, and divisive
Write code in R to analyze market segmentation and consumer behavior
Estimate distribution and probabilities of different outcomes
Implement dimension reduction using principal component analysis
Apply anomaly detection methods to identify fraud
Design algorithms with R and learn how to edit or improve code

Who this book is for

Applied Unsupervised Learning with R is designed for business professionals who want to learn about methods to understand their data better, and developers who have an interest in unsupervised learning. Although the book is for beginners, it will be beneficial to have some basic, beginner-level familiarity with R. This includes an understanding of how to open the R console, how to read data, and how to create a loop. To easily understand the concepts of this book, you should also know basic mathematical concepts, including exponents, square roots, means, and medians.

Table of contents:

Preface
    About the Book
    About the Authors
    Elevator Pitch
    Key Features
    Description
    Learning Objectives
    Audience
    Approach
    Hardware Requirements
    Software Requirements
    Conventions
    Installation and Setup
    Installing R on Windows
    Installing R on macOS X
    Installing R on Linux

Chapter 1: Introduction to Clustering Methods
    Introduction
    Introduction to Clustering
    Uses of Clustering
    Introduction to the Iris Dataset
    Exercise 1: Exploring the Iris Dataset
    Types of Clustering
    Introduction to k-means Clustering
    Euclidean Distance
    Manhattan Distance
    Cosine Distance
    The Hamming Distance
    k-means Clustering Algorithm
    Steps to Implement k-means Clustering
    Exercise 2: Implementing k-means Clustering on the Iris Dataset
    Activity 1: k-means Clustering with Three Clusters
    Introduction to k-means Clustering with Built-In Functions
    k-means Clustering with Three Clusters
    Exercise 3: k-means Clustering with R Libraries
    Introduction to Market Segmentation
    Exercise 4: Exploring the Wholesale Customer Dataset
    Activity 2: Customer Segmentation with k-means
    Introduction to k-medoids Clustering
    The k-medoids Clustering Algorithm
    k-medoids Clustering Code
    Exercise 5: Implementing k-medoid Clustering
    k-means Clustering versus k-medoids Clustering
    Activity 3: Performing Customer Segmentation with k-medoids Clustering
    Deciding the Optimal Number of Clusters
    Types of Clustering Metrics
    Silhouette Score
    Exercise 6: Calculating the Silhouette Score
    Exercise 7: Identifying the Optimum Number of Clusters
    WSS/Elbow Method
    Exercise 8: Using WSS to Determine the Number of Clusters
    The Gap Statistic
    Exercise 9: Calculating the Ideal Number of Clusters with the Gap Statistic
    Activity 4: Finding the Ideal Number of Market Segments
    Summary

Chapter 2: Advanced Clustering Methods
    Introduction
    Introduction to k-modes Clustering
    Steps for k-Modes Clustering
    Exercise 10: Implementing k-modes Clustering
    Activity 5: Implementing k-modes Clustering on the Mushroom Dataset
    Introduction to Density-Based Clustering (DBSCAN)
    Steps for DBSCAN
    Exercise 11: Implementing DBSCAN
    Uses of DBSCAN
    Activity 6: Implementing DBSCAN and Visualizing the Results
    Introduction to Hierarchical Clustering
    Types of Similarity Metrics
    Steps to Perform Agglomerative Hierarchical Clustering
    Exercise 12: Agglomerative Clustering with Different Similarity Measures
    Divisive Clustering
    Steps to Perform Divisive Clustering
    Exercise 13: Performing DIANA Clustering
    Activity 7: Performing Hierarchical Cluster Analysis on the Seeds Dataset
    Summary

Chapter 3: Probability Distributions
    Introduction
    Basic Terminology of Probability Distributions
    Uniform Distribution
    Exercise 14: Generating and Plotting Uniform Samples in R
    Normal Distribution
    Exercise 15: Generating and Plotting a Normal Distribution in R
    Skew and Kurtosis
    Log-Normal Distributions
    Exercise 16: Generating a Log-Normal Distribution from a Normal Distribution
    The Binomial Distribution
    Exercise 17: Generating a Binomial Distribution
    The Poisson Distribution
    The Pareto Distribution
    Introduction to Kernel Density Estimation
    KDE Algorithm
    Exercise 18: Visualizing and Understanding KDE
    Exercise 19: Studying the Effect of Changing Kernels on a Distribution
    Activity 8: Finding the Standard Distribution Closest to the Distribution of Variables of the Iris Dataset
    Introduction to the Kolmogorov-Smirnov Test
    The Kolmogorov-Smirnov Test Algorithm
    Exercise 20: Performing the Kolmogorov-Smirnov Test on Two Samples
    Activity 9: Calculating the CDF and Performing the Kolmogorov-Smirnov Test with the Normal Distribution
    Summary

Chapter 4: Dimension Reduction
    Introduction
    The Idea of Dimension Reduction
    Exercise 21: Examining a Dataset that Contains the Chemical Attributes of Different Wines
    Importance of Dimension Reduction
    Market Basket Analysis
    Exercise 22: Data Preparation for the Apriori Algorithm
    Exercise 23: Passing through the Data to Find the Most Common Baskets
    Exercise 24: More Passes through the Data
    Exercise 25: Generating Associative Rules as the Final Step of the Apriori Algorithm
    Principal Component Analysis
    Linear Algebra Refresher
    Matrices
    Variance
    Covariance
    Exercise 26: Examining Variance and Covariance on the Wine Dataset
    Eigenvectors and Eigenvalues
    The Idea of PCA
    Exercise 27: Performing PCA
    Exercise 28: Performing Dimension Reduction with PCA
    Activity 10: Performing PCA and Market Basket Analysis on a New Dataset
    Summary

Chapter 5: Data Comparison Methods
    Introduction
    Hash Functions
    Exercise 29: Creating and Using a Hash Function
    Exercise 30: Verifying Our Hash Function
    Analytic Signatures
    Exercise 31: Perform the Data Preparation for Creating an Analytic Signature for an Image
    Exercise 32: Creating a Brightness Comparison Function
    Exercise 33: Creating a Function to Compare Image Sections to All of the Neighboring Sections
    Exercise 34: Creating a Function that Generates an Analytic Signature for an Image
    Activity 11: Creating an Image Signature for a Photograph of a Person
    Comparison of Signatures
    Activity 12: Creating an Image Signature for the Watermarked Image
    Applying Other Unsupervised Learning Methods to Analytic Signatures
    Latent Variable Models
    Factor Analysis
    Exercise 35: Preparing for Factor Analysis
    Linear Algebra behind Factor Analysis
    Exercise 36: More Exploration with Factor Analysis
    Activity 13: Performing Factor Analysis
    Summary

Chapter 6: Anomaly Detection
    Introduction
    Univariate Outlier Detection
    Exercise 37: Performing an Exploratory Visual Check for Outliers Using R's boxplot Function
    Exercise 38: Transforming a Fat-Tailed Dataset to Improve Outlier Classification
    Exercise 39: Finding Outliers without Using R's Built-In boxplot Function
    Exercise 40: Detecting Outliers Using a Parametric Method
    Multivariate Outlier Detection
    Exercise 41: Calculating Mahalanobis Distance
    Detecting Anomalies in Clusters
    Other Methods for Multivariate Outlier Detection
    Exercise 42: Classifying Outliers based on Comparisons of Mahalanobis Distances
    Detecting Outliers in Seasonal Data
    Exercise 43: Performing Seasonality Modeling
    Exercise 44: Finding Anomalies in Seasonal Data Using a Parametric Method
    Contextual and Collective Anomalies
    Exercise 45: Detecting Contextual Anomalies
    Exercise 46: Detecting Collective Anomalies
    Kernel Density
    Exercise 47: Finding Anomalies Using Kernel Density Estimation
    Continuing in Your Studies of Anomaly Detection
    Activity 14: Finding Univariate Anomalies Using a Parametric Method and a Non-parametric Method
    Activity 15: Using Mahalanobis Distance to Find Anomalies
    Summary

Appendix
    Chapter 1: Introduction to Clustering Methods
        Activity 1: k-means Clustering with Three Clusters
        Activity 2: Customer Segmentation with k-means
        Activity 3: Performing Customer Segmentation with k-medoids Clustering
        Activity 4: Finding the Ideal Number of Market Segments
    Chapter 2: Advanced Clustering Methods
        Activity 5: Implementing k-modes Clustering on the Mushroom Dataset
        Activity 6: Implementing DBSCAN and Visualizing the Results
        Activity 7: Performing a Hierarchical Cluster Analysis on the Seeds Dataset
    Chapter 3: Probability Distributions
        Activity 8: Finding the Standard Distribution Closest to the Distribution of Variables of the Iris Dataset
        Activity 9: Calculating the CDF and Performing the Kolmogorov-Smirnov Test with the Normal Distribution
    Chapter 4: Dimension Reduction
        Activity 10: Performing PCA and Market Basket Analysis on a New Dataset
    Chapter 5: Data Comparison Methods
        Activity 11: Create an Image Signature for a Photograph of a Person
        Activity 12: Create an Image Signature for the Watermarked Image
        Activity 13: Performing Factor Analysis
    Chapter 6: Anomaly Detection
        Activity 14: Finding Univariate Anomalies Using a Parametric Method and a Non-parametric Method
        Activity 15: Using Mahalanobis Distance to Find Anomalies

Specifications

Basic information

Author	Alok Malik, Bradford Tuckfield
Year of publication	2019

Technical

Format	PDF MOBI EPUB
Number of pages	320

Additional information

Categories	Programming
Selected publishers	Packt Publishing
