Elasticsearch Server. From creating your own index structure through to cluster monitoring and troubleshooting, this is the complete guide to implemen


Description

Table of contents:

Elasticsearch Server Second Edition Table of Contents Elasticsearch Server Second Edition Credits About the Author Acknowledgments About the Author Acknowledgments About the Reviewers www.PacktPub.com Support files, eBooks, discount offers, and more Why subscribe? Free access for Packt account holders Preface What this book covers What you need for this book Who this book is for Conventions Reader feedback Customer support Downloading the example code Errata Piracy Questions

1. Getting Started with the Elasticsearch Cluster
Full-text searching The Lucene glossary and architecture Input data analysis Indexing and querying Scoring and query relevance The basics of Elasticsearch Key concepts of data architecture Index Document Document type Mapping Key concepts of Elasticsearch Node and cluster Shard Replica Gateway Indexing and searching Installing and configuring your cluster Installing Java Installing Elasticsearch Installing Elasticsearch from binary packages on Linux Installing Elasticsearch using the RPM package Installing Elasticsearch using the DEB package The directory layout Configuring Elasticsearch Running Elasticsearch Shutting down Elasticsearch Running Elasticsearch as a system service Elasticsearch as a system service on Linux Elasticsearch as a system service on Windows Manipulating data with the REST API Understanding the Elasticsearch RESTful API Storing data in Elasticsearch Creating a new document Automatic identifier creation Retrieving documents Updating documents Deleting documents Versioning An example of versioning Using the version provided by an external system Searching with the URI request query Sample data The URI request The Elasticsearch query response Query analysis URI query string parameters The query The default search field Analyzer The default operator Query explanation The fields returned Sorting the results The search timeout The results window The search type Lowercasing the expanded terms Analyzing the wildcard and prefixes The Lucene query syntax Summary

2. Indexing Your Data
Elasticsearch indexing Shards and replicas Creating indices Altering automatic index creation Settings for a newly created index Mappings configuration Type determining mechanism Disabling field type guessing Index structure mapping Type definition Fields Core types Common attributes String Number Boolean Binary Date Multifields The IP address type The token_count type Using analyzers Out-of-the-box analyzers Defining your own analyzers Analyzer fields Default analyzers Different similarity models Setting per-field similarity Available similarity models Configuring DFR similarity Configuring IB similarity The postings format Configuring the postings format Doc values Configuring the doc values Doc values formats Batch indexing to speed up your indexing process Preparing data for bulk indexing Indexing the data Even quicker bulk requests Extending your index structure with additional internal information Identifier fields The _type field The _all field The _source field Exclusion and inclusion The _index field The _size field The _timestamp field The _ttl field Introduction to segment merging Segment merging The need for segment merging The merge policy The merge scheduler The merge factor Throttling Introduction to routing Default indexing Default searching Routing The routing parameters Routing fields Summary

3. Searching Your Data
Querying Elasticsearch The example data A simple query Paging and result size Returning the version value Limiting the score Choosing the fields that we want to return The partial fields Using the script fields Passing parameters to the script fields Understanding the querying process Query logic Search types Search execution preferences The Search shards API Basic queries The term query The terms query The match_all query The common terms query The match query The Boolean match query The match_phrase query The match_phrase_prefix query The multi_match query The query_string query Running the query_string query against multiple fields The simple_query_string query The identifiers query The prefix query The fuzzy_like_this query The fuzzy_like_this_field query The fuzzy query The wildcard query The more_like_this query The more_like_this_field query The range query The dismax query The regular expression query Compound queries The bool query The boosting query The constant_score query The indices query Filtering your results Using filters Filter types The range filter The exists filter The missing filter The script filter The type filter The limit filter The identifiers filter If this is not enough Combining filters A word about the bool filter Named filters Caching filters Highlighting Getting started with highlighting Field configuration Under the hood Configuring HTML tags Controlling the highlighted fragments Global and local settings Require matching The postings highlighter Validating your queries Using the validate API Sorting data Default sorting Selecting fields used for sorting Specifying the behavior for missing fields Dynamic criteria Collation and national characters Query rewrite An example of the rewrite process Query rewrite properties Summary

4. Extending Your Index Structure
Indexing tree-like structures Data structure Analysis Indexing data that is not flat Data Objects Arrays Mappings Final mappings Sending the mappings to Elasticsearch To be or not to be dynamic Using nested objects Scoring and nested queries Using the parent-child relationship Index structure and data indexing Parent mappings Child mappings The parent document The child documents Querying Querying data in the child documents The top children query Querying data in the parent documents The parent-child relationship and filtering Performance considerations Modifying your index structure with the update API The mappings Adding a new field Modifying fields Summary

5. Make Your Search Better
An introduction to Apache Lucene scoring When a document is matched Default scoring formula Relevancy matters Scripting capabilities of Elasticsearch Objects available during script execution MVEL Using other languages Using our own script library Using native code The factory implementation Implementing the native script Installing scripts Running the script Searching content in different languages Handling languages differently Handling multiple languages Detecting the language of the documents Sample document The mappings Querying Queries with the identified language Queries with unknown languages Combining queries Influencing scores with query boosts The boost Adding boost to queries Modifying the score The constant_score query The boosting query The function_score query The structure of the function query Deprecated queries Replacing the custom_boost_factor query Replacing the custom_score query Replacing the custom_filters_score query When does index-time boosting make sense? Defining field boosting in input data Defining boosting in mapping Words with the same meaning The synonym filter Synonyms in the mappings Synonyms stored in the filesystem Defining synonym rules Using Apache Solr synonyms Explicit synonyms Equivalent synonyms Expanding synonyms Using WordNet synonyms Query- or index-time synonym expansion Understanding the explain information Understanding field analysis Explaining the query Summary

6. Beyond Full-text Searching
Aggregations General query structure Available aggregations Metric aggregations Min, max, sum, and avg aggregations Using scripts The value_count aggregation The stats and extended_stats aggregations Bucketing The terms aggregation The range aggregation The date_range aggregation IPv4 range aggregation The missing aggregation Nested aggregation The histogram aggregation The date_histogram aggregation Time zones The geo_distance aggregation The geohash_grid aggregation Nesting aggregations Bucket ordering and nested aggregations Global and subsets Inclusions and exclusions Faceting The document structure Returned results Using queries for faceting calculations Using filters for faceting calculations Terms faceting Ranges based faceting Choosing different fields for an aggregated data calculation Numerical and date histogram faceting The date_histogram facet Computing numerical field statistical data Computing statistical data for terms Geographical faceting Filtering faceting results Memory considerations Using suggesters Available suggester types Including suggestions The suggester response The term suggester The term suggester configuration options Additional term suggester options The phrase suggester Configuration The completion suggester Indexing data Querying the indexed completion suggester data Custom weights Percolator The index Percolator preparation Getting deeper Getting the number of matching queries Indexed documents percolation Handling files Adding additional information about the file Geo Mappings preparation for spatial search Example data Sample queries Distance-based sorting Bounding box filtering Limiting the distance Arbitrary geo shapes Point Envelope Polygon Multipolygon An example usage Storing shapes in the index The scroll API Problem definition Scrolling to the rescue The terms filter Terms lookup The terms lookup query structure Terms lookup cache settings Summary

7. Elasticsearch Cluster in Detail
Node discovery Discovery types The master node Configuring the master and data nodes The master-election configuration Setting the cluster name Configuring multicast Configuring unicast Ping settings for nodes The gateway and recovery modules The gateway Recovery control Additional gateway recovery options Preparing Elasticsearch cluster for high query and indexing throughput The filter cache The field data cache and circuit breaker The circuit breaker The store Index buffers and the refresh rate The index refresh rate The thread pool configuration Combining it all together: some general advice Choosing the right store The index refresh rate Tuning the thread pools Tuning your merge process The field data cache and breaking the circuit RAM buffer for indexing Tuning transaction logging Things to keep in mind Templates and dynamic templates Templates An example of a template Storing templates in files Dynamic templates The matching pattern Field definitions Summary

8. Administrating Your Cluster
The Elasticsearch time machine Creating a snapshot repository Creating snapshots Additional parameters Restoring a snapshot Cleaning up: deleting old snapshots Monitoring your cluster's state and health The cluster health API Controlling information details Additional parameters The indices stats API Docs Store Indexing, get, and search Additional information The status API The nodes info API The nodes stats API The cluster state API The pending tasks API The indices segments API The cat API Limiting returned information Controlling cluster rebalancing Rebalancing Cluster being ready The cluster rebalance settings Controlling when rebalancing will start Controlling the number of shards being moved between nodes concurrently Controlling the number of shards initialized concurrently on a single node Controlling the number of primary shards initialized concurrently on a single node Controlling types of shards allocation Controlling the number of concurrent streams on a single node Controlling the shard and replica allocation Explicitly controlling allocation Specifying node parameters Configuration Index creation Excluding nodes from allocation Requiring node attributes Using IP addresses for shard allocation Disk-based shard allocation Enabling disk-based shard allocation Configuring disk-based shard allocation Cluster wide allocation Number of shards and replicas per node Moving shards and replicas manually Moving shards Canceling shard allocation Forcing shard allocation Multiple commands per HTTP request Warming up Defining a new warming query Retrieving the defined warming queries Deleting a warming query Disabling the warming up functionality Choosing queries Index aliasing and using it to simplify your everyday work An alias Creating an alias Modifying aliases Combining commands Retrieving all aliases Removing aliases Filtering aliases Aliases and routing Elasticsearch plugins The basics Installing plugins Removing plugins The update settings API Summary

Index


Specification

Basic information

Authors	Marek Rogozinski, Rafal Kuc
Year of publication	2014

Technical details

Format	PDF MOBI EPUB
Number of pages	428

Additional information

Categories	Programming
Publisher	Packt Publishing
