Elasticsearch Workshop, 1 June 2016, Paris
Basic concepts
goyome.github.io/WS_ES_bases
Node: a physical machine running one Elasticsearch instance
Index: a logical space on a node (roughly a database)
Shard: a Lucene index that stores the data
Another example
Each of the 2 shards holds half of the data of index 1
The number of primary shards cannot be changed after the index has been created
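One way to see why the shard count is fixed: Elasticsearch chooses a document's shard as a hash of its routing value (the document ID by default) modulo the number of primary shards. The sketch below only mimics that formula. `shard_for` is a hypothetical helper and `cksum` is a stand-in for the hash Elasticsearch uses internally, so the shard numbers it prints will not match a real cluster.

```shell
# shard_for ID SHARDS: print the shard number a document ID would map to.
# Assumption: cksum (a CRC) replaces the real hash, so only the principle holds.
shard_for() {
  id="$1"
  shards="$2"
  hash=$(printf '%s' "$id" | cksum | cut -d ' ' -f 1)
  echo $(( hash % shards ))
}

# Same document ID, different shard counts: the computed shard can differ,
# so a lookup after a change would go to a shard that never stored the document.
for n in 2 3 5; do
  echo "esup-1 with $n shards -> shard $(shard_for esup-1 "$n")"
done
```

Because the result depends on the divisor, every stored document would have to be re-routed if the number of primary shards changed.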
Cluster: a set of nodes that answer the same requests
Replica: a copy of a shard (every primary shard of an index gets its own replicas)
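Shards and replicas are declared per index. As a minimal sketch, the settings below ask for 2 primary shards with 1 replica each. The index name `fr` is the one used in this workshop's examples; the values and the file name `fr-settings.json` are illustrative assumptions.

```shell
# Write hypothetical settings for an index: 2 primary shards, 1 replica of each.
cat > fr-settings.json <<'EOF'
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  }
}
EOF

# With a node listening on localhost:9200, the index could then be created with:
#   curl -XPUT "localhost:9200/fr" -d @fr-settings.json
```

`number_of_shards` is fixed once the index exists, whereas `number_of_replicas` can be changed later.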
curl -XPUT "localhost:9200/fr/user/esup-1" -d '
{
  "email": "guillaume.colson@univ.fr",
  "name": "Guillaume Colson",
  "username": "@goyome"
}'
{
  "_index": "fr",
  "_type": "user",
  "_id": "esup-1",
  "_version": 1,
  "_shards": { "total": 2, "successful": 1, "failed": 0 },
  "created": true
}
curl -XPOST "localhost:9200/_bulk" --data-binary '{ "create": { "_index": "us", "_type": "user", "_id": "1" }}
{ "email": "john@smith.com", "name": "John Smith", "username": "@john" }
{ "create": { "_index": "gb", "_type": "user", "_id": "2" }}
{ "email": "mary@jones.com", "name": "Mary Jones", "username": "@mary" }
'
{
  "took": 391,
  "errors": false,
  "items": [
    { "create": { "_index": "us", "_type": "user", "_id": "1", "_version": 1, "_shards": { "total": 2, "successful": 1, "failed": 0 }, "status": 201 } },
    { "create": { "_index": "gb", "_type": "user", ...
> JSON file containing the tweets to import
curl -XPOST "localhost:9200/_bulk?pretty" --data-binary @./path/to/your/tweets.json
{
  "took" : 88,
  "errors" : false,
  "items" : [ {
    "create" : {
      "_index" : "gb",
      "_type" : "tweet",
      "_id" : "3",
      "_version" : 1,
      "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
      "status" : 201
    }
  }, {
    "create" : {
      "_index" : "us",
      "_type" : "tweet", ...
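A bulk file is newline-delimited JSON: an action line, then the document itself on the next line, with a newline after the last line. The sketch below builds such a file under a hypothetical name, `tweets-sample.json`, reusing two tweets that appear in the search results of this workshop; it is not the file distributed with the workshop.

```shell
# Build a small bulk file: each document is preceded by its action line.
cat > tweets-sample.json <<'EOF'
{ "create": { "_index": "gb", "_type": "tweet", "_id": "13" }}
{ "date": "2014-09-23", "name": "Mary Jones", "tweet": "So yes, I am an Elasticsearch fanboy", "user_id": 2 }
{ "create": { "_index": "us", "_type": "tweet", "_id": "10" }}
{ "date": "2014-09-20", "name": "John Smith", "tweet": "Elasticsearch surely is one of the hottest new NoSQL products", "user_id": 1 }
EOF

# Sanity check before sending: action and document lines must come in pairs.
lines=$(( $(wc -l < tweets-sample.json) ))
if [ $(( lines % 2 )) -eq 0 ]; then
  echo "ok: $lines lines, $(( lines / 2 )) documents"
else
  echo "odd number of lines: an action or a document is missing"
fi

# With a node listening on localhost:9200:
#   curl -XPOST "localhost:9200/_bulk?pretty" --data-binary @tweets-sample.json
```

`--data-binary` matters here because plain `-d` strips the newlines that separate the actions.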
curl -XGET "localhost:9200/fr/user/esup-1"
{
  "_index": "fr",
  "_type": "user",
  "_id": "esup-1",
  "_version": 1,
  "found": true,
  "_source": {
    "email": "guillaume.colson@univ.fr",
    "name": "Guillaume Colson",
    "username": "@goyome"
  }
}
curl -XGET "localhost:9200/us/tweet/_search"
{
  "took": 7,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 6,
    "max_score": 1,
    "hits": [
      {
        "_index": "us",
        "_type": "tweet",
        "_id": "14",
        "_score": 1,
        "_source": {
          "date": "2014-09-24",
          "name": "John Smith", ...
curl -XGET "localhost:9200/_search?q=name:smith"
{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 16, "successful": 16, "failed": 0 },
  "hits": {
    "total": 7,
    "max_score": 0.4451987,
    "hits": [
      {
        "_index": "us",
        "_type": "tweet",
        "_id": "8",
        "_score": 0.4451987,
        "_source": { "date": "2014-09-18", "name": "John Smith", "user_id": 1 }
      }, ...
curl -XGET "localhost:9200/_search" -d '
{
  "query": {
    "match": { "name": "smith" }
  }
}'
curl -XGET "localhost:9200/us,gb/tweet/_search" -d '
{
  "query": {
    "filtered": {
      "query": {
        "match": { "tweet": "elasticsearch" }
      },
      "filter": {
        "range": { "date": { "gte": "2014-09-20" } }
      }
    }
  }
}'
...
"hits": [
  {
    "_index": "gb",
    "_type": "tweet",
    "_id": "13",
    "_score": 0.375,
    "_source": {
      "date": "2014-09-23",
      "name": "Mary Jones",
      "tweet": "So yes, I am an Elasticsearch fanboy",
      "user_id": 2
    }
  },
  {
    "_index": "us",
    "_type": "tweet",
    "_id": "10",
    "_score": 0.3125,
    "_source": {
      "date": "2014-09-20",
      "name": "John Smith",
      "tweet": "Elasticsearch surely is one of the hottest new NoSQL products",
      "user_id": 1
    }
  }, ...
GET /_search?q=2014              # 12 results
GET /_search?q=2014-09-15        # 12 results!
GET /_search?q=date:2014-09-20   # 1 result
GET /_search?q=date:2014         # 0 results!
curl -XGET "http://localhost:9200/_search" -d '
{
  "query": {
    "match": { "tweet": "elasticsearch is easy" }
  }
}'
...
"hits": [
  {
    ...
    "_score": 0.4794072,
    "_source": {
      ...
      "tweet": "The Elasticsearch API is really easy to use",
      ...
  },
  ...
    "_score": 0.4082814,
    "tweet": "Elasticsearch is built for the cloud, easy to scale",
  ...
    "_score": 0.22818159,
    "tweet": "Elasticsearch surely is one of the hottest new NoSQL products",
  ...
    "_score": 0.11272853,
    "tweet": "Elasticsearch means full text search has never been so easy",
  ...
The 4 terms that appear most frequently in the tweets
curl -XGET "http://localhost:9200/us,gb/tweet/_search" -d '
{
  "size": 0,
  "aggs": {
    "hot term in tweet": {
      "terms": { "field": "tweet", "size": 4 }
    }
  }
}'
...
"aggregations": {
  "hot term in tweet": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 72,
    "buckets": [
      { "key": "elasticsearch", "doc_count": 7 },
      { "key": "is", "doc_count": 5 },
      { "key": "the", "doc_count": 5 },
      { "key": "i", "doc_count": 4 }
      ...
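To get a feel for what `doc_count` measures, the sketch below imitates the aggregation offline on three tweets taken from this workshop's results: each tweet is lowercased and split into words, and every term is counted once per tweet that contains it. This is only a rough approximation of the analysis Elasticsearch applies to the `tweet` field, written with `awk` as an assumption for illustration, and it counts three tweets rather than the full data set.

```shell
# Count, for each term, the number of tweets that contain it (not occurrences).
top_terms=$(
  printf '%s\n' \
    "So yes, I am an Elasticsearch fanboy" \
    "Elasticsearch surely is one of the hottest new NoSQL products" \
    "The Elasticsearch API is really easy to use" |
  awk '{
    line = tolower($0)
    gsub(/[^a-z0-9]+/, " ", line)      # crude tokenizer: keep letters and digits
    n = split(line, words, " ")
    split("", seen)                    # reset the per-tweet set of terms
    for (i = 1; i <= n; i++)
      if (!(words[i] in seen)) { seen[words[i]] = 1; count[words[i]]++ }
  }
  END { for (w in count) print count[w], w }' |
  sort -k1,1nr -k2,2 | head -n 4
)
echo "$top_terms"
```

As in the real response, very common words such as "is" and "the" rank just behind "elasticsearch".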
Splitting the tweets into 2 periods, before and after 20 September 2014
curl -XGET "http://localhost:9200/us,gb/tweet/_search" -d '
{
  "size": 0,
  "aggs": {
    "Par date": {
      "range": {
        "field": "date",
        "ranges": [
          { "to": "2014-09-20" },
          { "from": "2014-09-20" }
        ]
      }
    }
  }
}'
...
"aggregations": {
  "Par date": {
    "buckets": [
      {
        "key": "*-2014-09-20T00:00:00.000Z",
        "to": 1411171200000,
        "to_as_string": "2014-09-20T00:00:00.000Z",
        "doc_count": 7
      },
      {
        "key": "2014-09-20T00:00:00.000Z-*",
        "from": 1411171200000,
        "from_as_string": "2014-09-20T00:00:00.000Z",
        "doc_count": 5
      }
      ...
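The numeric `to` and `from` values are the boundary date expressed in milliseconds since the Unix epoch. A quick way to check this, assuming GNU `date` (the `-d @seconds` form is not available in the BSD/macOS version):

```shell
# Boundary value copied from the aggregation response, in milliseconds.
boundary_ms=1411171200000

# GNU date expects seconds, so drop the milliseconds before converting.
boundary_iso=$(date -u -d "@$(( boundary_ms / 1000 ))" +%Y-%m-%dT%H:%M:%S.000Z)
echo "$boundary_iso"   # prints 2014-09-20T00:00:00.000Z
```

The first bucket therefore holds the tweets dated before 20 September, and the second those dated on or after it.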
Thank you for your attention! Any questions?