Elasticsearch Workshop, 1 June 2016, Paris
Basic concepts
goyome.github.io/WS_ES_bases


Node with one index containing one shard
Node: a physical machine running an Elasticsearch instance
Index: a logical namespace on a node (roughly, a database)
Shard: a Lucene index that stores the data
Node holding one index with 2 shards and another with 2
Another example
Half of index 1's data is stored on each of its 2 shards
The number of primary shards cannot be changed once the index has been created
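To see why the shard count is fixed, here is a minimal Python sketch of document routing as the Elasticsearch documentation describes it: the target shard is `hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document `_id`. Assumptions: `zlib.crc32` stands in for the Murmur3 hash Elasticsearch actually uses, and `pick_shard` is a hypothetical helper.

```python
import zlib


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Toy routing: shard = hash(_routing) % number_of_primary_shards.

    Assumption: zlib.crc32 stands in for Elasticsearch's Murmur3 hash,
    and _routing is taken to be the document id (the default).
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    for doc_id in ["esup-1", "1", "2", "3"]:
        # Same id, different shard counts: the target shard may differ.
        print(doc_id, pick_shard(doc_id, 2), pick_shard(doc_id, 3))
```

Because the modulus is the shard count, changing it would send existing ids to different shards, and documents could no longer be found where they were indexed.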
Cluster of 2 nodes with a replicated index
Cluster: a set of nodes that answer the same requests
Replica: a copy of an index (more precisely, a copy of each of its primary shards)
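A toy sketch of the placement rule behind the replicated-index diagram: a replica is never placed on the same node as its primary (or as another copy of the same shard). The `allocate` helper and its least-loaded heuristic are hypothetical simplifications, not Elasticsearch's real allocator.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Toy allocator: never place two copies of the same shard on one node.

    Returns (layout, unassigned), where layout maps node -> [(shard, kind)]
    and kind is "P" (primary) or "R" (replica).
    """
    layout = {node: [] for node in nodes}
    unassigned = []
    for shard in range(num_primaries):
        used = set()  # nodes already holding a copy of this shard
        for kind in ["P"] + ["R"] * num_replicas:
            candidates = [n for n in nodes if n not in used]
            if not candidates:
                unassigned.append((shard, kind))
                continue
            # Hypothetical heuristic: pick the least loaded eligible node.
            node = min(candidates, key=lambda n: len(layout[n]))
            layout[node].append((shard, kind))
            used.add(node)
    return layout, unassigned


if __name__ == "__main__":
    print(allocate(2, 1, ["node1"]))           # replicas cannot be placed
    print(allocate(2, 1, ["node1", "node2"]))  # every copy is placed
```

With a single node the replicas stay unassigned, which is consistent with the `"total": 2, "successful": 1` reported in the `_shards` section of the write responses in this workshop (one primary written, one replica with no node to live on).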
curl -XPUT "localhost:9200/fr/user/esup-1" -d'
{
  "email": "guillaume.colson@univ.fr",
  "name": "Guillaume Colson",
  "username": "@goyome"
}'
{
  "_index": "fr",
  "_type": "user",
  "_id": "esup-1",
  "_version": 1,
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}
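The same indexing call can be assembled from Python with the standard library. This sketch only builds the request; sending it needs a running node. `build_index_request` is a hypothetical helper. The explicit `Content-Type` header is not needed by Elasticsearch 2.x but is required from 6.x on (later versions also phase out mapping types such as `user` in the path).

```python
import json
import urllib.parse
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, doc):
    """Build (but do not send) a PUT /{index}/{type}/{id} request."""
    url = "/".join([base_url, index, doc_type, urllib.parse.quote(doc_id, safe="")])
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = build_index_request(
    "http://localhost:9200", "fr", "user", "esup-1",
    {"email": "guillaume.colson@univ.fr", "name": "Guillaume Colson", "username": "@goyome"},
)
# urllib.request.urlopen(req) would send it; that needs a running node.
```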
curl -XPOST "localhost:9200/_bulk" --data-binary '
{ "create": { "_index": "us", "_type": "user", "_id": "1" }}
{ "email" : "john@smith.com", "name" : "John Smith", "username" : "@john" }
{ "create": { "_index": "gb", "_type": "user", "_id": "2" }}
{ "email" : "mary@jones.com", "name" : "Mary Jones", "username" : "@mary" }
'
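The bulk body is newline-delimited JSON: one action line, then one document line, with a final newline at the end. A minimal Python sketch that builds such a body (`bulk_body` is a hypothetical helper). Newlines are significant here, which is why the file upload uses `--data-binary`: `curl -d @file` strips them.

```python
import json


def bulk_body(actions):
    """Serialize (metadata, source) pairs as a newline-delimited bulk body."""
    lines = []
    for metadata, source in actions:
        lines.append(json.dumps(metadata))  # action line, e.g. {"create": {...}}
        lines.append(json.dumps(source))    # document line
    return "\n".join(lines) + "\n"          # the body must end with a newline


body = bulk_body([
    ({"create": {"_index": "us", "_type": "user", "_id": "1"}},
     {"email": "john@smith.com", "name": "John Smith", "username": "@john"}),
    ({"create": {"_index": "gb", "_type": "user", "_id": "2"}},
     {"email": "mary@jones.com", "name": "Mary Jones", "username": "@mary"}),
])
```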
{
  "took": 391,
  "errors": false,
  "items": [
    {
      "create": {
        "_index": "us",
        "_type": "user",
        "_id": "1",
        "_version": 1,
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "status": 201
      }
    },
    {
      "create": {
        "_index": "gb",
        "_type": "user",
        ...
> JSON file containing the tweets to import
curl -XPOST "localhost:9200/_bulk?pretty" --data-binary @./path/to/your/tweets.json
{
  "took" : 88,
  "errors" : false,
  "items" : [ {
    "create" : {
      "_index" : "gb",
      "_type" : "tweet",
      "_id" : "3",
      "_version" : 1,
      "_shards" : {
        "total" : 2,
        "successful" : 1,
        "failed" : 0
      },
      "status" : 201
    }
  }, {
    "create" : {
      "_index" : "us",
      "_type" : "tweet",
      ...
The 3 indices have been created and contain documents
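A bulk request reports failures per item instead of failing as a whole, so the response is worth checking. A minimal sketch, assuming the response layout shown above (`errors` flag, `items` list, one action key per item, an HTTP-style `status`); `failed_items` is a hypothetical helper.

```python
import json


def failed_items(response_text):
    """Return (action, index, id, error) for every bulk item that failed."""
    response = json.loads(response_text)
    if not response.get("errors"):
        return []  # the top-level flag says every item succeeded
    failures = []
    for item in response["items"]:
        # Each item has exactly one key: the action name ("create", "index", ...).
        action, result = next(iter(item.items()))
        if result.get("status", 500) >= 300:
            failures.append((action, result.get("_index"), result.get("_id"), result.get("error")))
    return failures
```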
curl -XGET "localhost:9200/fr/user/esup-1"
{
  "_index": "fr",
  "_type": "user",
  "_id": "esup-1",
  "_version": 1,
  "found": true,
  "_source": {
    "email": "guillaume.colson@univ.fr",
    "name": "Guillaume Colson",
    "username": "@goyome"
  }
}
curl -XGET "localhost:9200/us/tweet/_search"
{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 1,
    "hits": [
      {
        "_index": "us",
        "_type": "tweet",
        "_id": "14",
        "_score": 1,
        "_source": {
          "date": "2014-09-24",
          "name": "John Smith",
          ...
curl -XGET "localhost:9200/_search?q=name:smith"
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 16,
    "successful": 16,
    "failed": 0
  },
  "hits": {
    "total": 7,
    "max_score": 0.4451987,
    "hits": [
      {
        "_index": "us",
        "_type": "tweet",
        "_id": "8",
        "_score": 0.4451987,
        "_source": {
          "date": "2014-09-18",
          "name": "John Smith",
          "user_id": 1
        }
      },
      ...
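In a URI search the `q` parameter has to be URL-encoded (a literal `+` in a query string would otherwise be read as a space). A sketch using `urllib.parse.urlencode`; `uri_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def uri_search_url(base_url, q, path=""):
    """Build a URI search URL; path is e.g. "us/tweet", or "" for all indices."""
    endpoint = f"{base_url}/{path}/_search" if path else f"{base_url}/_search"
    return endpoint + "?" + urlencode({"q": q})  # percent-encodes ':' and '+'


print(uri_search_url("http://localhost:9200", "name:smith"))
# http://localhost:9200/_search?q=name%3Asmith
```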
curl -XGET "localhost:9200/_search" -d'
{
  "query": {
    "match": {
      "name": "smith"
    }
  }
}'
curl -XGET "localhost:9200/us,gb/tweet/_search" -d'
{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "tweet": "elasticsearch"
        }
      },
      "filter": {
        "range": {
          "date": {
            "gte": "2014-09-20"
          }
        }
      }
    }
  }
}'
...
"hits": [
  {
    "_index": "gb",
    "_type": "tweet",
    "_id": "13",
    "_score": 0.375,
    "_source": {
      "date": "2014-09-23",
      "name": "Mary Jones",
      "tweet": "So yes, I am an Elasticsearch fanboy",
      "user_id": 2
    }
  },
  {
    "_index": "us",
    "_type": "tweet",
    "_id": "10",
    "_score": 0.3125,
    "_source": {
      "date": "2014-09-20",
      "name": "John Smith",
      "tweet": "Elasticsearch surely is one of the hottest new NoSQL products",
      "user_id": 1
    }
  },
  ...
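`filtered` is Elasticsearch 2.x syntax: it was deprecated in 2.0 and removed in 5.0, where the same search is written as a `bool` query with the scored `match` under `must` and the unscored `range` under `filter`. A sketch that builds that body as a Python dict; `recent_match_query` is a hypothetical helper.

```python
import json


def recent_match_query(field, text, since):
    """bool-query equivalent of the 2.x `filtered` example (5.0+ syntax)."""
    return {
        "query": {
            "bool": {
                "must": {"match": {field: text}},               # contributes to _score
                "filter": {"range": {"date": {"gte": since}}},  # yes/no, not scored
            }
        }
    }


print(json.dumps(recent_match_query("tweet", "elasticsearch", "2014-09-20"), indent=2))
```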
GET /_search?q=2014              # 12 results
GET /_search?q=2014-09-15        # 12 results!
GET /_search?q=date:2014-09-20   # 1 result
GET /_search?q=date:2014         # 0 results!
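A simplified model of these four results: without a field name, `q` searches the `_all` field, which is full text, so `2014-09-15` is analyzed into the tokens `2014`, `09` and `15`, and any document sharing one of them matches. `date:` targets the date field, which holds a single exact value per document. Sketch assumptions: the regex tokenizer only approximates the standard analyzer, and the date field is reduced to string equality.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def matches_all_field(query, doc_date):
    # _all is full text: the query is analyzed too, and any shared token is a hit (OR).
    return bool(set(analyze(query)) & set(analyze(doc_date)))


def matches_date_field(query, doc_date):
    # The date field holds one exact value per document (simplified to string equality).
    return query == doc_date


doc_date = "2014-09-20"
print(matches_all_field("2014", doc_date))         # True: shares the token "2014"
print(matches_all_field("2014-09-15", doc_date))   # True: shares "2014" and "09"
print(matches_date_field("2014-09-20", doc_date))  # True
print(matches_date_field("2014", doc_date))        # False
```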
curl -XGET "http://localhost:9200/_search" -d'
{
  "query": {
    "match": {
      "tweet": "elasticsearch is easy"
    }
  }
}'
...
"hits": [
{
...
"_score": 0.4794072,
"_source": {
...
"tweet": "The Elasticsearch API is really easy to use",
...
},
...
"_score": 0.4082814,
"tweet": "Elasticsearch is built for the cloud, easy to scale",
...
"_score": 0.22818159,
"tweet": "Elasticsearch surely is one of the hottest new NoSQL products",
...
"_score": 0.11272853,
"tweet": "Elasticsearch means full text search has never been so easy",
...
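The `match` query analyzes `elasticsearch is easy` into three terms and, by default, combines them with OR, so a tweet containing any one of them is a hit. The toy sketch below shows which query terms each tweet in the listing contains. It does not reproduce the scores: Lucene's default TF/IDF similarity in 2.x also weighs term rarity and field length.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def matched_terms(query, text):
    # match query default: OR over the analyzed query terms
    return set(analyze(query)) & set(analyze(text))


tweets = [
    "The Elasticsearch API is really easy to use",
    "Elasticsearch is built for the cloud, easy to scale",
    "Elasticsearch surely is one of the hottest new NoSQL products",
    "Elasticsearch means full text search has never been so easy",
]
for tweet in tweets:
    print(sorted(matched_terms("elasticsearch is easy", tweet)), tweet)
```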
The 4 most frequent terms in the tweets
curl -XGET "http://localhost:9200/us,gb/tweet/_search" -d'
{
  "size": 0,
  "aggs": {
    "hot term in tweet": {
      "terms": {
        "field": "tweet",
        "size": 4
      }
    }
  }
}'
...
"aggregations": {
  "hot term in tweet": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 72,
    "buckets": [
      {
        "key": "elasticsearch",
        "doc_count": 7
      },
      {
        "key": "is",
        "doc_count": 5
      },
      {
        "key": "the",
        "doc_count": 5
      },
      {
        "key": "i",
        "doc_count": 4
      }
      ...
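A terms aggregation's `doc_count` is the number of documents containing a term, not the number of occurrences; and because `tweet` is an analyzed field, the buckets are individual tokens such as `is` and `the`. A toy sketch of that counting; `top_terms` is hypothetical, and real counts are gathered per shard and can be approximate, as `doc_count_error_upper_bound` indicates.

```python
import re
from collections import Counter


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def top_terms(docs, size):
    """Toy terms aggregation: (term, doc_count) for the `size` most common terms."""
    counts = Counter()
    for doc in docs:
        counts.update(set(analyze(doc)))  # each term counted once per document
    return counts.most_common(size)
```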
Splitting the tweets into 2 periods, before and after 2014-09-20
curl -XGET "http://localhost:9200/us,gb/tweet/_search" -d'
{
  "size": 0,
  "aggs": {
    "Par date": {
      "range": {
        "field": "date",
        "ranges": [
          {
            "to": "2014-09-20"
          },
          {
            "from": "2014-09-20"
          }
        ]
      }
    }
  }
}'
...
"aggregations": {
  "Par date": {
    "buckets": [
      {
        "key": "*-2014-09-20T00:00:00.000Z",
        "to": 1411171200000,
        "to_as_string": "2014-09-20T00:00:00.000Z",
        "doc_count": 7
      },
      {
        "key": "2014-09-20T00:00:00.000Z-*",
        "from": 1411171200000,
        "from_as_string": "2014-09-20T00:00:00.000Z",
        "doc_count": 5
      }
      ...
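Each range bucket includes its `from` value and excludes its `to` value, so a tweet dated exactly 2014-09-20 lands only in the second bucket; the boundary `1411171200000` is that date at UTC midnight, in milliseconds since the Unix epoch. A sketch of both points; `epoch_millis` and `split_by_date` are hypothetical helpers.

```python
from datetime import datetime, timezone


def epoch_millis(day):
    """'YYYY-MM-DD' (UTC midnight) -> milliseconds since the Unix epoch."""
    dt = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def split_by_date(dates, boundary):
    """Count (before, from_boundary) like a two-bucket range aggregation."""
    b = epoch_millis(boundary)
    before = sum(1 for d in dates if epoch_millis(d) < b)  # "to" is exclusive
    after = sum(1 for d in dates if epoch_millis(d) >= b)  # "from" is inclusive
    return before, after


print(epoch_millis("2014-09-20"))  # 1411171200000, as in the response above
```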
Thank you for your attention! Any questions?
https://flic.kr/p/6KDtm