Elasticsearch Scripting: Understanding The Difference Between doc And params
Painless scripts let you customize many things in Elasticsearch. One thing that (almost) every script has in common is that it accesses document fields. There are two different ways to do so, and every developer should know both, because the choice can have a huge impact on performance.
An Example
Let’s start with some sample data.
POST superhero/_doc/1
{
  "name": "Winter Monk",
  "race": "Cyborg",
  "eye_color": "green",
  "alignment": "good",
  "strength": 102
}

POST superhero/_doc/2
{
  "name": "Jungle Banana",
  "race": "Mutant",
  "eye_color": "red",
  "alignment": "bad",
  "strength": 492
}

POST superhero/_doc/3
{
  "name": "Green Flash",
  "race": "Human",
  "eye_color": "green",
  "alignment": "bad",
  "strength": 98
}
As explained in the last article, we could use scripted sorting to add some custom sort logic.
GET superhero/_search
{
  "sort": [
    {
      "_script": {
        "type": "number",
        "script": {
          "lang": "painless",
          "source": "params.mapping[doc['alignment.keyword'].value]",
          "params": {
            "mapping": {
              "neutral": 1,
              "good": 2,
              "bad": 3
            }
          }
        }
      }
    }
  ]
}
This script maps the alignment to a custom sort order (that differs from the natural order of the alignment strings).
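To make the effect of the script concrete, here is a small Python sketch that mimics what the scripted sort computes for the three sample documents. It only models the logic, not how Elasticsearch executes the script, and the helper name sort_key is made up for illustration.

```python
# Sample documents from above, reduced to the fields that matter here.
heroes = [
    {"name": "Winter Monk", "alignment": "good"},
    {"name": "Jungle Banana", "alignment": "bad"},
    {"name": "Green Flash", "alignment": "bad"},
]

# Same lookup table as the "params.mapping" object passed to the script.
mapping = {"neutral": 1, "good": 2, "bad": 3}


def sort_key(doc):
    """Hypothetical helper: the number the script returns for one document."""
    return mapping[doc["alignment"]]


# Ascending sort on the computed number, as a script sort does by default.
ranked = sorted(heroes, key=sort_key)

for hero in ranked:
    print(sort_key(hero), hero["name"])
```

The two "bad" heroes receive the same sort value, so the script alone does not determine their relative order.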
Two different ways to access document attributes
The example above shows one way to access document fields. The keyword doc refers to the document context, whose content can be accessed dictionary-style.
doc['field_name'].value
This is the recommended way. It uses a special data structure called doc_values that is created at index time. Think of it as a column-oriented mapping from each document to its values for a given field. It is used for sorting, aggregations, and the fast lookup of values from scripts. Doc values are stored on disk and served through the operating system's file system cache, so frequently used fields are effectively read from RAM. That costs some disk space and cache memory but results in faster execution. And since search is (in most cases) about query speed, this approach is the one you should go for.
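The idea of a per-field column can be illustrated with a few lines of Python. This is a deliberately simplified model that assumes one in-memory dictionary per field. The real format used by Lucene is different, and the names build_columns and columns are invented for this sketch.

```python
from collections import defaultdict

documents = {
    1: {"name": "Winter Monk", "alignment": "good", "strength": 102},
    2: {"name": "Jungle Banana", "alignment": "bad", "strength": 492},
    3: {"name": "Green Flash", "alignment": "bad", "strength": 98},
}


def build_columns(docs):
    """Hypothetical 'index time' step: store each field as its own column."""
    columns = defaultdict(dict)
    for doc_id, fields in docs.items():
        for field, value in fields.items():
            columns[field][doc_id] = value
    return columns


columns = build_columns(documents)

# 'Query time': reading one field for many documents touches only that column.
strengths = [columns["strength"][doc_id] for doc_id in sorted(documents)]
print(strengths)
```

Because the values of one field sit next to each other, a sort or a script that needs only strength never has to look at name or alignment.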
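There are limitations. Doc values hold only simple values such as numbers, dates, and keywords (or arrays of them), so whole JSON objects cannot be retrieved this way. Note that .value returns a single value. For a multi-valued field you have to iterate over doc['field_name'] instead, and the values come back in sorted order rather than in their original order. Also, analyzed text fields have no doc values by default. Accessing them via doc requires enabling fielddata, which loads all terms of the field into heap memory. So this approach should be used for non-analyzed fields (keywords, numbers).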
The other option is accessing the document source directly.
params['_source']['field_name']
This gives you full access to the document as it was originally indexed, including arrays, nested objects, and fields that were not indexed. But there is a pitfall: Elasticsearch has to load and parse the whole document source for every document the script runs on, just to retrieve a single value. That eats a lot of time. Whenever possible, you should avoid it.
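A rough Python sketch shows why this is expensive: to read one field from the source, the whole stored JSON has to be parsed first. The model below assumes the source is kept as one JSON string per document, which is a simplification. The names stored_sources and read_from_source are illustrative, not Elasticsearch APIs.

```python
import json

# Simplified model: each document's original JSON is stored as one string.
stored_sources = {
    1: '{"name": "Winter Monk", "race": "Cyborg", "eye_color": "green", '
       '"alignment": "good", "strength": 102}',
    2: '{"name": "Jungle Banana", "race": "Mutant", "eye_color": "red", '
       '"alignment": "bad", "strength": 492}',
    3: '{"name": "Green Flash", "race": "Human", "eye_color": "green", '
       '"alignment": "bad", "strength": 98}',
}


def read_from_source(doc_id, field):
    """Parse the complete source of one document, then pick a single field."""
    source = json.loads(stored_sources[doc_id])  # parses every field, not just one
    return source[field]


alignments = [read_from_source(doc_id, "alignment") for doc_id in sorted(stored_sources)]

# Compare how much text was parsed with how much was actually needed.
parsed_chars = sum(len(raw) for raw in stored_sources.values())
needed_chars = sum(len(value) for value in alignments)

print(alignments)
print(f"parsed {parsed_chars} characters to obtain {needed_chars}")
```

The gap between parsed and needed text grows with the size of each document and is paid again for every document the script touches.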
Conclusion
Accessing fields via the source is not a realistic option unless your index is really, really small. If you need to look up something that is not part of the doc_values, you should consider remodeling your index mapping instead.