Elasticsearch suggester weight

2/29/2024

We're using Elasticsearch for our search use case and have an index that serves both regular queries and autocompletion. For autocompletion, I've enabled the completion suggester on it. However, there is growing concern about memory usage as our data increases. Here are a few questions I had in that regard:

1. I am trying to compare the RAM usage of the FST with the overall RAM usage of a given index. For the FST, it was confirmed in reply to an earlier post of mine that the "completion" -> "size_in_bytes" metric is a heap metric. However, for the overall RAM usage of the index, I'm not finding any metric in index-stats. The field in the stats response that seems closest to overall memory is "segments" -> "memory_in_bytes". But if I go by that field, 99.39% of RAM is being taken by the FST for our index, which is shockingly high. I know that node-stats gives a direct "os" -> "mem" indication, but since we have multiple indices in the cluster, it's hard to isolate measurements for any single index.

2. In case the completion suggester ends up occupying a lot of heap, is there any emergency way to turn it off for the entire index or cluster quickly through some API call? I know that the FST is loaded into memory on the first completion query. If memory usage goes too high, can we rely on stopping queries to the suggester to bring memory usage down, assuming that Elasticsearch will then remove the FST from memory? If yes, what would the reaction time be?

Besides #1 and #2, if you have any idea how to quickly decrease heap usage in an emergency scenario, please let me know.

Thanks for opening the Lucene issue! It looks good to me. Nevertheless, a document-based suggester is not always the best idea. Your patch on the Lucene issue already mentions it: if you have lots of duplicates, it slows down and the idea behind a suggester is broken (as it is slow, possibly horribly slow). Let me explain the 2 different types of suggester "use cases":

The new suggester is document based, which is fine if you really want to suggest documents (and also want to filter deleted documents). Basically this suggester just executes a query and returns TopDocs (not really TopDocs, because it uses a different scoring, but basically it is the same). This type of suggester works fine if what you index for suggestions is unique, e.g., the document title (which should be almost unique), and you suggest those in the drop-down. If users click on those items, they are taken directly to the document. One example of this is the search engine on Elastic's home page.

The old suggester was just "dictionary based" (a variant of the term dictionary that has payloads and some weights). This suggester does not suggest documents; it just suggests terms/phrases you could enter into the search field and execute. This is the same as what Google does, and it is what I would need (as would the user who opened the issue). Generally you can do the same with the new suggester, but you must take care to use a separate, deduplicated index and index the suggester phrases from there. But this makes maintenance hard!

If you have structured data and you know that some fields in your documents are useful for auto-suggestion (e.g., names of authors, journal names, anything that could also work as a facet), then you are done already. This helps users to enter such terms (which are real suggestions; documents are not suggestions, they are already search results). The problem with things like author names is that they may appear in thousands of documents. If you execute the search afterwards, you get thousands of documents. This is how you would use a suggester like the one you see on Google: just present phrases to search for, each of which may return many documents. For this use case, a document-based suggester like the new one here is not scalable.

To make it work as it should, the user would need to create a 2nd index, then execute a terms aggregation on the primary index on a field used for suggestions and migrate the buckets of that aggregation as suggestions. The downside: this makes adoption hard, as you have to run this at regular intervals and reindex the suggester index. For users with a high frequency of documents coming and going, this is not going to work. The old completion suggester had the deduplication "automatically", because it just worked on the in-memory terms dictionary and never iterated over documents. The downside: deleted stuff does not disappear.
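The second-index workaround described in the reply (a terms aggregation on the primary index, with its buckets migrated into a separate suggestion index) could be sketched roughly as follows. This is only an illustration under assumptions: the function name, the `authors` aggregation name and the `suggest` field name are hypothetical, and using each bucket's `doc_count` as the suggestion weight is one possible choice, not something the reply prescribes. The code only transforms an already fetched aggregation response; running the aggregation and bulk-indexing the result are left out.

```python
from typing import Any

# Assumption: weights are capped defensively at the largest 32-bit signed integer.
MAX_WEIGHT = 2**31 - 1


def buckets_to_suggestions(agg_response: dict[str, Any], agg_name: str,
                           field: str = "suggest") -> list[dict[str, Any]]:
    """Build one suggestion document per distinct term of a terms aggregation."""
    buckets = agg_response["aggregations"][agg_name]["buckets"]
    docs: dict[str, dict[str, Any]] = {}
    for bucket in buckets:
        phrase = str(bucket["key"]).strip()
        if not phrase:
            continue  # skip empty or whitespace-only terms
        # Using the phrase as the document id means a periodic re-run
        # overwrites the existing suggestion instead of adding a duplicate.
        docs[phrase] = {
            "_id": phrase,
            field: {"input": phrase,
                    "weight": min(int(bucket["doc_count"]), MAX_WEIGHT)},
        }
    return list(docs.values())


# Made-up sample shaped like a terms-aggregation response.
sample_response = {
    "aggregations": {
        "authors": {
            "buckets": [
                {"key": "Jane Doe", "doc_count": 1200},
                {"key": "John Smith", "doc_count": 35},
            ]
        }
    }
}

suggestions = buckets_to_suggestions(sample_response, "authors")
```

Each author then exists exactly once in the suggestion index, however many primary documents mention the name. As the reply notes, the job has to be repeated at regular intervals, and terms that no longer occur in the primary index would still need to be removed from the suggestion index separately.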
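The comparison raised in the question, the "completion" -> "size_in_bytes" value against "segments" -> "memory_in_bytes" for a single index, can be computed from an index-stats response along these lines. This is a minimal sketch, assuming the response has already been fetched and parsed into a dictionary, and that both fields are reported for the index; the function name and the sample numbers are made up. Whether the two fields measure the same memory pool is exactly what the question leaves open, so the ratio is only as meaningful as that assumption, and some versions may report 0 for the segments field.

```python
from typing import Any, Optional


def completion_share(stats: dict[str, Any], index: str,
                     scope: str = "total") -> Optional[float]:
    """Return completion size divided by segment memory for one index.

    Returns None when segment memory is reported as 0, because the
    ratio is undefined in that case.
    """
    index_stats = stats["indices"][index][scope]
    completion_bytes = index_stats["completion"]["size_in_bytes"]
    segment_bytes = index_stats["segments"]["memory_in_bytes"]
    if segment_bytes == 0:
        return None
    return completion_bytes / segment_bytes


# Made-up sample shaped like the per-index part of an index-stats response.
sample_stats = {
    "indices": {
        "my-index": {
            "total": {
                "completion": {"size_in_bytes": 9_939_000},
                "segments": {"memory_in_bytes": 10_000_000},
            }
        }
    }
}

share = completion_share(sample_stats, "my-index")
```

Reading the per-index entry rather than node-level statistics keeps the figure specific to one index, which is the isolation problem the question describes for clusters with multiple indices.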
