Large-scale industrial search systems optimize for relevance to drive successful sessions that help users find what they are looking for. To maximize relevance, we leverage two complementary objectives: behavioral relevance (results users are likely to click or download) and textual relevance (a result's semantic match to the query). A persistent challenge is the scarcity of expert-provided textual relevance labels relative to abundant behavioral relevance labels. We first address this by systematically evaluating LLM configurations, finding that a specialized, fine-tuned model significantly outperforms a much larger pre-trained one at providing high-quality labels. Using this optimal model as a force multiplier, we generate millions of textual relevance labels to overcome the data scarcity. We show that augmenting our production ranker with these textual relevance labels leads to a significant outward shift of the Pareto frontier: offline NDCG improves for behavioral relevance while simultaneously increasing for textual relevance. These offline gains were validated by a worldwide A/B test on the App Store ranker, which demonstrated a statistically significant +0.24% increase in conversion rate, with the most substantial performance gains occurring on tail queries, where the new textual relevance labels provide a robust signal in the absence of reliable behavioral relevance labels.

