This work evaluates the potential of large language models (LLMs) to power digital assistants capable of complex action execution. These assistants rely on pre-trained programming knowledge to execute multi-step goals by composing objects and functions defined in assistant libraries into action execution programs. To achieve this, we develop ASPERA, a framework comprising an assistant library simulation and a human-assisted LLM data generation engine. Our engine allows developers to guide LLM generation of high-quality tasks consisting of complex user queries, simulation state and corresponding validation programs, tackling data availability and evaluation robustness challenges. Alongside the framework we release Asper-Bench, an evaluation dataset of 250 challenging tasks generated using ASPERA, which we use to show that program generation grounded in custom assistant libraries poses a significant challenge to LLMs compared to dependency-free code generation.
- * Work done while at Apple
- † University of Cambridge
- ‡ Meta