Around the time GPT-4 was making headlines for acing standardized tests, Microsoft researchers and collaborators were putting other AI models through a different type of test — one designed to make the models fabricate information.
To target this phenomenon, known as “hallucination,” they created a text-retrieval task that would give most humans a headache, then tracked and improved the models’ responses. The study led to a new way to reduce the instances in which large language models (LLMs) deviate from the data given to them.