News

Google shares progress on Project Astra, its multimodal AI agent/assistant

During Google I/O 2024, the company shared its progress on Project Astra (advanced seeing and talking responsive agent). Created as part of Google DeepMind’s mission to build AI responsibly to benefit humanity, Project Astra is a real-time, universal assistant powered by multimodal AI that Demis Hassabis, CEO of Google DeepMind, said is part of Google’s attempt to “develop universal AI agents that can be helpful in everyday life”. During a two-part video demo, Astra was able to recognise and identify objects through a smartphone’s camera in response to a user’s queries. It was also able to recognise and explain a developer’s code on a computer screen. Impressively, it could also remember objects seen in passing, locating a misplaced pair of glasses for the user because the camera had earlier swept over the table where they lay.

According to Google, Project Astra is currently based on Gemini 1.5 Pro, whose context window of 1 million tokens helps it address the user’s needs. The long-context capabilities of 1.5 Pro are what make it possible to retain context across a complex session and provide information efficiently. The difference between the two is that Project Astra is a project exploring how to personalise models into agents that interact with users and address their needs, whereas Gemini 1.5 Pro is the underlying model: the information library and the core AI technology.

When asked how Project Astra compared to Gemini 1.5 Pro, the company said that Astra is an agent that uses context and that users interface with directly. Google is using Astra to figure out what problems users face and how to personalise the assistant to their needs.
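
Astra’s internal stack is not public, but the multimodal, long-context behaviour described above can be sketched against the publicly available Gemini API. The snippet below is a minimal illustration only, assuming the google-generativeai Python SDK, a placeholder API key, and a hypothetical photo captured by a smartphone camera; it is not Astra’s implementation.

    import google.generativeai as genai
    from PIL import Image

    # Assumption: an API key for the public Gemini API is available.
    genai.configure(api_key="YOUR_API_KEY")

    # Gemini 1.5 Pro, the model Google says Astra currently builds on.
    model = genai.GenerativeModel("gemini-1.5-pro")

    # Hypothetical frame captured from a smartphone camera.
    frame = Image.open("desk_photo.jpg")

    # A multimodal query: an image plus a natural-language question,
    # loosely mirroring the "where are my glasses?" demo.
    response = model.generate_content([frame, "Where did I leave my glasses?"])
    print(response.text)

In a session like Astra’s demo, many such frames and exchanges would accumulate, which is where the 1 million token context window becomes relevant.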