Google I/O 2024: Project Astra Is a Real-Time Multimodal AI Assistant. Here’s How It Works

At its I/O developer conference, Google showed off Project Astra, a real-time AI assistant intended to serve as a universal helper. Demis Hassabis, head of Google DeepMind, demonstrated how the assistant can handle everyday requests, such as remembering where you left your glasses. As a multimodal AI, Astra is designed to perceive the world around it, identify objects, locate misplaced items, and more.

During the I/O keynote, Hassabis showed off Astra’s capabilities in a demo video, in which the model answered questions in real time while scanning its surroundings through a smartphone camera. Users can ask about things simply by pointing the camera at them: for example, you can draw a circle or arrow around an object visible on screen and ask “What is this called?”, and the phone responds aloud with a description of the object and the relevant details. That is just one example; in the demo, Astra handled a range of complex, even playful, questions in real time.
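Astra itself is not publicly available, but the point-and-ask interaction described above can be approximated with Google’s public `google-generativeai` Python SDK: send a camera frame together with a question as one multimodal prompt to a Gemini 1.5 Pro model. This is a hedged sketch of that pattern, not Astra’s actual implementation; the filename `camera_frame.jpg` and the `GOOGLE_API_KEY` environment variable are illustrative assumptions.

```python
import os


def build_parts(image, question):
    """Combine a camera frame and a question into one multimodal prompt,
    mirroring the point-and-ask flow shown in the Astra demo."""
    return [image, question]


# Only call the real API when a key is configured (assumption: the key
# is supplied via the GOOGLE_API_KEY environment variable).
if os.environ.get("GOOGLE_API_KEY"):
    import google.generativeai as genai
    from PIL import Image

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-pro")

    # A frame captured from the camera (hypothetical file for illustration).
    frame = Image.open("camera_frame.jpg")
    response = model.generate_content(build_parts(frame, "What is this called?"))
    print(response.text)
```

The key idea is that the image and the text question travel in a single request, so the model can ground its answer in what the camera currently sees, much as Astra does continuously in the demo.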

During I/O, Hassabis emphasized the importance of AI agents that not only converse but also execute tasks on users’ behalf. He believes the future of AI lies not in flashy technology but in practical use, and that there will eventually be many kinds of agents, from simple assistants to far more capable ones, depending on a user’s needs and situation.

Hassabis says Astra’s development was made possible by improvements to Google’s Gemini 1.5 Pro model. Over the past six months, the team worked on making Astra faster and more responsive, which involved not only refining the model itself but also ensuring the whole system runs smoothly at scale.

Astra was one of several Gemini-related announcements at this year’s I/O. Google also introduced Gemini 1.5 Flash, a lighter counterpart to Gemini 1.5 Pro built to perform tasks like summarization and captioning more quickly. Veo, a model that generates video from text prompts, was also announced, as was Gemini Nano, a model designed to run locally on devices such as phones.
