Google Gemini, the multimodal AI model, is here: know its features and use cases
07.12.2023 - 13:35
/ tech.hindustantimes.com
Google Gemini was unveiled yesterday, December 6, by Alphabet CEO Sundar Pichai and Demis Hassabis, CEO of the company's AI research division, Google DeepMind. Surpassing PaLM 2, it is now the largest language model the company has released so far, and with that scale come new capabilities. As a multimodal AI model, its top-tier variant, Gemini Ultra, can respond with text, images, video, and audio, pushing the boundaries of what a general-purpose foundation model can do. So, if you have been wondering about the features and use cases of Gemini AI, check them out below.
After announcing its new AI model, Google posted a YouTube video showcasing the capabilities of Google Gemini. The video mentions, “We've been capturing footage to test it on a wide range of challenges, showing it a series of images, and asking it to reason about what it sees”. The video highlights some of Gemini's more advanced features and use cases.
Throughout the video, Gemini has access to a camera feed and can see whatever the user is doing. The video puts the AI model through several tests in which it has to analyze what is happening in front of it.
In the first segment, the user draws on a piece of paper and asks Gemini to guess what it sees. The AI model keeps updating its guess as the user adds more detail to the drawing. At each step, Gemini offers a reasonable analysis of the drawing and provides additional information about the object. It also recognizes objects and offers information about what they might be made of.
In the second segment, the user asks the AI how to pronounce a word in a different language. Not only does the AI show the response in text format, but it also offers an audio response to help the user with the pronunciation.
In the third segment, the user puts a world map and a rubber duck on the table and asks the AI to create a fun game based on them, using emojis. Gemini obliges and creates a country-guessing game in which the user has to guess the name of a country from three emojis.
In the next segment, the AI is asked to solve puzzles presented to it in the real world. The video shows it easily following the puzzles in real time and solving them.
In the next segment, the user places two random objects on the table and asks Gemini what it sees. Based on the visual context, the AI is able to make a connection between the two objects and categorize them. The user keeps switching out objects, but each time the AI finds a correct category to group the items together.
Next, the user keeps