Google's Gemini: Is the new artificial intelligence model really better than ChatGPT?
17.12.2023 - 02:01
/ tech.hindustantimes.com
Google DeepMind has recently announced Gemini, its new AI model to compete with OpenAI's ChatGPT. Both are examples of “generative AI”, systems that learn to find patterns in their input training data in order to generate new data (pictures, words or other media). ChatGPT, specifically, is a large language model (LLM), which focuses on producing text.
In the same way that ChatGPT is a conversational web app based on the neural network known as GPT (trained on huge amounts of text), Google has a conversational web app called Bard, which was based on a model called LaMDA (trained on dialogue). Google is now upgrading Bard to be based on Gemini.
What distinguishes Gemini from earlier generative AI models such as LaMDA is that it's a “multi-modal model”. This means that it works directly with multiple modes of input and output: as well as supporting text input and output, it supports images, audio and video. Accordingly, a new acronym is emerging: LMM (large multimodal model), not to be confused with LLM.
In September, OpenAI announced a model called GPT-4V (GPT-4 with vision) that can also work with images, audio and text. However, it is not a fully multimodal model in the way that Gemini promises to be.
For example, while ChatGPT-4, which is powered by GPT-4V, can accept audio inputs and generate speech outputs, OpenAI has confirmed that this is done by converting speech to text on input using a separate deep learning model called Whisper. ChatGPT-4 likewise converts text to speech on output using yet another model, meaning that GPT-4V itself works purely with text.
Likewise, ChatGPT-4 can produce images, but not directly: it generates text prompts that are passed to a separate deep learning model called DALL-E 2, which converts text descriptions into images.
In contrast, Google designed Gemini to be “natively multimodal”. This means that the core model directly handles a range of input types (audio, images, video and text) and can directly output them too.
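The architectural difference described above can be sketched in a few lines of code. This is a purely illustrative sketch, not real OpenAI or Google code: every function and class name below is a hypothetical stand-in, and the string-returning stubs merely mark where the actual deep learning models would sit.

```python
# Sketch of the two architectures described in the article.
# All names here are illustrative stand-ins, not real APIs.

def speech_to_text(audio):
    # Stand-in for a separate speech-recognition model (e.g. Whisper).
    return f"transcript of {audio}"

def text_model(prompt):
    # Stand-in for a text-only core model (e.g. GPT-4).
    return f"reply to: {prompt}"

def text_to_speech(text):
    # Stand-in for a separate text-to-speech model.
    return f"audio of {text}"

def pipelined_assistant(audio_input):
    """ChatGPT-style pipeline: audio is converted to text, processed,
    then converted back. The core model only ever sees text."""
    transcript = speech_to_text(audio_input)
    reply = text_model(transcript)
    return text_to_speech(reply)

class NativelyMultimodalModel:
    """Gemini-style design as described: a single core model that
    accepts mixed modalities directly, with no conversion step."""
    def respond(self, inputs):
        # One model handles audio, images, video and text together.
        return f"response to {len(inputs)} mixed-modality inputs"

print(pipelined_assistant("question.wav"))
print(NativelyMultimodalModel().respond(["photo.png", "clip.mp3", "What is shown?"]))
```

The practical point of the contrast is that in the pipelined design, information can be lost at each conversion step (tone of voice, for instance, disappears once speech becomes text), whereas a natively multimodal model can, in principle, use all of the input signal at once.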
The distinction between these two approaches might seem academic, but it's important. The general conclusion from Google's technical report and other qualitative tests to date is that the current publicly available version of Gemini, called Gemini 1.0 Pro, is not generally as good as GPT-4, and is more similar in its capabilities to GPT-3.5.
Google also announced a more powerful version of Gemini, called Gemini 1.0 Ultra, and presented some results showing that it is more powerful than GPT-4. However, it is difficult to assess this, for two reasons. The first reason is that Google has not released Ultra yet, so results cannot be independently validated at present.
The second reason why it's hard to assess Google's claims is that it chose to release a somewhat deceptive demonstration