OpenAI vs. Google – the AI cage-match of the month!
Yesterday, OpenAI launched GPT-4o.
Today, Google announced... a lot of things at their Google I/O event.
What did they announce, and what does it mean?
tl;dr
OpenAI went big on vision in their 30-minute announcement. They imagined a world where ChatGPT is the must-use assistant, one that feels seamless and instant.
Google announced several incremental improvements across multiple areas, with the biggest impact coming from multi-step search queries and personalized (chat-based) search of your Google-stored content.
OpenAI GPT-4o
Their focus was on building an exciting vision for the near future, pointing to security as the reason for a gradual roll-out of features over the next few weeks...
Announcements:
GPT-4o is the first model to integrate text, voice, and vision seamlessly, allowing for nearly instant responses regardless of the modality of the request. It can act as a true assistant without putting unnecessary effort on the user.
OpenAI has also finally launched its memory feature, which means users can control what ChatGPT learns and retains about them for a faster, more personalized experience.
They've made most features available in their free (sign-up not required) version and will cut the price of their API in half (while increasing the number of calls allowed). They are putting a lot of emphasis on the GPT Store, making themselves an invaluable part of the developer and consumer toolkit.
For businesses, the most exciting announcement (if it lives up to the demo) is the speed and accuracy with which GPT-4o is supposed to be able to analyze a dataset or set of documents. This means it may finally offer an accurate and accessible way to process qualitative and quantitative data for insights. If it lives up to the demo.
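If the demo does hold up, the building blocks are already exposed through the public API. Here is a minimal sketch of that kind of document-analysis call using the openai Python SDK and the gpt-4o model name from the launch; the file name, system prompt, and question are placeholder assumptions, not anything OpenAI showed.

```python
# Minimal sketch: asking gpt-4o to pull insights from a document excerpt.
# Assumes the `openai` Python SDK and an OPENAI_API_KEY environment variable;
# the file name and prompt wording are placeholders, not from the announcement.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Load a (hypothetical) interview transcript to analyze.
with open("interview_transcript.txt", "r", encoding="utf-8") as f:
    transcript = f.read()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a research analyst. Be concise and quote your sources."},
        {
            "role": "user",
            "content": "Summarize the top three themes in this transcript, "
                       "with one supporting quote each:\n\n" + transcript,
        },
    ],
)

print(response.choices[0].message.content)
```

For a real qualitative dataset you would batch many transcripts and sanity-check the output, but the shape of the call really is this simple.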
Google Gemini 1.5 Pro & the I/O announcements
They focused on "building AI responsibly" and showcasing the breadth of workflows they can enhance.
Announcements:
Google Search with multi-step queries, meaning you can ask complex questions (think crowdsourcing at scale) and get AI summaries alongside all the robust detail of regular Google search results. They're also planning to double the context window to 2M tokens, meaning Gemini 1.5 Pro will be able to process 2-hour-long videos or hundreds of documents and respond in near real time about those inputs (a rough sketch of that kind of long-context call follows this list).
An AI assistant to, in theory, rival the multimodal text, voice, and vision model of GPT-4o. No launch date was announced, but it was described as "a universal AI agent that's truly helpful in everyday life."
Using your own Google content as a personalized, searchable database through Gemini: chat responses that can retrieve your Google Photos (Ask Photos), synthesize multiple email threads (Workspace Labs), create personalized lectures from content you've uploaded (NotebookLM), etc.
Advancements to their creative tools (video, image, music) -- they're the first of the big LLM providers to ship music generation. The others all abandoned those products along the way.
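On the long-context point above: Gemini 1.5 Pro already accepts very long inputs through the google-generativeai Python package, so the "hundreds of documents in one request" workflow can be sketched today. The model id, file names, and question below are illustrative assumptions on my part.

```python
# Rough sketch: long-context document Q&A with Gemini 1.5 Pro.
# Assumes the `google-generativeai` package and a GOOGLE_API_KEY environment
# variable; the model id, file names, and question are illustrative assumptions.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro-latest")

# Concatenate a stack of (hypothetical) reports into one long prompt,
# leaning on the large context window instead of a retrieval pipeline.
docs = []
for path in ["q1_report.txt", "q2_report.txt", "q3_report.txt"]:
    with open(path, "r", encoding="utf-8") as f:
        docs.append(f"--- {path} ---\n{f.read()}")

prompt = (
    "Across the documents below, what changed quarter over quarter, "
    "and what should we investigate next?\n\n" + "\n\n".join(docs)
)

response = model.generate_content(prompt)
print(response.text)
```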
What does this mean for businesses?
Internal search tools are for the bots. People will have no reason to dig through a business's content for information themselves anymore; they'll ask an assistant instead. Plan for this, design for this.
With computer vision now native to these LLMs, assistants will finally become easy to use without much effort on the user's part. That makes building assistants into your business workflows much easier (a sketch of a multimodal request appears below).
This also means that businesses must start reconsidering what tasks they are willing to let AI take on. This was a big point of conversation with Ben Yoskovitz on our May 7th Design of AI podcast. Right now, everything is "human-gated," but consumers will likely start demanding that AI tools get more access to third-party tools so they can handle more of their mundane tasks.
There is finally a glimmer of hope for using AI for qualitative and quantitative research analysis without having to build a proprietary system. Privacy is still a question mark, but we're close to finally seeing meaningful insights from these tools.
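On the computer-vision point above, the basic unit of a vision-enabled assistant is a single request that mixes an image with a question. Here is a sketch of what that looks like against the gpt-4o chat API; the image URL and question are placeholders I've made up.

```python
# Sketch: a multimodal (text + image) request, the basic building block of a
# vision-enabled assistant. Assumes the `openai` Python SDK and OPENAI_API_KEY;
# the image URL and question are placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What does this whiteboard sketch describe, and what open questions are on it?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/whiteboard.jpg"}},
            ],
        },
    ],
)

print(response.choices[0].message.content)
```

Swapping the image for a chart or a photographed form is the same call, which is why wiring vision into an existing workflow starts to look like ordinary integration work rather than a machine-learning project.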
This is based only on an initial assessment of the launch announcements, so more ideas and opportunities will come to light as the conversations about what these demos really mean get more in-depth.
What are you most excited about regarding either of these launches?
Brittany Hobbs is a freelance product insights leader specializing in the adoption of AI at https://ph1.ca and the co-host of the Designof.AI podcast.