Last active May 11, 2023 13:59
Transcript of using Whisper (large v2)

The talk at Google I/O 2023 covered several advancements in AI and new features across Google's products and services. Here's a summary:

  1. Help Me Write in Gmail: Introducing a feature that uses AI to help users draft emails. Users can request assistance in crafting emails, such as asking for a full refund, and Help Me Write generates a draft using prompts and relevant information from previous emails.

  2. Magic Eraser and Magic Editor in Google Photos: AI-powered computational photography tools for photo editing. Magic Eraser removes unwanted distractions, while Magic Editor allows users to make advanced edits like removing objects and adjusting elements in photos.

  3. Palm 2 and MedPalm 2: New models in Google's AI lineup. Palm 2 is a highly capable model for various tasks, and MedPalm 2 is fine-tuned on medical knowledge, performing at an expert level on medical licensing exam-style questions.

  4. Gemini and Bard: Google's next-generation foundation model, Gemini, designed to be multimodal and enable future innovations. Bard, powered by Palm 2, is an AI collaborator for tasks like code generation, debugging, and explaining code snippets, supporting various programming languages.

  5. Bard integrations and extensions: Bard can tap into services from Google and partner extensions to expand its capabilities. Users can prompt Bard with images and access services from partners like Instacart, Indeed, Khan Academy, and Adobe Firefly for image generation.

  6. AI-enhanced Google Sheets and Presentations: Google Sheets uses AI to generate tables based on user input, while Presentations can provide automated suggestions for creating speaker notes based on the content.

  7. Improved Google Search: Search results now include integrated snapshot summaries, AI-powered topic overviews, and conversational search capabilities. Users can explore search results, get more details, and receive personalized results.

  8. AI in Google Cloud: Vertex AI lets you build generative applications with foundation models, including Imagen for image generation, Codey for code completion and generation, and Chirp for speech-to-text. Reinforcement learning from human feedback is also introduced.

  9. Project Tailwind: An AI-powered notebook that helps users learn and study efficiently by compiling personalized study guides, glossaries, and different viewpoints based on provided materials.

  10. Image context and video dubbing: The About This Image tool in Google Search provides context and reliability information for images. Universal Translator is an experimental AI video dubbing service that matches translated speech with lip movements.

  11. Magic Compose and generative AI wallpapers: Magic Compose, powered by generative AI, enhances messaging in Google Messages. Generative AI wallpapers allow users to create custom wallpapers based on their preferences.

  12. No-code app development with Duet AI and AR experiences: Duet AI enables the creation of apps without coding knowledge using AppSheet integrated with Google Workspace. Google introduces geospatial creator tools for easily designing and publishing AR experiences.

The talk showcased Google's advancements in AI across various products and services, empowering users with improved features and capabilities.

AI is having a very busy year.
We are taking the next step in Gmail with Help Me Write.
Let's say you got this email that your flight was canceled.
The airline has sent a voucher,
but what you really want is a full refund.
You could reply and use Help Me Write.
Just type in the prompt of what you want,
an email to ask for a full refund,
hit Create, and a full draft appears.
It conveniently pulled in flight details
from the previous email.
Maybe you want to refine it further.
In this case, a more elaborate email
might increase the chances of getting the refund.
Help Me Write will start rolling out
as part of our Workspace updates.
Another product made better by AI is Google Photos.
Magic Eraser, launched first on Pixel,
uses AI-powered computational photography
to remove unwanted distractions.
Using a combination of semantic understanding
and generative AI, you can do much more
with a new experience called Magic Editor.
Say you're on a hike and you stop to take a photo
in front of a waterfall.
You wish you had taken your bag off for the photo,
so let's go ahead and remove that back strap.
And maybe you want to even get rid of some clouds,
make it feel as sunny as you remember it.
You wish you had posed so it looks like
you're really catching the water in your hand.
No problem, you can adjust that.
We are excited to roll out Magic Editor
in Google Photos later this year.
We are ready to announce our latest Palm model
in production, Palm 2.
Highly capable at a wide range of tasks
and easy to deploy.
We are announcing over 25 products and features
powered by Palm 2 today.
Palm 2 can also help developers
collaborating around the world.
Let's say you're working with a colleague in Seoul
and you're debugging code.
You can ask it to fix a bug and help out your teammate
by adding comments in Korean to the code.
It first recognizes the code is recursive,
suggests a fix, and even explains
the reasoning behind the fix.
And as you can see, it added comments in Korean
just like you asked.
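The kind of fix described here can be sketched with a hypothetical snippet: a recursive function whose missing base case causes infinite recursion, repaired and annotated with Korean comments. The function, the bug, and the comments are invented for illustration; they are not from the talk.

```python
# Hypothetical example of the kind of fix described above.
# Before the fix, the function had no base case and recursed forever:
#
# def sum_list(items):
#     return items[0] + sum_list(items[1:])

def sum_list(items):
    # 기저 사례: 빈 리스트의 합은 0입니다. (Base case: the sum of an empty list is 0.)
    if not items:
        return 0
    # 재귀 사례: 첫 요소에 나머지의 합을 더합니다. (Recursive case: add the first element to the sum of the rest.)
    return items[0] + sum_list(items[1:])

print(sum_list([1, 2, 3, 4]))  # 10
```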
Palm 2 really shines when fine-tuned
on domain-specific knowledge.
Another example is MedPalm 2.
In this case, it's fine-tuned on medical knowledge.
This fine-tuning achieved a 9x reduction
in inaccurate reasoning when compared to the base model,
approaching the performance of clinician experts
who answered the same set of questions.
In fact, MedPalm 2 was the first language model
to perform at expert level
on medical licensing exam-style questions.
We are also working to add capabilities to MedPalm 2
so that it can synthesize information from medical imaging
like plain films and mammograms.
You can imagine an AI collaborator
that helps radiologists interpret images
and communicate the result.
These are some examples of Palm 2
being used in specialized domains.
So I am pleased to announce
that it is now available in preview.
Using the computational resources of Google,
the Google DeepMind team is focused on building more capable systems.
This includes our next-generation foundation model, Gemini,
which is still in training.
Gemini was created from the ground up to be multimodal,
highly efficient at tool and API integrations,
and built to enable future innovations
like memory and planning.
Once fine-tuned and rigorously tested for safety,
Gemini will be available at various sizes and capabilities,
just like Palm 2.
We are rapidly evolving Bard.
It now supports a wide range of programming capabilities,
and it's gotten much smarter at reasoning and math problems.
As of today, it is now fully running on Palm 2.
With Palm 2, Bard's math, logic, and reasoning skills
made a huge leap forward.
Bard can now collaborate on tasks like code generation,
debugging, and explaining code snippets.
Bard has already learned more than 20 programming languages,
including C++, Go, JavaScript, Python, Kotlin,
and even Google Sheets functions.
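A Google Sheets function is a small program in its own right, which is why a language model can treat it like any other code. As a rough illustration (not Bard output), here is what the semantics of Sheets' `SUMIF(range, criterion, sum_range)` look like in Python, simplified to a ">n" numeric criterion:

```python
def sumif(values, criterion, sum_values=None):
    """Rough Python sketch of Sheets' SUMIF with a '>n' criterion."""
    # When no separate sum range is given, SUMIF sums the tested range itself.
    if sum_values is None:
        sum_values = values
    threshold = float(criterion.lstrip(">"))
    # Sum entries of sum_values wherever the paired value passes the test.
    return sum(s for v, s in zip(values, sum_values) if v > threshold)

# Sum the second list's entries where the first list exceeds 10:
print(sumif([5, 12, 30], ">10", [1, 2, 3]))  # 5
```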
I'm excited to announce that tools are coming to Bard.
As you collaborate with Bard,
you'll be able to tap into services from Google
and extensions with partners
to let you do things never before possible.
We're starting with some of the Google apps
that people love and use every day.
In the next few weeks, Bard will become more visual,
both in its responses and your prompts.
We'll also make it easy for you to prompt Bard with images,
giving you even more ways to explore and create.
You might upload an image and ask Bard
to write a funny caption about these two.
Lens detects that this is a photo
of a goofy German shepherd and a golden retriever,
and then Bard uses that to create some funny captions.
Now, that's a taste of what's possible
when Bard meets some of Google's apps.
Bard will be able to tap into all kinds of services
from across the web,
with extensions from incredible partners like Instacart,
Indeed, Khan Academy, and many more.
With Adobe Firefly,
you'll be able to generate completely new images
from your imagination right in Bard.
Now, let's say I'm planning a birthday party
for my seven-year-old who loves unicorns.
I want a fun image to send out with the invitations.
Make an image of a unicorn and a cake at a kids' party.
Okay, now Bard is working with Firefly
to bring what I imagined to life.
How amazing is that?
So today, we are removing the wait list
and opening up Bard to over 180 countries and territories.
Imagine you run a dog walking business.
Sheets can help you get organized.
In a new sheet, simply type something like
client and pet roster for a dog walking business
with rates and hit Create.
Sheets sends this input to a fine-tuned model
that we've been training
with all sorts of sheet-specific use cases.
The model figured out what you might need.
The generated table has things like the dog's name,
client info, notes, et cetera.
This is a good start for you to tinker with.
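The demo doesn't spell out the generated table's contents, so here is a hypothetical sketch of the kind of roster the model might produce for that prompt; the column names and rows are invented for illustration:

```python
import csv
import io

# Hypothetical columns and rows for "client and pet roster
# for a dog walking business with rates".
columns = ["Dog's name", "Client name", "Client phone", "Rate per walk", "Notes"]
rows = [
    ["Biscuit", "A. Nguyen", "555-0131", "$20", "Pulls on the leash"],
    ["Luna", "R. Okafor", "555-0177", "$25", "Two walks on weekdays"],
]

# Render the roster as CSV, the shape a spreadsheet would hold.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(columns)
writer.writerows(rows)
print(buf.getvalue())
```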
Say you're about to give an important presentation
and you've been so focused on the content
that you forgot to prepare speaker notes.
Presentation is in an hour.
No need to panic.
Look at what one of the suggestions is.
Create speaker notes for each slide.
What happened behind the scenes here
is that the presentation and other relevant context
was sent to the model to help create these notes.
And once you've reviewed them,
you can hit Insert and edit the notes
to convey what you intended.
Let's start with a search for what's better
for a family with kids under three and a dog,
Bryce Canyon or Arches.
What you see here looks pretty different.
So let me first give you a quick tour.
You'll notice a new integrated search results page.
So you can get even more out of a single search.
There's an AI-powered snapshot
that quickly gives you the lay of the land on a topic.
And so here you can see
that while both parks are kid-friendly,
only Bryce Canyon has more options for your furry friend.
Then if you wanna dig deeper,
there are links included in the snapshot.
So you can check out more details
and really explore the richness of the topic.
In this case, maybe you wanna ask a follow-up about e-bikes.
So you look for one in your favorite color, red.
And without having to go back to square one,
Google Search understands your full intent
and that you're looking specifically for e-bikes in red
that would be good for a five-mile commute with hills.
And even when you're in this conversational mode,
it's an integrated experience.
So you can simply scroll to see other search results.
Maybe this e-bike seems to be a good fit for your commute.
With just a click,
you're able to see a variety of retailers
that have it in stock
and some that offer free delivery or returns.
You'll also see current prices, including deals,
and can seamlessly go to a merchant site,
check out, and turn your attention to what really matters,
getting ready to ride.
If you're in the US, you can join the waitlist today
by tapping the Labs icon
in the latest version of the Google app or Chrome desktop.
There are three ways Google Cloud can help you
take advantage of the massive opportunity in front of you.
You can build generative applications
using our AI platform, Vertex AI.
With Vertex, you can access foundation models
for chat, text, and image.
You just select the model you want to use,
create prompts to tune the model,
and you can even fine-tune the model's weights
on your own dedicated compute clusters.
In addition to Palm 2,
we're excited to introduce three new models in Vertex,
including Imagen, which powers image generation,
editing, and customization from text inputs.
Codey for code completion and generation,
which you can train on your own code base
to help you build applications faster,
and Chirp, our universal speech model,
which brings speech-to-text accuracy
for over 300 languages.
We're also introducing reinforcement learning
from human feedback into Vertex AI.
You can fine-tune pre-trained models,
incorporating human feedback
to further improve the model's results.
You can also fine-tune a model
on domain or industry-specific data.
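Vertex's RLHF support is a managed training pipeline, but the core idea can be sketched in a few lines: derive a reward signal from pairwise human preferences, then favor outputs that score higher. The toy below replaces the trained reward model with simple win-counting over hypothetical feedback:

```python
from collections import Counter

# Hypothetical human feedback: (preferred, rejected) pairs over candidate outputs.
preferences = [
    ("concise answer", "rambling answer"),
    ("concise answer", "off-topic answer"),
    ("rambling answer", "off-topic answer"),
]

# Toy "reward model": score = wins minus losses across the feedback set.
score = Counter()
for winner, loser in preferences:
    score[winner] += 1
    score[loser] -= 1

# At generation time, rerank candidates by the learned reward.
candidates = ["off-topic answer", "rambling answer", "concise answer"]
best = max(candidates, key=lambda c: score[c])
print(best)  # concise answer
```

A real RLHF setup would fit a parametric reward model to the preference pairs and then fine-tune the generator against it; this sketch only shows the preference-to-reward-to-selection flow.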
All of these features are now in preview,
and I encourage each and every one of you to try them.
Now, to show you just how powerful the Palm API is,
I wanna share one concept
that five engineers at Google put together
over the last few weeks.
The idea is called Project Tailwind,
and we think of it as an AI-first notebook
that helps you learn fast.
Like a real notebook,
your notes and your sources power Tailwind.
How it works is you can simply pick the files
from Google Drive,
and it effectively creates a personalized
and private AI model
that has expertise in the information that you give it.
Now, imagine that I'm a student
taking a computer science history class.
I'll open up Tailwind,
and I can quickly see in Google Drive
all my different notes and assignments and readings.
I can insert them,
and what'll happen when Tailwind loads up,
as you can see,
my different notes and articles on the side,
here they are in the middle,
and it instantly creates a study guide
on the right to give me bearings.
You can see it's pulling out key concepts and questions
grounded in the materials that I've given it.
Now, I can come over here
and quickly change it to go across all the different sources
and type something like create glossary for Hopper.
And what's gonna happen behind the scenes
is it'll automatically compile a glossary
associated with all the different notes and articles
relating to Grace Hopper,
the computer science history pioneer.
Look at this, FLOW-MATIC, COBOL, Compiler,
all created based on my notes.
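Grounding a glossary in the user's own sources can be illustrated with a toy filter: only emit entries whose term actually appears in the supplied notes. The notes, terms, and definitions here are invented for illustration, not Tailwind's actual method:

```python
# Hypothetical class notes supplied as sources.
notes = [
    "Grace Hopper designed FLOW-MATIC, an early English-like programming language.",
    "Her team's work led to COBOL, and she popularized the term compiler.",
]

# Candidate glossary entries; only grounded ones should survive.
candidate_terms = {
    "FLOW-MATIC": "Early English-like data-processing language.",
    "COBOL": "Business-oriented language descended from FLOW-MATIC.",
    "Dynabook": "Alan Kay's concept of a portable educational computer.",
}

# Keep a term only if it actually appears somewhere in the notes.
corpus = " ".join(notes).lower()
glossary = {term: desc for term, desc in candidate_terms.items()
            if term.lower() in corpus}
print(sorted(glossary))  # ['COBOL', 'FLOW-MATIC']
```

"Dynabook" is dropped because no source mentions it, mirroring how grounded answers cite only the provided materials.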
Now, let's try one more.
I'm gonna try something else
called different viewpoints on Dynabook.
So the Dynabook, this was a concept from Alan Kay.
Again, Tailwind, going out,
finding all the different things.
You can see how quick it comes back.
There it is.
And what's interesting here
is it's helping me think through the concept.
So it's giving me different viewpoints.
It was a visionary product.
It was a missed opportunity.
But my favorite part is it shows its work.
You can see the citations here.
When I hover over, here's something from my class notes.
Here's something from an article the teacher assigned.
It's all right here, grounded in my sources.
In the coming months, we're adding two new ways
for people to evaluate images.
First, with our About This Image tool in Google Search,
you will be able to see important information
such as when and where similar images may have first appeared,
and where else the image has been seen online,
including news, fact-checking, and social sites.
All of this provides helpful context
to determine if it's reliable.
Later this year, you'll also be able to use it
if you search for an image or screenshot using Google Lens.
Universal Translator is an experimental
AI video dubbing service
that helps experts translate a speaker's voice
while also matching their lip movements.
What many college students don't realize
is that knowing when to ask for help
and then following through on using helpful resources
is actually a hallmark of becoming a productive adult.
We use next-generation translation models
to translate what the speaker is saying,
models to replicate the style and the tone,
and then match the speaker's lip movements.
Then we bring it all together.
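The dubbing pipeline described above chains three models: translation, voice style replication, and lip matching. A hypothetical stub showing just the data flow, where every stage is a placeholder function rather than a real Google API:

```python
def translate(text, target_lang):
    # Placeholder for the next-generation translation model.
    return f"[{target_lang}] {text}"

def restyle(translated, speaker_profile):
    # Placeholder for replicating the speaker's style and tone.
    return {"audio": translated, "voice": speaker_profile}

def lip_sync(video, dubbed_audio):
    # Placeholder for matching the dubbed audio to lip movements.
    return {"video": video, "audio": dubbed_audio}

# Bring it all together: translate, restyle, then sync to the video.
clip = lip_sync(
    "lecture.mp4",
    restyle(translate("Knowing when to ask for help matters.", "es"), "speaker-01"),
)
print(clip["audio"]["voice"])  # speaker-01
```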
Messages and conversations can be so much more expressive,
fun, and playful with Magic Compose.
It's a new feature coming to Google Messages
powered by generative AI.
So just type your message like you normally would
and then choose how you want it to sound.
Magic Compose will do the rest.
So your messages give off more positivity,
more rhymes, more professionalism.
With our new generative AI wallpapers,
you choose what inspires you,
and then we create a beautiful wallpaper to fit your vision.
So let's take a look.
So this time, I'm going to go and select
Create a Wallpaper with AI.
And I like classic art, so let me tap that.
Now, you'll notice at the bottom,
we use structured prompts to make it easier to create.
So for example, I can pick, what am I going to do?
City by the Bay in a post-impressionist style.
Tap Create Wallpaper.
Now, behind the scenes,
we're using Google's text-to-image diffusion models
to generate completely new and original wallpaper.
And I can swipe through
and see all the different options that it's created.
Generative AI wallpapers will be coming this fall.
What if you could leverage the power of Duet AI
to build apps on Workspace
without even knowing how to code?
Let's say I'm asked to create an app
to better manage travel requests for our team.
I head over to AppSheet,
a no-code platform used to build apps
integrated with Google Workspace.
I describe in natural language
the travel approval app I want to build.
Duet AI walks me through the process step-by-step,
asking a simple set of questions like,
how would I like to be notified?
What are the key sections of my app?
And most importantly, what's the name of the app?
Once the questions are answered,
Duet AI creates the app with travel requests
from my team within Google Workspace.
I love how Duet AI empowers everyone
to get things done quickly and more effectively.
Last year, we launched the ARCore Geospatial API,
and it's enabled many of you
to build location-based immersive experiences.
With ARCore now available on over 1.4 billion devices,
we wanted to go a step further
and share this opportunity with more types of creators
than just the hardcore coders.
Starting today, you can easily design and publish
with our new geospatial creator
powered by ARCore and Google Maps platform.
It's available today in tools
that creators already know and love,
Unity and Adobe Aero (in pre-release).
Anyone can now create engaging AR experiences
with just a few clicks.
To give you an idea of what you can achieve with these tools,
we're partnering with Taito
to launch the Space Invaders World Defense game
later this summer.
It's inspired by the original gameplay,
and it turns your city into a playground.
And now you can head to the Google I/O website
to find 200 sessions and other learning material
to go deeper into everything you heard today,
all the I/O announcements.
Go create!