AWS re:Invent 2022 - Keynote with Swami Sivasubramanian
[MUSIC PLAYING] Please welcome the Vice President of Data and Machine Learning at AWS, Dr. Swami Sivasubramanian. [MUSIC PLAYING] Welcome to Day Three of re:Invent, everyone. You know, this past summer, my seven-year-old daughter, who wants to grow up to be an inventor and a scientist among 20 other things, asked me a question: Dad, how do scientists come up with these amazing new inventions? How do they come up with new ideas? To answer her questions, I didn't want to just make up an answer within like 10 seconds. I actually said, how about we watch a few documentaries behind some of the greatest inventions that changed humankind. And here I am, several months into this exploration, still very fascinated by how great inventions are born. We like to think that the genesis for every great idea happens with a spark, a random thought, a light bulb moment, or simply a stroke of genius. And as history would dictate, it always seems to happen so suddenly, with a flash of realization. Ancient mathematician Archimedes uncovered the physical law of buoyancy in his bathtub. Isaac Newton developed his theory of gravitation after observing an apple fall from a tree. Percy Spencer discovered the earliest microwave oven when his candy bar accidentally melted in his pocket while he was standing in an MIT lab, next to an active magnetron, the vacuum tubes used in early radar systems. But is that really how these light bulb moments work? Are they really as instantaneous as we have been led to believe? These aha moments are actually preceded by ingesting hundreds, if not thousands, of pieces of information that our minds assimilate over time. Let's revisit the microwave oven example. Percy Spencer had more than 20 years of experience working with magnetrons leading up to that moment in the MIT lab. And before that, he was an expert in radio technology while working for the US Navy. In fact, it actually took Spencer more than 30 years to arrive at his microwave epiphany. He just had to connect the dots, just like we do with our data. Researchers Dr. Mark Beeman and Dr. John Kounios wanted to explore this phenomenon even further. They measured participants' brain activity through a specific set of tasks and found that creativity follows a real scientific process in which our brains produce these big ideas. Their research demonstrates that human beings concentrate, analyze, and find correlations in different lobes of our brain when we are making sense of new information, and it's even processed when we sleep. And this all happens before the creative spark occurs in the lobe right above the right ear. To put it simply, they proved that insights can occur when our observations are paired with the power of analytical processing. Really cool, right? I am fascinated by this research for several reasons. If you look closely, the human mind shows us how we can harness the power of data to drive creativity. The same process can apply to organizations.
However, applying the neuroscience of creativity to a modern-day organization is not always perfect. Our environments in organizations present many specific challenges and several important caveats. Within a business, we refer to the knowledge or information we acquire as data points. But unlike the human brain, there isn't one centralized repository to collect all our data, which often leads to data silos and inconsistencies across an organization. It takes a considerable amount of time and effort to clean your data and store it in accessible locations. Unlike the human brain, data isn't automatically processed when we sleep. We all wish it did. We have to build automation into our data infrastructure to avoid manual replications and costly updates after working hours. Data doesn't naturally flow within our organizations like the neural pathways in our brain. We have to build complex pipelines to move data to the right place and set up mechanisms for the right individuals to get access to the data when and where they need it. And finally, data isn't always easy to analyze or visualize, which can make it really difficult for you to identify critical correlations that spark these new ideas. If you step back, you need all of these elements to come together in order for these sparks, your new products or new customer experiences, to come to life. So while the theory of neuroscience can be applied to the principles of data science, we must acknowledge that the processes required to maximize the value of our data are far from innate. I strongly believe data is the genesis for modern invention. To produce new ideas with our data, we need to build a dynamic data strategy that leads to new customer experiences as its final output. And it is absolutely critical that today's organizations have the right structures and technology in place that allow new ideas to form and flourish. While building a data strategy can really feel like a daunting task, you are not alone. We have been in the data business long before the term "big data" even came into existence. In fact, Amazon's early leaders often repeated the phrase that data beats intuition. We built our business on data. They enabled data-driven decision-making with Weblab, our internal A/B testing suite, to produce the earliest book recommendations on Amazon.com. Since then, we have used data to develop countless products and services, ranging from two-day shipping to local grocery delivery, and many more. We also used data to anticipate our customers' expanding storage needs, which paved the way for the development of AWS. And for more than 15 years, we have solved some of the most complex data problems in the world with our innovations in storage, databases, analytics, and AI and ML. We delivered the first scalable storage in the cloud with S3, the first purpose-built database in the cloud with DynamoDB, the first fully managed cloud data warehouse with Redshift, and many more. Since introducing several of these firsts, we have continued to launch new features and services that make it easy to create, store, and act on data. And we have seen recognition for many of our services. This year, AWS received a 95 out of 100 score in the Gartner Solution Scorecard for Amazon RDS, including Amazon Aurora. These types of achievements are why more than 1.5 million customers have come to AWS for their data needs.
We have worked with some of the biggest brands in the world, like Toyota, Coca-Cola, and Capital One, to build comprehensive end-to-end data strategies. And our customers are using these strategies to transform their data into actionable insights for their businesses every day. For example, organizations like Bristol Myers Squibb use AWS data services to advance the application of single-cell technologies in drug development and clinical diagnosis. Nielsen built a data lake capable of storing 30 petabytes of data, expanding their ability to process customer insights from 40,000 households to 30 million households on a daily basis. And in the race to launch autonomous vehicles, Hyundai leverages AWS to monitor, trace, and analyze the performance of their machine learning models, achieving a 10x reduction in their model training time using Amazon SageMaker. By working with leaders across all industries and of all sizes, we have discovered at least three core elements of a strong data strategy. First, you need a future-proof data foundation supported by core data services. Second, you need solutions that weave a connective tissue across your entire organization. And third, you need the right tools and education to help you democratize your data. Now, let's dig in, starting with the future-proof data foundation. In the technology industry, we often hear the phrase future-proof thrown around a lot to market all sorts of products and technologies. But my definition of a future-proof foundation is clear. It means using the right services to build a foundation that you don't need to heavily re-architect, or incur technical debt on, as your needs evolve and the volume and types of data change. Without a data strategy that is built for tomorrow, organizations won't be able to make decisions that are key to gaining a competitive edge. To that end, a future-proof data foundation should have four key elements. It should have access to the right tools for all workloads and any type of data so you can adapt to changing needs and opportunities. It should be able to keep up with the growing volume of data by performing at really high scale. It should remove the undifferentiated heavy lifting for your IT and data teams so you can spend less time managing and preparing your data and more time getting value from it. And finally, it should have the highest level of reliability and security to protect your data stores. For the first element of a future-proof data foundation, you will need the right tools for every workload. We believe that every customer should have access to a wide variety of tools based on data types, personas, and use cases, as they grow and change. A one-size-fits-all approach simply does not work in the long run. In fact, our data supports this. 94% of our top 1,000 AWS customers use more than 10 of our databases and analytics services. That's why we support your data journey with the most comprehensive set of data services of any cloud provider. We support data workloads for your applications with the most complete set of relational databases, like Aurora, and purpose-built databases, like DynamoDB. We offer the most comprehensive set of services for your analytics workloads, like SQL analytics with Redshift, big data analytics with EMR, business intelligence with QuickSight, and interactive log analytics with OpenSearch.
We also provide a broad set of capabilities for your machine learning workloads, with deep learning frameworks like PyTorch and TensorFlow running on optimized instances, services like Amazon SageMaker that make it really easy for you to build, train, and deploy ML models end-to-end, and AI services with built-in machine learning capabilities, like Amazon Transcribe and Amazon Textract. All of these services together form your end-to-end data strategy, which enables you to store and query your data in your databases, data lakes, and data warehouses; act on your data with analytics, BI, and machine learning; and catalog and govern your data with services that provide you with centralized access controls, like Lake Formation and Amazon DataZone, which I will dive into later on. By providing a comprehensive set of data services, we can meet our customers where they are in their journey, from the places they store their data to the tools and programming languages they use to get the job done. For example, take a look at Amazon Athena, our serverless interactive query service, which was designed with a standard SQL interface. We made Athena really easy to use. Simply point to your data in S3, define your schema, and start querying to receive insights within seconds. Athena's SQL interface and ease of use is why it's so popular among data engineers, data scientists, and many other developers. In fact, tens of thousands of AWS customers use Amazon Athena today. While we have made it really easy to leverage SQL on Athena, many of our customers are increasingly using open source frameworks like Apache Spark. Apache Spark is one of the most popular open source frameworks for complex data processing, like regression analysis or time series forecasting. Our customers regularly use Spark to build distributed applications with expressive languages like Python. However, our Athena customers told us that they want to perform this kind of complex data analysis using Apache Spark, but they do not want to deal with all the infrastructure setup and keep up all these clusters for interactive analytics. They wanted the same ease of use we gave them with SQL on Athena. That's why today I am thrilled to announce Amazon Athena for Apache Spark, which allows you to start running interactive analytics on Apache Spark in just under one second. Amazon Athena for Apache Spark enables you to spin up Spark workloads up to 75 times faster than other serverless Spark offerings. You can also build Spark applications with a simplified notebook interface in the Athena console or using Athena APIs. Athena is deeply integrated with other AWS services like SageMaker and EMR, enabling you to query your data from various sources. And you can chain these calculations together and visualize your results. And with Athena, there is no infrastructure to manage, and you only pay for what you use. We are thrilled to bring Apache Spark to our Athena customers. But we are not stopping there. Just yesterday, we announced Amazon Redshift integration for Apache Spark, which makes it easier to run Spark applications on Redshift data from other AWS analytics services. This integration enables EMR applications to access Redshift data and run up to 10x faster compared to existing Redshift-Spark connectors. And with a fully certified Redshift connector, you can quickly run analytics and ML without compromising on security.
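To make that "point, define, query" flow on Athena concrete, here is a minimal sketch of running an Athena SQL query with the AWS SDK for Python (boto3). The database, table, and results bucket names are hypothetical placeholders for this sketch, not anything referenced in the keynote.

```python
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Point Athena at data already sitting in S3; database, table, and the
# results bucket below are placeholder names.
response = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) AS orders FROM web_orders GROUP BY status",
    QueryExecutionContext={"Database": "sales_data_lake"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
query_id = response["QueryExecutionId"]

# Poll until the query finishes, then read the result set.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)[
        "QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```

The same serverless model carries over to Athena for Apache Spark, where the notebook interface or Athena APIs replace the SQL statement above with PySpark code.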
With these new capabilities, AWS is the best place to run Apache Spark in the cloud. Customers can run Apache Spark on EMR, Glue, SageMaker, Redshift, and Athena with our optimized Spark runtime, which is up to 3x faster than open source Spark. We are pleased to bring these new integrations to our customers. So we have discussed how critical it is to have a variety of tools at your fingertips when you need them. But these tools should also include high-performing services that enable you to grow your business without any constraints. That brings me to the second element of a future-proof data foundation, performance at scale. Your data foundation should perform at scale across your data warehouses, databases, and data lakes. You will need industry-leading performance to handle inevitable growth spurts in your business. You will need it when you want to quickly analyze and visualize your data. You will need to manage your costs without compromising on your capacity requirements. Our innovations have helped our customers at scale right from day one. And today, Amazon Aurora auto scales up to 128 terabytes per instance at 1/10 the cost of other legacy enterprise databases. DynamoDB processed more than 100 million requests per second across trillions of API calls on Amazon Prime Day this year. With Amazon Redshift, tens of thousands of customers collectively process exabytes of data every day with up to five times better price performance than other cloud data warehouses. Redshift also delivers up to seven times better price performance on high concurrency, low latency workloads like dashboarding. Amazon DocumentDB, our fully managed document database service, can automatically scale up to 64 terabytes of data per cluster with low latency while serving millions of requests per second. Tens of thousands of AWS customers, including Venmo, Liberty Mutual, and United Airlines, rely on DocumentDB to run their JSON document workloads at scale. However, as our DocumentDB customers experience growth, they have asked us for easier ways to manage scale without performance impacts. For example, they said it's really difficult to handle throughput beyond the capacity of a single database node. They told us that scaling out, or sharding, their data sets across multiple database instances is really, really complex. You've got to build special application logic for sharding. You've got to manage the capacity. And you've got to reshard your database live without having any performance impact. In such a distributed setting, even routine tasks can become increasingly cumbersome as the application scales across hundreds of instances. They also wanted the ability to auto scale to petabytes of storage. And the only alternative options that exist either scale slowly or are really expensive. So they asked us for an easy button to scale reads and writes. That's why I'm pleased to announce the general availability of Amazon DocumentDB Elastic Clusters, a fully managed solution for document workloads of virtually any size and scale. Elastic Clusters automatically scale to handle virtually any number of reads and writes with petabytes of storage in just minutes, with little to no downtime or performance impact. You don't have to worry about creating, removing, upgrading, managing, or scaling your instances. Elastic Clusters take care of all the underlying infrastructure. This solution will save developers months of time building and configuring custom scaling solutions.
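Because DocumentDB speaks the MongoDB API, an application-side sketch of using an elastic cluster can stay very close to ordinary MongoDB code. The snippet below is a rough illustration only: the endpoint, credentials, database, and the use of the standard shardCollection admin command are assumptions based on DocumentDB's MongoDB compatibility, not an exact reproduction of the service's documentation.

```python
from pymongo import MongoClient

# Hypothetical elastic cluster endpoint and credentials; replace with your own.
uri = (
    "mongodb://appuser:secret@example-elastic-cluster"
    ".us-east-1.docdb-elastic.amazonaws.com:27017/?tls=true&retryWrites=false"
)
client = MongoClient(uri)

# Sharding a collection spreads reads and writes across the cluster's shards.
# The admin command below follows the MongoDB convention; exact support and
# syntax should be checked against the DocumentDB elastic clusters docs.
client.admin.command(
    "shardCollection", "orders.transactions", key={"customer_id": "hashed"}
)

# After that, the application reads and writes as usual; capacity scaling is
# handled by the elastic cluster rather than by application sharding logic.
client["orders"]["transactions"].insert_one(
    {"customer_id": "c-1001", "total": 42.50, "currency": "USD"}
)
```

The point of the sketch is the absence of custom sharding code: the shard key is declared once, and the rest of the application is unchanged.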
I am proud to share this new capability with you today. But it's just one example of how we are helping you scale. In the leadership session today, Jeff Carter, our VP of database services and migration services, will explain how fully managed databases can help you build faster and scale further than ever before. When our organizations are backed by high-performing services, they can deliver better experiences than ever before. And we are helping our customers perform at scale across a variety of AWS data services. Netflix provides a quality customer experience using S3 and VPC Flow Logs to ingest terabytes of data per day, enabling them to respond to events in real time across billions of traffic flows. Philips uses Amazon SageMaker to apply machine learning to over 48 petabytes of data, making it easy for clinicians using its digital HealthSuite platform to identify at-risk patients. And with AWS data services, Expedia is able to deliver more scalable products and online experiences to their travelers. But I won't steal their thunder. Let's welcome Rathi Murthy, Expedia Group CTO and President of Expedia Product and Technology. [APPLAUSE] Good morning. I'm super excited to speak to an audience that actually understands the power of data. When Expedia started almost 25 years ago, it disrupted the travel space. Online travel was truly a groundbreaking innovation. Today, we connect over 168 million loyalty members and over 50,000 B2B partners with over 3 million properties, 500 airlines, car rentals, and cruise lines. Expedia is one of the world's largest online travel companies, powering travel in over 70 countries. But at our core, we are a technology company. We have gathered decades' worth of data on travel behaviors, booking patterns, traveler preferences, and partner needs. When I joined Expedia Group last year, I was super excited to work for a company that brought together my passion to lead technology and my love for travel, with customer centricity at its core. It felt like a perfect marriage between technology, travel, and transformation. Today, we are mastering the art of transformation on two fronts: one, transforming our own company, and two, transforming the travel industry. Like many companies in the room here today, Expedia Group scaled through acquisitions. And as technologists, we all know this means multiple stacks and added complexity. And as you bring in more partners, you need to reconfigure, which can be costly and time-consuming. And like AWS, we are also a customer-first company. We understand the power of data. And data is key to driving our innovation and our long-term success. And we've continued to invest in AI/ML to drive those great experiences across our platform. Just to give you an idea of our scale, today we process over 600 billion AI predictions per year, powered by over 70 petabytes of data. We also use AI/ML to run over 360,000 permutations of one page on one of our brands, which means that every time a traveler comes to our site, they see what is most relevant to them. And to help us with this massive transformation, we've been working with AWS on a few fronts. One, helping us modernize our infrastructure to stay highly available for our travelers by migrating our applications to a container-based solution leveraging Amazon EKS and Karpenter. Two, helping us render relevant photos and reviews for our travelers at sub-millisecond latency with over 99% accuracy by leveraging Amazon DynamoDB and SageMaker.
And last but not least, also helping us self-serve our travelers by hosting our conversation platform, which has powered over 29 million virtual conversations, saving us over 8 million agent hours. But before I continue, let's just take a moment and think about a truly great holiday. What made it great was the places you visited, the people you were with, the things you saw. When my children were seven and five years old, we decided as a family that we would visit a new country every year, and this is a picture from one of our trips to Paris. It doesn't look like it, but yes, it was a family trip. What touched me most was when I read in their college essays that they learned more from these trips about the world, the culture, and life than any textbook had taught them thus far. Travel is so much more than just a transaction. Some of our best memories are from a trip away. And the best way to broaden our understanding of the world is to actually go out there and experience it. This is the reason I love being a technologist in travel, where we can innovate products that bring joy to so many people all over the world. Data is our competitive advantage, and we want to leverage the immense amount of data we've hosted on AWS to innovate products and create those memories. Now, knowing when to book a flight truly seems like a dark art. Earlier this year, we launched price tracking and predictions. This uses machine learning and our flight shopping data to map past trends and future predictions for the prices on your flight route, so you understand the best time to book your flight with confidence. Equally, comparing hotel rooms is also super complex. With our smart shopping, we now have the ability to compare different hotels easily. We leverage AI to read through millions of rate descriptors and pull out attributes like room features, upgrades, and amenities all together on one page, so you can easily compare different hotel types side by side and make the right choices. Every time a traveler interacts with us, we collect more data, our models become smarter, and our responses become more personalized. 2022 was a transformative year for us at Expedia Group. Earlier this year, we launched our Open World vision to power partners of all sizes with the technology and supply needed to thrive in the travel market, the first in the travel sector. At its core, it's truly rebuilding our platform in an open way, taking all of our big capabilities and breaking them up into microservices, or small building blocks, that are configurable, extensible, and externalizable so that we can accelerate anyone in the travel business or even help someone enter the travel market. So if you're an airline wanting to expand its offerings with hotels, or if you're an influencer wanting to make it easy for your followers to book that same amazing trip, we can provide you with the building blocks to create everything from a basic payment portal to a complete travel store. So just as we opened the world to travel 25 years ago, we are now making travel as a business more open and accessible to all. Thank you. [APPLAUSE] [MUSIC PLAYING] Thank you, Rathi. So as you saw with the Expedia story, when customers are backed by tools that enable them to perform at scale, they can analyze their data and innovate a lot faster, all with less manual effort. This brings me to the third element of a future-proof data foundation, removing heavy lifting.
We're always looking for ways to tackle our customers' pain points by reducing manual tasks through automation and machine learning. For instance, DevOps Guru uses machine learning to automatically detect and remediate database issues before they even impact customers, while also saving database administrators time and effort debugging those issues. Amazon S3 Intelligent-Tiering reduces ongoing maintenance by automatically placing infrequently accessed data into lower cost storage classes, saving users up to $750 million to date. And with Amazon SageMaker, we are removing the heavy lifting associated with machine learning so that it's accessible to many more developers. Now, let's take a closer look at Amazon SageMaker. As I mentioned earlier, SageMaker enables customers to build, train, and deploy ML models for virtually any use case, with tools for every step of your machine learning development. Tens of thousands of customers are using SageMaker ML models to make more than a trillion predictions every month. For example, Dow Jones and Company created an ML model to predict the best time of day to reach their Wall Street Journal, Barron's, and MarketWatch subscribers, improving their engagement rate by up to 2x over their previous strategies. Many of our customers are solving complex problems with SageMaker by using their data to build ML models, right from optimizing driving routes in ride-share apps to accelerating drug discovery. Most of these models are built with structured data, which is well organized and quantitative. However, according to Gartner, 80% of all new enterprise data is now unstructured or semi-structured, including things like images and handwritten notes. Preparing and labeling unstructured data for ML is really complex and labor intensive. For this type of data, we provide features like SageMaker Ground Truth and Ground Truth Plus that help you lower your costs and make data labeling a lot easier. However, customers told us that certain types of data are still too difficult to work with, such as geospatial data. Geospatial data can be used for a wide variety of use cases, right from maximizing harvest yield on agricultural farms, to sustainable urban development, to identifying a new location for opening a retail store. However, accessing high-quality geospatial data to train your ML models requires working with multiple data sources and multiple vendors. And these data sets are typically massive and unstructured, which means time-consuming data preparation before you can even start writing a single line of code to build your ML models. And tools for analyzing and visualizing geospatial data are really limited, making it harder to uncover relationships within your data. So not only is this a complicated process, but it requires a steep learning curve for your data scientists. So today, we are making it easier for customers to unlock the value of their geospatial data. I'm really excited to announce that Amazon SageMaker now supports new geospatial ML capabilities. [APPLAUSE] With these capabilities, customers can access geospatial data in SageMaker from different data sources with just a few clicks. To help you prepare your data, our purpose-built operations enable you to efficiently process and enrich these large data sets. It also comes with built-in visualization tools, enabling you to analyze your data and explore model predictions on an interactive map using 3D accelerated graphics.
Finally, SageMaker also provides built-in pre-trained neural nets to accelerate model building for many common use cases. Now, let's see how it works. Please welcome Kumar Chellapilla, our GM for ML and AI services at AWS, who will demonstrate these new capabilities in action. [MUSIC PLAYING] Thanks, Swami. Imagine a world where, when natural disasters such as floods, tornadoes, and wildfires happen, we can mitigate the damage in real time. With the latest advances in machine learning and readily available satellite imagery, we can now achieve that. Today, we have the ability to not only forecast natural disasters, but also manage our response using geospatial data to make life-saving decisions. In this demo, I'm going to take on the role of a data scientist who is helping first responders with relief efforts as a flood occurs. Using the geospatial capabilities in Amazon SageMaker, I can predict dangerous road conditions caused by rising water levels so that I can guide first responders on the optimal path as they deliver aid, send emergency supplies, and evacuate people. In such a scenario, I want to move as quickly as I can, because every minute counts. I want to get people to safety. Without SageMaker, it can take a few days to get access to data about real-world conditions and even more time to make predictions, because the data is scattered and difficult to visualize, and there's no efficient way to train and deploy models. Now, let me dive into the demo and show how to access geospatial data, build and train a model, and make predictions using the new geospatial capabilities in Amazon SageMaker. To build my model, I need to access geospatial data, which is now readily available in SageMaker. Instead of spending time gathering data from disparate sources and vendors, I simply select the data I need. In this case, I select open-source satellite imagery from Sentinel-2 for the affected area. In order to understand where the water spread, I apply land classification, a built-in SageMaker model, which classifies the land as having water or not. Looking at images before and after the flood occurred, I can clearly see how the water spread across the entire region and where it caused the most severe damage. Knowing where flood waters are spreading is super helpful. But I still need to zoom in to see which roads are still there and help first responders navigate safely. Next, I add high-resolution satellite imagery from Planet Labs, one of the third-party data providers in SageMaker. These visualizations allow me to overlay the roads on the map so I can easily identify which roads are underwater and keep first responders up to date as conditions unfold on the ground. Now that I understand my data, I start making predictions. With SageMaker, I don't have to spend weeks iterating on the best model for my data. I simply select one of the pre-trained models in SageMaker, in this case road extraction, which makes it easy for me to train the model on my data and send directions to the first aid team. Once the model is ready, I can start making predictions. In this case, the model I built identifies which roads are still intact and not underwater. Using the visualization tools in SageMaker, I can view the predictions on an interactive map so that I have full visibility into what's happening on the ground. I can see that the red colored roads are flooded, but the green colored roads are still available and safe to drive on.
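For readers who want to try a flow like the land classification step above programmatically, here is a rough sketch of starting an earth observation job with the SageMaker geospatial client in boto3. The role ARN, the Sentinel-2 collection ARN, the coordinates, and the exact parameter shapes are assumptions based on my reading of the geospatial API and should be checked against the current SDK documentation before use.

```python
import boto3

geo = boto3.client("sagemaker-geospatial", region_name="us-west-2")

# Hypothetical execution role and Sentinel-2 raster data collection ARN.
role_arn = "arn:aws:iam::123456789012:role/SageMakerGeospatialRole"
sentinel2_arn = (
    "arn:aws:sagemaker-geospatial:us-west-2:aws:raster-data-collection/public/EXAMPLE"
)

# Start an earth observation job that runs the built-in land cover
# segmentation model over an area and time window of interest.
job = geo.start_earth_observation_job(
    Name="flood-water-classification",
    ExecutionRoleArn=role_arn,
    InputConfig={
        "RasterDataCollectionQuery": {
            "RasterDataCollectionArn": sentinel2_arn,
            "AreaOfInterest": {
                "AreaOfInterestGeometry": {
                    "PolygonGeometry": {
                        "Coordinates": [[
                            [-91.23, 30.40], [-91.23, 30.60],
                            [-90.95, 30.60], [-90.95, 30.40],
                            [-91.23, 30.40],
                        ]]
                    }
                }
            },
            "TimeRangeFilter": {
                "StartTime": "2022-08-01T00:00:00Z",
                "EndTime": "2022-08-31T23:59:59Z",
            },
        }
    },
    JobConfig={"LandCoverSegmentationConfig": {}},
)

# Check on the job; results can later be exported to S3 or visualized.
print(geo.get_earth_observation_job(Arn=job["Arn"])["Status"])
```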
Similar to satellite imagery from Planet Labs, I can add point-of-interest data from Foursquare to see where the nearest hospitals, medical facilities, and airports are. For example, I can see that the airfield on the left is surrounded by water, so I must use the temporary helipad or the international airport on the right instead. With this information in hand, I can now give clear directions within minutes so that they know the best path for sending emergency aid, directing medical staff, and routing people out of the flood zone. We covered flood path predictions, but SageMaker can support many different industries. In fact, later today, during the AI/ML leadership session with Bratin Saha, you will hear how BMW uses geospatial machine learning. As Swami mentioned, it's not just automotive. Customers use geospatial machine learning for a variety of use cases in retail, agriculture, and urban planning, and the list goes on. We can't wait to hear what you will do with geospatial data. Head over to the console today and try the new geospatial capabilities in Amazon SageMaker. Thank you. (upbeat music) Thank you, Kumar. These types of innovations demonstrate the enormous impact that data can have for our customers and for the world. It's clear that data is extremely powerful, and today it is critical to almost every aspect of your organization, which means you need to put the right safeguards in place to protect it from costly disruptions and potential compromises. This brings me to the last element of the future-proof data foundation, reliability and security. AWS has a long history of building secure and reliable services to help you protect your data. S3 was built to store your data with eleven 9s of durability, which means you can store your data without worrying about backups or device failures. Lake Formation helps you build a secure data lake in just days with fine-grained access control. And our core database services like DynamoDB and Aurora were architected with Multi-AZ capabilities to ensure seamless failovers in the unlikely event an AZ is disrupted, thereby protecting our customers' mission-critical applications. But today, our customers' analytics applications on Redshift are mission-critical as well. While our Redshift customers have recovery capabilities like automated backups and the ability to relocate their cluster to another AZ in just minutes, they told us that sometimes minutes are simply not enough. Our customers told us they want their analytics applications to have the same level of reliability that they have with their databases like Aurora and DynamoDB. I'm honored to introduce Amazon Redshift Multi-AZ, a new multi-AZ configuration that delivers the highest levels of reliability. (audience applauding) This new multi-AZ configuration enhances availability for your analytics applications with automated failover in the unlikely event an AZ is disrupted. Redshift Multi-AZ enables your data warehouse to operate in multiple AZs simultaneously and process reads and writes without the need for an underutilized standby sitting idle in a separate AZ, so you can maximize your return on investment, with no application changes or other manual intervention required to maintain business continuity. But high availability is just one aspect of a secure and reliable data foundation. We are making ongoing investments to protect your data from the core to the perimeter. While these security mechanisms are critical, we also believe they should not slow you down.
For example, let's take a look at security as it relates to Postgres. Postgres on RDS and Aurora has become our fastest growing engine. Developers love Postgres extensions because they enhance the functionality of their databases. And with thousands of them available, our customers told us they want them in a managed database. However, extensions provide superuser access to your underlying file systems, which means they carry a huge amount of organizational risk. That's why they must be tested and certified to ensure they do not interfere with the integrity of your database. This model is like building an impenetrable fortress, only to leave the keys in the front door. To solve this problem for our customers, we have invested in an open source project that makes it easier to use certified Postgres extensions in our databases. Today, I'm excited to announce Trusted Language Extensions for Postgres, a new open source project that allows developers to securely leverage Postgres extensions on RDS and Aurora. (audience applauding) These trusted language extensions help you safely leverage Postgres extensions to add the data functionality you require for your use cases without waiting for AWS certification. They also support popular programming languages you know and love, like JavaScript, Perl, and PL/pgSQL. With this project, our customers can start innovating quickly without worrying about unintended security impacts to their core databases. We will continue to bring value to our customers with these types of open source tools, while also making ongoing contributions back to the open source community. So now that we have talked about protecting your data at the core, let's look at how we are helping our customers protect their data at the perimeter. When you leverage our database services on AWS, you can rely on us to operate, manage, and control the security of the cloud, like the hardware, software, and networking layers. With our shared responsibility model, our customers are responsible for managing the security of their data in the cloud, including privacy controls for your data, who has access to it, and how it's encrypted. While this model eases a significant portion of the security burden for our customers, it can still be very difficult to monitor and protect against evolving security threats to your data year round. To make this easier for our customers, we offer services like Amazon GuardDuty, an intelligent threat detection service that uses machine learning to monitor your AWS accounts for various malicious activity. And now we are extending the same threat detection to our fastest growing database. Built for Amazon Aurora, I'm very excited to announce the preview of GuardDuty RDS Protection, which provides intelligent threat detection in just one click. GuardDuty RDS Protection leverages ML to identify potential threats, like access attacks, on your data stored in Amazon Aurora. It also delivers detailed security findings so you can quickly locate where the event occurred and what type of activity took place. And all this information is consolidated at an enterprise level for you. Now that we have explored the elements of a future-proof data foundation, we will dive deep into how you can connect the dots across your data stores. The ability to connect your data is as instrumental as the foundation that supports it.
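As a rough illustration of how a trusted language extension might be registered on an RDS or Aurora PostgreSQL instance (assuming pg_tle has been enabled for the instance through its parameter group), here is a sketch using psycopg2. The endpoint, credentials, and the tiny example extension body are hypothetical, and the pgtle.install_extension call follows the pg_tle project's documented interface; verify the details against the current docs.

```python
import psycopg2

# Hypothetical RDS/Aurora PostgreSQL endpoint and credentials.
conn = psycopg2.connect(
    host="example-aurora.cluster-abc123.us-east-1.rds.amazonaws.com",
    dbname="appdb", user="postgres", password="secret",
)
conn.autocommit = True
cur = conn.cursor()

# Enable pg_tle, then register a small trusted-language extension written
# in SQL/PL/pgSQL. The (name, version, description, body) signature follows
# the pg_tle project's documentation.
cur.execute("CREATE EXTENSION IF NOT EXISTS pg_tle;")
cur.execute("""
SELECT pgtle.install_extension(
  'order_helpers',
  '0.1',
  'Helper functions for order totals',
  $tle$
    CREATE FUNCTION order_total(quantity int, unit_price numeric)
    RETURNS numeric AS $$
      SELECT quantity * unit_price;
    $$ LANGUAGE sql IMMUTABLE;
  $tle$
);
""")

# From here the extension behaves like any other installed extension.
cur.execute("CREATE EXTENSION order_helpers;")
cur.execute("SELECT order_total(3, 19.99);")
print(cur.fetchone())
```

The design point is that the extension body runs in a trusted language inside the database, so no superuser or file-system access is needed to ship it.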
For the second element of a strong data strategy, you will need a set of solutions that help you weave the connective tissue across your organization, from automated data pathways to data governance tools. Not only should this connective tissue integrate your data, but it should also integrate your organization's departments, teams, and individuals. To explain the importance of this connective tissue, I wanted to share an analogy that is really close to my home. This is a picture of a jingkieng jri in the northeastern part of India. It is a living bridge made of elastic tree roots in the state of Meghalaya. These bridges are built by the Khasi tribe, indigenous farmers and hunters who trek through dense valleys and river systems just to reach nearby towns. Every year, the monsoon season means the forest rivers become almost impassable, further isolating their villages sitting on top of the foothills of the Himalayas. That is, until these living bridges came to be. So you might be asking yourself, why is Swami talking about these ancient root bridges when he is supposed to be talking about my data? Well, I wanted to share this story because we can apply many valuable engineering lessons from the Khasi on how we can build connective tissue with our data stores. First, they use quality tools that enable growth over time. The Khasi built these structures with durable root systems that were able to withstand some of the heaviest rainfall in the world. And these bridges can last up to 500 years by adapting and growing within their environment. Similarly, your connective tissue needs both quality tools and quality data to fuel long-term growth. Second, they leveraged a governed system of cooperation. Over a period of decades, and sometimes even centuries, tribal members cooperated and shared the duty of pulling these elastic roots one by one until a passable bridge was formed. With data, governance enables safe passage for disconnected teams and disconnected data stores, so your organizations can collaborate and act on your data. And finally, they created strong pathways to their vital resources. These bridges protected the region's agricultural livelihood by providing a pathway from remote villages to nearby towns. The Khasi were engineers of connection because their success depended on it. Today, one of the most valuable assets in our organizations is connected data stores. Connectivity, which drives ongoing innovation, is also critical for the survival of the organization. Now, let's revisit the importance of using high quality tools and high quality data to enable future growth. When our customers want to connect their structured and unstructured data for analytics and machine learning, they typically use a data lake. Hundreds of thousands of data lakes run on AWS today, leveraging services like S3, Lake Formation, and AWS Glue, our data integration service. Bringing all this data together can help you gather really rich insights, but only if you have quality data. Without it, your data lake can quickly become a data swamp. To closely monitor the quality of your data, you need to set up quality rules. And customers told us building these data quality rules across data lakes and data pipelines is very time consuming and very error prone, with a lot of trial and error. It takes days, if not weeks, for engineers to identify and implement them. Plus, additional time needs to be invested in ongoing maintenance. They asked for a simple and automated way to manage their data quality.
To help our customers do this, I'm pleased to share the preview of AWS Glue Data Quality, a new feature of AWS Glue. Glue Data Quality helps you build confidence in your data so that you can make data-driven decisions every day. It can generate automated rules for specific data sets in just hours, not days, increasing the freshness and accuracy of your data. Rules can also be applied to your data pipelines, so poor quality data does not even make it into your data lakes in the first place. And if your data quality deteriorates for any reason, Glue Data Quality alerts you so you can take action right away. Now, with high quality data, you'll be able to connect the dots with precision and accuracy. But you also need to ensure that the right individuals within your organization are able to access this data so you can collaborate and make these connections happen. This brings me to the second lesson we learned from the Khasi: creating a system of governance to unleash innovation within your organization. Governance was historically viewed as a defensive measure, which meant really locking down your data in silos. But in reality, the right governance strategy helps you move and innovate faster, with well-defined guardrails that give the right people access to the data when and where they need it. As the amount of data rapidly expands, our customers want an end-to-end strategy that enables them to govern their data across their entire data journey. They also want to make it easier to collaborate and share their data, while maintaining quality and security. But creating the right governance controls can be complex and time-consuming. That's why we are reducing the amount of manual effort required to properly govern all of your data stores. As I mentioned earlier, one of the ways we do this today is through Lake Formation, which helps you govern and audit your data lakes on S3. Last year, we announced new row- and cell-level permissions that help you protect your data while giving users access to the data they need to perform their job. But end-to-end governance doesn't just stop with data lakes. We also need to address data access and privileges across more of our customers' use cases. Figuring out which data consumers in your organization have access to what data can itself be time-consuming. From manually investigating data clusters to see who has access, to designating user roles with custom code, there is simply too much heavy lifting involved. And failure to create these types of safety mechanisms can mean unnecessary exposure or quality issues. Our customers told us they want an easier way to govern access and privileges across more of our data services, including Amazon Redshift. So today, I'm pleased to introduce a new feature in Redshift data sharing: centralized access controls that allow you to govern your Redshift data shares using the Lake Formation console. (audience applauding) With this new feature in Redshift data sharing, you can easily manage access for data consumers across your entire organization from one centralized console. Using the Lake Formation console, you can designate user access without complex querying or manually identifying who has access to what specific data. This feature also improves the security of your data by enabling admins to grant granular row-level and cell-level access within Lake Formation. Now, centralized access controls are critical to helping users access siloed datasets in a governed way.
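As a rough sketch of what a data quality rule set can look like in practice, the snippet below registers a small DQDL ruleset against a Glue Data Catalog table via boto3. The database name, table name, rules, and thresholds are illustrative assumptions for this sketch rather than anything stated in the keynote.

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# A small DQDL ruleset; adjust the checks and thresholds to your own data.
ruleset = """
Rules = [
    IsComplete "order_id",
    Uniqueness "order_id" > 0.99,
    ColumnValues "quantity" > 0,
    RowCount > 1000
]
"""

glue.create_data_quality_ruleset(
    Name="orders-quality-checks",
    Description="Basic completeness and volume checks for the orders table",
    Ruleset=ruleset,
    TargetTable={"DatabaseName": "sales_data_lake", "TableName": "web_orders"},
)
```

Once a ruleset like this exists, evaluation runs can be scheduled against the table or wired into a Glue job so that failing data is flagged before it reaches the data lake.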
Another key element of an end-to-end data strategy is machine learning, and governance is really critical there as well. Today, more companies are adopting ML for their applications. However, governing this end-to-end process presents a unique set of challenges very specific to ML, like onboarding users and monitoring ML models. Because ML model building requires collaboration among many users, including data scientists and data engineers, setting up permissions requires time-consuming, customized policy creation for each user group. It's also challenging to capture and share model information with other users in one location, which can lead to inconsistencies and delays in approval workflows. And finally, custom instrumentation is needed to gain visibility into model performance, and that can be really expensive. To address this for our customers, we are bringing you three new machine learning governance capabilities for Amazon SageMaker: SageMaker Role Manager, Model Cards, and Model Dashboard. (audience applauding) These are really powerful governance capabilities that will help you build ML governance responsibly. To address permission sharing, Role Manager helps you define minimum permissions for users in just minutes, with automated policy creation for your specific needs. To centralize ML model documentation, Model Cards create a single source of truth throughout your entire ML model lifecycle and auto-populate model training details to accelerate your documentation process. And after your models are deployed, Model Dashboard increases visibility with unified monitoring for the performance of your ML models. With all these updates, we have now covered governance for your data lakes, data warehouses, and machine learning. But for true end-to-end governance, you will need to manage data access across all of your services, which is the future state we are building towards. As Adam announced yesterday, we are launching Amazon DataZone, a data management service that helps you catalog, discover, analyze, share, and govern data across your organization. DataZone helps you analyze more of your data, not just what's in AWS, but also third-party data services, while meeting your security and data privacy requirements. I have had the benefit of being an early customer of DataZone. I leveraged DataZone to run the AWS weekly business review meeting, where we assemble data from our sales pipeline and revenue projections to inform our business strategy. Now, to show you DataZone in action, let's welcome our head of product for Amazon DataZone, Shikha Verma, to demonstrate how quickly you can enable your organization to access and act on your data. (upbeat music) - Thanks Swami. Wow, it's great to see you all out here. I am so excited to tell all the data people over here: now you don't have to choose between agility, getting to the data you need, and governance to make sure you can share the data across your enterprise. You can get both. We have built Amazon DataZone to make it easy for you to catalog, organize, share, and analyze your data across your entire enterprise with the confidence of the right governance around it. As we know, every enterprise is made up of multiple teams that own and use data across a variety of data stores. And to do their job, data people, like data analysts, engineers, and scientists, have to pull this data together, but do not have an easy way to access or even have visibility into this data. Amazon DataZone fills this gap.
It provides a unified environment, a zone, where everybody in your organization, from data producers to consumers, can go to access, share, and consume data in a governed manner. Let's jump into how this works. I'm going to use a very typical scenario that we see across our customers. This may seem familiar to many of you. In this scenario, a product marketing team wants to run campaigns to drive product adoption. Sounds familiar? To do this, they need to analyze a variety of data points, including data that they have in their data warehouse in Redshift, data that they have in their data lake around marketing campaigns, as well as third-party sources like Salesforce. In this scenario, Julia is a data engineer. She is a rock star. She knows the data in and out and often gets requests to share the data in the data lake with other users. To share this more securely with a variety of users across her enterprise, she wants to catalog and publish it in Amazon DataZone. She is our data producer. And Marina is a rock star marketing analyst. She's a campaign expert who wants to use the data in the data lake to run the marketing campaigns. She is our data consumer. Let's see how Amazon DataZone helps them connect. Let's start with Julia and see how she publishes data into DataZone. She logs into the DataZone portal using her corporate credentials. She knows the data sources that she wants to make available, so she creates an automated sales publishing job. She provides a quick name and description, so let's say a publishing agreement, which is essentially like a data contract that tells the consumers how frequently she'll keep this data updated, how to get access, who will authorize access, and things like that. She then selects the data sources and the specific tables and columns that she wants to make available in Amazon DataZone. She also sets the frequency of how often this data will be kept in sync. Within a few minutes, the sales pipeline data and the campaign data from the data lake will be available in Amazon DataZone. Now Julia has the option to enrich the metadata and add useful information to it so that data consumers like Marina can easily find it. She adds a description, additional context, any other information that would make this data easier to find. We also know that for large data sets, adding and curating all of this information manually is laborious, time consuming, and even impossible. So we are making this much easier for you. We are building, thank you, we are building machine learning models to automatically generate business names for you. And then Julia will have the option, at a column level, to select the recommendation that we came up with or edit it as she pleases. How awesome is that? Thank you, I think so too. Once Julia has created this particular data asset in DataZone, she wants to make it available for the data consumers. In this scenario, since Julia is a data expert and she knows this data very well, she also functions as a data steward, and she can publish this directly into DataZone. But we also know that many of you have set up data governance frameworks, or want to set up data governance frameworks, where you want to have business owners and data stewards managing your domain the way you'd like to. For this, we also have that option. Now that the data is published and available in Amazon DataZone, Marina can easily find it. Let's see how easy this is.
Marina goes back to Amazon DataZone, logs in using her corporate credentials, and uses the search panel to search for sales. A list of relevant assets is returned, and she learns more about the data and where it comes from. She can see a bunch of domains in there. She sees sales, marketing, finance. She can also see that there is data from all kinds of sources. She notices Redshift, the data lake, and Salesforce. And you also saw that there were a variety of facets that she could have sorted the search results on. It's really easy peasy. Now, to perform the campaign analysis, Marina wants to work with a few of her team members, because they want the same access as her. So now she creates a data project. Creating a data project is a really easy way for her to set up a space where she wants her team members to collaborate. They will get the same access that she wants to the right data sets, as well as the right tools, such as Athena, Redshift, or QuickSight. Marina knows the data she's after, so she subscribes to it, or gets access to it, using the identity of the project. And after this, any of her team members can use the deep links available in Amazon DataZone to get to the tools that they want. Using the deep links, they can get to the service directly without any additional configuration or individual permissions. In this particular case, they choose Athena. And now Marina and her team members can query the data that Julia wanted to make available for them, using the project context and using the tools that they wanted. So I know this went by quick, but hopefully you can see how easy this is, and the entire data discovery, access, and usage lifecycle is happening through Amazon DataZone. You get complete visibility into who is sharing the data, what data sets are being shared, and who authorized it. Essentially, Amazon DataZone gives your data people the freedom that they always wanted, but with the confidence of the right governance around it. As Adam mentioned yesterday, there is really nothing else like it. So I can't wait to see how you use it, and come find out more in our dedicated breakout session later today. Thank you. (audience applauds) - Thank you, Shikha. It's really exciting to see how easy it is for customers to locate their data and collaborate with DataZone. We'll continue to make it even easier for customers to govern their data with this new service. We're just getting started. So I shared how governance can help weave connective tissue by managing data sharing and collaboration across individuals within your organization. But how do you weave connective tissue within your data systems to mitigate data sprawl and derive meaningful insights? This brings me back to the third lesson from the Khasi's living bridges: driving data connectivity for innovation and, ultimately, survival. Typically, connecting data across silos requires complex ETL pipelines. And every time you want to ask a different question of your data, or you want to build a different machine learning model, you need to create a ton of data pipelines. This level of manual integration is simply not fast enough to keep up with the dynamic nature of data and the speed at which you want your business to move. Data integration needs to be more seamless. To make this easier, AWS is investing in a zero-ETL future where you never have to manually build a data pipeline again. (audience applauding) Thank you.
We have been making strides toward the zero-ETL future for several years by deepening integrations between our services so that you can perform analytics and machine learning without the need to move your data. We provide direct integrations with our AWS streaming services, so you can analyze your data as soon as it's produced and gather timely insights to capitalize on new opportunities. We have integrated SageMaker with our databases and data warehouses, so you can leverage your data for machine learning without having to build data pipelines or write a single line of ML code. And with federated querying on Redshift and Athena, customers can now run queries and predictive analytics across data stored in operational databases, data warehouses, and data lakes without any data movement. While federated querying is a really powerful tool, querying and analyzing data stored in very different locations isn't optimized for maximum performance when compared to traditional ETL methods. That's why this week we are making it easier for you to leverage your data without creating and managing ETL pipelines. Yesterday, we announced that Aurora now supports zero-ETL integration with Amazon Redshift, bringing the transactional data sitting in Aurora and the analytics capabilities in Redshift together. This new integration is already helping customers like Adobe spend less time building Redshift ETL pipelines and more time gathering insights to enhance their core services like Adobe Acrobat. We are also removing the heavy lifting from ETL pipeline creation for customers who want to move data between S3 and Redshift. For example, imagine you're an online retailer trying to ingest terabytes of customer data from S3 into Redshift every day to quickly analyze how your shoppers are interacting with your site and your application, and how they are making their purchasing choices. While this typically requires the creation of ETL pipelines, what if you had the option to automatically and continuously copy all of your data with a single command? Would you take it? Today, I'm excited to announce that Amazon Redshift now supports auto-copy from S3 to make it easier to continuously ingest your data. (audience applauding) With this update, customers can now easily create and maintain simple data pipelines for continuous ingestion. Ingestion rules are automatically triggered when new files land in your S3 buckets, without relying on custom solutions or managing third-party services. This integration also makes it easy for analysts to automate data loading without any dependencies on your critical data engineers. With the updates I have shared today, including Aurora's zero-ETL integration with Redshift, auto-copy from S3, as well as the integration of Apache Spark with Redshift, we are making it easy for you to analyze all of your Redshift data no matter where it resides. And I didn't even cover all of our latest innovations in this space. To learn more, make sure to attend the analytics leadership session with G2 Krishnamoorthy, our VP of AWS Analytics. With our zero-ETL mission, we are tackling the problem of data sprawl by making it easier for you to connect to your data sources. But in order for this to work, you can't have connections to just some of your data sources. You need to be able to seamlessly connect to all of them, whether they live in AWS or in external third-party applications. That's why we are heavily investing in bringing your data sources together.
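To show what the manual ingestion path looks like, here is a minimal sketch of a single S3-to-Redshift load issued through the Redshift Data API with boto3; the cluster, database, IAM role, table, and bucket names are hypothetical placeholders. The auto-copy capability described above is meant to remove the need to rerun a statement like this by hand: a copy job attached to the load watches the S3 prefix and ingests new files as they arrive (the exact job-definition SQL is omitted here rather than guessed at).

```python
import boto3

redshift_data = boto3.client("redshift-data", region_name="us-east-1")

# One manual COPY from S3 into a Redshift table; with auto-copy, a copy job
# keeps loading new files under the prefix without re-issuing this statement.
copy_sql = """
COPY web_orders
FROM 's3://example-ingest-bucket/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS JSON 'auto';
"""

response = redshift_data.execute_statement(
    ClusterIdentifier="analytics-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql=copy_sql,
)
print(response["Id"])  # statement id, useful for polling describe_statement
```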
For example, you can stream data in real time from more than 20 AWS and third-party sources with Kinesis Data Firehose, a fully managed serverless solution that enables customers to automatically stream data into S3, Redshift, OpenSearch, Splunk, Sumo Logic, and many more with just a few clicks. Amazon SageMaker Data Wrangler, our low-code visual data prep tool for machine learning, makes it easy to import data from a wide variety of data sources for building your ML models. And Amazon AppFlow, our no-code, fully managed integration service, offers connectors to easily move your data between your cloud-based SaaS services and your data lakes and data warehouses. Because these connectors are fully managed and supported by us, you can spend less time building and maintaining these connections between your data stores and more time maximizing business value with your data. Our customers tell us they love the no-code approach of our connector library. However, as expected, they have continued to ask for even more connectors to help them bring their data sources together. That's why today I'm pleased to share the release of 22 new AppFlow connectors, including popular marketing sources like LinkedIn Ads and Google Ads. (audience applauding) With this update, our AppFlow library now has more than 50 connectors in total, from data sources like S3, Redshift, and Snowflake to cloud-based application services like Salesforce, SAP, and Google Analytics. In addition to offering new connectors in AppFlow, we are also doing the same for Data Wrangler in SageMaker. While SageMaker already supports popular data sources like Databricks and Snowflake, today we are bringing you more than 40 new connectors through SageMaker Data Wrangler, allowing you to import even more of your data for ML model building and training. With access to all of these data sources, you can realize the full value of your data across your SaaS services. Now, looking across all of our services, AWS connects to hundreds of data sources, including SaaS applications, on-premises systems, and other clouds, so you can leverage the power of all of your data. We are thrilled to introduce these new capabilities that make it easier to connect and act on your data. Now, to demonstrate the power of bringing all your data together to uncover actionable insights, let's welcome Anna Berg Åsberg, Global Vice President of R&D IT at AstraZeneca. (upbeat music) - Good morning. I know it's really early, but I need your help. Can I ask you to raise your hand if you or any of your loved ones have been impacted by lung disease? Now keep those up, and add to those hands if you or anyone you know has been impacted by heart failure or heart disease, and add to those hands if anyone you know or any of your loved ones has been impacted by cancer. Look around: this touches so many of us. It's important. You can take your hands down, thank you very much. We at AstraZeneca, a global biopharmaceutical company, are breaking the boundaries of science to deliver life-changing medicines. We use data, AI, and ML with the ambition to eliminate cancer as a cause of death and protect the lives of patients with heart failure or lung disease. In order to understand how we are breaking the boundaries of science, we need to zoom in and start really small: the genome, the transcriptome, the proteome, the metabolome. Say that fast with a Swedish accent three times, it's quite hard.
The genome is the complete set of our DNA, and every single cell in the body contains a copy of the three billion DNA base pairs. Mapping the genome uncovers new insights into disease biology and helps us discover new therapies. Today, our Centre for Genomics Research is on track to analyze up to two million whole genomes by 2026. The scale of our genomics database is massive, and it's really hard to manage a database at that scale, but we do it together with AWS. Together, we move 25 petabytes of data across the global network. We process whole genomes across multiple regions, generating 75 petabytes of intermediate data in the process. At a high level, we use AWS Step Functions and AWS Lambda for orchestration, AWS Batch to provision optimal compute, and Amazon S3 for storage. The list of services is important, we all know that, but the impact is so much more critical. We can now run 110 billion statistical tests in under 30 hours, helping us provide genetic input to AstraZeneca projects. Genomics gives us the DNA blueprint, but as you know, it's not the only 'ome'. Beyond the genome, there are largely untapped repositories of rich data that, if connected, could give us valuable insights, and together with AWS we bring them together into multi-omics. We bring the multi-omics data together and make it available for scientists to mine for actionable insights. Having the bandwidth to process and maintain the multi-omics data gives us the possibility to take a step back. It adds to our understanding of disease by looking small at the data we have at hand, the tumor scans, the medical images, and the patient data, and pulling it together to detect patterns. For example, in lung cancer studies, we need to measure tumors in the CT scans, and we use deep learning technology similar to what self-driving cars use to understand the 3D environment around them. Today, we use this for clinical trials, but in the future, this technology could be used to inform doctors about treatment decisions with a prediction of what's going to happen next. As you can imagine, the quantity of data at hand has grown exponentially. And together with AWS, we are accelerating the pace at which scientists can unlock patterns by democratizing ML across the organization using Amazon SageMaker. We use AWS Service Catalog to stand up templated, end-to-end MLOps environments in minutes. And we take every single step with extra care, as we are managing patient data in a highly regulated industry. We can now run hundreds of concurrent data science and ML projects to turn insights into science. So we looked small at the multi-omics data, and we looked at the data at hand, but one of the most exciting advancements in the industry right now is that patients in clinical trials can choose to share data with us from their own homes. Today, digital technology can collect data from a patient's home on a daily or even continuous basis, and the data collected is as reliable as data that previously could only be collected in clinical settings. This adds value and also enables us to collect data from underdeveloped regions and remote locations, moving us toward early diagnosis and disease prediction for all people, because our future depends on healthy people, a healthy society, and a healthy planet. AWS helps us pull all this data together, the multi-omics, the data at hand with the medical images and tumor scans, and the remote data collection, and helps us accelerate insights into science through data, AI, and ML.
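The pipeline described above, Step Functions and Lambda coordinating AWS Batch jobs over genomes in S3, maps to a well-known orchestration pattern. Here is a minimal sketch of that pattern in Python; every name, ARN, and parameter is hypothetical and far simpler than AstraZeneca's actual workloads.

```python
# Minimal, hypothetical sketch: a Step Functions state machine that fans out
# AWS Batch jobs over genome samples stored in S3. Names and ARNs are placeholders.
import json
import boto3

sfn = boto3.client("stepfunctions", region_name="eu-west-1")

state_machine_definition = {
    "Comment": "Per-sample genome processing (illustrative only)",
    "StartAt": "ProcessSamples",
    "States": {
        "ProcessSamples": {
            "Type": "Map",                 # fan out over the list of samples in the input
            "ItemsPath": "$.samples",
            "MaxConcurrency": 100,
            "Iterator": {
                "StartAt": "RunBatchJob",
                "States": {
                    "RunBatchJob": {
                        "Type": "Task",
                        # Managed integration: submit a Batch job and wait for completion.
                        "Resource": "arn:aws:states:::batch:submitJob.sync",
                        "Parameters": {
                            "JobName": "genome-align",
                            "JobQueue": "arn:aws:batch:eu-west-1:111122223333:job-queue/genomics",
                            "JobDefinition": "arn:aws:batch:eu-west-1:111122223333:job-definition/align:1",
                            "Parameters": {"sample_uri.$": "$.s3_uri"},
                        },
                        "End": True,
                    }
                },
            },
            "End": True,
        }
    },
}

sfn.create_state_machine(
    name="genome-processing-demo",
    definition=json.dumps(state_machine_definition),
    roleArn="arn:aws:iam::111122223333:role/StepFunctionsBatchRole",  # hypothetical role
)
```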
I raised my hand at the beginning: I have been impacted by cancer. I lost my father in 2018. This is my father and my mother a year before he passed, and it reminds me every day that every data point we handle is a patient, a loved one. I work at AstraZeneca with my thousands of colleagues so that you can spend every day possible with your loved ones, and it's my privilege to do so. Thank you. (audience applauds) (upbeat music) - Wow, what a heartfelt and inspirational story. Thank you, Anna. I'm truly amazed by how AstraZeneca was able to democratize data and machine learning to enable these types of innovations in healthcare. This brings me to the third and final element of a strong data strategy: democratizing data. Since I joined Amazon 17 years ago, I have seen how data can spur innovation at all levels, right from being an intern, to a product manager, to a business analyst with no technical expertise. But all of this can happen only if you enable more employees to understand and make sense of data. With a workforce that is trained to organize, analyze, visualize, and derive insights from your data, you cast a wider net for your innovation. To accomplish this, you will need access to educated talent to fill the growing number of data and ML roles. You need professional development programs for your current employees, and you need low-code and no-code tools that enable non-technical employees to do more with your data. Now, let's look at how AWS is preparing students, the future backbone of our data industry, to implement the types of solutions we have discussed here today. This may surprise some of you, but I grew up on the outskirts of a city in the southern part of India, where we had one computer for the entire high school. Since I didn't come from an affluent family, I learned to code on this computer with only 10 minutes of access every week. And I was fascinated. But you don't have to grow up in a rural Indian village to experience limited access to computer science education; it's happening here every day in the United States. In fact, the US graduates only 54,000 CS students each year, and that is the dominant pathway to roles in AI and ML. Yet the AI workforce is expected to add one million jobs by 2029. This creates quite the gap, and the graduation pipeline is further hindered by a lack of diversity. This is where community colleges and minority-serving institutions can really help. They are a critical access point to higher education in the US, with more than four million students enrolled just last year. While data and ML programs are available in many universities, they are really limited in community colleges and MSIs, where lower-income and underserved students are more likely to enroll. And faculty members with limited resources simply cannot keep up with the necessary skills to teach data management, AI, and ML. If we want to educate the next generation of data developers, then we need to make it easy for educators to do their jobs. We need to train the trainers. That's why today I'm personally very proud to announce a new educator program for community colleges and MSIs through AWS MLU. (audience applauding) This new train-the-trainer program includes the same content we use to train Amazon engineers, as well as the coursework we currently offer to institutions like UC Berkeley. Faculty can access free compute capacity, guided curriculum, and ongoing support from tenured science educators.
With all these resources, educators are now equipped to provide students with AI and ML courses, certificates, and degrees. We have currently onboarded 25 educators from 22 US community colleges and MSIs. And in 2023, we expect to train an additional 350 educators from up to 50 community colleges across the United States. We were able to bring an early version of this program to Houston Community College. Our team worked with HCC to create a tailored sequence of content for their students, and now they are the first community college to have this coursework accepted as a full bachelor's degree. (audience applauding) With continued feedback from educators, we will continue to remove the barriers that educators face in this arena. My vision is that AWS will democratize access to data education programs just like we do across our organizations, and we are making progress. We are building on years of programmatic efforts to make student data education more accessible. Last year, we announced the AWS AI and ML scholarship program to provide $10 million to underserved and underrepresented students, and we have awarded 2,000 scholarships to date. We also provide students with hands-on training opportunities through AWS Academy, SageMaker Studio Lab, and AWS DeepRacer, our 1/18th-scale race car driven by reinforcement learning. I hope that these programs can enable students to create paths of their own, just like I did. So democratizing access to education through data and ML programs is really critical, but it's clear we won't be able to fill this skills gap through student education alone. That's why, in addition to educating those entering the workforce, organizations must also focus on how to leverage their existing talent pool to support their future growth. Through our training programs, we are enabling organizations to build data literacy through ML tools, classroom training, and certifications. As I mentioned, AWS DeepRacer helps us train students on ML through reinforcement learning. But DeepRacer is not just for students. In fact, more than 310,000 developers from over 150 countries have been educated on ML with AWS DeepRacer. It continues to be the fastest way to get hands-on with ML, literally. In addition, we now offer customers more than 150 professional development courses related to data analytics and ML, with 18 new courses launched in 2022. And we will continue to add more. Now, while closing the cloud skills gap is critical, not every employee needs to have deep technical expertise to drive data-driven innovation. In fact, you need individuals in your organization without coding experience to help you connect the dots with your data. That's why we provide low-code and no-code tools that help data analysts and marketers, typically known as your data consumers, visualize and derive insights from your data. QuickSight is our ML-powered BI solution that allows users to connect to data sources like S3, Redshift, or Athena and create interactive dashboards in just minutes. We continue to add new capabilities to QuickSight at a rapid clip, with more than 18 new features introduced in the past year alone. And this week, Adam touched on a new capability called QuickSight Paginated Reports, which makes it easier for customers to create print-friendly, highly formatted reports in QuickSight without stitching together multiple reporting systems. He also shared new features for QuickSight Q, which allows users to query their data in plain language without writing a single line of code.
With these new capabilities, business users can ask "why" questions to better understand the factors impacting their underlying data trends. They can also forecast metrics by asking something like "forecast sales for the next 12 months" and get an immediate response based on information like past data and seasonality. With Amazon QuickSight, you can enable more employees to create and distribute insights, and now more than 100,000 customers use QuickSight to help them act on data. For example, Best Western, a global hotel chain, uses QuickSight to share data with 23,000 hotel managers and employees at more than 4,600 properties, enabling them to elevate their guest experience and drive ongoing business value. Another tool we offer in this arena is SageMaker Canvas, a no-code interface to build ML models with your data. Using Canvas, analysts can import data from various sources, automatically prepare data, and build and analyze ML models with just a few clicks. And we are continuing to invest in low-code and no-code tools and features that enhance collaboration across technical and non-technical roles. With all these services, access is no longer relegated to just one department in your organization. If you want to expand the number of ideas within your organization, you have to expand access across different types of employees so that sparks can come from anywhere. Let's see how one customer, Warner Bros. Games, did just that. - The current landscape of gaming is way more free-to-play, which means that the processing of data and those needs just grow and grow over time. Warner Bros. Games has worked with AWS since 2014. Because we worked with AWS, it's meant our business could scale easily. At peak volumes on a launch day, we pull in about three billion events per day. There's no reason to guess or just go purely off gut instinct anymore. It's data; data drives all of the decisions. AWS is such an important partner because we don't have to worry about scale. We've tested up to 300,000 data events a second. We know when we launch a game, it's not going to fall down on us. A specific example of how we use data to influence our strategies is MultiVersus. MultiVersus is a 2v2 brawler game featuring all the best characters from Warner Bros. People know these characters, they love these characters, and if the design team doesn't nail the visceral feel of these characters, it's going to show up in the data. We can find that through the data, see how many people are getting impacted, and then propose solutions that will make the game better. One of the biggest light bulb moments that I encountered is bringing our analytics data back into our partners' line-of-business tools. Whether it's a game designer designing a character or a world and seeing telemetry around how that world has behaved, or bringing data into our CRM systems so folks that are marketing and interacting with our players can see what their experience with us has been historically. Anytime that we make a suggestion and it's changed in the game, I know why that change happened and what drove that decision-making. Making the game better for the players, that's what it's really all about. Woo! (upbeat music) - So, those are the three elements of a modern data strategy I shared with you this morning: building future-proof data foundations, weaving connective tissue, and democratizing data across your organization. All of them play a critical role in helping you do more with your data.
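To ground the Warner Bros. Games segment a little, here is a minimal sketch of how a game backend might push a single telemetry event into Kinesis Data Firehose, which can then deliver it to S3 or Redshift as described earlier in the keynote. The delivery stream name and event schema are hypothetical; a production pipeline handling billions of events a day would batch records with put_record_batch and handle retries.

```python
# Minimal sketch of emitting one gameplay telemetry event to a Kinesis Data
# Firehose delivery stream. Stream name and event fields are hypothetical.
import json
import boto3

firehose = boto3.client("firehose", region_name="us-west-2")

event = {
    "event_type": "match_completed",
    "game": "example-brawler",
    "player_id": "player-123",
    "character": "example-character",
    "match_length_seconds": 412,
}

firehose.put_record(
    DeliveryStreamName="game-telemetry-stream",  # hypothetical delivery stream
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)
```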
But if I can leave you with one thing today, please remember: it's individuals who ultimately create these sparks, but it is the responsibility of leaders to empower them with a data-driven culture to help them get there. Now, go create the next big invention. Thank you. (audience applauding) (upbeat music)