YouTube_Video_Summarizer- GPT-4.ipynb
{ | |
"nbformat": 4, | |
"nbformat_minor": 0, | |
"metadata": { | |
"colab": { | |
"provenance": [], | |
"authorship_tag": "ABX9TyNwy+hIjSrs7xUPiimbSkUT", | |
"include_colab_link": true | |
}, | |
"kernelspec": { | |
"name": "python3", | |
"display_name": "Python 3" | |
}, | |
"language_info": { | |
"name": "python" | |
}, | |
"widgets": { | |
"application/vnd.jupyter.widget-state+json": { | |
"c61857b65e5a49219b54790606c63cc6": { | |
"model_module": "@jupyter-widgets/controls", | |
"model_name": "ButtonModel", | |
"model_module_version": "1.5.0", | |
"state": { | |
"_dom_classes": [], | |
"_model_module": "@jupyter-widgets/controls", | |
"_model_module_version": "1.5.0", | |
"_model_name": "ButtonModel", | |
"_view_count": null, | |
"_view_module": "@jupyter-widgets/controls", | |
"_view_module_version": "1.5.0", | |
"_view_name": "ButtonView", | |
"button_style": "", | |
"description": "Generate Summary", | |
"disabled": false, | |
"icon": "", | |
"layout": "IPY_MODEL_b8fdf5d378704356b045805b736b32a8", | |
"style": "IPY_MODEL_e563f09de87c4b44b738b11cb2dddd8d", | |
"tooltip": "" | |
} | |
}, | |
"b8fdf5d378704356b045805b736b32a8": { | |
"model_module": "@jupyter-widgets/base", | |
"model_name": "LayoutModel", | |
"model_module_version": "1.2.0", | |
"state": { | |
"_model_module": "@jupyter-widgets/base", | |
"_model_module_version": "1.2.0", | |
"_model_name": "LayoutModel", | |
"_view_count": null, | |
"_view_module": "@jupyter-widgets/base", | |
"_view_module_version": "1.2.0", | |
"_view_name": "LayoutView", | |
"align_content": null, | |
"align_items": null, | |
"align_self": null, | |
"border": null, | |
"bottom": null, | |
"display": null, | |
"flex": null, | |
"flex_flow": null, | |
"grid_area": null, | |
"grid_auto_columns": null, | |
"grid_auto_flow": null, | |
"grid_auto_rows": null, | |
"grid_column": null, | |
"grid_gap": null, | |
"grid_row": null, | |
"grid_template_areas": null, | |
"grid_template_columns": null, | |
"grid_template_rows": null, | |
"height": null, | |
"justify_content": null, | |
"justify_items": null, | |
"left": null, | |
"margin": null, | |
"max_height": null, | |
"max_width": null, | |
"min_height": null, | |
"min_width": null, | |
"object_fit": null, | |
"object_position": null, | |
"order": null, | |
"overflow": null, | |
"overflow_x": null, | |
"overflow_y": null, | |
"padding": null, | |
"right": null, | |
"top": null, | |
"visibility": null, | |
"width": null | |
} | |
}, | |
"e563f09de87c4b44b738b11cb2dddd8d": { | |
"model_module": "@jupyter-widgets/controls", | |
"model_name": "ButtonStyleModel", | |
"model_module_version": "1.5.0", | |
"state": { | |
"_model_module": "@jupyter-widgets/controls", | |
"_model_module_version": "1.5.0", | |
"_model_name": "ButtonStyleModel", | |
"_view_count": null, | |
"_view_module": "@jupyter-widgets/base", | |
"_view_module_version": "1.2.0", | |
"_view_name": "StyleView", | |
"button_color": null, | |
"font_weight": "" | |
} | |
} | |
} | |
} | |
}, | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "view-in-github", | |
"colab_type": "text" | |
}, | |
"source": [ | |
"<a href=\"https://colab.research.google.com/gist/natzir/c6ed5dad99645fe7984ab47a280c5593/youtube_video_summarizer-gpt-4.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"# Summarize YouTube Transcripts with OpenAI\n", | |
"\n", | |
"\n", | |
"---\n", | |
"\n", | |
"**Author:** Natzir, Technical SEO / Data Scientist\n", | |
"<br>**Twitter:** [@natzir9](https://twitter.com/natzir9)\n", | |
"\n", | |
"---\n", | |
"\n", | |
"This Colab notebook is designed for individuals seeking to quickly understand the content of YouTube videos without watching them in their entirety. It is also useful for content creators and niche marketers.\n", | |
"\n", | |
"**Key Functions:**\n", | |
"- **Transcript Extraction**: Extracts transcripts from specified YouTube videos.\n", | |
"- **Language-Independent Summarization**: Summarizes these transcripts in any desired language, effectively eliminating language barriers.\n", | |
"- **Article Generation**: Converts the summary into an HTML article that can be directly embedded on your website.\n", | |
"- **Cost Control**: You can rest assured, a cost estimate, based on the model and video duration, will be provided during the execution. Opt in or out before generating.\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"Before you can use this document, please make sure to **create a copy** of it in your Google Drive (File > Save a copy in Drive) and secure an [OpenAI API key](https://help.openai.com/en/articles/4936850-where-do-i-find-my-secret-api-key).\n" | |
], | |
"metadata": { | |
"id": "SYofDZqh9dSL" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"#@markdown # Step 1: \"Play\" this cell to install the the required libraries\n", | |
"#@markdown ---\n", | |
"!pip install openai==v0.28.1\n", | |
"!pip install openai youtube_transcript_api\n", | |
"import openai\n", | |
"from youtube_transcript_api import YouTubeTranscriptApi\n", | |
"import ipywidgets as widgets\n", | |
"from google.colab import files\n", | |
"import textwrap\n", | |
"import re" | |
], | |
"metadata": { | |
"id": "BdQcvTg4iW5N" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"# @markdown # Step 2: Setup\n", | |
"# @markdown\n", | |
"openai.api_key = \"\" #@param {type: \"string\"}\n", | |
"\n", | |
"#@markdown ---\n", | |
"# @markdown ### Video configuration\n", | |
"video_url = \"https://www.youtube.com/watch?v=iJPu4vHETXw&t=439s\" #@param {type: \"string\"}\n", | |
"# @markdown **Time range in seconds (optional, leave as-is for full transcript)**\n", | |
"\n", | |
"start_time = \"0\" # @param {type: \"string\"}\n", | |
"end_time = \"None\" # @param {type: \"string\"}\n", | |
"\n", | |
"#@markdown ---\n", | |
"# @markdown ### Summary configuration\n", | |
"output_language = \"Spanish\" #@param [\"Afrikaans\",\"Albanian\",\"Amharic\",\"Arabic\",\"Armenian\",\"Azerbaijani\",\"Basque\",\"Belarusian\",\"Bengali\",\"Bihari\",\"Bosnian\",\"Breton\",\"Bulgarian\",\"Cambodian\",\"Catalan\",\"Chinese (Simplified)\",\"Chinese (Traditional)\",\"Corsican\",\"Croatian\",\"Croatian\",\"Czech\",\"Danish\",\"Dutch\",\"English\",\"Esperanto\",\"Estonian\",\"Faroese\",\"Filipino\",\"Finnish\",\"French\",\"Frisian\",\"Galician\",\"Georgian\",\"German\",\"Greek\",\"Guarani\",\"Gujarati\",\"Hausa\",\"Hebrew\",\"Hindi\",\"Hungarian\",\"Icelandic\",\"Indonesian\",\"Interlingua\",\"Irish\",\"Italian\",\"Japanese\",\"Javanese\",\"Kannada\",\"Kazakh\",\"Kinyarwanda\",\"Kirundi\",\"Korean\",\"Kurdish\",\"Kyrgyz\",\"Laothian\",\"Latin\",\"Latvian\",\"Lingala\",\"Lithuanian\",\"Macedonian\",\"Malagasy\",\"Malay\",\"Malayalam\",\"Maltese\",\"Maori\",\"Marathi\",\"Moldavian\",\"Mongolian\",\"Montenegrin\",\"Nepali\",\"Norwegian\",\"Norwegian (Nynorsk)\",\"Occitan\",\"Oriya\",\"Oromo\",\"Pashto\",\"Persian\",\"Polish\",\"Portuguese (Brazil)\",\"Portuguese (Portugal)\",\"Punjabi\",\"Quechua\",\"Romanian\",\"Romansh\",\"Russian\",\"Scots Gaelic\",\"Serbian\",\"Sesotho\",\"Shona\",\"Sindhi\",\"Sinhalese\",\"Slovak\",\"Slovenian\",\"Somali\",\"Spanish\",\"Sundanese\",\"Swahili\",\"Swedish\",\"Tajik\",\"Tamil\",\"Tatar\",\"Telugu\",\"Thai\",\"Tigrinya\",\"Tonga\",\"Turkish\",\"Turkmen\",\"Twi\",\"Uighur\",\"Ukrainian\",\"Urdu\",\"Uzbek\",\"Vietnamese\",\"Welsh\",\"Xhosa\",\"Yiddish\",\"Yoruba\",\"Zulu\"]\n", | |
"summary_model = \"gpt-4\" #@param [\"gpt-4\",\"gpt-3.5-turbo-16k\",\"gpt-3.5-turbo\"]\n", | |
"# @markdown **Article configuration (optional, check to enable article generation)**\n", | |
"article_generation = True # @param {type:\"boolean\"}\n", | |
"article_model = \"gpt-4\" #@param [\"gpt-4\",\"gpt-3.5-turbo-16k\"]\n", | |
"\n", | |
"# Token cost rates per model for input and output\n", | |
"model_conf = {\n", | |
" \"gpt-4\": {\"input\": 0.03, \"output\": 0.06, \"max_tokens_value\": 4000, \"text_chunk_size\": 14000 },\n", | |
" \"gpt-3.5-turbo-16k\": {\"input\": 0.003, \"output\": 0.004, \"max_tokens_value\": 4000, \"text_chunk_size\": 14000},\n", | |
" \"gpt-3.5-turbo\": {\"input\": 0.0015, \"output\": 0.002, \"max_tokens_value\": 2000, \"text_chunk_size\": 8000}\n", | |
"}\n", | |
"\n", | |
"start_time = None if start_time.lower() == \"none\" else int(start_time)\n", | |
"end_time = None if end_time.lower() == \"none\" else int(end_time)\n", | |
"\n", | |
"video_id = video_url.split(\"v=\")[1]\n", | |
"transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)\n", | |
"\n", | |
"try:\n", | |
" transcript = transcript_list.find_manually_created_transcript(transcript_list._manually_created_transcripts.keys())\n", | |
"except:\n", | |
" try:\n", | |
" transcript = transcript_list.find_generated_transcript(transcript_list._generated_transcripts.keys())\n", | |
" except Exception as e:\n", | |
" print(f\"Error: {e}\")\n", | |
"\n", | |
"# Initialize an empty string to hold the final transcript\n", | |
"text = \"\"\n", | |
"\n", | |
"if transcript:\n", | |
" fetched = transcript.fetch()\n", | |
" for entry in fetched:\n", | |
" if start_time is not None and entry[\"start\"] < start_time:\n", | |
" continue\n", | |
" if end_time is not None and entry[\"start\"] > end_time:\n", | |
" continue\n", | |
" text += \" \" + entry[\"text\"]\n", | |
"else:\n", | |
" text = \"No transcript available\"\n", | |
"\n", | |
"print(f\"Starting transcript extraction...\\n\\n{text}\\n\")\n", | |
"print(f\"-------------------\")\n", | |
"\n", | |
"\n", | |
"prompt = f\"\"\"Please provide a succinct and comprehensive summary in \"{output_language}\" of the following TEXT. Adhere to the following guidelines:\n", | |
"- Ensure the summary is clear, well-organized, and easy to read. Utilize headings and subheadings to guide the reader through each section.\n", | |
"- Accurately convey the author's intended meaning, capturing the main points and key details.\n", | |
"- The length should be adequate to encapsulate the main points and key details without being overly verbose or including extraneous information.\n", | |
"<<TEXT>>\n", | |
"\"\"\"\n", | |
"\n", | |
"article_prompt = f\"\"\"Generate a well-structured and engaging article based on the provided SUMMARY of a transcript. The article should be in \"{output_language}\" and adhere to the following guidelines:\n", | |
"\n", | |
"- Structure the article with a clear introduction, body, and conclusion. Employ headings and subheadings to segment the content logically.\n", | |
"- Begin with an introduction that provides context and draws the reader in, followed by the body that expands on the summary's main points, and conclude with a synthesizing summary or a call to action.\n", | |
"- Keep the language clear, concise, and engaging, ensuring a pleasant reading experience.\n", | |
"- Incorporate relevant data, quotes or examples from the summary to substantiate the points being made.\n", | |
"- Avoid repetition and ensure the flow of ideas is coherent and progresses logically.\n", | |
"- Adhere to a coherent style and tone that suits the intended audience and purpose of the article.\n", | |
"- Maintain accuracy and fidelity to the original summary's content and intent.\n", | |
"- Format the article in HTML.\n", | |
"- The article should be formatted as follows:\n", | |
"\n", | |
"<H1> Article Title </H1>\n", | |
"<H2> Introduction </H2>\n", | |
"...Introduction Text...\n", | |
"<H2> Descriptive body heading </H2>\n", | |
"...Body Text...\n", | |
"<H2> Conclusion </H2>\n", | |
"...Conclusion Text...\n", | |
"\n", | |
"<<SUMMARY>>\n", | |
"\"\"\"\n", | |
"\n", | |
"def estimate_cost(prompt, text, model, chars_per_token=4):\n", | |
"\n", | |
" # Selecting the rates based on the chosen model\n", | |
" input_rate = model_conf[model][\"input\"]\n", | |
" output_rate = model_conf[model][\"output\"]\n", | |
"\n", | |
" # Calculating the number of tokens\n", | |
" prompt_tokens = len(prompt) / chars_per_token\n", | |
" text_chunks = textwrap.wrap(text, model_conf[model][\"text_chunk_size\"])\n", | |
" text_tokens = sum(len(chunk) / chars_per_token for chunk in text_chunks)\n", | |
" total_tokens_input = prompt_tokens + text_tokens\n", | |
" total_tokens_output = text_tokens # Assuming the output tokens equal the input text tokens, adjust as needed\n", | |
"\n", | |
" # Calculating the cost\n", | |
" cost_input = total_tokens_input * input_rate / 1000 # cost rate is per 1000 tokens\n", | |
" cost_output = total_tokens_output * output_rate / 1000\n", | |
" total_cost = cost_input + cost_output\n", | |
"\n", | |
" return total_cost\n", | |
"\n", | |
"# Calling the estimate_cost function with the selected model\n", | |
"summary_estimated_cost = estimate_cost(prompt, text, summary_model)\n", | |
"print(f\"\\nEstimated cost of the summary: ${summary_estimated_cost:.2f}\")\n", | |
"print(\"Would you like to generate the summary?\\n\")\n", | |
"\n", | |
"def gpt_completion(prompt, model):\n", | |
" try:\n", | |
" response = openai.ChatCompletion.create(\n", | |
" model=model,\n", | |
" messages=[{\"role\": \"user\", \"content\": prompt}],\n", | |
" temperature=0.1,\n", | |
" max_tokens=model_conf[model][\"max_tokens_value\"],\n", | |
" top_p=1,\n", | |
" frequency_penalty=0.2,\n", | |
" presence_penalty=0.2\n", | |
" )\n", | |
" content = response[\"choices\"][0][\"message\"][\"content\"].strip()\n", | |
" return content\n", | |
" except openai.error.OpenAIError as error:\n", | |
" return f\"OpenAIError: {error}\"\n", | |
"\n", | |
"def generate_article(b, consolidated_summary):\n", | |
" print(f\"\\n-------------------\")\n", | |
" print(\"Starting article generation...\\n\")\n", | |
" adjusted_prompt_article = article_prompt.replace(\"<<SUMMARY>>\", consolidated_summary).encode(\"ASCII\", errors=\"ignore\").decode()\n", | |
" article = gpt_completion(adjusted_prompt_article, article_model)\n", | |
" print(f\"\\nArticle: {article}\")\n", | |
" article_filename = \"article.txt\"\n", | |
" with open(article_filename, \"w\", encoding=\"utf-8\") as file:\n", | |
" file.write(article)\n", | |
" files.download(article_filename)\n", | |
" print(\"\\nArticle generation complete.\")\n", | |
"\n", | |
"# Function to execute after the summary button is clicked\n", | |
"def on_button_click(a):\n", | |
" summaries = []\n", | |
" text_chunks = textwrap.wrap(text, model_conf[summary_model][\"text_chunk_size\"])\n", | |
" print(f\"\\n-------------------\")\n", | |
" print(\"Starting summary generation...\")\n", | |
"\n", | |
" for index, chunk in enumerate(text_chunks, start=1):\n", | |
" adjusted_prompt = prompt.replace(\"<<TEXT>>\", chunk).encode(\"ASCII\", errors=\"ignore\").decode()\n", | |
" summary_text = gpt_completion(adjusted_prompt, summary_model)\n", | |
" print(f\"\\nProcessing {index} of {len(text_chunks)} - Summary:\\n{summary_text}\\n\")\n", | |
" summaries.append(summary_text)\n", | |
"\n", | |
" print(\"\\nSummary generation complete.\")\n", | |
" summary_filename = \"summary.txt\"\n", | |
" with open(summary_filename, \"w\", encoding=\"utf-8\") as file:\n", | |
" file.write(\"\\n\\n\".join(summaries))\n", | |
" files.download(summary_filename)\n", | |
"\n", | |
" consolidated_summary = \"\".join(summaries)\n", | |
"\n", | |
" def on_article_button_click(b):\n", | |
" generate_article(b, consolidated_summary)\n", | |
"\n", | |
" if article_generation:\n", | |
"\n", | |
" article_estimated_cost = estimate_cost(article_prompt, consolidated_summary, article_model)\n", | |
"\n", | |
" print(f\"-------------------\")\n", | |
" print(f\"\\nEstimated cost of the article: ${article_estimated_cost:.2f}\")\n", | |
" print(\"Would you like to generate the article?\\n\")\n", | |
"\n", | |
" article_button = widgets.Button(description=\"Generate Article\")\n", | |
" article_button.on_click(on_article_button_click)\n", | |
" display(article_button)\n", | |
"\n", | |
"# Create and display summary button\n", | |
"summary_button = widgets.Button(description=\"Generate Summary\")\n", | |
"summary_button.on_click(on_button_click)\n", | |
"display(summary_button)\n", | |
"\n", | |
"# @markdown ### After configuring the parameters, click the \"Play\" button.\n", | |
"\n", | |
"#@markdown ---" | |
], | |
"metadata": { | |
"colab": { | |
"base_uri": "https://localhost:8080/", | |
"height": 277, | |
"referenced_widgets": [ | |
"c61857b65e5a49219b54790606c63cc6", | |
"b8fdf5d378704356b045805b736b32a8", | |
"e563f09de87c4b44b738b11cb2dddd8d" | |
] | |
}, | |
"id": "Y2ffztz60wGc", | |
"outputId": "f3409c78-91b5-423b-c316-1ecd3ed14f93" | |
}, | |
"execution_count": null, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"Starting transcript extraction...\n", | |
"\n", | |
" so this session is how Google works a Google ranking engineers story we are really lucky to have Paul Harvey's a principal engineer at Google he has one of those titles that sounds very subdued but actually doesn't really reflect that he's part of the senior leadership from the Google's ranking team so it's real you're very lucky to have him out here talking today he's going to go through some slides and give you a sense of what it's like from his perspective working with the team that builds out the ranking process and then we should have some time for Q&A after that so if you please welcome Paul I'm Paul haar I've actually been working at rank on ranking at Google for 14 years as of tomorrow I told Danny what my claim to fame in this room should really be that I was Matt Cutts his office mate for about two years I've actually I've worked on retrieval I've worked on a large but different a lot of different parts of ranking I've worked on indexing these days I manage a couple of small teams I participate in our launch review process and I even still do some coding I want to talk about Google search today for just a couple minutes there are two themes I would say that are going through Google search today maybe an emerging one that I'm not going to talk as much about first of all we're really thinking about the world in a mobile-first way that we're just seeing much more of our traffic coming from mobile than it than it ever has that is changing the way we think about the search results page that is the way that is changing a lot about the way we think about search people want their information in a directly consumable way we need to get it to them very quickly despite the fact they probably have low bandwidth when you're searching on on a mobile device you're much less likely to type you're much more likely to use voice or just tap on a click target and your location matters a lot on mobile the other the other thing that's going on with Google search I'm sure everybody here has noticed there's a lot of search features we are doing spelling suggestions like we've always done an autocomplete is playing a much bigger role things like the knowledge graph and map and images are all over our search results and so that's sort of where I would characterize things we're going more and more into a world where search is being thought of as an assistant to all parts of your life and some of that is showing up as search reaching out to help people directly but I am going to talk about ranking which is a very specific sub problem of all this of the search problem which I would describe as the ten blue links problem and so I'm mostly going to be talking about classic search and and how we've been doing things for ages and ages now so everybody's used to ten blue links that's all there used to be a little while ago and I would reduce the ten blue links problem - what do we show and what order do we show them in and I should also mention I'm not going to be talking about ads at all in probably everybody in this room knows more about Google Ads than I do and my job ads are great they make us a lot of money they they work very well for for advertisers but my job we're explicitly told don't think about the effect on ads don't think about the effect on revenue just think about helping the user so I'm gonna start with talking about talking about what we call life of a query this is actually modeled on a class we do for every new engineer at Google whether they're working on search or Android or ads or 
self-driving cars every engineer coming into Google gets a half-day class on how our systems are put together I'm not gonna give you a half a half day class I'm going to use five a five minute version of it just to sort of understand what what our systems are like so there are two parts of a search engine there's what we do before we see a query and once we have the query so before we have a query we crawl the web everybody here is used to seeing Googlebot crawl their sites not much to say there because we Friday we try to crawl as comprehensive a part of the web as we can it's measured these days in the billions of pages I don't even have an exact number after we've gathered all those pages we analyze the crawled pages in in the old days analyzing crawled pages was we extracted the links there was a little bit else but it was basically just give me the links out of off the web these days we do a lot of semantic analysis an annotation some of this is linguistic some of this is related to the knowledge graph some of its things like address extraction and so on and then we also do content rendering and this is something I think Google's talked a lot about in the past couple of years it's new for us over the past few years it's a big deal for us that we are much closer to taking into account the JavaScript and CSS on your pages in most cases you should not have to do anything special and we get a suite we get to see the same version of the page with full rendering with CSS and all that that that your users say and that's a that's been a real benefit I think for both users and webmasters and then after that we built an index and everybody knows what an index in a book is the web index is very similar we it's for every word it is a list of the pages that that the word appears on as a practical matter to deal with the scale of the web we break the web up into groups of millions of pages which are distributed randomly and each of these millions of pages is called the it's called an index shard and we have thousands of shards for the web index and then we also have some as well as the list of words we have some per document metadata so that is the index building process these days it runs continuously it processes again some number of billions of pages a day I'm now I'm going to turn to the what happens at serving time when we actually get a query we have I'm gonna break that into three parts we do a query understanding part where we we try to figure out what the query means we do retrieval and scoring which we'll talk a little bit more about and then we do some after after we've done the retrieval part we do some adjustments so query understanding first first question is do we know any named entities in the query the San Jose Convention Center we know what that is Matt Cutts we know what that is and so we label those and then are there useful synonyms does General Motors in this context mean does GM mean General Motors does GM mean genetically modified and my point there is just that context matters we looked at the whole we look at the whole query for context once we have this expanded query we send this query to all the shards that I just talked about from the index and for each shard then we find something we find all the pages that match all is an exageration we find pages that match we compute a score for the query and the page and this computing the score is sort of the the heart of ranking and a lot in a lot of ways that's we come up with a number that represents how how good a match the 
query is for the page once we have that we send each of shard sends back the top end pages by the score it's a some small number of pages for each shard we've got a lot of shards the central server then combines all the top pages sorts by the score and that then we're almost done and then we do some post retrieval adjustments and this so this is looking at diversity by hosts looking at how much duplication there is spammed emotions kick in at this point and a whole bunch of other other phase little things come in at that point then we generate we generate snippets we've got our top ten we show we produce a search result page after we merge with other search features and send it back to the user so what I'm trying to convey it guess in this talk to some degree is what do ranking engineers do and the first version of is just we write code for the service that I just talked about now that's a very operational definition doesn't doesn't actually get at anything useful yet so we'll see if we get more useful talking a little bit more about the scoring process that is this computing this one number that represents the represents the match between a query and a page that's we base this on what we call scoring signals and a signal is just some piece of information that's used in scoring we break these down into two to two categories the ones that are just based around the page so your page rank your language or if the page is mobile-friendly or things that are query query depended so things that take into account both the page and and what the user is searching for so keywords hits and synonyms and proximity all factored into this and so version two of what ranking engineers do is we either look for new signals or we combine all the old signals in new ways and both of those turn out to be really hard and interesting would be my summary all right but that doesn't get out how we determine what we want to do that's just how we do it and metrics are really the what we use as our guide Lord kelvins supposedly said if you cannot measure it you cannot improve it he actually said something that was much more Victorian and publishable in a science journal but the sort of popular version of quote is much easier to understand so we we measure ourselves in a whole lot of different directions the key metrics that I want to talk about today relevance does the page is the page useful at answering what the user was looking for and this is our top-line metric this is the one that we cite all the time internally this is the one that we compare ourselves usually to other search engines with and so this is this is this is the this is the big internal metric but we also have other ones such as quality how good are the results that we show how good are the individual pages more time to result where faster is better and so we have a lot of metrics that we measure ourselves on and I should I should mention that all these metrics are based on looking at the whole search results page rather than just one results at a time and to do this whole search results page there's just a convention that basically everybody who does search uses something like this which is position one is worth the most position two is worth half of what position one is position three is worth one third this is normally known as reciprocal rank it rank waiting and it goes on from there but so what do i ranking engineers do we try to optimize the metrics we try to improve the scores that we show we we try to improve the scores that we get on our on our 
metrics and to compute our metrics I apologize if I hit anybody with the laser pointer I keep hitting the wrong button where do the metrics actually come from we have an evaluation process that's based on two separate things live experiments and ubin radar experiments now live experiments are probably familiar to just about everybody in the webmaster community and we do largely the same things that most websites do with with live experiments which is we do a/b experiments on real traffic and then we look for changes in click patterns and I should just mention that we run a lot of experiments it is very rare if you do a search on Google and you're not in at least one experiment now not all of these are ranking experiments famously Google tested the color blue that we used on for links and and other blue highlighting with 41 different blues and came to and came to the conclusion what was the perfect blue and this caused a lot of designers some angst because I just wanted to trust their instinct and you can argue both sides of that that case but anyway do we do a lot of experiments I want to mention that that interpreting live experiments is actually a really challenging thing often and I'm going to take you through an example I apologize for the subscripts here that's just the conventions that we tend to use these are this is the only subscripts on my on my slides but considering that you have two pages page 1 and page 2 that are that are possible pages to show for some query that a user gives for page 1 the answer is on the page user clicks they'll see the answer they're good for page 2 the answers on the page but our stiffening algorithm also pulls the answer into the snippet and now we have two algorithms a puts p1 before p2 so the user going down they see p1 it looks like it could it could be a good result they click on it they go to the page our live experiment analysis says that's good we got a click and it was high up on the page algorithm B puts p2 before p1 the users going down the page they see p p2 they see the answer in the snippet they don't bother clicking to us at least to a naive interpretation that looks bad do we really believe that because actually we think the p2 was at least as good as p1 in terms of answering the question on the page and getting that you the user result an earlier is should be good and the it's quite challenging just to wish the case between the user left because the answer wasn't in the snippet or the B they didn't think the answer was there at all and the user left because they got a good answer in the snippet so live experiments are challenging but but very useful nonetheless the other thing we do is human rater experiments and this is until the this is there's a long history of doing experiments like this in information retrieval what we do is we show real people experimental search results and we ask those people how good are those results and I'll talk a little bit more about how we do that we get some ratings we average them across the Raiders we do this for large query sets so we get lots of we get lots of ratings and we get something that we think is statistically significant tools support this doing this in an automated way it's very similar to Mechanical Turk II processes that people do outside Google and we actually published guidelines that explain the criteria to raters we really want humans in the loop on this one because people have good intuitions people search for themselves and they have experience and they can tell what's a good 
search result and what's bad but we also need to explain what we're looking for and what we think our users are looking for it I'm gonna be as a budget as people may know we published our human rater guidelines last year there are about 160 pages of great detail about how we think about what's good for users if you're wondering why Google is doing something often the answer is to make it look more like what the rater guidelines say I'm gonna take I'm gonna give you a bunch of examples from the Raider guidelines over the next bunch of slides so for example this is actually what our radar tool looks like for the Raiders - the red arrows this is again pulled from the Raider guidelines basically they got a set of search results they get they get told what the query is on the top there's actually some information there about where the user is or where we think the user is and there's some sliders there that the raters can play with here's something here's an example of an actual rating item so they have a slider for what's called the needs met rating and a slider for the page quality rating so they get the s this in the context of a query set the sliders to where you think they where they think where you think they belong so they're these two scales needs met which is our version of Relan relevance these days which is does this page address the users need and then there's page quality which is how good is the page and you should be saying but you said you said you were mobile first why aren't you asking is this page mobile-friendly and actually all of our needs met instructions are about mobile user needs and we've so we give this general prefix needs met rating test asked readers to focus on mobile user needs and think about how helpful and satisfying the result is for the mobile users so that's implicit but we also make it mobile centric by using many more mobile queries than desktop queries in samples we actually over sample mobile right now the traffic is mobile has just past desktop but we have basically more than twice as many mobile queries as desktop in our samples um we pay attention to the user's location you saw that in the in the tool sample I showed before the tool of also display a mobile user experience and the Raiders actually visit the websites on their smartphones not on their desktop computers so there the Raiders are really getting a very mobile centric experience okay needs met rating I'm gonna start with the with the best category oh well here are the categories it goes from fully meets through highly moderately and slightly meets all the way down to fails to meet I left an extra s in there obviously fully meets this great fails to meet as is awful and we've got things in in the middle so two examples of fully meets you search for CNN we give you cnn.com that's a great result the user who's searching on Google for CNN probably wants something like that on the other hand we're we know we're in a mobile era we know people like apps a lot so if you search for Yelp and you have the Yelp of app installed on your phone you probably actually want to open the Yelp app or at least at least as as likely for fully meet we really wanted the case of an unambiguous query and something that can wholly satisfy whatever a user wants to do about that query so in this so in either in either case I think showing the Yelp website would probably be fully meets showing the CNN app would also be fully meets as well I'm actually gonna skip this slide for one second I hope there's way I can go back 
and go to highly meets this is highly meets is this is an informational query and this is a good source of information two of these happen to be from Wikipedia I think what I think ESPN would have been probably better highly meets example here but but anyway the but the idea is this is a great source of information its authoritative it's got got an X it's got some expertise to it it's probably kind of comprehensive for the query in question and this is what highly meets it's meant to be we actually have a category but we actually give the raters slider bars where they can go anywhere they want on them and we have a sort of very highly meets in between highly and fully that's meant to capture the idea that this would be fully meets if there weren't another great interpretation so this is two examples of the query Trader Joe's the first one shows a a map with three nearby stores the second one shows the Trader Joe's website the user might want the website so showing the map is not quite adequate the user might want the map showing the website is not quite adequate so we want to get to the we wanted to have a distinction of this is better than just the Wikipedia page about Trader Joe's which seems like a pretty useless thing by and large but we didn't want to be able to say hey you totally nailed this query by getting the map there and not getting the and not getting the website or vice-versa more highly so more examples of highly meets would be showing pictures on a query where we think the user is looking for four pictures showing a map for an ambiguous query this the query here is turmeric which is yes a spice but it's also a restaurant in Sunnyvale if the user is in Sunnyvale and they're searching for turmeric the map is probably a good guess we'd give that it we'd want to give that highly meets moderately meets is it's good information for the query Shutterfly the CrunchBase page about Shutterfly it's interesting it's certainly not the first thing but it might be useful in the first page similarly for Tom Cruise a fansite about Tom Cruise or a a star a general star sight about Tom Cruise that seems like a good sight it's not the most authoritative but it'll it'll have useful information slightly meets less good information in this case one of the examples I think is really spot-on search for a Honda Odyssey you get the Kelley Blue Book page about the 2010 Honda Odyssey the user didn't say 2010 they're possibly interested in 2010 they're more likely interested in something more recent so that would be or that would be an example so again this is this is acceptable but not great information and we'd hope there's better fails to meet is where it starts getting laughable search for German cars and get Subaru probably didn't mean that search searching for a rodent removal company and getting one half the world away probably not too useful we all have horror stories of our own but I there were three bugs that acted in concert about ten years ago before United and continental had merged such that you search for United Airlines and you get continental at position one I was responsible for two of the three bugs that were working in concert there and that was very embarrassing I mean I heard from relevance to page quality we've after a lot of iteration we've ended up at three important concepts that we think of for describing the quality of a page its expertise authoritative nough sand trustworthiness so is the author here an expert on what they're talking about it's the website or the webpage 
authoritative about it and can you trust it and then there's clearly some some queries where's medical or financial information is involved buying a product where trustworthiness is probably the most important of those three the rating scale here is from high quality to low quality and it's certain it's sort of obvious what we're looking for for high quality pages it's a satisfying amount of high quality main content it's got the expertise Authority Authority and trustworthiness and it and the website is a good reputation is is sort of a key principle they're low quality it's the it's the opposite a couple of other things to throw in a website that has explicit negative reputation and we all know about those those sorts of sites or the secondary content is distracting or unhelpful secondary content is largely ads and other other things not that's necessarily tied to the users information need so optimizing our metrics how do we do that we've got a team of a few hundred software engineers they are focused on our metrics and our signals they run lots and lots of experiments and they make a lot of changes yeah so that's and that's what that's what we spend our time doing the process is usually we start with an idea sometimes the idea is there's this set of problems I really want to solve or sometimes the idea is I've got this new source of data and I think it's really useful and then you repeat over and over you write some code you generate some data you run a bunch of experiments and you analyze the results of those experiments this is a typical software development process can take weeks can take months in that stage if you get it's the statistic you know it might not pan out a lot of things never pan out when it does pan out we get a launch report written we run some final experiments a launch report is written by a quant tative an analysis analyst who is basically a statistician and someone who's an expert at analyzing our experiments these reports are really really great because they summarize the first of all they're from a mostly objective perspective relative to the team that was working on the experiment so we get there they keep us honest and then there's the launch review process which is for the most part a meeting every Thursday morning I came I came here from from our launch review meeting where the leads in the area hear about what the project is trying to do here a summary from the analyst in theory we've read all of the dozen or so reports in practice hopefully one of us has read at least one of us has read one of read each of the reports and we try to debate basically what you know is this good for the users is this good for the system architecture are we going to be able to keep improving the system after one of these changes is made I'd like to think that that's a really kind and fair process the teams that come before launch review might disagree they've been known to be quite contentious we did the videotape a few years ago of one of those meetings so you can you could see one or two one or two items discussed there I'm so oh and I should actually mention I didn't I didn't put it on the slide but after once review assuming something has approved getting into production can be an easy thing some teams will will ship it the same week sometimes you have to rewrite your code to actually make it suitable for our production architecture one of the making things fast enough making things clean enough that can that can take a while I've known it to take months in the worst case 
I know about it took just shy of two years between approval and and actually launching so what a ranking engineers do by and large we try to move results with good human ratings and live experiment ratings up and move results with bad down what goes wrong with this and how do we fix it I'm gonna talk about two kinds of problems that we have there are there are others but these are sort of at the core of most of the things we've seen one is if we get systematically bad ratings or if the metrics don't capture the things we care about so bad ratings here's an example Texas farm fertilizer that's actually a brand of fertilizer I only learned that by looking at the search what we showed at position one was a three pack of local results at a map it is very unlikely the user doing this search is going to want to go to the manufacturers headquarters now I looked at the Street View look through the pictures you can actually drive your 18-wheeler up to the loading dock and get fertilizer that way it is unlikely that the user doing the search actually wanted to do that you can actually buy this in stores like Home Depot and Lowe's so that seems like a much more much more likely route maybe if somebody wanted to order it online I don't know but the Raiders on average called this pretty close to highly meats and that seemed crazy to us and then we actually saw a pattern of losses that we were getting here which is that in a whole bunch in a series of experiments especially that were that were increasing the triggering of Maps Raiders were saying this looks great so the team was taking that as wow we should show more maps and there was a cycle going on of experiments that were showing more maps and the Raiders saying yeah we like them webs there and that was that was a pattern that we saw we realized that it was wrong so we what we started doing was creating some more examples for the Raider guidelines and here's exactly that query showing in this case I think we actually are showing the front end of the store you can't see the loading dock in that little picture and told the Raiders actually this is a fails to meet the user is not looking for this now you could argue maybe it should have been the the the next category up or somewhere there but but we really wanted to get the message that if you don't think the user is gonna go there don't show them that we were having this issue around things like radio stations nobody wants or very rarely do you want to go to the radio station or Golu newspaper or go to the state lottery office when you search for the state lottery you're looking for the numbers another problem case missing metrics we were having a an issue with quality and this was this was particularly in particularly bad we think of it as around 2008 2009 to 2011 lots we were getting lots of complaints about low-quality content and they were right we were seeing the same low-quality thing but our relevance metrics kept going up and that's because the low quality pages can be very relevant this is basically the definition of a content form in our in our vision of the world so we thought we were doing great our numbers were saying we were doing great and we were delivering a terrible user experience and turned out we weren't measuring what we needed to so what we ended up doing was defining an explicit quality metric which got directly at the issue of quality it's not the same as relevance this is why we have that second slider there now and it enabled us to develop quality related signals separate 
from relevant signal separate from our relevant signals and really improve them independently so when the metrics missed something what ranking engineers need to do is fix the rating guidelines or develop new metrics and so with that I'm gonna say thanks\n", | |
"\n", | |
"-------------------\n", | |
"\n", | |
"Estimated cost of the summary: $0.65\n", | |
"Would you like to generate the summary?\n", | |
"\n" | |
] | |
}, | |
{ | |
"output_type": "display_data", | |
"data": { | |
"text/plain": [ | |
"Button(description='Generate Summary', style=ButtonStyle())" | |
], | |
"application/vnd.jupyter.widget-view+json": { | |
"version_major": 2, | |
"version_minor": 0, | |
"model_id": "c61857b65e5a49219b54790606c63cc6" | |
} | |
}, | |
"metadata": {} | |
}, | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"\n", | |
"-------------------\n", | |
"Starting summary generation...\n" | |
] | |
} | |
] | |
}, | |
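{ | |
"cell_type": "code", | |
"source": [ | |
"# Optional: token-exact cost check with tiktoken\n", | |
"# This optional cell is a sketch, not part of the original workflow. It shows how the\n", | |
"# 4-characters-per-token heuristic in estimate_cost() could be cross-checked with exact\n", | |
"# token counts from OpenAI's tiktoken tokenizer. It assumes Step 2 has already been run,\n", | |
"# so that `prompt`, `text`, `summary_model` and `model_conf` are defined.\n", | |
"!pip install tiktoken\n", | |
"import tiktoken\n", | |
"\n", | |
"encoding = tiktoken.encoding_for_model(summary_model)\n", | |
"prompt_tokens = len(encoding.encode(prompt))\n", | |
"text_tokens = len(encoding.encode(text))\n", | |
"print(f\"Prompt tokens: {prompt_tokens} - transcript tokens: {text_tokens}\")\n", | |
"\n", | |
"# Re-estimate the summary cost with exact input tokens (output still assumed equal to the input text)\n", | |
"rates = model_conf[summary_model]\n", | |
"exact_cost = (prompt_tokens + text_tokens) * rates[\"input\"] / 1000 + text_tokens * rates[\"output\"] / 1000\n", | |
"print(f\"Token-exact estimated summary cost: ${exact_cost:.2f}\")" | |
], | |
"metadata": {}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |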
{ | |
"cell_type": "markdown", | |
"source": [ | |
"\n", | |
"---\n", | |
"\n", | |
"**Author:** Natzir, Technical SEO / Data Scientist\n", | |
"<br>**Twitter:** [@natzir9](https://twitter.com/natzir9)\n", | |
"\n", | |
"---" | |
], | |
"metadata": { | |
"id": "Ka4KjD4C-QQi" | |
} | |
} | |
] | |
} |