@jirislav
Created May 16, 2022 06:54
colab-introduction.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "colab-introduction.ipynb",
"provenance": [],
"collapsed_sections": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/jirislav/2603780a3aebe1028ce23c870f2c5fd0/colab-introduction.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"## Welcome to a Google Colab notebook!\n",
"\n",
"Colab is a hosted Jupyter notebook service that lets you run code directly from a browser."
],
"metadata": {
"id": "QwmcCdc-qVnq"
}
},
{
"cell_type": "code",
"source": [
"print('Hello, world!')"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "E0gWtX1NqxXO",
"outputId": "7f3d0524-4ec1-4215-a072-5bcecc6c0930"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Hello, world!\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"1 + 2 + 3"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "B1xysoVpq5jA",
"outputId": "b5c0f0c4-5b95-40fa-feb0-a827c769af47"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"6"
]
},
"metadata": {},
"execution_count": 15
}
]
},
{
"cell_type": "code",
"source": [
"names = ['Tomas', 'Eliska', 'Honza']\n",
"for name in names:\n",
"    print(f'Hello, {name}!')"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "9R2kzpRaryMh",
"outputId": "7cb770ac-97bb-4d2d-b37c-faa2f3c3bafd"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Hello, Tomas!\n",
"Hello, Eliska!\n",
"Hello, Honza!\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"### A simple warmup task\n",
"\n",
"Word count is one of the first benchmark tasks used to measure the performance of different systems, mostly in distributed environments.\n",
"The objective is to compute the number of occurrences of each word in a set of documents.\n",
"\n",
"To get to know Colab better, we'll compute a word count on a single file."
],
"metadata": {
"id": "C8lhJoS7sclQ"
}
},
{
"cell_type": "markdown",
"source": [
"#### Load a text file\n",
"\n",
"Load the `big-data-wiki.txt` file and split it into words. Keep it simple and use a single space as the separator."
],
"metadata": {
"id": "V9KBv1Ctt2Y3"
}
},
{
"cell_type": "code",
"source": [
"!test -f big-data-wiki.txt || wget https://raw.githubusercontent.com/seznam/IT-akademie-bigdata/main/big-data/data/big-data-wiki.txt\n",
"!test -f big-data-wiki.txt\n",
"!ls -l"
],
"metadata": {
"id": "0n_MFSUToGUP"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Here you can write your code"
],
"metadata": {
"id": "0M8_29ZYoSFc"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#### Compute the word count\n",
"\n",
"Create a map from each word in the text to the number of its occurrences. Print the most frequently used words with their counts."
],
"metadata": {
"id": "r2p-PCehu68G"
}
},
{
"cell_type": "code",
"source": [
"# Here you can write your code"
],
"metadata": {
"id": "uBMWfeo1oVDE"
},
"execution_count": null,
"outputs": []
}
]
}
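
The two exercise steps in the notebook (split the text on a single space, then map each word to its number of occurrences) could be sketched roughly as follows. This is only one possible approach, not the notebook author's solution: the helper names `split_words` and `count_words` and the sample sentence are made up for illustration, and `collections.Counter` from the standard library stands in for a hand-built dictionary. The file is read only if `big-data-wiki.txt` has already been downloaded into the working directory.

```python
from collections import Counter
from pathlib import Path


def split_words(text):
    # Hypothetical helper: split on a single space, as the task suggests,
    # and drop the empty strings that consecutive spaces would produce.
    return [word for word in text.split(' ') if word]


def count_words(words):
    # Hypothetical helper: Counter is a dict subclass that maps each word
    # to the number of times it occurs.
    return Counter(words)


# Made-up in-memory sample so the sketch runs without the downloaded file.
sample = 'big data is big and data is everywhere'
counts = count_words(split_words(sample))
print(counts.most_common(3))

# If the file from the notebook is present, apply the same helpers to it.
path = Path('big-data-wiki.txt')
if path.is_file():
    file_counts = count_words(split_words(path.read_text(encoding='utf-8')))
    for word, count in file_counts.most_common(10):
        print(f'{word}: {count}')
```

Because the text is split on spaces only, line breaks and punctuation stay attached to neighbouring words; calling `text.split()` with no argument would split on any whitespace instead.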