marii-moe/Untitled1.ipynb

## Untitled1.ipynb
{
  "cells": [
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "My general take on the modules from the paper, im only on the start of section 3, so mostly sharing thoughts. \n\nConfigurator- A module that takes as input state from all other modules, and outputs some state for all other modules. This one mostly seems to be responsible for handling inter module communication, sort of filtering out and compressing relevant information for all the other modules. Kind of wonder if it just has one output for module, and each module \"cross-attends\" to this information. \n\nPerception-Primed by Configurator. Seems close to a cnn, but maybe each layer of activations are shared in some way. \n\nWorld Model- The big one. Needs to be able to both to make plausible predictions, and represent uncertainty. How do we train it for this? (the car example from slides). Needs to be able to predeict some latent representation that represents multiple plausible world states. \n\nCost - both differentiable btw (input world state)\nIntrinsic cost- immutable,non-trainable. Configured by configurator. Reward function or loss function. This is where we build in behaviors we are looking to get. Should be modeled by energy(idk why, because I don't understand energy models)\nCritic- Uses past states and short-term memory. Trains itself using past states to predict intrinsic costs of world states from short-term memory. (can be configured like everything)\n\nShort-term Memory- Can be accessed by the world model, and the critic uses it to train. Paper suggests it would act like key-value networks, that seems to give a weighted average based on a match with the keys of the values... [Key Value Memory Networks] https://arxiv.org/pdf/1606.03126.pdf\n\nActor- Proposes action sequences, the world model predicts the world states associated with these actions, and feeds to Cost module. This can be differentiated through.\n Mode-1 predicts directly from memory and short term, while mode 2 has world model(char limit)."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "",
      "execution_count": null,
      "outputs": []
    }
  ],
  "metadata": {
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3 (ipykernel)",
      "language": "python"
    },
    "language_info": {
      "name": "python",
      "version": "3.9.16",
      "mimetype": "text/x-python",
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "pygments_lexer": "ipython3",
      "nbconvert_exporter": "python",
      "file_extension": ".py"
    },
    "gist": {
      "id": "",
      "data": {
        "description": "Untitled1.ipynb",
        "public": true
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}
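
A rough sketch of the key-value read from the Short-term Memory note, to make the "weighted average based on a match with the keys" idea concrete. This is my own toy version, not code from the paper: the function name `soft_read`, the dot-product match, the softmax normalization, and the `temperature` parameter are all assumptions picked for illustration.

```python
import math

def soft_read(query, keys, values, temperature=1.0):
    """Return (weights, read): softmax match weights and the weighted average of values.

    Assumption: the match between query and key is a plain dot product.
    """
    # Score each key by its dot product with the query.
    scores = [sum(q * k for q, k in zip(query, key)) / temperature for key in keys]
    # Numerically stable softmax turns scores into weights that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The read-out is the weighted average of the stored value vectors.
    dim = len(values[0])
    read = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, read

# Toy memory: two slots with orthogonal keys and scalar values.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0], [20.0]]

# A query aligned with the first key pulls the read-out toward the first value.
weights, read = soft_read([5.0, 0.0], keys, values)

# A query that matches neither key gives equal weights, so a plain average.
flat_weights, flat_read = soft_read([0.0, 0.0], keys, values)
```

Since every step is a sum, product, or exponential, the read-out is differentiable with respect to the query, keys, and values, which would fit with the critic being able to train through memory.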