
@tnwei
Last active October 21, 2021 15:16

Background

The goal is to get a better understanding of how knowledge neurons develop through training. The motivating theory is that by combining interpretability methods in an interactive tool, we can better understand specific circuits, or at least prune the search space of experiments to try. But there are many ways to combine these methods, so I think another effective use of effort is to experiment with radically different interfaces.

Methods used

TODO: Brief explanation

Tool overview

Data

  • The current focus of the tool is to visualize how GPT-2 stores knowledge about specific topics.
  • The dataset consists of descriptions of cities with more than 100k inhabitants, scraped from Wikipedia (link). This totals ~4.4k cities.
  • The first 1k characters of each city's Wikipedia page are scraped and cleaned, then run through the encoder from the GPT-2 repo. Each prompt is truncated at the first 30 tokens and serialized to disk for further use.
  • First 5 examples from the dataset:
0 / 4386 https://en.wikipedia.org/wiki/Kabul : Kabul is the capital and largest city of Afghanistan, located in the eastern section of the country. It is also a municipality, forming part of the
1 / 4386 https://en.wikipedia.org/wiki/Herat : Herāt is an oasis city and the third-largest city of Afghanistan. In 2020, it had an estimated population of 574,276
2 / 4386 https://en.wikipedia.org/wiki/Kandahar : Kandahar is a city in Afghanistan, located in the south of the country on the Arghandab River, at an elevation of 1,
3 / 4386 https://en.wikipedia.org/wiki/Mazar-i-Sharif : Mazār-i-Sharīf, also called Mazār-e Sharīf, or just Mazar, is the fourth
4 / 4386 https://en.wikipedia.org/wiki/Jalalabad : Jalalabad is the fifth-largest city of Afghanistan. It has a population of about 356,274, and serves as the capital of N
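The preprocessing step above can be sketched as follows. This is a minimal sketch using a naive whitespace tokenizer as a stand-in for the GPT-2 BPE encoder (the gist uses the encoder from the GPT-2 repo); `MAX_TOKENS`, the function names, and the output filename are all assumptions for illustration:

```python
import json

MAX_TOKENS = 30  # each prompt truncated to its first 30 tokens


def encode(text: str) -> list[str]:
    # Stand-in for the GPT-2 BPE encoder; the real encoder emits token ids.
    return text.split()


def preprocess(pages: dict[str, str]) -> dict[str, list[str]]:
    # pages maps Wikipedia URL -> first 1k characters of the cleaned page text
    return {url: encode(text[:1000])[:MAX_TOKENS] for url, text in pages.items()}


prompts = preprocess({
    "https://en.wikipedia.org/wiki/Kabul":
        "Kabul is the capital and largest city of Afghanistan, "
        "located in the eastern section of the country.",
})

# Serialize to disk for further use
with open("prompts.json", "w") as f:
    json.dump(prompts, f)
```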

Extracting latent info

  • Each prompt is encoded and passed through GPT-2 to extract activations and (TODO: logit lens)
  • For each MLP (TODO: full name), keep track of which neurons fired at any point during the sequence, and at what activation value
  • (TODO: Extract logit lens)
  • Clustering is done after compressing the latent vectors down to 50 dims with PCA, then further down to 2 dims using t-SNE
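The PCA stage of the compression pipeline can be sketched in plain NumPy via SVD. This is a minimal sketch, assuming each row of `X` is one latent vector; the latent dimensionality (768, GPT-2 small's hidden size) and the sample count are illustrative, and the follow-up t-SNE step (typically something like `sklearn.manifold.TSNE`) is omitted:

```python
import numpy as np


def pca(X: np.ndarray, n_components: int = 50) -> np.ndarray:
    """Project rows of X onto their top principal components."""
    Xc = X - X.mean(axis=0)                 # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T         # scores in component space


rng = np.random.default_rng(0)
latents = rng.normal(size=(200, 768))       # e.g. 200 latent vectors
compressed = pca(latents, n_components=50)  # -> shape (200, 50)
```

The 50-dim intermediate step is a common trick: t-SNE is expensive and noisy in very high dimensions, so PCA first strips most of the noise cheaply.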

Tool interface

Using the explorer

Left pane

  • Search for a city in the dataset in the left pane. Relevant prompts from the dataset will appear for selection.

Middle pane

  • The middle pane will display the logits / activations per layer for the selected prompt.
  • Notice the slider at the bottom of the middle pane; the activations shown correspond to the current underlined token in the selected prompt.
  • The neurons are displayed layer-wise, with activation strength represented by a green hue.
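The green-hue encoding can be sketched as a linear map from activation strength to an RGB color. The white-to-green ramp and the clamped normalization below are assumptions about how the tool renders cells, not its actual implementation:

```python
def activation_to_rgb(a: float, a_max: float) -> tuple[int, int, int]:
    """Map an activation in [0, a_max] onto a white -> green ramp."""
    t = max(0.0, min(1.0, a / a_max))  # normalize and clamp to [0, 1]
    # Fade the red and blue channels out as activation grows
    c = round(255 * (1 - t))
    return (c, 255, c)


print(activation_to_rgb(0.0, 5.0))  # (255, 255, 255): no activation, white
print(activation_to_rgb(5.0, 5.0))  # (0, 255, 0): max activation, pure green
```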

TODO: Yet to figure out:

  • High probability token shown on the left side of the activations?
  • The cell numbers and network structure are different for each prompt?

Right pane

  • If a neuron is selected, the top portion of the right pane displays other relevant prompts that show strong activation on the same neuron. The attention strength (TODO: verify) of each token is similarly represented by a green hue.
  • The bottom portion of the right pane displays (TODO: activation weighted by self attention??) of the tokens in the selected sentence, wrt the selected neuron.

Using the neuron clustering view

The neuron clustering view displays the latent vectors of the dataset condensed to 2 dimensions. When a neuron is selected, the right pane displays other relevant prompts that show strong activation on the same neuron.
