Model Parameters for extractive QA with SQuAD
SpanBERT Parameter Details: attention_probs_dropout_prob: 0.1, directionality: bidi, hidden_act: gelu,
hidden_dropout_prob: 0.1, hidden_size: 1024, initializer_range: 0.02, intermediate_size: 4096,
layer_norm_eps: 1e-12, max_position_embeddings: 512, num_attention_heads: 16, num_hidden_layers: 24,
pad_token_id: 0, pooler_fc_size: 768, pooler_num_attention_heads: 12, pooler_num_fc_layers: 3,
pooler_size_per_head: 128, pooler_type: "first_token_transform", type_vocab_size: 2, vocab_size: 28996
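
These values describe a BERT-large architecture with a cased vocabulary (28,996 tokens); SpanBERT reuses the BERT model class, so the dump above can be rebuilt with a plain BertConfig. A minimal sketch, assuming the Hugging Face transformers library (the directionality and pooler_* entries are metadata carried over from the original TensorFlow checkpoint, not standard BertConfig arguments):

```python
from transformers import BertConfig

# SpanBERT-large config rebuilt from the dump above; SpanBERT shares
# the BERT architecture, so BertConfig covers every core field.
spanbert_config = BertConfig(
    vocab_size=28996,          # cased WordPiece vocabulary
    hidden_size=1024,
    num_hidden_layers=24,
    num_attention_heads=16,
    intermediate_size=4096,
    hidden_act="gelu",
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
    max_position_embeddings=512,
    type_vocab_size=2,
    initializer_range=0.02,
    layer_norm_eps=1e-12,
    pad_token_id=0,
)
```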
ALBERT Parameter Details: attention_probs_dropout_prob: 0, bos_token_id: 2, classifier_dropout_prob: 0.1,
down_scale_factor: 1, embedding_size: 128, eos_token_id: 3, gap_size: 0, hidden_act: "gelu",
hidden_dropout_prob: 0, hidden_size: 4096, initializer_range: 0.02, inner_group_num: 1,
intermediate_size: 16384, layer_norm_eps: 1e-12, max_position_embeddings: 512, net_structure_type: 0,
num_attention_heads: 64, num_hidden_groups: 1, num_hidden_layers: 12, num_memory_blocks: 0,
output_past: true, pad_token_id: 0, type_vocab_size: 2, vocab_size: 30000
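
These are albert-xxlarge settings: a 128-dimensional factorized embedding feeding a 4096-dimensional hidden size, with all twelve layers sharing one parameter group. A minimal sketch, assuming Hugging Face transformers; down_scale_factor, gap_size, net_structure_type, and num_memory_blocks appear to be experimental fields from the original release and are omitted since they keep their defaults:

```python
from transformers import AlbertConfig

# ALBERT-xxlarge config rebuilt from the dump above; note the small
# factorized embedding (128) next to the much wider hidden size (4096).
albert_config = AlbertConfig(
    vocab_size=30000,           # SentencePiece vocabulary
    embedding_size=128,
    hidden_size=4096,
    num_hidden_layers=12,
    num_hidden_groups=1,        # all layers share one parameter group
    num_attention_heads=64,
    intermediate_size=16384,
    inner_group_num=1,
    hidden_act="gelu",
    hidden_dropout_prob=0.0,
    attention_probs_dropout_prob=0.0,
    classifier_dropout_prob=0.1,
    max_position_embeddings=512,
    type_vocab_size=2,
    initializer_range=0.02,
    layer_norm_eps=1e-12,
    pad_token_id=0,
    bos_token_id=2,
    eos_token_id=3,
)
```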
BERT Parameter Details: attention_probs_dropout_prob: 0.1, hidden_act: gelu, hidden_dropout_prob: 0.1,
hidden_size: 1024, initializer_range: 0.02, intermediate_size: 4096, layer_norm_eps: 1e-12,
max_position_embeddings: 512, num_attention_heads: 16, num_hidden_layers: 24,
pad_token_id: 0, type_vocab_size: 2, vocab_size: 30522
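
The vocab_size of 30,522 indicates the uncased WordPiece vocabulary, so these values match the published bert-large-uncased configuration. A minimal sketch, assuming Hugging Face transformers, of loading that config and attaching the span-prediction head used for extractive QA on SQuAD:

```python
from transformers import AutoConfig, AutoModelForQuestionAnswering

# Pull the published config and check it against the dump above.
config = AutoConfig.from_pretrained("bert-large-uncased")
assert config.hidden_size == 1024
assert config.num_hidden_layers == 24
assert config.vocab_size == 30522

# Wraps the encoder with a start/end span-prediction head; the head
# is randomly initialized until fine-tuned on SQuAD.
model = AutoModelForQuestionAnswering.from_pretrained("bert-large-uncased")
```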