Skip to content

Instantly share code, notes, and snippets.

@toranb
Last active October 21, 2023 14:34
Show Gist options
  • Save toranb/74162d2e300d08ccc398bb04848bbf24 to your computer and use it in GitHub Desktop.
Save toranb/74162d2e300d08ccc398bb04848bbf24 to your computer and use it in GitHub Desktop.
Zephyr 7B with bumblebee when PR 264 lands
def serving() do
mistral = {:hf, "HuggingFaceH4/zephyr-7b-alpha"}
{:ok, spec} = Bumblebee.load_spec(mistral, module: Bumblebee.Text.Mistral, architecture: :for_causal_language_modeling)
{:ok, model_info} = Bumblebee.load_model(mistral, spec: spec, backend: {EXLA.Backend, client: :host})
{:ok, tokenizer} = Bumblebee.load_tokenizer(mistral, module: Bumblebee.Text.LlamaTokenizer)
{:ok, generation_config} = Bumblebee.load_generation_config(mistral, spec_module: Bumblebee.Text.Mistral)
generation_config = Bumblebee.configure(generation_config, max_new_tokens: 500)
Bumblebee.Text.generation(model_info, tokenizer, generation_config, defn_options: [compiler: EXLA])
end
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment