@aflah02
Created June 21, 2024 17:15
- 2024-06-21T17:11:38.027+00:00 {"timestamp":"2024-06-21T17:11:38.027318Z","level":"INFO","fields":{"message":"Args {\n model_id: \"/repository\",\n revision: None,\n validation_workers: 2,\n sharded: None,\n num_shard: None,\n quantize: None,\n speculate: None,\n dtype: None,\n trust_remote_code: false,\n max_concurrent_requests: 128,\n max_best_of: 2,\n max_stop_sequences: 4,\n max_top_n_tokens: 5,\n max_input_tokens: None,\n max_input_length: Some(\n 1024,\n ),\n max_total_tokens: Some(\n 1512,\n ),\n waiting_served_ratio: 0.3,\n max_batch_prefill_tokens: Some(\n 2048,\n ),\n max_batch_total_tokens: None,\n max_waiting_tokens: 20,\n max_batch_size: None,\n cuda_graphs: None,\n hostname: \"r-tspeicher-llama-3-8b-bnb-4bit-non-ft-linjyfhb-4da22-ai0d4\",\n port: 80,\n shard_uds_path: \"/tmp/text-generation-server\",\n master_addr: \"localhost\",\n master_port: 29500,\n huggingface_hub_cache: Some(\n \"/repository/cache\",\n ),\n weights_cache_override: None,\n disable_custom_kernels: false,\n cuda_memory_fraction: 1.0,\n rope_scaling: None,\n rope_factor: None,\n json_output: true,\n otlp_endpoint: None,\n cors_allow_origin: [],\n watermark_gamma: None,\n watermark_delta: None,\n ngrok: false,\n ngrok_authtoken: None,\n ngrok_edge: None,\n tokenizer_config_path: None,\n disable_grammar_support: false,\n env: false,\n max_client_batch_size: 4,\n}"},"target":"text_generation_launcher"}
- 2024-06-21T17:11:38.027+00:00 {"timestamp":"2024-06-21T17:11:38.027410Z","level":"INFO","fields":{"message":"Using default cuda graphs [1, 2, 4, 8, 16, 32]"},"target":"text_generation_launcher"}
- 2024-06-21T17:11:38.027+00:00 {"timestamp":"2024-06-21T17:11:38.027509Z","level":"INFO","fields":{"message":"Starting download process."},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
- 2024-06-21T17:11:41.755+00:00 {"timestamp":"2024-06-21T17:11:41.755381Z","level":"INFO","fields":{"message":"Files are already present on the host. Skipping download.\n"},"target":"text_generation_launcher"}
- 2024-06-21T17:11:42.732+00:00 {"timestamp":"2024-06-21T17:11:42.731891Z","level":"INFO","fields":{"message":"Successfully downloaded weights."},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
- 2024-06-21T17:11:42.732+00:00 {"timestamp":"2024-06-21T17:11:42.732110Z","level":"INFO","fields":{"message":"Starting shard"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
- 2024-06-21T17:11:46.401+00:00 {"timestamp":"2024-06-21T17:11:46.401457Z","level":"WARN","fields":{"message":"Unable to use Flash Attention V2: GPU with CUDA capability 7 5 is not supported for Flash Attention V2\n"},"target":"text_generation_launcher"}
- 2024-06-21T17:11:46.891+00:00 {"timestamp":"2024-06-21T17:11:46.891342Z","level":"INFO","fields":{"message":"Unknown quantization method bitsandbytes\n"},"target":"text_generation_launcher"}
- 2024-06-21T17:11:48.561+00:00 {"timestamp":"2024-06-21T17:11:48.561710Z","level":"ERROR","fields":{"message":"Error when initializing model\nTraceback (most recent call last):\n File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n sys.exit(app())\n File \"/opt/conda/lib/python3.10/site-packages/typer/main.py\", line 311, in __call__\n return get_command(self)(*args, **kwargs)\n File \"/opt/conda/lib/python3.10/site-packages/click/core.py\", line 1157, in __call__\n return self.main(*args, **kwargs)\n File \"/opt/conda/lib/python3.10/site-packages/typer/core.py\", line 778, in main\n return _main(\n File \"/opt/conda/lib/python3.10/site-packages/typer/core.py\", line 216, in _main\n rv = self.invoke(ctx)\n File \"/opt/conda/lib/python3.10/site-packages/click/core.py\", line 1688, in invoke\n return _process_result(sub_ctx.command.invoke(sub_ctx))\n File \"/opt/conda/lib/python3.10/site-packages/click/core.py\", line 1434, in invoke\n return ctx.invoke(self.callback, **ctx.params)\n File \"/opt/conda/lib/python3.10/site-packages/click/core.py\", line 783, in invoke\n return __callback(*args, **kwargs)\n File \"/opt/conda/lib/python3.10/site-packages/typer/main.py\", line 683, in wrapper\n return callback(**use_params) # type: ignore\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py\", line 90, in serve\n server.serve(\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py\", line 253, in serve\n asyncio.run(\n File \"/opt/conda/lib/python3.10/asyncio/runners.py\", line 44, in run\n return loop.run_until_complete(main)\n File \"/opt/conda/lib/python3.10/asyncio/base_events.py\", line 636, in run_until_complete\n self.run_forever()\n File \"/opt/conda/lib/python3.10/asyncio/base_events.py\", line 603, in run_forever\n self._run_once()\n File \"/opt/conda/lib/python3.10/asyncio/base_events.py\", line 1909, in _run_once\n handle._run()\n File \"/opt/conda/lib/python3.10/asyncio/events.py\", line 80, in _run\n self._context.run(self._callback, *self._args)\n> File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py\", line 217, in serve_inner\n model = get_model(\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py\", line 333, in get_model\n return FlashLlama(\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_llama.py\", line 84, in __init__\n model = FlashLlamaForCausalLM(prefix, config, weights)\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 385, in __init__\n self.model = FlashLlamaModel(prefix, config, weights)\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 309, in __init__\n [\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 310, in <listcomp>\n FlashLlamaLayer(\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 249, in __init__\n self.self_attn = FlashLlamaAttention(\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 126, in __init__\n self.query_key_value = load_attention(config, prefix, weights)\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 43, in 
load_attention\n return _load_gqa(config, prefix, weights)\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 85, in _load_gqa\n assert list(weight.shape) == [\nAssertionError: [12582912, 1] != [6144, 4096]\n"},"target":"text_generation_launcher"}
- 2024-06-21T17:11:49.551+00:00 {"timestamp":"2024-06-21T17:11:49.550928Z","level":"ERROR","fields":{"message":"Shard complete standard error output:\n\nThe tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. \nThe tokenizer class you load from this checkpoint is 'PreTrainedTokenizerFast'. \nThe class this function is called from is 'LlamaTokenizer'.\nYou are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565\nSpecial tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:658: UserWarning: You are using a Backend <class 'text_generation_server.utils.dist.FakeGroup'> as a ProcessGroup. This usage is deprecated since PyTorch 2.0. Please use a public API of PyTorch Distributed instead.\n warnings.warn(\nTraceback (most recent call last):\n\n File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n sys.exit(app())\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py\", line 90, in serve\n server.serve(\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py\", line 253, in serve\n asyncio.run(\n\n File \"/opt/conda/lib/python3.10/asyncio/runners.py\", line 44, in run\n return loop.run_until_complete(main)\n\n File \"/opt/conda/lib/python3.10/asyncio/base_events.py\", line 649, in run_until_complete\n return future.result()\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py\", line 217, in serve_inner\n model = get_model(\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py\", line 333, in get_model\n return FlashLlama(\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_llama.py\", line 84, in __init__\n model = FlashLlamaForCausalLM(prefix, config, weights)\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 385, in __init__\n self.model = FlashLlamaModel(prefix, config, weights)\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 309, in __init__\n [\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 310, in <listcomp>\n FlashLlamaLayer(\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 249, in __init__\n self.self_attn = FlashLlamaAttention(\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 126, in __init__\n self.query_key_value = load_attention(config, prefix, weights)\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 43, in load_attention\n return _load_gqa(config, prefix, weights)\n\n File 
\"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 85, in _load_gqa\n assert list(weight.shape) == [\n\nAssertionError: [12582912, 1] != [6144, 4096]\n"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
- 2024-06-21T17:11:49.650+00:00 {"timestamp":"2024-06-21T17:11:49.650194Z","level":"ERROR","fields":{"message":"Shard 0 failed to start"},"target":"text_generation_launcher"}
- 2024-06-21T17:11:49.650+00:00 {"timestamp":"2024-06-21T17:11:49.650241Z","level":"INFO","fields":{"message":"Shutting down shards"},"target":"text_generation_launcher"}
- 2024-06-21T17:11:49.650+00:00 Error: ShardCannotStart
- 2024-06-21T17:11:50.034+00:00 {"timestamp":"2024-06-21T17:11:50.034613Z","level":"INFO","fields":{"message":"Args {\n model_id: \"/repository\",\n revision: None,\n validation_workers: 2,\n sharded: None,\n num_shard: None,\n quantize: None,\n speculate: None,\n dtype: None,\n trust_remote_code: false,\n max_concurrent_requests: 128,\n max_best_of: 2,\n max_stop_sequences: 4,\n max_top_n_tokens: 5,\n max_input_tokens: None,\n max_input_length: Some(\n 1024,\n ),\n max_total_tokens: Some(\n 1512,\n ),\n waiting_served_ratio: 0.3,\n max_batch_prefill_tokens: Some(\n 2048,\n ),\n max_batch_total_tokens: None,\n max_waiting_tokens: 20,\n max_batch_size: None,\n cuda_graphs: None,\n hostname: \"r-tspeicher-llama-3-8b-bnb-4bit-non-ft-linjyfhb-4da22-ai0d4\",\n port: 80,\n shard_uds_path: \"/tmp/text-generation-server\",\n master_addr: \"localhost\",\n master_port: 29500,\n huggingface_hub_cache: Some(\n \"/repository/cache\",\n ),\n weights_cache_override: None,\n disable_custom_kernels: false,\n cuda_memory_fraction: 1.0,\n rope_scaling: None,\n rope_factor: None,\n json_output: true,\n otlp_endpoint: None,\n cors_allow_origin: [],\n watermark_gamma: None,\n watermark_delta: None,\n ngrok: false,\n ngrok_authtoken: None,\n ngrok_edge: None,\n tokenizer_config_path: None,\n disable_grammar_support: false,\n env: false,\n max_client_batch_size: 4,\n}"},"target":"text_generation_launcher"}
- 2024-06-21T17:11:50.034+00:00 {"timestamp":"2024-06-21T17:11:50.034688Z","level":"INFO","fields":{"message":"Using default cuda graphs [1, 2, 4, 8, 16, 32]"},"target":"text_generation_launcher"}
- 2024-06-21T17:11:50.034+00:00 {"timestamp":"2024-06-21T17:11:50.034781Z","level":"INFO","fields":{"message":"Starting download process."},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
- 2024-06-21T17:11:53.588+00:00 {"timestamp":"2024-06-21T17:11:53.588334Z","level":"INFO","fields":{"message":"Files are already present on the host. Skipping download.\n"},"target":"text_generation_launcher"}
- 2024-06-21T17:11:54.348+00:00 {"timestamp":"2024-06-21T17:11:54.348617Z","level":"INFO","fields":{"message":"Successfully downloaded weights."},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
- 2024-06-21T17:11:54.348+00:00 {"timestamp":"2024-06-21T17:11:54.348866Z","level":"INFO","fields":{"message":"Starting shard"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
- 2024-06-21T17:11:57.921+00:00 {"timestamp":"2024-06-21T17:11:57.921570Z","level":"WARN","fields":{"message":"Unable to use Flash Attention V2: GPU with CUDA capability 7 5 is not supported for Flash Attention V2\n"},"target":"text_generation_launcher"}
- 2024-06-21T17:11:58.398+00:00 {"timestamp":"2024-06-21T17:11:58.398216Z","level":"INFO","fields":{"message":"Unknown quantization method bitsandbytes\n"},"target":"text_generation_launcher"}
- 2024-06-21T17:12:00.073+00:00 {"timestamp":"2024-06-21T17:12:00.072919Z","level":"ERROR","fields":{"message":"Error when initializing model\nTraceback (most recent call last):\n File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n sys.exit(app())\n File \"/opt/conda/lib/python3.10/site-packages/typer/main.py\", line 311, in __call__\n return get_command(self)(*args, **kwargs)\n File \"/opt/conda/lib/python3.10/site-packages/click/core.py\", line 1157, in __call__\n return self.main(*args, **kwargs)\n File \"/opt/conda/lib/python3.10/site-packages/typer/core.py\", line 778, in main\n return _main(\n File \"/opt/conda/lib/python3.10/site-packages/typer/core.py\", line 216, in _main\n rv = self.invoke(ctx)\n File \"/opt/conda/lib/python3.10/site-packages/click/core.py\", line 1688, in invoke\n return _process_result(sub_ctx.command.invoke(sub_ctx))\n File \"/opt/conda/lib/python3.10/site-packages/click/core.py\", line 1434, in invoke\n return ctx.invoke(self.callback, **ctx.params)\n File \"/opt/conda/lib/python3.10/site-packages/click/core.py\", line 783, in invoke\n return __callback(*args, **kwargs)\n File \"/opt/conda/lib/python3.10/site-packages/typer/main.py\", line 683, in wrapper\n return callback(**use_params) # type: ignore\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py\", line 90, in serve\n server.serve(\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py\", line 253, in serve\n asyncio.run(\n File \"/opt/conda/lib/python3.10/asyncio/runners.py\", line 44, in run\n return loop.run_until_complete(main)\n File \"/opt/conda/lib/python3.10/asyncio/base_events.py\", line 636, in run_until_complete\n self.run_forever()\n File \"/opt/conda/lib/python3.10/asyncio/base_events.py\", line 603, in run_forever\n self._run_once()\n File \"/opt/conda/lib/python3.10/asyncio/base_events.py\", line 1909, in _run_once\n handle._run()\n File \"/opt/conda/lib/python3.10/asyncio/events.py\", line 80, in _run\n self._context.run(self._callback, *self._args)\n> File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py\", line 217, in serve_inner\n model = get_model(\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py\", line 333, in get_model\n return FlashLlama(\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_llama.py\", line 84, in __init__\n model = FlashLlamaForCausalLM(prefix, config, weights)\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 385, in __init__\n self.model = FlashLlamaModel(prefix, config, weights)\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 309, in __init__\n [\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 310, in <listcomp>\n FlashLlamaLayer(\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 249, in __init__\n self.self_attn = FlashLlamaAttention(\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 126, in __init__\n self.query_key_value = load_attention(config, prefix, weights)\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 43, in 
load_attention\n return _load_gqa(config, prefix, weights)\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 85, in _load_gqa\n assert list(weight.shape) == [\nAssertionError: [12582912, 1] != [6144, 4096]\n"},"target":"text_generation_launcher"}
- 2024-06-21T17:12:01.057+00:00 {"timestamp":"2024-06-21T17:12:01.057032Z","level":"ERROR","fields":{"message":"Shard complete standard error output:\n\nThe tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. \nThe tokenizer class you load from this checkpoint is 'PreTrainedTokenizerFast'. \nThe class this function is called from is 'LlamaTokenizer'.\nYou are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565\nSpecial tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:658: UserWarning: You are using a Backend <class 'text_generation_server.utils.dist.FakeGroup'> as a ProcessGroup. This usage is deprecated since PyTorch 2.0. Please use a public API of PyTorch Distributed instead.\n warnings.warn(\nTraceback (most recent call last):\n\n File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n sys.exit(app())\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py\", line 90, in serve\n server.serve(\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py\", line 253, in serve\n asyncio.run(\n\n File \"/opt/conda/lib/python3.10/asyncio/runners.py\", line 44, in run\n return loop.run_until_complete(main)\n\n File \"/opt/conda/lib/python3.10/asyncio/base_events.py\", line 649, in run_until_complete\n return future.result()\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py\", line 217, in serve_inner\n model = get_model(\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py\", line 333, in get_model\n return FlashLlama(\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_llama.py\", line 84, in __init__\n model = FlashLlamaForCausalLM(prefix, config, weights)\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 385, in __init__\n self.model = FlashLlamaModel(prefix, config, weights)\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 309, in __init__\n [\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 310, in <listcomp>\n FlashLlamaLayer(\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 249, in __init__\n self.self_attn = FlashLlamaAttention(\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 126, in __init__\n self.query_key_value = load_attention(config, prefix, weights)\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 43, in load_attention\n return _load_gqa(config, prefix, weights)\n\n File 
\"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 85, in _load_gqa\n assert list(weight.shape) == [\n\nAssertionError: [12582912, 1] != [6144, 4096]\n"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
- 2024-06-21T17:12:01.156+00:00 {"timestamp":"2024-06-21T17:12:01.156245Z","level":"ERROR","fields":{"message":"Shard 0 failed to start"},"target":"text_generation_launcher"}
- 2024-06-21T17:12:01.156+00:00 {"timestamp":"2024-06-21T17:12:01.156284Z","level":"INFO","fields":{"message":"Shutting down shards"},"target":"text_generation_launcher"}
- 2024-06-21T17:12:01.156+00:00 Error: ShardCannotStart
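Note for anyone hitting the same failure: the logs suggest the repository holds a checkpoint that was already quantized with bitsandbytes 4-bit. The launcher was started with `quantize: None`, TGI reports "Unknown quantization method bitsandbytes" for the quantization config it finds, and `_load_gqa` then sees a packed tensor of shape `[12582912, 1]` where it expects the fp16/bf16 fused QKV weight `[6144, 4096]`. Below is a minimal sketch of the shape arithmetic, assuming the published Llama-3-8B hyperparameters (hidden size 4096, 32 query heads, 8 KV heads); none of these values are read from this repository.

```python
# Sketch: check whether the shapes in the AssertionError are consistent with a
# bitsandbytes 4-bit packing (two weights per uint8 byte) of the fused QKV
# projection of Llama-3-8B. Assumed hyperparameters, not read from the repo:
hidden_size = 4096          # Llama-3-8B hidden size
num_attention_heads = 32    # query heads
num_key_value_heads = 8     # GQA key/value heads
head_dim = hidden_size // num_attention_heads  # 128

# Rows of the fused query_key_value projection TGI expects on a single shard.
qkv_rows = (num_attention_heads + 2 * num_key_value_heads) * head_dim  # 6144
expected_shape = [qkv_rows, hidden_size]   # [6144, 4096]
expected_elems = qkv_rows * hidden_size    # 25_165_824

# bitsandbytes 4-bit stores two weights per byte, flattened to a column vector.
packed_shape = [expected_elems // 2, 1]    # [12_582_912, 1]

print(expected_shape, packed_shape)
# [6144, 4096] [12582912, 1]  -> matches the assertion message in the log.
```

If that reading is right, a likely workaround is to deploy the unquantized base checkpoint and let TGI quantize at load time (for example with `--quantize bitsandbytes-nf4`), since this TGI build does not load pre-packed bnb-4bit weights directly.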