@AmosLewis
Created December 16, 2023 00:20
(shark.venv) ➜ SHARK git:(main) ✗ python nan/qwen_compile.py
shark_tank local cache is located at /home/chi/.local/shark_tank/ . You may change this by setting the --local_tank_cache= flag
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 174/174 [00:00<00:00, 641kB/s]
tokenization_qwen.py: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.62k/9.62k [00:00<00:00, 36.3MB/s]
A new version of the following files was downloaded from https://huggingface.co/Qwen/Qwen-7B-Chat:
- tokenization_qwen.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
qwen.tiktoken: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.56M/2.56M [00:00<00:00, 13.8MB/s]
[DEBUG] generating mlir on device
/nodclouddata/chi/src/SHARK/nan/qwen_compile.py:104: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
input_ids = torch.tensor(input_ids)
[DEBUG] generating torchscript graph
config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 911/911 [00:00<00:00, 4.69MB/s]
configuration_qwen.py: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.35k/2.35k [00:00<00:00, 12.8MB/s]
A new version of the following files was downloaded from https://huggingface.co/Qwen/Qwen-7B-Chat:
- configuration_qwen.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
modeling_qwen.py: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 55.6k/55.6k [00:00<00:00, 125MB/s]
Traceback (most recent call last):
  File "/nodclouddata/chi/src/SHARK/nan/qwen_compile.py", line 111, in <module>
    model = Qwen(precision)
            ^^^^^^^^^^^^^^^
  File "/nodclouddata/chi/src/SHARK/nan/qwen_compile.py", line 55, in __init__
    self.model = AutoModelForCausalLM.from_pretrained(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 550, in from_pretrained
    model_class = get_class_from_dynamic_module(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/transformers/dynamic_module_utils.py", line 485, in get_class_from_dynamic_module
    final_module = get_cached_module_file(
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/transformers/dynamic_module_utils.py", line 313, in get_cached_module_file
    modules_needed = check_imports(resolved_module_file)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/transformers/dynamic_module_utils.py", line 179, in check_imports
    raise ImportError(
ImportError: This modeling file requires the following packages that were not found in your environment: transformers_stream_generator. Run `pip install transformers_stream_generator`
(shark.venv) ➜ SHARK git:(main) ✗ pip install transformers_stream_generator
Collecting transformers_stream_generator
Downloading transformers-stream-generator-0.0.4.tar.gz (12 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: transformers>=4.26.1 in ./shark.venv/lib/python3.11/site-packages (from transformers_stream_generator) (4.33.3)
Requirement already satisfied: filelock in ./shark.venv/lib/python3.11/site-packages (from transformers>=4.26.1->transformers_stream_generator) (3.13.1)
Requirement already satisfied: huggingface-hub<1.0,>=0.15.1 in ./shark.venv/lib/python3.11/site-packages (from transformers>=4.26.1->transformers_stream_generator) (0.19.4)
Requirement already satisfied: numpy>=1.17 in ./shark.venv/lib/python3.11/site-packages (from transformers>=4.26.1->transformers_stream_generator) (1.26.2)
Requirement already satisfied: packaging>=20.0 in ./shark.venv/lib/python3.11/site-packages (from transformers>=4.26.1->transformers_stream_generator) (23.2)
Requirement already satisfied: pyyaml>=5.1 in ./shark.venv/lib/python3.11/site-packages (from transformers>=4.26.1->transformers_stream_generator) (6.0.1)
Requirement already satisfied: regex!=2019.12.17 in ./shark.venv/lib/python3.11/site-packages (from transformers>=4.26.1->transformers_stream_generator) (2023.10.3)
Requirement already satisfied: requests in ./shark.venv/lib/python3.11/site-packages (from transformers>=4.26.1->transformers_stream_generator) (2.31.0)
Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in ./shark.venv/lib/python3.11/site-packages (from transformers>=4.26.1->transformers_stream_generator) (0.13.3)
Requirement already satisfied: safetensors>=0.3.1 in ./shark.venv/lib/python3.11/site-packages (from transformers>=4.26.1->transformers_stream_generator) (0.3.1)
Requirement already satisfied: tqdm>=4.27 in ./shark.venv/lib/python3.11/site-packages (from transformers>=4.26.1->transformers_stream_generator) (4.66.1)
Requirement already satisfied: fsspec>=2023.5.0 in ./shark.venv/lib/python3.11/site-packages (from huggingface-hub<1.0,>=0.15.1->transformers>=4.26.1->transformers_stream_generator) (2023.12.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in ./shark.venv/lib/python3.11/site-packages (from huggingface-hub<1.0,>=0.15.1->transformers>=4.26.1->transformers_stream_generator) (4.9.0rc1)
Requirement already satisfied: charset-normalizer<4,>=2 in ./shark.venv/lib/python3.11/site-packages (from requests->transformers>=4.26.1->transformers_stream_generator) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in ./shark.venv/lib/python3.11/site-packages (from requests->transformers>=4.26.1->transformers_stream_generator) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in ./shark.venv/lib/python3.11/site-packages (from requests->transformers>=4.26.1->transformers_stream_generator) (2.1.0)
Requirement already satisfied: certifi>=2017.4.17 in ./shark.venv/lib/python3.11/site-packages (from requests->transformers>=4.26.1->transformers_stream_generator) (2023.11.17)
Building wheels for collected packages: transformers_stream_generator
Building wheel for transformers_stream_generator (setup.py) ... done
Created wheel for transformers_stream_generator: filename=transformers_stream_generator-0.0.4-py3-none-any.whl size=12316 sha256=37dd3dd9180c5b57feb06c2b710cbc7bf3590aa826304543fdca2f603018b3c9
Stored in directory: /home/chi/.cache/pip/wheels/5a/c5/92/43af1fc20acdb3cbc708992aed015ae2992fd922898c4bef31
Successfully built transformers_stream_generator
Installing collected packages: transformers_stream_generator
Successfully installed transformers_stream_generator-0.0.4
(shark.venv) ➜ SHARK git:(main) ✗ python nan/qwen_compile.py
shark_tank local cache is located at /home/chi/.local/shark_tank/ . You may change this by setting the --local_tank_cache= flag
[DEBUG] generating mlir on device
/nodclouddata/chi/src/SHARK/nan/qwen_compile.py:104: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
input_ids = torch.tensor(input_ids)
[DEBUG] generating torchscript graph
/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
cpp_kernels.py: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.92k/1.92k [00:00<00:00, 7.60MB/s]
A new version of the following files was downloaded from https://huggingface.co/Qwen/Qwen-7B-Chat:
- cpp_kernels.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
qwen_generation_utils.py: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14.6k/14.6k [00:00<00:00, 44.9MB/s]
A new version of the following files was downloaded from https://huggingface.co/Qwen/Qwen-7B-Chat:
- qwen_generation_utils.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/Qwen/Qwen-7B-Chat:
- cpp_kernels.py
- qwen_generation_utils.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
model.safetensors.index.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 19.5k/19.5k [00:00<00:00, 58.4MB/s]
model-00001-of-00008.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 1.96G/1.96G [00:14<00:00, 139MB/s]
model-00002-of-00008.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 2.02G/2.02G [00:13<00:00, 151MB/s]
model-00003-of-00008.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 2.02G/2.02G [00:13<00:00, 151MB/s]
model-00004-of-00008.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 2.02G/2.02G [00:15<00:00, 134MB/s]
model-00005-of-00008.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 2.02G/2.02G [00:13<00:00, 151MB/s]
model-00006-of-00008.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 2.02G/2.02G [00:13<00:00, 153MB/s]
model-00007-of-00008.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 2.02G/2.02G [00:15<00:00, 133MB/s]
model-00008-of-00008.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 1.33G/1.33G [00:08<00:00, 152MB/s]
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [01:47<00:00, 13.46s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:31<00:00, 3.93s/it]
generation_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 273/273 [00:00<00:00, 1.36MB/s]
Model before quantization: QWenLMHeadModel(
(transformer): QWenModel(
(wte): Embedding(151936, 4096)
(drop): Dropout(p=0.0, inplace=False)
(rotary_emb): RotaryEmbedding()
(h): ModuleList(
(0-31): 32 x QWenBlock(
(ln_1): RMSNorm()
(attn): QWenAttention(
(c_attn): Linear(in_features=4096, out_features=12288, bias=True)
(c_proj): Linear(in_features=4096, out_features=4096, bias=False)
(attn_dropout): Dropout(p=0.0, inplace=False)
)
(ln_2): RMSNorm()
(mlp): QWenMLP(
(w1): Linear(in_features=4096, out_features=11008, bias=False)
(w2): Linear(in_features=4096, out_features=11008, bias=False)
(c_proj): Linear(in_features=11008, out_features=4096, bias=False)
)
)
)
(ln_f): RMSNorm()
)
(lm_head): Linear(in_features=4096, out_features=151936, bias=False)
)
/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/brevitas/backport/fx/immutable_collections.py:114: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_register_pytree_node(immutable_dict, _immutable_dict_flatten, _immutable_dict_unflatten)
/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/brevitas/backport/fx/immutable_collections.py:115: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_register_pytree_node(immutable_list, _immutable_list_flatten, _immutable_list_unflatten)
Applying weight quantization..
Weight quantization applied.
Model after quantization: QWenLMHeadModel(
(transformer): QWenModel(
(wte): Embedding(151936, 4096)
(drop): Dropout(p=0.0, inplace=False)
(rotary_emb): RotaryEmbedding()
(h): ModuleList(
(0-31): 32 x QWenBlock(
(ln_1): RMSNorm()
(attn): QWenAttention(
(c_attn): QuantLinear(
in_features=4096, out_features=12288, bias=True
(input_quant): ActQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
)
(output_quant): ActQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
)
(weight_quant): WeightQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
(tensor_quant): RescalingIntQuant(
(int_quant): IntQuant(
(float_to_int_impl): RoundSte()
(tensor_clamp_impl): TensorClampSte()
(delay_wrapper): DelayWrapper(
(delay_impl): _NoDelay()
)
)
(scaling_impl): ExpandReshapeScalingWrapper(
(wrapped_scaling_impl): ParameterFromStatsFromParameterScaling(
(parameter_list_stats): _ParameterListStats(
(first_tracked_param): _ViewParameterWrapper(
(view_shape_impl): OverSubChannelBlockView(
(permute_impl): Identity()
)
)
(stats): _Stats(
(stats_impl): AbsMinMax(
(zero): StatelessBuffer()
)
)
)
(stats_scaling_impl): _StatsScaling(
(affine_rescaling): Identity()
(restrict_clamp_scaling): _RestrictClampValue(
(clamp_min_ste): ScalarClampMinSte()
(restrict_value_impl): FloatRestrictValue()
)
(restrict_scaling_pre): Identity()
)
(restrict_inplace_preprocess): Identity()
)
(slice_tensor): SliceTensor()
)
(int_scaling_impl): IntScaling()
(zero_point_impl): ExpandReshapeZeroPointWrapper(
(wrapped_zero_point_impl): ParameterFromStatsFromParameterZeroPoint(
(parameter_list_stats): _ParameterListStats(
(first_tracked_param): _ViewParameterWrapper(
(view_shape_impl): OverSubChannelBlockView(
(permute_impl): Identity()
)
)
(stats): _Stats(
(stats_impl): NegativeMinOrZero(
(zero): StatelessBuffer()
)
)
)
(scale_shift_zero_point): _ScaleShiftZeroPoint(
(int_quant): IntQuant(
(float_to_int_impl): RoundSte()
(tensor_clamp_impl): TensorClampSte()
(delay_wrapper): DelayWrapper(
(delay_impl): _NoDelay()
)
)
)
)
(slice_tensor): SliceTensor()
)
(msb_clamp_bit_width_impl): BitWidthConst(
(bit_width): StatelessBuffer()
)
)
)
(bias_quant): BiasQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
)
)
(c_proj): QuantLinear(
in_features=4096, out_features=4096, bias=False
(input_quant): ActQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
)
(output_quant): ActQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
)
(weight_quant): WeightQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
(tensor_quant): RescalingIntQuant(
(int_quant): IntQuant(
(float_to_int_impl): RoundSte()
(tensor_clamp_impl): TensorClampSte()
(delay_wrapper): DelayWrapper(
(delay_impl): _NoDelay()
)
)
(scaling_impl): ExpandReshapeScalingWrapper(
(wrapped_scaling_impl): ParameterFromStatsFromParameterScaling(
(parameter_list_stats): _ParameterListStats(
(first_tracked_param): _ViewParameterWrapper(
(view_shape_impl): OverSubChannelBlockView(
(permute_impl): Identity()
)
)
(stats): _Stats(
(stats_impl): AbsMinMax(
(zero): StatelessBuffer()
)
)
)
(stats_scaling_impl): _StatsScaling(
(affine_rescaling): Identity()
(restrict_clamp_scaling): _RestrictClampValue(
(clamp_min_ste): ScalarClampMinSte()
(restrict_value_impl): FloatRestrictValue()
)
(restrict_scaling_pre): Identity()
)
(restrict_inplace_preprocess): Identity()
)
(slice_tensor): SliceTensor()
)
(int_scaling_impl): IntScaling()
(zero_point_impl): ExpandReshapeZeroPointWrapper(
(wrapped_zero_point_impl): ParameterFromStatsFromParameterZeroPoint(
(parameter_list_stats): _ParameterListStats(
(first_tracked_param): _ViewParameterWrapper(
(view_shape_impl): OverSubChannelBlockView(
(permute_impl): Identity()
)
)
(stats): _Stats(
(stats_impl): NegativeMinOrZero(
(zero): StatelessBuffer()
)
)
)
(scale_shift_zero_point): _ScaleShiftZeroPoint(
(int_quant): IntQuant(
(float_to_int_impl): RoundSte()
(tensor_clamp_impl): TensorClampSte()
(delay_wrapper): DelayWrapper(
(delay_impl): _NoDelay()
)
)
)
)
(slice_tensor): SliceTensor()
)
(msb_clamp_bit_width_impl): BitWidthConst(
(bit_width): StatelessBuffer()
)
)
)
(bias_quant): BiasQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
)
)
(attn_dropout): Dropout(p=0.0, inplace=False)
)
(ln_2): RMSNorm()
(mlp): QWenMLP(
(w1): QuantLinear(
in_features=4096, out_features=11008, bias=False
(input_quant): ActQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
)
(output_quant): ActQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
)
(weight_quant): WeightQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
(tensor_quant): RescalingIntQuant(
(int_quant): IntQuant(
(float_to_int_impl): RoundSte()
(tensor_clamp_impl): TensorClampSte()
(delay_wrapper): DelayWrapper(
(delay_impl): _NoDelay()
)
)
(scaling_impl): ExpandReshapeScalingWrapper(
(wrapped_scaling_impl): ParameterFromStatsFromParameterScaling(
(parameter_list_stats): _ParameterListStats(
(first_tracked_param): _ViewParameterWrapper(
(view_shape_impl): OverSubChannelBlockView(
(permute_impl): Identity()
)
)
(stats): _Stats(
(stats_impl): AbsMinMax(
(zero): StatelessBuffer()
)
)
)
(stats_scaling_impl): _StatsScaling(
(affine_rescaling): Identity()
(restrict_clamp_scaling): _RestrictClampValue(
(clamp_min_ste): ScalarClampMinSte()
(restrict_value_impl): FloatRestrictValue()
)
(restrict_scaling_pre): Identity()
)
(restrict_inplace_preprocess): Identity()
)
(slice_tensor): SliceTensor()
)
(int_scaling_impl): IntScaling()
(zero_point_impl): ExpandReshapeZeroPointWrapper(
(wrapped_zero_point_impl): ParameterFromStatsFromParameterZeroPoint(
(parameter_list_stats): _ParameterListStats(
(first_tracked_param): _ViewParameterWrapper(
(view_shape_impl): OverSubChannelBlockView(
(permute_impl): Identity()
)
)
(stats): _Stats(
(stats_impl): NegativeMinOrZero(
(zero): StatelessBuffer()
)
)
)
(scale_shift_zero_point): _ScaleShiftZeroPoint(
(int_quant): IntQuant(
(float_to_int_impl): RoundSte()
(tensor_clamp_impl): TensorClampSte()
(delay_wrapper): DelayWrapper(
(delay_impl): _NoDelay()
)
)
)
)
(slice_tensor): SliceTensor()
)
(msb_clamp_bit_width_impl): BitWidthConst(
(bit_width): StatelessBuffer()
)
)
)
(bias_quant): BiasQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
)
)
(w2): QuantLinear(
in_features=4096, out_features=11008, bias=False
(input_quant): ActQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
)
(output_quant): ActQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
)
(weight_quant): WeightQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
(tensor_quant): RescalingIntQuant(
(int_quant): IntQuant(
(float_to_int_impl): RoundSte()
(tensor_clamp_impl): TensorClampSte()
(delay_wrapper): DelayWrapper(
(delay_impl): _NoDelay()
)
)
(scaling_impl): ExpandReshapeScalingWrapper(
(wrapped_scaling_impl): ParameterFromStatsFromParameterScaling(
(parameter_list_stats): _ParameterListStats(
(first_tracked_param): _ViewParameterWrapper(
(view_shape_impl): OverSubChannelBlockView(
(permute_impl): Identity()
)
)
(stats): _Stats(
(stats_impl): AbsMinMax(
(zero): StatelessBuffer()
)
)
)
(stats_scaling_impl): _StatsScaling(
(affine_rescaling): Identity()
(restrict_clamp_scaling): _RestrictClampValue(
(clamp_min_ste): ScalarClampMinSte()
(restrict_value_impl): FloatRestrictValue()
)
(restrict_scaling_pre): Identity()
)
(restrict_inplace_preprocess): Identity()
)
(slice_tensor): SliceTensor()
)
(int_scaling_impl): IntScaling()
(zero_point_impl): ExpandReshapeZeroPointWrapper(
(wrapped_zero_point_impl): ParameterFromStatsFromParameterZeroPoint(
(parameter_list_stats): _ParameterListStats(
(first_tracked_param): _ViewParameterWrapper(
(view_shape_impl): OverSubChannelBlockView(
(permute_impl): Identity()
)
)
(stats): _Stats(
(stats_impl): NegativeMinOrZero(
(zero): StatelessBuffer()
)
)
)
(scale_shift_zero_point): _ScaleShiftZeroPoint(
(int_quant): IntQuant(
(float_to_int_impl): RoundSte()
(tensor_clamp_impl): TensorClampSte()
(delay_wrapper): DelayWrapper(
(delay_impl): _NoDelay()
)
)
)
)
(slice_tensor): SliceTensor()
)
(msb_clamp_bit_width_impl): BitWidthConst(
(bit_width): StatelessBuffer()
)
)
)
(bias_quant): BiasQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
)
)
(c_proj): QuantLinear(
in_features=11008, out_features=4096, bias=False
(input_quant): ActQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
)
(output_quant): ActQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
)
(weight_quant): WeightQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
(tensor_quant): RescalingIntQuant(
(int_quant): IntQuant(
(float_to_int_impl): RoundSte()
(tensor_clamp_impl): TensorClampSte()
(delay_wrapper): DelayWrapper(
(delay_impl): _NoDelay()
)
)
(scaling_impl): ExpandReshapeScalingWrapper(
(wrapped_scaling_impl): ParameterFromStatsFromParameterScaling(
(parameter_list_stats): _ParameterListStats(
(first_tracked_param): _ViewParameterWrapper(
(view_shape_impl): OverSubChannelBlockView(
(permute_impl): Identity()
)
)
(stats): _Stats(
(stats_impl): AbsMinMax(
(zero): StatelessBuffer()
)
)
)
(stats_scaling_impl): _StatsScaling(
(affine_rescaling): Identity()
(restrict_clamp_scaling): _RestrictClampValue(
(clamp_min_ste): ScalarClampMinSte()
(restrict_value_impl): FloatRestrictValue()
)
(restrict_scaling_pre): Identity()
)
(restrict_inplace_preprocess): Identity()
)
(slice_tensor): SliceTensor()
)
(int_scaling_impl): IntScaling()
(zero_point_impl): ExpandReshapeZeroPointWrapper(
(wrapped_zero_point_impl): ParameterFromStatsFromParameterZeroPoint(
(parameter_list_stats): _ParameterListStats(
(first_tracked_param): _ViewParameterWrapper(
(view_shape_impl): OverSubChannelBlockView(
(permute_impl): Identity()
)
)
(stats): _Stats(
(stats_impl): NegativeMinOrZero(
(zero): StatelessBuffer()
)
)
)
(scale_shift_zero_point): _ScaleShiftZeroPoint(
(int_quant): IntQuant(
(float_to_int_impl): RoundSte()
(tensor_clamp_impl): TensorClampSte()
(delay_wrapper): DelayWrapper(
(delay_impl): _NoDelay()
)
)
)
)
(slice_tensor): SliceTensor()
)
(msb_clamp_bit_width_impl): BitWidthConst(
(bit_width): StatelessBuffer()
)
)
)
(bias_quant): BiasQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
)
)
)
)
)
(ln_f): RMSNorm()
)
(lm_head): QuantLinear(
in_features=4096, out_features=151936, bias=False
(input_quant): ActQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
)
(output_quant): ActQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
)
(weight_quant): WeightQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
(tensor_quant): RescalingIntQuant(
(int_quant): IntQuant(
(float_to_int_impl): RoundSte()
(tensor_clamp_impl): TensorClampSte()
(delay_wrapper): DelayWrapper(
(delay_impl): _NoDelay()
)
)
(scaling_impl): ExpandReshapeScalingWrapper(
(wrapped_scaling_impl): ParameterFromStatsFromParameterScaling(
(parameter_list_stats): _ParameterListStats(
(first_tracked_param): _ViewParameterWrapper(
(view_shape_impl): OverSubChannelBlockView(
(permute_impl): Identity()
)
)
(stats): _Stats(
(stats_impl): AbsMinMax(
(zero): StatelessBuffer()
)
)
)
(stats_scaling_impl): _StatsScaling(
(affine_rescaling): Identity()
(restrict_clamp_scaling): _RestrictClampValue(
(clamp_min_ste): ScalarClampMinSte()
(restrict_value_impl): FloatRestrictValue()
)
(restrict_scaling_pre): Identity()
)
(restrict_inplace_preprocess): Identity()
)
(slice_tensor): SliceTensor()
)
(int_scaling_impl): IntScaling()
(zero_point_impl): ExpandReshapeZeroPointWrapper(
(wrapped_zero_point_impl): ParameterFromStatsFromParameterZeroPoint(
(parameter_list_stats): _ParameterListStats(
(first_tracked_param): _ViewParameterWrapper(
(view_shape_impl): OverSubChannelBlockView(
(permute_impl): Identity()
)
)
(stats): _Stats(
(stats_impl): NegativeMinOrZero(
(zero): StatelessBuffer()
)
)
)
(scale_shift_zero_point): _ScaleShiftZeroPoint(
(int_quant): IntQuant(
(float_to_int_impl): RoundSte()
(tensor_clamp_impl): TensorClampSte()
(delay_wrapper): DelayWrapper(
(delay_impl): _NoDelay()
)
)
)
)
(slice_tensor): SliceTensor()
)
(msb_clamp_bit_width_impl): BitWidthConst(
(bit_width): StatelessBuffer()
)
)
)
(bias_quant): BiasQuantProxyFromInjector(
(_zero_hw_sentinel): StatelessBuffer()
)
)
)
/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/brevitas/backport/fx/experimental/proxy_tensor.py:97: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
pytree._register_pytree_node(torch.Size, lambda x: (list(x), None), lambda xs, _: tuple(xs))
/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/torch/utils/_pytree.py:254: UserWarning: <class 'torch.Size'> is already registered as pytree node. Overwriting the previous registration.
warnings.warn(
/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/torch/_tensor.py:1394: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at ../c10/core/TensorImpl.h:1908.)
return super().rename(names)
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
found an upcasting block let's upcast it.
[DEBUG] Compiling torchscript graph
[DEBUG] Lowering Torch -> Linalg
[DEBUG] Successfully Generated mlir on device
[DEBUG] converting to bytecode
Saved falcon mlir at qwen-7b-int4.mlir
Compiling for device : cpu-task
Configuring for device:cpu-task
Target triple found:x86_64-linux-gnu
Traceback (most recent call last):
  File "/nodclouddata/chi/src/SHARK/nan/qwen_compile.py", line 170, in <module>
    path = shark_module.save_module(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nodclouddata/chi/src/SHARK/shark/shark_inference.py", line 213, in save_module
    return export_iree_module_to_vmfb(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nodclouddata/chi/src/SHARK/shark/iree_utils/compile_utils.py", line 554, in export_iree_module_to_vmfb
    flatbuffer_blob = compile_module_to_flatbuffer(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nodclouddata/chi/src/SHARK/shark/iree_utils/compile_utils.py", line 338, in compile_module_to_flatbuffer
    flatbuffer_blob = ireec.compile_file(
                      ^^^^^^^^^^^^^^^^^^^
  File "/nodclouddata/chi/src/iree-build/compiler/bindings/python/iree/compiler/tools/core.py", line 255, in compile_file
    result = invoke_immediate(cl)
             ^^^^^^^^^^^^^^^^^^^^
  File "/nodclouddata/chi/src/iree-build/compiler/bindings/python/iree/compiler/tools/binaries.py", line 198, in invoke_immediate
    raise CompilerToolError(process)
iree.compiler.tools.binaries.CompilerToolError: Error invoking IREE compiler tool iree-compile
Error code: -11
Diagnostics:
Please report issues to https://github.com/openxla/iree/issues and include the crash backtrace.
Stack dump:
0. Program arguments: /nodclouddata/chi/src/iree-build/compiler/bindings/python/iree/compiler/tools/../_mlir_libs/iree-compile qwen-7b-int4.mlir --iree-input-type=tm_tensor --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-embedded-linker-path=/nodclouddata/chi/src/iree-build/compiler/bindings/python/iree/compiler/tools/../_mlir_libs/iree-lld --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvmcpu-target-cpu-features=host --iree-llvmcpu-target-triple=x86_64-linux-gnu --iree-llvmcpu-enable-ukernels --iree-llvmcpu-stack-allocation-limit=256000 --iree-global-opt-enable-quantized-matmul-reassociation --iree-stream-resource-max-allocation-size=4294967295 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=false --verify=true
#0 0x00007fa686a32cbd llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /nodclouddata/chi/src/iree/third_party/llvm-project/llvm/lib/Support/Unix/Signals.inc:723:11
#1 0x00007fa686a331ab PrintStackTraceSignalHandler(void*) /nodclouddata/chi/src/iree/third_party/llvm-project/llvm/lib/Support/Unix/Signals.inc:798:1
#2 0x00007fa686a31236 llvm::sys::RunSignalHandlers() /nodclouddata/chi/src/iree/third_party/llvm-project/llvm/lib/Support/Signals.cpp:105:5
#3 0x00007fa686a33935 SignalHandler(int) /nodclouddata/chi/src/iree/third_party/llvm-project/llvm/lib/Support/Unix/Signals.inc:413:1
#4 0x00007fa67b666420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
#5 0x00007fa686864874 llvm::detail::PunnedPointer<mlir::Type>::asInt() const /nodclouddata/chi/src/iree/third_party/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:41:5
#6 0x00007fa6868647d5 llvm::detail::PunnedPointer<mlir::Type>::operator long() const /nodclouddata/chi/src/iree/third_party/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:45:41
#7 0x00007fa686864775 llvm::PointerIntPair<mlir::Type, 3u, mlir::detail::ValueImpl::Kind, llvm::PointerLikeTypeTraits<mlir::Type>, llvm::PointerIntPairInfo<mlir::Type, 3u, llvm::PointerLikeTypeTraits<mlir::Type>>>::getPointer() const /nodclouddata/chi/src/iree/third_party/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:94:58
#8 0x00007fa686864449 mlir::detail::ValueImpl::getType() const /nodclouddata/chi/src/iree/third_party/llvm-project/mlir/include/mlir/IR/Value.h:63:45
#9 0x00007fa686899ec8 mlir::Value::getType() const /nodclouddata/chi/src/iree/third_party/llvm-project/mlir/include/mlir/IR/Value.h:125:39
#10 0x00007fa688fc6990 mlir::iree_compiler::GlobalOptimization::(anonymous namespace)::QuantizedMatmulRewriter::precondition() /nodclouddata/chi/src/iree/compiler/src/iree/compiler/GlobalOptimization/FuseDequantizationMatmul.cpp:330:61
#11 0x00007fa688fc62de mlir::iree_compiler::GlobalOptimization::(anonymous namespace)::reassociateDequantMatmul(mlir::RewriterBase&, mlir::linalg::GenericOp, mlir::linalg::GenericOp, int) /nodclouddata/chi/src/iree/compiler/src/iree/compiler/GlobalOptimization/FuseDequantizationMatmul.cpp:767:18
#12 0x00007fa688fc5aaa mlir::iree_compiler::GlobalOptimization::(anonymous namespace)::FuseDequantizationMatmulPass::runOnOperation() /nodclouddata/chi/src/iree/compiler/src/iree/compiler/GlobalOptimization/FuseDequantizationMatmul.cpp:843:18
#13 0x00007fa686e2f47b mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1::operator()() const /nodclouddata/chi/src/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:0:17
#14 0x00007fa686e2f415 void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1>(long) /nodclouddata/chi/src/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:5
#15 0x00007fa6869622c9 llvm::function_ref<void ()>::operator()() const /nodclouddata/chi/src/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:5
#16 0x00007fa686e32385 void mlir::MLIRContext::executeAction<mlir::PassExecutionAction, mlir::Pass&>(llvm::function_ref<void ()>, llvm::ArrayRef<mlir::IRUnit>, mlir::Pass&) /nodclouddata/chi/src/iree/third_party/llvm-project/mlir/include/mlir/IR/MLIRContext.h:276:3
#17 0x00007fa686e2ac33 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) /nodclouddata/chi/src/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:509:17
#18 0x00007fa686e2b1b4 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) /nodclouddata/chi/src/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:569:16
#19 0x00007fa686e306c8 mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_0::operator()(mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo&) const /nodclouddata/chi/src/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:789:36
#20 0x00007fa686e30349 mlir::LogicalResult mlir::failableParallelForEach<__gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo>>>, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_0&>(mlir::MLIRContext*, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo>>>, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo>>>, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_0&) /nodclouddata/chi/src/iree/third_party/llvm-project/mlir/include/mlir/IR/Threading.h:46:18
#21 0x00007fa686e2c46b mlir::LogicalResult mlir::failableParallelForEach<std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo>>&, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_0&>(mlir::MLIRContext*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo>>&, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_0&) /nodclouddata/chi/src/iree/third_party/llvm-project/mlir/include/mlir/IR/Threading.h:92:10
#22 0x00007fa686e2bd79 mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool) /nodclouddata/chi/src/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:799:14
#23 0x00007fa686e2b8a7 mlir::detail::OpToOpPassAdaptor::runOnOperation(bool) /nodclouddata/chi/src/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:690:5
#24 0x00007fa686e2f466 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1::operator()() const /nodclouddata/chi/src/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:501:11
#25 0x00007fa686e2f415 void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1>(long) /nodclouddata/chi/src/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:5
#26 0x00007fa6869622c9 llvm::function_ref<void ()>::operator()() const /nodclouddata/chi/src/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:5
#27 0x00007fa686e32385 void mlir::MLIRContext::executeAction<mlir::PassExecutionAction, mlir::Pass&>(llvm::function_ref<void ()>, llvm::ArrayRef<mlir::IRUnit>, mlir::Pass&) /nodclouddata/chi/src/iree/third_party/llvm-project/mlir/include/mlir/IR/MLIRContext.h:276:3
#28 0x00007fa686e2ac33 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) /nodclouddata/chi/src/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:509:17
#29 0x00007fa686e2b1b4 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) /nodclouddata/chi/src/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:569:16
#30 0x00007fa686e2cbf9 mlir::PassManager::runPasses(mlir::Operation*, mlir::AnalysisManager) /nodclouddata/chi/src/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:880:10
#31 0x00007fa686e2cb22 mlir::PassManager::run(mlir::Operation*) /nodclouddata/chi/src/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:860:60
#32 0x00007fa6868baacf mlir::iree_compiler::embed::(anonymous namespace)::Invocation::runPipeline(iree_compiler_pipeline_t) /nodclouddata/chi/src/iree/compiler/src/iree/compiler/API/Internal/CompilerDriver.cpp:958:27
#33 0x00007fa6868ba3a3 ireeCompilerInvocationPipeline /nodclouddata/chi/src/iree/compiler/src/iree/compiler/API/Internal/CompilerDriver.cpp:1388:3
#34 0x00007fa686daeb40 mlir::iree_compiler::runIreecMain(int, char**)::$_0::operator()(iree_compiler_source_t*) const /nodclouddata/chi/src/iree/compiler/src/iree/compiler/Tools/iree_compile_lib.cc:247:11
#35 0x00007fa686dae087 mlir::iree_compiler::runIreecMain(int, char**) /nodclouddata/chi/src/iree/compiler/src/iree/compiler/Tools/iree_compile_lib.cc:348:9
#36 0x00007fa6868faa5b ireeCompilerRunMain /nodclouddata/chi/src/iree/compiler/src/iree/compiler/API/Internal/IREECompileToolEntryPoint.cpp:12:3
#37 0x00005599fcd2d7f2 main /nodclouddata/chi/src/iree/compiler/bindings/python/IREECompileTool.c:9:35
#38 0x00007fa67b484083 __libc_start_main /build/glibc-BHL3KM/glibc-2.31/csu/../csu/libc-start.c:342:3
#39 0x00005599fcd2d70e _start (/nodclouddata/chi/src/iree-build/compiler/bindings/python/iree/compiler/tools/../_mlir_libs/iree-compile+0x170e)
Invoked with:
iree-compile /nodclouddata/chi/src/iree-build/compiler/bindings/python/iree/compiler/tools/../_mlir_libs/iree-compile qwen-7b-int4.mlir --iree-input-type=tm_tensor --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-embedded-linker-path=/nodclouddata/chi/src/iree-build/compiler/bindings/python/iree/compiler/tools/../_mlir_libs/iree-lld --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvmcpu-target-cpu-features=host --iree-llvmcpu-target-triple=x86_64-linux-gnu --iree-llvmcpu-enable-ukernels --iree-llvmcpu-stack-allocation-limit=256000 --iree-global-opt-enable-quantized-matmul-reassociation --iree-stream-resource-max-allocation-size=4294967295 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=false --verify=true
Need more information? Set IREE_SAVE_TEMPS=/some/dir in your environment to save all artifacts and reproducers.