@aldur
Last active March 11, 2026 18:10
Qwen3.5-27B
bash-5.3$ ./sandbox.sh llm chat
Chatting with llama-server
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
Type '!edit' to open your default editor and modify the prompt
Type '!fragment <my_fragment> [<another_fragment> ...]' to insert one or more fragments
> !edit
Here is a detailed proofread of your blog post. Overall, it's a solid technical piece with a clear value proposition. However, there are several grammatical errors, numbering inconsistencies, and a few technical phrasing issues that should be addressed to make it polished.
### 🔴 Critical Errors & Inconsistencies
1. **Model Name Consistency:**
   * **Issue:** You refer to the model as **Qwen3.5** in the title, excerpt, and intro, but switch to **Qwen3** in the Conclusion.
   * **Fix:** Standardize on `Qwen3.5` throughout the post (or `Qwen 3.5` for readability).
   * **Location:** Conclusion ("On a MacBook Pro... Qwen3 `35B-A3B`...").
2. **List Numbering:**
   * **Issue:** In the "Sandboxing" and "Local clients" sections, your Markdown lists use `1.` for every item, rendering as `1, 1, 1`.
   * **Fix:** Change the second and third items in these lists to `2.` and `3.`.
   * **Location:** "Sandboxing" section (list of 3 items) and "Local clients" section (list of 2 items).
3. **HTTP Protocol Typo:**
   * **Issue:** In the QEMU VM code block, the curl command is missing a slash in the protocol.
   * **Fix:** Change `http:/10.0.2.2` to `http://10.0.2.2`.
   * **Location:** "In a QEMU VM" code block.
4. **Technical Inaccuracy (Bytecode):**
   * **Issue:** You mention allowing access to "executable bytecode". On macOS (and Linux), native executables are typically Mach-O or ELF binaries, not "bytecode" (which implies Java, Python, etc.).
   * **Fix:** Change to "the executable binary" or just "the executable".
   * **Location:** "Server" section ("...allows access to the executable bytecode...").
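The missing slash in item 3 can be illustrated with a strict URL parser. The following is a minimal sketch using Python's standard library; the `/health` path and the `has_host` helper are assumptions for illustration, and only the `10.0.2.2:8080` address comes from the post. Some clients tolerate the single-slash form, so this shows only how a strict parser reads it, not how any particular tool behaves.

```python
from urllib.parse import urlsplit


def has_host(url: str) -> bool:
    """Return True when the parser finds a network location (host[:port])."""
    return bool(urlsplit(url).netloc)


# The "/health" path is hypothetical; the host and port are from the post.
broken = "http:/10.0.2.2:8080/health"
fixed = "http://10.0.2.2:8080/health"

print(has_host(broken))  # False: everything after "http:" is parsed as a path
print(has_host(fixed))   # True: "10.0.2.2:8080" is parsed as the host and port
```

With only one slash, `urlsplit` leaves `netloc` empty and treats `/10.0.2.2:8080/health` as the path, which is why the two-slash form is the safe spelling.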
### 🟠 Grammar & Syntax
1. **Verb Agreement:**
   * **Original:** "The `sandbox.sh` scripts automatically **takes** care..."
   * **Correction:** "The `sandbox.sh` scripts automatically **take** care..." (Plural subject "scripts" needs plural verb).
   * **Original:** "Now that the big AI companies **using** it..."
   * **Correction:** "Now that the big AI companies **are using** it..."
2. **Spelling & Word Choice:**
   * **Original:** "...has **lead** to [code execution vulnerabilities]..."
   * **Correction:** "...has **led** to..." (Past participle of lead).
   * **Original:** "...guarantees both integrity and **confidentially**..."
   * **Correction:** "...guarantees both integrity and **confidentiality**..." (Noun form needed).
   * **Original:** "...64GB of **RAMs**..."
   * **Correction:** "...64GB of **RAM**..." (RAM is uncountable).
   * **Original:** "...these relatively small **model** are able..."
   * **Correction:** "...these relatively small **models** are able..."
   * **Original:** "...preventing `opencode` **to leak** your prompts..."
   * **Correction:** "...preventing `opencode` **from leaking** your prompts..."
3. **Sentence Structure:**
   * **Original:** "...so `llama-server` from the host **available** at `10.0.2.2:8080`..."
   * **Correction:** "...so `llama-server` from the host **is available** at `10.0.2.2:8080`..." (Missing verb).
   * **Original:** "YOLO **modo**"
   * **Correction:** "YOLO **mode**" (Unless you specifically want the Italian/Spanish flair, "mode" is the standard English term for this context).
### 🟡 Flow & Clarity
1. **Excerpt Clarity:**
   * **Current:** "...plus local Qwen3.5 from a QEMU VM."
   * **Critique:** This implies the model runs *inside* the VM. Your post explains the model runs on the host and is accessed *by* the VM.
   * **Suggestion:** "...plus local Qwen3.5 **accessed via** a QEMU VM."
2. **Confusing Parenthetical:**
   * **Current:** "...preventing `opencode` to leak your prompts (that's how it gives a name to sessions)."
   * **Critique:** This parenthetical is confusing. Does `opencode` name sessions based on the prompts? If the sandbox prevents leaking, how does it name them?
   * **Suggestion:** Clarify or remove. If the point is that the sandbox stops the prompts from leaving the machine, the session naming part might be distracting.
3. **The "Intel CPUs" Non-Sequitur:**
   * **Current:** "Plus, their OCR capabilities are impressive, even on Intel CPUs!"
   * **Critique:** You are testing on an M3 Max MacBook. Mentioning Intel CPUs here feels like a non-sequitur or a typo. Did you test it elsewhere?
   * **Suggestion:** If you haven't tested on Intel, remove the reference to avoid confusing readers. If you mean the model *architecture* supports Intel, rephrase: "...impressive across different architectures, including Intel CPUs."
4. **Code Block Artifact:**
   * **Current:** `{"status":"ok"}⏎`
   * **Critique:** The `⏎` symbol looks like an artifact from a terminal copy-paste.
   * **Suggestion:** Remove the symbol to keep the code block clean.
5. **Footnote Link:**
   * **Current:** `manpath=macOS+26.3` (Footnote 3)
   * **Critique:** macOS version numbers are currently around 15 (Sequoia). `26.3` looks suspicious (possibly a FreeBSD version number mixed in?).
   * **Suggestion:** Verify the link or remove the version number if it's dynamic.
### 🟢 Specific Edits (Before/After)
**Section: Sandboxing**
> *Before:* "when they can wreak havoc in YOLO modo (`--dangerously-skip-permissions`)."
> *After:* "when they can wreak havoc in YOLO mode (`--dangerously-skip-permissions`)."
> *Before:* "since parsing the GGUF files representing the models has lead to [code execution vulnerabilities][10];"
> *After:* "since parsing GGUF files has led to [code execution vulnerabilities][10];"
**Section: Local clients**
> *Before:* "The `sandbox.sh` scripts automatically takes care of the configuration files..."
> *After:* "The `sandbox.sh` script automatically takes care of the configuration files..." (Or "scripts... take").
**Section: In a QEMU VM**
> *Before:* "so `llama-server` from the host available at `10.0.2.2:8080`:"
> *After:* "so `llama-server` from the host is available at `10.0.2.2:8080`:"
**Section: Conclusion**
> *Before:* "On a MacBook Pro with an M3 Max and 64GB of RAMs, Qwen3 `35B-A3B` is quick enough..."
> *After:* "On a MacBook Pro with an M3 Max and 64GB of RAM, Qwen3.5 `35B-A3B` is quick enough..."
> *Before:* "Unsurprisingly, the quality of results is lower than frontier hosted models, but it's a big step forward: these relatively small model are able to make small, interactive changes."
> *After:* "Unsurprisingly, the quality of results is lower than frontier hosted models, but it's a big step forward: these relatively small models are capable of making small, interactive edits."
### Summary
The content is excellent and the "default-deny" approach is a great hook. Once you fix the grammar, list numbering, and model name consistency, this will be a very high-quality technical post. Good luck!
srv params_from_: Chat format: peg-native
slot get_availabl: id 3 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 3 | task 0 | processing task, is_child = 0
slot update_slots: id 3 | task 0 | new prompt, n_ctx_slot = 262144, n_keep = 0, task.n_tokens = 11
slot update_slots: id 3 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
slot init_sampler: id 3 | task 0 | init sampler, took 0.00 ms, tokens: text = 11, total = 11
slot update_slots: id 3 | task 0 | prompt processing done, n_tokens = 11, batch.n_tokens = 11
srv log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200
slot print_timing: id 3 | task 0 |
prompt eval time = 1229.25 ms / 11 tokens ( 111.75 ms per token, 8.95 tokens per second)
eval time = 18463.71 ms / 194 tokens ( 95.17 ms per token, 10.51 tokens per second)
total time = 19692.96 ms / 205 tokens
slot release: id 3 | task 0 | stop processing: n_tokens = 204, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-native
slot get_availabl: id 2 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 2 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 2 | task 195 | processing task, is_child = 0
slot update_slots: id 2 | task 195 | new prompt, n_ctx_slot = 262144, n_keep = 0, task.n_tokens = 2573
slot update_slots: id 2 | task 195 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 2 | task 195 | prompt processing progress, n_tokens = 2048, batch.n_tokens = 2048, progress = 0.795958
slot update_slots: id 2 | task 195 | n_tokens = 2048, memory_seq_rm [2048, end)
slot update_slots: id 2 | task 195 | prompt processing progress, n_tokens = 2061, batch.n_tokens = 13, progress = 0.801010
slot update_slots: id 2 | task 195 | n_tokens = 2061, memory_seq_rm [2061, end)
slot init_sampler: id 2 | task 195 | init sampler, took 0.23 ms, tokens: text = 2573, total = 2573
slot update_slots: id 2 | task 195 | prompt processing done, n_tokens = 2573, batch.n_tokens = 512
slot update_slots: id 2 | task 195 | created context checkpoint 1 of 32 (pos_min = 2060, pos_max = 2060, n_tokens = 2061, size = 149.626 MiB)
srv log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200
slot print_timing: id 2 | task 195 |
prompt eval time = 14795.45 ms / 2573 tokens ( 5.75 ms per token, 173.90 tokens per second)
eval time = 688980.45 ms / 5453 tokens ( 126.35 ms per token, 7.91 tokens per second)
total time = 703775.90 ms / 8026 tokens
slot release: id 2 | task 195 | stop processing: n_tokens = 8025, truncated = 0
srv update_slots: all slots are idle
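
The per-token and per-second figures in the two `print_timing` blocks above follow directly from the reported totals. A minimal sketch in Python reproduces them; the helper names are illustrative and not part of llama.cpp, and the inputs are copied from the log lines.

```python
def ms_per_token(total_ms: float, n_tokens: int) -> float:
    """Average latency per token, in milliseconds."""
    return total_ms / n_tokens


def tokens_per_second(total_ms: float, n_tokens: int) -> float:
    """Throughput: tokens divided by elapsed seconds."""
    return n_tokens / (total_ms / 1000.0)


# (label, elapsed milliseconds, token count) copied from the log above.
runs = [
    ("task 0 prompt eval", 1229.25, 11),
    ("task 0 eval", 18463.71, 194),
    ("task 195 prompt eval", 14795.45, 2573),
    ("task 195 eval", 688980.45, 5453),
]

for label, total_ms, n_tokens in runs:
    print(
        f"{label}: {ms_per_token(total_ms, n_tokens):.2f} ms/token, "
        f"{tokens_per_second(total_ms, n_tokens):.2f} tokens/s"
    )
```

For the long proofreading request (task 195), generation dominates: 5453 output tokens at about 7.91 tokens per second account for roughly 689 of the 704 seconds of total time.
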
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen35
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.top_k i32 = 20
llama_model_loader: - kv 3: general.sampling.top_p f32 = 0.950000
llama_model_loader: - kv 4: general.sampling.temp f32 = 0.600000
llama_model_loader: - kv 5: general.name str = Qwen3.5-27B
llama_model_loader: - kv 6: general.basename str = Qwen3.5-27B
llama_model_loader: - kv 7: general.quantized_by str = Unsloth
llama_model_loader: - kv 8: general.size_label str = 27B
llama_model_loader: - kv 9: general.license str = apache-2.0
llama_model_loader: - kv 10: general.license.link str = https://huggingface.co/Qwen/Qwen3.5-2...
llama_model_loader: - kv 11: general.repo_url str = https://huggingface.co/unsloth
llama_model_loader: - kv 12: general.base_model.count u32 = 1
llama_model_loader: - kv 13: general.base_model.0.name str = Qwen3.5 27B
llama_model_loader: - kv 14: general.base_model.0.organization str = Qwen
llama_model_loader: - kv 15: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen3.5-27B
llama_model_loader: - kv 16: general.tags arr[str,3] = ["qwen3_5_moe", "unsloth", "image-tex...
llama_model_loader: - kv 17: qwen35.block_count u32 = 64
llama_model_loader: - kv 18: qwen35.context_length u32 = 262144
llama_model_loader: - kv 19: qwen35.embedding_length u32 = 5120
llama_model_loader: - kv 20: qwen35.feed_forward_length u32 = 17408
llama_model_loader: - kv 21: qwen35.attention.head_count u32 = 24
llama_model_loader: - kv 22: qwen35.attention.head_count_kv u32 = 4
llama_model_loader: - kv 23: qwen35.rope.dimension_sections arr[i32,4] = [11, 11, 10, 0]
llama_model_loader: - kv 24: qwen35.rope.freq_base f32 = 10000000.000000
llama_model_loader: - kv 25: qwen35.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 26: qwen35.attention.key_length u32 = 256
llama_model_loader: - kv 27: qwen35.attention.value_length u32 = 256
llama_model_loader: - kv 28: qwen35.ssm.conv_kernel u32 = 4
llama_model_loader: - kv 29: qwen35.ssm.state_size u32 = 128
llama_model_loader: - kv 30: qwen35.ssm.group_count u32 = 16
llama_model_loader: - kv 31: qwen35.ssm.time_step_rank u32 = 48
llama_model_loader: - kv 32: qwen35.ssm.inner_size u32 = 6144
llama_model_loader: - kv 33: qwen35.full_attention_interval u32 = 4
llama_model_loader: - kv 34: qwen35.rope.dimension_count u32 = 64
llama_model_loader: - kv 35: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 36: tokenizer.ggml.pre str = qwen35
llama_model_loader: - kv 37: tokenizer.ggml.tokens arr[str,248320] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 38: tokenizer.ggml.token_type arr[i32,248320] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 39: tokenizer.ggml.merges arr[str,247587] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 40: tokenizer.ggml.eos_token_id u32 = 248046
llama_model_loader: - kv 41: tokenizer.ggml.padding_token_id u32 = 248055
llama_model_loader: - kv 42: tokenizer.chat_template str = {%- set image_count = namespace(value...
llama_model_loader: - kv 43: general.quantization_version u32 = 2
llama_model_loader: - kv 44: general.file_type u32 = 7
llama_model_loader: - kv 45: quantize.imatrix.file str = Qwen3.5-27B-GGUF/imatrix_unsloth.gguf
llama_model_loader: - kv 46: quantize.imatrix.dataset str = unsloth_calibration_Qwen3.5-27B.txt
llama_model_loader: - kv 47: quantize.imatrix.entries_count u32 = 496
llama_model_loader: - kv 48: quantize.imatrix.chunks_count u32 = 80
llama_model_loader: - type f32: 353 tensors
llama_model_loader: - type q8_0: 498 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q8_0
print_info: file size = 26.62 GiB (8.50 BPW)
load: 0 unused tokens
load: printing all EOG tokens:
load: - 248044 ('<|endoftext|>')
load: - 248046 ('<|im_end|>')
load: - 248063 ('<|fim_pad|>')
load: - 248064 ('<|repo_name|>')
load: - 248065 ('<|file_sep|>')
load: special tokens cache size = 33
load: token to piece cache size = 1.7581 MB
print_info: arch = qwen35
print_info: vocab_only = 0
print_info: no_alloc = 0
print_info: n_ctx_train = 262144
print_info: n_embd = 5120
print_info: n_embd_inp = 5120
print_info: n_layer = 64
print_info: n_head = 24
print_info: n_head_kv = 4
print_info: n_rot = 64
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 256
print_info: n_embd_head_v = 256
print_info: n_gqa = 6
print_info: n_embd_k_gqa = 1024
print_info: n_embd_v_gqa = 1024
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 17408
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: n_expert_groups = 0
print_info: n_group_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 40
print_info: rope scaling = linear
print_info: freq_base_train = 10000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 262144
print_info: rope_yarn_log_mul = 0.0000
print_info: rope_finetuned = unknown
print_info: mrope sections = [11, 11, 10, 0]
print_info: ssm_d_conv = 4
print_info: ssm_d_inner = 6144
print_info: ssm_d_state = 128
print_info: ssm_dt_rank = 48
print_info: ssm_n_group = 16
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 27B
print_info: model params = 26.90 B
print_info: general.name = Qwen3.5-27B
print_info: vocab type = BPE
print_info: n_vocab = 248320
print_info: n_merges = 247587
print_info: BOS token = 11 ','
print_info: EOS token = 248046 '<|im_end|>'
print_info: EOT token = 248046 '<|im_end|>'
print_info: PAD token = 248055 '<|vision_pad|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 248060 '<|fim_prefix|>'
print_info: FIM SUF token = 248062 '<|fim_suffix|>'
print_info: FIM MID token = 248061 '<|fim_middle|>'
print_info: FIM PAD token = 248063 '<|fim_pad|>'
print_info: FIM REP token = 248064 '<|repo_name|>'
print_info: FIM SEP token = 248065 '<|file_sep|>'
print_info: EOG token = 248044 '<|endoftext|>'
print_info: EOG token = 248046 '<|im_end|>'
print_info: EOG token = 248063 '<|fim_pad|>'
print_info: EOG token = 248064 '<|repo_name|>'
print_info: EOG token = 248065 '<|file_sep|>'
print_info: max token length = 256
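
Several of the `print_info` values above are derived from others in the same dump. The sketch below, in Python, recomputes a few of them from the logged numbers. The variable names mirror the log keys, the parameter count is the rounded `26.90 B` figure, and the size estimate is therefore only approximate (the file also contains 353 `f32` tensors that are not quantized).

```python
# Values copied from the loader output above.
n_head, n_head_kv = 24, 4
n_embd_head_k = 256
model_params = 26.90e9  # "model params = 26.90 B", rounded in the log

# Grouped-query attention: number of query heads sharing each key/value head.
n_gqa = n_head // n_head_kv

# Total key width across all key/value heads.
n_embd_k_gqa = n_head_kv * n_embd_head_k

# Q8_0 packs 32 int8 weights plus one 16-bit scale per block.
q8_0_bits_per_weight = (32 * 8 + 16) / 32

# Rough file size from the reported 8.50 bits per weight, in GiB.
size_gib = model_params * 8.50 / 8 / 2**30

print(n_gqa, n_embd_k_gqa, q8_0_bits_per_weight, round(size_gib, 2))
```

The results line up with `n_gqa = 6`, `n_embd_k_gqa = 1024`, the `8.50 BPW` figure, and, to within rounding, the reported `26.62 GiB` file size.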