Samba-1 Turbo 25.7.1-MP1

Release version: 25.7.1-MP1 | Release date: 08/13/2025


The Samba-1 Turbo 25.7.1-MP1 (Model Pack 1) release introduces new models, updates existing ones, expands endpoint integration, and delivers key fixes and performance improvements.

New and updated model versions

New models

DeepSeek-V3-0324
  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (4)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card

QwQ-32B
  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (1, 2, 4); 8192 (1, 2, 4); 16384 (1, 2, 4)
  • Endpoint: Chat completions
  • Capabilities: None
  • Import checkpoint: No
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card

Whisper-Large-v3
  • Type: Audio
  • Mode: Inference
  • Context length (batch size): 448 (1, 16, 32)
  • Endpoints: Translation, Transcription
  • Capabilities: None
  • Import checkpoint: No
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card

Meta-Llama-3.1-70B-SD-Llama-3.2-1B-16k
  • Type: Text
  • Mode: Inference
  • Context length (batch size): 16384 (1, 2, 4)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: Speculative decoding
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card

Meta-Llama-3.3-70B-SD-Llama-3.2-1B-TP16-16k
  • Type: Text
  • Mode: Inference
  • Context length (batch size): 16384 (1, 2, 4)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: Speculative decoding
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card
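All of the new text models above are served through the chat completions endpoint. As a minimal sketch of how a client might call one of them (the request shape assumes an OpenAI-compatible chat completions API; the endpoint URL is a placeholder, not taken from these notes), the request body can be built like this:

```python
import json

# Assumption: the deployment exposes an OpenAI-compatible chat completions
# endpoint. Replace the placeholder URL with your environment's value.
CHAT_URL = "https://<your-deployment>/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build a chat completions request body for one of the new models."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("DeepSeek-V3-0324", "Explain JSON mode in one sentence.")
print(json.dumps(payload, indent=2))
```

The same request shape applies to QwQ-32B and the SD-pair variants; only the model field changes.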

Updated models

Meta-Llama-3.1-405B-Instruct
  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (1, 2, 4); 8192 (1); 16384 (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: Speculative decoding
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card

Meta-Llama-3.1-70B-Instruct
  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (1, 2, 4, 8); 8192 (1, 2, 4, 8)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: Speculative decoding
  • RDU architecture: SN40L-8
  • RDU count: 8
  • View on Hugging Face: Model card

Meta-Llama-3.1-70B-Instruct
  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (1, 2, 4, 8); 8192 (1, 2, 4, 8)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: Speculative decoding
  • RDU architecture: SN40L-16
  • RDU count: 8
  • View on Hugging Face: Model card

Meta-Llama-3.1-70B-Instruct
  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (2, 4, 8, 16, 32); 8192 (1, 2, 4, 8, 16)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: Speculative decoding
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card

Meta-Llama-3.3-70B-Instruct
  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (1, 2, 4, 8); 8192 (1, 2, 4, 8)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: Speculative decoding
  • RDU architecture: SN40L-8
  • RDU count: 8
  • View on Hugging Face: Model card

Meta-Llama-3.3-70B-Instruct
  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (1, 2, 4, 8); 8192 (1, 2, 4, 8)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: Speculative decoding
  • RDU architecture: SN40L-16
  • RDU count: 8
  • View on Hugging Face: Model card

Meta-Llama-3.3-70B-Instruct
  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (2, 4, 8, 16, 32); 8192 (1, 2, 4, 8, 16)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: Speculative decoding
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card

Performance and quality improvements

  • Fixed an issue where function calling was not working correctly for Llama 3.1, Llama 3.3, Llama 4, and DeepSeek-V3.

  • Fixed an issue where the models in the QwQ-32B-SD-Qwen-2.5-QWQ-0.5B group were not generating accurate responses.

  • Fixed an issue where a CoE bundle with multiple different Llama 3 8B checkpoints negatively affected response accuracy.

  • Fixed an issue where models that support only non-chat mode appeared as selectable experts in the Playground UI.
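Since this release restores function calling for the Llama and DeepSeek models, a quick way to verify the fix is to send a tools-style request. The sketch below only builds the request body; the get_weather tool schema and the OpenAI-style tools/tool_choice fields are illustrative assumptions, not details from these notes.

```python
import json

# Hypothetical tool schema, used only to exercise function calling.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# OpenAI-style function calling request body (assumed shape).
payload = {
    "model": "Meta-Llama-3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",
}
print(json.dumps(payload, indent=2))
```

A fixed model should respond with a tool call naming get_weather rather than a plain-text guess.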

List of renamed models

The following models were renamed in this release (old name → new name):

  • DeepSeek-V3 → DeepSeek-V3-0324

  • QwQ-32B-Preview → QwQ-32B

  • QwQ-32B-Preview-SD-Qwen-2.5-QWQ-0.5B → QwQ-32B-SD-Qwen-2.5-QWQ-0.5B

SD pair changes

The following list shows the maximum supported sequence length for key speculative decoding (SD) pairs in this release.

  • Meta-Llama-3.1-70B-SD-Llama-3.2-1B: 8k

  • Meta-Llama-3.1-70B-SD-Llama-3.2-1B-16k: 16k

  • Meta-Llama-3.3-70B-SD-Llama-3.2-1B-TP16: 8k

  • Meta-Llama-3.3-70B-SD-Llama-3.2-1B-TP16-16k: 16k

  • Custom SD pairs with Llama 3.1/3.3 70B target models: 8k
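Because the 16k variants coexist with the 8k pairs, a client has to pick the right model name for a given request size. A hypothetical helper for the Llama 3.3 TP16 pairs might look like the sketch below; the four-characters-per-token estimate is a rough rule of thumb, not something stated in these notes.

```python
# Map each Llama 3.3 70B SD-pair variant (from the list above) to its
# maximum supported sequence length in tokens.
SD_PAIRS = {
    8192: "Meta-Llama-3.3-70B-SD-Llama-3.2-1B-TP16",
    16384: "Meta-Llama-3.3-70B-SD-Llama-3.2-1B-TP16-16k",
}

def pick_sd_pair(prompt: str, max_new_tokens: int) -> str:
    """Pick the smallest SD-pair variant whose limit fits the request."""
    est_prompt_tokens = len(prompt) // 4  # crude ~4 chars/token estimate
    budget = est_prompt_tokens + max_new_tokens
    for limit in sorted(SD_PAIRS):
        if budget <= limit:
            return SD_PAIRS[limit]
    raise ValueError(f"Request budget {budget} exceeds the 16k maximum.")

print(pick_sd_pair("hello " * 100, 256))
```

Requests that fit in 8k stay on the original pair; only longer requests need the 16k variant.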

Known issues

  • CoE bundles can fail if they contain a standalone model that uses the same PEF as another draft model of an SD pair in the bundle.

  • Function calling results may be inaccurate for DeepSeek R1 and Qwen 3.

  • Whisper:

    • The UI exposes only the chat/completions API; it should also offer the audio_transcribe and audio_translate options.

    • The Playground UI does not support Whisper.

    • Translation is not working.

    • Transcription quality may be poor with large audio files.

  • TP8 SD pairs might not work.