Samba-1 Turbo 25.7.1-MP1

Release version: 25.7.1-MP1 | Release date: 08/13/2025


The Samba-1 Turbo 25.7.1-MP1 (Model Pack 1) release introduces new models, updates existing ones, expands endpoint integration, and delivers key fixes and performance improvements.

New and updated model versions

New models

DeepSeek-V3-0324
  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (4)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card

QwQ-32B
  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (1, 2, 4); 8192 (1, 2, 4); 16384 (1, 2, 4)
  • Endpoint: Chat completions
  • Capabilities: None
  • Import checkpoint: No
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card

Whisper-Large-v3
  • Type: Audio
  • Mode: Inference
  • Context length (batch size): 448 (1, 16, 32)
  • Endpoints: Translation, Transcription
  • Capabilities: None
  • Import checkpoint: No
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card

Meta-Llama-3.1-70B-SD-Llama-3.2-1B-16k
  • Type: Text
  • Mode: Inference
  • Context length (batch size): 16384 (1, 2, 4)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: Speculative decoding
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card

Meta-Llama-3.3-70B-SD-Llama-3.2-1B-TP16-16k
  • Type: Text
  • Mode: Inference
  • Context length (batch size): 16384 (1, 2, 4)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: Speculative decoding
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card
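All of the new text models above are served through the chat completions endpoint. As a minimal sketch of how a client might call one of them (the request shape assumes an OpenAI-compatible chat completions API; the endpoint URL is a placeholder, not taken from these notes), the request body can be built like this:

```python
import json

# Assumption: the deployment exposes an OpenAI-compatible chat completions
# endpoint. Replace the placeholder URL with your environment's value.
CHAT_URL = "https://<your-deployment>/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build a chat completions request body for one of the new models."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("DeepSeek-V3-0324", "Explain JSON mode in one sentence.")
print(json.dumps(payload, indent=2))
```

The same request shape applies to QwQ-32B and the SD-pair variants; only the model field changes.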

Updated models

Meta-Llama-3.1-405B-Instruct
  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (1, 2, 4); 8192 (1); 16384 (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: Speculative decoding
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card

Meta-Llama-3.1-70B-Instruct
  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (1, 2, 4, 8); 8192 (1, 2, 4, 8)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: Speculative decoding
  • RDU architecture: SN40L-8
  • RDU count: 8
  • View on Hugging Face: Model card

Meta-Llama-3.1-70B-Instruct
  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (1, 2, 4, 8); 8192 (1, 2, 4, 8)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: Speculative decoding
  • RDU architecture: SN40L-16
  • RDU count: 8
  • View on Hugging Face: Model card

Meta-Llama-3.1-70B-Instruct
  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (2, 4, 8, 16, 32); 8192 (1, 2, 4, 8, 16)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: Speculative decoding
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card

Meta-Llama-3.3-70B-Instruct
  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (1, 2, 4, 8); 8192 (1, 2, 4, 8)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: Speculative decoding
  • RDU architecture: SN40L-8
  • RDU count: 8
  • View on Hugging Face: Model card

Meta-Llama-3.3-70B-Instruct
  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (1, 2, 4, 8); 8192 (1, 2, 4, 8)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: Speculative decoding
  • RDU architecture: SN40L-16
  • RDU count: 8
  • View on Hugging Face: Model card

Meta-Llama-3.3-70B-Instruct
  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (2, 4, 8, 16, 32); 8192 (1, 2, 4, 8, 16)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: Speculative decoding
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card

Performance and quality improvements

  • Fixed an issue where function calling was not working correctly for Llama 3.1, Llama 3.3, Llama 4, and DeepSeek-V3.

  • Fixed an issue where the models in the QwQ-32B-SD-Qwen-2.5-QWQ-0.5B group were not generating accurate responses.

  • Fixed an issue where a CoE bundle with multiple different Llama 3 8B checkpoints negatively affected response accuracy.

  • Fixed an issue where models that support only non-chat mode appeared as selectable experts in the Playground UI.
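Since this release restores function calling for the Llama and DeepSeek models, a quick way to verify the fix is to send a tools-style request. The sketch below only builds the request body; the get_weather tool schema and the OpenAI-style tools/tool_choice fields are illustrative assumptions, not details from these notes.

```python
import json

# Hypothetical tool schema, used only to exercise function calling.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# OpenAI-style function calling request body (assumed shape).
payload = {
    "model": "Meta-Llama-3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",
}
print(json.dumps(payload, indent=2))
```

A fixed model should respond with a tool call naming get_weather rather than a plain-text guess.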

List of renamed models

The following models were renamed in this release (old name → new name):

  • DeepSeek-V3 → DeepSeek-V3-0324

  • QwQ-32B-Preview → QwQ-32B

  • QwQ-32B-Preview-SD-Qwen-2.5-QWQ-0.5B → QwQ-32B-SD-Qwen-2.5-QWQ-0.5B

SD pair changes

The following list shows the maximum supported sequence length for key speculative decoding (SD) pairs in this release.

  • Meta-Llama-3.1-70B-SD-Llama-3.2-1B: 8k

  • Meta-Llama-3.1-70B-SD-Llama-3.2-1B-16k: 16k

  • Meta-Llama-3.3-70B-SD-Llama-3.2-1B-TP16: 8k

  • Meta-Llama-3.3-70B-SD-Llama-3.2-1B-TP16-16k: 16k

  • Custom SD pairs with Llama 3.1/3.3 70B target models: 8k
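Because the 16k variants coexist with the 8k pairs, a client has to pick the right model name for a given request size. A hypothetical helper for the Llama 3.3 TP16 pairs might look like the sketch below; the four-characters-per-token estimate is a rough rule of thumb, not something stated in these notes.

```python
# Map each Llama 3.3 70B SD-pair variant (from the list above) to its
# maximum supported sequence length in tokens.
SD_PAIRS = {
    8192: "Meta-Llama-3.3-70B-SD-Llama-3.2-1B-TP16",
    16384: "Meta-Llama-3.3-70B-SD-Llama-3.2-1B-TP16-16k",
}

def pick_sd_pair(prompt: str, max_new_tokens: int) -> str:
    """Pick the smallest SD-pair variant whose limit fits the request."""
    est_prompt_tokens = len(prompt) // 4  # crude ~4 chars/token estimate
    budget = est_prompt_tokens + max_new_tokens
    for limit in sorted(SD_PAIRS):
        if budget <= limit:
            return SD_PAIRS[limit]
    raise ValueError(f"Request budget {budget} exceeds the 16k maximum.")

print(pick_sd_pair("hello " * 100, 256))
```

Requests that fit in 8k stay on the original pair; only longer requests need the 16k variant.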

Known issues

  • CoE bundles can fail if they contain a standalone model that uses the same PEF as another draft model of an SD pair in the bundle.

  • Function calling results may be inaccurate for DeepSeek R1 and Qwen 3.

  • Whisper:

    • The UI exposes only the chat/completions API; it should also offer the audio_transcribe and audio_translate options.

    • The Playground UI does not support Whisper.

    • Translation is not working.

    • Transcription quality may be poor with large audio files.

  • TP8 SD pairs might not work.