Samba-1 Turbo 25.10.1-MP1

Release version: 25.10.1-MP1 | Release date: 10/25/2025


The Samba-1 Turbo 25.10.1-MP1 (Model Pack 1) release improves model functionality and deployment efficiency: it enables new model capabilities, consolidates model variants to streamline operation, and addresses known stability issues for a more reliable platform experience.

Prerequisite

The prerequisite for this release is:

  • Studio Version 25.6.2-RC1

New and updated model versions

New models

| Developer/Model ID | Type | Mode | Context length (batch size) | Endpoint | Capabilities | Import checkpoint | Optimizations | RDU architecture | RDU count | View on Hugging Face |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DeepSeek-R1-0528 | Reasoning, Text | Inference | 4096 (4) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-R1-0528-16384 | Reasoning, Text | Inference | 16384 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-R1-0528-32768 | Reasoning, Text | Inference | 32768 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-R1-0528-4096 | Reasoning, Text | Inference | 4096 (4) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-R1-0528-8192 | Reasoning, Text | Inference | 8192 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3-0324-16384 | Text | Inference | 16384 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3-0324-32768 | Text | Inference | 32768 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3-0324-4096 | Text | Inference | 4096 (4) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3-0324-8192 | Text | Inference | 8192 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3.1 | Reasoning, Text | Inference | 4096 (4) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3.1-16384 | Reasoning, Text | Inference | 16384 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3.1-32768 | Reasoning, Text | Inference | 32768 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3.1-4096 | Reasoning, Text | Inference | 4096 (4) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3.1-8192 | Reasoning, Text | Inference | 8192 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3.1-Terminus | Reasoning, Text | Inference | 4096 (4) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3.1-Terminus-16384 | Reasoning, Text | Inference | 16384 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3.1-Terminus-32768 | Reasoning, Text | Inference | 32768 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3.1-Terminus-4096 | Reasoning, Text | Inference | 4096 (4) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3.1-Terminus-8192 | Reasoning, Text | Inference | 8192 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
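All of the new DeepSeek models serve the Chat completions endpoint with JSON mode enabled. As a minimal sketch, the request body can be built in the OpenAI-compatible style; the helper function, model choice, and `response_format` usage below are illustrative assumptions, not confirmed Samba-1 specifics.

```python
# Sketch: building a JSON-mode Chat Completions request body for one of the
# new DeepSeek models. Assumes an OpenAI-compatible endpoint; the helper
# name and defaults are illustrative, not a documented Samba-1 API.
import json


def build_chat_request(model: str, user_prompt: str, json_mode: bool = False) -> dict:
    """Return a Chat Completions request body for the given model."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "max_tokens": 1024,
    }
    if json_mode:
        # JSON mode constrains the model to emit a valid JSON object.
        body["response_format"] = {"type": "json_object"}
    return body


request = build_chat_request(
    "DeepSeek-V3.1-8192",
    "List three prime numbers as a JSON array under the key 'primes'.",
    json_mode=True,
)
print(json.dumps(request, indent=2))
```

The same body shape applies to any of the chat models above; only the `model` field and the context-length budget differ.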

Updated models

| Developer/Model ID | Type | Mode | Context length (batch size) | Endpoint | Capabilities | Import checkpoint | Optimizations | RDU architecture | RDU count | Speculative decoding | View on Hugging Face | Requires new endpoint? |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DeepSeek-R1-Distill-Llama-70B | Reasoning, Text | Inference | 4096 (2, 4, 8, 16, 32); 8192 (1, 2, 4, 8); 32768 (1, 2, 4); 65536 (1); 131072 (1) | Chat completions | None | Yes | Speculative decoding | SN40L-16 | 16 | True | Model card | No |
| e5-mistral-7B-instruct | Embedding | Inference | 8192 (1, 4, 8); 32768 (1, 4, 8) | Embeddings | None | Yes | None | SN40L-8 | 8 | False | Model card | Yes* |
| e5-mistral-7B-instruct | Embedding | Inference | 8192 (1, 4, 8); 32768 (1, 4, 8) | Embeddings | None | Yes | None | SN40L-16 | 16 | False | Model card | Yes* |
| e5-mistral-7B-instruct | Embedding | Inference | 4096 (1, 4, 8, 16, 32); 8192 (1, 4, 8, 16, 32) | Embeddings | None | Yes | None | SN40L-16 | 16 | False | Model card | Yes* |
| e5-mistral-7b-instruct-8192 | Embedding | Inference | 8192 (1, 4, 8) | Embeddings | None | Yes | None | SN40L-8 | 8 | False | Model card | Yes* |
| e5-mistral-7b-instruct-8192 | Embedding | Inference | 8192 (1, 4, 8) | Embeddings | None | Yes | None | SN40L-16 | 8 | False | Model card | Yes* |
| e5-mistral-7b-instruct-8192 | Embedding | Inference | 4096 (1, 4, 8, 16, 32); 8192 (1, 4, 8, 16, 32) | Embeddings | None | Yes | None | SN40L-16 | 16 | False | Model card | Yes* |
| Llama-3.1-Tulu-3-405B | Text | Inference | 4096 (1, 2, 4); 8192 (1); 16384 (1) | Chat completions | Function calling, JSON mode | No | Speculative decoding | SN40L-16 | 16 | True | Model card | Yes* |
| Llama-4-Maverick-17B-128E-Instruct | Image, Text | Inference | 8192 (1); 16384 (1); 32768 (1); 65536 (1); 131072 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | False | Model card | Yes* |
| Llama-4-Maverick-17B-128E-Instruct-bs4 | Image, Text | Inference | 8192 (4) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | False | Model card | No |
| Meta-Llama-3-8B | Text | Inference | 4096 (1, 2, 4, 8); 8192 (1, 2, 4, 8, 16) | Completions | None | Yes | None | SN40L-16 | 16 | False | Model card | No |
| Meta-Llama-3-8B-Instruct | Text | Inference | 4096 (1, 2, 4, 8); 8192 (1, 2, 4, 8, 16) | Chat completions | Function calling, JSON mode | Yes | None | SN40L-16 | 16 | False | Model card | No |
| Meta-Llama-3.1-405B-Instruct | Text | Inference | 4096 (1, 2, 4); 8192 (1); 16384 (1) | Chat completions | Function calling, JSON mode | No | Speculative decoding | SN40L-16 | 16 | True | Model card | No |
| Meta-Llama-3.1-70B-Instruct | Text | Inference | 4096 (2, 4, 8, 16, 32); 8192 (1, 2, 4, 8); 16384 (1, 2, 4); 32768 (1, 2, 4); 65536 (1); 131072 (1) | Chat completions | Function calling, JSON mode | No | Speculative decoding | SN40L-16 | 16 | True | Model card | No |
| Meta-Llama-3.2-1B-Instruct | Text | Inference | 4096 (1, 4, 8, 16, 32); 8192 (1, 4, 8, 16, 32) | Chat completions | None | No | None | SN40L-8 | 8 | False | Model card | Yes* |
| Meta-Llama-3.2-1B-Instruct | Text | Inference | 4096 (1, 4, 8, 16, 32); 8192 (1, 4, 8, 16, 32) | Chat completions | None | No | None | SN40L-16 | 16 | False | Model card | Yes* |
| Meta-Llama-3.2-1B-Instruct | Text | Inference | 4096 (1, 2, 4, 8, 16, 32); 8192 (1, 2, 4, 8, 16); 16384 (1, 2, 4, 8, 10); 32768 (1, 2, 4); 65536 (1); 131072 (1) | Chat completions | None | No | None | SN40L-16 | 16 | False | Model card | Yes* |
| Meta-Llama-3.2-3B-Instruct-TP16 | Text | Inference | 4096 (1, 2, 4, 8, 10, 16, 32); 8192 (1, 2, 4, 8, 16); 16384 (1, 2, 4) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | False | Model card | Yes* |
| Meta-Llama-3.3-70B-Instruct | Text | Inference | 4096 (2, 4, 8, 16, 32); 8192 (1, 2, 4, 8); 16384 (1, 2, 4); 32768 (1, 2, 4); 65536 (1); 131072 (1) | Chat completions | Function calling, JSON mode | No | Speculative decoding | SN40L-16 | 16 | True | Model card | No |
| Qwen3-32B | Text | Inference | 8192 (1, 4); 16384 (1); 32768 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | False | Model card | Yes* |

* If a model update is listed as requiring a new endpoint ("Yes*" in the table above), endpoints that are active during the upgrade continue to function normally. Endpoints that are inactive at the time of the upgrade, or that are stopped afterward, must be re-created once the new version of the model is downloaded. This applies equally to SambaNova-provided and user-created CoEs that contain the updated model: prebuilt CoEs from SambaNova are refreshed to the new version automatically, while user-created CoEs must be rebuilt after the updated model is downloaded.
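The endpoint-recreation rule above can be sketched as a small decision helper for planning an upgrade. The function and its parameter names are illustrative, not a Samba-1 API.

```python
# Sketch of the endpoint-recreation rule for "Requires new endpoint?" models.
# The helper and its names are illustrative, not a documented Samba-1 API.
def must_recreate_endpoint(requires_new_endpoint: bool,
                           active_through_upgrade: bool) -> bool:
    """Return True if the endpoint must be re-created after the updated
    model is downloaded.

    Endpoints that stay active through the upgrade keep working; endpoints
    that are inactive at upgrade time, or stopped afterward, must be
    re-created once the new model version is downloaded.
    """
    return requires_new_endpoint and not active_through_upgrade


# A "Yes*" model whose endpoint was stopped before the upgrade:
print(must_recreate_endpoint(True, False))   # True: re-create it
# The same model, but the endpoint stayed active through the upgrade:
print(must_recreate_endpoint(True, True))    # False: keeps working
```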

Deprecated models

| Model checkpoint name | Model version | RDU architecture | Model parallel RDUs | Speculative decoding | Mode | Note |
| --- | --- | --- | --- | --- | --- | --- |
| DeepSeek-R1 | 2 | SN40L-16 | 16 | False | Inference | Renamed to DeepSeek-R1-0528 |
| DeepSeek-R1-16384 | 1 | SN40L-16 | 16 | False | Inference | Renamed to DeepSeek-R1-0528-16384 |
| DeepSeek-R1-32768 | 1 | SN40L-16 | 16 | False | Inference | Renamed to DeepSeek-R1-0528-32768 |
| DeepSeek-R1-4096 | 1 | SN40L-16 | 16 | False | Inference | Renamed to DeepSeek-R1-0528-4096 |
| DeepSeek-R1-8192 | 1 | SN40L-16 | 16 | False | Inference | Renamed to DeepSeek-R1-0528-8192 |
| DeepSeek-V3-16384 | 1 | SN40L-16 | 16 | False | Inference | Renamed to DeepSeek-V3-0324-16384 |
| DeepSeek-V3-32768 | 1 | SN40L-16 | 16 | False | Inference | Renamed to DeepSeek-V3-0324-32768 |
| DeepSeek-V3-4096 | 1 | SN40L-16 | 16 | False | Inference | Renamed to DeepSeek-V3-0324-4096 |
| DeepSeek-V3-8192 | 1 | SN40L-16 | 16 | False | Inference | Renamed to DeepSeek-V3-0324-8192 |
| Llama-4-Maverick-17B-128E-Instruct-131072 | 2 | SN40L-16 | 16 | False | Inference | Now included in Llama-4-Maverick-17B-128E-Instruct |
| Llama-4-Maverick-17B-128E-Instruct-16384 | 2 | SN40L-16 | 16 | False | Inference | Now included in Llama-4-Maverick-17B-128E-Instruct |
| Llama-4-Maverick-17B-128E-Instruct-32768 | 2 | SN40L-16 | 16 | False | Inference | Now included in Llama-4-Maverick-17B-128E-Instruct |
| Llama-4-Maverick-17B-128E-Instruct-65536 | 2 | SN40L-16 | 16 | False | Inference | Now included in Llama-4-Maverick-17B-128E-Instruct |

Performance and quality improvements

  • Function calling is now enabled for DeepSeek-R1-0528 and Qwen3-32B.

  • The batch-size-1 Llama-4-Maverick-17B-128E-Instruct variants (8k, 16k, 32k, 64k, and 128k context) are consolidated into a single model instead of a separate CoE for each sequence length.

  • The batch-size-4 variant, Llama-4-Maverick-17B-128E-Instruct-bs4, currently remains a separate model.
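With function calling now enabled for DeepSeek-R1-0528 and Qwen3-32B, a request can declare tools for the model to invoke. The sketch below assumes the OpenAI-style `tools` schema; the helper function and the `get_weather` tool are illustrative examples, not part of the release.

```python
# Sketch: a function-calling request for Qwen3-32B, which gains function
# calling in this release. Assumes an OpenAI-compatible "tools" schema;
# the helper and the weather tool below are illustrative assumptions.
def build_tool_call_request(model: str, prompt: str, tools: list) -> dict:
    """Return a Chat Completions request body that offers tools to the model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        # "auto" lets the model decide whether to call a tool or answer directly.
        "tool_choice": "auto",
    }


weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request = build_tool_call_request(
    "Qwen3-32B", "What is the weather in Tokyo?", [weather_tool]
)
```

The same request shape should apply to DeepSeek-R1-0528; swapping the `model` field is the only change.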

Known issues

  • Llama 3.2 1B does not support function calling.

  • A CoE bundle may fail if it contains a standalone model that uses the same PEF as the draft model of a speculative decoding (SD) pair in the same bundle.

  • The Whisper UI shows only the chat/completions API options; it should show the audio_transcribe and audio_translate options.

    • There is no UI playground support for Whisper.

    • Whisper translation is currently not working.

    • Whisper transcription quality may be poor for large audio files.

  • Speculative decoding (SD) pairs using TP8 might not work.

  • Data parallel (DP) training does not work on SN40L-16 with RHEL 8.10.