Samba-1 Turbo 25.9.1-MP1

Release version: 25.9.1-MP1 | Release date: 09/24/2025


The Samba-1 Turbo 25.9.1-MP1 (Model Pack 1) release introduces fixes for configuration registration, adds compatibility with RHEL 8.10, and improves endpoint stability and function-calling accuracy, while maintaining support for advanced CoE bundles and Whisper audio features.

Prerequisite

The prerequisite for this release is:

  • Studio Version 25.6.1-RC9

New and updated model versions

New models

Each entry below lists the model's type, mode, context length (batch size), features and optimizations, RDU architecture, RDU count, and a View on Hugging Face link.

gpt-oss-120b

  • Type: Reasoning, Text
  • Mode: Inference
  • Context length (batch size): 8192 (2)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card

gpt-oss-120b-131072

  • Type: Reasoning, Text
  • Mode: Inference
  • Context length (batch size): 131072 (2)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card

gpt-oss-120b-65536

  • Type: Reasoning, Text
  • Mode: Inference
  • Context length (batch size): 65536 (2)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card

Llama-4-Maverick-17B-128E-Instruct-32768

  • Type: Vision, Text
  • Mode: Inference
  • Context length (batch size): 32768 (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card

Llama-4-Maverick-17B-128E-Instruct-bs4

  • Type: Vision, Text
  • Mode: Inference
  • Context length (batch size): 8192 (4)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card
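The new models above serve the Chat completions endpoint with function calling and JSON mode. As a rough sketch of how such an endpoint is typically reached through an OpenAI-compatible client, the example below requests a JSON-mode response from gpt-oss-120b; the base URL, API key placeholder, and the exact model identifier accepted by your deployment are assumptions for illustration, not values from this release.

    from openai import OpenAI

    # Assumed OpenAI-compatible endpoint exposed by a Samba-1 Turbo deployment;
    # substitute the URL and key for your environment.
    client = OpenAI(base_url="https://<your-endpoint>/v1", api_key="<your-api-key>")

    response = client.chat.completions.create(
        model="gpt-oss-120b",  # model ID as listed above (assumed to match the deployed name)
        messages=[
            {"role": "system", "content": "Respond only with a JSON object."},
            {"role": "user", "content": "Summarize this release in three bullet points."},
        ],
        response_format={"type": "json_object"},  # exercises the JSON mode capability
        max_tokens=256,
    )
    print(response.choices[0].message.content)

Function calling on these models would follow the same OpenAI-style pattern, passing a tools list and reading tool_calls from the response, subject to the deployment's actual API surface.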

Updated models

Each entry below lists the same fields as in New models, plus the update applied in this release.

e5-mistral-7B-instruct

  • Type: Embedding
  • Mode: Inference
  • Context length (batch size): 4096 (1, 4, 8, 16, 32); 8192 (1, 4, 8, 16, 32)
  • Endpoint: Embeddings
  • Capabilities: None
  • Import checkpoint: Yes
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card
  • Update: Minor fix

e5-mistral-7b-instruct-8192

  • Type: Embedding
  • Mode: Inference
  • Context length (batch size): 4096 (1, 4, 8, 16, 32); 8192 (1, 4, 8, 16, 32)
  • Endpoint: Embeddings
  • Capabilities: None
  • Import checkpoint: Yes
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card
  • Update: Minor fix

DeepSeek-R1-Distill-Llama-70B

  • Type: Reasoning, Text
  • Mode: Inference
  • Context length (batch size): 4096 (2, 4, 8, 16, 32); 8192 (1, 2, 4, 8, 16)
  • Endpoint: Chat completions
  • Capabilities: None
  • Import checkpoint: Yes
  • Optimizations: Speculative decoding
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card
  • Update: Minor fix

Meta-Llama-3.1-70B-Instruct

  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (2, 4, 8, 16, 32); 8192 (1, 2, 4, 8, 16)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: Yes
  • Optimizations: Speculative decoding
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card
  • Update: Minor fix

DeepSeek-R1-Distill-Llama-8B

  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (1, 2, 4, 8, 16); 8192 (1, 2, 4, 8, 16, 32); 16384 (1, 2, 4, 8, 16)
  • Endpoint: Chat completions
  • Capabilities: None
  • Import checkpoint: Yes
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card
  • Update: Minor fix

Llama-3.1-Tulu-3-8B

  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (1, 2, 4, 8, 16); 8192 (1, 2, 4, 8, 16, 32); 16384 (1, 2, 4, 8, 16)
  • Endpoint: Chat completions
  • Capabilities: None
  • Import checkpoint: Yes
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card
  • Update: Minor fix

Meta-Llama-3.1-8B-Instruct

  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (1, 2, 4, 8, 16); 8192 (1, 2, 4, 8, 16, 32); 16384 (1, 2, 4, 8, 16)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: Yes
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card
  • Update: Minor fix

Meta-Llama-Guard-3-8B

  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (1, 2, 4, 8, 16); 8192 (1, 2, 4, 8, 16, 32); 16384 (1, 2, 4, 8, 16)
  • Endpoint: Chat completions
  • Capabilities: Function calling
  • Import checkpoint: Yes
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card
  • Update: Minor fix

Meta-Llama-3.2-3B-Instruct-TP16

  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (1, 2, 4, 8, 10, 16, 32); 8192 (1, 2, 4, 8, 16); 16384 (1, 2, 4)
  • Endpoint: Chat completions
  • Capabilities: None
  • Import checkpoint: Yes
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card
  • Update: Minor fix

Meta-Llama-3.3-70B-Instruct

  • Type: Text
  • Mode: Inference
  • Context length (batch size): 4096 (2, 4, 8, 16, 32); 8192 (1, 2, 4, 8, 16)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: Yes
  • Optimizations: Speculative decoding
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card
  • Update: Minor fix

Llama-4-Maverick-17B-128E-Instruct

  • Type: Vision, Text
  • Mode: Inference
  • Context length (batch size): 8192 (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card
  • Update: Checkpoint updated

Llama-4-Maverick-17B-128E-Instruct-131072

  • Type: Vision, Text
  • Mode: Inference
  • Context length (batch size): 131072 (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card
  • Update: Checkpoint updated

Llama-4-Maverick-17B-128E-Instruct-16384

  • Type: Vision, Text
  • Mode: Inference
  • Context length (batch size): 16384 (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card
  • Update: Checkpoint updated

Llama-4-Maverick-17B-128E-Instruct-65536

  • Type: Vision, Text
  • Mode: Inference
  • Context length (batch size): 65536 (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 16
  • View on Hugging Face: Model card
  • Update: Checkpoint updated

Mistral-7B-Instruct-V0.2

  • Type: Text
  • Mode: Inference
  • Context length (batch size): 512 (1, 4, 8, 16, 32); 2048 (1, 4, 8, 16, 32); 4096 (1, 4, 8, 16, 32); 32768 (1, 4, 8)
  • Endpoint: Chat completions
  • Capabilities: None
  • Import checkpoint: Yes
  • Optimizations: None
  • RDU architecture: SN40L-8
  • RDU count: 8
  • View on Hugging Face: Model card
  • Update: Minor fix

Mistral-7B-Instruct-V0.2

  • Type: Text
  • Mode: Inference
  • Context length (batch size): 512 (1, 4, 8, 16, 32); 2048 (1, 4, 8, 16, 32); 4096 (1, 4, 8, 16, 32); 32768 (1, 4, 8)
  • Endpoint: Chat completions
  • Capabilities: None
  • Import checkpoint: Yes
  • Optimizations: None
  • RDU architecture: SN40L-16
  • RDU count: 8
  • View on Hugging Face: Model card
  • Update: Minor fix
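The two e5-mistral entries above serve the Embeddings endpoint rather than Chat completions. A minimal sketch of an embeddings request through an OpenAI-compatible client follows; the base URL, API key placeholder, and exact model identifier are assumptions for illustration, not values from this release.

    from openai import OpenAI

    client = OpenAI(base_url="https://<your-endpoint>/v1", api_key="<your-api-key>")

    # The 4096/8192 context lengths listed above bound how much text each input may
    # contain; the listed batch sizes bound how many inputs run together on the RDU.
    result = client.embeddings.create(
        model="e5-mistral-7B-instruct",  # model ID as listed above (assumed to match the deployed name)
        input=["first passage to embed", "second passage to embed"],
    )
    print(len(result.data), len(result.data[0].embedding))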

Performance and quality improvements

  • Fixed an issue where e5-instruct 4k and 8k configurations were incorrectly registered.

  • Compatible with RHEL 8.10.

Known issues

  • Checkpoint download may be marked complete before the file finishes downloading; starting the endpoint before the download completes can cause crashes. Once the download completes, the endpoint should set up normally. Download times vary by checkpoint size and bandwidth.

  • CoE bundles may fail if they contain a standalone model that uses the same PEF as the draft model of a speculative decoding (SD) pair in the bundle.

  • Function calling results may be inaccurate for DeepSeek R1 and Qwen 3.

  • Whisper: the UI only shows the chat/completions API option and should also include audio_transcribe and audio_translate options (see the request sketch after this list).

  • No UI playground support for Whisper.

  • Translation using Whisper is not working.

  • Whisper transcription results may be poor with large audio files.

  • TP8 SD pairs might not work.

  • DP (data parallel) training does not work on SN40L-16 with RHEL 8.10.
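For the Whisper items above, the UI gap means transcription is currently driven through the API. The sketch below assumes an OpenAI-compatible audio transcriptions route and a hypothetical Whisper-Large-v3 model identifier; neither is confirmed by this release, so treat it only as the general shape of an audio_transcribe request.

    from openai import OpenAI

    client = OpenAI(base_url="https://<your-endpoint>/v1", api_key="<your-api-key>")

    # Hypothetical transcription call; smaller audio files are preferable given the
    # known issue above with transcription quality on large files.
    with open("meeting.wav", "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="Whisper-Large-v3",  # hypothetical model name, not listed in this release
            file=audio_file,
        )
    print(result.text)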