Samba-1 Turbo 25.10.1-MP1

Release version: 25.10.1-MP1 | Release date: 10/25/2025


The Samba-1 Turbo 25.10.1-MP1 (Model Pack 1) release improves model functionality and deployment efficiency: it enables new model capabilities, consolidates model variants to streamline operation, and addresses known stability issues for a more reliable platform experience.

Prerequisite

The prerequisite for this release is:

  • Studio Version 25.6.2-RC1

New and updated model versions

New models

| Developer/Model ID | Type | Mode | Context length (batch size) | Endpoint | Capabilities | Import checkpoint | Optimizations | RDU architecture | RDU count | View on Hugging Face |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DeepSeek-R1-0528 | Reasoning, Text | Inference | 4096 (4) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-R1-0528-16384 | Reasoning, Text | Inference | 16384 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-R1-0528-32768 | Reasoning, Text | Inference | 32768 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-R1-0528-4096 | Reasoning, Text | Inference | 4096 (4) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-R1-0528-8192 | Reasoning, Text | Inference | 8192 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3-0324-16384 | Text | Inference | 16384 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3-0324-32768 | Text | Inference | 32768 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3-0324-4096 | Text | Inference | 4096 (4) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3-0324-8192 | Text | Inference | 8192 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3.1 | Reasoning, Text | Inference | 4096 (4) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3.1-16384 | Reasoning, Text | Inference | 16384 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3.1-32768 | Reasoning, Text | Inference | 32768 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3.1-4096 | Reasoning, Text | Inference | 4096 (4) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3.1-8192 | Reasoning, Text | Inference | 8192 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3.1-Terminus | Reasoning, Text | Inference | 4096 (4) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3.1-Terminus-16384 | Reasoning, Text | Inference | 16384 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3.1-Terminus-32768 | Reasoning, Text | Inference | 32768 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3.1-Terminus-4096 | Reasoning, Text | Inference | 4096 (4) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
| DeepSeek-V3.1-Terminus-8192 | Reasoning, Text | Inference | 8192 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | Model card |
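All of the new DeepSeek models serve the Chat completions endpoint with JSON mode enabled. As a minimal sketch, the request body can be built in the OpenAI-compatible style; the helper function, model choice, and `response_format` usage below are illustrative assumptions, not confirmed Samba-1 specifics.

```python
# Sketch: building a JSON-mode Chat Completions request body for one of the
# new DeepSeek models. Assumes an OpenAI-compatible endpoint; the helper
# name and defaults are illustrative, not a documented Samba-1 API.
import json


def build_chat_request(model: str, user_prompt: str, json_mode: bool = False) -> dict:
    """Return a Chat Completions request body for the given model."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "max_tokens": 1024,
    }
    if json_mode:
        # JSON mode constrains the model to emit a valid JSON object.
        body["response_format"] = {"type": "json_object"}
    return body


request = build_chat_request(
    "DeepSeek-V3.1-8192",
    "List three prime numbers as a JSON array under the key 'primes'.",
    json_mode=True,
)
print(json.dumps(request, indent=2))
```

The same body shape applies to any of the chat models above; only the `model` field and the context-length budget differ.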

Updated models

| Developer/Model ID | Type | Mode | Context length (batch size) | Endpoint | Capabilities | Import checkpoint | Optimizations | RDU architecture | RDU count | Speculative decoding | View on Hugging Face | Requires new endpoint? |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DeepSeek-R1-Distill-Llama-70B | Reasoning, Text | Inference | 4096 (2, 4, 8, 16, 32); 8192 (1, 2, 4, 8); 32768 (1, 2, 4); 65536 (1); 131072 (1) | Chat completions | None | Yes | Speculative decoding | SN40L-16 | 16 | True | Model card | No |
| e5-mistral-7B-instruct | Embedding | Inference | 8192 (1, 4, 8); 32768 (1, 4, 8) | Embeddings | None | Yes | None | SN40L-8 | 8 | False | Model card | Yes* |
| e5-mistral-7B-instruct | Embedding | Inference | 8192 (1, 4, 8); 32768 (1, 4, 8) | Embeddings | None | Yes | None | SN40L-16 | 16 | False | Model card | Yes* |
| e5-mistral-7B-instruct | Embedding | Inference | 4096 (1, 4, 8, 16, 32); 8192 (1, 4, 8, 16, 32) | Embeddings | None | Yes | None | SN40L-16 | 16 | False | Model card | Yes* |
| e5-mistral-7b-instruct-8192 | Embedding | Inference | 8192 (1, 4, 8) | Embeddings | None | Yes | None | SN40L-8 | 8 | False | Model card | Yes* |
| e5-mistral-7b-instruct-8192 | Embedding | Inference | 8192 (1, 4, 8) | Embeddings | None | Yes | None | SN40L-16 | 8 | False | Model card | Yes* |
| e5-mistral-7b-instruct-8192 | Embedding | Inference | 4096 (1, 4, 8, 16, 32); 8192 (1, 4, 8, 16, 32) | Embeddings | None | Yes | None | SN40L-16 | 16 | False | Model card | Yes* |
| Llama-3.1-Tulu-3-405B | Text | Inference | 4096 (1, 2, 4); 8192 (1); 16384 (1) | Chat completions | Function calling, JSON mode | No | Speculative decoding | SN40L-16 | 16 | True | Model card | Yes* |
| Llama-4-Maverick-17B-128E-Instruct | Image, Text | Inference | 8192 (1); 16384 (1); 32768 (1); 65536 (1); 131072 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | False | Model card | Yes* |
| Llama-4-Maverick-17B-128E-Instruct-bs4 | Image, Text | Inference | 8192 (4) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | False | Model card | No |
| Meta-Llama-3-8B | Text | Inference | 4096 (1, 2, 4, 8); 8192 (1, 2, 4, 8, 16) | Completions | None | Yes | None | SN40L-16 | 16 | False | Model card | No |
| Meta-Llama-3-8B-Instruct | Text | Inference | 4096 (1, 2, 4, 8); 8192 (1, 2, 4, 8, 16) | Chat completions | Function calling, JSON mode | Yes | None | SN40L-16 | 16 | False | Model card | No |
| Meta-Llama-3.1-405B-Instruct | Text | Inference | 4096 (1, 2, 4); 8192 (1); 16384 (1) | Chat completions | Function calling, JSON mode | No | Speculative decoding | SN40L-16 | 16 | True | Model card | No |
| Meta-Llama-3.1-70B-Instruct | Text | Inference | 4096 (2, 4, 8, 16, 32); 8192 (1, 2, 4, 8); 16384 (1, 2, 4); 32768 (1, 2, 4); 65536 (1); 131072 (1) | Chat completions | Function calling, JSON mode | No | Speculative decoding | SN40L-16 | 16 | True | Model card | No |
| Meta-Llama-3.2-1B-Instruct | Text | Inference | 4096 (1, 4, 8, 16, 32); 8192 (1, 4, 8, 16, 32) | Chat completions | None | No | None | SN40L-8 | 8 | False | Model card | Yes* |
| Meta-Llama-3.2-1B-Instruct | Text | Inference | 4096 (1, 4, 8, 16, 32); 8192 (1, 4, 8, 16, 32) | Chat completions | None | No | None | SN40L-16 | 16 | False | Model card | Yes* |
| Meta-Llama-3.2-1B-Instruct | Text | Inference | 4096 (1, 2, 4, 8, 16, 32); 8192 (1, 2, 4, 8, 16); 16384 (1, 2, 4, 8, 10); 32768 (1, 2, 4); 65536 (1); 131072 (1) | Chat completions | None | No | None | SN40L-16 | 16 | False | Model card | Yes* |
| Meta-Llama-3.2-3B-Instruct-TP16 | Text | Inference | 4096 (1, 2, 4, 8, 10, 16, 32); 8192 (1, 2, 4, 8, 16); 16384 (1, 2, 4) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | False | Model card | Yes* |
| Meta-Llama-3.3-70B-Instruct | Text | Inference | 4096 (2, 4, 8, 16, 32); 8192 (1, 2, 4, 8); 16384 (1, 2, 4); 32768 (1, 2, 4); 65536 (1); 131072 (1) | Chat completions | Function calling, JSON mode | No | Speculative decoding | SN40L-16 | 16 | True | Model card | No |
| Qwen3-32B | Text | Inference | 8192 (1, 4); 16384 (1); 32768 (1) | Chat completions | Function calling, JSON mode | No | None | SN40L-16 | 16 | False | Model card | Yes* |

* If a model update is listed as requiring a new endpoint ("Yes*" in the table above), endpoints that are active during the upgrade continue to function normally. Endpoints that are inactive at the time of the upgrade, or that are stopped afterward, must be re-created once the new version of the model is downloaded. This applies equally to SambaNova-provided and user-created CoEs that contain the updated model: prebuilt CoEs from SambaNova are refreshed to the new version automatically, while user-created CoEs must be rebuilt after the updated model is downloaded.
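The endpoint-recreation rule above can be sketched as a small decision helper for planning an upgrade. The function and its parameter names are illustrative, not a Samba-1 API.

```python
# Sketch of the endpoint-recreation rule for "Requires new endpoint?" models.
# The helper and its names are illustrative, not a documented Samba-1 API.
def must_recreate_endpoint(requires_new_endpoint: bool,
                           active_through_upgrade: bool) -> bool:
    """Return True if the endpoint must be re-created after the updated
    model is downloaded.

    Endpoints that stay active through the upgrade keep working; endpoints
    that are inactive at upgrade time, or stopped afterward, must be
    re-created once the new model version is downloaded.
    """
    return requires_new_endpoint and not active_through_upgrade


# A "Yes*" model whose endpoint was stopped before the upgrade:
print(must_recreate_endpoint(True, False))   # True: re-create it
# The same model, but the endpoint stayed active through the upgrade:
print(must_recreate_endpoint(True, True))    # False: keeps working
```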

Deprecated models

| Model checkpoint name | Model version | RDU architecture | Model parallel RDUs | Speculative decoding | Mode | Note |
| --- | --- | --- | --- | --- | --- | --- |
| DeepSeek-R1 | 2 | SN40L-16 | 16 | False | Inference | Renamed to DeepSeek-R1-0528 |
| DeepSeek-R1-16384 | 1 | SN40L-16 | 16 | False | Inference | Renamed to DeepSeek-R1-0528-16384 |
| DeepSeek-R1-32768 | 1 | SN40L-16 | 16 | False | Inference | Renamed to DeepSeek-R1-0528-32768 |
| DeepSeek-R1-4096 | 1 | SN40L-16 | 16 | False | Inference | Renamed to DeepSeek-R1-0528-4096 |
| DeepSeek-R1-8192 | 1 | SN40L-16 | 16 | False | Inference | Renamed to DeepSeek-R1-0528-8192 |
| DeepSeek-V3-16384 | 1 | SN40L-16 | 16 | False | Inference | Renamed to DeepSeek-V3-0324-16384 |
| DeepSeek-V3-32768 | 1 | SN40L-16 | 16 | False | Inference | Renamed to DeepSeek-V3-0324-32768 |
| DeepSeek-V3-4096 | 1 | SN40L-16 | 16 | False | Inference | Renamed to DeepSeek-V3-0324-4096 |
| DeepSeek-V3-8192 | 1 | SN40L-16 | 16 | False | Inference | Renamed to DeepSeek-V3-0324-8192 |
| Llama-4-Maverick-17B-128E-Instruct-131072 | 2 | SN40L-16 | 16 | False | Inference | Now included in Llama-4-Maverick-17B-128E-Instruct |
| Llama-4-Maverick-17B-128E-Instruct-16384 | 2 | SN40L-16 | 16 | False | Inference | Now included in Llama-4-Maverick-17B-128E-Instruct |
| Llama-4-Maverick-17B-128E-Instruct-32768 | 2 | SN40L-16 | 16 | False | Inference | Now included in Llama-4-Maverick-17B-128E-Instruct |
| Llama-4-Maverick-17B-128E-Instruct-65536 | 2 | SN40L-16 | 16 | False | Inference | Now included in Llama-4-Maverick-17B-128E-Instruct |

Performance and quality improvements

  • Function calling is now enabled for DeepSeek-R1-0528 and Qwen3-32B.

  • The batch-size-1 Llama-4-Maverick-17B-128E-Instruct variants (8k, 16k, 32k, 64k, and 128k context) are consolidated into a single model instead of a separate CoE for each sequence length.

  • The batch-size-4 variant, Llama-4-Maverick-17B-128E-Instruct-bs4, currently remains a separate model.
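With function calling now enabled for DeepSeek-R1-0528 and Qwen3-32B, a request can declare tools for the model to invoke. The sketch below assumes the OpenAI-style `tools` schema; the helper function and the `get_weather` tool are illustrative examples, not part of the release.

```python
# Sketch: a function-calling request for Qwen3-32B, which gains function
# calling in this release. Assumes an OpenAI-compatible "tools" schema;
# the helper and the weather tool below are illustrative assumptions.
def build_tool_call_request(model: str, prompt: str, tools: list) -> dict:
    """Return a Chat Completions request body that offers tools to the model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        # "auto" lets the model decide whether to call a tool or answer directly.
        "tool_choice": "auto",
    }


weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request = build_tool_call_request(
    "Qwen3-32B", "What is the weather in Tokyo?", [weather_tool]
)
```

The same request shape should apply to DeepSeek-R1-0528; swapping the `model` field is the only change.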

Known issues

  • Llama 3.2 1B does not support function calling.

  • A CoE bundle may fail if it contains a standalone model that uses the same PEF as the draft model of a speculative decoding (SD) pair in the same bundle.

  • The Whisper UI shows only the chat/completions API options; it should show the audio_transcribe and audio_translate options.

    • There is no UI playground support for Whisper.

    • Whisper translation is currently not working.

    • Whisper transcription quality may be poor for large audio files.

  • Speculative decoding (SD) pairs using TP8 might not work.

  • Data parallel (DP) training does not work on SN40L-16 with RHEL 8.10.