SLNG Speech-to-Text API

Deepgram Nova 3 Multi-Language

Endpoint: https://api.slng.ai

Real-time speech-to-text transcription with ultra-low latency using Deepgram's Nova model. Optimized for streaming audio with intelligent Voice Activity Detection (VAD) and speaker diarization.

deepgram/nova:3-multi - websocket

GET

https://api.slng.ai

/v1/stt/slng/deepgram/nova:3-multi

Speech-to-Text API for converting audio files to text using SLNG deepgram/nova.

WebSocket Endpoint

Establishes a WebSocket connection for real-time speech-to-text.

Connection URL: wss://api.slng.ai/v1/stt/slng/deepgram/nova:3-multi

deepgram/nova:3-multi - websocket › Headers

Authorization

string · required

The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

Upgrade

string · enum · required

Enum values:

websocket

Connection

string · enum · required

Enum values:

Upgrade

X-Region-Override

string · enum

Optional. Specify a target region for this model. If not provided, the system will automatically select an appropriate region.

Enum values:

ap-southeast

ap-southeast-2

eu-north

eu-north-1

me-south
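As a minimal sketch of the header table above, the following Python helper assembles the handshake headers and rejects region values outside the listed enum. The function name `build_ws_headers` and the `REGIONS` constant are illustrative and not part of the API. Most WebSocket client libraries set `Upgrade` and `Connection` themselves, so in practice only `Authorization` and `X-Region-Override` usually need to be passed explicitly.

```python
from typing import Dict, Optional

# Connection URL and region values copied from this page.
WS_URL = "wss://api.slng.ai/v1/stt/slng/deepgram/nova:3-multi"
REGIONS = {"ap-southeast", "ap-southeast-2", "eu-north", "eu-north-1", "me-south"}


def build_ws_headers(api_key: str, region: Optional[str] = None) -> Dict[str, str]:
    """Return the handshake headers documented above (hypothetical helper)."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Upgrade": "websocket",
        "Connection": "Upgrade",
    }
    # X-Region-Override is optional; when omitted the service picks a region.
    if region is not None:
        if region not in REGIONS:
            raise ValueError(f"unsupported region: {region!r}")
        headers["X-Region-Override"] = region
    return headers


if __name__ == "__main__":
    print(WS_URL)
    print(build_ws_headers("YOUR_KEY_HERE", region="eu-north"))
```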

deepgram/nova:3-multi - websocket › Request Body

oneOf

Exactly one variant must match.

Decision Table

Variant	Matching Criteria
	type = object · requires: type
	type = object · requires: type, data
	type = object · requires: type

Properties for Init Message:

Initialize a session with recognition configuration before streaming audio.

type

const · required

Const value: init

model

string

Model to use for transcription

Default: nova-3

config

object

Recognition configuration options
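The init message described above can be sketched in Python as follows. The `config` key and its fields (`language`, `sample_rate`, `encoding`, `punctuate`, `keyterm`, `enable_partials`) are taken from the example request on this page; the helper name `build_init_message` and its defaults are assumptions for illustration, and the page does not list the full set of accepted `config` options.

```python
import json
from typing import Any, Dict, List, Optional


def build_init_message(
    language: str = "en",
    sample_rate: int = 16000,
    encoding: str = "linear16",
    punctuate: bool = True,
    keyterm: Optional[List[str]] = None,
    enable_partials: bool = True,
    model: Optional[str] = None,
) -> str:
    """Serialize an init message as a JSON string (hypothetical helper)."""
    config: Dict[str, Any] = {
        "language": language,
        "sample_rate": sample_rate,
        "encoding": encoding,
        "punctuate": punctuate,
        "enable_partials": enable_partials,
    }
    if keyterm:
        config["keyterm"] = list(keyterm)
    message: Dict[str, Any] = {"type": "init", "config": config}
    # "model" is optional; the page documents nova-3 as its default.
    if model is not None:
        message["model"] = model
    return json.dumps(message)


if __name__ == "__main__":
    print(build_init_message(keyterm=["Barcelona", "sunny"]))
```

The resulting string would be sent as the first text frame after the connection opens, before any audio is streamed.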

deepgram/nova:3-multi - websocket › Responses

Switching Protocols

oneOf

Exactly one variant must match.

Decision Table

Variant	Matching Criteria
	type = object · requires: type, session_id
	type = object · requires: type, transcript
	type = object · requires: type, transcript
	type = object · requires: type, code, message

Properties for Ready Message:

Indicates the session is ready to receive audio.

type

const · required

Const value: ready

session_id

string · required

Unique session identifier
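A client typically has to tell the response variants apart before acting on them. The sketch below does so under one stated assumption: this page names only the `ready` type value, so the other variants are recognized by the required fields listed in the decision table (`transcript`, or `code` plus `message`) rather than by a type name. The function name and the returned labels are hypothetical.

```python
import json
from typing import Any, Dict, Tuple


def classify_server_message(raw: str) -> Tuple[str, Dict[str, Any]]:
    """Return a (label, message) pair for one server frame (illustrative only)."""
    msg = json.loads(raw)
    if not isinstance(msg, dict) or "type" not in msg:
        raise ValueError("server message must be a JSON object with a 'type' field")
    if msg["type"] == "ready":
        # The Ready Message requires both type and session_id.
        if "session_id" not in msg:
            raise ValueError("ready message is missing session_id")
        return "ready", msg
    # Assumption: remaining variants are identified by their required fields.
    if "code" in msg and "message" in msg:
        return "error", msg
    if "transcript" in msg:
        return "transcript", msg
    return "unknown", msg


if __name__ == "__main__":
    label, msg = classify_server_message('{"type": "ready", "session_id": "session_abc123"}')
    print(label, msg["session_id"])
```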

GET /v1/stt/slng/deepgram/nova:3-multi

curl --request GET \
  --url https://api.slng.ai/v1/stt/slng/deepgram/nova:3-multi \
  --header 'Authorization: Bearer YOUR_KEY_HERE' \
  --header 'Connection: Upgrade' \
  --header 'Content-Type: application/json' \
  --header 'Upgrade: websocket' \
  --data '
{
  "type": "init",
  "config": {
    "language": "en",
    "sample_rate": 16000,
    "encoding": "linear16",
    "punctuate": true,
    "keyterm": [
      "Barcelona",
      "sunny"
    ],
    "enable_partials": true
  }
}
'

Example Request Body

{
  "type": "init",
  "config": {
    "language": "en",
    "sample_rate": 16000,
    "encoding": "linear16",
    "punctuate": true,
    "keyterm": [
      "Barcelona",
      "sunny"
    ],
    "enable_partials": true
  }
}

application/json

Example Responses

{
  "type": "ready",
  "session_id": "session_abc123"
}

application/json

deepgram/nova:3-multi - http

POST

https://api.slng.ai

/v1/stt/slng/deepgram/nova:3-multi

deepgram/nova:3-multi - http › Headers

Authorization

string · required

The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

X-Region-Override

string · enum

Optional. Specify a target region for this model. If not provided, the system will automatically select an appropriate region.

Enum values:

ap-southeast

ap-southeast-2

eu-north

eu-north-1

me-south

deepgram/nova:3-multi - http › Request Body

audio

string · binary

Audio file to transcribe

url

string · uri

URL to audio file

language

string · enum

Allowed language codes for deepgram/nova:3-multi.

Enum values:

multi

en-au

en-ca

en-gb

en-in

en-nz

Default: multi

model

string

AI model used to process submitted audio (nova-3, nova-2, etc.)

punctuate

boolean

Add punctuation and capitalization to the transcript

Default: false

diarize

boolean

Recognize speaker changes. Each word is assigned a speaker number, starting at 0

Default: false

smart_format

boolean

Apply formatting to improve transcript readability

Default: false

utterances

boolean

Segment speech into meaningful semantic units

Default: false

utt_split

number

Seconds to wait before detecting a pause between words

Default: 0.8

paragraphs

boolean

Split audio into paragraphs to improve transcript readability

Default: false

numerals

boolean

Convert numbers from written format to numerical format

Default: false

profanity_filter

boolean

Convert profanity to nearest non-profane word or remove it

Default: false

redact

string[]

Remove sensitive information (pci, pii, numbers) from transcripts

Enum values:

pci

pii

numbers

search

string[]

Search for terms or phrases in submitted audio

replace

string[]

Search and replace terms in transcript (format "term:replacement")

keywords

string[]

Boost or suppress terminology (Nova-2 and earlier, with intensifier like "word:5")

keyterm

string[]

Keyterm prompting for specialized terminology (Nova-3 only)

multichannel

boolean

Transcribe each audio channel independently

Default: false

alternatives

integer

Number of alternative transcripts to return

filler_words

boolean

Include filler words like "uh" and "um" in transcript

Default: false

dictation

boolean

Dictation mode for controlling formatting with dictated speech

Default: false

measurements

boolean

Convert spoken measurements to abbreviations

Default: false

encoding

string · enum

Expected encoding of submitted audio

Enum values:

linear16

flac

mulaw

amr-nb

amr-wb

opus

speex

g729

sample_rate

integer

Sample rate of submitted audio in Hz

channels

integer

Number of audio channels

detect_entities

boolean

Identifies and extracts key entities from content in submitted audio

Default: false

Identifies the dominant language spoken in submitted audio

Default: false

sentiment

boolean

Recognizes the sentiment throughout a transcript

Default: false

Summarize content. Supports string version option (v2) or boolean.

Default: false

topics

boolean

Detect topics throughout a transcript

Default: false

custom_topic

string[] · maxItems: 100

Custom topics you want the model to detect within your input audio

custom_topic_mode

string · enum

Sets how the model will interpret custom_topic param

Enum values:

extended

strict

Default: extended

intents

boolean

Recognizes speaker intent throughout a transcript

Default: false

custom_intent

string[] · maxItems: 100

Custom intents you want the model to detect within your input audio

custom_intent_mode

string · enum

Sets how the model will interpret custom_intent param

Enum values:

extended

strict

Default: extended

callback

string · uri

URL to which we'll make the callback request

callback_method

string · enum

HTTP method by which the callback request will be made

Enum values:

POST

PUT

Default: POST

Label your requests for the purpose of identification during usage reporting

Arbitrary key-value pairs attached to the API response for downstream processing
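The request body fields above can be turned into a multipart form body with the standard library alone. This is a sketch under assumptions: the page does not say how booleans or `string[]` fields should be encoded in a form, so the code guesses lowercase `true`/`false` and one repeated field per array item. The helper names `flatten_fields` and `encode_multipart` are illustrative, not part of any SDK.

```python
import uuid
from typing import Dict, List, Optional, Tuple, Union

# Language codes copied from the enum on this page.
ALLOWED_LANGUAGES = {"multi", "en-au", "en-ca", "en-gb", "en-in", "en-nz"}
FieldValue = Union[str, bool, int, float, List[str]]


def flatten_fields(options: Dict[str, FieldValue]) -> List[Tuple[str, str]]:
    """Turn request options into (name, value) text pairs (assumed encoding)."""
    if options.get("language", "multi") not in ALLOWED_LANGUAGES:
        raise ValueError(f"unsupported language: {options['language']!r}")
    pairs: List[Tuple[str, str]] = []
    for name, value in options.items():
        if isinstance(value, bool):
            pairs.append((name, "true" if value else "false"))
        elif isinstance(value, list):
            pairs.extend((name, str(item)) for item in value)
        else:
            pairs.append((name, str(value)))
    return pairs


def encode_multipart(
    pairs: List[Tuple[str, str]],
    audio_name: str,
    audio_bytes: bytes,
    boundary: Optional[str] = None,
) -> Tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data request."""
    boundary = boundary or uuid.uuid4().hex
    chunks: List[bytes] = []
    for name, value in pairs:
        part = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'
        chunks.append(part.encode("utf-8"))
    header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="audio"; filename="{audio_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    chunks.append(header.encode("utf-8"))
    chunks.append(audio_bytes)
    chunks.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


if __name__ == "__main__":
    fields = flatten_fields({"language": "multi", "punctuate": True})
    body, content_type = encode_multipart(fields, "sample.wav", b"RIFF")
    print(content_type, len(body))
```

The returned body and content type could then be passed to any HTTP client together with the `Authorization` header described earlier.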

deepgram/nova:3-multi - http › Responses

Successful transcription

text

string · required

Transcribed text.

language

string · required

Detected or specified language code.

duration

number

Audio duration in seconds.

confidence

number

Confidence score (0.0-1.0).

object[]

Word-level transcription with timing and confidence.

request_id

string · uuid

Unique request identifier.

model

string

Model used for transcription.

POST /v1/stt/slng/deepgram/nova:3-multi

curl --request POST \
  --url https://api.slng.ai/v1/stt/slng/deepgram/nova:3-multi \
  --header 'Authorization: Bearer YOUR_KEY_HERE' \
  --header 'Content-Type: multipart/form-data' \
  --form audio=@micro-machines-16k-mono.wav \
  --form language=multi

Example Request Body

{
  "audio": "micro-machines-16k-mono.wav",
  "language": "multi"
}

Example Responses

{
  "text": "hello from sunny barcelona",
  "language": "en",
  "transcript": "hello from sunny barcelona",
  "confidence": 0.9819336,
  "duration": 1.8959374,
  "metadata": {
    "request_id": "8b22cd59-2007-464b-8006-b928c5d35558",
    "model": "nova-3",
    "duration": 1.8959374,
    "channels": 1
  }
}

application/json
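To read the response in client code, a small parser along these lines may help. Note that the field list above places `request_id` and `model` at the top level, while the example response nests them under `metadata`; the sketch checks both locations rather than assuming either. The `Transcription` dataclass and `parse_transcription` function are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcription:
    text: str
    language: str
    duration: Optional[float] = None
    confidence: Optional[float] = None
    request_id: Optional[str] = None
    model: Optional[str] = None


def parse_transcription(raw: str) -> Transcription:
    """Parse a successful HTTP response body (illustrative only)."""
    data = json.loads(raw)
    # text and language are the only fields marked required on this page.
    for key in ("text", "language"):
        if key not in data:
            raise ValueError(f"missing required field: {key}")
    meta = data.get("metadata") or {}
    return Transcription(
        text=data["text"],
        language=data["language"],
        duration=data.get("duration"),
        confidence=data.get("confidence"),
        request_id=data.get("request_id", meta.get("request_id")),
        model=data.get("model", meta.get("model")),
    )


if __name__ == "__main__":
    sample = '{"text": "hello from sunny barcelona", "language": "en", "metadata": {"model": "nova-3"}}'
    print(parse_transcription(sample))
```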
