Transcribe an audio file

1. Introduction

The speech recognition API converts user-provided voice data into text. The API operates asynchronously: you submit the audio first and retrieve the text once processing is complete. You can use the resulting text to build applications or automatic voice-to-text conversion services.

Key features

Voice data conversion: Converts voice data submitted through the API into text.
Support for a wide range of audio formats: Accepts many audio formats, such as WAV and MP3, for flexible use.
High accuracy: Converts speech data to text with high accuracy using a state-of-the-art speech recognition algorithm.

WARNING

⚠ Songs or audio with loud background music are not supported.

Use Cases

You can use our speech recognition API to convert speech data into text effectively. To enhance the user experience, you can build voice-based applications or automatic speech-to-text conversion services such as the following.

Voice memo/journal: When a user records a voice memo, the API converts it into text, and the text can be saved automatically in a memo application.
Voice-based search engine: When a user speaks a search term, the API converts it into text, and the search engine returns relevant results.
AI contact center: When a customer's phone call is converted to text via the API, an automatic response system generates an appropriate response based on its content.
Automatic meeting minutes service: When voice data recorded during a meeting is converted to text via the API, minutes are written automatically and provided to participants.
Video subtitles: You can get transcription results from the API to use as subtitles.

2. Example

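Below is a simple example that sends a long audio file to the API asynchronously. The POST request submits the audio and returns a request ID (rid). The GET request then uses that rid to retrieve the transcription result.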

POST

shell

curl -X POST 'https://apis.daglo.ai/stt/v1/async/transcripts' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <API_TOKEN>' \
--data '{
    "audio": {
        "source": {
            "url": "https://storage.googleapis.com/bkt-actionpower-examples/audio/actionpower_hello.wav"
        }
    }
}'

POST-Response

text

{"rid":"12345678-abcd-efgh-1234-abcdefghijkl"}
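The `rid` in this response identifies the job, and the GET request below requires it. The following minimal sketch extracts the `rid` with `sed`, so no JSON tool is needed. It assumes the response is a small JSON object like the one above. The function name `extract_rid` is hypothetical and not part of the API.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Daglo API): read a POST response such as
# {"rid":"..."} on stdin and print the value of its "rid" field.
# Assumes the rid value contains no double quotes; prints nothing if "rid" is absent.
extract_rid() {
  sed -n 's/.*"rid"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

To store the ID in a shell variable, pipe the output of the POST command above into `extract_rid` inside `rid=$(...)`. If `jq` is installed, `jq -r '.rid'` does the same job with a real JSON parser.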

GET

shell

curl 'https://apis.daglo.ai/stt/v1/async/transcripts/<RID>' \
  --header 'Authorization: Bearer <API_TOKEN>'

GET-Response

text

{
    "rid": "12345678-abcd-efgh-1234-abcdefghijkl",
    "status": "transcribed",
    "sttResults": [
        {
            "transcript": "안녕하세요. 액션 파워입니다.",
            "words": [
                {
                    "word": "안녕하세요. ",
                    "startTime": {
                        "nanos": 60000000,
                        "seconds": "0"
                    },
                    "endTime": {
                        "nanos": 860000000,
                        "seconds": "0"
                    },
                    "segmentId": "1"
                },
                {
                    "word": "액션 ",
                    "startTime": {
                        "nanos": 820000000,
                        "seconds": "0"
                    },
                    "endTime": {
                        "nanos": 419999999,
                        "seconds": "1"
                    },
                    "segmentId": "2"
                },
                {
                    "word": "파워입니다. ",
                    "startTime": {
                        "nanos": 379999999,
                        "seconds": "1"
                    },
                    "endTime": {
                        "nanos": 20000000,
                        "seconds": "2"
                    },
                    "segmentId": "2"
                }
            ],
            "keywords": null
        }
    ]
}
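Each word in `sttResults` has a `startTime` and an `endTime`. Each time is split into whole `seconds` (a string in the sample) and `nanos`. In English, the sample transcript reads "Hello. This is Action Power." The sketch below uses hypothetical helper names to combine the two fields into decimal seconds or into an `HH:MM:SS,mmm` timestamp of the kind used in SRT subtitle files. It assumes the values have already been extracted from the JSON and are passed as plain integers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: convert a startTime/endTime pair (seconds, nanos) from the
# GET response into other time formats. Both arguments must be plain integers,
# e.g. pass 1 and 419999999 for {"nanos": 419999999, "seconds": "1"}.

# Prints decimal seconds with nanosecond precision, e.g. "1.419999999".
to_seconds() {
  local seconds="$1" nanos="$2"
  printf '%d.%09d\n' "$seconds" "$nanos"
}

# Prints an SRT-style timestamp HH:MM:SS,mmm; sub-millisecond digits are truncated.
to_srt_time() {
  local seconds="$1" nanos="$2"
  printf '%02d:%02d:%02d,%03d\n' \
    $(( seconds / 3600 )) $(( seconds % 3600 / 60 )) $(( seconds % 60 )) \
    $(( nanos / 1000000 ))
}
```

For the end time of the word "액션" above, `to_srt_time 1 419999999` prints `00:00:01,419`.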

3. Instructions

1) Getting an API key

Create an account in the API console.
Go to the token menu and issue a new token.
Copy the issued token and send it as a Bearer token in the Authorization header of each request (Authorization: Bearer <API_TOKEN>), as in the example requests.
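One way to use the token without hard-coding it in scripts is to keep it in an environment variable and build the header from it. This is a minimal sketch. The variable name `DAGLO_API_TOKEN` and the function `auth_header` are hypothetical choices, not names the API requires.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Authorization header for API requests using the
# token stored in DAGLO_API_TOKEN (a variable name chosen for this example).
# Fails with a message on stderr if the variable is unset or empty.
auth_header() {
  if [ -z "${DAGLO_API_TOKEN:-}" ]; then
    echo "DAGLO_API_TOKEN is not set" >&2
    return 1
  fi
  printf 'Authorization: Bearer %s\n' "$DAGLO_API_TOKEN"
}
```

After `export DAGLO_API_TOKEN='<API_TOKEN>'`, pass `--header "$(auth_header)"` to curl in place of the literal Authorization header.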

2) Sending a request

This example sends a long audio file.
Send a request with the required parameters to the Send long audio to transcribe endpoint:

shell

POST https://apis.daglo.ai/stt/v1/async/transcripts

For more detailed information on the API parameters, please refer to the API Reference.
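If the audio URL comes from a variable, building the request body by hand can produce invalid JSON when the URL contains a double quote or a backslash. The following sketch escapes those two characters and produces the same body structure as the example in section 2. The function name `build_request_body` is hypothetical. The sketch covers only the `audio.source.url` field, so you would need to add any other parameters from the API Reference yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the POST body {"audio":{"source":{"url":...}}} for the
# given audio URL, matching the structure of the example request.
# Only backslashes and double quotes are escaped; control characters are not handled.
build_request_body() {
  local url
  # Escape backslashes first, then double quotes, so the URL is a valid JSON string.
  url=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"audio":{"source":{"url":"%s"}}}\n' "$url"
}
```

Pass the result to the POST request with `--data "$(build_request_body "$AUDIO_URL")"`, where `AUDIO_URL` is a hypothetical variable that holds your file's URL.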

3) Verifying that the audio format is supported

File size: Up to 2GB
Audio duration: 4 hours or less
Supported file formats

🔊 audio

.3gp, .3gpp, .ac3, .aac, .aiff, .amr, .au, .flac, .m4a, .mp3, .mxf, .opus, .ra, .wav, .weba

📹 video

.asx, .avi, .ogm, .ogv, .m4v, .mov, .mp4, .mpeg, .mpg, .wmv

WARNING

⚠ Even if a file has a supported extension, transcription may not proceed if its actual content (encoding) does not match that format.
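Before uploading, you can screen files against the limits listed above. This sketch checks only the file extension and the 2GB size limit. It cannot detect the encoding mismatch described in the warning, and it does not check the 4-hour duration limit. The function names are hypothetical. Two behaviors are assumptions: treating 2GB as 2 GiB (2,147,483,648 bytes) and matching extensions case-insensitively.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check against the documented limits: supported extension
# and a maximum file size of 2GB. Duration (4 hours) and encoding are NOT checked,
# so a file that passes may still be rejected by the API.

SUPPORTED_EXTENSIONS="3gp 3gpp ac3 aac aiff amr au flac m4a mp3 mxf opus ra wav weba asx avi ogm ogv m4v mov mp4 mpeg mpg wmv"
MAX_BYTES=$(( 2 * 1024 * 1024 * 1024 ))  # assumption: "2GB" means 2 GiB

# Succeeds if the name ends in a supported extension (case-insensitive, an assumption).
has_supported_extension() {
  local name="$1" ext
  case "$name" in
    *.*) ext="${name##*.}" ;;
    *) return 1 ;;
  esac
  ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
  case " $SUPPORTED_EXTENSIONS " in
    *" $ext "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds if the file exists, has a supported extension, and is within the size limit.
check_audio_file() {
  local file="$1" size
  [ -f "$file" ] || { echo "not a file: $file" >&2; return 1; }
  has_supported_extension "$file" || { echo "unsupported extension: $file" >&2; return 1; }
  # Strip whitespace because some wc implementations pad the byte count.
  size=$(wc -c < "$file" | tr -d '[:space:]')
  [ "$size" -le "$MAX_BYTES" ] || { echo "file larger than 2GB: $file" >&2; return 1; }
}
```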

4) Getting a response

a. POST Send long audio to transcribe

Success

200: OK. The request was accepted, and the response body contains the request ID (rid).
204: No Content. The request was successful, but no result is returned. For transcription, this means transcription is complete but the result is empty.

Error

400: Bad Request. Invalid format.
401: Unauthorized.
403: Forbidden. Unauthorized access.
413: Payload Too Large. Request too large.
415: Unsupported Media Type.
429: Too Many Requests.
500: Internal Server Error.
503: Service Unavailable. The server is handling too many requests and is temporarily unable to respond. Please try again in a moment.
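The response codes above suggest a simple handling policy for the POST request. 429 and 503 are temporary, so the request can be retried after a pause. Other errors need to be fixed before resending. The following sketch implements that policy under stated assumptions: the function names are hypothetical, `DAGLO_API_TOKEN` is a variable name chosen for this example, and the retry count and delays are arbitrary. Treating 500 as non-retryable is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: classify the POST status code and retry temporary failures.
# 429 and 503 are treated as retryable, following the descriptions above; all other
# codes except 200 and 204 are treated as permanent errors (an assumption).

classify_post_status() {
  case "$1" in
    200) echo ok ;;         # accepted; the body contains the rid
    204) echo empty ;;      # successful, but no result is returned
    429|503) echo retry ;;  # temporary; try again in a moment
    *) echo fail ;;         # includes 000, which curl reports when no response arrived
  esac
}

# Sends the POST request with the JSON body given as $1, retrying up to 5 times
# with a growing delay. Assumes DAGLO_API_TOKEN holds the API token.
# Prints the response body on success.
post_with_retry() {
  local body="$1" attempt code out
  out=$(mktemp) || return 1
  for attempt in 1 2 3 4 5; do
    code=$(curl -sS -o "$out" -w '%{http_code}' -X POST \
      'https://apis.daglo.ai/stt/v1/async/transcripts' \
      --header 'Content-Type: application/json' \
      --header "Authorization: Bearer ${DAGLO_API_TOKEN}" \
      --data "$body")
    case "$(classify_post_status "$code")" in
      ok|empty) cat "$out"; rm -f "$out"; return 0 ;;
      retry) sleep $(( attempt * 2 )) ;;
      *) echo "request failed with HTTP $code" >&2; cat "$out" >&2; rm -f "$out"; return 1 ;;
    esac
  done
  echo "giving up after 5 attempts (last HTTP $code)" >&2
  rm -f "$out"
  return 1
}
```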

b. GET Get long audio transcription

Endpoint

shell

GET https://apis.daglo.ai/stt/v1/async/transcripts/{rid}

Success

200: OK. The status field of the response indicates the progress of the request:
- ai_requested: Request has been started.
- uploaded: File upload completed.
- file_processing: Pre-processing file.
- transcribing: Transcription in progress.
- post_processing: Post-processing.
- transcribed: Transcription completed.
- transcript_error: An error occurred while transcribing. Please wait a moment and request again.
- file_error: There is an error in the file. Please check the file and request again.
204: No Content. The request was successful, but no result is returned. For transcription, this means transcription is complete but the result is empty.

Error

401: Unauthorized.
403: Forbidden. Unauthorized access.
404: Not Found.
429: Too Many Requests.
500: Internal Server Error.
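Polling with GET means repeating the request until `status` reaches a final value from the list above. The following sketch makes these assumptions:

- `transcribed` is the only successful final state.
- `transcript_error` and `file_error` end the job.
- The other listed values mean processing is still under way.
- A 204 response is a completed job with an empty result.

The function names, the `DAGLO_API_TOKEN` variable, the 5-second interval and the 120-attempt limit are hypothetical choices. The `status` field is read with `sed`, assuming a response shaped like the example. If `jq` is installed, `jq -r '.status'` is more robust.

```shell
#!/usr/bin/env bash
# Hypothetical polling sketch for GET /stt/v1/async/transcripts/{rid}.

# Maps a status value to done, wait or error. Unknown values count as errors.
classify_status() {
  case "$1" in
    transcribed) echo done ;;
    ai_requested|uploaded|file_processing|transcribing|post_processing) echo wait ;;
    *) echo error ;;  # transcript_error, file_error, or an unexpected value
  esac
}

# Prints the first "status" value found in the JSON read from stdin.
extract_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Polls every 5 seconds, up to 120 times. Assumes DAGLO_API_TOKEN holds the token.
# Prints the final response body (empty for 204) and returns 0 on success.
wait_for_transcript() {
  local rid="$1" out code body status i
  for (( i = 1; i <= 120; i++ )); do
    # Append the HTTP status code on its own line after the body.
    out=$(curl -sS -w '\n%{http_code}' \
      "https://apis.daglo.ai/stt/v1/async/transcripts/${rid}" \
      --header "Authorization: Bearer ${DAGLO_API_TOKEN}") || return 1
    code=${out##*$'\n'}
    body=${out%$'\n'*}
    case "$code" in
      204) return 0 ;;           # complete, but the result is empty
      429) sleep 5; continue ;;  # too many requests; wait and retry
      200) ;;
      *) echo "HTTP $code" >&2; return 1 ;;
    esac
    status=$(printf '%s\n' "$body" | extract_status)
    case "$(classify_status "$status")" in
      done) printf '%s\n' "$body"; return 0 ;;
      wait) sleep 5 ;;
      *) echo "transcription stopped with status: ${status:-unknown}" >&2; return 1 ;;
    esac
  done
  echo "timed out waiting for rid $rid" >&2
  return 1
}
```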

c. Receiving the response as a callback

Instead of polling, you can receive the result through a callback: when processing is complete, the API server sends the completion status to a URL you specify. For more information, please refer to the Get(Polling) and Callback document.

Update history

20240902 ver1.0 API document has been created.
