Transcribe an audio file
1. Introduction
The speech recognition API is a service that converts user-provided voice data into text. The API operates asynchronously, and you can use the text it returns to build applications or automatic voice-to-text conversion services.
Key features
- Voice data conversion: Converts voice data provided through the API into text.
- Wide format support: Supports a wide range of audio formats, such as WAV and MP3, for flexible use.
- High accuracy: Converts speech data to text with high accuracy using a state-of-the-art speech recognition algorithm.
WARNING
⚠ Songs or audio with loud background music are not supported.
Use Cases
You can utilize our speech recognition API to effectively convert speech data into text. You can enhance the user experience by developing the following voice-based applications or implementing automatic speech-to-text conversion services.
- Voice Memo/Journal: When a user records a voice memo, the API converts it into text, and the text can be automatically saved in a memo application.
- Voice-based search engine: When a user delivers a search term by voice, the API converts it into text, and the search engine returns relevant results.
- AI Contact Center: When a customer's phone call is converted to text via the API, the automatic response system generates an appropriate response based on the content.
- Automatic meeting minutes service: When voice data is converted to text via the API during a meeting, minutes are automatically written and provided to participants.
- Video Subtitles: You can get transcription results as subtitles from the API.
2. Example
Below is a simple usage example that sends a long voice file to the API asynchronously.
POST
curl -X POST 'https://apis.daglo.ai/stt/v1/async/transcripts' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <API_TOKEN>' \
--data '{
  "audio": {
    "source": {
      "url": "https://storage.googleapis.com/bkt-actionpower-examples/audio/actionpower_hello.wav"
    }
  }
}'
POST-Response
{"rid":"12345678-abcd-efgh-1234-abcdefghijkl"}
GET
curl 'https://apis.daglo.ai/stt/v1/async/transcripts/<RID>' \
--header 'Authorization: Bearer <API_TOKEN>'
GET-Response
{
  "rid": "12345678-abcd-efgh-1234-abcdefghijkl",
  "status": "transcribed",
  "sttResults": [
    {
      "transcript": "안녕하세요. 액션 파워입니다.",
      "words": [
        {
          "word": "안녕하세요. ",
          "startTime": {
            "nanos": 60000000,
            "seconds": "0"
          },
          "endTime": {
            "nanos": 860000000,
            "seconds": "0"
          },
          "segmentId": "1"
        },
        {
          "word": "액션 ",
          "startTime": {
            "nanos": 820000000,
            "seconds": "0"
          },
          "endTime": {
            "nanos": 419999999,
            "seconds": "1"
          },
          "segmentId": "2"
        },
        {
          "word": "파워입니다. ",
          "startTime": {
            "nanos": 379999999,
            "seconds": "1"
          },
          "endTime": {
            "nanos": 20000000,
            "seconds": "2"
          },
          "segmentId": "2"
        }
      ],
      "keywords": null
    }
  ]
}
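Each entry in `words` carries its start and end time split into a `seconds` string and a `nanos` integer. The following Python sketch shows one way to turn those entries into SRT subtitle cues. It assumes that words sharing a `segmentId` belong to the same subtitle cue, which this page does not state; the function names are illustrative and not part of the API.

```python
def to_millis(ts: dict) -> int:
    """Combine the 'seconds' (string) and 'nanos' (int) fields into milliseconds."""
    return int(ts["seconds"]) * 1000 + round(int(ts["nanos"]) / 1_000_000)


def srt_time(millis: int) -> str:
    """Format milliseconds as an SRT timestamp: HH:MM:SS,mmm."""
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list) -> str:
    """Emit one SRT cue per segmentId (assumption: a segment maps to one cue)."""
    segments = {}  # segmentId -> [start_ms, end_ms, text]; insertion order is kept
    for w in words:
        start, end = to_millis(w["startTime"]), to_millis(w["endTime"])
        seg = segments.setdefault(w["segmentId"], [start, end, ""])
        seg[0] = min(seg[0], start)
        seg[1] = max(seg[1], end)
        seg[2] += w["word"]
    cues = []
    for index, (start, end, text) in enumerate(segments.values(), start=1):
        cues.append(f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text.strip()}\n")
    return "\n".join(cues)
```

Calling `words_to_srt(response["sttResults"][0]["words"])` on the example response produces two cues, one per `segmentId`. Working in integer milliseconds avoids floating-point drift when `nanos` values such as 419999999 are rounded.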
3. Instructions
1) Getting an API key
- Create an account in the API console.
- Go to the token menu and issue a new token.
- Copy the issued token information and use it as an authentication token when requested.
2) Sending a request
- This example sends a long voice file.
- Send a request to the specified endpoint (Send long audio to transcribe) with the required parameters.
POST https://apis.daglo.ai/stt/v1/async/transcripts
- For more detailed API parameter information, please refer to the API Reference.
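For readers who call the API from code rather than curl, the request from the Example section can be built with Python's standard library. This is a minimal sketch: the function names are illustrative, and it assumes the request body shape and the `rid` response field are exactly as shown in the Example section.

```python
import json
import urllib.request

API_URL = "https://apis.daglo.ai/stt/v1/async/transcripts"


def build_transcript_request(audio_url: str, api_token: str) -> urllib.request.Request:
    """Build the POST request; same headers and body as the curl example."""
    body = json.dumps({"audio": {"source": {"url": audio_url}}}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
    )


def submit(audio_url: str, api_token: str) -> str:
    """Send the request and return the rid from the JSON response."""
    request = build_transcript_request(audio_url, api_token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["rid"]
```

Keeping request construction separate from sending makes the headers and body easy to inspect without a network call. Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so callers should handle that exception.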
3) Verifying audio format is supported
- File size: Up to 2GB
- Audio duration: 4 hours or less
- Supported file formats
🔊 audio
.3gp, .3gpp, .ac3, .aac, .aiff, .amr, .au, .flac, .m4a, .mp3, .mxf, .opus, .ra, .wav, .weba
📹 video
.asx, .avi, .ogm, .ogv, .m4v, .mov, .mp4, .mpeg, .mpg, .wmv
WARNING
⚠ Even if the file format is listed above, transcription may not proceed if the actual content (encoding) differs from what the format indicates.
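The limits above can be checked locally before a file is submitted. The sketch below is one possible pre-flight check in Python. It interprets 2GB as 2 × 1024³ bytes, which is an assumption, and it expects the caller to supply the duration, because measuring it requires a media library. As the warning notes, a matching extension does not guarantee that the encoding is accepted.

```python
from pathlib import PurePath

AUDIO_EXTENSIONS = {
    ".3gp", ".3gpp", ".ac3", ".aac", ".aiff", ".amr", ".au", ".flac",
    ".m4a", ".mp3", ".mxf", ".opus", ".ra", ".wav", ".weba",
}
VIDEO_EXTENSIONS = {
    ".asx", ".avi", ".ogm", ".ogv", ".m4v", ".mov", ".mp4", ".mpeg", ".mpg", ".wmv",
}
MAX_BYTES = 2 * 1024**3      # assumption: "2GB" means 2 GiB
MAX_SECONDS = 4 * 60 * 60    # 4 hours


def preflight_problems(filename: str, size_bytes: int, duration_seconds=None) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    extension = PurePath(filename).suffix.lower()
    if extension not in AUDIO_EXTENSIONS | VIDEO_EXTENSIONS:
        problems.append(f"unsupported file extension: {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 2GB")
    # Duration is optional because it cannot be derived from the name or size.
    if duration_seconds is not None and duration_seconds > MAX_SECONDS:
        problems.append("audio is longer than 4 hours")
    return problems
```

For example, `preflight_problems("talk.MP3", 1000)` returns an empty list, because the extension comparison is case-insensitive.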
4) Getting a response
a. POST
Send long audio to transcribe
- Success
  - 200
  - 204
    - No Content. The request was successful, but no result is returned.
    - For transcription, transcription is complete, but the result is empty.
- Error
  - 400: Bad Request. Invalid format.
  - 401: Unauthorized.
  - 403: Forbidden. Unauthorized access.
  - 413: Payload Too Large. Request too large.
  - 415: Unsupported Media Type.
  - 429: Too Many Requests.
  - 500: Internal Server Error.
  - 503: We are processing so many requests that we are temporarily unable to respond. Please try again in a moment.
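The 503 description asks callers to try again in a moment. One common way to do that is exponential backoff, sketched below in Python. Retrying on 429 as well is an assumption, since this page only recommends a retry for 503. The `send` callable is hypothetical: it stands for any function that performs the POST and returns a `(status_code, body)` tuple.

```python
import time

RETRYABLE_STATUS = {429, 503}  # 503 per this page; 429 is an assumption


def send_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    The delay doubles after each retryable response: 1s, 2s, 4s, ...
    Returns the last (status_code, body) tuple received.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUS:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body
```

Other error codes, such as 400, 401, 403, 413, and 415, indicate a problem with the request itself, so the sketch returns them immediately instead of retrying.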
b. GET
Get long audio transcription
Endpoint
GET https://apis.daglo.ai/stt/v1/async/transcripts/{rid}
- Success
  - 200
    - ai_requested: Request has been started.
    - uploaded: File upload completed.
    - file_processing: Pre-processing file.
    - transcribing: Transcription in-progress.
    - post_processing: Post-processing.
    - transcribed: Transcription completed.
    - transcript_error: An error occurred while transcribing. Please wait a moment and request again.
    - file_error: There is an error in the file. Please check the file and request it again.
  - 204
    - No Content. Request was successful, but no result is returned.
    - For transcription, transcription is complete, but the result is empty.
- Error
  - 401: Unauthorized.
  - 403: Forbidden. Unauthorized access.
  - 404: Not Found.
  - 429: Too Many Requests.
  - 500: Internal Server Error.
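Because transcription is asynchronous, a client typically repeats the GET request until the status reaches a final value. The Python sketch below assumes that the values listed under 200 are the possible values of the `status` field shown in the example response, that `transcribed` is the only successful final status, and that `transcript_error` and `file_error` are final failures. The `fetch` callable is hypothetical: it stands for any function that performs the GET for a given rid and returns the parsed JSON body. The 5-second interval and 1-hour timeout are arbitrary defaults.

```python
import time

IN_PROGRESS = {"ai_requested", "uploaded", "file_processing", "transcribing", "post_processing"}
FAILED = {"transcript_error", "file_error"}


def poll_transcript(fetch, rid, interval=5.0, timeout=3600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll fetch(rid) until the status is 'transcribed', then return the result."""
    deadline = clock() + timeout
    while True:
        result = fetch(rid)
        status = result.get("status")
        if status == "transcribed":
            return result
        if status in FAILED:
            raise RuntimeError(f"transcription {rid} failed with status {status}")
        if status not in IN_PROGRESS:
            raise ValueError(f"unexpected status for {rid}: {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"transcription {rid} did not finish within {timeout} seconds")
        sleep(interval)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to test without real waiting. A fixed interval is the simplest choice here; a longer interval may be more appropriate for multi-hour audio.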
c. Receiving the response as a callback
If you prefer a callback to polling, the API server can send the completion status to a URL you specify once processing is complete. For more information, please refer to the Get(Polling) and Callback document.
Update history
- 20240902 ver1.0 API document has been created.