Additional Features in Speech Recognition

1. Speaker Diarization

Introduction

The speaker diarization feature separates each speaker's utterances from an audio file of multiple speakers and converts them into text. For example, you can clearly distinguish who said what in a meeting recording. This is useful for taking minutes or analyzing interviews.

Speaker diarization may become less accurate as the number of speakers increases. It is also limited by overlapping voices, poor audio quality, and background noise.

Usage

To use speaker diarization, set the following parameters in a Speech Recognition (Async) request.

Endpoint

shell

POST https://apis.daglo.ai/stt/v1/async/transcripts

Request Body

text

    {
        "sttConfig": {
            "speakerDiarization": {
                "enable": true
            }
        }
    }
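As a rough illustration, the request above could be assembled in Python using only the standard library. This is a sketch under stated assumptions, not the official client: the helper name `build_diarization_request`, the `extra_headers` parameter, and the `Content-Type: application/json` header are illustrative choices. The body contains only the fields shown above, so any audio source or authentication details the full request needs (not covered in this section) still have to be supplied by the caller.

```python
import json
import urllib.request

ENDPOINT = "https://apis.daglo.ai/stt/v1/async/transcripts"


def build_diarization_request(extra_headers=None):
    """Build (but do not send) a POST request that enables speaker diarization.

    extra_headers is a hypothetical hook for whatever headers the service
    requires (for example authentication); this section does not specify them.
    """
    body = {"sttConfig": {"speakerDiarization": {"enable": True}}}
    headers = {"Content-Type": "application/json"}  # assumed content type
    headers.update(extra_headers or {})
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_diarization_request()
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

The request object is only constructed here; passing it to `urllib.request.urlopen` would actually send it.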

Example Result

text

{
    // ...
    "sttResults": [
        {
            // ...
            "words": [
                {
                    "speaker": "1",
                    "word": "안녕 ",
                    "startTime": {
                        "nanos": 560000000,
                        "seconds": "0"
                    },
                    "endTime": {
                        "nanos": 950000000,
                        "seconds": "1"
                    },
                    "segmentId": "1"
                },
                {
                    "speaker": "2",
                    "word": "네 ",
                    "startTime": {
                        "nanos": 909999999,
                        "seconds": "1"
                    },
                    "endTime": {
                        "nanos": 229999999,
                        "seconds": "2"
                    },
                    "segmentId": "2"
                }
            ]
        }
    ]
}
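A result shaped like the example above can be turned into per-speaker turns by merging consecutive words that share a `speaker` value. The following is a minimal sketch, assuming the `words` layout shown in the example; the helper names `to_seconds` and `group_by_speaker` are illustrative, and treating a missing `seconds` or `nanos` field as zero is an assumption rather than documented behavior.

```python
def to_seconds(timestamp):
    """Convert {"seconds": "1", "nanos": 950000000} into float seconds."""
    # Assumption: a missing field means zero.
    return int(timestamp.get("seconds", 0)) + timestamp.get("nanos", 0) / 1e9


def group_by_speaker(words):
    """Merge consecutive words from the same speaker into one turn."""
    turns = []
    for w in words:
        start = to_seconds(w["startTime"])
        end = to_seconds(w["endTime"])
        if turns and turns[-1]["speaker"] == w["speaker"]:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += w["word"]
            turns[-1]["end"] = end
        else:
            turns.append(
                {"speaker": w["speaker"], "text": w["word"], "start": start, "end": end}
            )
    return turns


# The two words from the example result above.
words = [
    {
        "speaker": "1",
        "word": "안녕 ",
        "startTime": {"nanos": 560000000, "seconds": "0"},
        "endTime": {"nanos": 950000000, "seconds": "1"},
        "segmentId": "1",
    },
    {
        "speaker": "2",
        "word": "네 ",
        "startTime": {"nanos": 909999999, "seconds": "1"},
        "endTime": {"nanos": 229999999, "seconds": "2"},
        "segmentId": "2",
    },
]

for turn in group_by_speaker(words):
    print(f'Speaker {turn["speaker"]} [{turn["start"]:.2f}s-{turn["end"]:.2f}s]: {turn["text"].strip()}')
```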

2. Sentiment Analysis

Introduction

The sentiment analysis feature analyzes the emotional tone of the text and classifies it as positive, negative, or neutral. This is useful for understanding customer sentiment in customer service calls or analyzing the sentiment of a consultation.

Usage

To use sentiment analysis, set the following parameters in a Speech Recognition (Async) request.

Endpoint

shell

POST https://apis.daglo.ai/stt/v1/async/transcripts

Request Body

text

    {
        // ...
        "nlpConfig": {
            "sentimentAnalysis": {
                "enable": true
            }
        }
    }

Example Result

text

{
    // ...
    "sttResults": [
        {
            // ...
            "sentiment": "Negative",
            "sentimentScore": {
                "neutral": 22.559999465942383,
                "negative": 61.209999084472656,
                "positive": 16.229999542236328
            }
        }
    ]
}

Sentiment Analysis Result Values

Sentiment	Explanation
positive	Expresses positive feelings. Contains content that indicates the speaker feels positive or happy.
negative	Expresses negative feelings. Contains content that indicates the speaker feels dissatisfied or negative.
neutral	Expresses neutral feelings. Contains neutral content in which the speaker does not strongly express any particular feelings.
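To show how the scores in the example result relate to the reported label, the sketch below picks the highest-scoring entry. It assumes `sentimentScore` is a mapping from the labels in the table to numeric scores, and that the `sentiment` field names the highest-scoring label; both are inferred from the single example rather than stated by the API. The helper name `top_sentiment` is illustrative.

```python
def top_sentiment(result):
    """Return the (label, score) pair with the highest score."""
    scores = result["sentimentScore"]
    label = max(scores, key=scores.get)
    return label, scores[label]


# Values copied from the example result above.
result = {
    "sentiment": "Negative",
    "sentimentScore": {
        "neutral": 22.559999465942383,
        "negative": 61.209999084472656,
        "positive": 16.229999542236328,
    },
}

label, score = top_sentiment(result)
print(f"{label}: {score:.2f}")
```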

3. Keyword Extraction

Introduction

The keyword extraction feature identifies the most important keywords in the transcribed text. This is useful for pulling key terms out of meeting minutes or summarizing lecture content.

Usage

To use keyword extraction, set the following parameters in a Speech Recognition (Async) request.

Endpoint

shell

POST https://apis.daglo.ai/stt/v1/async/transcripts

Request Body

text

    {
        // ...
        "nlpConfig": {
            "keywordExtraction": {
                "enable": true
            }
        }
    }

Example Result

text

{
    // ...
    "sttResults": [
        {
            // ...
            "keywords": [
                "선택",
                "다글로",
                "인생"
            ]
        }
    ]
}
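When a response contains several entries in `sttResults`, it can be handy to gather their keywords into one list. The sketch below does this while preserving first-seen order and dropping repeats. It assumes each entry may carry a `keywords` array as in the example; the helper name `collect_keywords` and the two-entry sample response are illustrative, not taken from the API.

```python
def collect_keywords(response):
    """Collect unique keywords across all sttResults, in first-seen order."""
    seen = {}
    for result in response.get("sttResults", []):
        for keyword in result.get("keywords", []):
            seen.setdefault(keyword, None)  # dict keys keep insertion order
    return list(seen)


# Hypothetical response: the first entry mirrors the example above.
response = {
    "sttResults": [
        {"keywords": ["선택", "다글로", "인생"]},
        {"keywords": ["다글로", "선택"]},
    ]
}

print(collect_keywords(response))
```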

4. Keyword Boosting

Introduction

Keyword boosting is a technique that weights certain words or phrases so that the speech recognition system recognizes them more accurately. This improves recognition of important keywords, brand names, technical terms, and similar vocabulary.

Keyword boosting may be ineffective or even counterproductive if an entered keyword does not appear in the audio. Recognition may remain poor if a keyword has an uncommon or unusual pronunciation. The effectiveness of keyword boosting may also be limited by heavy background noise or low audio quality.

Usage

To use keyword boosting, set the following parameters in a Speech Recognition (Async), Speech Recognition for short audio files (Sync), or Real-time Speech Recognition (Streaming) request.

Endpoint

shell

POST https://apis.daglo.ai/stt/v1/async/transcripts

Request Body

text

    {
        // ...
        "sttConfig": {
            "keywordBoost": {
                "enable": true,
                "keywords": ["다글로", "클라우드"]
            }
        }
    }
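Because boosting a keyword that never occurs in the audio can hurt recognition, it may help to tidy the keyword list before sending it. The sketch below builds the `sttConfig` fragment shown above from a raw list, trimming whitespace and removing blanks and duplicates. The helper name `keyword_boost_config` is illustrative, and disabling the feature when no keywords remain is a design choice of this sketch, not documented API behavior.

```python
def keyword_boost_config(keywords):
    """Build the keywordBoost fragment from a raw keyword list."""
    # Strip whitespace, drop empty strings, and de-duplicate in order.
    cleaned = list(dict.fromkeys(k.strip() for k in keywords if k.strip()))
    return {
        "sttConfig": {
            "keywordBoost": {
                # Sketch-level choice: only enable when there is something to boost.
                "enable": bool(cleaned),
                "keywords": cleaned,
            }
        }
    }


config = keyword_boost_config([" 다글로", "클라우드", "다글로", ""])
print(config)
```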

Example Result

Before Keyword Boosting: '이번 다그러 AI 크라운 프로젝트는 매우 성공적이었습니다.'

After Keyword Boosting:

text

{
    // ...
    "sttResults": [
        {
            // ...
            "transcript": "이번 다글로 AI 클라우드 프로젝트는 매우 성공적이었습니다."
        }
    ]
}
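To see which words boosting changed, the before and after transcripts can be compared token by token. The sketch below uses Python's standard `difflib.SequenceMatcher` on whitespace-split tokens; the helper name `changed_tokens` is illustrative, and whitespace splitting is a simplification that suits this example rather than a general Korean tokenizer.

```python
import difflib


def changed_tokens(before, after):
    """Return (before, after) pairs for every token span that differs."""
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


before = "이번 다그러 AI 크라운 프로젝트는 매우 성공적이었습니다."
after = "이번 다글로 AI 클라우드 프로젝트는 매우 성공적이었습니다."

for old, new in changed_tokens(before, after):
    print(f"{old} -> {new}")
```

In this example the two differing tokens correspond to the two boosted keywords, 다글로 and 클라우드.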

Update History

2024-09-02 ver1.0: API document created.
