
Additional Features in Speech Recognition

1. Speaker Diarization

Introduction

Speaker diarization separates the utterances of each speaker in a multi-speaker audio file and transcribes them, labeling each recognized word with a speaker ID. For example, you can tell who said what in a meeting recording, which is useful for taking minutes or analyzing interviews.

Speaker diarization may become less accurate as the number of speakers increases. Accuracy is also limited when voices overlap, when audio quality is poor, or when there is background noise.

Usage

shell
POST https://apis.daglo.ai/stt/v1/async/transcripts
  • Request Body
text
    {
        "sttConfig": {
            "speakerDiarization": {
                "enable": true
            }
        }
    }

Example Result

text
{
    // ...
    "sttResults": [
        {
            // ...
            "words": [
                {
                    "speaker": "1",
                    "word": "안녕 ",
                    "startTime": {
                        "nanos": 560000000,
                        "seconds": "0"
                    },
                    "endTime": {
                        "nanos": 950000000,
                        "seconds": "1"
                    },
                    "segmentId": "1"
                },
                {
                    "speaker": "2",
                    "word": "네 ",
                    "startTime": {
                        "nanos": 909999999,
                        "seconds": "1"
                    },
                    "endTime": {
                        "nanos": 229999999,
                        "seconds": "2"
                    },
                    "segmentId": "2"
                }
            ]
        }
    ]
}
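
The word-level result above can be turned into readable speaker turns. The following is a minimal sketch, not part of the API: the helper names `to_seconds` and `speaker_turns` are hypothetical, and it assumes that each word carries its own trailing whitespace and that `seconds` arrives as a string, as in the sample.

```python
from itertools import groupby


def to_seconds(timestamp):
    """Convert a {"seconds": "1", "nanos": 950000000} object to float seconds."""
    # Assumption: a missing field means zero.
    return int(timestamp.get("seconds", "0")) + timestamp.get("nanos", 0) / 1_000_000_000


def speaker_turns(words):
    """Merge consecutive words from the same speaker into one turn."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        group = list(group)
        turns.append({
            "speaker": speaker,
            "text": "".join(w["word"] for w in group).strip(),
            "start": to_seconds(group[0]["startTime"]),
            "end": to_seconds(group[-1]["endTime"]),
        })
    return turns


# Words copied from the example result above.
words = [
    {"speaker": "1", "word": "안녕 ",
     "startTime": {"nanos": 560000000, "seconds": "0"},
     "endTime": {"nanos": 950000000, "seconds": "1"},
     "segmentId": "1"},
    {"speaker": "2", "word": "네 ",
     "startTime": {"nanos": 909999999, "seconds": "1"},
     "endTime": {"nanos": 229999999, "seconds": "2"},
     "segmentId": "2"},
]

turns = speaker_turns(words)
for turn in turns:
    print(f'Speaker {turn["speaker"]} '
          f'[{turn["start"]:.2f}s - {turn["end"]:.2f}s]: {turn["text"]}')
```

Grouping only consecutive words keeps the conversation order intact, so a speaker who talks twice produces two separate turns.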

2. Sentiment Analysis

Introduction

The sentiment analysis feature analyzes the emotional tone of the transcribed text and classifies it as positive, negative, or neutral. This is useful for gauging customer sentiment in customer service calls or consultations.

Usage

shell
POST https://apis.daglo.ai/stt/v1/async/transcripts
  • Request Body
text
    {
        // ...
        "nlpConfig": {
            "sentimentAnalysis": {
                "enable": true
            }
        }
    }

Example Result

text
{
    // ...
    "sttResults": [
        {
            // ...
            "sentiment": "Negative",
            "sentimentScore": {
                "neutral": 22.559999465942383,
                "negative": 61.209999084472656,
                "positive": 16.229999542236328
            }
        }
    ]
}

Sentiment Analysis Results Response Table

Sentiment | Explanation
positive | Expresses positive feelings. Contains content indicating that the speaker feels positive or happy.
negative | Expresses negative feelings. Contains content indicating that the speaker feels dissatisfied or negative.
neutral | Expresses neutral feelings. Contains neutral content in which the speaker does not strongly express any particular feeling.
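
To work with the scores programmatically, the sketch below picks the highest-scoring label from a result. It is illustrative only: `top_sentiment` is a hypothetical helper, and it assumes that `sentimentScore` parses as a mapping from label to score. In the sample the three scores add up to roughly 100, which suggests percentages, but that is an inference from the example rather than a documented guarantee.

```python
def top_sentiment(scores):
    """Return (label, score) for the highest-scoring sentiment."""
    label = max(scores, key=scores.get)
    return label, scores[label]


# Values copied from the example result; sentimentScore is assumed to be a mapping.
result = {
    "sentiment": "Negative",
    "sentimentScore": {
        "neutral": 22.559999465942383,
        "negative": 61.209999084472656,
        "positive": 16.229999542236328,
    },
}

label, score = top_sentiment(result["sentimentScore"])
print(f"{label}: {score:.2f}")

# The sample's "sentiment" field is capitalized while the score keys are
# lowercase, so compare them case-insensitively.
matches = label == result["sentiment"].lower()
```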

3. Keyword Extraction

Introduction

The keyword extraction feature identifies the most important keywords in the transcribed text. This is useful for pulling key terms from meeting minutes or summarizing lecture content.

Usage

shell
POST https://apis.daglo.ai/stt/v1/async/transcripts
  • Request Body
text
    {
        // ...
        "nlpConfig": {
            "keywordExtraction": {
                "enable": true
            }
        }
    }

Example Result

text
{
    // ...
    "sttResults": [
        {
            // ...
            "keywords": [
                "선택",
                "다글로",
                "인생"
            ]
        }
    ]
}
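
If a response contains several `sttResults` entries, their keyword lists can be merged into a single list. The following is a minimal sketch under that assumption. `collect_keywords` is a hypothetical helper, not an API function.

```python
def collect_keywords(response):
    """Return keywords from all sttResults entries, de-duplicated, in first-seen order."""
    seen = {}
    for result in response.get("sttResults", []):
        for keyword in result.get("keywords", []):
            seen.setdefault(keyword, None)  # dicts preserve insertion order
    return list(seen)


# First entry copied from the example result; the second entry is made up
# to show de-duplication.
response = {
    "sttResults": [
        {"keywords": ["선택", "다글로", "인생"]},
        {"keywords": ["다글로", "선택"]},
    ]
}

keywords = collect_keywords(response)
print(keywords)
```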

4. Keyword Boosting

Introduction

Keyword boosting assigns extra weight to specific words or phrases so that the speech recognition system recognizes them more accurately. It improves recognition of important keywords, brand names, technical terms, and similar vocabulary.

Keyword boosting may be ineffective or even counterproductive if an entered keyword does not appear in the audio. Recognition rates may also be poor if a keyword has an uncommon or unusual pronunciation. In addition, the effectiveness of keyword boosting may be limited when background noise is high or audio quality is low.

Usage

shell
POST https://apis.daglo.ai/stt/v1/async/transcripts
  • Request Body
text
    {
        // ...
        "sttConfig": {
            "keywordBoost": {
                "enable": true,
                "keywords": ["다글로", "클라우드"]
            }
        }
    }
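
The request bodies in this document share the same `sttConfig` / `nlpConfig` layout, so the options can be assembled with one small helper. The sketch below is hypothetical: `build_request_body` and its parameter names are not part of the API, and it assumes that the options may be combined in a single request. It builds only the configuration shown in this document and leaves out the fields elided as `// ...` (such as the audio source) as well as any authentication.

```python
import json


def build_request_body(speaker_diarization=False, sentiment_analysis=False,
                       keyword_extraction=False, boost_keywords=None):
    """Build the sttConfig / nlpConfig part of a transcript request body."""
    stt_config = {}
    nlp_config = {}

    if speaker_diarization:
        stt_config["speakerDiarization"] = {"enable": True}
    if boost_keywords:
        stt_config["keywordBoost"] = {
            "enable": True,
            "keywords": list(boost_keywords),
        }
    if sentiment_analysis:
        nlp_config["sentimentAnalysis"] = {"enable": True}
    if keyword_extraction:
        nlp_config["keywordExtraction"] = {"enable": True}

    # Only include the sections that actually have options set.
    body = {}
    if stt_config:
        body["sttConfig"] = stt_config
    if nlp_config:
        body["nlpConfig"] = nlp_config
    return body


body = build_request_body(speaker_diarization=True,
                          boost_keywords=["다글로", "클라우드"])
# ensure_ascii=False keeps the Korean keywords readable in the output.
print(json.dumps(body, ensure_ascii=False, indent=4))
```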

Example Result

  • Before Keyword Boosting: '이번 다그러 AI 크라운 프로젝트는 매우 성공적이었습니다.'
  • After Keyword Boosting
text
{
    // ...
    "sttResults": [
        {
            // ...
            "transcript": "이번 다글로 AI 클라우드 프로젝트는 매우 성공적이었습니다."
        }
    ]
}
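
One way to check what boosting changed is to compare the two transcripts word by word. The sketch below uses Python's standard `difflib` module on the before and after sentences from this example. `changed_words` is a hypothetical helper, and splitting on whitespace is a simplification that happens to suit these sentences.

```python
import difflib


def changed_words(before, after):
    """Return (before, after) pairs for whitespace-separated tokens that differ."""
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


before = "이번 다그러 AI 크라운 프로젝트는 매우 성공적이었습니다."
after = "이번 다글로 AI 클라우드 프로젝트는 매우 성공적이었습니다."

changes = changed_words(before, after)
for old, new in changes:
    print(f"{old} -> {new}")
```

For these two sentences, the differing tokens are exactly the two misrecognized words, which were replaced by the boosted keywords "다글로" and "클라우드".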

Update History

  • 2024-09-02 ver1.0: API document created.