Transcribe streaming audio

This guide walks you through running a sample application that converts speech to text in real time. Speech captured by the microphone is converted to text immediately and displayed on the screen. All you need is a web browser: the JavaScript module provided by Daglo lets you implement real-time speech recognition directly in the browser.

1. Prerequisites

  • Latest web browser (Chrome, Edge, Firefox, etc.)
  • Device with a connected microphone
  • API token (issued from the API Console)
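
The prerequisites above can be checked in code before the microphone is requested. The following is a minimal sketch, not part of the Daglo module: `checkPrerequisites` is a hypothetical helper name, and it relies only on standard browser properties (`isSecureContext` and `navigator.mediaDevices.getUserMedia`). It assumes the page is served from a secure context such as HTTPS or localhost, since browsers expose microphone capture only there.

```javascript
// Hypothetical helper (not part of the Daglo module).
// `env` is a window-like object; in a browser, pass `globalThis`.
// Returns a list of human-readable problems; an empty list means "ready".
function checkPrerequisites(env, apiToken) {
  const problems = [];

  // Browsers expose microphone capture only in secure contexts.
  if (env.isSecureContext === false) {
    problems.push('Page is not served from a secure context (HTTPS or localhost).');
  }

  // Older or restricted browsers may not provide getUserMedia at all.
  const mediaDevices = env.navigator && env.navigator.mediaDevices;
  if (!mediaDevices || typeof mediaDevices.getUserMedia !== 'function') {
    problems.push('This browser does not expose navigator.mediaDevices.getUserMedia.');
  }

  // The token is issued from the API Console and must not be blank.
  if (typeof apiToken !== 'string' || apiToken.trim() === '') {
    problems.push('API token is empty.');
  }

  return problems;
}
```

In a page, calling `checkPrerequisites(globalThis, tokenInput.value)` before the button handler runs would let you show the returned messages instead of failing later. This does not detect whether a microphone is actually connected; that only becomes known when the browser is asked for access.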

WARNING

⚠️ Important Notes

  • Real-time speech recognition is only available through JavaScript in a web browser environment.
  • Real-time speech recognition currently supports Korean only.
  • Songs and audio with loud background music are not supported.

2. Quickly Check Real-time Speech Recognition Results Using the Example Code

  1. Save the example code below as an index.html file.

  2. Open the saved index.html file in a web browser. You will see a screen like the one below.

    stream-stt-index-html.png

  3. Enter the token issued by the API Console in the API Token input box.

  4. Click the 'Microphone ON' button, then allow microphone access in the browser's permission prompt.

    allow-mic-perm.png

  5. Start speaking into the microphone. The transcribed text appears on the screen in real time.

Example Code

html
<!DOCTYPE html>
<html lang="ko">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />


    <title>dagloAPI stream STT Example</title>
    <link href="./index.css" rel="stylesheet">
  </head>
  <body>
    <div id="liveView" class="videoView">
      <label>API Token</label>
      <input id="token" placeholder="YOUR TOKEN">
      <br>
      <button id="enableButton" class="enable-btn">
        <span class="enable-btn-label">Microphone ON</span>
      </button>

      <p id="result"></p>
      <div id="transcripts"></div>
      <div id="speech-list"></div>
    </div>

    <script type="module">
      import { DagloAPI } from 'https://actionpower.github.io/dagloapi-js-beta/lib/daglo-api.module.js';

      document.getElementById('enableButton').addEventListener('click', async (event) => {
        // Read the API token entered by the user
        const dagloToken = document.getElementById('token').value.trim();

        // Create the API client and a streaming transcriber
        const client = new DagloAPI({
          apiToken: dagloToken
        });
        const transcriber = client.stream.transcriber();

        // Append each transcribed segment to the page as it arrives
        transcriber.on('transcript', (data) => {
          console.log('[#] onTranscript', data);

          if (data?.text) {
            const span = document.createElement('span');
            span.textContent = data.text;
            document.getElementById('transcripts').append(span);
          }
        });

        let stream;

        try {
          // capture the microphone
          stream = await navigator.mediaDevices.getUserMedia({ audio: true });
        }
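        catch (err) {
          // Fails when permission is denied, no microphone is found, or the API is unavailable
          console.error('The following error occurred: ' + err);
          return alert('Could not access the microphone. Check your browser permissions and device.');
        }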

        if (stream) {
          transcriber.connect(stream);
        }
      });
    </script>
  </body>
</html>
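
The example above keeps the microphone open once it has been connected. One way to release it is sketched below using only the standard `MediaStream` API (`getTracks()` and `MediaStreamTrack.stop()`). `stopMicrophone` is a hypothetical helper name, and the sketch does not assume anything about how the Daglo transcriber itself should be closed; consult the module's API documentation for that.

```javascript
// Hypothetical helper (not part of the Daglo module).
// Stops every track of a MediaStream obtained from getUserMedia,
// which releases the microphone and clears the browser's recording indicator.
// Returns the number of tracks that were stopped.
function stopMicrophone(stream) {
  if (!stream) {
    return 0; // nothing was captured, so there is nothing to release
  }
  const tracks = stream.getTracks();
  tracks.forEach((track) => track.stop());
  return tracks.length;
}
```

To use it with the example, the `stream` variable would need to be kept outside the click handler (for instance in a module-level variable) so that a second button can pass it to `stopMicrophone`.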

Example Result Screen

stream-stt-result-html.png

3. Additional Information

For detailed API documentation about the JavaScript module, please refer to this document.

4. Troubleshooting

  • If microphone access is denied, check the microphone permissions for the page in your browser settings.
  • If the API token is rejected, verify in the API Console that the token is valid.
  • If recognition quality is poor, try again in a quiet environment or check your microphone settings.
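
When `getUserMedia` fails, the rejected error's `name` usually indicates which of the microphone problems above applies. The sketch below maps the standard error names to short hints; `describeMicrophoneError` is a hypothetical helper, and the messages are illustrative wording rather than text produced by the browser or by Daglo.

```javascript
// Hypothetical helper (not part of the Daglo module).
// Translates the `name` of an error thrown by getUserMedia into a hint.
function describeMicrophoneError(err) {
  switch (err && err.name) {
    case 'NotAllowedError':
      // The user or a browser policy denied access.
      return 'Microphone access was denied. Allow it in your browser settings.';
    case 'NotFoundError':
      // No audio input device matched the request.
      return 'No microphone was found. Connect one and try again.';
    case 'NotReadableError':
      // The device exists but could not be read, e.g. it is in use elsewhere.
      return 'The microphone could not be read. Close other apps using it.';
    default:
      return 'Could not access the microphone.';
  }
}
```

In the example code, the `catch` block could pass its `err` to this function and show the returned string in the alert.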