Skip to Content

Add Speech-to-Text to Your Chatbot (with recorder)

Let users talk with your SAP Conversational AI chatbot inside an SAPUI5 app using the speech-to-text APIs of the Web Client bridge, including the media recorder feature, and an external speech-to-text service.
You will learn
  • How to add speech-to-text to your chatbot inside SAPUI5
  • How to use the built-in media recorder functionality of the chatbot
thecodesterDaniel WroblewskiMarch 23, 2022
Created by
thecodester
March 23, 2022
Contributors
thecodester

Prerequisites

The speech-to-text capabilities of SAP Conversational AI include:

  • Adding a microphone button and capturing the user’s click.
  • Automatically handling capturing voice from the browser (Media Record)
  • Creating a small area to view in real-time the transcription of the voice
  • Enabling developers to add the interim transcription to this transcription area
  • Adding hooks for, among other events, recognizing when the speech has stopped and adding the transcribed text as a message in the chatbot conversation
Example chatbot

 

Ways to implement speech to text

Basically, there are 4 ways to implement speech to text for SAP Conversational AI:

  • Without using the speech-to-text features of the chatbot, handling the voice recognition outside the chatbot, and sending a message to the chatbot using the Web Client APIs.
  • Using the speech-to-text features, but without the Media Recorder and interim text features.
  • Using the speech-to-text features, but without the Media Recorder feature.
  • Using all the speech-to-text features.

This tutorial will show an example of #4 – all the features from SAP Conversational AI, including having the chatbot handle capturing the audio via the browser.

The speech-to-text APIs are documented in the SAPConversationalAI / WebClientDevGuide GitHub repo.

  • Step 1

    The speech-to-text features include displaying a microphone within the chatbot in order to start talking with the bot (1).

    Starting microphone

    SAP Conversational AI lets you capture when a user clicks on the microphone, and lets you start the speech recognition and send interim transcriptions to the temporary transcription area of the chatbot (2).

    While talking, the user can either end the session and keep the text or abort and throw away the transcription.

    Once the session is over, because the user stopped talking long enough or because the user clicked the microphone button again, the text is sent as an utterance into the conversation. (3)

    Other features
  • Step 2

    In your SAPUI5 view controller, in the onAfterRendering method, add the script tag for adding your Web Client to a web page. The script is available in the Connect tab when developing your chatbot.

    javascript
    Copy
    onAfterRendering: function () {
      // Set up chatbot
      // this.renderCAIChatBot()
            let s = document.createElement("script");
            s.setAttribute("src", "https://cdn.cai.tools.sap/webclient/bootstrap.js");
            s.setAttribute("id", "cai-webclient-custom");
            s.setAttribute("data-expander-preferences",data_expander_preferences);
            s.setAttribute("data-channel-id",data_channel_id);
            s.setAttribute("data-token",data_token);
            s.setAttribute("data-expander-type","CAI");
            document.body.appendChild(s);
         },
    

    For the specific IDs and tokens for your chatbot, place a new file webclient.js inside the controller folder (since you may want to change the chatbot from time to time) and add just the following few lines. Make sure to enter values for your chatbot.

    JavaScript
    Copy
    const data_expander_preferences = "<my preferences>";
    const data_channel_id = "<my channel ID>";
    const data_token = "<my token>";
    

    To load the file with your details, add a dependency at the start of the controller file.

    JavaScript
    Copy
    sap.ui.define([
        "profilePic/controller/BaseController",
        "sap/m/MessageToast",
        "sap/m/MessageBox",
        "sap/ui/core/Core",
        "sap/ui/model/json/JSONModel",
        "sap/ui/Device",
        "sap/suite/ui/commons/library",
        "sap/m/Dialog",
        "sap/m/DialogType",
        "sap/m/Button",
        "sap/m/ButtonType",
        "sap/m/TextArea",
        "./webclient"
    ],
        function (BaseController, MessageToast, MessageBox, oCore, JSONModel, Device, SuiteLibrary,
            Dialog, DialogType, Button, ButtonType, TextArea,
            webclient) {
    

    At this point you should already be able to use the chatbot in your app – just without speech to text.

    Run the SAPUI5 app and use your chatbot.

  • Step 3

    Create a file called webclientBridge.js and add it to your controller folder.

    Add the following code:

    JavaScript
    Copy
    const webclientBridge = {
    
        // ALL THE STT METHODS
        //--------------------
        callImplMethod: async (name, ...args) => {
            if (window.webclientBridgeImpl && window.webclientBridgeImpl[name]) {
                return window.webclientBridgeImpl[name](...args)
            }
        },
    
        // if this function returns an object, WebClient will enable the microphone button and assume STT is enabled.
        sttGetConfig: async (...args) => {
            return webclientBridge.callImplMethod('sttGetConfig', ...args)
        },
    
        sttStartListening: async (...args) => {
            return webclientBridge.callImplMethod('sttStartListening', ...args)
        },
    
        sttStopListening: async (...args) => {
            return webclientBridge.callImplMethod('sttStopListening', ...args)
        },
    
        sttAbort: async (...args) => {
            return webclientBridge.callImplMethod('sttAbort', ...args)
        },
    
        // only called if useMediaRecorder = true in sttGetConfig
        sttOnFinalAudioData: async (...args) => {
            return webclientBridge.callImplMethod('sttOnFinalAudioData', ...args)
        },
    
        // only called if useMediaRecorder = true in sttGetConfig
        sttOnInterimAudioData: async (...args) => {
            // send interim blob to STT service
            return webclientBridge.callImplMethod('sttOnInterimAudioData', ...args)
        }
    }
    
    window.sapcai = {
        webclientBridge,
    }
    

    SAP Conversational AI expects these speech-to-text functions in the object window.sapcai.webclientBridge.

    This JavaScript code must be loaded before the chatbot is rendered, since the chatbot checks that this file is loaded to determine if the microphone should be displayed.

    You could write your code in a single file. Separating the implementation can help if you will not load the implementation until later, that is, after the chatbot. It also helps modularize the code: 1 file for Web Client bridge methods, including the ones not related to speech-to-text, and than another for the speech-to-text implementation that I can change in and out.

  • Step 4

    Create a file called webclientBridgeImpl.js and add it to your controller folder. For the implementation, we are using IBM’s Speech to Text service to do the transcription.

    You need an account with IBM Cloud to use speech to text and to be able to complete this tutorial.

    Add the following code to the file:

    JavaScript
    Copy
    /* eslint-disable require-await */
    const _ = require('lodash')
    
    const IBM_URL = '<service URL>'
    const access_token = '<OAuth token>'
    
    let wsclient = null
    const sttIBMWebsocket = {
    
      sttGetConfig: async () => {
        return {
          useMediaRecorder: true,
          interimResultTime: 50,
        }
      },
    
      sttStartListening: async (params) => {
        const [metadata] = params
        const sttConfig = await sttIBMWebsocket.sttGetConfig()
        const interim_results = _.get(sttConfig, 'interimResultTime', 0) > 0
        wsclient = new WebSocket(`wss://${IBM_URL}?access_token=${access_token}`)
        wsclient.onopen = (event) => {
          wsclient.send(JSON.stringify({
            action: 'start',
            interim_results,
            'content-type': `audio/${metadata.audioMetadata.fileFormat}`,
          }))
        }
    
        wsclient.onmessage = (event) => {
          const data = JSON.parse(event.data)
          const results = _.get(data, 'results', [])
          if (results.length > 0) {
            const lastresult = _.get(results, `[${results.length - 1}]`)
            const m = {
              text: _.get(lastresult, 'alternatives[0].transcript', ''),
              final: _.get(lastresult, 'final'),
            }
            window.sap.cai.webclient.onSTTResult(m)
          }
        }
    
        wsclient.onclose = (event) => {
          console.log('OnClose')
        }
    
        wsclient.onerror = (event) => {
          console.log('OnError', JSON.stringify(event.data))
        }
      },
    
      sttStopListening: async () => {
        console.log('StopListening')
        const client = wsclient
        setTimeout(() => {
          if (client) {
            client.close()
          }
        }, 5000)
      },
    
      sttAbort: async () => {
        if (wsclient) {
          wsclient.close()
          wsclient = null
        }
      },
    
      sttOnInterimAudioData: async (params) => {
        if (wsclient) {
          const [blob, metadata] = params
          wsclient.send(blob)
        }
      },
    
      sttOnFinalAudioData: async (params) => {
        if (wsclient) {
          const [blob, metadata] = params
          console.log('Sending final blob')
          wsclient.send(blob)
          console.log('Sending stop')
          wsclient.send(JSON.stringify({
            action: 'stop',
          }))
        }
      },
    }
    
    window.sapcai = {
      webclientBridge: sttIBMWebsocket,
    }
    

    SAP Conversational AI expects these speech-to-text functions in the object window.sapcai.webclientBridge:

    • sttGetConfig: When starting the media recorder.
    • sttStartListening: When user clicks the microphone. This starts a websocket with the IBM Cloud service, and creates callbacks for the service when speech is transcribed.
    • sttStopListening: When user stops the recording or the recording times out because there is silence.
    • sttAbort: When the user clicks the abort button.
    • sttOnInterimAudioData: When chatbot receives interim recording.
    • sttOnFinalAudioData: When recording is over and final, full recording is received.

    We could have simplified everything and used a single file for all our code, without calls for the implementation.

  • Step 5

    In the code in webclientBridgeImpl.js, add the service URL and token based on your account at IBM.

    You can check your account for your key and base API URL:

    Service key
  • Step 6

    The JavaScript files we just added need to be loaded by the app, so we add dependencies in the SAPUI5 view controller to these files, too.

    JavaScript
    Copy
    sap.ui.define([
        "profilePic/controller/BaseController",
        "sap/m/MessageToast",
        "sap/m/MessageBox",
        "sap/ui/core/Core",
        "sap/ui/model/json/JSONModel",
        "sap/ui/Device",
        "sap/suite/ui/commons/library",
        "sap/m/Dialog",
        "sap/m/DialogType",
        "sap/m/Button",
        "sap/m/ButtonType",
        "sap/m/TextArea",
        "./webclient",
        "./webclientBridge",
        "./webclientBridgeImpl"
    ],
        function (BaseController, MessageToast, MessageBox, oCore, JSONModel, Device, SuiteLibrary,
            Dialog, DialogType, Button, ButtonType, TextArea,
            webclient,webclientBridge,webclientBridgeImpl) {
    
  • Step 7

    The app should be ready to run with speech-to-text in your chatbot.

    1. Run the app.

    2. Open the chatbot.

      Open chat
    3. Click the microphone, and allow the browser to use the microphone.

      Start microphone
    4. Say something.

      The text goes in the temporary transcription area, as you say the words:

      Transcribing text

      When you stop talking, the temporary transcription area closes, the full text is transferred to the conversation, and the chatbot answers.

      Link text e.g., Destination screen
  • Step 8

    Which of the following tasks does the SAP Conversational AI speech-to-text feature handle?

Back to top