3 Reasons You Need AI Voice Agents in Your Business - EP 014

Show notes

In this episode of the AI Boardroom, we discuss AI voice agents, their challenges, and various applications. Voice agents allow for natural conversations with users, replacing the need for text input. There are two approaches to voice agents: synthesizing text and converting it to audio, or directly processing audio input and generating audio output. Challenges include latency, lack of interruption capability, and the loss of emotional context when transforming audio to text. We also explore use cases such as sales follow-ups, customer service, and healthcare appointment scheduling.

Show transcript

00:00:02: Hello and welcome to this week's episode of the AI Boardroom.

00:00:09: And yeah, so Lana, it's been a minute, summer break.

00:00:13: Yeah, it's been a minute, but we're back into it.

00:00:17: So here to talk about AI voice agents.

00:00:20: So what are they?

00:00:21: What are some challenges organizations are experiencing by introducing them and some

00:00:26: different applications?

00:00:27: So we're going to go through

00:00:29: a couple of practical business applications, how different business verticals and

00:00:34: industries are using them.

00:00:35: So I'm really excited about this one because yeah, I think it's been trending a

00:00:40: lot, especially with the voice agents and more and more organizations.

00:00:44: I think more in the US than here, like here people are still trying to get chatbots

00:00:47: working.

00:00:50: And here I think he means Germany.

00:00:53: So we are kind of on two different sides of the world.

00:00:57: Yeah, I think the voice agents, I think, as the voice technology is becoming better

00:01:03: and better, and we're kind of seeing it from different angles, music is now using

00:01:09: it.

00:01:10: I think there's, you know, ChatGPT came out with, what is the name of...

00:01:16: Wanted to come out with like, so still...

00:01:18: Want to, I'm sorry.

00:01:20: Still trying to, yeah.

00:01:23: But I think that the voice technology is really...

00:01:26: kind of improving, which, you know, kind of makes it very lucrative for

00:01:31: organizations to be like, Hey, is this something that is ready now for us to

00:01:36: replace agents, kind of the call center agents and other places, which, yeah,

00:01:42: we'll talk about more, but do you want to just dive in and maybe let's, let's start

00:01:46: with defining what are voice agents?

00:01:48: Yeah.

00:01:49: So, yeah, basically, like the, the general definition is.

00:01:55: that we have some input that we usually get as text, which we now get as voice.

00:02:02: So we can have a call or a conversation, yeah, in a natural manner with our voice

00:02:09: instead of like typing stuff in.

00:02:11: So that's the basic idea.

00:02:13: And we basically have technically like two ways to approach it.

00:02:16: So we have the first way would be to actually take what we do in our chatbots,

00:02:23: take the text.

00:02:25: just synthesize it and yeah, give back the synthesized audio so that you, instead of

00:02:33: giving back the text, you speak.

00:02:35: This has several challenges, which we will go into in the next part, but that's

00:02:43: basically the one thing.

00:02:45: The new thing that's coming around now is that the LLMs are getting multimodal, so they

00:02:53: don't...

00:02:54: take the input text, get text out and you synthesize it, but you give them directly

00:03:01: the audio feed and they spit out audio on the other side.

00:03:06: So that's basically the idea so that you don't need to go through all the synthesis

00:03:13: tasks, which all costs time, but you're able to get direct voice output.
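
A minimal sketch of the first, classical approach described above (speech to text, then the chatbot, then text to speech), timing each stage to show where the delay accumulates. The three stage functions are hypothetical stand-ins, not any real speech or LLM API; in practice each would call an actual service.

```python
import time
from typing import Callable, Dict, Tuple


def run_cascaded_turn(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> Tuple[bytes, Dict[str, float]]:
    """Run one conversational turn through the three classical stages.

    Returns the synthesized audio plus the seconds spent in each stage.
    """
    timings: Dict[str, float] = {}

    start = time.perf_counter()
    text_in = transcribe(audio_in)        # speech -> text
    timings["speech_to_text"] = time.perf_counter() - start

    start = time.perf_counter()
    text_out = generate_reply(text_in)    # text -> text (the chatbot part)
    timings["language_model"] = time.perf_counter() - start

    start = time.perf_counter()
    audio_out = synthesize(text_out)      # text -> speech
    timings["text_to_speech"] = time.perf_counter() - start

    # Sum of the three stages; computed before the "total" key exists.
    timings["total"] = sum(timings.values())
    return audio_out, timings


# Hypothetical stand-ins so the sketch runs without any real service.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate_reply(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")
```

Calling `run_cascaded_turn(b"hello", fake_transcribe, fake_generate_reply, fake_synthesize)` returns the reply audio and a per-stage timing dictionary. A direct audio-to-audio model would replace all three stages with a single call.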

00:03:20: That said.

00:03:22: It's still really early and we basically don't have a model yet, which makes this

00:03:27: really applicable.

00:03:29: There was one released this week.

00:03:31: I think it's called Moshi or something, which open-sourced such a technology, but

00:03:38: the model itself is pretty, yeah, pretty early stage.

00:03:42: So, but yeah, it's getting through to be actually usable.

00:03:46: But I think through this episode, we might lean more on the classical way of

00:03:52: transforming the text.

00:03:55: But yeah.

00:03:56: Yeah.

00:03:56: And I think with another notable difference, because what's different

00:04:00: between the phone agents that you can have the robotic voice answering the 1-800

00:04:05: number and this AI kind of voice agents we're speaking about is because there's

00:04:11: some intelligence behind it.

00:04:12: So the traditional 1-800 number, like dial two for this option, dial three

00:04:19: for this option are very much rule-based

00:04:22: types of systems.

00:04:23: So there's predefined script.

00:04:25: So someone like literally creates like a tree of decisions and tree of different

00:04:30: tasks for this robotic voice to, to voice out.

00:04:34: And it's very constant. With AI voice agents,

00:04:38: they are tied to some intelligent systems.

00:04:40: I think here you're referring to large language models that are able to pivot.

00:04:45: So there's not a predefined script.

00:04:47: So depending on the context and you can be very

00:04:50: You can be very interactive with this agent in a way, so you don't have to say

00:04:55: things in a particular way, enunciate them in a specific way without an accent.

00:05:00: And it can pivot, not based on the predefined script or tree of decisions.

00:05:05: It can actually go different paths.

00:05:09: And sometimes instead of taking you to another agent, like a live agent, it can

00:05:13: answer those questions live based on the knowledge base that it kind of has access

00:05:18: to.

00:05:19: Yeah, I think that's also a differentiation factor that I think is

00:05:24: important to highlight.

00:05:26: Yeah, definitely.

00:05:29: So, yeah, that being said, I think we should directly switch to the challenges

00:05:35: topic.

00:05:36: So, yeah, like I said before, so we have some technical, some also, yeah, like soft

00:05:47: skills stuff.

00:05:50: Talking about the technical stuff.

00:05:52: So one major thing that really breaks the immersion is always the latency.

00:05:59: So if I say something and I need to wait seconds without any response from the

00:06:05: system, it makes the waiting awkward.

00:06:11: And yeah, it kind of feels like you're going a step, waiting, then going the next

00:06:16: step, waiting.

00:06:17: So yeah, that's basically where you're at when you have too much of latency.

00:06:23: Which is not natural, I think.

00:06:25: And that's why I think just to kind of give an example, it's like, you and I are

00:06:29: kind of talking and then I'll ask you a question and you're like, silence,

00:06:34: processing. It's just very unnatural because we're not really sure.

00:06:38: Is it because you don't really have that immediate feedback, right?

00:06:43: Like, did you understand my question and are you processing it, or...

00:06:48: Am I going to get hung up on?

00:06:50: And so I think that inference is the reason why it's important.

00:06:53: That delay in getting a response is because it's just not natural for us to

00:06:59: wait that long.

00:07:01: And it's actually quite off-putting to customers to have to wait for a response.

00:07:07: And it's also even more awkward because they don't have a face.

00:07:10: Now we do, of course, a podcast, but we also have a video feed to that.

00:07:14: So if you're talking

00:07:17: And if you ask me something and then you wait for an answer, you see that I'm

00:07:22: thinking actually.

00:07:23: So we get some feedback.

00:07:26: If you have only a call, you might ask yourself, like, are we disconnected?

00:07:32: Is there something wrong?

00:07:33: Like, did you not understand?

00:07:35: Because you don't get any feedback because that's what you have to work with.

00:07:39: There are mitigation strategies.

00:07:43: So you could.

00:07:44: I spoke to a company, they actually, that's their main product is call services

00:07:49: automated through cloud systems.

00:07:55: And they, like you said, had basically the old call screen stuff where it just goes step

00:08:01: by step through something.

00:08:02: And now, of course, they implement AI.

00:08:05: And what they did is they tried to first reduce the latency as far as they could,

00:08:09: but they still have several seconds delay.

00:08:12: And what they do is like,

00:08:13: on a random basis, they put in like ums and ahs, even typing, even like the typing on

00:08:24: a keyboard or something to simulate that someone's searching up something in the

00:08:29: system.

00:08:30: So, and latency stays the same, but the perceived time of waiting is a lot more,

00:08:37: it's a lot more enjoyable to the customer because he then has some feedback.
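
The mitigation described above can be sketched roughly as follows: while the real answer is still being computed, the agent emits a filler (an "um" or a typing sound) at intervals, so the actual latency stays the same but the caller gets feedback. This is only an illustration under assumptions: `get_answer` and `speak` are hypothetical callbacks, and the fillers are cycled in order here rather than chosen at random as the company in the episode does.

```python
import asyncio
from typing import Awaitable, Callable, List


async def answer_with_fillers(
    get_answer: Callable[[], Awaitable[str]],
    fillers: List[str],
    filler_interval_s: float,
    speak: Callable[[str], None],
) -> str:
    """Speak a filler every `filler_interval_s` seconds until the answer is ready."""
    task = asyncio.ensure_future(get_answer())
    i = 0
    while True:
        # Wait for the answer, but give up after one interval.
        done, _ = await asyncio.wait({task}, timeout=filler_interval_s)
        if done:
            break
        speak(fillers[i % len(fillers)])  # still waiting: give the caller feedback
        i += 1
    answer = task.result()
    speak(answer)
    return answer
```

With a 0.1 second interval and an answer that takes a quarter of a second, the caller would hear one or two fillers and then the answer.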

00:08:42: It's the same if you're on the website, you click a button and nothing happens,

00:08:45: even if the actual action gets through.

00:08:50: It's just a bad experience.

00:08:52: And if you don't have a video feed, it's the same on audio.

00:08:56: If you wait without any feedback, it's just a bad experience.

00:09:01: I like that.

00:09:01: I actually, I can appreciate them thinking kind of going beyond, above and beyond.

00:09:06: And you think again, what we talked about previously about ChatGPT releasing their

00:09:10: kind of voice agent.

00:09:13: They did, I think, include the ums and ahs.

00:09:16: And so when even if there was some influence...

00:09:20: But that was actually the model.

00:09:24: That was not to mitigate time delay.

00:09:27: They actually did all the...

00:09:29: okay.

00:09:29: It was part of their personality.

00:09:31: Yeah, actually.

00:09:34: And that's what they did.

00:09:37: They removed all the processing in between by giving

00:09:40: the model the ability to produce audio, to like get audio and to produce audio

00:09:45: without any intermediate steps.

00:09:47: Same for images, by the way.

00:09:48: But we still don't have access to the full model.

00:09:51: So I cannot give too much detail about that.

00:09:54: But that's basically what they do.

00:09:56: They created a multimodal model, which means basically get different modalities

00:10:02: in and get different modalities out.

00:10:04: You could also...

00:10:05: in the future, for example, say you get audio in and get an image out.

00:10:09: So that's the beauty of multimodality.

00:10:12: So yeah, and that's actually why they got this natural conversation to that point

00:10:18: because they reduced latency to milliseconds.

00:10:23: And well, which you have a natural conversation and that also, and that's the

00:10:30: next challenge, you cannot interrupt.

00:10:34: So

00:10:35: Think about you take text, you put it in a machine and wait for the audio file and

00:10:40: then you can play the audio file.

00:10:41: In all of the process, if someone's interrupting you, it's not easy to react

00:10:48: to this because you play the whole audio file.

00:10:52: You might interrupt the audio file, but it will be even more unnatural than the

00:10:57: conversation was anyway because you have a lot of disturbances.
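
One common way to soften the interruption problem in the classical pipeline is to play the synthesized reply in small chunks and check between chunks whether the caller has started talking. The sketch below assumes two hypothetical callbacks, `play_chunk` and `caller_is_speaking` (for example backed by voice activity detection); it illustrates the idea and is not how any particular product implements it.

```python
from typing import Callable, Iterable


def play_with_barge_in(
    audio_chunks: Iterable[bytes],
    play_chunk: Callable[[bytes], None],
    caller_is_speaking: Callable[[], bool],
) -> bool:
    """Play the reply chunk by chunk; stop as soon as the caller talks.

    Returns True if the whole reply was played, False if it was interrupted.
    """
    for chunk in audio_chunks:
        if caller_is_speaking():
            return False  # stop talking and hand the turn back to the caller
        play_chunk(chunk)
    return True
```

The smaller the chunks, the faster the agent falls silent, but it still only stops; reacting naturally to what was said is what the direct audio models promise.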

00:11:01: And if you do the new GPT-4o way, or I think Google Gemini also wants to

00:11:07: introduce this, you're actually able to interrupt the model and it reacts

00:11:12: naturally to the interruption like a human would.

00:11:16: And that's kind of a big win on their side too.

00:11:21: Yeah, and I think we talked about just to drive home that kind of how it works

00:11:26: behind the scenes.

00:11:28: So it's almost like a radio.

00:11:29: I don't know if you've ever...

00:11:31: when you were a child, had these radio things that you

00:11:35: walked, you know, across different rooms and you're like, Hey, can you listen to

00:11:38: me?

00:11:38: You have to press the button down.

00:11:40: Like a walkie talkie type of thing.

00:11:42: Yeah.

00:11:42: And then you press down and then the other person can listen when you talk, but they

00:11:47: can't press that, that thing and you hear it then vice versa.

00:11:52: And so that's typical.

00:11:52: That's basically the state of voice agents currently that are again, apart from some

00:11:57: of these newer technologies that aren't yet available

00:12:01: to the public, it is very like take turns and wait until kind of you're done with

00:12:09: your part and it stops.

00:12:11: And then it takes their part.

00:12:13: Like basically AI takes time to actually process and return a response for you to

00:12:19: take your turns.

00:12:20: But what I think you're referring to is with OpenAI, you can actually interrupt

00:12:26: mid sentence and

00:12:29: have the AI system pivot basically in a different direction without completing

00:12:34: that specific task, which I think could be game changing actually.

00:12:38: Definitely.

00:12:39: Yeah.

00:12:40: I don't have any info about how long can a conversation be before it reaches context

00:12:46: limits and stuff.

00:12:48: But I think they will stick to the 128,000 tokens they have right now, which to

00:12:55: give you some point of reference.

00:12:59: The first book of Harry Potter from the text tokens would be 117,000 or

00:13:08: something.

00:13:09: So you're saying it's a book of Harry Potter, basically?

00:13:12: The context window.

00:13:14: So it might be different for audio tokens.

00:13:18: I don't know.
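
To make the context-window point concrete, here is a rough sketch of keeping a long conversation inside a token budget by dropping the oldest turns. The 128,000 figure is the one quoted in the episode; the four-characters-per-token estimate is an assumption for text only, real tokenizers differ, and audio tokens may be counted quite differently.

```python
from typing import List

CONTEXT_WINDOW_TOKENS = 128_000  # figure quoted in the episode
CHARS_PER_TOKEN = 4              # rough rule of thumb, an assumption


def estimate_tokens(text: str) -> int:
    """Very rough token estimate; real tokenizers will differ."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def trim_history(turns: List[str], budget: int = CONTEXT_WINDOW_TOKENS) -> List[str]:
    """Keep the most recent turns whose estimated tokens fit in the budget."""
    kept: List[str] = []
    used = 0
    for turn in reversed(turns):  # walk from newest to oldest
        cost = estimate_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))   # restore chronological order
```

By the numbers mentioned in the conversation, a 117,000-token book would leave about 11,000 tokens of a 128,000-token window for everything else.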

00:13:20: But yeah.

00:13:20: That would be awesome.

00:13:22: Because I would imagine that companies have...

00:13:26: either voice scripts or policies that are hopefully less than a Harry Potter book.

00:13:34: So I can give you some promise that it would in the future, sometime in the

00:13:40: future when some of this technology, especially from OpenAI is released.

00:13:43: But I think that there's some of this already available and again, but we'll go

00:13:46: into use cases, but it's already being used.

00:13:49: It's also not far off.

00:13:51: So we actually will get access to these technologies pretty soon.

00:13:55: So yeah.

00:13:56: We will also, if any updates might be delivered on what we talk about today,

00:14:03: then we record another episode.

00:14:05: Promise you that.

00:14:07: One challenge I want to emphasize on that note, which also differs the direct

00:14:13: system, which was upcoming and the actual system that you could implement today, is

00:14:20: emotions.

00:14:21: So if you transform audio to text,

00:14:26: You lose like one channel of a conversation, which is like the

00:14:31: subtext, the emotions, the tension in the voice, stuff like that.

00:14:35: You completely lose that.

00:14:39: And you could go ahead and use, like, there's Hume AI, I think, they focus

00:14:46: only on emotion detection in like basically every modality.

00:14:50: But that would add latency to your response.

00:14:54: So

00:14:56: You really have to weigh how much value it delivers to you before you actually use

00:15:02: stuff like that.

00:15:05: Could be also interesting in something like a chatbot to actually evaluate the

00:15:11: emotions to improve your answer.

00:15:17: But yeah, that's something that if you process the audio directly, which the new

00:15:21: GPT-4o chatbot does, for example, it reacts and it also changes its voice based

00:15:29: on what your emotions are.

00:15:31: While today, it's really a manual process, which is kind of tedious to implement.

00:15:38: And even if you got it to a point where it's kind of working, you might run into

00:15:43: some issues.

00:15:44: So...

00:15:45: If you have something where a lot of emotions would be needed to be tracked,

00:15:50: like a hotline for first aid for people with any problem, where emotions have a

00:16:02: big part in understanding the people, the human on the other side.

00:16:07: You might wait for newer technologies, or add latency to the pipeline

00:16:14: by knowing you could deliver a better response, still, yeah, you have to

00:16:19: evaluate if there is a need of processing the emotions in the message.

00:16:27: Yeah.

00:16:28: So these agents can get then quite complex, because I think if you're

00:16:32: accommodating by filling in the gaps or the ums and the thinking or...

00:16:42: whatever, and then considering emotions and things like that.

00:16:46: So there's multiple factors.

00:16:48: I don't know that they're challenges, but I think they're really important

00:16:50: considerations for how do you actually deliver the best customer experience for

00:16:55: your target use case?

00:16:57: And I think that's where we're probably headed in next is like, how are these

00:17:01: organizations and like, what are they using these agents for?

00:17:04: And it's, again, the best AI is aligned to the business objectives and caters to your

00:17:10: customers.

00:17:10: And I like that you've

00:17:11: given that example about kind of emotional state may be important for like hotline

00:17:18: for like mental health or I don't know, some other kind of response system where

00:17:26: emotions are really part of your business objectives like to detect them.

00:17:30: And sometimes that's the reason why you have voice agents 24/7 to basically

00:17:35: the one that better detects them than a human.

00:17:38: But there are, again, like emerging approaches to actually help with some of

00:17:42: that.

00:17:43: But just to kind of dive into some of these other use cases that we're seeing

00:17:48: a lot of and you probably have heard of.

00:17:50: And I think this one's an exciting one that I've kind of heard just to give you

00:17:55: an example of how else you can be thinking about agents, is sales.

00:18:00: So I saw a recent use case and I think some demo, I want to say in the last

00:18:04: couple of weeks where

00:18:06: the voice agents are actually used to do a follow-up post bookings that are made on

00:18:12: the website.

00:18:12: So think real estate, building new communities, you fill in the form, you

00:18:17: wanna talk to an agent, if it's over the weekend, then chances are the agents are

00:18:22: not working, they're taking time off.

00:18:25: And there's research, and don't quote me on this, but there's research that states

00:18:30: that if you don't return a call, the better conversions are...

00:18:35: best within a certain time period.

00:18:37: So you get the best conversion if you have a return call within, let's say, the first

00:18:42: hour, and then it declines after that.

00:18:45: So if you do wait two days before reaching back to that client, the chance of that

00:18:50: client converting to a paying customer is less and less.

00:18:54: And so voice agents are now used to do follow-ups.

00:18:57: So when someone completes the forms online and within minutes, sometimes you can

00:19:03: even...

00:19:04: put a timer on it, within five minutes, they'll return a call asking you questions

00:19:10: that clarify your responses.

00:19:12: So typically, like these forms on the website, they're typically like lead gen, so

00:19:16: they're very light, you're probably asking them for a name, phone number and email,

00:19:20: something light.

00:19:21: So this agent is used to then collect additional information so that when a live

00:19:27: agent does call to do this relationship building, they have more data to then look

00:19:32: into.

00:19:33: maybe that customer's background, look into options and things like that.
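
The follow-up flow described here, a light lead form, a callback a few minutes later, and the agent asking only for what is still missing, could be sketched like this. The field names in `WANTED_FIELDS` and the `Lead` type are hypothetical examples, and the five-minute default simply mirrors the timer mentioned above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List

# Hypothetical qualification fields a live agent would like to have.
WANTED_FIELDS = ["name", "phone", "email", "budget", "move_in_date"]


@dataclass
class Lead:
    submitted_at: datetime
    answers: Dict[str, str] = field(default_factory=dict)


def callback_time(lead: Lead, delay_minutes: int = 5) -> datetime:
    """When the voice agent should return the call after the form is submitted."""
    return lead.submitted_at + timedelta(minutes=delay_minutes)


def questions_to_ask(lead: Lead) -> List[str]:
    """Fields the light web form did not capture, in the order to ask them."""
    return [f for f in WANTED_FIELDS if not lead.answers.get(f)]
```

For a form that only captured name, phone and email, `questions_to_ask` would return the remaining two fields, which is the additional information the live agent later works with.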

00:19:37: But it also gives you a touch point.

00:19:40: So instead of just having this kind of informal conversation with the website,

00:19:46: you now have talked to an agent.

00:19:48: So that's another touch point for the company.

00:19:50: And then it gives you information to build a better relationship the first time an

00:19:54: agent, like a live agent, talks to that customer.

00:19:57: So again, you're kind of building up that relationship through these multiple

00:20:01: touch points, but the business objective there, especially being kind of in sales,

00:20:06: real estate, again, you want to get more people converting.

00:20:11: And so from their perspective, time, the response time for that follow-up is

00:20:16: really important.

00:20:17: And that's where kind of these voice agents come in and really deliver on that

00:20:22: value for them.

00:20:23: So.

00:20:24: Yeah.

00:20:26: I think you mentioned something important.

00:20:28: So you have background information of the customer.

00:20:30: And the good thing is if you have an agent that might do the call for you, you can

00:20:35: collect a lot of context upfront.

00:20:37: So you don't have to do it during the call.

00:20:41: So on every new lead that comes in, you try to collect as much information as you

00:20:46: can.

00:20:46: Beforehand, it takes seconds with an AI system built for that.

00:20:50: And then you already have something you could combine with a call script.

00:20:55: By the way, it's one of the first things I did when I was doing my first blog article

00:21:03: on AI.

00:21:04: I took HubSpot call scripts and made a pipeline that if a call center agent is

00:21:14: talking, that the system would receive the message and based on the call script give

00:21:18: you the correct answer, or like the best answer by the synthesis of

00:21:24: what's incoming, what we found in the vector database.

00:21:31: And I was hugely impressed because I got the latency of that down to like one and a

00:21:36: half seconds.

00:21:36: And it was just the first try.

00:21:39: I didn't even optimize anything.

00:21:42: So yeah, that's basically something you could apply to all your AI approaches.

00:21:48: Like what can I do beforehand to then make

00:21:53: make the actual answer quicker and better at the same time.
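
The "do the work beforehand" idea can be illustrated with a toy version of the call-script lookup: the script snippets are indexed before the call, so only a cheap similarity search happens live. This is a sketch under loud assumptions: the word-count "embedding" below is a stand-in for a real embedding model and vector database, and it is not the pipeline built for the blog article.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(snippets: List[str]) -> List[Tuple[str, Counter]]:
    """Done ahead of the call, so only the lookup happens live."""
    return [(s, embed(s)) for s in snippets]


def best_snippet(index: List[Tuple[str, Counter]], utterance: str) -> str:
    """Return the script snippet most similar to what the caller just said."""
    query = embed(utterance)
    return max(index, key=lambda item: cosine(query, item[1]))[0]
```

The retrieved snippet would then be handed to the language model together with the incoming message, which keeps the live part of the pipeline small.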

00:21:58: Yeah.

00:21:59: And I think it's, again, I think the beauty of these AI agents and what you're

00:22:02: kind of talking about is having that context, but also having more of a human

00:22:07: interaction because of this intelligence that is built into it.

00:22:11: And so this makes it feel, again, there's more research on this, I think people

00:22:16: acknowledge that it's not a realistic conversation, but...

00:22:21: it is better than the voice prompting and they still find it useful to give

00:22:27: information so that they're not wasting time later.

00:22:29: And they're still kind of in that state of providing more information.

00:22:33: They're curious, they've just been on your website, so it's top of mind for them.

00:22:37: So they are kind of in that prime state to give information rather than, again, if

00:22:41: you call them two days later to follow up with the live agent, they're like, I

00:22:45: forgot why I was searching your website.

00:22:48: Nevermind, right?

00:22:49: kind of capture them in the state and have that conversation, even if it's not the

00:22:54: most ideal.

00:22:55: But people, again, are willing to interact.

00:22:58: And again, some of the early research with these voice agents saying that people are

00:23:04: seeing value in those conversations, even if they are not, again, probably the

00:23:11: OpenAI level yet.

00:23:14: Yeah.

00:23:15: But still, I think that there's

00:23:16: some really exciting use cases in sales, real estate that you can explore with

00:23:23: voice agents.

00:23:25: Yeah, I think customer service is another one that we're seeing.

00:23:29: I don't know if you kind of have any use cases that you are kind of specifically

00:23:35: thinking of.

00:23:36: So I have a use case I would have wished I could have.

00:23:40: We were flying with kids and you can

00:23:46: you're allowed to take one seat per kid, like a car seat and one stroller.

00:24:00: I was like, are there any restrictions?

00:24:02: How does it work?

00:24:04: I wasn't getting any proper information.

00:24:09: But just like, yeah, you can.

00:24:13: You can do stuff.

00:24:16: Because usually when you fly you have like a billion restrictions like for everything

00:24:21: and everything's like and I just would have wanted to have an AI which has a

00:24:25: proper database. Talking about data. Which has a proper database of explanations, maybe

00:24:33: an FAQ, whatever, which could just answer my question in a quick way.

00:24:40: Yeah, and I would have preferred the voice agent

00:24:45: over a chat at that point, because I wanted a quick answer.

00:24:50: And all this chatting takes more time than me just speaking it out and getting an

00:24:55: answer within seconds.

00:24:56: It's just faster than typing.

00:24:58: So it's generally, I think for customer service, it's even the better use case,

00:25:02: because it's not even close to being as slow as the old calling systems are.

00:25:12: And it saves time compared to

00:25:15: compared to stuff.

00:25:16: And you don't have to get as much information.

00:25:18: Like you just have to get like standard information from a vector database, which

00:25:21: is pretty quick, hand it over to the AI, synthesize the stuff and go out.

00:25:26: So I think customer service is more often than not pretty straightforward because

00:25:33: like most of the things that are asked are pretty simple and mundane.

00:25:40: So yeah.

00:25:41: I think I am looking forward to adding it to our own customer service solution too.

00:25:46: So yeah.

00:25:47: And I think one thing about these, just kind of to drive home on the customer

00:25:52: service, I do not, I hate talking to chat bots on websites.

00:25:59: Like hate is a strong word, but I do because of like, it is so trial and error

00:26:03: and that is very scripted, right?

00:26:05: Even like those agents are the equivalents of like these.

00:26:08: prompt systems.

00:26:09: And so the chances of you even going through the agent that is not AI enabled

00:26:14: to get a response.

00:26:15: It's like the success rate is so low that I don't even try.

00:26:20: And so I usually like my default is like you, and especially in circumstances where

00:26:24: you have your child or children, you're traveling, like you're not going to like

00:26:28: have a stroller and like a kid and then start to type questions and interact with

00:26:33: the chatbot.

00:26:34: Like for customer service, I think it is important

00:26:38: for you to deliver that quick response systems.

00:26:41: And so usually for me, I'm like, okay, what's my success rate if I call this

00:26:45: 1-800 number and press 0-0-0?

00:26:49: Until I get to the right one.

00:26:51: Yeah, even if you think about it, even if you can just have an AI system, understand

00:26:55: the request and route the customer to the right endpoint.

00:27:01: So even that would already be a huge improvement on what we have today in most

00:27:05: of the systems.
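
Even the routing-only improvement mentioned here is easy to picture: take the caller's free-form request, pick a destination, and fall back to a person when nothing matches. The queue names and trigger words below are hypothetical, and keyword matching is just a placeholder for the understanding step, which in practice would more likely be a language model or a trained classifier.

```python
import re
from typing import Dict, Set

# Hypothetical queues and trigger words, for illustration only.
ROUTES: Dict[str, Set[str]] = {
    "billing": {"invoice", "refund", "charge", "payment"},
    "baggage": {"stroller", "luggage", "bag", "seat"},
}
FALLBACK = "human_agent"


def route(utterance: str) -> str:
    """Pick the queue whose trigger words overlap most with the request."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    scores = {queue: len(words & keywords) for queue, keywords in ROUTES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else FALLBACK
```

A request that matches nothing goes straight to a person instead of forcing the caller through a menu.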

00:27:08: So yeah, to make me happy and press less of the zero buttons.

00:27:13: Because I do, like literally, I just need to speak to an agent that I just need a

00:27:16: quick response instead of navigating all of these kind of prompts.

00:27:22: If it accepts voice input, I always like directly try "human agent, human agent,

00:27:29: human agent" until it just gives up and puts me through.

00:27:33: Exactly.

00:27:33: Yeah.

00:27:34: And I think a lot of it actually is like when you do these prompt systems, I don't

00:27:38: know if people, I guess those who are listening who operate, you know, customer

00:27:42: service or call center agents, a lot of these promptings are often used for

00:27:47: analytics.

00:27:48: They're not actually helping you to divert to multiple different centers.

00:27:54: Sometimes they do.

00:27:55: So if you have like a really, really huge multi-product service kind of

00:27:59: organization, you have no...

00:28:02: for example, that way.

00:28:04: How many times I've called like maybe smaller offices and you have like 500

00:28:08: different things, different buttons and like things, decisions to navigate and you

00:28:13: still land on the same person at the end.

00:28:16: And it's like, so I do think that in some cases, it is mostly used for analytics to

00:28:23: understand like staffing needs.

00:28:24: So if there's more types of questions that are used in like one part of the, you

00:28:29: know,

00:28:31: people are pressing more of these functions.

00:28:33: They're like, maybe we need to staff more people or educate our people on these

00:28:37: types of questions and then have very nuanced staff.

00:28:41: But sometimes if you do have a call center that is across different products, so for

00:28:46: example, you need specialized staff in those.

00:28:49: But yeah, it's just not really helpful.

00:28:51: And sometimes I would rather just go to the general agent, who is the router for

00:28:57: them all, who can answer my question.

00:28:59: then go through these prompts.

00:29:01: And sometimes again, like this is where I think an AI agent, like why can't the

00:29:05: person when they call the 1-800 number hear a voice speaking and ready to answer

00:29:12: a question?

00:29:13: If it's at least somewhat conversational and not like a predefined script, it

00:29:20: already improves the customer experience by a lot.

00:29:24: And with that customer satisfaction and with that maybe also

00:29:30: reduced churn, whatever.

00:29:32: So, yeah, one thing I also want to add is you can be multilingual from the get-go.

00:29:39: So, at least like in the main languages, I'm not sure how the whole voice synthesis

00:29:44: works with like smaller or like not as much used languages, but if you have

00:29:50: French, English, German, it's pretty much solved.

00:29:54: So, yeah, if also think about like

00:29:57: Do you have an international customer base or even for like internal stuff?

00:30:01: Do you have different subsidiaries of your company which use different languages?

00:30:06: Maybe a centralized knowledge management could be done by a voice agent, which is

00:30:13: then also like taking the English text and just translates it properly because that's

00:30:18: something that like large language models do basically perfectly.

00:30:22: So, yeah.

00:30:23: Yeah.

00:30:24: And I think just to demonstrate like,

00:30:26: Maybe another use case that I think I'd love to kind of dive in.

00:30:29: So right now we've kind of talked about cutting the response time and then

00:30:33: responding to customers immediately to close out the rate.

00:30:37: So that was like one objective, business objective that you want to do and use

00:30:41: voice agents for.

00:30:42: Now we talked about improving customer experience.

00:30:45: How do you actually deliver the best experience to your customer to increase

00:30:49: satisfaction, your kind of rankings for the organization.

00:30:55: But and then the other extreme I would say like you don't have enough staff and I

00:30:59: think a good example that hits home is within healthcare. You do have, you know,

00:31:04: specialized clinics,

00:31:05: let's say, and you have higher demand than you have the supply of agents. And in this

00:31:11: case, you know, where the people who are behind the phone, often especially in these

00:31:14: specialized clinics, who are intaking appointments or who are intaking

00:31:19: information from patients, there's not enough of them.

00:31:23: So in my experience, again, if you do have specialized and very nuanced clinics, you

00:31:29: just, they're often operated by nurses.

00:31:32: Right now there's a nurse shortage kind of behind it.

00:31:36: It's just not enough people to answer the phone because they have to be highly

00:31:42: skilled, trained, and you know, you can't, like the ramp up time for those types of

00:31:47: workers is not that quick.

00:31:49: So how can you augment the,

00:31:52: you know, these nurses, what part of their workflow or data collection can you take

00:31:57: in order to leave just kind of, again, the nuanced work that you do need the nurse

00:32:02: for in order to validate?

00:32:05: So like there are use cases where, you know, you can even do patient validation.

00:32:09: So for example, like you can hook up these voice agents to APIs behind the scenes to

00:32:15: validate the customer, you know, or validate certain conditions or validate against

00:32:21: data that's in electronic health records, for example, validate the appointment

00:32:27: type that they're seeking, collect some information, determine the appointment

00:32:31: type, and then maybe can go as far as like actually scheduling.
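The flow described here (validate the patient, determine the appointment type, schedule if possible, otherwise leave it to the nurse) could be sketched roughly as follows. This is only an illustration under stated assumptions: the in-memory `EHR` dictionary stands in for a real electronic health record API, and the keyword map, the function names, and the patient data are all hypothetical, not something named in the episode.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical in-memory stand-in for an electronic health record lookup.
EHR = {
    "P-1001": {"name": "Jane Doe", "dob": date(1980, 5, 17)},
}

# Hypothetical keyword map from the caller's request to an appointment type.
APPOINTMENT_KEYWORDS = {
    "biopsy": "biopsy",
    "lab": "lab_work",
    "blood": "lab_work",
}


@dataclass
class IntakeResult:
    status: str  # "scheduled" or "handoff_to_nurse"
    reason: str
    appointment_type: Optional[str] = None


def validate_patient(patient_id: str, name: str, dob: date) -> bool:
    """Check the caller's details against the (stand-in) health record."""
    record = EHR.get(patient_id)
    if record is None:
        return False
    return record["name"].lower() == name.lower() and record["dob"] == dob


def determine_appointment_type(request_text: str) -> Optional[str]:
    """Return an appointment type only if the request maps to exactly one."""
    text = request_text.lower()
    matches = {appt for kw, appt in APPOINTMENT_KEYWORDS.items() if kw in text}
    return matches.pop() if len(matches) == 1 else None


def handle_intake(patient_id: str, name: str, dob: date, request_text: str) -> IntakeResult:
    """Schedule the simple cases; hand anything unclear to a nurse."""
    if not validate_patient(patient_id, name, dob):
        return IntakeResult("handoff_to_nurse", "patient could not be validated")
    appointment_type = determine_appointment_type(request_text)
    if appointment_type is None:
        return IntakeResult("handoff_to_nurse", "appointment type unclear")
    return IntakeResult("scheduled", "booked by voice agent", appointment_type)
```

The design choice in this sketch is that the agent only books when both checks are unambiguous; a request that mentions several appointment types, or a caller who cannot be matched to a record, goes to a nurse instead of being guessed at.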

00:32:35: So all of these tasks, you can't necessarily delegate because there's some

00:32:40: nuance and like some training that is involved, like for you to determine like

00:32:43: the type of visit that the patient is looking for.

00:32:47: So for example, if it's like a cancer clinic,

00:32:49: and they need to schedule a biopsy and then they need to schedule like lab

00:32:53: results.

00:32:54: There's some logic to the way that these nurses are organizing.

00:32:59: And then there's timing between the last labs and the new lab that you can't

00:33:04: necessarily just apply a rule-based kind of algorithm.

00:33:07: There's just like validation that happens.

00:33:10: And that's why, again, these highly specialized nurses are operating these

00:33:13: phones.

00:33:14: What happens when you don't have enough of them?

00:33:16: Do you turn patients down?

00:33:17: Yeah.

00:33:18: And do you really want your nurses to do phone work?

00:33:23: Like that's, I think that's the most important stuff.

00:33:26: So like they have to do healthcare, like why should they do the phone job?

00:33:31: So, of course you can't like to be, to be completely real here, you cannot have a

00:33:39: hundred percent solution, but would it benefit you to free up 80% of your staff?

00:33:44: Yeah.

00:33:45: I guess so.

00:33:47: Exactly.

00:33:48: Especially if it's...

00:33:49: And it's also maybe fulfilling a shortage, right?

00:33:53: So you could be taking on or scaling and taking in more clients or helping more

00:33:59: customers if you had basically more of that supply, which you can again, fulfill

00:34:05: with voice agents.

00:34:07: So it really goes back and I think we talk about this a lot, but it comes back to

00:34:10: what problem are you trying to solve?

00:34:13: I think voice agents are a great way to scale your customer service operations,

00:34:19: like kind of behind the scenes work, whether it's to follow up, to provide

00:34:23: better experience or to augment your existing people to like free up their

00:34:28: time.

00:34:29: I think they're really great use cases.

00:34:31: Definitely.

00:34:33: Cool.

00:34:33: I think we had a lot of use cases now.

00:34:37: And I think the general application of voice agents came through

00:34:43: pretty well.

00:34:44: So yeah, I hope you out there feel the same.

00:34:50: And yeah, enjoy this episode as much as we did.

00:34:54: Because, Lana, we already got to the end.

00:34:57: Yeah, time is flying by, it's like ridiculous.

00:35:00: Like it's always the same: the podcast starts, and the first 10 minutes you think,

00:35:04: what should we talk about, like, in 10 minutes, and then the next 10 minutes or

00:35:08: the next

00:35:09: thing that felt like 10 minutes was actually 25 minutes.

00:35:14: That's great.

00:35:15: I mean, I think we're both really deeply passionate about AI and this topic and

00:35:21: really sharing our experience.

00:35:22: So yeah, definitely.

00:35:23: I agree.

00:35:24: Good to be back.

00:35:25: Time flies.

00:35:26: Yeah.

00:35:27: So thank you for joining us.

00:35:28: I'd love to invite you to subscribe to us if you enjoy these types of sessions that

00:35:35: we have.

00:35:36: Leave a comment,

00:35:37: let us know how we could improve and what topics you want us to cover next.

00:35:42: Thank you for joining.

00:35:43: And think about liking and subscribing the video if you liked it.

00:35:49: And subscribing to the channel, of course.

00:35:51: So yeah, really appreciate you.

00:35:54: And thanks for listening.

00:35:55: See you next time.

00:35:57: Bye.
