In my projects, I see again and again how much expectations for voicebots have increased. Many organisations go into it expecting to build a fully-fledged AI-supported service within a few weeks. The reality looks different. Voicebots can do a lot today, but they need a solid foundation, clear objectives and organisations that are willing to invest a little more energy at the beginning.
In my webinar (in German only) as part of the 16th Customer Service Week, I presented two projects that show very well what is possible and which pitfalls you need to be aware of. Both use cases come from the public sector, and both had the same goal: to relieve the service centre and automate standard enquiries. Nevertheless, the projects turned out very differently.
A voicebot as the first point of contact: What works well and what doesn’t
In the first project, the challenge was a very high call volume. Staff were constantly interrupted while also supporting citizens on site. The goal was to reduce the workload by at least 50%.
The authority therefore decided to route all public calls to a voicebot. Only when the bot could not resolve the enquiry could the caller leave a callback request. These callback requests are sent as an email with a summary and a detailed transcript to the mailbox of the authority’s staff, so they can handle them when time allows.
The approach sounded simple at first. The idea was to take the existing FAQ texts from the website and use them to build a bot. We therefore started with a classic NLU (Natural Language Understanding) bot. The customer did not want to risk any hallucinations and insisted on a strictly defined customer journey.
But this is exactly where we realised how challenging the topic actually is. Website content is not written for voicebots. It often grows over time, with overlapping or imprecise content. A bot, however, needs clear, unambiguous and well-structured information. The dialogues felt stiff and would not have satisfied callers.
Switching to an LLM (Large Language Model) bot was the real turning point. The enthusiasm was great: human dialogues, empathy, varied formulations. The bot said “Oh, sounds wonderful” or “I’m sorry to hear that”. It was able to answer the same question in different ways and responded in a more human way without leaving the defined guidelines.
Measurable results
The result was clearly measurable. Only around 20% of the original calls ended up as callback requests for staff. Most cases were neatly resolved by the bot. A very nice side effect: repeat callers disappeared because the bot does not have a “busy” signal. Employees now have much more time for citizens on site. This was exactly the relief the customer had hoped for.
The bot hangs up on 15% of callers, mostly after successful conversations. Between 30% and 40% of callers hang up themselves. Here, we had to look at the transcripts to assess whether this was good or bad.
What we learned from this project
Human resources
You can achieve quick wins, but no voicebot project works without active involvement from the customer. This includes maintaining the knowledge base, creating a glossary, testing, compiling a clean pronunciation lexicon for text-to-speech, and revising content. In addition to a subject-matter expert, there should be technical support that takes care of the APIs and has knowledge of scripting in JavaScript or XML. Testing resources are also required. For a project of this size you need at least two dedicated customer-side contacts, ideally three.
LLM knowledge
Another important point is the LLM’s built-in general knowledge. A model always brings its own background knowledge. It must be actively constrained so that it remains within the provided content. This works well, but not one hundred percent. You need to test, correct and adjust whenever necessary.
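As an illustration, a retrieval-style prompt can make this constraint explicit: the model is only shown the relevant knowledge-base entry and is instructed not to use its own background knowledge. The knowledge-base entries, prompt wording and function names below are invented for illustration; they are not the actual project configuration:

```python
# Minimal sketch of grounding an LLM bot in a maintained knowledge base.
# All names, entries and prompt text are illustrative assumptions.

KNOWLEDGE_BASE = {
    "opening hours": "The office is open Monday to Friday, 8:00-16:00.",
    "passport": "A passport application requires an in-person appointment.",
}

SYSTEM_PROMPT = (
    "Answer ONLY from the provided context. "
    "If the context does not contain the answer, say you do not know "
    "and offer a callback. Never use outside knowledge."
)

def build_messages(question: str, context: str) -> list[dict]:
    """Assemble chat messages so the model stays within the given context."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

messages = build_messages("When are you open?", KNOWLEDGE_BASE["opening hours"])
```

Even with a prompt like this, the constraint holds well but not one hundred percent, which is why the transcripts still need regular review.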
Naturalness
When a citizen asks a question, the bot “thinks” – speech-to-text, processing in the LLM, text-to-speech. These pauses must be filled: typing sounds, small hesitation markers, variations in volume. Background noise variations also help make the dialogue feel natural.
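Conceptually, the filler logic works like a timeout loop: if the answer is not ready within a short grace period, play a filler and keep waiting. The grace period, filler phrase and helper names below are assumptions for illustration, not the actual implementation:

```python
import asyncio

# Sketch of filling the bot's "thinking" pause: if the answer takes longer
# than a grace period, insert a hesitation marker so the line never goes
# silent. Timings and phrases are invented for illustration.

async def fake_llm_answer(delay: float) -> str:
    await asyncio.sleep(delay)  # stands in for STT + LLM + TTS latency
    return "You can renew your passport at counter 3."

async def answer_with_fillers(delay: float, grace: float = 0.2) -> list[str]:
    """Collect everything the caller hears, fillers included."""
    spoken: list[str] = []
    task = asyncio.create_task(fake_llm_answer(delay))
    while not task.done():
        try:
            # shield() keeps the answer task alive when the wait times out
            spoken.append(await asyncio.wait_for(asyncio.shield(task), grace))
        except asyncio.TimeoutError:
            spoken.append("Hmm, one moment...")  # hesitation marker

    return spoken

print(asyncio.run(answer_with_fillers(delay=0.5)))
```

In a real deployment the fillers would be pre-recorded audio snippets rather than text, but the timing logic is the same.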
Barge-in
Most bot providers are now also capable of what is known as barge-in, i.e. letting the caller interrupt the bot mid-sentence. The bot must accept that it will be interrupted. For example, if the caller has already received the information they need, they should not have to wait through the rest of the bot’s scripted output. The downside is that background noise can also be interpreted as an interruption, for example on a train platform. Sensitivity settings can mitigate this.
A second example: When call forwarding is part of the process
The second project also involved FAQs, but the requirements were different. The bot was supposed to answer enquiries and relieve the service centre – the target was also 50%. However, during opening hours, calls needed to be forwarded to backend or contact centre staff if required. There was no callback option. In addition, the existing Avaya system had to be integrated with regard to forwarding from the bot to the employee.
Technical integration
Technical integration was a key aspect. The call comes into the call centre, is deflected to the voicebot, and should be returned to an employee if necessary. The routing identifier must be provided and reflected back – via user-to-user information or SIP header manipulation. This ensures the call appears correctly in the historical reporting.
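For illustration, the round trip of the routing identifier through a User-to-User Information (UUI) header, as defined in RFC 7433, might look like this. The header name and hex encoding are real; the routing-ID value and helper names are assumptions:

```python
# Sketch of carrying a routing identifier in a SIP User-to-User header
# (RFC 7433) so the transferred call lands in the right reporting bucket.
# The routing-ID value "VDN-4711" is an invented example.

def build_uui_header(routing_id: str) -> str:
    """Encode the routing ID as hex, the common UUI wire format."""
    payload = routing_id.encode("ascii").hex()
    return f"User-to-User: {payload};encoding=hex"

def parse_uui_header(header: str) -> str:
    """Recover the routing ID on the contact-centre side."""
    value = header.split(":", 1)[1].strip()
    payload = value.split(";", 1)[0]
    return bytes.fromhex(payload).decode("ascii")

header = build_uui_header("VDN-4711")
print(header)
print(parse_uui_header(header))
```

Whether UUI or another SIP header is used depends on what the PBX – in this case the Avaya system – expects on the receiving side.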
We also integrated Parloa’s Call Data Service. The call summary appears in the Avaya front end before the employee accepts the call. This allows agents to see in advance what the call is about.
Results
Once again, we had the same pattern at the beginning: the customer wanted to import the website FAQs. And once again, this did not work. The content had to be revised manually.
A 50% forwarding rate was achieved – a partial success. Ideally, forwarding would not be needed, but the other 50% were automated successfully.
There was a 10% hang-up rate by the bot – a success after reviewing the transcripts. The 40% hang-up rate on the caller side needs to be looked at more closely. Is it a success or not? The transcripts need to be checked again and the conversations reviewed. One option is to conduct a satisfaction survey at the end of the conversation in the bot.
Lessons learned from the second project
Time required
We had a project duration of 3 months. That worked out well. Design, configuration and technical testing on our side took approximately 20 person-days.
Human resources
We had too few customer resources here. This led to additional work for us that was not planned.
Opt-in
An important point for both bots. The citizen must consent to the recording. The bot always records everything: full transcripts are created, regardless of whether they are deleted after 7 or 30 days.
The opt-in must be requested. Citizens should also have the option to say, “No, I don’t want to continue here.” Or they can continue – and, as we saw in the first case – request a callback and provide only minimal information about themselves.
Here, it can make sense to start with NLU, i.e. the structured approach, and then continue with the agentic LLM approach to ensure opt-in is handled correctly. With a full LLM start, this step can be unintentionally bypassed.
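A minimal sketch of such a deterministic consent gate, with invented state names and phrases: only an explicit “yes” unlocks the free-form LLM dialogue, an explicit “no” routes to the callback path, and anything else repeats the structured prompt.

```python
# Sketch of the "NLU first, then LLM" structure: a small deterministic
# gate that collects recording consent before any free-form dialogue.
# States and accepted phrases are invented for illustration.

def opt_in_gate(reply: str) -> str:
    """Deterministic consent step; only 'consented' unlocks the LLM."""
    normalized = reply.strip().lower()
    if normalized in {"yes", "ja", "okay"}:
        return "consented"   # hand over to the LLM dialogue
    if normalized in {"no", "nein"}:
        return "declined"    # offer callback or minimal-data path
    return "ask_again"       # repeat the structured prompt

print(opt_in_gate("Yes"))
print(opt_in_gate("no"))
print(opt_in_gate("maybe"))
```

Because this step is rule-based rather than generated, it cannot be talked around or skipped, which is exactly the property you want for a legal consent requirement.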
What we have learned from both projects
Voicebots are powerful today. But they are not plug-and-play products. To be successful, you need to take three things seriously.
First: content. A bot is only as good as the knowledge base you give it. Website texts can almost never be used as they are. They need to be revised and structured.
Second: resources. Without customer-side subject experts who test, correct and maintain content, the project will become slow and difficult.
Third: expectation management. There is almost always a phase where quality is not yet sufficient. If you keep improving, quality rises quickly. In both projects we ended up with high success rates and satisfied callers.
Conclusion
Voicebots are a real relief when set up correctly. The technology is mature, but it requires careful implementation. If content, testing and collaboration are handled well, you reach a point where most enquiries are automated and teams have more time for cases that genuinely need human attention.