Conversational artificial intelligence is revolutionizing business operations in every industry with applications like virtual agents, chatbots, and assistants. A conversational AI system engages in humanlike dialog, understands context, and offers intelligent responses by recognizing speech and text, understanding intent, deciphering language, and responding in a way that mimics human conversation. However, these AI models can be massive and highly complex.
For a high-quality conversation between a human and a machine, responses have to be quick, intelligent, and natural sounding. The larger a model is, the longer the lag between a user’s question and the AI’s response. Gaps longer than two-tenths of a second sound unnatural. Therefore, all the necessary computation must take place in a 200-millisecond window.
With such a tight latency budget, developers of conversational AI have to make tradeoffs. A high-quality, complex model could be used as a chatbot, where latency isn’t as essential as it is in a voice interface. Or developers can use a less bulky language-processing model that delivers results quickly but lacks nuanced responses. We are all familiar with how a voice assistant may stall during conversations by providing a response like “let me look that up for you” before answering a question. The ideal conversational AI is complex enough to accurately understand a person’s queries, and fast enough to respond quickly in seamless natural language.
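The 200-millisecond budget described above can be treated as a simple sum over pipeline stages. The following is a minimal sketch of that arithmetic. The stage names and per-stage latencies are hypothetical placeholders, not measurements from any NetApp or NVIDIA system.

```python
from dataclasses import dataclass

BUDGET_MS = 200.0  # from the text: gaps longer than 0.2 s sound unnatural


@dataclass
class Stage:
    name: str
    latency_ms: float  # hypothetical latency for one request


def check_budget(stages, budget_ms=BUDGET_MS):
    """Return (total latency, remaining headroom, fits-in-budget flag)."""
    total = sum(stage.latency_ms for stage in stages)
    headroom = budget_ms - total
    return total, headroom, headroom >= 0


# Illustrative numbers only; real values must be measured on the target system.
pipeline = [
    Stage("speech recognition", 60.0),
    Stage("language understanding", 50.0),
    Stage("fulfillment lookup", 40.0),
    Stage("speech synthesis", 40.0),
]

total, headroom, ok = check_budget(pipeline)
print(f"total={total:.0f} ms, headroom={headroom:.0f} ms, within budget={ok}")
```

With these placeholder values the stages sum to 190 ms, leaving 10 ms of headroom. Swapping in a larger language model that adds more than 10 ms would push the response past the budget, which is the tradeoff the paragraph above describes.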
NetApp and NVIDIA collaborate for conversational AI architecture
NetApp and NVIDIA are collaborating to create a conversational AI architecture that delivers the required response times. With NetApp® ONTAP® AI, powered by NVIDIA DGX systems and NetApp cloud-connected storage, state-of-the-art language models can be trained and optimized for rapid inference.
To demonstrate the capabilities of this architecture, NetApp has used this framework to create NARA, a simple virtual assistant for retail. NARA consists of the components illustrated in the following figure.
Major elements of the framework include the following.
NVIDIA Jarvis. Jarvis provides GPU-accelerated services for conversational AI using an end-to-end deep learning pipeline optimized to keep latency low.
- Jarvis comes with pretrained conversational AI models for speech, vision, and natural language understanding tasks, all available from the NVIDIA GPU Cloud (NGC).
- In addition to AI services, Jarvis allows you to fuse vision, audio, and other sensor input.
- With NVIDIA NeMo, you can easily fine-tune existing models using your own data to achieve better accuracy for specific needs.
NetApp ONTAP AI. This proven architecture combines NVIDIA DGX systems and NetApp all-flash storage. ONTAP AI reliably streamlines the flow of data, so that you can train and run complex conversational models without exceeding the latency budget.
- Incorporates the latest NVIDIA DGX A100 for unprecedented compute density, performance, and flexibility.
- Uses NVIDIA Mellanox high-performance Ethernet switches to unify AI workloads, simplify deployment, and accelerate ROI.
- NetApp AFF systems keep data flowing to deep learning processes with fast, flexible all-flash storage, using end-to-end NVMe. The AFF A800 is capable of feeding data to NVIDIA DGX systems up to 4 times faster than competing solutions.
NVIDIA NeMo. NeMo is a Python toolkit for building, training, and fine-tuning GPU-accelerated conversational AI models. Its easy-to-use APIs cover real-time automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS), supporting applications such as video call transcriptions, intelligent video assistants, and automated call center support. Pretrained, customizable models can be downloaded from NVIDIA GPU Cloud.
The NARA framework uses several example fulfillment engines, enabling it to answer questions about weather, find retail locations, and provide some pricing information. You can view a demonstration of NARA in action with text-based input. It also works with voice.
How the architecture works
The framework starts with a pretrained model from NGC. The chosen model can be fine-tuned using archived and generated text data, such as customer queries and dialog transcripts. This data allows the system to recognize intents for specific use cases and to connect to the appropriate fulfillment resources, which provide the information needed to answer domain-specific questions. As a result, developers can deliver an improved conversational AI experience with fast time to market.
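As a rough illustration of the data preparation step, the sketch below turns labeled customer queries into training and validation sets. The example queries, the intent names, and the tab-separated output layout are all assumptions made for illustration. They are not the format that NeMo or any NGC model requires.

```python
import csv
import io

# Hypothetical labeled examples; in practice these would come from archived
# customer queries and dialog transcripts.
examples = [
    ("what is the weather in Boston today", "weather"),
    ("will it rain tomorrow", "weather"),
    ("where is the nearest store", "store_location"),
    ("find a shop near me", "store_location"),
    ("how much does the blue jacket cost", "pricing"),
    ("price of running shoes", "pricing"),
]


def split_by_intent(rows, every_nth=2):
    """Hold out every nth example of each intent for validation."""
    train, valid, seen = [], [], {}
    for text, intent in rows:
        seen[intent] = seen.get(intent, 0) + 1
        target = valid if seen[intent] % every_nth == 0 else train
        target.append((text, intent))
    return train, valid


def to_tsv(rows):
    """Serialize (text, intent) pairs as tab-separated lines."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


train_rows, valid_rows = split_by_intent(examples)
print(to_tsv(train_rows))
```

Splitting per intent, rather than over the whole list, keeps every intent represented in both sets even when some intents have few examples.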
When the framework receives spoken input, Jarvis uses automatic speech recognition (ASR) to transcribe it into text. The text is routed to the Dialog Manager, which keeps track of the state of the conversation. The Jarvis natural language processing service determines the speaker’s intent, enabling the Dialog Manager to request specific actions from the Fulfillment Engine.
The Fulfillment Engine uses third-party APIs, SQL databases, or other means to perform the requested action and return results to the Dialog Manager. If an audio response is needed, the resulting text response is routed to the Jarvis text-to-speech (TTS) module.
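The flow from text input to fulfillment can be sketched as follows. This is not the NARA or Jarvis implementation. The class and function names are hypothetical, the keyword matcher is a stand-in for the natural language processing service, and the fulfillment engines return canned strings instead of calling real APIs or databases. Speech recognition and speech synthesis are left out, so the input is assumed to be text already.

```python
from dataclasses import dataclass, field


def classify_intent(text):
    """Keyword stand-in for the NLP service; not a real model."""
    lowered = text.lower()
    if "weather" in lowered:
        return "weather"
    if "store" in lowered or "location" in lowered:
        return "store_location"
    if "price" in lowered or "cost" in lowered:
        return "pricing"
    return "unknown"


# Hypothetical fulfillment engines with canned replies; real ones would call
# third-party APIs or SQL databases.
def weather(text):
    return "It is sunny today."


def store_location(text):
    return "The nearest store is on Main Street."


def pricing(text):
    return "That item costs $20."


def fallback(text):
    return "Sorry, I did not understand that."


@dataclass
class DialogManager:
    """Remembers conversation state and routes intents to fulfillment."""
    fulfillment: dict
    history: list = field(default_factory=list)

    def handle(self, text):
        intent = classify_intent(text)
        engine = self.fulfillment.get(intent, fallback)
        reply = engine(text)
        # Keep each turn so later turns (or later training) can use it.
        self.history.append({"user": text, "intent": intent, "reply": reply})
        return reply


manager = DialogManager(
    {"weather": weather, "store_location": store_location, "pricing": pricing}
)
for utterance in ["What's the weather like?", "Where is the nearest store?"]:
    print(manager.handle(utterance))
```

Keeping the intent-to-engine mapping in a dictionary means a new fulfillment engine can be registered without changing the Dialog Manager itself.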
The history of each conversation can be used for ongoing NeMo training, so the service continues to improve as users interact with the system.
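One simple way to make conversation history reusable is to log each turn in a structured form and read it back as labeled examples. The sketch below assumes a JSON Lines log with hypothetical field names (`user`, `intent`, `reply`). It shows only the bookkeeping, not the NeMo training step itself.

```python
import io
import json


def append_turn(stream, user_text, intent, reply):
    """Write one conversation turn as a JSON line."""
    record = {"user": user_text, "intent": intent, "reply": reply}
    stream.write(json.dumps(record) + "\n")


def load_training_pairs(stream):
    """Read back (text, intent) pairs, skipping turns with unknown intent."""
    pairs = []
    for line in stream:
        record = json.loads(line)
        if record["intent"] != "unknown":
            pairs.append((record["user"], record["intent"]))
    return pairs


log = io.StringIO()  # stands in for a log file on shared storage
append_turn(log, "will it rain tomorrow", "weather", "No rain is expected.")
append_turn(log, "blargh", "unknown", "Sorry, I did not understand that.")
log.seek(0)
pairs = load_training_pairs(log)
print(pairs)
```

Turns with an unrecognized intent are set aside here. In practice they may be the most useful ones to review and label by hand, because they show where the current model falls short.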
NetApp and NVIDIA are continuing to enhance this conversational AI framework. In particular, we are working on integrating NVIDIA Merlin, a deep recommender application framework, to enable development of more nuanced and intelligent recommendation systems.
We are also working on a solution for edge inferencing that combines the capabilities of NetApp HCI and the NVIDIA Triton Inference Server. This solution also incorporates NetApp Cloud Sync and Trident capabilities to simplify edge data management tasks.
Find out more
Although we chose a retail use case to demonstrate the capabilities of this conversational AI framework, the approach has obvious uses far beyond the retail domain, in industries such as financial services, insurance, and healthcare. This framework enables the creation of conversational services that eliminate the frustrations of the voice-activated menu trees of past systems.
More information and resources
To learn more about the full range of NetApp AI solutions, visit netapp.com/ai.
Also check out these resources:
- NetApp ONTAP AI solution brief
- NetApp AI Control Plane solution brief
- NetApp AI Control Plane technical report
- NetApp Data Science Toolkit solution brief
- Accelerating the AI training workflow with the NetApp Data Science Toolkit