In Part 1 of our interview with Monty Barlow, director of machine learning at Cambridge Consultants, we discussed the idea behind Vincent™, an AI system that can turn human-made sketches into artworks reminiscent of Van Gogh, Cézanne, and Picasso. Part 2 describes the technology that was used to train Vincent to paint. The system was created by Cambridge Consultants in their Digital Greenhouse—a research lab dedicated to discovering, developing, and testing breakthroughs in artificial intelligence (AI).
What technologies did you use to train the Vincent system?
We used deep learning to train Vincent to analyze artworks. It took lots of image data, lots of training sets, and lots of trial and error. The real learning comes from seven neural networks that challenge each other during training. It took Vincent about 14 hours of training, 8 GPUs, and millions of scratch files to learn to paint.
The learning system itself is built on NVIDIA DGX-1 servers and NetApp® storage. That might seem like a lot of horsepower for a lightweight app, but during the learning process, Vincent generates millions of iterations and a huge amount of data, as it tunes over 200 million parameters within its neural networks.
What are some of the challenges in building an AI system like Vincent?
Much of our AI work incorporates deep learning, and we’ve found that there are three main areas that need to be addressed. You have the algorithms themselves, the compute piece, and the collection, storage, and management of data.
There’s always a challenge to be solved for at least one corner of the triangle, but many vendors focus only on a single area and push the problem elsewhere. They may say, “Here’s a great algorithm, but you need to go and collect a million more data points.” Or, “Here’s a dataset you can buy,” but they can’t help you do anything with it.
One of the things we research quite heavily is what to do with dirty or imperfect datasets. Instead of asking our clients for perfect data before they can get started, we show them what they can do with the data they’ve already collected. We also help them understand the cost/benefit case for collecting more data.
How do you compensate for imperfect data?
In practice, people never have enough data. It has rarely been looked after quite well enough, and there are always issues. If it comes from deployed systems, we typically find duplicates, gaps, and other such problems. So we may need to use additional compute power to patch holes, synthesize missing data, and work our way through difficult datasets. Often we can incorporate information from other datasets, much as a human brings a lifetime of experience to bear on a new challenge.
This part of the process is called generative AI: neural networks challenge each other during training. This is the approach we took when training the Vincent system. In many cases, it is quicker and more cost effective than collecting the perfect dataset.
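The article doesn't disclose Vincent's actual seven-network architecture, but the "networks challenging each other" idea can be illustrated with a toy adversarial setup. The sketch below, in NumPy, pits a one-parameter-pair generator against a logistic discriminator on a one-dimensional problem; all names, hyperparameters, and the target distribution are assumptions chosen for illustration, not anything from the Vincent system.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# "Real" data: samples from a Gaussian the generator has never seen directly.
def real_batch(n):
    return rng.normal(4.0, 0.5, size=n)

# Generator: affine map of noise, G(z) = a*z + b (starts far from the data).
a, b = 1.0, 0.0
# Discriminator: logistic regression, D(x) = sigmoid(w*x + c).
w, c = 0.1, 0.0

lr_d, lr_g = 0.1, 0.02   # discriminator learns faster than the generator
b_hist = []
for step in range(3000):
    n = 64
    xr = real_batch(n)
    z = rng.normal(0.0, 1.0, size=n)
    xf = a * z + b

    # Discriminator step: ascend log D(real) + log(1 - D(fake)).
    dr, df = sigmoid(w * xr + c), sigmoid(w * xf + c)
    w += lr_d * (np.mean((1 - dr) * xr) - np.mean(df * xf))
    c += lr_d * (np.mean(1 - dr) - np.mean(df))

    # Generator step: ascend log D(fake) (non-saturating loss),
    # i.e. move its output toward regions the discriminator calls "real".
    df = sigmoid(w * xf + c)
    gx = (1 - df) * w
    a += lr_g * np.mean(gx * z)
    b += lr_g * np.mean(gx)
    b_hist.append(b)

samples = a * rng.normal(0.0, 1.0, size=1000) + b
print(f"generator mean after training: {np.mean(samples):.2f}")
```

Neither network ever sees a "correct answer"; the generator improves only because the discriminator keeps calling out its fakes, which is the same dynamic, at toy scale, as the adversarial training described above.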
What are some of the data management challenges?
As we work our way through segmenting the data, training on some parts and testing against others, we usually end up needing access to all of the data at once. Today, that can mean tens of terabytes, which is more than you can easily fit into RAM or a local cache. In addition, there are some things unique to the deep learning process that can create data management challenges.
For example, a generative AI approach can require that we randomly read every file hundreds of times as we work through a problem, instead of just once, as might be the case with a more basic training approach. And not only are we using big datasets that need to be read repeatedly; we often also have multiple sub-teams trying out different approaches to the problem who may be accessing the same data at the same time.
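The access pattern described here, segmenting the data into training and test portions, then making repeated randomly ordered passes over the training portion, can be sketched as follows. The file names, split ratio, and epoch count are hypothetical placeholders, not Vincent's actual figures.

```python
import random

# Hypothetical list of paths standing in for millions of small image files.
files = [f"img_{i:05d}.png" for i in range(1000)]

# Segment the data: hold out a portion for testing, train on the rest.
random.seed(42)
shuffled = files[:]
random.shuffle(shuffled)
split = int(0.8 * len(shuffled))
train_files, test_files = shuffled[:split], shuffled[split:]

# A generative approach may revisit every training file hundreds of times:
# each "epoch" is another full pass over the same files in a fresh random order.
def epochs(paths, n_epochs):
    for _ in range(n_epochs):
        order = paths[:]
        random.shuffle(order)      # new random read order every pass
        for path in order:
            yield path             # each yield is another small random read

reads = sum(1 for _ in epochs(train_files, 100))
print(reads)  # 100 epochs x 800 files = 80,000 random reads of small files
```

Every one of those reads lands on the storage system, which is why the same dataset read "hundreds of times" multiplies into an enormous random-read workload.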
On top of that, these are usually very small files, and we need to access them as fast as possible to keep the NVIDIA GPUs that run our AI algorithms fed. The combination is a worst-case scenario for a storage system.
What type of storage system is needed for deep learning, and why did you select NetApp?
We need low-latency access to every file, although latency can be a little less critical when we can use a read-ahead approach for our data. More importantly, our data storage systems must deliver high throughput while randomly reading millions of small files, what you might call a metadata-heavy workload.
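The "read-ahead" idea mentioned above, loading the next files in the background so the compute side never waits on storage latency, can be sketched with a simple prefetching reader. This is a minimal illustration under assumed names (`load`, `prefetching_reader`); it is not how Cambridge Consultants' actual pipeline is built.

```python
import queue
import threading

def load(path):
    # Stand-in for reading one small file from storage.
    return f"data:{path}"

def prefetching_reader(paths, depth=8):
    """Read ahead: a background thread keeps up to `depth` files buffered,
    so the consumer (e.g. a GPU feed) overlaps compute with storage I/O."""
    buf = queue.Queue(maxsize=depth)
    sentinel = object()

    def producer():
        for p in paths:
            buf.put(load(p))       # blocks when the buffer is full
        buf.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is sentinel:
            break
        yield item

paths = [f"file_{i}.dat" for i in range(5)]
items = list(prefetching_reader(paths))
print(items)
```

Read-ahead like this hides per-file latency but does nothing for the random, metadata-heavy reads described next, which is where raw storage throughput matters.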
The reason our deep learning storage is based on NetApp technology is that it has been tried and tested in our own demanding environment. We needed a combination of high performance and flexibility because we’ve got a lot of different projects. We need our files to be available to different machines so that we can run a variety of compute jobs without having to move things around.
NetApp and our local reseller partner Scan also provide us with excellent support whenever we need help. There’s nothing worse for us than vendors that say, “Sorry, your use case is an outlier.” We like working with people who accept new challenges and approach them as opportunities to solve problems that can benefit other customers in similar situations.
Aside from generating art, what other possibilities do you see for Vincent’s technology?
Potential applications for Vincent-like technology reach far beyond art, with autonomous vehicles and digital security being early front-runners. The same technology can be used to generate training scenarios and simulations, introducing almost limitless variation and convincing detail beyond what humans could efficiently produce.
For more information, check out the following resources: