When you’re playing a game like Uncharted 4, you can’t help but stop and admire the detail in a character’s animations, especially the faces. We’ve gotten really damn good at performance capture this generation, and it breathes life into all kinds of games. However, sometimes you just don’t have the resources to spend on capturing the motions for every single line of dialogue and interaction in a video game, especially in RPGs. At that point, developers often rely on what are known as “canned animations”. These animations are generic and intended to be used repeatedly; they can include anything from poses, arm gestures, shuffling, walking around, and jumping to head movements. Animators go in, look at a line of dialogue, and set up tags so that their conversation system knows which animations to play and when. This is simple enough for general body movements, but what about facial expressions?
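To make the tagging idea concrete, here’s a minimal sketch of how a conversation system might resolve animator-placed tags in a dialogue line into canned clip cues. The tag syntax, clip names, and function names are all invented for illustration; real engines use their own markup and tooling.

```python
# Hypothetical sketch: a conversation system resolving animation tags
# embedded in a line of dialogue into "canned" clip cues. The tag
# syntax and clip paths are invented for illustration.
import re

# Generic, reusable clips an animator might map tags onto.
CANNED_CLIPS = {
    "shrug": "anim/body/shrug_01",
    "point": "anim/body/point_forward",
    "nod": "anim/head/nod_small",
    "idle": "anim/body/idle_loop",
}

TAG_PATTERN = re.compile(r"\[anim:(\w+)\]")

def resolve_line(tagged_line: str):
    """Split a tagged dialogue line into clean text plus a cue list.

    Each cue records the character offset (in the clean text) at which
    a canned clip should start playing.
    """
    cues = []
    clean_parts = []
    offset = 0
    last_end = 0
    for match in TAG_PATTERN.finditer(tagged_line):
        clean_parts.append(tagged_line[last_end:match.start()])
        offset += match.start() - last_end
        clip = CANNED_CLIPS.get(match.group(1), CANNED_CLIPS["idle"])
        cues.append((offset, clip))
        last_end = match.end()
    clean_parts.append(tagged_line[last_end:])
    return "".join(clean_parts), cues

text, cues = resolve_line("I don't know. [anim:shrug]Maybe ask [anim:point]him.")
```

At playback time, the dialogue system would fire each cue as the line reaches that offset, which is why this approach scales to thousands of lines without any bespoke animation work.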
General body movements can be used in a wide variety of situations regardless of the line of dialogue; as long as the character isn’t standing there stiff as a board (unless that’s the intent), it works just fine. The expressions on a person’s face, however, are not so simple. The mouth moves differently with every syllable and every pronunciation, and all of that has to work hand in hand with eye and eyebrow movements. This is far more difficult and time-consuming to implement because there’s just so much detail to pay attention to. And unfortunately, in today’s world of photorealism, players expect perfection in this area in order to immerse themselves in these worlds and believe that these are real people they’re having conversations with. Dragon Age: Inquisition and The Witcher 3 succeeded to different degrees, but Mass Effect: Andromeda was far less fortunate, getting torn to shreds upon release due to characters with very stiff facial expressions outside of important cutscenes. Even general movements appeared to be missing, and were later patched in alongside better facial animation (albeit not for the entire game, partly because EA pulled the plug on post-launch support). This is an area that will need to be brought under control next generation as the bar gets even higher, but it’s still a tall order to ask any developer to spend exorbitant amounts of time and resources combing through every line of dialogue and hand-animating facial expressions and body movements from scratch. After all, they likely didn’t mocap any of that; it’s too expensive to waste on something generic. But what if there was an alternative?
Over four months ago, as part of its Omniverse initiative, NVIDIA introduced an AI-driven program called “Audio2Face”. As the name implies, it uses deep learning to procedurally generate facial animations from nothing more than audio. And it runs on a single NVIDIA RTX GPU, be it a Turing 20 Series or an Ampere 30 Series card; as long as it has Tensor cores, it will function. NVIDIA has thus far shown three examples of the technology at work. These examples can be found in the videos below:
While not perfect by any means, the technology will improve and be updated as time goes on. The point is that this is a tool available to everyone, and results will vary: some animations will come out looking more lifelike, while others may look stiffer around the eyes. At the very least, the mouth movements and accompanying jaw muscle motions are decent. With some more work, this could be a game changer for creating facial animations in RPGs for conversations that normally don’t get nearly as much attention as the big jaw-dropping moments.
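Audio2Face itself uses a deep neural network trained on capture data, but the underlying idea of driving a facial parameter from audio can be illustrated with something far simpler. The toy below maps an audio signal’s loudness envelope to a per-frame “jaw open” blendshape weight; every name and number here is invented for this sketch and is nothing like the real product’s internals.

```python
# Toy illustration of audio-driven facial animation: derive a per-frame
# "jaw open" blendshape weight from an audio signal's RMS loudness.
# (Audio2Face uses a neural network; this is only the general concept,
# with all names and constants invented for the demo.)
import math

SAMPLE_RATE = 16000
FPS = 32  # animation frames per second; 32 keeps the arithmetic exact here
SAMPLES_PER_FRAME = SAMPLE_RATE // FPS  # 500 samples per frame

def jaw_open_curve(samples):
    """Map audio samples in [-1, 1] to per-frame jaw weights in [0, 1]."""
    weights = []
    for start in range(0, len(samples), SAMPLES_PER_FRAME):
        frame = samples[start:start + SAMPLES_PER_FRAME]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        # RMS of a full-scale sine is ~0.707, so rescale and clamp.
        weights.append(min(1.0, rms / 0.707))
    return weights

# One second of a 220 Hz tone that fades in, standing in for speech.
samples = [
    (i / SAMPLE_RATE) * math.sin(2 * math.pi * 220 * i / SAMPLE_RATE)
    for i in range(SAMPLE_RATE)
]
curve = jaw_open_curve(samples)  # the jaw opens as the "voice" gets louder
```

A real system maps dozens of facial parameters from learned speech features rather than raw loudness, which is why the neural approach captures lip shapes a simple envelope never could.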
There was also another piece of tech on display in the second and third videos: motion capture using nothing more than a standard webcam, also powered by deep learning and the Tensor cores on an NVIDIA RTX GPU. While traditional performance capture is expensive, this would be an effective way to create canned animations cheaply while retaining some semblance of a real performance for a bit more nuance. You could even use it to create baseline animations that you then go in and refine by hand, cutting out the process of animating that baseline from scratch. It’s a time saver for sure.
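One practical wrinkle with webcam capture is that it’s noisier than a studio setup, so a cleanup pass usually happens before the result is saved as a canned clip. Below is a minimal exponential-smoothing filter over a captured rotation channel; the sample data and parameter names are invented for illustration, and production pipelines use more sophisticated filters.

```python
# Minimal cleanup pass for jittery webcam tracking data: exponential
# smoothing over one animation channel. The head-yaw samples below are
# made up to show a tracking spike being damped.
def smooth_channel(values, alpha=0.3):
    """Exponentially smooth a list of per-frame values.

    Lower alpha = smoother output but more lag behind the performer,
    the classic trade-off when filtering noisy tracking data.
    """
    if not values:
        return []
    smoothed = [values[0]]
    for v in values[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed

# Jittery head-yaw samples (degrees) as a webcam tracker might report,
# including one obvious tracking glitch at 25.0.
raw = [0.0, 5.2, 4.1, 6.3, 5.0, 5.8, 25.0, 5.5, 6.1]
clean = smooth_channel(raw)  # the 25.0 spike is heavily damped
```

After a pass like this, an animator can hand-polish the curve instead of keyframing it from nothing, which is exactly the time saving described above.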
Now, to clarify something: I am not suggesting that developers hook up to NVIDIA Omniverse and utilize NVIDIA’s features. That is an online environment contained in the Omniverse app and only that app; to my knowledge, you cannot export the Audio2Face results or performance capture from Omniverse and import them into your engine of choice. I am merely pointing out examples of what can be done with deep learning and how that can improve video games going forward. Developers can of course create their own versions of Audio2Face and webcam performance capture, but doing so still requires something like an RTX GPU with Tensor cores, at least during the development phase. Gamers would not be required to own such a GPU: the final animations would run just fine on an AMD GPU, since they were merely produced using AI. The AI is not running the animation in the game itself.
There is certainly a future for these technologies in the video game industry and I hope that developers see fit to give them a try to alleviate some of the pains of development and deliver even higher quality experiences for us players. It’s definitely food for thought.