Research and Development at Jaunt: Volumetric Content

Matthew Hu, Sr. Research Engineer (MSc, Stanford University), works on problems related to computer vision and imaging.

Capturing accurate volumetric content can be difficult and expensive because of the massive amount of data involved. However, with the availability of lower-cost depth and time-of-flight sensors and the ever-increasing compute power of GPUs, we at Jaunt can now capture and process volumetric data in real time. To do so, we link together multiple NVIDIA GeForce GTX 1080 Ti GPUs, each of which has over 3,500 cores tailored for floating-point calculation, coupled with 11 GB of memory. To maintain the fidelity of real surfaces, hundreds of thousands of triangles might be generated for a single mesh per frame. Along with the mesh, texture data is generated from several nearby RGB cameras at 60 frames per second or faster.

One challenge with volumetric capture is that depth estimation is often noisy and temporally unstable. Many filtering techniques can be used to mitigate these artifacts, but they often lose detail or cause smearing under large motions. To address this, many recent papers capture the 3D motion of the mesh and use it to refine the model over time. This approach yields a much better 3D reconstruction without sacrificing fine detail.
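To make the filtering trade-off concrete, here is a minimal sketch in Python. It is an illustration only, not the filter used in our pipeline: the function name temporal_ema, the smoothing factor alpha, the noise level, and the depth values in meters are all assumptions chosen for the example. It applies a simple exponential moving average to the depth of one pixel over time. The noise drops while the surface is static, but the filtered value lags badly once the surface moves.

```python
import numpy as np

def temporal_ema(depth_frames, alpha=0.2):
    """Exponentially smooth a per-pixel depth sequence over time.

    alpha is a hypothetical smoothing factor: smaller values smooth more
    but react more slowly to real changes in depth.
    """
    filtered = np.empty_like(depth_frames, dtype=float)
    filtered[0] = depth_frames[0]
    for t in range(1, len(depth_frames)):
        filtered[t] = alpha * depth_frames[t] + (1.0 - alpha) * filtered[t - 1]
    return filtered

# Synthetic data (assumed values): a surface 2 m away that jumps to 1 m at frame 50.
rng = np.random.default_rng(0)
true_depth = np.concatenate([np.full(50, 2.0), np.full(50, 1.0)])
noisy = true_depth + rng.normal(0.0, 0.02, size=true_depth.shape)
smoothed = temporal_ema(noisy, alpha=0.2)

# Noise on the static part of the sequence, before and after filtering.
static_noise_raw = np.std(noisy[20:50] - true_depth[20:50])
static_noise_filtered = np.std(smoothed[20:50] - true_depth[20:50])

# Error shortly after the jump: the filter is still catching up ("smearing").
lag_error = abs(smoothed[52] - true_depth[52])

print(f"static noise raw:      {static_noise_raw:.4f} m")
print(f"static noise filtered: {static_noise_filtered:.4f} m")
print(f"error 2 frames after the jump: {lag_error:.3f} m")
```

Motion-compensated approaches aim to keep the noise reduction of the static case while avoiding the lag, by first moving the previous model to where the surface is now.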

This approach, however, is not without challenges. The 3D motion estimation is often set up as an enormous non-linear least squares problem, and a variety of papers document different ways of formulating the cost function. Solving this optimization problem for each frame at 60 fps is difficult even with the GPU resources outlined above. As a result, a range of techniques is used to obtain an approximate solution much faster.
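For readers unfamiliar with non-linear least squares, the sketch below shows the general shape of such a solver on a deliberately tiny problem. It is a toy under stated assumptions, not one of the formulations from the literature and not our production solver: it estimates a single 2D rigid motion (one rotation angle and a translation) with a few Gauss-Newton iterations, whereas real non-rigid tracking solves for thousands of deformation parameters per frame. The function names residuals, jacobian, and gauss_newton are hypothetical.

```python
import numpy as np

def residuals(params, src, dst):
    """Stacked residuals R(theta) @ p + t - q for every point pair."""
    theta, tx, ty = params
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return (src @ R.T + np.array([tx, ty]) - dst).ravel()

def jacobian(params, src):
    """Analytic Jacobian of the residuals w.r.t. (theta, tx, ty)."""
    theta = params[0]
    c, s = np.cos(theta), np.sin(theta)
    dR = np.array([[-s, -c], [c, -s]])  # derivative of R with respect to theta
    J = np.zeros((src.shape[0] * 2, 3))
    J[:, 0] = (src @ dR.T).ravel()
    J[0::2, 1] = 1.0  # x residuals depend on tx
    J[1::2, 2] = 1.0  # y residuals depend on ty
    return J

def gauss_newton(src, dst, iters=10):
    """Minimize the sum of squared residuals, starting from zero motion."""
    params = np.zeros(3)
    for _ in range(iters):
        r = residuals(params, src, dst)
        J = jacobian(params, src)
        # Solve the linearized problem min ||J @ step + r||^2 for the update.
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        params = params + step
    return params

# Synthetic example (assumed values): recover a known motion from 20 points.
rng = np.random.default_rng(0)
src = rng.uniform(-1.0, 1.0, size=(20, 2))
true_params = np.array([0.3, 0.5, -0.2])  # theta in radians, then tx, ty
c, s = np.cos(true_params[0]), np.sin(true_params[0])
dst = src @ np.array([[c, -s], [s, c]]).T + true_params[1:]

estimated = gauss_newton(src, dst)
print("estimated (theta, tx, ty):", estimated)
```

Each iteration linearizes the cost around the current estimate and solves a linear least squares problem. At the scale of a full mesh, that inner linear solve dominates the cost, which is one reason approximate solvers and a fixed, small iteration budget are attractive at 60 fps.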

Another challenge is that motion tracking is prone to failure, so the pipeline must gracefully resolve erroneous estimates. Despite these challenges, the literature suggests a significant improvement in quality. Such a pipeline gives us the opportunity to leverage temporal information to perform a more accurate and stable 3D reconstruction.
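One simple way to picture "gracefully resolving" a bad estimate is a consistency check followed by a fallback. The sketch below is a hypothetical illustration, not our actual pipeline: the function fuse_with_fallback, the RMS threshold, the blend weight, and the depth values in meters are all assumptions. It compares the motion-compensated model against the new observation and, if they disagree too much, discards the history and restarts from the current frame.

```python
import numpy as np

def fuse_with_fallback(warped_model, observation, rms_threshold=0.05, blend=0.8):
    """Blend the motion-compensated model with the new observation,
    or reset to the observation if the two disagree too much.

    Returns (fused_depth, tracking_ok). Threshold and blend are assumed values.
    """
    rms = np.sqrt(np.mean((warped_model - observation) ** 2))
    if rms > rms_threshold:
        # Tracking is assumed to have failed: drop the history for this frame.
        return observation.copy(), False
    fused = blend * warped_model + (1.0 - blend) * observation
    return fused, True

# Synthetic 4x4 depth patch (assumed values) about 2 m from the sensor.
rng = np.random.default_rng(1)
observation = 2.0 + rng.normal(0.0, 0.01, size=(4, 4))
good_model = np.full((4, 4), 2.0)  # motion estimate lines up with the data
bad_model = np.full((4, 4), 2.5)   # motion estimate is off by 0.5 m

fused_good, ok_good = fuse_with_fallback(good_model, observation)
fused_bad, ok_bad = fuse_with_fallback(bad_model, observation)
print("good estimate accepted:", ok_good)
print("bad estimate accepted: ", ok_bad)
```

A whole-patch reset is the bluntest possible fallback. A finer-grained variant could apply the same test per region, so that a failure in one part of the mesh does not throw away well-tracked history elsewhere.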

This is the fifth post in a series from the Jaunt R&D team about the problems we are solving and how we’re working to help build the future of media. Stay tuned for our next update on the Jaunt Blog.

Interested in joining us? Explore job opportunities with our R&D team here: