Tensor Framework / TensorFaces

"Natural images are the composite consequence of multiple constituent factors related to scene structure, illumination conditions, and imaging conditions. Multilinear algebra, the algebra of higher-order tensors, offers a potent mathematical framework for analyzing the multifactor structure of image ensembles and for addressing the difficult problem of disentangling the constituent factors or modes."

(Vasilescu and Terzopoulos, 2002)

"Scene structure is composed of a set of objects that appear to be formed from a recursive hierarchy of perceptual wholes and parts whose properties, such as shape, reflectance, and color, constitute a hierarchy of intrinsic causal factors of object appearance. Object appearance is the compositional consequence of both an object’s intrinsic causal factors, and extrinsic causal factors with the latter related to illumination (i.e. the location and types of light sources), imaging (i.e. viewpoint, viewing direction, lens type and other camera characteristics). Intrinsic and extrinsic causal factors confound each other’s contributions, hindering recognition."
(Vasilescu and Kim, 2019)

While we can directly observe and measure the gray (or color) values in an image/video, we are often more interested in the information associated with the causal factors that determine the pixel values in an image, such as the person's identity, the viewing direction, or expression, which may only be inferred, but not directly measured. The tensor framework is suitable for disentangling the multifactor causal structure of data formation given the correct problem setup.

The tensor framework was first employed in computer vision, computer graphics and machine learning to recognize people from the way they move (Human Motion Signatures in 2001) and from their facial images (TensorFaces in 2002).  However, this approach may be used to synthesize or recognize any object and object attribute. The development and utility of the tensor framework have been illustrated primarily in the context of face recognition since the problem statement and facial images lend themselves to an intuitive understanding of the underlying mathematics. Other examples are TensorTextures (see video below of image-based rendering that demonstrates progressive reduction of illumination effects through strategic dimensionality reduction), and 3D sound. 

There are two classes of data tensor modeling techniques that stem from:

  1.  the linear rank-K tensor decompositions (CANDECOMP/PARAFAC decomposition) and

  2.  the multilinear rank-(R1,R2,...,RM) tensor decompositions (Tucker decomposition).
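
For orientation, the two model classes can be written as follows for an M-way data tensor. The symbols are assumptions introduced here for illustration (they follow common usage rather than anything defined on this page): \circ is the vector outer product, \times_m is the mode-m product, \mathcal{Z} is the core tensor, and \mathbf{U}_m is the mode-m matrix.

```latex
% Linear rank-K (CANDECOMP/PARAFAC): a sum of K rank-1 terms
\mathcal{D} \approx \sum_{k=1}^{K} \lambda_k \,
  \mathbf{u}^{(1)}_k \circ \mathbf{u}^{(2)}_k \circ \cdots \circ \mathbf{u}^{(M)}_k

% Multilinear rank-(R_1, R_2, \ldots, R_M) (Tucker): a core tensor times one matrix per mode
\mathcal{D} \approx \mathcal{Z} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2 \cdots \times_M \mathbf{U}_M,
\qquad \mathbf{U}_m \in \mathbb{R}^{I_m \times R_m}
```

The first form uses the same number of components K in every mode, while the second allows a different reduced dimensionality R_m for each mode.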
     

Amnon Shashua's team has recently provided theoretical evidence showing that deep learning is a neural network approximation of multilinear tensor factorization, while a shallow network corresponds to CP tensor factorization (aka, linear tensor factorization).  However, problem setup and implementation differences between CNNs and our tensor algebraic approach impact interpretability, data needs, memory/storage and computational complexity. 


TensorFaces is a multilinear tensor method that employs the Tucker tensor decomposition to explicitly model and decompose a facial image in terms of the causal factors of data formation, where each causal factor is represented according to its second-order statistics. We refer to this approach more generally as Multilinear PCA in order to better differentiate it from our Multilinear ICA approach.
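
As a rough illustration of the Tucker-style decomposition described above, the following minimal sketch computes an M-mode SVD with NumPy. The three-mode layout (people x illuminations x pixels), the random data, and the helper names `unfold`, `mode_product`, and `m_mode_svd` are assumptions made for this example; this is not the authors' implementation.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-m matricization: the mode-m fibers become the columns."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Mode-m product: multiply every mode-m fiber of `tensor` by `matrix`."""
    moved = np.moveaxis(tensor, mode, -1)            # bring mode-m axis last
    return np.moveaxis(moved @ matrix.T, -1, mode)   # and put it back

def m_mode_svd(data):
    """Return the core tensor and one orthonormal mode matrix per mode."""
    mode_matrices = []
    for mode in range(data.ndim):
        u, _, _ = np.linalg.svd(unfold(data, mode), full_matrices=False)
        mode_matrices.append(u)
    core = data
    for mode, u in enumerate(mode_matrices):
        core = mode_product(core, u.T, mode)
    return core, mode_matrices

# Synthetic stand-in for a face ensemble (hypothetical sizes).
rng = np.random.default_rng(0)
faces = rng.standard_normal((6, 4, 50))   # people x illuminations x pixels
core, (u_people, u_illums, u_pixels) = m_mode_svd(faces)

# Multiplying the core by every mode matrix recovers the data.
reconstruction = core
for mode, u in enumerate((u_people, u_illums, u_pixels)):
    reconstruction = mode_product(reconstruction, u, mode)
```

Each mode matrix comes from the SVD of one unfolding of the data tensor, so each causal factor gets its own orthonormal basis. Without truncation the reconstruction matches the data up to floating-point error.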

Multilinear (tensor) ICA is a more sophisticated model of cause-and-effect based on higher-order statistics associated with each causal factor. Similarly, one can employ kernel variants (pg. 43) to model cause-and-effect. By comparison, matrix decompositions, such as PCA or ICA, capture the overall statistical information (variance, kurtosis) without any causal differentiation.

Subspace multilinear learning demonstrably disentangles the causal factors of data formation through strategic dimensionality reduction.  For example, in the case of facial images (or bidirectional texture functions), we suppress illumination effects such as shadows and highlights without blurring the edges associated with the person's identity that are important for recognition (or edges associated with structural information that are important for texture synthesis; see the TensorTextures video below).
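
The idea of reducing one causal factor while leaving the others untouched can be sketched as follows. This is a plain truncation of the illumination-mode basis on synthetic data, not necessarily the optimized dimensionality reduction used in the cited papers. The tensor layout, the sizes, and the function name `reduce_illumination` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_illums, n_pixels = 5, 8, 40   # hypothetical sizes
data = rng.standard_normal((n_people, n_illums, n_pixels))

# Illumination-mode basis from the SVD of the illumination-mode unfolding.
illum_unfolding = np.moveaxis(data, 1, 0).reshape(n_illums, -1)
u_illum, singular_values, _ = np.linalg.svd(illum_unfolding, full_matrices=False)

def reduce_illumination(data, u_illum, rank):
    """Project only the illumination mode onto its `rank` dominant directions."""
    u_r = u_illum[:, :rank]
    projector = u_r @ u_r.T                        # n_illums x n_illums
    return np.einsum('ji,pix->pjx', projector, data)

# Keep 3 of the 8 illumination directions; people and pixel modes are untouched.
reduced = reduce_illumination(data, u_illum, rank=3)
discarded_energy = np.sum((data - reduced) ** 2)
```

The energy removed equals the sum of the squared discarded singular values of the illumination-mode unfolding. The reduction acts on the illumination mode only, so it is not a uniform blur over the whole image.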

Next important question: While TensorFaces is a handy moniker for an approach that learns and represents the interaction of various causal factors from a set of training images, with Multilinear (Tensor) ICA and kernel variants as more sophisticated approaches, none of the interaction models prescribe a solution for how one might determine the multiple causal factors of a single unlabeled test image.

Multilinear Projection (FG 2011, ICCV 2007, briefly summarized in the 2005 MICA paper) addresses the question of how one might determine from one or more unlabeled test images all the unknown causal factors of data formation. That is, how does one solve for multiple unknowns from a single image equation?  In the course of addressing this question, several concepts from linear (matrix) algebra were generalized in order to develop the multilinear projection algorithm, such as the mode-m identity tensor (which is also an algebraic operator that reshapes a matrix into a tensor and back again to a matrix), the mode-m pseudo-inverse tensor, and the mode-m product. (Note: The mode-m pseudo-inverse tensor is not a tensor pseudo-inverse.)  Multilinear projection simultaneously projects one or more unlabeled test images into multiple constituent mode spaces, associated with image formation, in order to infer the mode labels.
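
To make "multiple unknowns from a single image equation" concrete, here is a simplified two-factor sketch on synthetic data. It stands in for the operators described above using an ordinary matrix pseudo-inverse of a flattened basis and a rank-1 SVD, so it should be read as an illustration of the idea and not as the published algorithm. The model, the sizes, and all function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_people, n_illums, n_pixels = 4, 3, 60   # hypothetical sizes

# Hypothetical learned model: one coefficient row per training label, plus a
# basis tensor mapping (person coefficients, illumination coefficients) to pixels.
u_people = np.linalg.qr(rng.standard_normal((n_people, n_people)))[0]
u_illums = np.linalg.qr(rng.standard_normal((n_illums, n_illums)))[0]
basis = rng.standard_normal((n_people, n_illums, n_pixels))

def render(person_vec, illum_vec):
    """Image formation: contract the basis with one vector per causal factor."""
    return np.einsum('p,i,pix->x', person_vec, illum_vec, basis)

def multilinear_project(image):
    """Recover one (unit-length, sign-ambiguous) vector per mode from one image."""
    flat_basis = basis.reshape(n_people * n_illums, n_pixels)
    response = image @ np.linalg.pinv(flat_basis)       # joint coefficients
    response = response.reshape(n_people, n_illums)     # ~ outer(person, illum)
    left, _, right_t = np.linalg.svd(response)          # rank-1 split
    return left[:, 0], right_t[0]

def nearest_label(vec, mode_matrix):
    """Index of the training row with the largest absolute cosine similarity."""
    scores = np.abs(mode_matrix @ vec) / np.linalg.norm(mode_matrix, axis=1)
    return int(np.argmax(scores))

# An unlabeled test image of person 2 under illumination 1.
test_image = render(u_people[2], u_illums[1])
person_vec, illum_vec = multilinear_project(test_image)
person_label = nearest_label(person_vec, u_people)
illum_label = nearest_label(illum_vec, u_illums)
```

The single image equation yields a joint response whose rank-1 structure separates into one coefficient vector per causal factor. With more than two factors, the response would be a higher-order tensor and the rank-1 step would need a tensor decomposition in place of the matrix SVD.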

  • "Compositional Hierarchical Tensor Factorization: Representing Hierarchical Intrinsic and Extrinsic Causal Factors", M.A.O. Vasilescu, E. Kim, In The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’19): Tensor Methods for Emerging Data Science Challenges, August 04-08, 2019, Anchorage, AK. ACM, New York, NY, USA.
    Paper (pdf)
     

  • "Face Tracking with Multilinear (Tensor) Active Appearance Models", Weiguang Si, Kota Yamaguchi, M. A. O. Vasilescu, June, 2013.
    http://pdfs.semanticscholar.org/6c64/59d7cadaa210e3310f3167dc181824fb1bff.pdf
    Paper (pdf)
     

  • "Multilinear Projection for Face Recognition via Canonical Decomposition", M.A.O. Vasilescu, In Proc. Face and Gesture Conf. (FG'11), 476-483. Paper (pdf)
     

  • "Multilinear Projection for Face Recognition via Rank-1 Analysis", M.A.O. Vasilescu, CVPR, IEEE Computer Society and IEEE Biometrics Council Workshop on Biometrics, June 18, 2010.
     

  • "Multilinear Projection for Appearance-Based Recognition in the Tensor Framework", M.A.O. Vasilescu and D. Terzopoulos, Proc. Eleventh IEEE International Conf. on Computer Vision (ICCV'07), Rio de Janeiro, Brazil, October, 2007, 1-8. 
    Paper (1,027 KB - .pdf) 
     

  • "Multilinear Independent Components Analysis and Multilinear Projection Operator for Face Recognition", M.A.O. Vasilescu, D. Terzopoulos, in Workshop on Tensor Decompositions and Applications, CIRM, Luminy, Marseille, France, August 2005.
     

  • "Multilinear (Tensor) ICA and Dimensionality Reduction", M.A.O. Vasilescu, D. Terzopoulos, Proc. 7th International Conference on Independent Component Analysis and Signal Separation (ICA07), London, UK, September, 2007. In Lecture Notes in Computer Science, 4666, Springer-Verlag, New York, 2007, 818–826. 
     

  • "Multilinear Independent Components Analysis", M. A. O. Vasilescu and D. Terzopoulos, Proc. Computer Vision and Pattern Recognition Conf. (CVPR '05), San Diego, CA, June 2005, vol.1, 547-553. 
    Paper (1,027 KB - .pdf) 
     

  • "Multilinear Independent Component Analysis", M. A. O. Vasilescu and D. Terzopoulos, Learning 2004 Snowbird, UT, April, 2004.
     

  • "Multilinear Subspace Analysis for Image Ensembles," M. A. O. Vasilescu, D. Terzopoulos, Proc. Computer Vision and Pattern Recognition Conf. (CVPR '03), Vol.2, Madison, WI, June, 2003, 93-99. 
    Paper (1,657KB - .pdf)  
     

  • "Multilinear Image Analysis for Facial Recognition," M. A. O. Vasilescu, D. Terzopoulos, Proceedings of International Conference on Pattern Recognition (ICPR 2002), Vol. 2, Quebec City, Canada, Aug, 2002, 511-514. 
    Paper (439KB - .pdf) 
     

  • "Multilinear Analysis of Image Ensembles: TensorFaces," M. A. O. Vasilescu, D. Terzopoulos, Proc. 7th European Conference on Computer Vision (ECCV'02), Copenhagen, Denmark, May, 2002, in Computer Vision -- ECCV 2002, Lecture Notes in Computer Science, Vol. 2350, A. Heyden et al. (Eds.), Springer-Verlag, Berlin, 2002, 447-460. 
    Full Article in PDF (882KB)

Human Motion Signatures, Style Transfer, and Tracking: 

 

Given motion-capture samples of Charlie Chaplin’s walk, is it possible to synthesize other motions (say, ascending or descending stairs) in his distinctive style? More generally, in analogy with handwritten signatures, do people have characteristic motion signatures that individualize their movements? If so, can these signatures be extracted from example motions? Can they be disentangled from other causal factors?

 

We have developed an algorithm that extracts motion signatures and uses them in the animation of graphical characters. The mathematical basis of our algorithm is a statistical numerical technique known as M-mode data tensor analysis. For example, given a corpus of walking, stair ascending, and stair descending motion data collected over a group of subjects, plus a sample walking motion for a new subject, our algorithm can synthesize never-before-seen ascending and descending motions in the distinctive style of this new individual.
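
The walking-to-stairs example above can be sketched on synthetic data. The corpus below is generated from an exactly multilinear model so that the result can be checked; real motion-capture data would follow such a model only approximately. The tensor layout (people x actions x motion features), the sizes, and the function names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n_people, n_actions, n_features = 5, 3, 30   # hypothetical sizes
WALK = 0                                      # hypothetical index of the walking action

# Synthetic corpus: motion[p, a, :] = sum_r signature[p, r] * true_basis[r, a, :].
true_basis = rng.standard_normal((n_people, n_actions, n_features))
signatures = rng.standard_normal((n_people, n_people))
corpus = np.einsum('pr,raf->paf', signatures, true_basis)

# Learn the people-mode matrix from the people-mode unfolding, then fold the
# remaining structure (actions and features) into a basis tensor.
u_people, _, _ = np.linalg.svd(corpus.reshape(n_people, -1), full_matrices=False)
basis = np.einsum('pr,paf->raf', u_people, corpus)

def extract_signature(walk_motion):
    """Least-squares solve of walk_motion = signature @ basis[:, WALK, :]."""
    return walk_motion @ np.linalg.pinv(basis[:, WALK, :])

def synthesize(signature):
    """Generate every action in the style encoded by `signature`."""
    return np.einsum('r,raf->af', signature, basis)

# A new subject for whom only the walking motion is observed.
new_subject = np.einsum('r,raf->af', rng.standard_normal(n_people), true_basis)
signature = extract_signature(new_subject[WALK])
synthesized = synthesize(signature)   # walking, ascending, and descending
```

The signature is estimated from the walking sample alone and then reused with the basis slices of the other actions, which is what allows motions never observed for the new subject to be generated.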

  • "Human Motion Signatures: Analysis, Synthesis, Recognition," M. A. O. Vasilescu Proceedings of International Conference on Pattern Recognition (ICPR 2002), Vol. 3, Quebec City, Canada, Aug, 2002, 456-460. 
    Paper (439KB - .pdf) 
     

  • "An Algorithm for Extracting Human Motion Signatures", M. A. O. Vasilescu, Computer Vision and Pattern Recognition CVPR 2001 Technical Sketches, Lihue, HI, December, 2001. 
     

  • "Human Motion Signatures for Character Animations", M. A. O. Vasilescu, Sketch and Applications SIGGRAPH 2001 Los Angeles, CA, August, 2001. 
    Sketch (141KB - .pdf)
     

  • "Recognizing Action Events from Multiple Viewpoints," Tanveer Syeda-Mahmood, Alex Vasilescu, Saratendu Sethi, in IEEE Workshop on Detection and Recognition of Events in Video, International Conference on Computer Vision (ICCV 2001), Vancouver, Canada, July 8, 2001, 64-72.

 

 

Listening in 3D
 

A head-related transfer function (HRTF) characterizes how an individual's anatomy and the location of the sound source affect that individual's perception of sound.  The size, shape and density of the head, the shape of the ears and ear canal, and the distance between the ears all transform sound by amplifying some frequencies and attenuating others. Learning how sound is perceived is important in:

  • pinpointing the location of sound that is vital for safe navigation in traffic, 

  • achieving a realistic acoustic environment in gaming and home cinema set-ups.
     

To measure an HRTF, one places a loudspeaker at various locations in space and a microphone at the ear.  To recreate an authentic sound experience, slightly differently synthesized sounds are sent to each ear in accordance with a person's HRTF. 

   This is not surround sound, which uses multiple speakers to provide 360-degree sound.
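
Sending a differently filtered signal to each ear can be sketched as below. The example works with head-related impulse responses (HRIRs), the time-domain counterparts of HRTFs, and uses toy responses invented for illustration. Measured responses for a given listener and source direction would take their place, and the function name `render_binaural` is an assumption.

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono signal with a separate impulse response per ear."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])   # rows: left ear, right ear

# Toy (hypothetical) responses for a source on the listener's left:
# the right ear receives the sound 3 samples later and attenuated.
hrir_left = np.array([1.0, 0.0, 0.0, 0.0])
hrir_right = np.array([0.0, 0.0, 0.0, 0.6])

click = np.zeros(8)
click[0] = 1.0
stereo = render_binaural(click, hrir_left, hrir_right)
```

The interaural delay and level difference in the two output channels are the kind of cues a listener uses to localize the source, and they are delivered over headphones, not through multiple loudspeakers.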

  • "A Multilinear (Tensor) Framework for HRTF Analysis and Synthesis", G. Grindlay, M.A.O. Vasilescu, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Honolulu, Hawaii, April, 2007
    Paper (439KB - .pdf) 

 

TensorTextures: Image-based Rendering 

 

One of the goals of computer graphics is photorealistic rendering, the synthesis of images of virtual scenes visually indistinguishable from those of natural scenes. Unlike traditional model-based rendering, whose photorealism is limited by model complexity, an emerging and highly active research area known as image-based rendering eschews complex geometric models in favor of representing scenes by ensembles of example images. These are used to render novel photoreal images of the scene from arbitrary viewpoints and illuminations, thus decoupling rendering from scene complexity. The challenge is to develop structured representations in high-dimensional image spaces that are rich enough to capture important information for synthesizing new images, including details such as self-occlusion, self-shadowing, interreflections, and subsurface scattering. 

 

TensorTextures, a new image-based texture mapping technique, is a rich generative model that, from a sparse set of example images, learns the interaction between viewpoint, illumination, and geometry that determines detailed surface appearance. Mathematically, TensorTextures is a nonlinear model of texture image ensembles that exploits tensor algebra and the M-mode SVD to learn a representation of the bidirectional texture function (BTF) in which the multiple constituent factors, or modes (viewpoints and illuminations), are disentangled and represented explicitly.
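
In M-mode SVD notation, a model of this kind might be written as below. The mode ordering and the symbol names are assumptions chosen for illustration: \mathcal{D} is the texture image ensemble, \times_m is the mode-m product, \mathcal{Z} is the core tensor, and \mathbf{l} and \mathbf{v} are illumination and view coefficient vectors.

```latex
% Decomposition of the image ensemble into texel, illumination, and view modes
\mathcal{D} = \mathcal{Z} \times_1 \mathbf{U}_{\text{texel}}
                          \times_2 \mathbf{U}_{\text{illum}}
                          \times_3 \mathbf{U}_{\text{view}}

% Basis tensor governing the interaction between illumination and view
\mathcal{T} = \mathcal{Z} \times_1 \mathbf{U}_{\text{texel}}

% Rendering a texture for given illumination and view coefficient vectors
\mathbf{d}_{\text{new}} = \mathcal{T} \times_2 \mathbf{l}^{\top} \times_3 \mathbf{v}^{\top}
```

Under this reading, a texture for an unobserved viewpoint or illumination would be rendered by choosing \mathbf{l} and \mathbf{v} between the rows of the learned mode matrices, for example by interpolation.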
 

  • "TensorTextures: Multilinear Image-Based Rendering", M. A. O. Vasilescu and D. Terzopoulos, Proc. ACM SIGGRAPH 2004 Conference Los Angeles, CA, August, 2004, in Computer Graphics Proceedings, Annual Conference Series, 2004, 336-342. 
    Paper (5,104 KB - .pdf) 

    Animations:

    • TensorTextures - AVI (54,225 KB)

    • TensorTextures Strategic Dimensionality Reduction - AVI (19,650 KB)

    • TensorTextures Trailer - AVI (17,605 KB)

     

  • "TensorTextures", M. A. O. Vasilescu and D. Terzopoulos, Sketches and Applications SIGGRAPH 2003 San Diego, CA, July, 2003. 
    Sketch (6MB - .pdf) 

 

Adaptive Meshes: Physically Based Modeling 

 

We develop adaptive mesh models for the nonuniform sampling and reconstruction of visual data. Adaptive meshes are dynamic models assembled from nodal masses connected by adjustable springs. Acting as mobile sampling sites, the nodes observe interesting properties of the input data, such as intensities, depths, gradients, and curvatures. The springs automatically adjust their stiffnesses based on the locally sampled information in order to concentrate nodes near rapid variations in the input data. The representational power of an adaptive mesh is enhanced by its ability to optimally distribute the available degrees of freedom of the reconstructed model in accordance with the local complexity of the data.
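
The way data-dependent stiffness concentrates nodes can be seen in a one-dimensional toy version. This is a sketch under simplifying assumptions (a chain of zero-rest-length springs, quasi-static relaxation with no nodal masses, an analytic test signal) and is not the mesh formulation of the papers below. The parameter names `sensitivity` and `damping` are hypothetical.

```python
import numpy as np

def signal_gradient(x):
    """Derivative of the test signal tanh((x - 0.5) / 0.05), a smooth step at 0.5."""
    return (1.0 - np.tanh((x - 0.5) / 0.05) ** 2) / 0.05

def relax_mesh(n_nodes=21, sensitivity=1.0, steps=5000, damping=0.5):
    """Relax a 1D chain of nodes joined by springs with data-dependent stiffness."""
    x = np.linspace(0.0, 1.0, n_nodes)              # the two end nodes stay fixed
    for _ in range(steps):
        mid = 0.5 * (x[:-1] + x[1:])                # each spring samples its midpoint
        stiffness = 1.0 + sensitivity * np.abs(signal_gradient(mid))
        # Force-balance position of each interior node between its two neighbors.
        target = (stiffness[:-1] * x[:-2] + stiffness[1:] * x[2:]) / (
            stiffness[:-1] + stiffness[1:])
        x[1:-1] += damping * (target - x[1:-1])     # damped step toward balance
    return x

nodes = relax_mesh()
gaps = np.diff(nodes)
densest = int(np.argmin(gaps))                      # spring with the shortest length
densest_location = 0.5 * (nodes[densest] + nodes[densest + 1])
```

Springs that observe a large gradient become stiffer and contract, so the spacing between nodes shrinks around the step at 0.5 and grows in the flat regions.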
 

We develop open adaptive mesh and closed adaptive shell surfaces based on triangular or rectangular elements. We propose techniques for hierarchically subdividing polygonal elements in adaptive meshes and shells. We also devise a discontinuity detection and preservation algorithm suitable for the model. Finally, motivated by (nonlinear, continuous dynamics, discrete observation) Kalman filtering theory, we generalize our model to the dynamic recursive estimation of nonrigidly moving surfaces.
 

  • "Adaptive meshes and shells: Irregular triangulation, discontinuities, and hierarchical subdivision," M. Vasilescu, D. Terzopoulos, in Proc. Computer Vision and Pattern Recognition Conf. (CVPR '92), Champaign, IL, June, 1992, pages 829 - 832. 
    Paper (652KB - .pdf) 
     

  • "Sampling and Reconstruction with Adaptive Meshes," D. Terzopoulos, M. Vasilescu, in Proc. Computer Vision and Pattern Recognition Conf. (CVPR '91), Lahaina, HI, June, 1991, pages 70 - 75. 
    Paper (438KB - .pdf)