Causality in a Tensor Framework
Developing causal explanations for correct results or for failures from mathematical equations and data is important in developing a trustworthy artificial intelligence and retaining public trust. Causal explanations are germane to the “right to an explanation” statute [15], [13], i.e., to data driven decisions, such as those that rely on images.
Computer graphics and computer vision problems, also known as forward and inverse imaging problems, have been cast as causal inference questions [40], [42] consistent with Donald Rubin’s quantitative definition of causality, where “A causes B” means “the effect of A is B”, a measurable and experimentally repeatable quantity [14], [17]. Computer graphics may be viewed as addressing analogous questions to forward causal inferencing that addresses the “what if” question, and estimates the change in effects given a delta change in a causal factor. Computer vision may be viewed as addressing analogous questions to inverse causal inferencing that addresses the “why” question [12]. We define inverse causal inference as the estimation of causes given an estimated forward causal model and a set of observations that constrain the solution set.
(Vasilescu, Kim, and Zeng, 2020)
Natural images are the composite consequence of multiple constituent factors related to scene structure, illumination conditions, and imaging conditions. Multilinear algebra, the algebra of higher-order tensors, offers a potent mathematical framework for analyzing the multifactor structure of image ensembles and for addressing the difficult problem of disentangling the constituent factors or modes.
(Vasilescu and Terzopoulos, 2002)
Scene structure is composed of a set of objects that appear to be formed from a recursive hierarchy of perceptual wholes and parts whose properties, such as shape, reflectance, and color, constitute a hierarchy of intrinsic causal factors of object appearance. Object appearance is the compositional consequence of both an object’s intrinsic causal factors, and extrinsic causal factors with the latter related to illumination (i.e. the location and types of light sources), imaging (i.e. viewpoint, viewing direction, lens type and other camera characteristics). Intrinsic and extrinsic causal factors confound each other’s contributions, hindering recognition.
(Vasilescu and Kim, 2019)
While we can directly observe and measure the gray (or color) values in an image/video, we are often more interested in the information associated with the causal factors that determine the pixel values in an image, such as the person's identity, the viewing direction, or expression, which may only be inferred, but not directly measured. Given the correct problem setup, the tensor framework is suitable for disentangling the multifactor causal structure of data formation.
The tensor framework was first employed in computer vision, computer graphics, and machine learning to recognize people from their gait (Human Motion Signatures in 2001) and from their facial images (TensorFaces in 2002). However, this approach may be used to synthesize or recognize any object and object attribute. The development and utility of the tensor framework have been illustrated primarily in the context of face recognition, since the problem statement and facial images lend themselves to an intuitive understanding of the underlying mathematics. Other examples are TensorTextures (see the video below of image-based rendering, which demonstrates progressive reduction of illumination effects through strategic dimensionality reduction) and 3D sound.
There are two classes of data tensor modeling techniques, which stem from:

the linear rank-K tensor decomposition (CANDECOMP/PARAFAC decomposition), and

the multilinear rank-(R1, R2, ..., RM) tensor decomposition (Tucker decomposition).
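To make the distinction between the two classes concrete, the following minimal NumPy sketch builds a third-order tensor in each form. The sizes, the random factor matrices, and all variable names are assumptions chosen for illustration; they do not come from any of the papers listed here.

```python
import numpy as np

rng = np.random.default_rng(0)
I1, I2, I3 = 4, 5, 6  # illustrative mode dimensions

# CP / rank-K form: a sum of K rank-1 (outer-product) terms.
K = 3
A, B, C = (rng.standard_normal((n, K)) for n in (I1, I2, I3))
cp_tensor = np.einsum("ik,jk,lk->ijl", A, B, C)

# Tucker / multilinear rank-(R1, R2, R3) form: a core tensor
# multiplied along every mode by one factor matrix per mode.
R1, R2, R3 = 2, 3, 4
core = rng.standard_normal((R1, R2, R3))
U1 = rng.standard_normal((I1, R1))
U2 = rng.standard_normal((I2, R2))
U3 = rng.standard_normal((I3, R3))
tucker_tensor = np.einsum("abc,ia,jb,lc->ijl", core, U1, U2, U3)

# CP is the special case of Tucker whose core is superdiagonal.
diag_core = np.zeros((K, K, K))
diag_core[np.arange(K), np.arange(K), np.arange(K)] = 1.0
cp_as_tucker = np.einsum("abc,ia,jb,lc->ijl", diag_core, A, B, C)
```

In the Tucker form each mode has its own rank and the core tensor governs how the mode factors interact, whereas the CP form uses a single rank K shared by all modes.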
Amnon Shashua's team has recently provided theoretical evidence showing that deep learning is a neural network approximation of multilinear tensor factorization, while a shallow network corresponds to CP tensor factorization (aka, linear tensor factorization). However, problem setup and implementation differences between CNNs and our tensor algebraic approach impact interpretability, data needs, memory/storage and computational complexity.
TensorFaces is a multilinear tensor method that, by employing the Tucker tensor decomposition, explicitly models and decomposes a facial image in terms of the causal factors of data formation, where each causal factor is represented according to its second-order statistics. We refer to this approach more generally as Multilinear PCA in order to better differentiate it from our Multilinear ICA approach.
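One standard way to compute a Tucker decomposition is to take the SVD of each mode unfolding of the data tensor (the M-mode SVD named in the TensorTextures description on this page). The sketch below is a minimal illustration under assumed conditions: the random "face" tensor, its mode ordering (people, illuminations, views, pixels), and the helper names are all hypothetical, and it is not the authors' implementation.

```python
import numpy as np

def unfold(tensor, mode):
    """Flatten `tensor` into a matrix whose rows index the given mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` along axis `mode` by `matrix` (mode-m product)."""
    moved = np.tensordot(matrix, tensor, axes=(1, mode))  # new axis comes first
    return np.moveaxis(moved, 0, mode)

def m_mode_svd(data):
    """Return (core, bases) such that data = core x_1 U_1 ... x_M U_M."""
    bases = [np.linalg.svd(unfold(data, m), full_matrices=False)[0]
             for m in range(data.ndim)]
    core = data
    for m, U in enumerate(bases):
        core = mode_product(core, U.T, m)
    return core, bases

def reconstruct(core, bases):
    """Multiply the core by every mode basis to rebuild the data tensor."""
    out = core
    for m, U in enumerate(bases):
        out = mode_product(out, U, m)
    return out

# Hypothetical ensemble: 6 people x 4 illuminations x 3 views x 50 pixels.
rng = np.random.default_rng(0)
faces = rng.standard_normal((6, 4, 3, 50))
core, bases = m_mode_svd(faces)
rebuilt = reconstruct(core, bases)
```

Each mode basis is the set of left singular vectors of that mode's unfolding, so it captures the second-order statistics of one causal factor separately from the others.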
Multilinear (tensor) ICA is a more sophisticated model of cause-and-effect based on higher-order statistics associated with each causal factor. Similarly, one can employ kernel variants (pg. 43) to model cause-and-effect. By comparison, matrix decompositions, such as PCA or ICA, capture the overall statistical information (variance, kurtosis) without any causal differentiation.
Subspace multilinear learning demonstrably disentangles the causal factors of data formation through strategic dimensionality reduction. For example, in the case of facial images (or bidirectional texture functions), we suppress illumination effects such as shadows and highlights without blurring the edges associated with the person's identity that are important for recognition (or the edges associated with structural information that are important for texture synthesis; see the TensorTextures video below).
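The idea of reducing dimensionality in one mode only can be sketched as follows. This is a simplified illustration on random toy data: it projects the illumination mode onto its leading singular vectors and leaves the other modes untouched. Plain truncation is an assumption made here for brevity; it is not necessarily the optimal reduction, and the function and variable names are hypothetical.

```python
import numpy as np

def unfold(tensor, mode):
    """Flatten `tensor` into a matrix whose rows index the given mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` along axis `mode` by `matrix` (mode-m product)."""
    moved = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(moved, 0, mode)

def reduce_mode(data, mode, rank):
    """Project one mode of `data` onto its `rank` leading singular vectors."""
    U = np.linalg.svd(unfold(data, mode), full_matrices=False)[0][:, :rank]
    return mode_product(data, U @ U.T, mode)

rng = np.random.default_rng(1)
images = rng.standard_normal((4, 5, 30))     # people x illuminations x pixels (toy data)
relit = reduce_mode(images, mode=1, rank=2)  # keep 2 of the 5 illumination components

illum_rank = np.linalg.matrix_rank(unfold(relit, 1))
people_rank = np.linalg.matrix_rank(unfold(relit, 0))
```

After the reduction the illumination mode has rank 2 while the people mode keeps its full rank, which is the sense in which variation is removed from one factor without collapsing another.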
Next important question: While TensorFaces is a handy moniker for an approach that learns and represents the interaction of various causal factors from a set of training images, with Multilinear (Tensor) ICA and kernel variants as more sophisticated approaches, none of the interaction models prescribes a solution for how one might determine the multiple causal factors of a single unlabeled test image.
Multilinear Projection (FG 2011, ICCV 2007, briefly summarized in the 2005 MICA paper) addresses the question of how one might determine from one or more unlabeled test images all the unknown causal factors of data formation. That is, how does one solve for multiple unknowns from a single image equation? In the course of addressing this question, several concepts from linear (matrix) algebra were generalized in order to develop the multilinear projection algorithm, such as the mode-m identity tensor (which is also an algebraic operator that reshapes a matrix into a tensor and back again into a matrix), the mode-m pseudo-inverse tensor, and the mode-m product. (Note: The mode-m pseudo-inverse tensor is not a tensor pseudo-inverse.) Multilinear projection simultaneously projects one or more unlabeled test images into multiple constituent mode spaces, associated with image formation, in order to infer the mode labels.
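The "multiple unknowns from a single image equation" question can be illustrated with a deliberately simplified two-factor sketch. It is not the published multilinear projection algorithm: it assumes a toy ensemble with only people and illumination modes, uses an ordinary matrix pseudo-inverse of the unfolded basis tensor, and then splits the response into one coefficient vector per mode with a rank-1 SVD. All names and sizes are hypothetical.

```python
import numpy as np

def unfold(tensor, mode):
    """Flatten `tensor` into a matrix whose rows index the given mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` along axis `mode` by `matrix` (mode-m product)."""
    moved = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(moved, 0, mode)

rng = np.random.default_rng(2)
n_people, n_illum, n_pixels = 5, 4, 40
D = rng.standard_normal((n_people, n_illum, n_pixels))  # toy training ensemble

# Mode bases from the SVD of the people and illumination unfoldings.
U_people = np.linalg.svd(unfold(D, 0), full_matrices=False)[0]
U_illum = np.linalg.svd(unfold(D, 1), full_matrices=False)[0]

# Basis tensor T, defined so that D = T x_1 U_people x_2 U_illum.
T = mode_product(mode_product(D, U_people.T, 0), U_illum.T, 1)

def project(image):
    """Estimate person and illumination coefficient vectors for one image."""
    basis = T.reshape(n_people * n_illum, n_pixels)  # one row per coefficient pair
    response = np.linalg.pinv(basis.T) @ image       # approximates vec(p l^T)
    R = response.reshape(n_people, n_illum)
    u, s, vt = np.linalg.svd(R)
    return u[:, 0], vt[0]                            # rank-1 factors, up to sign

def classify(coeffs, U):
    """Index of the row of U most aligned with the estimated coefficients."""
    return int(np.argmax(np.abs(U @ coeffs)))

p_hat, l_hat = project(D[2, 1])
person = classify(p_hat, U_people)
illum = classify(l_hat, U_illum)
```

For an image taken from the training ensemble the response matrix is exactly rank 1, so both labels are recovered; for a novel image the rank-1 step only gives the best approximation in the least-squares sense.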

"CausalX: Causal eXplanations and Block Multilinear Factor Analysis", M.A.O. Vasilescu, E. Kim, X. S. Zeng, in the Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR 2020), Milan, Italy, January 2021, 10736-10743. Paper (pdf).

"Compositional Hierarchical Tensor Factorization: Representing Hierarchical Intrinsic and Extrinsic Causal Factors", M.A.O. Vasilescu, E. Kim, in The 25th ACM SIGKDD, Knowledge Discovery and Data Mining Conference and Workshops: Tensor Methods for Emerging Data Science Challenges, August 04-08, 2019, Anchorage, AK. ACM, New York, NY, USA. Paper (pdf)

"Face Tracking with Multilinear (Tensor) Active Appearance Models", Weiguang Si, Kota Yamaguchi, M. A. O. Vasilescu, June, 2013.
http://pdfs.semanticscholar.org/6c64/59d7cadaa210e3310f3167dc181824fb1bff.pdf
Paper (pdf)

"Multilinear Projection for Face Recognition via Canonical Decomposition", M.A.O. Vasilescu, in Proc. Face and Gesture Conf. (FG'11), 476-483. Paper (pdf)

"Multilinear Projection for Face Recognition via Rank-1 Analysis", M.A.O. Vasilescu, CVPR, IEEE Computer Society and IEEE Biometrics Council Workshop on Biometrics, June 18, 2010.

"Multilinear Projection for Appearance-Based Recognition in the Tensor Framework", M.A.O. Vasilescu and D. Terzopoulos, Proc. Eleventh IEEE International Conf. on Computer Vision (ICCV'07), Rio de Janeiro, Brazil, October, 2007, 1-8.
Paper (1,027 KB - .pdf)

"Multilinear Independent Components Analysis and Multilinear Projection Operator for Face Recognition", M.A.O. Vasilescu, D. Terzopoulos, in Workshop on Tensor Decompositions and Applications, CIRM, Luminy, Marseille, France, August 2005.

"Multilinear (Tensor) ICA and Dimensionality Reduction", M.A.O. Vasilescu, D. Terzopoulos, Proc. 7th International Conference on Independent Component Analysis and Signal Separation (ICA07), London, UK, September, 2007. In Lecture Notes in Computer Science, 4666, Springer-Verlag, New York, 2007, 818-826.

"Multilinear Independent Components Analysis", M. A. O. Vasilescu and D. Terzopoulos, Proc. Computer Vision and Pattern Recognition Conf. (CVPR '05), San Diego, CA, June 2005, vol. 1, 547-553.
Paper (1,027 KB - .pdf)

"Multilinear Independent Component Analysis", M. A. O. Vasilescu and D. Terzopoulos, Learning 2004 Snowbird, UT, April, 2004.

"Multilinear Subspace Analysis for Image Ensembles," M. A. O. Vasilescu, D. Terzopoulos, Proc. Computer Vision and Pattern Recognition Conf. (CVPR '03), Vol. 2, Madison, WI, June, 2003, 93-99.
Paper (1,657KB - .pdf)

"Multilinear Image Analysis for Facial Recognition," M. A. O. Vasilescu, D. Terzopoulos, Proceedings of International Conference on Pattern Recognition (ICPR 2002), Vol. 2, Quebec City, Canada, Aug, 2002, 511-514.
Paper (439KB - .pdf)

"Multilinear Analysis of Image Ensembles: TensorFaces," M. A. O. Vasilescu, D. Terzopoulos, Proc. 7th European Conference on Computer Vision (ECCV'02), Copenhagen, Denmark, May, 2002, in Computer Vision - ECCV 2002, Lecture Notes in Computer Science, Vol. 2350, A. Heyden et al. (Eds.), Springer-Verlag, Berlin, 2002, 447-460.
Full Article in PDF (882KB)
TensorTextures: Image-based Rendering
One of the goals of computer graphics is photorealistic rendering, the synthesis of images of virtual scenes visually indistinguishable from those of natural scenes. Unlike traditional model-based rendering, whose photorealism is limited by model complexity, an emerging and highly active research area known as image-based rendering eschews complex geometric models in favor of representing scenes by ensembles of example images. These are used to render novel photoreal images of the scene from arbitrary viewpoints and illuminations, thus decoupling rendering from scene complexity. The challenge is to develop structured representations in high-dimensional image spaces that are rich enough to capture important information for synthesizing new images, including details such as self-occlusion, self-shadowing, interreflections, and subsurface scattering.
TensorTextures, a new image-based texture mapping technique, is a rich generative model that, from a sparse set of example images, learns the interaction between viewpoint, illumination, and geometry that determines detailed surface appearance. Mathematically, TensorTextures is a nonlinear model of texture image ensembles that exploits tensor algebra and the M-mode SVD to learn a representation of the bidirectional texture function (BTF) in which the multiple constituent factors, or modes (viewpoints and illuminations), are disentangled and represented explicitly.
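As a rough illustration of how an explicit view and illumination representation can be used generatively, the sketch below learns mode bases from a toy ensemble and renders a texture image from a pair of coefficient vectors. The random data, the sizes, the coefficient blending, and the function name render are assumptions made for this example, not the published TensorTextures pipeline.

```python
import numpy as np

rng = np.random.default_rng(3)
n_views, n_illums, n_texels = 6, 5, 64
D = rng.standard_normal((n_views, n_illums, n_texels))  # toy stand-in for a BTF ensemble

# Orthonormal mode bases from the SVD of the view and illumination unfoldings.
U_view = np.linalg.svd(D.reshape(n_views, -1), full_matrices=False)[0]
U_illum = np.linalg.svd(
    np.moveaxis(D, 1, 0).reshape(n_illums, -1), full_matrices=False
)[0]

# Basis tensor: texel-space images indexed by view and illumination coefficients.
T = np.einsum("ia,jb,ijn->abn", U_view, U_illum, D)

def render(view_coeffs, illum_coeffs):
    """Synthesize one texture image from a pair of coefficient vectors."""
    return np.einsum("a,b,abn->n", view_coeffs, illum_coeffs, T)

# A sampled condition is reproduced exactly; blended view coefficients
# give an in-between image for the same illumination.
same = render(U_view[1], U_illum[2])
between = render(0.5 * (U_view[1] + U_view[2]), U_illum[2])
```

Because the model is linear in each coefficient vector separately, blending the coefficients of two sampled viewpoints in this untruncated toy case yields the corresponding blend of their images.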

"TensorTextures: Multilinear ImageBased Rendering", M. A. O. Vasilescu and D. Terzopoulos, Proc. ACM SIGGRAPH 2004 Conference Los Angeles, CA, August, 2004, in Computer Graphics Proceedings, Annual Conference Series, 2004, 336342.
Paper (5,104 KB - .pdf)
Animations:
TensorTextures - AVI (54,225 KB)

TensorTextures Strategic Dimensionality Reduction - AVI (19,650 KB)

TensorTextures Trailer - AVI (17,605 KB)


"TensorTextures", M. A. O. Vasilescu and D. Terzopoulos, Sketches and Applications SIGGRAPH 2003 San Diego, CA, July, 2003.
Sketch (6MB - .pdf)
Human Motion Signatures, Style Transfer, and Tracking:
Given motion-capture samples of Charlie Chaplin’s walk, is it possible to synthesize other motions (say, ascending or descending stairs) in his distinctive style? More generally, in analogy with handwritten signatures, do people have characteristic motion signatures that individualize their movements? If so, can these signatures be extracted from example motions? Can they be disentangled from other causal factors?
We have developed an algorithm that extracts motion signatures and uses them in the animation of graphical characters. The mathematical basis of our algorithm is a statistical numerical technique known as M-mode data tensor analysis. For example, given a corpus of walking, stair-ascending, and stair-descending motion data collected over a group of subjects, plus a sample walking motion for a new subject, our algorithm can synthesize never-before-seen ascending and descending motions in the distinctive style of this new individual.
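The walking-to-stairs example can be sketched on synthetic data. The sketch assumes that every subject's motions come from one shared multilinear model and that each motion is flattened into a single vector of joint angles. The sizes, the random data, and the helper names are hypothetical, and this is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(4)
n_people, n_actions, n_joints, r = 5, 3, 20, 3  # actions: 0 walk, 1 ascend, 2 descend

# Toy multilinear motion data: all subjects share one core tensor.
core = rng.standard_normal((r, n_actions, n_joints))
signatures = rng.standard_normal((n_people, r))
D = np.einsum("pk,kaj->paj", signatures, core)  # people x actions x joint angles

# Learn the people-mode basis, then the person-independent basis B.
U_people = np.linalg.svd(D.reshape(n_people, -1), full_matrices=False)[0][:, :r]
B = np.einsum("pk,paj->kaj", U_people, D)       # r x actions x joint angles

def extract_signature(walk):
    """Least-squares signature p such that walk is close to p @ B[:, 0, :]."""
    return np.linalg.lstsq(B[:, 0, :].T, walk, rcond=None)[0]

def synthesize(signature, action):
    """Generate the given action in the style encoded by `signature`."""
    return signature @ B[:, action, :]

# New subject: only the walking motion is observed.
new_motions = np.einsum("k,kaj->aj", rng.standard_normal(r), core)
p = extract_signature(new_motions[0])
ascend = synthesize(p, 1)
descend = synthesize(p, 2)
```

On this noise-free toy data the synthesized stair motions match the new subject's true motions exactly; with real motion-capture data the fit would only be approximate.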

"Human Motion Signatures: Analysis, Synthesis, Recognition," M. A. O. Vasilescu Proceedings of International Conference on Pattern Recognition (ICPR 2002), Vol. 3, Quebec City, Canada, Aug, 2002, 456460.
Paper (439KB - .pdf)

"An Algorithm for Extracting Human Motion Signatures", M. A. O. Vasilescu, Computer Vision and Pattern Recognition CVPR 2001 Technical Sketches, Lihue, HI, December, 2001.

"Human Motion Signatures for Character Animations", M. A. O. Vasilescu, Sketch and Applications SIGGRAPH 2001 Los Angeles, CA, August, 2001.
Sketch (141KB - .pdf)

"Recognizing Action Events from Multiple View Points," Tanveer Syeda-Mahmood, Alex Vasilescu, Saratendu Sethi, in IEEE Workshop on Detection and Recognition of Events in Video, International Conference on Computer Vision (ICCV 2001), Vancouver, Canada, July 8, 2001, 64-72.
Listening in 3D
A head-related transfer function (HRTF) characterizes how an individual's anatomy and the location of a sound source affect that individual's perception of sound. The size, shape, and density of the head, the shape of the ears and ear canals, and the distance between the ears all transform sound by amplifying some frequencies and attenuating others. Learning how sound is perceived is important in:

pinpointing the location of a sound, which is vital for safe navigation in traffic, and

achieving a realistic acoustic environment in gaming and home cinema setups.
To measure an HRTF, one places a loudspeaker at various locations in space and a microphone at the ear. To recreate an authentic sound experience, slightly differently synthesized sounds are sent to each ear in accordance with a person's HRTF.
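Sending a slightly different signal to each ear amounts to filtering the same source with a left-ear and a right-ear response. The sketch below does this in the time domain with head-related impulse responses (HRIRs), the time-domain counterparts of HRTFs. The two toy responses and the function name render_binaural are assumptions for illustration; real HRIRs are measured per listener and per source direction.

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono signal with a left/right impulse-response pair."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])  # shape: (2, len(mono) + len(hrir) - 1)

# Hypothetical toy responses for a source on the listener's left:
# the right ear hears the sound 3 samples later and attenuated.
hrir_left = np.array([1.0, 0.0, 0.0, 0.0])
hrir_right = np.array([0.0, 0.0, 0.0, 0.6])

# A short 440 Hz tone sampled at 44.1 kHz serves as the source signal.
mono = np.sin(2 * np.pi * 440 * np.arange(256) / 44100.0)
stereo = render_binaural(mono, hrir_left, hrir_right)
```

The interaural delay and level difference encoded in the two responses are among the cues that let a listener localize the source over headphones.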
This is not surround sound, which uses multiple speakers to provide 360-degree sound.

"A Multilinear (Tensor) Framework for HRTF Analysis and Synthesis", G. Grindlay, M.A.O. Vasilescu, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Honolulu, Hawaii, April, 2007
Paper (439KB - .pdf)
Adaptive Meshes: Physically Based Modeling
Adaptive meshes are dynamic models for the nonuniform sampling and reconstruction of visual data, assembled from nodal masses connected by adjustable springs. Acting as mobile sampling sites, the nodes observe interesting properties of the input data, such as intensities, depths, gradients, and curvatures. The springs automatically adjust their stiffnesses based on the locally sampled information in order to concentrate nodes near rapid variations in the input data. The representational power of an adaptive mesh is enhanced by its ability to optimally distribute the available degrees of freedom of the reconstructed model in accordance with the local complexity of the data.
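The way data-dependent spring stiffness concentrates nodes can be sketched in one dimension. The example is a simplification under stated assumptions: a 1D chain of nodes with fixed endpoints, zero-rest-length springs, a synthetic step edge as input, stiffness equal to one plus the local gradient magnitude, and a damped relaxation instead of full mass-spring dynamics. The function names and parameters are hypothetical.

```python
import numpy as np

def signal_gradient(x, center=0.5, width=0.05):
    """Gradient magnitude of tanh((x - center) / width): large near the edge."""
    return (1.0 / width) / np.cosh((x - center) / width) ** 2

def relax_mesh(n_nodes=21, alpha=1.0, iterations=5000, damping=0.5):
    """Relax a 1D chain of nodes whose spring stiffness follows the data."""
    x = np.linspace(0.0, 1.0, n_nodes)          # start from a uniform mesh
    for _ in range(iterations):
        mid = 0.5 * (x[:-1] + x[1:])            # each spring samples at its midpoint
        k = 1.0 + alpha * signal_gradient(mid)  # stiffer where the data varies rapidly
        # Equilibrium position of each interior node between its two springs.
        target = (k[:-1] * x[:-2] + k[1:] * x[2:]) / (k[:-1] + k[1:])
        x[1:-1] += damping * (target - x[1:-1])
    return x

nodes = relax_mesh()
spacing = np.diff(nodes)
tightest = int(np.argmin(spacing))
tightest_location = 0.5 * (nodes[tightest] + nodes[tightest + 1])
```

Stiffer springs contract more, so the nodes migrate toward the edge at 0.5 and leave the flat regions sparsely sampled.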
We developed open adaptive mesh and closed adaptive shell surfaces based on triangular or rectangular elements. We proposed techniques for hierarchically subdividing polygonal elements in adaptive meshes and shells, and devised a discontinuity detection and preservation algorithm suitable for the model. Finally, motivated by Kalman filtering theory (nonlinear, continuous dynamics, discrete observation), we generalized our model to the dynamic recursive estimation of nonrigidly moving surfaces.

"Adaptive meshes and shells: Irregular triangulation, discontinuities, and hierarchical subdivision," M. Vasilescu, D. Terzopoulos, in Proc. Computer Vision and Pattern Recognition Conf. (CVPR '92), Champaign, IL, June, 1992, pages 829-832.
Paper (652KB - .pdf)

"Sampling and Reconstruction with Adaptive Meshes," D. Terzopoulos, M. Vasilescu, in Proc. Computer Vision and Pattern Recognition Conf. (CVPR '91), Lahaina, HI, June, 1991, pages 70-75.
Paper (438KB - .pdf)