A machine learning approach to artistic research

2020–

As part of my doctoral thesis (Okulov, 2024), I developed, in interdisciplinary collaboration with Asutosh Hota and Yu Tian, a research approach that merges quantitative methods with artistic research. The approach includes a digital platform where multimodal stimuli can be annotated nonverbally. Digital drawing is used to represent how individuals attend to and interpret a video, image, or audio stimulus. From the pen data, motor expressions are analyzed using machine learning to understand how they reflect affective responses (see Sievers et al., 2019).

This method will be applied in interdisciplinary contexts to study how individuals and larger groups perceive emotionally charged social interactions. In this context, the method will be referred to as Nonverbal Stimulated Reflection (nSR), linking it to the stimulated recall interview (SRI) method developed for psychotherapy research (Kykyri et al., 2023).

The method is rooted in artistic research and aesthetics, which, according to philosopher Dieter Mersch, bring “attentiveness to nuances, to details, to fragile and often overlooked marginal phenomena and their vibrations” (2015, 52). The main idea is that through expression, phenomena can be annotated nonverbally without assigning them verbal labels. This kind of data responds to the question of how something is, rather than what it is, and thus offers insight into experience.

In perceptual psychology, nonverbal expression that arises from a sensory stimulus is referred to as crossmodal or multimodal correspondence. These terms describe the systematic way in which information is translated between two or more sensory modalities. The phenomenon resembles synesthesia. However, synesthesia links sensory features in an absolute manner (for example, associating a specific pitch with a particular color) and occurs only in a small portion of the population, whereas multimodal correspondence is considered a weaker but universal and innate phenomenon (Deroy & Spence, 2013). Research indicates that the multimodal associations that persist from childhood influence us homeostatically: they manifest through changes in bodily arousal and are deeply connected to affective states.

The digital research platform allows multimodal expression to be recorded and synchronized in quantitative form. In the platform, the research participant sees on a computer screen an image or video stimulus designed to evoke an affective reaction in the body. This induced affective arousal enables a multisensory response to the stimulus. The participants are asked what they perceive as salient in the stimulus and how it affects them. The what-question is answered by the pen location, which shows which elements are perceived as meaningful and how the gaze moves between them to construct the context. The how-question is answered by varying the pressure, speed, and form of the drawing. These features offer insights into the aesthetic-affective reactions elicited by the stimulus.
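As a rough illustration of how the pressure and speed of a drawing could be summarized, the following sketch computes a few simple descriptors for one pen stroke. It is not the platform's actual implementation: the `PenSample` record, its field names, the time unit (seconds from stimulus onset), and the normalized pressure range are all assumptions made for this example.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class PenSample:
    """One hypothetical pen reading (field names are assumptions)."""
    t: float         # time in seconds since stimulus onset (assumed)
    x: float         # horizontal pen position
    y: float         # vertical pen position
    pressure: float  # pen pressure, assumed normalized to 0..1


def stroke_features(samples):
    """Summarize one stroke by path length, mean speed, and mean pressure."""
    if len(samples) < 2:
        raise ValueError("a stroke needs at least two samples")

    # Sum the straight-line distances between consecutive pen positions.
    path_length = 0.0
    for a, b in zip(samples, samples[1:]):
        path_length += hypot(b.x - a.x, b.y - a.y)

    # Mean speed is the total distance divided by the stroke duration.
    duration = samples[-1].t - samples[0].t
    mean_speed = path_length / duration if duration > 0 else 0.0

    mean_pressure = sum(s.pressure for s in samples) / len(samples)

    return {
        "path_length": path_length,
        "mean_speed": mean_speed,
        "mean_pressure": mean_pressure,
    }
```

Descriptors like these are only one possible starting point. The form of the drawing would need richer features, and a machine learning model could also be trained on the raw samples directly.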

Currently, the method is being developed especially for stylus pen input and annotations of videos recorded in real therapeutic and social situations.

The method relates to techniques used in machine learning. Annotating data with a digital pen is a common procedure in machine vision tasks. Annotators mark the most salient objects in an image by drawing with a stylus pen (Jiang et al., 2015) or by using a box-like framing, as in the image above.

However, the typical goal of these models is to recognize linguistic objects in images through their verbal labels, and these labels often do not include affective information. As is commonly known, everyday observations and moments of high personal relevance, such as flow, enthusiasm, anxiety, or fear, often evade verbal expression, leaving their descriptions dull and inadequate. Affective computing therefore requires other, nonverbal approaches so that it can be utilized more deeply, for example, in fields that study experience.

The stylus pen data gathered with nSR highlights nonverbal expression. The x-, y-, and z-coordinates are saved and synced with the stimulus in the platform. The image above shows the responses of four research participants to the affective-aesthetic words aggressive, restless, dynamic, calm, euphoric, and tense.
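One simple way to picture the syncing of pen coordinates with a video stimulus is to assign each timestamped sample to the video frame that was on screen when it was recorded. The sketch below is an illustration under stated assumptions, not a description of how the platform does this: it assumes each sample is a `(t, x, y, z)` tuple with `t` in seconds from stimulus onset, that the video has a constant frame rate, and it uses the hypothetical function name `group_by_frame`. The z value is carried through unchanged.

```python
from collections import defaultdict


def group_by_frame(samples, fps):
    """Group (t, x, y, z) pen samples by the index of the video frame
    shown at time t, assuming a constant frame rate `fps`."""
    if fps <= 0:
        raise ValueError("fps must be positive")

    frames = defaultdict(list)
    for t, x, y, z in samples:
        if t < 0:
            raise ValueError("timestamps must not precede stimulus onset")
        # Frame 0 covers [0, 1/fps), frame 1 covers [1/fps, 2/fps), and so on.
        frame_index = int(t * fps)
        frames[frame_index].append((x, y, z))
    return dict(frames)
```

With the samples grouped per frame, the pen trace can be overlaid on the frame it refers to, or compared across participants who responded to the same moment of the stimulus.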

At the core of the nSR method is not only what information is salient for perception but also how this information emerges and affects the perceiver. This kind of expressive data can be applied, for example, in affective saliency detection.

References

Jiang, M., Huang, S., Duan, J., & Zhao, Q. (2015). SALICON: Saliency in context. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1072–1080).

Kykyri, V. L., Wahlström, J., & Seikkula, J. (2023). Inner and outer dialogue in couple therapy: The potential of stimulated recall interviews. In The Routledge International Handbook of Innovative Qualitative Psychological Research (pp. 229–242). Routledge.

Mersch, D. (2015). Epistemologies of Aesthetics. Zurich-Berlin: Diaphanes.

Okulov, J. (2024). Quantifying Qualia – Aesthetic Machine Attention in Resisting the Objectifying Tendency of Thought. [Doctoral Thesis, Aalto University]. Aalto University. http://urn.fi/URN:ISBN:978-952-64-1746-2

Sievers, B., Lee, C., Haslett, W., & Wheatley, T. (2019). A multi-sensory code for emotional arousal. Proceedings of the Royal Society B, 286(1906), 20190513.

A machine learning approach to artistic research – Jaana Okulov