A machine learning approach to artistic research
2020–
As part of my doctoral thesis (Okulov, 2024), I developed a research approach in interdisciplinary collaboration with Asutosh Hota and Yu Tian that merges quantitative methods with artistic research. The approach includes a digital platform where multimodal stimuli can be annotated nonverbally. Digital drawing is used to represent how individuals attend to and interpret a video, image, or audio stimulus. From the pen expression data, motor expressions are analyzed using machine learning to understand how they reflect affective responses (see Sievers et al., 2019).
This method will be applied in interdisciplinary contexts to study how individuals and larger groups achieve transformative experiences. In this context, the method will be referred to as Nonverbal Stimulated Reflection (nSR), linking it to the stimulated recall interview (SRI) method developed for psychotherapy research (Kykyri et al., 2023).

The core of the method lies in the insight that a universal and affective logic seems to operate between the senses. Through aesthetic sensitization we are able to reach this nonverbal state and translate it into expression, as aesthetics brings “attentiveness to nuances, to details, to fragile and often overlooked marginal phenomena and their vibrations” (Mersch, 2015, 52). Knowledge that emerges with aesthetic reflexivity leads to the capturing of a phenomenon in its medium that “reflects the perceivable through perception and the experiential through experience” (Mersch, 2015, 46).
The theory of developmental psychiatrist and psychoanalyst Daniel Stern explains how expression generated in response to what is perceived may arise from the Gestalt of vitality, the lived force, and the dynamics of experience. According to Stern, these dynamic forms are “the felt experience of force – in movement – with a temporal contour, and a sense of aliveness, of going somewhere. They do not belong to any particular content. They are more form than content. They concern the ‘How,’ the manner, and the style, not the ‘What’ or the ‘Why’” (Stern, 2010, 8).
In the field of perceptual psychology, the phenomenon is referred to as crossmodal or multimodal correspondence. These terms describe the systematic way in which information is translated between two or more sensory modalities. The phenomenon resembles synesthesia. However, synesthesia links sensory features in an absolute manner (for example, associating a specific pitch with a particular color) and occurs only in a small portion of the population, whereas multimodal correspondence is considered a weaker but universal and innate phenomenon (Deroy & Spence, 2013). Research indicates that the multimodal associations that persist with us from childhood influence us homeostatically, manifesting through changes in bodily arousal and being deeply connected to affective states.

The developed digital research platform allows the recording and synchronization of multimodal expression in a quantitative form. In the platform, the research participant sees on a computer screen an image or video stimulus designed to evoke an affective reaction in the body. This induced affective arousal enables a multisensory response to the stimulus. The participants are asked what they perceive as salient in the stimulus and how it affects them. The what-question is answered by the pen location, which provides information about which elements are perceived as meaningful and how the gaze moves between them to construct the context. The how-question is answered by varying the pressure, speed, and form of the drawing. These features offer insights into the aesthetic-affective reactions elicited by the stimuli.
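As an illustration of how the how-question could be quantified, the following is a minimal sketch that derives mean pressure, path length, and mean speed from one pen stroke. It is not the platform's implementation: the `PenSample` record, its field names, the units, and the `stroke_features` helper are all hypothetical and chosen only for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class PenSample:
    # Hypothetical record layout; the platform's actual format is not given here.
    t: float         # seconds since stimulus onset (assumed)
    x: float         # horizontal pen position in pixels (assumed)
    y: float         # vertical pen position in pixels (assumed)
    pressure: float  # pen pressure normalized to 0..1 (assumed)


def stroke_features(samples):
    """Summarize one stroke by mean pressure, path length, and mean speed."""
    if len(samples) < 2:
        raise ValueError("a stroke needs at least two samples")
    # Path length: sum of straight-line distances between consecutive samples.
    path_length = sum(
        math.hypot(b.x - a.x, b.y - a.y)
        for a, b in zip(samples, samples[1:])
    )
    duration = samples[-1].t - samples[0].t
    mean_speed = path_length / duration if duration > 0 else 0.0
    mean_pressure = sum(s.pressure for s in samples) / len(samples)
    return {
        "mean_pressure": mean_pressure,
        "path_length": path_length,
        "mean_speed": mean_speed,
    }


# Invented sample values, used only to show the call.
stroke = [
    PenSample(0.0, 0.0, 0.0, 0.2),
    PenSample(0.1, 3.0, 4.0, 0.4),
    PenSample(0.2, 6.0, 8.0, 0.6),
]
features = stroke_features(stroke)
```

Summary values of this kind are only one possible description of a stroke. The form of the drawing, which the method also considers, would need richer descriptors than these three numbers.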
Currently, the method is being developed especially for stylus pen input and annotations of videos recorded in real therapeutic and social situations.
The method relates to techniques used in machine learning. Annotating data with a digital pen is a common procedure for machine vision tasks. Annotators mark the most salient objects in an image by drawing with a stylus pen (Jiang et al., 2015) or by using a box-like framing like in the image above.
However, the typical goal of these models is to recognize linguistic objects in images through their verbal labels, and these labels often do not include affective information. As is commonly known, everyday observations and moments of high personal relevance, like flow, enthusiasm, anxiety, or fear, often evade verbal expression, leaving their descriptions dull and inadequate. Affective computing therefore requires other, nonverbal approaches so that it can be utilized more deeply, for example, in fields that study experience.

The stylus pen data gathered with nSR highlights nonverbal expression. X-, y-, and z-coordinates are saved and synced with the stimulus in the platform. The image above shows the responses of four research participants to the affective-aesthetic words aggressive, restless, dynamic, calm, euphoric, and tense.
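To make the idea of syncing pen coordinates with a stimulus concrete, here is a minimal sketch that groups timestamped pen samples by the video frame during which they were drawn. It assumes that each sample carries a timestamp in seconds relative to stimulus onset and that the video has a constant frame rate. The tuple layout and the `group_by_frame` helper are hypothetical, and the z value is passed through unchanged without assuming what it encodes.

```python
import math
from collections import defaultdict


def group_by_frame(samples, fps):
    """Map each (t, x, y, z) pen sample to the video frame shown at time t.

    Assumes t is in seconds from stimulus onset and fps is constant.
    Returns a dict of frame index -> list of (x, y, z) tuples.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    frames = defaultdict(list)
    for t, x, y, z in samples:
        frame_index = math.floor(t * fps)
        frames[frame_index].append((x, y, z))
    return dict(frames)


# Invented sample values, used only to show the call.
samples = [
    (0.00, 10, 20, 0.3),
    (0.03, 12, 21, 0.4),
    (0.05, 15, 25, 0.5),
    (0.13, 30, 40, 0.2),
]
by_frame = group_by_frame(samples, fps=25)
```

With a frame rate of 25, the first two samples fall on frame 0, the third on frame 1, and the last on frame 3. Frames without pen input simply have no entry.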

At the core of the nSR method is not only what information is salient for perception but also how this information emerges and affects the perceiver. This kind of expressive data can be applied, for example, in affective saliency detection.
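One way to picture how the what and the how could be combined is a coarse map in which pen locations are accumulated and weighted by pen pressure. The sketch below is only an illustration under that assumption, not the saliency model used in the research. The `saliency_grid` helper, the grid size, and the use of pressure as a weight are all hypothetical.

```python
def saliency_grid(samples, width, height, cols, rows):
    """Accumulate pressure-weighted pen positions into a rows x cols grid.

    samples: iterable of (x, y, pressure) in stimulus pixel coordinates.
    The grid is scaled so that its largest cell equals 1.0.
    """
    grid = [[0.0] * cols for _ in range(rows)]
    for x, y, pressure in samples:
        # Ignore samples drawn outside the stimulus area.
        if not (0 <= x < width and 0 <= y < height):
            continue
        col = int(x * cols / width)
        row = int(y * rows / height)
        grid[row][col] += pressure
    peak = max(max(r) for r in grid)
    if peak > 0:
        grid = [[value / peak for value in r] for r in grid]
    return grid


# Invented sample values, used only to show the call.
samples = [(10, 10, 0.5), (20, 15, 0.5), (90, 90, 0.25), (150, 10, 0.9)]
grid = saliency_grid(samples, width=100, height=100, cols=2, rows=2)
```

In this example the top-left cell collects two samples and becomes the peak, the bottom-right cell receives a quarter of that weight, and the sample outside the stimulus area is dropped.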
References
Jiang, M., Huang, S., Duan, J. & Zhao, Q. (2015). Salicon: Saliency in context. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1072–1080).
Kykyri, V. L., Wahlström, J. & Seikkula, J. (2023). Inner and outer dialogue in couple therapy: the potential of stimulated recall interviews. In The Routledge International Handbook of Innovative Qualitative Psychological Research (pp. 229–242). Routledge.
Mersch, D. (2015). Epistemologies of Aesthetics. Zurich-Berlin: Diaphanes.
Okulov, J. (2024). Quantifying Qualia – Aesthetic Machine Attention in Resisting the Objectifying Tendency of Thought. [Doctoral Thesis, Aalto University]. Aalto University. http://urn.fi/URN:ISBN:978-952-64-1746-2
Sievers, B., Lee, C., Haslett, W. & Wheatley, T. (2019). A multi-sensory code for emotional arousal. Proceedings of the Royal Society B, 286(1906), 20190513.
Stern, D. N. (2010). Forms of Vitality: Exploring Dynamic Experience in Psychology, the Arts, Psychotherapy, and Development. Oxford: Oxford University Press.