Audio Metaphor Soundscape Generation Research


Audio Metaphor is an interactive system that presents itself as a search engine: the audience is invited to enter an expression or a sentence that serves as a request to an automatic soundscape generation system. Enter “The waterfalls inundate the city” or “The marshmallows explode in the campfire” and it will sound like it, in quadraphonic! This interactive audio installation questions the ubiquity of information, be it real or fake, actual or synthetic. Using state-of-the-art algorithms for sound retrieval, segmentation, background and foreground classification, automatic mixing, and automatic soundscape affect recognition, Audio Metaphor is a powerful system that generates believable soundscapes at interactive rates. The piece points at issues around big data, artificial intelligence, machine learning, and other technoscientific advances, and their impact on our perception and experience of the world.
Hardware description: [Computer, Audio Interface, Speakers]

Online System

Audio Metaphor is a pipeline of computational tools for generating artificial soundscapes. The pipeline includes modules for audio file search, segmentation and classification, and mixing. The input to the pipeline is a sentence, a desired duration, and curves for pleasantness and eventfulness. Each module can be used independently, or together to generate a soundscape from a sentence.
Try the Audio Metaphor system online. HERE
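The flow through the pipeline stages described above can be sketched in miniature. The following Python sketch uses toy stand-ins for each stage; all function names and behaviours here are hypothetical illustrations, not the actual Audio Metaphor code.

```python
# Hypothetical sketch of the Audio Metaphor pipeline stages described above.
# Every identifier below is illustrative; none come from the real system.

def extract_keywords(sentence):
    """Toy text analysis: keep non-stopwords longer than three characters."""
    stopwords = {"the", "a", "an", "in", "on", "at", "and", "of"}
    return [w.strip(".,").lower() for w in sentence.split()
            if len(w) > 3 and w.lower() not in stopwords]

def generate_soundscape(sentence, duration_s, pleasantness, eventfulness):
    """Run the pipeline: search -> segment/classify -> mix (all stubbed)."""
    keywords = extract_keywords(sentence)
    # 1. Search: retrieve one placeholder file per keyword.
    files = [f"{kw}.wav" for kw in keywords]
    # 2. Segmentation/classification: label each file (stubbed as background).
    segments = [{"file": f, "class": "background"} for f in files]
    # 3. Mixing: pair the segments with the requested affect curves.
    return {"duration": duration_s, "segments": segments,
            "pleasantness": pleasantness, "eventfulness": eventfulness}

mix = generate_soundscape("The waterfalls inundate the city", 60,
                          pleasantness=[0.2, 0.8], eventfulness=[0.9, 0.1])
print(mix["segments"][0]["file"])  # waterfalls.wav
```

The real modules replace each stub: the search stage queries local and online sound databases, and the classification stage applies the perceptual models described below.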


Audio Metaphor is a soundscape generation system that transforms text into soundscapes. A user enters a sentence describing a scenario, the desired mood, and a duration. Audio Metaphor analyzes this text, selects sounds from a database, cuts these sounds up, and recombines them in a sound design process.


Example inputs:

A city in the bush
Crows feeding on rubbish at the garbage dump
The spring garden
A reservoir and fountain, raining in Vancouver
The text analysis identifies key semantic indicators that are used to search for related sounds, either locally or online. The SLiCE algorithm optimizes the search by maximizing the combination of query keywords covered by the results. Sounds returned from the search are cut up based on a perceptual model of background and foreground sound. Each classified segment is then run through a predictive model that assigns mood labels to the sound from a two-dimensional affect space. We developed both of these models from human listening experiments aimed at automating this process.
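The keyword-combination criterion can be illustrated with a toy coverage score. This is our own simplified assumption for illustration, not the published SLiCE algorithm.

```python
# Toy illustration of maximizing keyword coverage across search results.
# The scoring and selection here are assumptions, not the SLiCE implementation.
from itertools import combinations

def best_results(results, keywords, n=2):
    """Pick the n tagged results that together cover the most query keywords."""
    best, best_cover = None, -1
    for combo in combinations(results, n):
        covered = set()
        for tags in combo:
            covered |= set(tags) & set(keywords)
        if len(covered) > best_cover:
            best, best_cover = combo, len(covered)
    return best, best_cover

keywords = ["rain", "city", "traffic"]
results = [["rain", "thunder"], ["city", "traffic"], ["birds"]]
combo, cover = best_results(results, keywords)
print(cover)  # 3: the first two results together cover all three keywords
```

Exhaustive enumeration is fine at this toy scale; a real search over online databases would need a greedy or heuristic strategy.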
A mixing engine takes the labelled sound segments and selects, arranges, and mixes them into the final soundscape. The engine creates separate tracks for the semantic groups returned from the search and inserts corresponding sounds onto these tracks based on the overall mood of the mix at a particular time. The volume envelope of the mix is calculated by the control system. The generative results of Audio Metaphor reveal the human-like creative processes of the system, which is used to assist sound designers in game sound, sound for animation, and computational arts.
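The mood-based selection step can be pictured as a nearest-neighbour lookup in the two-dimensional affect space. This is a hypothetical sketch with made-up segment labels; the real engine's selection and arrangement logic is more involved.

```python
# Illustrative sketch: pick the labelled segment closest to the target mood
# in the two-dimensional (valence, arousal) affect space.
import math

def nearest_segment(segments, target_valence, target_arousal):
    """Return the segment with minimum Euclidean distance to the target mood."""
    return min(segments,
               key=lambda s: math.hypot(s["valence"] - target_valence,
                                        s["arousal"] - target_arousal))

segments = [
    {"name": "rain_bg",  "valence": -0.2, "arousal": -0.5},
    {"name": "crowd_fg", "valence":  0.3, "arousal":  0.7},
    {"name": "birds_fg", "valence":  0.8, "arousal":  0.1},
]
# Target mood at this point in the mix: pleasant and calm.
pick = nearest_segment(segments, target_valence=0.7, target_arousal=0.0)
print(pick["name"])  # birds_fg
```

Repeating this lookup along the pleasantness and eventfulness curves yields a different segment choice at each point in time.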

Valence and arousal mixing examples

Try out the different mixing outputs with alternative values of valence and arousal HERE

System Modules

Background/Foreground Classifier

Segmentation and classification are important but time-consuming parts of the process of using soundscape recordings in sound design and research. Background and foreground are general classes referring to a signal’s perceptual attributes, and they are used as criteria by sound designers when segmenting sound files. We established a method for the automatic segmentation of soundscape recordings based on this task.
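A minimal sketch of windowed background/foreground segmentation, assuming a simple amplitude threshold stands in for the perceptual model. The published classifier is built on listener perception data, not on this heuristic.

```python
# Toy background/foreground segmentation: windows whose mean absolute
# amplitude exceeds a threshold are tagged foreground. The threshold
# heuristic is an assumption standing in for the learned perceptual model.

def classify_windows(signal, window=4, threshold=0.5):
    """Label each non-overlapping window of samples as background/foreground."""
    labels = []
    for start in range(0, len(signal) - window + 1, window):
        chunk = signal[start:start + window]
        energy = sum(abs(x) for x in chunk) / window
        labels.append("foreground" if energy > threshold else "background")
    return labels

# Quiet ambience followed by a loud event.
signal = [0.1, -0.1, 0.2, 0.0, 0.9, -0.8, 0.9, -0.9]
print(classify_windows(signal))  # ['background', 'foreground']
```

Adjacent windows with the same label can then be merged into background and foreground segments for the mixing engine.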

Impress: Affect prediction

A soundscape is the sound environment as perceived by a given listener at a given time and place. We developed an automatic soundscape affect recognition system to benefit composers, sound designers, and audio researchers.
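One simple way to picture affect recognition is as a nearest-neighbour regressor over hand-labelled examples in the valence/arousal space. This is an illustrative sketch with invented features and labels, not the Impress model, which is trained on human listening experiments.

```python
# Illustrative nearest-neighbour affect regressor. Features, labels, and
# the k-NN approach here are assumptions for illustration, not Impress.

def predict_affect(features, examples, k=2):
    """Average the (valence, arousal) labels of the k closest examples."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(examples, key=lambda e: dist(e["features"], features))[:k]
    v = sum(e["valence"] for e in nearest) / k
    a = sum(e["arousal"] for e in nearest) / k
    return v, a

# Hand-labelled toy examples; features might be loudness and spectral centroid.
examples = [
    {"features": [0.9, 0.8], "valence": -0.5, "arousal":  0.9},  # e.g. traffic
    {"features": [0.8, 0.9], "valence": -0.4, "arousal":  0.8},
    {"features": [0.1, 0.2], "valence":  0.7, "arousal": -0.6},  # e.g. birdsong
    {"features": [0.2, 0.1], "valence":  0.6, "arousal": -0.5},
]
valence, arousal = predict_affect([0.15, 0.15], examples, k=2)
print(round(valence, 2), round(arousal, 2))  # 0.65 -0.55
```

A quiet, bird-like input lands near the pleasant, low-arousal examples, so the predicted affect is positive valence and negative arousal.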


Publications

Fan, J., Yang, Y-H., Dong, K., Pasquier, P. (2020). A Comparative Study of Western and Chinese Classical Music based on Soundscape Models. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain.
Fan, J., Nichols, E., Tompkins, D., Méndez, A. E. M., Elizalde, B., Pasquier, P. (2020). Multi-label Sound Event Retrieval Using a Deep Learning-based Siamese Structure with a Pairwise Presence Matrix. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain.
Thorogood, M., Fan, J., Pasquier, P. (2019). A Framework for Computer-Assisted Sound Design Systems Supported by Modelling Affective and Perceptual Properties of Soundscapes. Journal of New Music Research.
Fan, J., Thorogood, M., Tatar, K., Pasquier, P. (2018). Quantitative Analysis of the Impact on Perceived Emotion of Soundscape Recordings. Sound and Music Computing (SMC).
Fan, J., Tung, F., Li, W., Pasquier, P. (2018). Soundscape Emotion Recognition via Deep Learning. Sound and Music Computing (SMC).
Fan, J., Thorogood, M., and Pasquier, P. (2017). Emo-Soundscapes - A Dataset for Soundscape Emotion Recognition. Proceedings of the International Conference on Affective Computing and Intelligent Interaction.
Fan, J., Tatar, K., Thorogood, M., and Pasquier, P. (2017). Ranking-based Emotion Recognition for Experimental Music. Proceedings of the International Symposium on Music Information Retrieval, 2017.
Thorogood, M., Fan, J., and Pasquier, P. Soundscape Audio Signal Classification and Segmentation Using Listeners Perception of Background and Foreground Sound. Journal of the Audio Engineering Society. Special Issue (Intelligent Audio Processing, Semantics, and Interaction).
Fan, J., Thorogood, M., Riecke, B. and Pasquier, P. (2016). Automatic Recognition of Eventfulness and Pleasantness of Soundscape. Journal of the Audio Engineering Society. Special Issue (Intelligent Audio Processing, Semantics, and Interaction)
Bizzocchi, J., Eigenfeldt, A., Pasquier, P., Thorogood, M. (2016). Seasons II: a case study in Ambient Video, Generative Art, and Audiovisual Experience. Electronic Literature Organization Conference. British Columbia, Canada.
Bizzocchi, J., Eigenfeldt, A., Thorogood, M., Bizzocchi, J. (2015). Generating Affect: Applying Valence and Arousal values to a unified video, music, and sound generation system. Generative Art Conference, 2015, 308-318.
Thorogood, M., Fan, J., Pasquier, P. (2015). BF-Classifier: Background/Foreground Classification and Segmentation of Soundscape Recordings. In Proceedings of the 10th Audio Mostly Conference, Greece.
Fan, J., Thorogood, M., Riecke, B., Pasquier, P. (2015). Automatic Recognition of Eventfulness and Pleasantness of Soundscape. In Proceedings of the 10th Audio Mostly Conference, Greece.
Eigenfeldt, A., Thorogood, M., Bizzocchi, J., Pasquier, P. (2014). MediaScape: Towards a Video, Music, and Sound Metacreation. Journal of Science and Technology of the Arts 6, 2014.
Eigenfeldt, A., Thorogood, M., Bizzocchi, J., Pasquier, P., Calvert, T. (2014). Video, Music, And Sound Metacreation. xCoAx 2014, Porto, Portugal, 321-333.
Thorogood, M., Pasquier, P. (2013). Computationally Generated Soundscapes with Audio Metaphor. In Proceedings of the 4th International Conference on Computational Creativity, Sydney.
Thorogood, M., Pasquier, P. (2013). Impress: A Machine Learning Approach to Soundscape Affect Classification for a Music Performance Environment. Proceedings of the 13th International Conference on New Interfaces for Musical Expression, Daejeon + Seoul, Korea Republic.
Thorogood, M., Pasquier, P., Eigenfeldt, A. (2012). Audio Metaphor: Audio Information Retrieval for Soundscape Composition. In Proceedings of the 9th Sound and Music Computing Conference, Copenhagen.

Performance and Public Presentations


Miles Thorogood is a creative coding educator at Emily Carr University and an interactive sound artist. Through installation-based artwork, Miles explores the convergence of the human body, environments, and technology. His research at the School of Interactive Arts and Technology, SFU, is directed toward modelling the phenomena of human perception to enable richer computational creativity systems. Contact Miles
Philippe Pasquier is a professor in the School of Interactive Arts and Technology at Simon Fraser University. In his artistic practice, focused primarily on sonic arts, he is interested in studying and exploiting the various relationships and synergies between art, science, and technology. He has acted as a performer, director, composer, musician, producer, and educator in many different contexts. Contact Philippe
Arne Eigenfeldt is a composer of acoustic and electroacoustic music and an active software designer. His music has been performed throughout the world, and his research in intelligent music systems has been published and presented at international conferences. He teaches music and technology at SFU's School for the Contemporary Arts.
Jianyu Fan is a Ph.D. candidate in the Metacreation Lab at Simon Fraser University. His research interests lie in the fields of Affective Computing, Machine Listening, Human-Computer Interaction, and Computational Creativity. He has been a researcher, engineer, and artist in many different contexts.