Singing with machines
a research proposal for a hybrid (human/machine) system of instant composition
What is the smallest unit of music? How can a system make musical choices in an environment of random live sounds? Can we define the variables that are essential to music making without limiting the music that can emerge?
For eight years now, these questions have been explored in Amsterdam, not by a group of computer scientists, but by an unusual ensemble of live musicians: The Genetic Choir.
This vocal ensemble approaches composition from the idea that vocal music can be reduced to elements of sound in a dynamic process of large numbers of agents (i.e. singers) making composition choices together, uninhibited by style or genre. Currently, with a project that will run until 2017, they are introducing their work to the realm of computational sound analysis and evolutionary composition to see how their approach can inspire creative music systems. This is a report on the first stages of the “Loop-Copy-Mutate” project – a collaboration between singers, electronic musicians and sound programmers to create a system for instant composition utilizing a hybrid of online user involvement, new approaches to sound analysis and live improvisers.
The research that is being conducted within this project is organised along three strands:
- On the basis of currently available sound analysis software, the project intends to develop a library system that can analyse musical patterns and interact with the broad range of Genetic Choir input, which can include any conceivable human sounds as basic elements for vocal compositions. A preliminary investigation of research in Music Information Retrieval (MIR) has shown that most of the literature refers to work with ‘lab-conditioned’ or computer-generated sounds, as opposed to live, improvised music. A first stage of development in SuperCollider has resulted in possible algorithms for analysing and sorting a broad range of live vocal sounds, as long as the samples are not too long or outlandish. More development time will be required to analyse longer phrases and discern elements of rhythm, melody lines, and extended voice techniques.
- By defining the principles that inform the Genetic Choir’s instant composition approach, and translating these into rules and definable variables, we intend to build a software program that can make musical decisions based on the same principles.
- By developing a user-friendly smartphone app that allows for composition processes that are partly user-driven and partly driven by the Genetic Choir algorithms for treating sound input (select, copy, mutate), we want to explore how the choir’s multi-agent approach to instant composition can be expanded into the realm of the internet, involving potentially endless numbers of contributors.
As describing the full scope of this project would go beyond the bounds of this article, the following report focuses on presenting the Genetic Choir approach to instant composition and the general challenges of bringing this approach to a digital environment. The first part describes the attitudes of listening and decision making, and the processing of musical ideas that are used in the multi-agent organism of the Genetic Choir. The second part describes the strategies for translating these essential musical skills to software that can process sounds in the same way.
1. The Genetic Choir approach
1.1. Live singers in a complex gene-pool of sounds
The Genetic Choir was formed by Thomas Johannsen in 2007 and has evolved over the years into a diverse group of vocal improvisers investigating the dynamics of open systems, self-organization and complexity. The ensemble employs a collective process in vocal instant composition, operating from the assumption that vocal music can be reduced to elements and aspects of sound and treated as a process of copying, mutation and selection. In order not to lose the subtleties of human song and sounds, the operators of this process are real people, and the basic elements of the music can be described technically (e.g. pitch, volume, note duration) as well as musically (e.g. timbre of a voice, timing of a phrase, emotional color of a chord). In Genetic Choir compositions, the musical material is either random input from an outside source (like the sounds of the environment where the singers perform), or random input from the singers themselves (sounds and musical phrases thrown into the sound gene-pool on the spot). The fitness function that Genetic Choir singers apply – as the actors in the mutation/selection process – is that a musical element needs to be relevant to them and have meaning in order to be copied and kept in the reproduction process. Imperfect copying, i.e. mutation, leads to a dynamic that pushes the composition forward.
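As a rough illustration of the copy/mutate/select cycle described above, the following Python sketch runs the cycle on a small pool of sound units. It is a toy model, not the choir's method: the parameter names (`pitch_hz`, `volume`, `duration_s`), the numeric `relevance` function and the pool size are all assumptions made for this example. In the choir, relevance is a human judgement that no single number captures.

```python
import random


def mutate(unit, rng, spread=0.05):
    """Imperfect copy: nudge one randomly chosen parameter by a few percent."""
    copy = dict(unit)
    key = rng.choice(sorted(copy))
    copy[key] *= 1.0 + rng.uniform(-spread, spread)
    return copy


def evolve(pool, relevance, rounds, rng):
    """Run copy/mutate/select rounds on a pool of sound units.

    Each round, every unit is copied imperfectly; the most relevant
    originals and copies survive, so the pool size stays fixed.
    """
    size = len(pool)
    for _ in range(rounds):
        candidates = pool + [mutate(unit, rng) for unit in pool]
        candidates.sort(key=relevance, reverse=True)
        pool = candidates[:size]
    return pool


def relevance(unit):
    """Hypothetical stand-in for 'this sound engages me': closeness to 220 Hz."""
    return -abs(unit["pitch_hz"] - 220.0)


rng = random.Random(1)
pool = [
    {"pitch_hz": rng.uniform(100.0, 400.0), "volume": 0.5, "duration_s": 1.0}
    for _ in range(8)
]
result = evolve(pool, relevance, rounds=20, rng=rng)
```

Because originals compete with their copies, the best unit in the pool never gets worse from one round to the next. A real system would replace `relevance` with whatever notion of engagement it can actually measure.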
1.2. Decision making – personal and impersonal
A defining issue of this ensemble’s musical practice is the varying extent to which they allow decision processes to be intuitive and personal or to keep them as detached and impersonal as possible. Evolutionary selection in nature is purely impersonal, and this is one of the choir’s main inspirations, as is the often cited metaphor of a flock of birds or a school of fish making their movement decisions together. This essentially impersonal attitude is crucial in order to make the composition process independent from particular tastes and individual inhibitions about music, so that truly any type of music can appear.
At the same time, the choir has found that taking a completely impersonal stance at all times can limit the range of musical possibilities as well: the compositions that emerge from such a strict attitude are mostly contemporary, abstract sound carpets or minimalist compositions, and seldom produce melodic or rhythmic versatility or emotionally charged music. In order to make the latter possible as well, the singers train themselves to not be completely indifferent to the musical elements they handle. While operating within a complex sound gene-pool, the singers are trained to answer such questions as: “Which aspect of which sound touches you? Can you reproduce the sound, without losing that emotional aspect?”
In this way, the composition process of the choir can unfold via abstract sound elements as well as personally ‘charged’ elements. It is an ongoing process for the choir to find the right balance between the impersonal copy/mutate decision processes and the imaginative and emotional aspects of music, which can only be discerned through personal engagement.
1.3. Units of music – small and big
The basic working principle of the choir is to identify the smallest units of music that can engage a singer and disregard higher-order musical structures as much as possible. The system stays vital and dynamic because the definition of ‘smallest units’ is not set in stone. Singers decide each moment how to define the sound unit and sound element that piques their interest, which informs their decision as to which sounds are copied and reproduced. At one moment, the element that engages a singer might be the timbre/colour of a sound they hear, which can be discerned and copied in a matter of half a second. At other times, the unit of sound that is being copied is several seconds long, due to a certain progression of sounds and silences (a musical phrase) that catches the attention of the singer.
A similar balance has to be negotiated here as with the personal/impersonal decision making mentioned above: if singers were bound to only copy very small units of sound, this would limit the type of music that could appear. If, on the other hand, they only listen to and copy higher-order structures like chord progressions and riffs, the musical composition would become predictable in a different sense, never escaping these elements of a particular musical style. As the ambition of the choir is to enact a system that can allow any style of music to appear, fine-tuning these elements and finding just the right balance for optimal, open musical interaction must remain a key feature of their training.
1.4. Conscious choices vs. unconscious ones
Another notable training principle of the choir is that the singers limit their conscious decision making without losing precision in listening and reacting to sound inputs. This means that the decision process to copy/mutate a certain sound should be left as much as possible to an immediate response without deliberation. The result is an organism of singers (agents) that can make composition choices together, in the blink of an eye.
On another level, conscious observation is not ignored as part of the system. After all, we are human beings and deliberation cannot be stopped entirely. However, the way the singers use conscious awareness of themselves and the composition process is a main component of the choir’s training. As an example, the choir has identified certain ruts in which an improvising group of singers can frequently find themselves. While these are natural group tendencies, they can interfere with keeping the composition process open and unpredictable. For example: the way a stable rhythmic pattern can be impossible to change without a conscious choice of some kind; the way a certain acceleration in a rhythmic pattern almost always moves into a rapid climax and then falls apart; or the way a group of singers keeps reducing their general volume more and more when sounds become more tentative and vulnerable. Conscious observation is therefore very useful to counteract these identified group tendencies, which are mostly social patterns rather than part of a musical dynamic. The method the Genetic Choir employs in these instances is to allow the singers to make a conscious choice that goes against the current momentum. ‘Helicoptering’ (obtaining an overview of the composition) is therefore allowed occasionally, as long as the singer does not stay in that mode, but instead returns to the mode of listening/reacting after taking a small action intended to nudge the current group tendency in another direction. If the singers are trained to recognise these moments, the ‘nudge’ will be enough to tilt the composition in an unexpected direction. Accepting that this may not happen, or that the resulting direction may be completely different from a singer’s intention, is another crucial behavioural aspect for the system to work.
Continuous analysis of the composition while constantly making choices from ‘helicopter view’ will result in singers fighting for the ‘right route’ to take, and the whole system will break down.
To summarise, one can say that the Genetic Choir is on one level actively looking for ways to diminish human tendencies in the behaviour of their singers in order to create a system that can make composition decisions as quickly as a flock of birds, and in any musical direction that is conceivable. On another level, human intuition, emotion and engagement remain in the system in order to create the right dynamic for not only random, but also heartfelt music to emerge. The singers use the question ‘what engages/moves you in the music?’ on the level of smaller units of sound and use awareness of higher-order patterns to counteract socially informed ‘ruts’ of group behaviour. This training keeps their multi-agent approach of evolutionary composition focussed on the musical possibilities, open and unpredictable.
For listening to examples of the various types of music the Genetic Choir organism produces, follow this link: http://genetic-choir.org/recordings-compositions/
After this brief presentation of some of the aspects of Genetic Choir composing, the second part of this article will now address the challenge of letting a computer system engage with the kind of processes described above.
2. Vocal Instant Composition in the digital realm
In an expert meeting held at STEIM in Amsterdam in October 2015, the Genetic Choir discussed several routes to accomplish their ambition with researchers from the University of Amsterdam (NL), Utrecht and Amsterdam High School of the Arts (NL) and Plymouth University (UK).
The following describes the key issues and the strategic choices that are being made for the development process of the upcoming Loop-Copy-Mutate project.
2.1. Sound Analysis / Computer Listening
For a computer to interact with human sound in a musical way, it needs to understand it on the basis of musical variables. But with the ambition of the Genetic Choir to be open to any type of music, the computer shouldn’t have any bias towards a certain musical tradition. At first glance, this ambition will make the exercise of teaching a computer to analyse music rather complex: any musical style in the world would need to be taken into account, all the definitions of the endless numbers of musical traditions. But on second glance, the approach of the Genetic Choir could inspire a new way of looking at this problem and circumvent some of its implications. Genetic Choir singers do not need to have knowledge of any musical system in order to become part of the choir. They train to be familiar with musical principles like tonality/atonality, harmony/dissonance, silence, timing and layering – but without attaching them to a particular musical system. For the rest, only improvisation principles are extensively looked at, which means that in a way singers are training to be ‘unknowing agents’ who will react to sounds with a certain engagement and attitude, but unbiased – and partly oblivious – of musical style. There is no need for a Genetic Choir singer to understand that they are part of a 5/8 rhythm, as long as they are able to copy and engage with it.
So to build a computer system that can understand music in a Genetic Choir way, the computer merely needs to be able to recognise patterns. It does not need to know what the correct name for that pattern is in any of the human invented systems to categorise music.
A pattern can be as small as a certain note frequency, or broader, e.g. a spectral analysis of one human sound, or even as broad as a rhythmic pattern in a phrase of several seconds. To emulate the Genetic Choir approach to vocal instant composition, the software needs to be able to recognise a pattern and copy it (keep it in the loop), but then also make a decision in which direction to mutate the chosen sound unit, which we will come to later.
But first, the question is: how does the software decide which piece of sound to view as a ‘unit’ in which it will look for a pattern? As we are talking about live performance, the computer should be able to react to an ongoing musical performance of several singers and, just like the Genetic Choir singers, pick out a unit of sound and replicate/mutate it. We can simplify the problem by giving each singer a separate microphone, so the computer does not have to deal with discerning several layers of sound. But we are not satisfied if the software randomly chooses to view a piece of sound between two time markers as a ‘unit’. The system needs to be aware of basic musical principles, like sound and silence, and it needs to have a basic understanding of musical units, like a musical phrase. Again, it does not need to be able to recognise the music; it only needs to have an understanding of what types of elements can define a ‘unit’ in a musical sense. A single sound can always be seen as a unit, so if the sounds are separated by silences, the unit boundaries are easy to quantify. If the music is tonal, pitch changes can also be used as separators. For other types of music, a significant change of any sound parameter might be sufficient to define a unit. As with the organism of the Genetic Choir, there do not need to be clear-cut definitions for the system to work, as long as decisions are – musically speaking – not totally random.
The same goes for the dynamic of sound units of several lengths in the system: the system works only when not just very short sounds are considered a unit, but also prolonged sounds and phrases of several seconds.
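The simplest of the unit definitions above, sounds separated by silences, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the input is a per-frame loudness envelope (a list of numbers) from one singer's microphone, and the function name, `threshold` and `min_silence` values are hypothetical choices for this example, not settings from the project.

```python
def split_on_silence(envelope, threshold=0.05, min_silence=3):
    """Return (start, end) frame ranges of sounding units; end is exclusive.

    A unit ends only after at least `min_silence` consecutive quiet frames,
    so a short dip inside a sound does not split it in two.
    """
    units = []
    start = None    # first frame of the unit currently being tracked
    quiet_run = 0   # consecutive quiet frames seen inside that unit
    for i, level in enumerate(envelope):
        if level >= threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence:
                # Close the unit at the first frame of the quiet run.
                units.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0
    if start is not None:
        # The recording ended while a unit was still open.
        units.append((start, len(envelope) - quiet_run))
    return units


envelope = [0, 0, 0.5, 0.6, 0.02, 0.5, 0, 0, 0, 0.3, 0.3, 0]
units = split_on_silence(envelope)  # [(2, 6), (9, 11)]
```

Raising `min_silence` yields longer, phrase-like units, while lowering it yields single sounds, which is one way to let units of several lengths coexist. The same loop could watch pitch or another parameter for a significant change instead of a drop in level.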
The strategy we intend to employ in order to teach a computer to recognise longer phrases is to build a sound library that holds a large number of vocal sound samples of varying length and content. When we want the software to take an interest in longer phrases, it could record, for example, 20 seconds of live sound and look for patterns in this window of time that are similar to other phrase-length samples in the library. We hope that on the basis of this comparison, the software can decide which part of the 20 second recording to view as a ‘musical phrase’, so that it can select it and incorporate that specific phrase in the musical reproduction cycle.
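The library comparison described above could be sketched as a sliding-window search. The Python example below is an assumption-laden simplification: both the recording and the library phrases are taken to be sequences of one numeric feature per frame (for instance a pitch contour), similarity is plain mean squared distance, and the function name `best_phrase_match` is invented for this illustration.

```python
def best_phrase_match(recording, library):
    """Find the span of `recording` closest to any library phrase.

    Returns (start, end, library_index, distance) with end exclusive,
    or None if no library phrase fits inside the recording.
    """
    best = None
    for index, phrase in enumerate(library):
        length = len(phrase)
        if length == 0:
            continue
        # Slide the phrase over every position where it fits completely.
        for start in range(len(recording) - length + 1):
            window = recording[start:start + length]
            distance = sum((a - b) ** 2 for a, b in zip(window, phrase)) / length
            if best is None or distance < best[3]:
                best = (start, start + length, index, distance)
    return best


recording = [0, 0, 1, 2, 3, 0, 0]
library = [[1, 2, 3], [5, 5]]
match = best_phrase_match(recording, library)  # (2, 5, 0, 0.0)
```

A practical version would need richer features and a comparison that tolerates transposition and tempo changes, since a sung phrase rarely matches a stored sample frame by frame.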
2.2. Sound Mutation – In which direction to mutate?
Another challenge is to make the software mutation process of the sound samples not completely random, but musical in a way that approaches the instant composition process of the choir. The mutation that happens in Genetic Choir compositions is partly random (singers make mistakes or copy inaccurately, which is then adopted by other singers), but this all happens within the parameters of the human voice. If sound software is taught to change any aspect of a sound sample randomly, mutation cycles quickly turn the voice sample into beeps or utter distortion. So how to keep the mutation cycles musical? As we do not want to limit the type of music that can appear, distorted sounds are fine, but the question is again one of achieving the right balance.
Let us ask first: what exactly constitutes a mutation? Here too, there is a range, from barely noticeable micro-shifts in the pitch of a long tone to very noticeable sudden stops that occur because a part of the sound-DNA has been lost (such as when a rhythmic phrase is copied and a few of the sounds that constitute the rhythm are replaced by silence in the next mutation cycle). In both examples, much of the original sound material stays intact. Similar to the problem of deciding which ‘unit of sound’ to select from a longer series of sounds, the problem in this instance is how to determine which parts of the sound unit remain the same, while changing only one (either subtle or very audible) aspect of that same unit.
For this challenge as well, a comparison with sound samples in a library seems to be the most fruitful option. Through pattern comparison, the software can place the recorded sound sample in a field of similar sound samples and decide (randomly) which direction to take in changing one aspect of the sound unit that will bring it closer to a slightly different sound unit in the same field. The idea is that by changing only one aspect that differs in the context of other recorded sound samples, the overall ‘shape’ of the sound unit as a vocal sample will stay intact, while still allowing the copy/mutation process to happen in a randomised manner.
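One possible reading of this idea is sketched below in Python. Each sound unit is reduced to a few features, a nearby library sample is picked at random, and exactly one differing feature is moved part of the way towards it. The feature names, the assumption that all features are normalised to a comparable 0 to 1 range, and the parameters `k` and `step` are hypothetical choices for this example.

```python
import random


def distance(a, b):
    """Euclidean distance between two units with the same feature keys."""
    return sum((a[key] - b[key]) ** 2 for key in a) ** 0.5


def mutate_towards_neighbour(unit, library, rng, k=3, step=0.5):
    """Change one feature of `unit` towards a randomly chosen near neighbour."""
    neighbours = sorted(
        (sample for sample in library if sample != unit),
        key=lambda sample: distance(unit, sample),
    )[:k]
    if not neighbours:
        return dict(unit)  # nothing to move towards; return a plain copy
    target = rng.choice(neighbours)
    # Only features that actually differ are candidates for mutation.
    differing = [key for key in sorted(unit) if unit[key] != target[key]]
    key = rng.choice(differing)
    mutated = dict(unit)
    mutated[key] = unit[key] + step * (target[key] - unit[key])
    return mutated


unit = {"pitch": 0.5, "loudness": 0.5, "duration": 0.5}
library = [
    {"pitch": 0.6, "loudness": 0.5, "duration": 0.5},
    {"pitch": 0.5, "loudness": 0.3, "duration": 0.5},
    {"pitch": 0.9, "loudness": 0.9, "duration": 0.9},
]
mutated = mutate_towards_neighbour(unit, library, random.Random(0), k=2)
```

Because the mutated value always lies between the unit and an existing library sample, repeated mutation stays inside the region of sounds the library already contains, which is one way to avoid drifting into beeps or pure distortion.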
2.3. Behavioural Algorithms
Another option discussed for mapping the Genetic Choir technique into the digital realm was to have a computer listen to a large number of Genetic Choir compositions (with separate channels for each singer’s contribution) and find patterns in the evolving interaction. Perhaps, by listening to the way each Genetic Choir singer makes choices to copy or mutate, support or counterpoint the present musical material, the computer can distill certain behavioural algorithms, so as to emulate the behaviour of a Genetic Choir singer when fed a sound sample. It is a complex endeavour, but if it worked, the computer could run an artificial process with any number of agents (limited only by processing power) that treat musical samples in a Genetic Choir approach. The result would be a computer composing music all by itself, without us having to teach it (or even understand in detail) the rules it acts upon.
This is particularly attractive as a long-range goal because there is one aspect that we have left out of the considerations until now: how to emulate the ability of the Genetic Choir singers to make choices on the basis of their engagement with the sound material. In other words: Genetic Choir singers copy sound material that attracts them, moves them or otherwise piques their curiosity. For the quality of the compositions that the Genetic Choir can produce, this seems to be a crucial feature.
Perhaps this ability can be approached through the use of a large enough sample library, in the sense that recognition of a certain pattern will function as the ‘attractor’ for the software. But as there will be many patterns in the live music that will be known to the library, the question remains which patterns will be particularly attractive for the software to select.
Genetic Choir singers make these choices on the basis of personal intuition. Can we imagine an equivalent for musical intuition in computer programming? Perhaps we do not have to figure this out by ourselves, but have the software learn directly through observation of a human musical organism such as the Genetic Choir.
2.4. Crowdsourcing composition choices
As the scope of the project does not allow for all the aforementioned considerations to be fully investigated, the current research and development work underway for 2016 will focus on creating a working sound library and sound analysis software that can identify patterns in short and longer musical phrases in the manner described above.
This software will be tested in hybrid performances between Genetic Choir singers and electronic musicians who make use of that basic software.
Instead of addressing the challenge of creating behavioural software that can make musical composition choices all by itself, the current project will examine another option to add other musical agents to the Genetic Choir organism: a smartphone/tablet app will be developed that allows users to play with the sounds of the Genetic Choir sound library and create musical phrases and mini-compositions themselves. In the planned concert series starting at the end of 2016, Genetic Choir phrases that are sung in the concert will be directly uploaded into a certain section of the library, which online users can select, copy and mutate following their own musical intuition via the composing app, and upload those new phrases into the library again, from where they will be included in the ongoing live concert.
3. Summary and follow-up ambitions
The pilot phase of this project was a feasibility study to see whether it makes sense to pursue our ambition. As the initial results have indicated a need for a follow-up project, extra funding is currently being sought for an expected project start of August 2016. Improving computer listening for live vocal inputs and devising a musical way to categorise the input will be our first goal. We then expect to have a solid foundation for the development of composition software that emulates the Genetic Choir approach. With the experience of building and running an app that uses Genetic Choir principles to give human users a number of options to mutate and compose vocal music, we hope to gain an understanding of the musical parameters that can be implemented to create a satisfying composing environment involving both random and user-driven processes.
In a follow-up project we expect to involve the ‘behavioural algorithm approach’ to create a computer system that can effectively treat and develop live music samples into new kinds of compositions, and interact with the Genetic Choir as an independent organism.
We are glad to connect with other initiatives that are dealing with any of the issues described in this article in order to share experiences and expertise. Please address any enquiries to the Genetic Choir via firstname.lastname@example.org .
The pilot stage of the Loop-Copy-Mutate project has been kindly supported by the Creative Industries Fund NL.
Expert Meeting at STEIM: Tijs Ham (STEIM); Robert van Heumen; Jorrit Tamminga and Roald van Dillewijn (HKU); Carlos Vaquero (UvA); and Marcelo Gimenes (University of Plymouth).
The pilot involved software development research carried out by Stelios Manousakis and live singing / electronic sampling research sessions carried out by Robert van Heumen and Genetic Choir singers Petra Pieck, Martine van Ditzhuyzen, Marjolijn Roeleveld, Jeannette Huizinga, Ralph de Rijke, Meagan Hughes, Anita Kooij and Thomas Johannsen
Programming team for the upcoming project: Stelios Manousakis, Marco Pieck and Niels Bogaards (Elephant Candy)
Project Management: Meagan Hughes
Artistic Direction: Thomas Johannsen