You might have heard that some pretty significant news has dropped over the past few months on the software side of Vocaloid and voice synthesizers generally. Here I’ll be taking some time to summarize what we know about these developments and briefly consider what implications they might have for Vocaloid and voice synth music.
The first bit of news came late last year when Crypton Future Media announced they would be pulling their voice banks out of future updates to the Vocaloid software. Instead, the “Cryptonloids” will have new voice banks released on Piapro Studio, Crypton’s own voice synth software. A few months later, the first of their new voice banks was announced: Hatsune Miku NT (New Type). Miku NT is still in development, with the prototype slated to release this month, but you can hear a short demo here and another sample from a Pocari Sweat commercial.
NT retains the sound of Miku’s distinct voice, but we’ll have to wait until the software becomes available publicly to see if the switch affects the way producers utilize and tune her. With the huge branding power and recognition the Cryptonloids already have, I doubt CFM will have trouble bringing artists to Piapro Studio, so the program’s success will ultimately rest on its quality. It’s also worth noting that, for the first time, the most well-known voice synth character will not be a Vocaloid. Of course, the Vocaloid voice banks for all of Crypton’s characters can and will still be used, but any future updates will not carry the name. Aside from changing terminology, this doesn’t affect fans all that much, but it’ll probably make explaining the whole “Vocaloid” music scene to the uninitiated just a bit more difficult. This change has more of an effect on the companies, as Crypton will have more control over their software development, and Yamaha will lose some of the most famous characters from the Vocaloid program. They’re both big companies, so there’s little cause for concern about their financial well-being, but it’s interesting to consider how this might affect the development side of their voice synth software.
The second big development was the release of NEUTRINO, a new piece of free software, in late February 2020. Neutrino was created by SHACHI and is described as a “neural singing synthesizer”. Most other voice synth programs draw from a voice bank of phonemes that the user can manipulate to create the desired sound. Instead, Neutrino uses a neural network that learns how to tune using complete singing samples from a given voice provider; the user only needs to input the melody and lyrics, and the AI does the rest. Hear the results for yourself:
The first thing to note is that, just as the software claims, this voice is incredibly smooth and natural. It seems to easily remove most of the choppiness and robotic intonation that usually take time and attention to work out. That doesn’t mean it’s perfect; you can hear in this “Meltdown” cover that the voice struggles with the high notes at times. Even so, the results are undoubtedly impressive for how simple it is to use. Additionally, a producer has demonstrated that the user can adjust or correct the tuning that the AI creates. They posted an “untuned” version (only using Neutrino) and a “tuned” version of the same song for comparison. The tuned version visualizes where they tweaked the pitch so you can compare them easily:
Neutrino is not the only voice synth software to make use of artificial intelligence. Just last year, Yamaha’s VOCALOID:AI software reproduced the voice of the late Japanese vocalist Hibari Misora with startling success, and in February 2020, the Chinese-language voice synth cloud platform AISingers began adding voice banks. However, VOCALOID:AI does not currently have any commercial products, and AISingers, being Chinese-language software, is not as accessible to the Japanese users who create the majority of Vocaloid-related content. As a result, Neutrino stands out as the most prevalent creative tool for AI voice synth music.
In the month-and-a-half since its release, a huge number of artists, including OSTER project, Kirishima, and ____natural, have already collectively made hundreds of covers and original songs using Neutrino. Usually new voice banks – especially non-Vocaloid voice banks – don’t see such an explosion of content so immediately. This isn’t necessarily mysterious; after all, it’s easy to use, novel, high-quality, and free. What remains to be seen is whether Neutrino will maintain a steady user base, and, if so, whether AI-tuned voice synths will begin to represent more of the content we listen to now that the technology has come this far.
Currently Neutrino only has two voice banks: Yoko and Touhoku Kiritan. Yoko’s voice is not well-suited for popular music, so Kiritan clearly sees much more use (far more than her Utau and Voiceroid versions ever have). If more voices are developed in the future, will Neutrino be able to turn its success into a trend? We’ve already seen the potential of VOCALOID:AI, but will Yamaha eventually commercialize a professional-style AI voice synth? On the artists’ side, Neutrino seems to lower the barrier to entry for quality synth singing, so it’s possible that we’ll see an influx of new artists using the software. Established producers are already experimenting with the program and showing that they still have a level of control over the tuning, but perhaps some or most will still prefer the phoneme-based software for control, style, or any number of reasons.
Of course, only time will tell. Piapro Studio could ultimately result in no significant changes for Crypton and Vocaloid. Neutrino and neural nets could just be a passing curiosity. Even so, the development of Hatsune Miku NT and the influence of AI voice synths contain a wealth of possibility for the future of Vocaloid music and the artists we love.