You might have heard that some pretty significant news has dropped over the past few months on the software side of Vocaloid and voice synthesizers generally. Here I’ll be taking some time to summarize what we know about these developments and briefly consider what implications they might have for Vocaloid and voice synth music.
The first bit of news came late last year when Crypton Future Media announced they would be pulling their voice banks out of future updates to the Vocaloid software. Instead, the “Cryptonloids” will have new voice banks released on Piapro Studio, Crypton’s own voice synth software. A few months later, the first of their new voice banks was announced: Hatsune Miku NT (New Type). Miku NT is still in development, with the prototype slated to release this month, but you can hear a short demo here and another sample from a Pocari Sweat commercial.
NT retains the sound of Miku’s distinct voice, but we’ll have to wait until the software becomes available publicly to see if the switch affects the way producers utilize and tune her. With the huge branding power and recognition the Cryptonloids already have, I doubt CFM will have trouble bringing artists to Piapro Studio, so the program’s success will ultimately rest on its quality. It’s also worth noting that, for the first time, the most well-known voice synth character will not be a Vocaloid. Of course, the Vocaloid voice banks for all of Crypton’s characters can and will still be used, but any future updates will not carry the name. Aside from changing terminology, this doesn’t affect fans all that much, but it’ll probably make explaining the whole “Vocaloid” music scene to the uninitiated just a bit more difficult. This change has more of an effect on the companies, as Crypton will have more control over their software development, and Yamaha will lose some of the most famous characters from the Vocaloid program. They’re both big companies, so there’s little cause for concern about their financial well-being, but it’s interesting to consider how this might affect the development side of their voice synth software.
The second big development was the release of a new free program called NEUTRINO in late February 2020. Neutrino was created by SHACHI and is described as a “neural singing synthesizer”. Most other voice synth programs draw from a voice bank of phonemes that the user can manipulate to create the desired sound. Instead, Neutrino uses a neural network that learns how to tune using complete singing samples from a given voice provider; the user only needs to input the melody and lyrics and the AI does the rest. Hear the results for yourself:
The first thing to note is that, just as the software claims, this voice is incredibly smooth and natural. It seems to easily remove most of the choppiness and robotic intonation that usually take time and attention to work out. That doesn’t mean it’s perfect; you can hear in this “Meltdown” cover that the voice struggles with the high notes at times. Even so, the results are undoubtedly impressive for how simple it is to use. Additionally, a producer has demonstrated that the user can adjust or correct the tuning that the AI creates. They posted an “untuned” version (only using Neutrino) and a “tuned” version of the same song for comparison. The tuned version visualizes where they tweaked the pitch so you can compare them easily:
Neutrino is not the only voice synth software to make use of artificial intelligence. Just last year, Yamaha’s VOCALOID:AI software reproduced the voice of the late Japanese vocalist Hibari Misora with startling success, and in February 2020, the Chinese-language voice synth cloud platform AISingers began adding voice banks. However, VOCALOID:AI does not currently have any commercial products, and AISingers, being Chinese-language software, is not as accessible to the Japanese users who create the majority of Vocaloid-related content. As a result, Neutrino stands out as the most prevalent creative tool for AI voice synth music.
In the month-and-a-half since its release, a huge number of artists, including OSTER project, Kirishima, and ____natural, have already collectively made hundreds of covers and original songs using Neutrino. Usually new voice banks – especially non-Vocaloid voice banks – don’t see such an explosion of content so immediately. This isn’t necessarily mysterious; after all, it’s easy to use, novel, high-quality, and free. What remains to be seen is whether Neutrino will maintain a steady user base and, if so, whether AI-tuned voice synths will begin to represent more of the content we listen to now that the technology has come this far.
Currently Neutrino only has two voice banks: Yoko and Tohoku Kiritan. Yoko’s voice is not well-suited for popular music, so Kiritan clearly sees much more use (far more than her Utau and Voiceroid versions ever have). If more voices are developed in the future, will Neutrino be able to turn its success into a trend? We’ve already seen the potential of VOCALOID:AI, but will Yamaha eventually commercialize a professional-style AI voice synth? On the artists’ side, Neutrino seems to lower the skill barrier for quality synth singing, so it’s possible that we’ll see an influx of new artists using the software. Established producers are already experimenting with the program and showing that they still have a level of control over the tuning, but perhaps some or most will still prefer the phoneme-based software for control, style, or any number of reasons.
Of course, only time will tell. Piapro Studio could ultimately result in no significant changes for Crypton and Vocaloid. Neutrino and neural nets could just be a passing curiosity. Even so, the development of Hatsune Miku NT and the influence of AI voice synths contain a wealth of possibility for the future of Vocaloid music and the artists we love.
-Grindloid
Additional Sources: VNN on Neutrino, VNN on Crypton, Neutrino’s website, Hatsune Miku NT Product Page, VocaDB.net
Judging by this article, the Crypton strategy looks weak. In my opinion, one filter is not enough to attract most producers. And if Crypton does not have more innovative plans then it is unlikely that they will succeed. In addition, all neural networks have already begun to develop.
Regarding NEUTRINO, I think that SHACHI is a subsidiary of a larger holding company. I think we should wait for paid products based on NEUTRINO. It is unlikely that the developer will sit idle while the producers are raking in money.
p.s. Is this the end of Miku’s career?
Thanks for your comment!
It’s worth noting that Piapro Studio has been packaged with all of Crypton’s V3 and V4 voice banks (albeit as a plug-in) and the software is compatible with all versions of VOCALOID except V5, so hopefully they’ve got a good idea of what they’re getting into. Though I do agree that Piapro Studio should display some kind of advantage for users if they’re willing to commit to developing a standalone program. Still, with the sheer scale of Miku’s popularity (not to mention the Cryptonloids as a group), even if Piapro Studio flops, at worst Crypton will just have to return to VOCALOID.
I wasn’t able to find much info about SHACHI, but if I’m reading their Twitter profile correctly, it says NEUTRINO is a personal hobby not affiliated with an organization. I assume that’s why they used Tohoku Kiritan, as her singing samples are free to use. Either way, NEUTRINO has tested the waters for companies that will be releasing paid products, like you said. I wouldn’t be surprised if we see Yamaha or another company unveil a product within the year.
Then I hope SHACHI finds investors. Do you happen to know what license NEUTRINO is distributed under? GNU GPL?
I was considering if it could be acquired (as long as the company isn’t already developing its own program). Of course programs like UTAU exist, but even UTAU takes donations, so I wonder if the developer is making anything from NEUTRINO.
I don’t know what it’s distributed under.
Not having to pay hundreds of dollars for a license every time a sound bank gets updated is a pretty big incentive to stop using Vocaloid, I’d say.
Not hundreds, more like $70~120. It’s greedy Crypton that wants $200.