MusicLM is a machine learning model that can turn descriptive text into music, matching written prompts to the sounds they describe. Users can direct the AI to create songs of varying lengths, with parameters describing the tempo, rhythm, and cultural influences they want, whether that’s a game-ready 8-bit soundtrack or a thumping reggaeton anthem. You can even instruct it to add lyrics to the beat, though judging by the samples Google released, it seems to generate little more than gibberish (albeit very on-beat gibberish).
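To make the prompting idea concrete, here’s a rough sketch of how those descriptive parameters could be folded into a single text prompt. The MusicRequest class and build_request function below are illustrative stand-ins, not a released Google API; MusicLM itself isn’t publicly available.

# Hypothetical sketch of a text-to-music request; "MusicRequest" and
# build_request() are illustrative stand-ins, not a real MusicLM API.
from dataclasses import dataclass


@dataclass
class MusicRequest:
    prompt: str                # free-form description of the desired track
    duration_s: int            # requested clip length in seconds
    sample_rate: int = 24_000  # MusicLM outputs 24 kHz audio


def build_request(style: str, tempo: str, mood: str, duration_s: int) -> MusicRequest:
    """Fold structured parameters into one descriptive prompt the model would see."""
    prompt = f"A {mood} {style} with a {tempo} tempo"
    return MusicRequest(prompt=prompt, duration_s=duration_s)


if __name__ == "__main__":
    req = build_request(
        style="game-ready 8-bit soundtrack",
        tempo="fast, driving",
        mood="playful",
        duration_s=30,
    )
    print(req.prompt)  # the text the model would be conditioned on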
If that wasn’t impressive enough, users can also feed it a whistled or hummed audio sample to have it follow the exact melody they want. The AI can also generate music sequentially, letting users craft full songs whose tempo ramps up or down across different sections. It takes all of these variables and seamlessly produces a full 24 kHz audio composition, ranging anywhere from 15 seconds to 5 minutes.
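Here’s a rough sketch of what that melody-plus-story-mode workflow might look like in code. The generate_from_melody function is a hypothetical placeholder that only returns silence of the requested length; MusicLM’s actual interface hasn’t been published.

# Hypothetical sketch of melody conditioning and sequential ("story mode")
# generation; generate_from_melody() is an illustrative stand-in for the
# capabilities described in the MusicLM paper, not a real API.
from typing import List, Tuple

import numpy as np

SAMPLE_RATE = 24_000  # MusicLM produces 24 kHz audio


def generate_from_melody(hummed_audio: np.ndarray,
                         sections: List[Tuple[str, float]]) -> np.ndarray:
    """Placeholder: returns silence with the requested total length.

    In the real system the hummed melody would steer the pitch contour,
    while each (text prompt, duration) pair steers the style of one section.
    """
    total_s = sum(duration for _, duration in sections)
    return np.zeros(int(total_s * SAMPLE_RATE), dtype=np.float32)


if __name__ == "__main__":
    hummed = np.zeros(10 * SAMPLE_RATE, dtype=np.float32)  # stand-in for a hummed clip
    story = [
        ("slow, ambient electronic intro", 15.0),
        ("tempo ramps up into a thumping reggaeton beat", 30.0),
        ("quiet outro with soft synth pads", 15.0),
    ]
    audio = generate_from_melody(hummed, story)
    print(f"{audio.shape[0] / SAMPLE_RATE:.0f} seconds of audio at 24 kHz")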
The research paper accompanying the project says that MusicLM was built atop AudioLM, which can listen to a piece of audio and generate a plausible continuation of it. However, the researchers note that adding a text-driven interface is a much harder undertaking, since it’s difficult to train a model on the complexities of sound using everyday human descriptions. Text-to-music systems also haven’t had the same enormous libraries of captioned samples to learn from that image-based models have, though the gap appears to be closing.
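For readers curious about the plumbing, here is a toy sketch of the idea the paper describes: audio is represented as a sequence of discrete tokens, and the text prompt is turned into an embedding that conditions the token generator. The functions below are deliberately fake stand-ins, not the real MuLan text embedding or the semantic and acoustic token stages.

# Toy sketch of the AudioLM-style pipeline MusicLM extends: audio is modeled
# as a sequence of discrete tokens, and MusicLM adds a text-derived
# conditioning signal in front of that sequence. Everything here is a toy
# stand-in (hash-seeded random numbers, tiny vocabulary), not the published model.
import numpy as np

EMBED_DIM = 16    # toy size of the text-conditioning embedding
VOCAB_SIZE = 64   # toy audio-token vocabulary
SEQ_LEN = 32      # toy number of audio tokens to "generate"


def embed_text(prompt: str) -> np.ndarray:
    """Stand-in for a joint text-audio embedding model (MuLan in the paper)."""
    seed = abs(hash(prompt)) % (2 ** 32)
    return np.random.default_rng(seed).normal(size=EMBED_DIM)


def generate_tokens(conditioning: np.ndarray) -> np.ndarray:
    """Stand-in for the autoregressive stages that map conditioning to
    semantic and then acoustic tokens; here the tokens are simply drawn
    pseudo-randomly from the conditioning embedding."""
    seed = int(abs(conditioning.sum()) * 1_000_003) % (2 ** 32)
    return np.random.default_rng(seed).integers(0, VOCAB_SIZE, size=SEQ_LEN)


if __name__ == "__main__":
    cond = embed_text("a calming violin melody backed by a distorted guitar riff")
    tokens = generate_tokens(cond)
    print(tokens[:8], "...")  # tokens a neural codec would decode into waveform audio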