Undergrads Create AI Speech Model Dia to Challenge Google’s NotebookLM 🚀

Hold onto your hats, folks! 🤯 Toby Kim and his co-founder at Nari Labs in Korea have pulled off something impressive. The two undergrads, with only about three months of speech-AI study behind them, built Dia, a 1.6-billion-parameter model that churns out podcast-style clips, and released it for anyone to use. 🎉

Dia isn't your run-of-the-mill voice generator. Want a speaker to sound like they've just run a marathon, or to throw in a cough for that 'I stayed up too late' vibe? Dia's got you covered: you can shape tone and sprinkle in disfluencies like coughs, sniffs, and laughs. 🗣️ Best of all, you don't need a supercomputer to run it. Most modern PCs with at least 10GB of VRAM will do, and the model is available on Hugging Face and GitHub. Oh, and it can clone voices too. 🎙️
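Want to give it a spin? Here's a minimal sketch of what a generation script might look like. It assumes the Python package from the nari-labs/dia GitHub repo, with a `Dia.from_pretrained` loader and a `generate` method roughly as shown in the project's README; the `[S1]`/`[S2]` speaker tags and parenthesized cues like `(coughs)` and `(laughs)` follow its documented script format, but double-check the repo for the exact API before copy-pasting.

```python
# Minimal sketch: generate a short two-speaker clip with Dia.
# Assumes the package from https://github.com/nari-labs/dia and the
# Dia.from_pretrained / generate API shown in its README; names may differ.
import soundfile as sf          # pip install soundfile
from dia.model import Dia       # installed from the nari-labs/dia repo

# Pull the 1.6B-parameter weights from Hugging Face (roughly 10GB of VRAM needed).
model = Dia.from_pretrained("nari-labs/Dia-1.6B")

# Dialogue script: [S1]/[S2] mark speakers, parenthesized cues add
# nonverbal details like coughs and laughs.
script = (
    "[S1] Did you stay up all night again? (coughs) "
    "[S2] Maybe. The model finally finished training. (laughs)"
)

# Generate the waveform and write it out as a 44.1 kHz WAV file.
audio = model.generate(script)
sf.write("dia_clip.wav", audio, 44100)
```

The repo also ships a hosted demo and voice-cloning examples, so you can poke at the nonverbal cues and cloned voices without writing any code at all.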

Here's the interesting part: Dia was inspired by Google's NotebookLM and its podcast feature, but the team wanted more control over the generated voices and more freedom in the script. Using Google's TPU Research Cloud program, which gives researchers free access to TPU chips, Toby and team trained Dia without breaking the bank. Talk about dreaming big on a budget. 💡

But, and there's always a but, Dia's power comes with a 'handle with care' label. Like other open voice generators, it ships with few safeguards against misuse such as impersonation or disinformation. Nari Labs asks users to play nice, but it can't enforce that. And because the company hasn't disclosed what data Dia was trained on, questions about possible use of copyrighted material remain open. 📚

Looking ahead, Nari plans to extend Dia to more languages and, maybe, turn it into a social platform for synthetic voices. The future? Sounds like a blast. 🌈
