    Raiza Martin on Building AI Applications for Audio – O’Reilly

    By Oliver Chambers, July 11, 2025
    Generative AI in the Real World: Raiza Martin on Building AI Applications for Audio





    Audio is being added to AI everywhere: both in multimodal models that can understand and generate audio and in applications that use audio for input. Now that we can work with spoken language, what does that mean for the applications that we can develop? How do we think about audio interfaces: how will people use them, and what will they want to do? Raiza Martin, who worked on Google’s groundbreaking NotebookLM, joins Ben Lorica to discuss how she thinks about audio and what you can build with it.

    About the Generative AI in the Real World podcast: In 2023, ChatGPT put AI on everyone’s agenda. In 2025, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise.

    Check out other episodes of this podcast on the O’Reilly learning platform.

    Timestamps

    • 0:00: Introduction to Raiza Martin, who cofounded Huxe and formerly led Google’s NotebookLM team. What made you think this was the time to trade the comforts of big tech for a garage startup?
    • 1:01: It was a personal decision for all of us. It was a pleasure to take NotebookLM from an idea to something that resonated so widely. We realized that AI was really blowing up. We didn’t know what it would be like at a startup, but we wanted to try. Seven months down the road, we’re having a great time.
    • 1:54: For the 1% who aren’t familiar with NotebookLM, give a short description.
    • 2:06: It’s basically contextualized intelligence, where you give NotebookLM the sources you care about and NotebookLM stays grounded to those sources. One of our most common use cases was that students would create notebooks and upload their class materials, and it became an expert that you could talk with.
    • 2:43: Here’s a use case for homeowners: put all your user manuals in there.
    • 3:14: We have had a lot of people tell us that they use NotebookLM for Airbnbs. They put all the manuals and instructions in there, and users can talk to it.
    • 3:41: Why do people need a personal daily podcast?
    • 3:57: There are a lot of different ways that I think about building new products. On one hand, there are acute pain points. But Huxe comes from a different angle: What if we could try to build very delightful things? The inputs are a little different. We tried to imagine what the average person’s daily life is like. You wake up, you check your phone, you travel to work; we thought about opportunities to make something more delightful. I think a lot about TikTok. When do I use it? When I’m standing in line. We landed on transit time or commute time. We wanted to do something novel and interesting with that space in time. So one of the first things was creating really personalized audio content. That was the provocation: What do people want to listen to? Even in this short time, we’ve learned a lot about the amount of opportunity.
    • 6:04: Huxe is mobile first, audio first, right? Why audio?
    • 6:45: Coming from our learnings from NotebookLM, you learn fundamentally different things when you change the modality of something. When I go on walks with ChatGPT, I just talk about my day. I noticed that was a very different interaction from when I type things out to ChatGPT. The flip side is less about interaction and more about consumption. Something about the audio format made the types of sources different as well. The sources we uploaded to NotebookLM were different as a result of wanting audio output. By focusing on audio, I think we’ll learn different use cases than the chat use cases. Voice is still largely untapped.
    • 8:24: Even in text, people started exploring other form factors: long articles, bullet points. What kinds of things are available for voice?
    • 8:49: I think of two formats: one passive and one interactive. With passive formats, there are a lot of different things you can create for the user. The things you end up playing with are (1) what is the content about and (2) how flexible is the content? Is it short, long, malleable to user feedback? With interactive content, maybe I’m listening to audio, but I want to interact with it. Maybe I want to join in. Maybe I want my friends to join in. Both of those contexts are new. I think this is what’s going to emerge in the next few years. I think we’ll learn that the types of things we will use audio for are fundamentally different from the things we use chat for.
    • 10:19: What are some of the key lessons to avoid from smart speakers?
    • 10:25: I’ve owned so many of them. And I love them. My primary use for the smart speakers is still a timer. It’s expensive and doesn’t live up to the promise. I just don’t think the technology was ready for what people really wanted to do. It’s hard to think about how that could have worked without AI. Second, one of the most difficult things about audio is that there is no UI. A smart speaker is a physical device. There’s nothing that tells you what to do. So the learning curve is steep. So now you have a user who doesn’t know what they can use the thing for.
    • 12:20: Now it can do so much more. Even without a UI, the user can just try things. But there’s a risk in that it still requires input from the user. How do we think about a system that is so supportive that you don’t have to come up with how to make it work? That’s the challenge from the smart speaker era.
    • 12:56: It’s interesting that you point out the UI. With a chatbot you have to type something. With a smart speaker, people started getting creeped out by surveillance. So, will Huxe surveil me?
    • 13:18: I think there’s something simple about it, which is the wake word. Because smart speakers are triggered by wake words, they are always on. If the user says something, it’s probably picking it up, and it’s probably logged somewhere. With Huxe, we want to be really careful about where we believe consumer readiness is. You want to push a little bit but not too far. If you push too far, people get creeped out.
    • 14:32: For Huxe, you have to turn it on to use it. It’s clunky in some ways, but we can push on that boundary and see if we can push for something that’s more ambiently on. We’re starting to see the emergence of more tools that are always on. There are tools like Granola and Cluely: They’re always on, looking at your screen, transcribing your audio. I’m curious: are we ready for technology like that? In real life, you can probably get the most utility from something that is always on. But whether customers are ready is still TBD.
    • 15:25: So you’re ingesting calendars, email, and other things from the users. What about privacy? What are the steps you’ve taken?
    • 15:48: We’re very privacy focused. I think that comes from building NotebookLM. We wanted to make sure we were very respectful of user data. We didn’t train on any user data; user data stayed private. We’re taking the same approach with Huxe. We use the data you share with Huxe to improve your personal experience. There’s something interesting in creating personal recommendation models that don’t go beyond your usage of the app. It’s a little harder for us to build something good, but it respects privacy, and that’s what it takes to get people to trust.
    • 17:08: Huxe might notice that I have a flight tomorrow and tell me that the flight is delayed. To do so, it has had to contact an external service, which now knows about my flight.
    • 17:26: That’s a good point. I think about building Huxe like this: If I were in your pocket, what would I do? If I saw a calendar that said “Ben has a flight,” I can check that flight without leaking your personal information. I can just look up the flight number. There are a lot of ways you can do something that provides utility but doesn’t leak data to another service. We’re trying to understand things that are much more action oriented. We try to tell you about weather, about traffic; these are things we can do without stepping on user privacy.
    • 18:38: The way you described the system, there’s no social component. But you end up learning things about me. So there is the potential for building a more sophisticated filter bubble. How do you make sure that I’m ingesting things beyond my filter bubble?
    • 19:08: It comes down to what I believe a person should or shouldn’t be consuming. That’s always tricky. We’ve seen what these feeds can do to us. I don’t know the right approach yet. There’s something interesting about “How do I get enough user input so I can give them a better experience?” There’s signal there. I try to think about a user’s feed from the perspective of relevance and less from an editorial perspective. I think the relevance of information is probably enough. We’ll probably test this once we start surfacing more personalized information.
    • 20:42: The other thing that’s really important is surfacing the right controls: I like this; here’s why. I don’t like this; why not? Where you inject tension in the system, where you think the system should push back: that takes a little time to figure out how to do right.
    • 21:01: What about the boundary between giving me content and providing companionship?
    • 21:09: How do we know the difference between an assistant and a companion? Fundamentally the capabilities are the same. I don’t know if the question matters. The user will use it how the user intends to use it. That question matters most in the packaging and the marketing. I talk to people who talk about ChatGPT as their best friend. I talk to others who talk about it as an employee. On a capabilities level, they’re probably the same thing. On a marketing level, they’re different.
    • 22:22: For Huxe, the way I think about this is which set of use cases you prioritize. Beyond a simple conversation, the capabilities will probably start diverging.
    • 22:47: You’re now part of a very small startup. I assume you’re not building your own models; you’re using external models. Walk us through privacy, given that you’re using external models. As that model learns more about me, how much does that model retain over time? To be a really good companion, you can’t be clearing that cache every time I log out.
    • 23:21: That question pertains to where we store data and how it’s handed off. We opt for models that don’t train on the data we send them. The next layer is how we think about continuity. People expect ChatGPT to have knowledge of all the conversations you have.
    • 24:03: To support that you have to build a very durable context layer. But you don’t have to imagine that all of that gets passed to the model. A lot of technical limitations prevent you from doing that anyway. That context is stored at the application layer. We store it, and we try to figure out the right things to pass to the model, passing as little as possible.
    • 25:17: You’re from Google. I know that you measure, measure, measure. What are some of the signals you measure?
    • 25:40: I think about metrics a little differently in the early stages. Metrics in the beginning are nonobvious. You’ll get a lot of trial behavior in the beginning. It’s a little harder to understand the initial user experience from the raw metrics. There are some basic metrics that I care about, such as the rate at which people are able to onboard. But as far as crossing the chasm (I think of product building as a series of chasms that never end), you look for people who really love it, who rave about it; you have to listen to them. And then the people who used the product and hated it. When you listen to them, you discover that they expected it to do something and it didn’t. It let them down. You have to listen to these two groups, and then you can triangulate what the product looks like to the outside world. The thing I’m trying to figure out is less “Is it a success?” but “Is the market ready for it? Is the market ready for something this weird?” In the AI world, the reality is that you’re testing consumer readiness and need, and how they are evolving together. We did this with NotebookLM. When we showed it to students, there was zero time between when they saw it and when they understood it. That’s the first chasm. Can you find people who understand what they think it is and feel strongly about it?
    • 28:45: Now that you’re outside of Google, what would you want the foundation model builders to focus on? What aspects of these models would you like to see improved?
    • 29:20: We share so much feedback with the model providers. I can provide feedback to all the labs, not just Google, and that’s been fun. The universe of things right now is pretty well-known. We haven’t touched the space where we’re pushing for new things yet. We always try to drive down latency. It’s a conversation, so you can interrupt. There’s some basic behavior there that the models can get better at. Things like tool-calling, making it better and parallelizing it with voice model synthesis. Even just the diversity of voices, languages, and accents; that sounds basic, but it’s actually pretty hard. Those top three things are pretty well-known, but it will take us through the rest of the year.
    • 30:48: And narrowing the gap between the cloud model and the on-device model.
    • 30:52: That’s interesting too. Today we’re making a lot of progress on the smaller on-device models, but when you think of supporting an LLM and a voice model on top of it, it actually gets a little bit hairy, where most people would just go back to commercial models.
    • 31:26: What’s one prediction in the consumer AI space that you would make that most people would find surprising?
    • 31:37: A lot of people use AI for companionship, and not in the ways that we imagine. Almost everyone I talk to, the utility is very personal. There are a lot of work use cases. But the growing side of AI is personal. There’s a lot more area for discovery. For example, I use ChatGPT as my running coach. It ingests all of my running data and creates running plans for me. Where would I slot that? It’s not productivity, but it’s not my best friend; it’s just my running coach. More and more people are doing these complicated personal things that are closer to companionship than enterprise use cases.
    • 33:02: You were supposed to say Gemini!
    • 33:04: I love all of the models. I have a use case for all of them. But we all use all the models. I don’t know anyone who only uses one.
    • 33:22: What you’re saying about the nonwork use cases is so true. I come across so many people who treat chatbots as their friends.
    • 33:36: I do it all the time now. Once you start doing it, it’s a lot stickier than the work use cases. I took my dog to get groomed, and they wanted me to upload his rabies vaccine. So I started thinking about how well it’s protected. I opened up ChatGPT, and spent eight minutes talking about rabies. People are becoming more curious, and now there’s an immediate outlet for that curiosity. It’s so much fun. There’s so much opportunity for us to continue to explore that.
    • 34:48: Doesn’t this indicate that these models will get sticky over time? If I talk to Gemini a lot, why would I switch to ChatGPT?
    • 35:04: I agree. We see that now. I like Claude. I like Gemini. But I really like the ChatGPT app. Because the app is a good experience, there’s no reason for me to switch. I’ve talked to ChatGPT so much that there’s no way for me to port my data. There’s data lock-in.
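
The context layer described at 24:03 (store everything at the application layer, pass the model as little as possible) can be illustrated with a small sketch. This is not Huxe's implementation, which the episode does not describe. `ContextStore`, `select`, and `build_prompt` are hypothetical names, and plain word overlap stands in for whatever relevance ranking a real product would use.

```python
from dataclasses import dataclass, field


@dataclass
class ContextStore:
    """Hypothetical application-layer store; an assumption for illustration only."""
    notes: list[str] = field(default_factory=list)

    def remember(self, note: str) -> None:
        # Everything the app learns is kept here, not sent to the model.
        self.notes.append(note)

    def select(self, query: str, max_items: int = 2) -> list[str]:
        """Return at most max_items notes that share words with the query."""
        query_words = set(query.lower().split())
        scored = []
        for note in self.notes:
            overlap = len(query_words & set(note.lower().split()))
            if overlap > 0:
                scored.append((overlap, note))
        # Highest overlap first; the sort is stable, so ties keep insertion order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [note for _, note in scored[:max_items]]


def build_prompt(store: ContextStore, query: str) -> str:
    """Assemble a prompt that carries only the selected notes."""
    selected = store.select(query)
    context = "\n".join(f"- {note}" for note in selected)
    return f"Context:\n{context}\n\nUser: {query}"


store = ContextStore()
store.remember("user runs three times a week")
store.remember("user has a dog named Rex")
store.remember("user prefers morning news briefings")
print(build_prompt(store, "plan my runs for next week"))
```

Only the note about running reaches the prompt here; the notes about the dog and the news briefings stay in the application's store. The design choice being illustrated is that retention and selection live in the app, so the external model sees only a narrow, per-request slice of what is known about the user.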