
Google didn't just go to bat at its I/O developer shindig - it was the bat, swinging hard and knocking out teeth. At the core is Veo3, Google's state-of-the-art video generation model, which builds on its predecessors with enhanced realism, longer sequences, and multimodal inputs. Surrounding Veo3 are ancillary tools like Extend, Flow, and Stitch, each adding layers of functionality. Then there's Google Workspace, which integrates these capabilities into productivity software. Together, they form an ecosystem where creativity is not just augmented but reinvented.
Rory Flynn's post is our entry point, showcasing Veo3's capabilities with a 49-second video of a British rockstar backstage. The prompt is a masterpiece of specificity: "A British rockstar with wild 80s-style hair and a studded leather jacket fumbles with a camera backstage at a rock concert, holding it in close like a snorricam rig." The result is a scene with raw, handheld texture, moody lighting, and dialogue ("Alright, is this thing on...let's do it") that feels plucked from a documentary.
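For the curious, here's roughly what driving Veo from code looks like. This is a minimal sketch assuming the google-genai Python SDK's long-running video endpoint; the model identifier and config fields are assumptions on my part, so check Google's current docs before running it.

```python
# Minimal sketch: generating a Veo clip from a text prompt via the
# google-genai Python SDK. The model name and config fields are
# assumptions -- verify against Google's current documentation.
import time

from google import genai
from google.genai import types

client = genai.Client()  # reads GOOGLE_API_KEY from the environment

# Video generation is long-running, so the SDK returns an operation
# that we poll until the clip is ready.
operation = client.models.generate_videos(
    model="veo-3.0-generate-preview",  # assumed identifier for Veo3
    prompt=(
        "A British rockstar with wild 80s-style hair and a studded "
        "leather jacket fumbles with a camera backstage at a rock "
        "concert, holding it in close like a snorricam rig."
    ),
    config=types.GenerateVideosConfig(number_of_videos=1),
)

while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

# Download the finished clip(s) to disk.
for n, video in enumerate(operation.response.generated_videos):
    client.files.download(file=video.video)
    video.video.save(f"rockstar_{n}.mp4")
```

The point isn't the boilerplate; it's that a 49-second, documentary-grade scene is now a single prompt and a polling loop away.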
But wait, there's more. Starting today in the U.S., Google Labs is introducing a virtual try-on tool for clothing. Imagine this: you see a great shirt, but you're not sure if it's right for you. You upload a picture of yourself, and voilà, the AI generates an image of you wearing that shirt. It uses image-to-image translation models to map the shirt onto your body, adjusting for pose, lighting, and fabric texture. The result? A personalized fitting room in your browser. This is the future of retail, where the shopping experience is as much about simulation as it is about substance.
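Google hasn't published the internals, but a plausible open-source analogue is diffusion inpainting conditioned on a garment image. The sketch below uses Hugging Face diffusers with an IP-Adapter; the checkpoints, file names, and hand-drawn mask are all assumptions, and real try-on systems layer pose estimation and garment warping on top of this.

```python
# Hypothetical try-on sketch: inpaint the torso region of a person
# photo while conditioning the diffusion model on a garment image via
# IP-Adapter. This is NOT Google's pipeline, just one public-tools way
# to approximate "map the shirt onto your body."
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

# IP-Adapter injects image features from the shirt photo as conditioning.
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin"
)
pipe.set_ip_adapter_scale(0.8)

person = load_image("me.jpg")              # your uploaded photo
shirt = load_image("shirt.jpg")            # the product image
torso_mask = load_image("torso_mask.png")  # white where the shirt goes;
                                           # real systems derive this with
                                           # human-parsing / pose models

result = pipe(
    prompt="a person wearing the shirt, natural lighting, photorealistic",
    image=person,
    mask_image=torso_mask,
    ip_adapter_image=shirt,
).images[0]
result.save("tryon.png")
```

Production try-on models go much further (warping the garment, preserving logos and fabric detail), but the basic mapping is the same: person photo in, person-wearing-garment out.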
Now, consider the rockstar from Rory's video. Imagine if the AI could generate not just the scene but also a series of outfit changes for the rockstar, each tailored to the backstage environment. This isn't just about extending his narrative; it's about transforming him from a single moment into a dynamic character with a wardrobe that evolves with the story. Essentially, This Is Spinal Tap turned up to 11.
This is where the ecosystem starts to feel like a cultural shift. The ease of creating such scenes points to a future where professional video production is accessible to anyone with a prompt. And with the addition of virtual fashion, we're also seeing a convergence of media and commerce. The dancer isn't just performing; she's modeling. The rockstar isn't just preparing for a concert; he's a style icon. The implications for advertising, entertainment, and self-expression are profound.
@fofrAI's post adds another dimension. Now stand-ups aren't even required to stand up. They can deliver their perfectly honed material from the comfort of their keyboard via a digital avatar.
(fofr on X: "NO WAY. It did it. And, was that, actually funny? Prompt: a man doing stand up comedy in a small venue tells a joke (include the joke in the dialogue)" https://t.co/LrCiVAp1Bl)
And is this where the magic (or the tragedy) lies? In the example above, the model generates the setup, the punchline, and the audience reaction, all based on patterns in its training data. The humor, if any, is a statistical anomaly, a convergence of probabilities that occasionally lands. Meaning: if all comedy can be reduced to 1s and 0s, are we the real punchline? The rockstar, fumbling with his camera, is a metaphor for this moment. He's trying to capture something real, something authentic, but the AI is already one step ahead, generating not just the scene but the reaction, the context, the cultural significance.