Synthesia’s Victor Riparbelli on creating an environment to harness AI benefits and reduce harms
A self-proclaimed sci-fi enthusiast, Victor Riparbelli is drawn to the exciting frontiers of technology. His work began with website development and evolved into a passion for company building. Eventually, he found his place in the European startup ecosystem as Synthesia’s Co-Founder and CEO, where he has had a lasting impact.
After a year of exploring VR, AR, and AI startup ideas, he crossed paths with Professor Matthias Niessner. Niessner’s influential research on AI video generation at Stanford captivated Victor. When he saw the research paper for the first time, he knew he wanted to focus on exploring its concepts.
“I just felt like I saw magic. It's rare you get those moments in life. A lot of people had that with ChatGPT where, when you try it, you're mind blown. I had that moment. I saw the technology and realized this is going to change everything we know about media production.” - Victor Riparbelli
Building the Leading AI Text-to-Video Platform
The company they went on to build, Synthesia, is now the leading AI text-to-video platform for the enterprise. We first met Victor Riparbelli and the Synthesia team in 2021, and we’ve watched them push the boundaries of what’s achievable with generative AI.
Even when they launched in 2018, when the term generative AI was relatively unused, Synthesia had embraced the technology, referring to it as synthetic media. Despite their progress, Victor feels they’re only 5% into the roadmap of what’s coming at Synthesia, let alone the AI video ecosystem. There are entire scenes, interactive avatars, and enhanced movement, all to come.
The visionary nature of Victor and his co-founders is clear. They've come remarkably close to predicting precise developments across the AI industry, including the fact that text-to-video would materialize around 2023, and full Hollywood-style filmmaking would be done with AI by 2028 – both of which are well underway.
Victor Riparbelli’s Take on the Future of AI
In conversation with Accel’s Philippe Botteri, Victor explores Synthesia’s journey and breaks down his predictions for the future of artificial intelligence. The discussion extends to offer guidance for founders on product development, fundraising, AI research, and navigating regulatory shifts. Listeners will hear why Victor remains unsurprised by the swift transformations taking place in the landscape.
- 00:00 - Synthesia’s founding story and Victor’s belief in the radical impact of AI on video
- 05:00 - The hard work that went into building a founding team and raising initial funding from Mark Cuban
- 13:00 - Org structures of powerful AI companies; balancing science, research, and development
- 19:40 - Victor’s predictions for the future of video avatars and multimodal intelligence
- 34:00 - How Synthesia’s team built a great GTM engine through virality
- 39:00 - How founders and creators can harness AI’s benefits and reduce its harms
- 42:00 - The challenges and implications of AI regulation and legislation
Explore more episodes from the season:
- Episode 01: AssemblyAI's Dylan Fox on building an AI company during a period of radical change
- Episode 02: Roblox’s Daniel Sturman on building great teams in the AI era
- Episode 03: Ada’s Mike Murchison on how AI is revolutionizing customer service
- Episode 04: Merge’s Shensi Ding on powering the next generation of AI SaaS companies
- Episode 05: Scale AI’s Alexandr Wang on the most powerful technological advancement of our time
- Episode 06: Bard’s Jack Krawczyk on the birth of Google’s AI chatbot and the creative potential that lies ahead
- Episode 07: Synthesia’s Victor Riparbelli on creating an environment to harness AI benefits and reduce harms
- Episode 08: Ironclad's Cai GoGwilt on a decade of anticipating the transformative power of AI
- Episode 09: Checkr’s Daniel Yanisse on tackling bias in people and AI
- Episode 10: Cinder’s Glen Wise on trust and safety threats and holding AI accountable
- Episode 11: Transcend’s Kate Parker on putting data back into the hands of users in an AI-driven world
- Episode 12: Arm’s Rene Haas on building the brain of artificial intelligence
Philippe Botteri (00:22):
Welcome to Spotlight On, I'm your host and I'm here today with Victor Riparbelli, the founder and CEO of Synthesia. Welcome, Victor.
Victor Riparbelli (00:33):
Thank you for having me, Philippe. Excited to be here.
Philippe Botteri (00:35):
Well, it's super exciting to be with you today, Victor. Just to give some context to our listeners, Synthesia is a leading AI text-to-video platform for the enterprise. It's been incredible to watch all the progress that you guys have made, how you've really managed to push the boundaries of what's possible using generative AI in the field of video, and how that is radically changing the field of enterprise communication. But just to get started, Victor, tell us a bit more about the founding story of Synthesia and how you went from skateboarding and listening to punk rock music in Copenhagen to starting one of the leading gen AI companies of your generation.
Victor’s background, interests and entrepreneurial origins
Victor Riparbelli (01:26):
I think my story, my path to starting Synthesia, really started in my early childhood. I had a deep fascination and curiosity about computers and technology from a very early age, more as a tinkerer than as a developer. I loved playing games and ran a huge World of Warcraft guild. When I was not that old, I used to use a voice filter to make my voice sound deeper, which was kind of fun. And so I began building e-commerce stores and websites for local businesses in Copenhagen, where I grew up. It was an interesting point in time, this is almost 15 years ago, but there was a point in time where everybody wanted to have a website and everybody wanted to have an e-commerce store. Everyone thought that it would cost you 30, 40, 50k euros to make a web shop because that's what all the agencies were quoting them.
(02:15): And I remember sitting at home and I was like, I can use Shopify or Squarespace, something like that, and I can actually put up a pretty decent web shop in two days. And so that's exactly what I did. I went out and said, hey, if you pay me 3,000 euros, I'll put up a web shop for you. And so that's kind of how I started my path into turning my interest in technology into a career. So I did that. Then it turned out that when people had these web stores, they also wanted to market them. They wanted them to grow. That led me into online marketing, running AdWords, Facebook ads, a lot of these kinds of very practical things around how you run an online business. And I figured out that I loved it and thought it was really interesting. I learned about startups, which was a different category of working with technology, and moved into the Danish startup ecosystem, where I held a bunch of roles mostly around product and growth, using my, I think, light technical chops, but combining that with an understanding of both the commercial side of building a product and selling that to people.
(03:19): And after having done that for four or five years, something like that, I knew I wanted to make my own company. I knew I loved to build products, but I'd also learned during that time that I wasn't super passionate about building bookkeeping software and business process tools. In my private life, I've always been a huge sci-fi nerd. I love the weird, wonderful, strange frontiers of technology, and I wanted to build something with that. So I decided to move to London because, as great a city as Copenhagen is, it's not really the place to build frontier technology. And to make a very long story very short, I spent 12 months in London working for myself along with a professor in machine learning. We were doing some consultancy work for the UK government and for some big companies, mostly centered around VR and AR. This is right around the time when the Oculus Rift came out.
(04:05): But I basically spent that year just getting super deep into the technology underpinning all of this, and a lot of that was very much around AI. And I met my now co-founder, Professor Matthias Niessner. He had done some of the early research in the space of AI video generation when he was at Stanford. And when I saw his research paper for the first time, I just felt like I saw magic. I think it's rare you get those moments in life. I think a lot of people had that recently with something like ChatGPT, where when you try it the first time, you're just mind blown that this is even possible. And I had that moment. I saw the technology and I was just like, this is going to change everything we know about media production. It was an idea that was captivating and interesting enough for me that I felt like I wanted to spend some years of my life exploring it. And that's actually how we initially built the thesis. Then it was about getting a team together. It was very hard, but we managed to do it. We managed to raise the first round of funding from Mark Cuban.
Finding co-founders, cold-emailing Mark Cuban and raising the first funding round
Philippe Botteri (04:59):
Tell us a bit more about that, because that was quite an extraordinary story. How did you manage to get that first funding? It was a lot of hard work and some luck.
Victor Riparbelli (05:10):
So this is back in, I mean, AI had sort of a moment also in 2017, 2018. There was a lot of VC money going into the space, but most VCs wanted to invest in PhDs as the CEOs, the people who were the deep technical experts who were going to drive this field forward. At this point in time, I was 25 years old and I had a decent track record, but I didn't have Google on my CV. I wasn't an AI PhD. So I think to a lot of people they were a bit like, why would you start this company? What makes you qualified to do that? And maybe there wasn't that much that made me qualified to do it, to be completely honest. But we persevered through that and managed to first convince Professor Matthias Niessner and Professor Lourdes Agapito, who are the two technical parts of the founding team, and also my co-founder, Steffen, to join me.
(05:59): And I think once we had the professors on board, we were like, okay, now we have the technical folks. The VCs will love it. Everybody's talking about AI, and we have a great big vision for how we can build a fantastic company. And I think it was just one of those things where, at the time, it sounded completely crazy that you could use AI to generate video content. I don't think people thought it was a great founding team because we were two 25-year-olds. I mean, Steffen had spent three years in Africa working in private equity, running sand mines and chicken farms, before starting Synthesia. So we were kind of like an odd couple in that regard. So we went around to all the VCs in Europe. I think we got turned down by almost a hundred VCs, until Steffen one day, out of just pure hustle, sent a cold email to Mark Cuban.
(06:42): He found his email in a Sony hack that had happened a few years before and got leaked, and Mark Cuban responded back immediately saying he thought it was very interesting. He knew everything about the technology, which was very rare. None of the investors we met at this point in time had any idea what generative AI was about. And then we had a 12-hour conversation, all over email, with Mark Cuban. He would ask us questions, we'd respond back to him, and he ended up saying that he'd do a million dollars if the due diligence checked out. And that is actually how we got started. And I think my lesson there was that Mark Cuban already had the same thesis we had. We didn't have to convince him that this was going to happen. We didn't have to convince him that it was going to be a great business.
(07:25): He was really just evaluating our ideas and the founding team. And maybe he didn't go deep enough to figure out that maybe we weren't, on paper, the right ones to do it, or he just thought we had the right answers. But I think the lesson for us there was definitely that it's much easier to work with people who already share your thesis and are more looking to evaluate how you're going to get from A to B, and the founding team. Whereas at that point in time, with most investors, we had to start at "generative AI is a thing."
The state and perception of generative AI in 2017
Philippe Botteri (07:54):
It wasn't really called generative AI at the time, the term was coined more recently. In 2017 it was just machine learning, that's how it was. There was a bit of AI and machine learning. But when you started this, I mean, did you have a sense of how fast the technology would evolve, and how fast you would achieve the milestones that you have achieved in terms of the technology and the realism of the video that you're producing today?
Victor Riparbelli (08:27):
Well, so we actually didn't call it generative AI back in 2018. I actually found a bunch of my old presentations last week that I used to run around with at AI talks here in London, and we decided to call it synthetic media, which turned out to have been maybe a mistake. Now generative AI is the term everyone uses. But actually, I think a lot of the technology, in a very early stage, was there back then. And I think the way we think about generative AI today actually was how a lot of people thought about it, including us, back then. Of course, we couldn't predict exactly how all this would pan out, but if you look at our seed deck, which I actually found a couple of weeks ago, we predicted that text-to-video, kind of like what we have today in our product, would be around 2023.
(09:11): And that full Hollywood-style filmmaking, made only with AI, was going to be around 2028. So I think we probably were maybe a bit too pessimistic, actually. We ended up having text-to-video avatars in 2020, and I think by 2025 or 2026, we'll see a movie that has been made 80%, 90% by AI and 10% by filming things with a camera. So I think we had the vision and the timelines somewhat right. The thing that we had to learn, and which was a very rough path, was how do we go from, okay, we think this is going to happen in the world, to how do we sequence a business to get to that really big opportunity down the line?
Why Synthesia is video-first
Philippe Botteri (09:55):
So was it obvious to you from the beginning that video would be a much better means of communication and learning than text, and that the first use case would be, well, let's turn text into video, because then the content is absorbed much better by the people looking at it? Is it something that was really a core part of the foundation of the company when you started, or is that something that you discovered as the technology was evolving?
Victor Riparbelli (10:30):
It was definitely something we discovered as we evolved. I think we obviously knew that video was what most of us preferred to watch when we started the company, but our initial product that we took to market, which covered more or less the first three years of the company, was all about working with existing video production professionals. So we were selling this AI solution where you could go in and give us an advertisement, for example, and then we would take that advertisement and produce it in 10 different languages by changing the voice track, but then also reanimating the face, so it looked like it was originally recorded in French or Italian, let's say. And during that period we were both doing that and just talking to as many people as we could, to build our first-principles understanding of what video is, why people make video, and what people use video for.
(11:20): The really big learning that we made was that what we thought of as video at the time was what most people think of as video. If everyone who's listening to this right now closes their eyes and thinks of video content, you'll see a cool series you saw on Netflix, a well-produced YouTube video, a nice advertisement on TV or something like that. But this is actually 0.00007% of the video content that gets produced, right? This is the absolute best video content that gets produced. And there's a whole different genre of video, which is the 99% of video content. That's a lot of the stuff: videos that have 17 views on YouTube. And what we learned was that these people who were doing the 99% of video content, who don't have a lot of budget and aren't necessarily people who are great at storytelling, this group of people, their house was really on fire.
(12:12): They all really wanted to make video, but it's so hard for them. They didn't have the budget to do it, they don't know where to start, and it's just really hard to do. And so what we learned during this process of selling to these actual video production professionals was that actually the much more interesting market is these billions of people who will be very, very happy with a video that is slightly lower quality than what you would shoot with a camera, if it's a thousand times easier and a thousand times more affordable to make that video. And that's how we got to the insight that led us to the product which we're known for today, which is Synthesia Studio: it's like a web app, you go in, pay $30, and you just start making videos immediately. And the learning there was that for these people, a lot of this content is not what people think of as amazing content. It's very functional video. And the reason people make very functional video is because information retention is way, way, way higher than it would be with text. That's what led our mental model to shift from building technology to make it easier to make video, as most people think of video, to asking: can we take the world's text and turn it into video? That would have a huge benefit for all these people that we work with.
The unique org-structure challenges of running an AI company
Philippe Botteri (13:25):
So that's amazing technology. I mean, maybe a few years from now we'll be able to do this podcast using Synthesia's technology. But tell us a bit more about the science behind Synthesia. I think one thing that is quite particular about a generative AI company is that it has a research department. If you look at most startups, they have an engineering department and a product department, so you're managing engineers and product managers. But when you have an AI company like this, suddenly you have research, engineering, and product, which of course adds a lot of complexity to the management, and working with researchers is not the same as working with engineers and product managers. So tell us a bit more about how you thought about this from the start. How much of the science and the research did you want to have internally, versus relying on proprietary or open source models?
Victor Riparbelli (14:31):
I think for us, we try to be very, very, very intentional about what kinds of models we want to be the best in the world at, and then focusing only on those, and using off-the-shelf open source or third-party providers for the rest. For us, that first part is all about digital humans and digital voices. It's not just a new technology, it's a new type of company. I think in the same way that if you go back to tech companies or IT companies in the nineties, who generally sold software that would arrive on a CD-ROM or on-prem to a big corporate or big enterprise, if you just look at the org structure of those kinds of companies, they look very, very different from what a SaaS company does. You need different types of functions, you need different ways of operating. They're just two different types of companies, even though both of them sell software.
(15:19): And I think AI is the next natural evolution of that. And one of the big things, as you mentioned, is that the R&D department all of a sudden is not just a satellite team sitting somewhere that you're just hoping will build some cool technology that maybe we'll implement in the product. Everything centers around the technology we're building in the AI team, and that means that for us, we have a big R&D team who does everything to do with the AI models, and we have a product engineering team who's building the product that sits around it, everything like the more traditional kind of SaaS stuff. And these are just two very different teams. They think very differently. I think it's really important as a founder to really deeply understand both of those. For most founders, I think the natural thing will be to understand SaaS.
(16:05): That's if you come from SaaS or have been in the technology sector for the last, whatever, 10, 20 years. But research teams are different. It's very hard to set timelines for a research team because they're doing science, and sometimes science can take a long time. And especially with building new neural networks, maybe you stumble upon the right network the first time and the problem is solved in two weeks. It rarely happens, but technically that could be possible, right? On the other hand, it can also be like, we think it might take six months, but maybe it'll take nine, maybe it'll take 12. It's very hard to plan ahead because you're doing science. The people who work in R&D also care about different things than engineers. Engineers who work in SaaS companies generally have more of a move-fast-break-things, ship-things, always-deploy mindset. There's a whole culture around how you do engineering in a startup. Whereas in the world of research, a lot of people are more motivated by writing research papers, for example, and by citations.
The balance between publishing research and preserving trade secrets
Philippe Botteri (17:03):
That's a good point on the research papers. So as you say, researchers want to publish at some point, but if what you're doing with these algorithms is proprietary, you probably don't want them to publish. So how do you manage that? Do you allow your researchers to publish? If so, how do you constrain their publishing? And how do you balance that with building a moat and preventing your competitors from having access to the technology that you're building?
Victor Riparbelli (17:38):
So we have done two research papers. I think it is really difficult. There are a couple of things to take into account. One is that if you hire the best researchers in the world, this is something that's important to them. I think for some people it's almost more important than the compensation part, actually, right? Because a lot of people who become scientists and academics do it because they want to be a part of the scientific community. They want to build on top of what other people have done, and thus this idea of being open around your research is really important. So to attract a lot of those people, you kind of need to have an element of this in it. Then there's the other side of this, which is that if you actually are very open about it, you can also have the community help better your models, help better the things that you're doing.
(18:24): Of course, it comes with the risk of that work being out in the open, so your competitors might also be able to do it. And then there's a much softer thing, which is that I think a lot of founders, myself included, think it's generally good to contribute to the scientific community. For us, I think we do it, but we're not going to publish everything that we do. I think we've seen a shift in AI companies over the last 12 to 18 months, where people are closing more and more off unless they're specifically an open source company. And for a company like us, it's much more applied. It's going to be more about building products than doing research papers as we progress, but I still hope we can share some of the learnings that we make, and I think it also helps build the profile of the company, helps you attract the best people, and so on.
The importance of combining generative voice and video technologies
Philippe Botteri (19:16):
Yeah, that makes sense. And so we talked about the video, but there's also the voice angle, right? So do you see these as two separate fields of research, or do you think that the combination of video and voice at the same time is very powerful? How do you think about that from a research and algorithm standpoint?
Victor Riparbelli (19:41):
If you go back to before this large-model, LLM, foundation-model platform shift that we've seen over the last two years, something like that, right? Then a lot of what was then called generative AI was these very narrow problems that you could solve with algorithms. For example, draw me an image of a face, or understand how exactly my mouth moves when I talk, and we can use that as a component of a bigger system. Now, what we are seeing with these bigger and bigger models is that increasingly, instead of having to solve all the individual parts one by one, we can actually build these systems that are more general by nature. If you just feed them enough data, it sort of works. So the way we look at this is that even though they are separate technologies, ultimately what we are trying to do is enable our users to make a really awesome video that feels lifelike, where you have control over it.
(20:30): And that is a combination of the video, the voice, the script writing, and a bunch of other things. So we think that these models are increasingly going to be joined together. They might still have some element of separation to them, but increasingly all these things will have to play together and be multimodal by nature to produce the outcomes that you really want to see. And I think that's something everyone is seeing. OpenAI just launched their multimodal GPT-4 with the ability to speak to you. It can understand what you're saying to it, it can understand images. It just feels very natural that all these things ultimately will move more towards being some sort of intelligence that you can use to create content, rather than a specific technology for making text or voice or image or whatever.
Creating 3D data to train better generative video
Philippe Botteri (21:15): And so for all these algorithms that you're building, I mean, a lot of the success of the technology is based on the data that you use to train them. And one of the things in particular that you've built is this super nice studio in East London with a hundred cameras. So tell us a bit more about where the idea for the studio came from and why it was so important for your research.
Victor Riparbelli (21:42):
If you look at where the technology is today and where it needs to go, we feel like we're only 5% into the roadmap. There's still so much to do before you can make a full, rich scene with avatars interacting with each other, talking to each other, picking up objects, sitting in chairs like what we're doing right now. And the first iteration of these technologies, as impressive as they are, is kind of fairly dumb. We've moved past that, but what most of the companies in this space do is take a video, loop it in a smart way, and then just change the lip movements. But it's kind of a hackish way of actually doing it. And we believe that to really get this to a point where you can create much more rich and interesting scenes, we need to move into a world where there's some level of 3D understanding and some kind of control structure that can guide how these networks synthesize video scenes.
(22:37): And to do this, you need to build models that have a really solid world understanding. You want to have models that really understand in detail how a human speaks, how they move, how they articulate. And we think that just getting this from 2D data, as in a normal video, is not going to be enough. These models need a really, really deep understanding of the human, as a species almost. And so the way we are approaching this is that we think these large models are really awesome, and they have already made a huge difference, but to really make them work incredibly well, we need smaller volumes of data, but really, really high quality data. And so one of the things we have done is build a studio in London where we capture people in almost a Hollywood visual-effects type of setup, with lots of different angles and lots of different lights. That's going to help us build this dataset, which is eventually going to turn into a kind of foundational model for how humans look, move, and speak. So that's a big part of how we see the next step: not just a modular improvement in making the avatars a little bit more emotive and expressive, but the true platform shift in being able to create these videos in a controlled manner, where you can do much more interesting things.
Philippe Botteri (23:54):
I think what you're saying is a very important point: the avatars that you're producing are not existing videos where you just change the lip movements to make a video say something else. You're fully generating the full video, the full persona, which is I think a very, very important distinction.
Victor Riparbelli (24:14):
And that's something that's going to be in the product hopefully by the end of 2024. And even that is going to be a big, big shift, because we're going to take the avatars from, I mean, today they're great, but you can generally tell that they're an avatar. They have a little bit of a robotic quality to them, and that has turned out to be very good for a bunch of use cases.
Philippe Botteri (24:36):
People who have seen my avatar say it looks real, but pre-media training. I think that's a good way to describe where the V3 is, but the V4 is coming.
Victor Riparbelli (24:47):
The V4 is coming, and the V4 is going to be a huge step up in terms of realism. And I think with this update, we might be able to push through and actually generate these still speaking-to-the-camera type of videos where the avatars can have emotions. If you give them a sad script, they'll kind of be sad and they'll say it in a sad way. If you give them a sales script, they'll be much more excited. They'll perform it much more like an actual actor would. Whereas today it's still stuck a little bit in this slightly more contained, robotic type of performance. So that's the next short-term thing that's going to make a huge difference, I think. And then this idea of moving into real worlds and scenes, that's probably going to be something more for next year.
Rate of advancement and the current boundaries of large models
Philippe Botteri (25:33):
Nice. And where do you think we are? One thing which was very impressive is that when I look at the version of the avatars you were doing 18 months ago versus what you released six months ago, there was a huge difference. And when I start looking at some of the videos of the V4 that's going to come out, hopefully in the coming months, there's another step function in improvement. Where do you think we are on the S-curve today? Do you think that the very steep acceleration we're seeing is going to continue for another two, three, four, five years, or do you think at some point it's going to be more incremental in terms of performance?
Victor Riparbelli (26:14):
The last big platform change really was LLMs and the idea of these very, very, very large models that are able to perform with almost superhuman capabilities. And I think there's still a lot to go there. But if you take something like image generation, I think we're starting to see where this technology caps out to some extent. So take a company like Midjourney, an absolutely fantastic company, really pushing the boundaries in terms of what you can do with image generation. If you look at where that was even just 12 months ago versus where it is today, it's now much more controllable, it's super high quality, you can do a lot of things with it. But there are still a lot of fundamental problems. It's still really hard to edit these images. It's still essentially kind of like a stochastic machine: you give it a prompt and you hope that the right output is going to be there.
(27:06): You're not able to edit it in a very granular way. I think that's still what these models struggle with. It's a bit the same as how we don't really know how to make LLMs not hallucinate, for example. And for video, I think we're still a bit earlier. You are seeing some companies doing cool things with large video models, but I think the ceiling for this generation of technology will be that you can probably produce fairly realistic-looking video, but it'll be very difficult to control it and very difficult to make it consistent. And that means it'll be great for creatives who're okay with prompting 50 times to get something that kind of works and doing some fun editing around it, but it won't be a system that's super reliable, super controllable, and that will enable someone with a vision in their mind to turn that one-to-one into reality.
(27:55): So I think that's where large models are going to cap out. But having been in this space for almost seven years, every one and a half to two years we do get one of these big step changes. Now the big question is: how do we unlock the controllable and the reliable? I'm sure we'll see a platform shift there. I kind of hope it's not going to be on the scale of what LLMs have done to the space, but I'm sure we're going to see some really interesting stuff. There's actually a general thing, which is interesting about running an AI company for a long period of time, which is that you constantly have to balance exploitation versus exploration. Exploitation being: okay, we have these technologies, how do we make them incrementally better with what we see and know to be true?
(28:40): It's less risky and it's more like engineering: fine-tuning these models, all the stuff people are doing to make them better, but they'll still be constrained by the overall model to some extent. That's where, most of the time, you can build features and build a product that's marginally better month by month for your customers. And then there's the other part, the exploration, which is: okay, what do we think is going to be the next really big step change, the one that makes all these small incremental improvements we've spent a year on just disappear? And it's really dangerous, really hard to navigate, because I think if you get two platform changes wrong, you're basically dead. But if you have a product that stands still for too long because you're only doing the science part and the exploration, then you might end up with unhappy customers. So finding the balance between those two is really, really difficult. And if you look at the history of AI startups over the last six or seven years, a lot of them have died because they missed these platform changes and someone else came along with a product made by four people in a basement that's just significantly better than whatever came before.
Philippe Botteri (29:50):
But if you get them right, the potential is massive.
Victor’s platform change thesis and Synthesia’s approach to enterprise lead generation
Victor Riparbelli (29:53):
So my current thesis is that if you get three platform changes in a row right, you get escape velocity, more or less. It's going to be very, very difficult to compete with companies that have gotten it right three times. In terms of the go-to-market, it was very much about looking at those users, starting off, of course, by talking to them and driving them into our funnel. We didn't follow the usual playbook of "we're great, we're targeting the enterprise, these people have a big problem, let's do enterprise marketing." What we figured out was: we're an early-stage startup, we don't have that much resource, and we have a product that has this viral, jaw-dropping wow factor. How do we harness that to build a great GTM engine? And what we figured out was that the best way to drive enterprise leads was not by cold calling people, because when you do that, you have to start with: what's an AI video?
(30:45): Why would I do this? This is a deepfake, not sure my boss would like this. No, I don't want to talk to you anymore. What we did instead was figure out that if we can just get enough eyeballs, enough top-of-funnel, then among all those people, some will just naturally know that they have this problem, and they'll want to talk to us. So we went hard on TikTok, just getting people to come to Synthesia and make a free demo video. Once people made a free demo video, we knew that on average they would share it with three other people. A lot of those videos were just people making a funny video for their mom or their partner or their friend, not business videos, but all those people who did that also have real jobs. That's how we landed a lot of Fortune 100 companies: either because they saw some TikTok themselves, or because their kids made an AI video after seeing a TikTok and shared it with their family. So on the front end we built this very consumerized sort of marketing engine, but that ultimately helped us get into the enterprise.
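Victor's numbers sketch a simple viral loop: each free demo video is shared with roughly three more people, some fraction of whom make their own video. As a purely illustrative model (the three-share figure is from the conversation; the conversion rate and number of cycles below are made-up assumptions), the loop's compounding behavior can be sketched like this:

```python
def viral_reach(seed_users, k_factor=3.0, conversion=0.1, cycles=5):
    """Illustrative viral-loop model.

    Each creator's demo video is seen by ~k_factor people, of whom a
    fraction `conversion` go on to make their own video. Returns the
    total number of creators after the given number of sharing cycles.
    """
    total = seed_users
    new_creators = seed_users
    for _ in range(cycles):
        # viewers reached this cycle who become creators themselves
        new_creators = new_creators * k_factor * conversion
        total += new_creators
    return total

# With shares=3 and 10% conversion, each cycle multiplies new creators
# by 0.3, so growth decays toward a ceiling; if shares * conversion
# exceeds 1, the loop compounds instead of fading out.
print(viral_reach(1000))
```

The point of the sketch is the threshold: top-of-funnel volume pays off even when the loop fades, because every cycle still deposits potential enterprise leads into the funnel.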
Philippe Botteri (31:46):
Yeah, it was very interesting to see. I mean, you have customers paying, as you said, $30 a month, and then on the other end you have customers paying six-figure, or high six-figure, deals per year. So that's a huge difference. How do you manage to get the two sales motions, bottom-up and top-down, coexisting and synergistic?
Victor Riparbelli (32:09):
Yeah, I think especially as you get to a certain scale, it does become more difficult to make those two motions work really well. But for us, we know that fundamentally the best way to sell this, no matter who we're selling it to, is to get it in front of people and have them make a video themselves. That could be, as I said before, a video that's more just for fun, to try out the product. And instead of focusing on huge upfront sales and long sales cycles, it's much more about just getting the product in front of the potential customer and then showing value to them. And from there, they'll upgrade.
Three key tips and takeaways for AI entrepreneurs
Philippe Botteri (32:45):
So, I mean, we've talked about the technology, we've talked about the go-to-market. Now, wrapping this all together: if you had three pieces of advice for an entrepreneur starting an AI company, what would they be?
Victor Riparbelli (33:01):
I think one thing that's always good is to look where the spotlight isn't yet. Now everybody's talking about AI, it's a hot space, everybody wants to invest in AI, and there are still lots of great companies to be started. But if you're about to start foundational model company number 125, maybe that train has already left the platform, and maybe you need to look at other things that are interesting in AI right now. I'm very interested in companies that just use AI on the backend to fulfill some other business need much faster. For example, I was speaking to a company that figured out a way to make manga, anime-style cartoons. What they figured out was that this is an industry dominated by very few companies in Japan, because it's a super specific skill to actually be able to draw this kind of manga, and it's a massive business.
(33:57): Like Naruto, which is the biggest franchise in manga, I think does $12 billion of revenue every single year. What they figured out was: if we can train an AI system to just help us make these drawings, with human storytellers still telling the stories but the backend actually producing the images that make up the cartoon, that could be pretty huge. And they did that, and they've now launched, I think, three or four different new IPs, and one of them is going really, really well. I think it's a great example of how you don't have to tell anyone you're an AI company; you're just taking the technology and exploiting it to do something you couldn't do before.
(34:29): So that'd be my first thing. My second thing would be: if you're building an AI product, the first obvious idea is rarely the good one to go for. When ChatGPT came out, everyone said, I'm going to build a company that does customer service via an LLM, for example. It's the most obvious idea for most business people as soon as they try ChatGPT: what if you could just chat with a customer support agent like this? And it is a good idea. There's just a bunch of incumbents who already have the entire platform that sits around delivering customer service via a chat interface, and they're going to do this as well, especially with this type of technology that's really easy to work with. So try to look for areas that are much less obvious.
(35:10): And I often think a good way of thinking about this is: what would be really hard for an incumbent to just build as a feature in three or six months? That often means building products that weren't really possible before. A loom, for example, is a tool that makes something a human being could already do by hand faster and much easier. Whereas something like a crane is a fundamentally new thing: no one human could ever lift a concrete pillar and build a skyscraper by themselves. It provides something fundamentally beyond what a human can do. I think that's a good framework for thinking about AI products. They need to be built AI-first, not just an existing product with a little bit of AI sprinkled on.
Rights, regulation and harm mitigation in generative AI
Philippe Botteri (35:57):
So now we're nearly at a point where the technology is going to be so realistic that you won't be able to tell the difference between a synthetic video and a real video. What are the implications in your mind in terms of regulation? I mean, there have been a lot of efforts in the US, but also in the EU with the AI Act that is being worked on right now. Where do you think regulation should go?
Victor Riparbelli (36:37):
Well, first and foremost, I think the reason you want to have regulation is because these technologies can definitely create harm. There's no doubt about that. It's something that's always been incredibly important to us. We founded the company on an ethical framework around consent, control, and collaboration, and we do a lot of work to make sure our technology isn't misused. We want to create the right environment where we can still harness all the benefits of these technologies but also reduce the harms. I'm not a legislator, and I don't think I have the exact framework, but there are a few concepts that I think are good to follow. The first is that it's very difficult to be prescriptive in the regulation, to be very granular and very detailed, because this is a space that's evolving so quickly.
(37:23): We've seen with the EU AI Act, for example, how they spent many years putting together the first draft, and basically right before they were about to put it into action, the generative AI moment happened. A lot of those very prescriptive rules, which almost went down to saying these types of algorithms should be regulated in these different ways, all of a sudden wouldn't really cover a lot of the stuff around genAI. So I think a smart approach is to be a little on the back foot about being very prescriptive. That's actually something the UK is doing pretty well. But that doesn't mean, of course, that you shouldn't do anything. I think there should be a general responsibility for companies to ensure that technology isn't used for harm. Again, being very prescriptive around that is difficult, because every company is very different and nobody knows how the space is going to play out.
(38:16): But I think governments should take the approach of putting responsibility on companies to make sure that the technology is safe. So that's one question around regulation, and it covers a lot of what people are discussing right now: things like bias, and how much you should constrain these models, which can output very general things, so that they don't output harmful things. The other side of the discussion, which I think is very interesting, is around training data and copyright, which has been a hot topic. There's a lot of litigation going on in the US right now, and here there are sort of two camps. On one hand, there are some of the AI companies, who train very large models, especially around text, who have scraped a lot of data from the internet that they don't necessarily have permission to use and managed to build these absolutely incredible systems.
(39:08): And then on the other hand you have rights holders who feel like their content has been stolen, that it's been used to train these AI algorithms and models. So you have these two camps. And then the question is: do you regulate what you can train the models on, the input to the models? That would be, for example, saying you're not allowed to train on any copyrighted material; you can only train models on content you have explicit permission to use. Or do you regulate the output of the models? That would be saying: if your model can spit out copyrighted content, like a Mickey Mouse cartoon for example, then we should regulate that the models cannot spit that out, but it's okay that they were trained on Mickey Mouse cartoons.
Philippe Botteri (39:50):
We all read a lot of books when we were at school and university, but that doesn't mean that everything we do today, which of course to some extent draws on all the learning we had in those fundamental years, is infringement. To what extent are we using copyrighted material in whatever we do every day? I think it's the same thing for AI: you can train your model on a lot of books, but the question is how you control the output so that it builds on content without plagiarizing.
Victor Riparbelli (40:29):
Right, exactly. And I think drawing that line is going to be really interesting, and it's a really hard question. I can understand both sides of the discussion. But as you said, if the two of us sit down and listen to 500 songs from a particular band or whatever, and then we write our own song, which is definitely inspired by our listening session of those 500 songs, is that then an infringement? How much residue of the original content does there need to be? And that's a subject that's already playing out a lot today. You have lots of litigation in music, for example: a new pop song comes out, becomes really popular, and then you'll have some people, you might call them trolls, who literally just sit and look through old catalogs of music and say, that exact chord progression, I did that in 1987, I'm going to sue you because you stole my work, and you're going to pay me 5% of all the royalties. And this also happens, of course, with visual content and many other things. So it's really interesting to see where all that lands.
What Victor’s most excited about in AI today
Philippe Botteri (41:31):
Yeah, I think we'll need some regulation, but as you say, it needs to be done in a smart way. The point you're making about legislation being able to evolve and adapt as the technology evolves is a super important one. So if we leave the world of video for a moment and focus on AI more broadly, what are the things that excite you most about AI today?
Victor Riparbelli (42:04):
We've taught these models some kind of world understanding, which translates into being able to produce almost any image you can think of just via a natural-language prompt interface. It's just incredible what these models can spit out in any modality. So we've seen a glimpse of how capable these models are going to be. The thing I'm really excited about is that the next generation will be about putting much more control structure into these models, so that we can get them to do what we actually want them to do, and they'll be less of a random machine that you try to nudge a bit to the left, a bit to the right, and it still misses the center. Because right now they are, by nature, very general and pretty random in their output.
(42:52): And I think the next generation of technology will give us much more granular control over what they output, and that's going to enable a whole new set of business use cases. From the tech side of things, this hopefully means we can get rid of most hallucinations. If we can get rid of hallucinations, I think no one will disagree that the potential for LLMs in the enterprise is just amazing, but it still has that roadblock to get over. And if you think of something like image or video, it'll mean, I think, that you can get to a point where you can use these tools to get what's actually in your mind out onto the screen, as opposed to feeding the AI an idea, having it interpret it, and getting back something that's kind of random but in the same direction. I think that's going to be really amazing once we get those control structures in place.
Philippe Botteri (43:44):
Well, thank you very much, Victor. It was great to have you with us today, and good luck to Synthesia.
Victor Riparbelli (43:51):
Thank you so much. Thanks for having me.