OpenAI Sora: A Closer Look!

119,003 views

Two Minute Papers

2 months ago

❤️ Check out Lambda here and sign up for their GPU Cloud: lambdalabs.com/papers
📝 Sora:
openai.com/research/video-gen...
📝 My paper on latent space material synthesis is available here:
users.cg.tuwien.ac.at/zsolnai...
📝 The other material synthesis paper is available here:
users.cg.tuwien.ac.at/zsolnai...
📝 My paper on simulations that look almost like reality is available for free here:
rdcu.be/cWPfD
Or this is the orig. Nature Physics link with clickable citations:
www.nature.com/articles/s4156...
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Alex Balfanz, Alex Haro, B Shang, Benji Rabhan, Bret Brizzee, Gaston Ingaramo, Gordon Child, Jace O'Brien, John Le, Kyle Davis, Lukas Biewald, Martin, Michael Albrecht, Michael Tedder, Owen Skarpness, Richard Putra Iskandar, Richard Sundvall, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: / twominutepapers
Thumbnail background design: Felícia Zsolnai-Fehér - felicia.hu
Károly Zsolnai-Fehér's research works: cg.tuwien.ac.at/~zsolnai/
Twitter: / twominutepapers
#openai #sora

COMMENTS: 400
@Dark_Brandon_2024
@Dark_Brandon_2024 2 місяці тому
What a day to be alive! 🤯
@catakuri6678
@catakuri6678 2 місяці тому
i bet early humans said the same thing after discovering fire
@jr-fu6gj
@jr-fu6gj 2 місяці тому
American election cycles and tech. The non rich are doomed. #misinformation
@therealOXOC
@therealOXOC 2 місяці тому
@@catakuri6678 they said ehhhh ehhh ehhhh ehhh ehhhh ehhh ehhh ehhhh
@UtraVioletDreams
@UtraVioletDreams 2 місяці тому
@Dark_Brandon_2024 "What a day to be alive! " A day as any day. BUT. Enjoy, be happy and positive! Be alive!
@danielvarga_p
@danielvarga_p 2 місяці тому
EXACTLY!
@MerthanE
@MerthanE 2 місяці тому
I remember writing 2 years ago that in the next 10 years we'll have video generation and text/script generation so good it could bring back cancelled series. Didn't expect the possibility to come so soon. There's barely anything holding it back anymore
@bat-amgalanbat-erdene2621
@bat-amgalanbat-erdene2621 2 місяці тому
Looks like we're on schedule. But this probably won't happen, due to copyright issues.
@randomworld-ik7tk
@randomworld-ik7tk 2 місяці тому
It would be personal playlists of series then
@lobabobloblaw
@lobabobloblaw 2 місяці тому
Oh, there's quite a bit holding it back, and it's all human, and it's pretty greedy as well. Let's hope those barriers are easier to work with.
@ArvilVonPoney
@ArvilVonPoney 2 місяці тому
It's easy to get giddy in anticipation without seeing the limitations of new technology. As a reminder, this is meant to provoke hype as OpenAI is attempting to raise funding for related ventures. Don't believe me? People have been talking about having personal humanoid robots next year for 15 years beneath Boston Dynamics demo reels. The same people were amazed at having a full audio interface to their phone via Siri not long ago. How about 3D screens so good we won't see they are there? How about the VR revolution in the 80s? We keep being fed an endless stream of hyped-up tech promises. It's cool that it exists, but so far "AI"'s impact on my world has been limited to shitty formulaic websites spamming Google results, bot spam in social media comments, indirection when contacting hotlines whenever I have a problem with a product, and clickbaity YouTube thumbnails. Not really world shattering.
@Earthwirm
@Earthwirm 2 місяці тому
@@randomworld-ik7tk like a black mirror episode
@thesilver7238
@thesilver7238 2 місяці тому
0:40 tram casually crushing a dozen pedestrians
@MilkGlue-xg5vj
@MilkGlue-xg5vj 2 місяці тому
"...And they all feel completely natural to me..."
@Mansin
@Mansin 2 місяці тому
@@MilkGlue-xg5vj caught on that too
@projo8460
@projo8460 2 місяці тому
hahahaha! They move away from the tram like the pedestrians from the ps1 game driver!
@richieqs7789
@richieqs7789 2 місяці тому
So we have an answer to the trolley problem: the AI will just run over them.
@14232319
@14232319 2 місяці тому
@MilkGlue-xg5vj @Mansin @thesilver7238 Which frame was it? I was looking at the third one, expecting the typical AI tells, the noticeable, obvious stuff, like a scammer's grammatically broken email, and happened to notice what seemed to be a train flying through space.. Lol. "Felt completely natural".. Lol.
@kipchickensout
@kipchickensout 2 місяці тому
"Can you clean the dishes?" "Yeah just a moment, I'll clean them just two more papers down the line"
@elivegba8186
@elivegba8186 2 місяці тому
😂😂😂😂😂😂😂
@xlpizza007
@xlpizza007 Місяць тому
Figure 1. Aged like wine.
@devfromthefuture506
@devfromthefuture506 2 місяці тому
I've been following Károly since 2018, and now I know that his favorite paper is that latent space paper. Even if he talks about a simple subject like a hamburger, he will find a way to bring the latent space paper into it.
@fideys
@fideys 2 місяці тому
You COULD use a latent space to make different burger combinations…
@watchingvideos9871
@watchingvideos9871 2 місяці тому
What paper
@futurehistory2110
@futurehistory2110 2 місяці тому
It's occurred to me that history is repeating in many cases. Photography came along and felt new and strange in the latter 19th century. Then cinema arrived and would’ve shocked people in the 1900s/1910s. And now generative AI has emerged in the early 2020s and is shocking the current world. We're just a part of history really, aren't we? I think it's easy to forget that we're part of the same timeline, even if history feels like something completely separate, especially when going 100+ years back in time.
@dunzek943
@dunzek943 2 місяці тому
This also means that people have to adapt because the world is always changing and you do live in history.
@Pindrop22
@Pindrop22 2 місяці тому
And it means that users of this new technology need to pay their dues to those who came before. No use of training data from existing works without fair compensation.
@gxrsky
@gxrsky 2 місяці тому
🤓
@GwahirW
@GwahirW 2 місяці тому
I imagine that you could create a storyboard of key images, with or without sora, and use sora to interpolate between them in a meaningful manner. A form of direction for longer videos.
@NeroDefogger
@NeroDefogger 2 місяці тому
Yeah, the same way we help language models like GPT with step-by-step prompting, mixture of experts, reminders, and all of those helpful techniques, it is probably helpful for video models like Sora to give them key images and let them make the connections instead of generating the whole thing.
@larion2336
@larion2336 2 місяці тому
@@NeroDefogger This will probably trivialize animation. Just make the different key poses and the AI handles all the movement in between.
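To make the keyframe idea in this thread concrete, here is a toy sketch. Sora's interface is not public, so the encode, decode, and inbetween_frames functions below are hypothetical stand-ins for the generic "interpolate between key images in a latent space" approach, not OpenAI's actual method.

```python
# Toy sketch of keyframe in-betweening via latent interpolation.
# All components here are made-up placeholders for illustration only.
import numpy as np

def encode(image: np.ndarray) -> np.ndarray:
    """Hypothetical encoder: flatten the image as a stand-in for a learned latent."""
    return image.astype(np.float32).ravel()

def decode(latent: np.ndarray, shape) -> np.ndarray:
    """Hypothetical decoder: reshape the latent back into an image."""
    return latent.reshape(shape)

def inbetween_frames(key_a: np.ndarray, key_b: np.ndarray, n_frames: int):
    """Linearly blend between two keyframe latents to get candidate in-betweens."""
    za, zb = encode(key_a), encode(key_b)
    ts = np.linspace(0.0, 1.0, n_frames)
    return [decode((1 - t) * za + t * zb, key_a.shape) for t in ts]

# Usage: two 64x64 grayscale "keyframes" and 8 interpolated frames between them.
key_a = np.zeros((64, 64))
key_b = np.ones((64, 64))
frames = inbetween_frames(key_a, key_b, n_frames=8)
print(len(frames), frames[0].shape)  # 8 (64, 64)
```

A real video model would of course generate far richer motion than a linear blend; the sketch only shows where hand-made key images could slot into the pipeline.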
@Exitof99
@Exitof99 2 місяці тому
I'm studying AI (graduating in a couple months) at university and have been watching this channel for a while now. I often mention your videos and share them with my professors or other students. I shared your enthusiasm, what a time to be alive indeed!
@user-if1ly5sn5f
@user-if1ly5sn5f 2 місяці тому
6:50 this is similar to how the human eyeball has a blind spot, but merges the differences of the many to cover the blind/fill it in. The same as how many humans and the differences share and flow the energy between to bring out something different like a movie or a new universe through a simulation that is a matrix of differences that reflect reality and can find things we haven’t uncovered yet.
@user-jx8qh4gy1f
@user-jx8qh4gy1f 2 місяці тому
I think most people don't realize the most incredible part of this technology. If it can make lifelike video, then it has the potential to make lifelike virtual worlds. As in, you could explore a completely lifelike virtual world using a VR headset like the Apple Vision Pro.
@FrostbitexP
@FrostbitexP 2 місяці тому
Sure. But that's like thinking, when path tracing was revealed, that we would all soon be playing production-quality path-traced video games (not the type you see in UE5)... Sure, technically possible, but not exactly a quick transition, clearly. It takes home hardware several minutes to render such 3D scenes. We'll see how much more efficient this tech can get, but I'm not holding my breath that we'll see real-time AI-generated visuals that are super consistent at all times anytime soon.
@Wobbothe3rd
@Wobbothe3rd 2 місяці тому
@@FrostbitexP we already have real time path traced virtual worlds right now
@Solizeus
@Solizeus 2 місяці тому
Making special effects for movies will be so easy that literally a kid with no prior knowledge could do it. It is amazing, and I hope I can download and use this soon.
@ive3336
@ive3336 2 місяці тому
Not really. Unless you fundamentally understand what makes a good image "good", you will still be unable to take advantage of the tool. Artists who know what to look for will benefit from this, but people who have no knowledge of the fundamentals won't get very far.
@faa9261
@faa9261 2 місяці тому
@@ive3336 Your assumption has some merit for now, but in the future, I think AI will figure all that out for any user, with tailored suggestions. Very exciting times.
@guilhermecampos8313
@guilhermecampos8313 2 місяці тому
@@ive3336 The person could just make the AI generate a bunch and then choose what looks best.
@SonGoku-zr9nc
@SonGoku-zr9nc 2 місяці тому
@@ive3336 You won't need any knowledge; that AI will be coupled with a language model. Just type in what you want and it will do the job for you.
@n00dle10
@n00dle10 2 місяці тому
"Hey Netflix, play a sci-fi movie set in the not so distant future, with a synth soundtrack, in the direction style of Darren Aronofsky, and I want the movie to run for 120minutes"
@neilhollands8448
@neilhollands8448 2 місяці тому
Who needs Netflix? The system will be tied into your biosensors and running locally from your glasses. You will be able to tell it to wrap up the movie in 10 minutes because your mom just called you for dinner. No actor will be able to compete with a synth-actor trained on all the best performances ever recorded.
@bodethoms8014
@bodethoms8014 2 місяці тому
We live in a simulation. We are just some prompts from some random guy living in his basement. Every day I believe in simulation theory a little more.
@freshbakedclips4659
@freshbakedclips4659 2 місяці тому
​@@bodethoms8014 prove it
@Sick_Pencil
@Sick_Pencil 2 місяці тому
Why do you need netflix so it "tweaks" the algorithm to be more "progressive"? Just use open source.
@dibbidydoo4318
@dibbidydoo4318 2 місяці тому
that would take a week to fully generate.
@mattgenaro
@mattgenaro 2 місяці тому
Your passion and teachings are inspiring! Thank you for the amazing content, always :)
@tomassilva1642
@tomassilva1642 2 місяці тому
Thank you, Dr., for going a bit in depth about how this technology works! We scholars love the details.
@spysong1324
@spysong1324 2 місяці тому
The real kicker is not that it creates videos but that it's a huge step forward in world understanding, like physics and how objects in the real world interact with each other. That's the holy grail of AI right there. The fact that it "understands" fluids, which are a whole subsection of graphics and mathematics, without using those tools is mind blowing. It's not perfect right now, but it's crazy that we are seeing this, something that was unheard of 2-3 years ago, with experts saying we were half or even a full generation away from it.
@synesthesiaharmonics
@synesthesiaharmonics 2 місяці тому
Real-world physics simulator has wide-ranging implications that we've just begun to unravel! What a time to be alive!
@incription
@incription 2 місяці тому
Sora is not a "real-world physics simulator"; that was never claimed by OpenAI, and they even point out cases where it fails.
@mattmmilli8287
@mattmmilli8287 2 місяці тому
Won’t be long before it is though
@diridibindy5704
@diridibindy5704 2 місяці тому
​@@mattmmilli8287no. That is not what sora is and it will never be that lmao.
@CSPlayerDamon
@CSPlayerDamon 2 місяці тому
If they somehow have a way to extract information relative to physics from the model, then yes. But I doubt training any model with videos / frames as input can result in any meaningful physics representation that can be extracted.
@technicalmaster4054
@technicalmaster4054 2 місяці тому
@@incription OpenAI's Sora technical report is literally titled "Video generation models as world simulators". Of course, currently its understanding of real-world physics is limited, but there is nothing stopping it from developing a highly detailed world model. They even showed that its understanding just keeps improving with increased compute. It's just a matter of scaling.
@nanumpf
@nanumpf 2 місяці тому
Impressive, what a time to be alive!
@bat-amgalanbat-erdene2621
@bat-amgalanbat-erdene2621 2 місяці тому
I've seen other videos on Sora, but this video gave me an important insight by comparing it to GPT-2. Now imagine how good this will be, and the emergent properties it'll have, when the GPT-4 equivalent comes out. It could probably make a whole movie.
@hombacom
@hombacom 2 місяці тому
But GPT-4 can't create a great story, so what exactly do you expect? It fills scenes with random things that don't always make sense.
@Professorfungi
@Professorfungi 2 місяці тому
There's a Black Mirror episode where they make "personalized TV shows" based on the viewer... it's crazy: they take the data they collect on you, create a reality TV show, and broadcast it to the world so you can literally watch people's lives played out online by AI-generated characters.
@ovum
@ovum 2 місяці тому
Will it even be GPT? There have been whispers of Mamba, which takes the T(ransformer) out of GPT, being a different architecture.
@davidvincent380
@davidvincent380 2 місяці тому
@@hombacom GPT-4 is smart enough to create a story which makes complete sense, but as you said, it will be a bland, basic story. Simply because LLMs are inherently unable to plan.
@robstamm60
@robstamm60 2 місяці тому
@@davidvincent380 I wouldn't say that large language models are inherently unable to plan - just that the most successful ones are based on predicting the next token. But there is nothing stopping us from making LLMs that can plan, "think" and learn on their own by running continuously.
@telebijeon3109
@telebijeon3109 2 місяці тому
Never stopped to think about the fact that a model capable of creating convincing video would, of course, be able to make almost flawless images. When these systems hit the retail market, everything regarding marketing and concepting is going to change.
@hombacom
@hombacom 2 місяці тому
Convincing at first sight but impossible for it to understand the meaning of all details it adds or the human purpose of design. A combination of tech and human skills will give the best result.
@a64738
@a64738 2 місяці тому
There are lots of models for AI image creation that can make photorealistic images from text prompts. Video is MUCH harder...
@freezedTIM
@freezedTIM 2 місяці тому
After reading the paper: it doesn't say what the unit of compute is for the comparison of base compute, 4x compute, and 16x compute. I feel like that should be defined before we can really say how much compute is necessary to create a good video, because if 1x compute is already beyond anything short of a cloud solution, there's quite a bit of work to be done before we can see Sora being used in a commercial or retail use case. Another thing that needs to be revealed is how much training time was used and how long it takes Sora to produce results like these. But it is indeed exciting to see the technology growing. I just hope that OpenAI releases a more detailed writeup soon.
@MikkoRantalainen
@MikkoRantalainen 2 місяці тому
I would assume it's the number of iterations. Many AI generators work by taking fully random input and running the full computation against it; that's called one iteration. When you take the output of the first iteration as the new base (instead of random input), you get the second iteration, so 4x would be the 4th iteration and 16x the 16th. Some new systems try to do 1-step generation, but so far even those have seen improved image quality if you pass the output back in as input for a second iteration, etc.
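A minimal sketch of the iterative-refinement loop described in that comment. The unit of compute in OpenAI's report is not actually defined, so this only illustrates the commenter's assumption; the denoise_step function below is a dummy stand-in, not a real diffusion model.

```python
# Start from noise and feed each output back in as the next input,
# then compare error after 1, 4, and 16 passes.
import numpy as np

def denoise_step(x: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Dummy refinement step: move the current estimate 30% closer to the target."""
    return x + 0.3 * (target - x)

def generate(target: np.ndarray, iterations: int, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    x = rng.normal(size=target.shape)   # fully random starting point
    for _ in range(iterations):         # each pass reuses the previous output
        x = denoise_step(x, target)
    return x

target = np.ones((8, 8))
for its in (1, 4, 16):                  # "1x", "4x", "16x" in the commenter's reading
    err = np.abs(generate(target, its) - target).mean()
    print(f"{its:>2} iterations -> mean error {err:.3f}")
```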
@ryans3979
@ryans3979 2 місяці тому
Given how computationally costly text-to-image AIs have been, and the fact that Sora is generating the entire video at once and apparently in 2K quality… I suspect they didn't mention it because an ungodly amount of computation is required, at the very least multiple TPUs.
@mash3d67
@mash3d67 2 місяці тому
I would like to know how many GPUs were used and how long it took. I have a strong suspicion this is not something you will be doing on your home PC. I have a feeling this will be a subscription-based service due to the computing power it will take.
@acanofspam4347
@acanofspam4347 2 місяці тому
@@ryans3979 Yes, at 2K 60 fps, but it could be downscaled to something like 720p 30 fps, which is much easier to run and still decent quality.
@soulsmith4787
@soulsmith4787 2 місяці тому
I imagine it would be less demanding than cinematic quality CGI, which it's evidently able to generate in a matter of minutes. This is going to be a game changer for all manner of animation.
@JonnyCrackers
@JonnyCrackers 2 місяці тому
I'm excited to try this once it becomes available. I'm really interested to see how the new and improved image generator will work as well. Dall-E 3 is already really good so it should be interesting to see what it can do.
@Quinold
@Quinold 2 місяці тому
I find it hard to be optimistic about Sora Ai. It's an incredible AI where I can't even begin to fathom the hard work it took to make, but I can't see how it'll impact the world in a positive way.
@DonnieTheGuy
@DonnieTheGuy 2 місяці тому
Using video generators as image generators makes so much sense, because pictures are nothing but moments of the non-stop video we call life. So getting an AI to follow the same process as us, and not only produce the end result, will definitely improve image generation, especially the photorealistic results.
@drlordbasil
@drlordbasil 2 місяці тому
Love Sora, can't wait to add it to my autocoding program to make video tutorials too!
@pillblue2156
@pillblue2156 2 місяці тому
It's already a superhuman level of visual intelligence in some aspects. It can make fairly reasonable, physically plausible predictions about this universe. Now I'm wondering what it will be like after one more revolution. Sora feels like AlphaGo Lee; what will it look like once it becomes like AlphaGo Zero? The entire film and game industry, including XR, will change forever.
@jenkem4464
@jenkem4464 2 місяці тому
Gaming is going to be amazing. We can just have the actual game developed quickly with very simple proxy stand-ins and the second layer or post-process will be an AI filter. You'll maybe be able to tune in real time to your hearts content or they can pre-train a few specific looks and you just select which one you want at runtime.
@hakimehamdouchi7468
@hakimehamdouchi7468 2 місяці тому
As much as I love the work that goes on in this field, this has gone on long enough.
@andrewrice9383
@andrewrice9383 2 місяці тому
That reverse mode could be really cool for innovative movie scenes
@smilefaxxe2557
@smilefaxxe2557 2 місяці тому
Unbelievable 🤯🔥
@davidvincent380
@davidvincent380 2 місяці тому
It's such a huge step forward, it's like going directly from GPT-2 to GPT-4 ! o_O
@idcrafter-cgi
@idcrafter-cgi 2 місяці тому
can't wait to be running such a model on a consumer GPU in the near future
@FrostbitexP
@FrostbitexP 2 місяці тому
Lol, might not be that near a future. There's probably way more financial incentive to keep it cloud-based for any company venturing into this, along with having a way to control the outputs to make sure people aren't producing extremely... "controversial" things with it. And doubly so if no new method is found to drastically lower the computational requirements, in which case it takes several more years for consumer-level PCs to have the power to run these (and then you wait 10 hours for a 10-second video, lol).
@HasanArifEFAZ
@HasanArifEFAZ 2 місяці тому
1:53 man got six figures
@guilhermecampos8313
@guilhermecampos8313 2 місяці тому
Yes, Sora still has problems with hands, but I have to admit that it's much better.
@consolechips
@consolechips 2 місяці тому
All I want is for Sora to make me a new season of the old red dwarf with the original characters!
@kenswireart88
@kenswireart88 2 місяці тому
The cloud man is awesome
@eliteextremophile8895
@eliteextremophile8895 2 місяці тому
Can't wait to see the video about Stable Diffusion 3. "Two papers down the line" seems to be flying past every two weeks at this rate.
@21EC
@21EC 2 місяці тому
3:32 - OMG, this one is new to me. With this lady with the glasses, I really thought at first that she was a real, actual person in a video, so I got confused for a sec as to why you would show a real person like that :) So crazy that it can still fool even those who already know Sora; it goes to show how totally convincing their videos can already be (at least at first glance).
@danielvarga_p
@danielvarga_p 2 місяці тому
Yep, I was fooled by the dog with the keyboard :S I cannot imagine what we will see by the end of this year.
@21EC
@21EC 2 місяці тому
@@danielvarga_p Yes, same, I also got fooled by the dog typing on the keyboard at first. It's just crazy. Did you see the new Sora video of that turtle made of glass? It's insane how real it looks, yet so bizarre, since there are no glass-made turtles out there. The micro-movements of the turtle are so spot on that it appears to be alive just like a real turtle. Totally crazy.
@danielvarga_p
@danielvarga_p 2 місяці тому
@@21EC Yes, I am not sure anymore what we will experience this year. It is getting insane. And we most likely know at least some of what is coming, but imagine our parents :S Or a "regular" person. Also, I am happy they took caution and did not release it right away like ChatGPT. Actually a bit surprised. It looks better for the "doomers".
@phoenix3754
@phoenix3754 2 місяці тому
We're very close to being able to see actual memories and images from our mind.
@LucasFerreira-gx9yh
@LucasFerreira-gx9yh 2 місяці тому
Regarding temporal coherence, it’s actually quite straightforward. The concept is that an image can be represented as a 2D grid, while a video combines a 2D grid with an additional time dimension, effectively creating a 3D grid. The AI is trained to generate a single 3D object, and what we see at any given moment are slices of that object.
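A small sketch of that "video as a 3D grid" view: cut a (time, height, width, channel) tensor into flattened spacetime patches, in the spirit of the Sora technical report. The patch sizes below are arbitrary illustration choices, not Sora's actual ones.

```python
# Split a video tensor into spacetime patches, one row per patch.
import numpy as np

def to_spacetime_patches(video: np.ndarray, pt: int, ph: int, pw: int) -> np.ndarray:
    """Split a (T, H, W, C) video into flattened patches of size pt x ph x pw."""
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0, "dims must divide evenly"
    patches = (video
               .reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
               .transpose(0, 2, 4, 1, 3, 5, 6)   # group patch indices together
               .reshape(-1, pt * ph * pw * C))   # one row per spacetime patch
    return patches

video = np.random.rand(16, 64, 64, 3)            # 16 frames of 64x64 RGB
patches = to_spacetime_patches(video, pt=4, ph=8, pw=8)
print(patches.shape)                             # (256, 768): 256 patches of 4*8*8*3 values
```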
@codersama
@codersama 2 місяці тому
7:00 This picture in the Sora technical report was misleading, because the denoising happens in the latent space, not at the pixel level.
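A sketch of that point, assuming a generic latent-diffusion layout: the denoising loop runs on a compressed latent, and only the final decode touches pixels. All shapes and the "denoiser" below are toy placeholders, not Sora's real components.

```python
# Encode to a small latent, denoise there, decode to pixels only at the end.
import numpy as np

def encode_to_latent(video: np.ndarray) -> np.ndarray:
    """Placeholder VAE-style encoder: 8x spatial downsampling by averaging."""
    T, H, W, C = video.shape
    return video.reshape(T, H // 8, 8, W // 8, 8, C).mean(axis=(2, 4))

def decode_to_pixels(latent: np.ndarray) -> np.ndarray:
    """Placeholder decoder: nearest-neighbour upsample back to pixel resolution."""
    return latent.repeat(8, axis=1).repeat(8, axis=2)

def denoise_in_latent(latent_noise: np.ndarray, steps: int = 10) -> np.ndarray:
    """Placeholder denoiser: nudge the latent a few times; a real model would predict noise."""
    z = latent_noise
    for _ in range(steps):
        z = 0.8 * z
    return z

video = np.random.rand(16, 256, 256, 3)
z = encode_to_latent(video)                                # (16, 32, 32, 3): far fewer values than pixels
z_gen = denoise_in_latent(np.random.normal(size=z.shape))  # generation starts from latent noise
out = decode_to_pixels(z_gen)                              # pixels only appear at the very end
print(z.shape, out.shape)                                  # (16, 32, 32, 3) (16, 256, 256, 3)
```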
@danielmonge2318
@danielmonge2318 2 місяці тому
What a life to be a time!
@Varun-iz2pj
@Varun-iz2pj 2 місяці тому
It's learning physics by looking at videos, that's insane!!!
@CSPlayerDamon
@CSPlayerDamon 2 місяці тому
It does not learn physics. It creates a video relative to a text prompt, and it's trained with videos that have real physical interaction. I think the video is kind of misleading, but oh well.
@Vipce
@Vipce 2 місяці тому
@@CSPlayerDamonfacts. It’s impressive but it’s not alive.
@shakyricLshadow1111
@shakyricLshadow1111 2 місяці тому
I think learning physics is an appropriate way of putting it. A better way is to say that it is "modeling" physics… in other words, like humans, it has developed a model of what makes sense to happen based on what it's seen before. Transformer models seem to be able to "realize" certain things based on what they learn. However, I doubt it can apply accurate mathematical equations to what it simulates, but perhaps one day… no, very soon, it will be doing just that.
@CSPlayerDamon
@CSPlayerDamon 2 місяці тому
@@shakyricLshadow1111 That doesn't make sense. How can any model trained to predict the next best pixel have anything to do with understanding the world. It just finds the best pixel pattern relative to the prompt given.
@waynesabourin4825
@waynesabourin4825 2 місяці тому
Very impressive: they took an already proven solution for still images and tweaked it by adding another dimension. I can see potential for this being used in many other fields, not just creating videos. Perhaps one concern will be compute power as it scales up?
@wormjuice7772
@wormjuice7772 2 місяці тому
I want SoreTube, showing content creators from an alien race somewhere in the universe. 24/7
@ThomasBoard8
@ThomasBoard8 2 місяці тому
I was listening to Prof. Stefano Ermon at Stanford talk about Sora, and was blown away by the idea that a diffusion-based transformer model could 'understand' physics without a physics simulation model of the type used in game development (e.g. Unreal Engine). A recent article on Semafor revealed that there is a startup company that is using the same type of modelling to predict the weather, and does it better than NASA, the military, and any current weather prediction model... and it has achieved this trick by associating real-world weather balloon data with previous word-based weather predictions. Again, no physics model required! This weather forecasting system is now being adopted as it only needs training once, and can then run very efficiently - more so than traditional weather models that try to simulate the actual physics and require enormous compute power. It leaves me wondering how far this approach can go... I still don't believe it'll arrive at any sort of real-world agency, let alone consciousness. But tools like these will not need human qualities to change the world. As amazing as it is shocking: the pace of development is way faster than society is ready for.
@mosamaster
@mosamaster 2 місяці тому
The videos generated are visually stunning, but it looks like watching in slow rewind mode! 😵‍💫 which makes it land close to the uncanny valley 😵‍💫
@Jandodev
@Jandodev 2 місяці тому
What a time :)
@nikluz3807
@nikluz3807 2 місяці тому
The wildest thing to me is how it interprets physics
@nnhovogliadiscri
@nnhovogliadiscri 2 місяці тому
Sora's videos seem like dreams...
@nunobartolo2908
@nunobartolo2908 2 місяці тому
The fact that it wasn't opened up for use speaks volumes about how insane the cost must be, even just for inference.
@zhandanning8503
@zhandanning8503 2 місяці тому
Is the reason that the latent space is used that the latent space is closer to the output, so one can fully utilize the distribution of (latent given output) rather than (input given output), since the latter is very far? Then, translated to both modalities, the latent can be edited to perform "better"?
@bernob9770
@bernob9770 2 місяці тому
Very cool
@BlingBlingBandido
@BlingBlingBandido 2 місяці тому
Do we know the amount of compute power used for generating these videos?
@RealityRogue
@RealityRogue 2 місяці тому
It's like seeing into a dream.
@GierlangBhaktiPutra
@GierlangBhaktiPutra 2 місяці тому
Possible best use case is using Sora to create movie company intros
@nickoutram6939
@nickoutram6939 2 місяці тому
That video from 6 years ago is a shocker; it's so far away from where we've ended up now that I am almost frightened to think where we might be by 2030... I mean, all this has to be extended to 3-dimensional space (AR/VR) in real time with real interactable characters on demand, so basically you are looking at the Holodeck...
@lynwoodjones
@lynwoodjones 2 місяці тому
🤘
@Ulariumus
@Ulariumus 2 місяці тому
2:00 is that a pizza tower?
@geirmyrvagnes8718
@geirmyrvagnes8718 2 місяці тому
A leaning one.
@smartalex22
@smartalex22 2 місяці тому
So when will this be available to the public? Because I want to use it! I've got stories to tell, Series to write!
@TheAero
@TheAero 2 місяці тому
When Nvidia releases their 100 GB VRAM TPUs, we will see an even bigger jump in the speed of progress of AI development.
@gr8b8m85
@gr8b8m85 2 місяці тому
That's at most 6 years away, and when you understand that this curve is exponential...
@TheAero
@TheAero 2 місяці тому
@@gr8b8m85 The TPU game will explode in the next two years. Competition drives innovation. The big TPUs will launch a lot sooner than 6 years from now. Also, the models built with more require less compute to train and use, so even with the same hardware, training is a lot faster for the same data. Finally, companies won't rely on more and more data as much as on quality data. We will be seeing companies updating models every month soon. In 2 years tops, the rate will be weekly in terms of retraining.
@TheAero
@TheAero 2 місяці тому
The mixture of expert models I wanted to say**
@IAmGeeeWiz
@IAmGeeeWiz 2 місяці тому
That technique that GPT-2 used to read multiple reviews: could we use it to have a model read papers on computing and theorize improvements? Would it be viable?
@Certago
@Certago 2 місяці тому
Imagine having this turn your favourite book into a movie... With a few twists just for you... And whoever you want to see in the lead role...
@gogokowai
@gogokowai 2 місяці тому
This guy really loves explaining latent space
@bobbyc1120
@bobbyc1120 2 місяці тому
0:38 I would not consider a magical sky tram coming down onto a piece of land supported by the Golden Gate Bridge to be "completely natural".
@lucaspayne2546
@lucaspayne2546 2 місяці тому
We are getting much much closer to really implementing The Entertainment :D
@bat-amgalanbat-erdene2621
@bat-amgalanbat-erdene2621 2 місяці тому
Is that like a personalized video-generating machine that finds and shows us whatever is most entertaining at any given moment?
@lucaspayne2546
@lucaspayne2546 2 місяці тому
@@bat-amgalanbat-erdene2621 The book Infinite Jest, set in an alternative 2010s (I think), has a major plot point being The Entertainment, a film you can't look away from. In the book it is not generated for the viewer, though.
@catakuri6678
@catakuri6678 2 місяці тому
@@bat-amgalanbat-erdene2621 I might be wrong, but I think they mean games that use AI to calculate their logic/physics in real time.
@user-yd6mp6vw2c
@user-yd6mp6vw2c 2 місяці тому
You are wrong this time. SORA takes patches of temporal and spatial data from latent space, not pixel space. Love your channel always. Thx
@AlyphRat
@AlyphRat 2 місяці тому
Hold on, if you can generate a video out of a still image, does that imply that you can generate another 1-minute video out of the last frame of the previously AI-generated video? If yes, it's possible to extend the video indefinitely and create scenes for a TV show with this trick.
@bat-amgalanbat-erdene2621
@bat-amgalanbat-erdene2621 2 місяці тому
Technically I think you're right, but the temporal coherence would not be long enough to remember what happened to the main characters in the last episode, so it'll be some kind of random video without a coherent story. Just my thoughts.
@MikkoRantalainen
@MikkoRantalainen 2 місяці тому
@@bat-amgalanbat-erdene2621 I was thinking the exact same thing. For temporal coherence, the input should be e.g. the last 2 seconds of the previous video. That yields less new output per run, since part of the computation re-emits the overlap, but it should be able to avoid issues with temporal coherence.
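A minimal sketch of that overlap idea, assuming a hypothetical generator (Sora's interface is not public, so generate_continuation below is a dummy stand-in): keep the last couple of seconds of each clip as conditioning and only append the newly generated part.

```python
# Autoregressive extension with an overlapping context window of frames.
import numpy as np

CONTEXT_FRAMES = 48   # roughly 2 seconds at 24 fps (assumed numbers)

def generate_continuation(context: np.ndarray, new_frames: int) -> np.ndarray:
    """Dummy generator: repeat the last context frame with a little noise."""
    last = context[-1]
    return np.stack([last + 0.01 * np.random.randn(*last.shape) for _ in range(new_frames)])

def extend_video(video: np.ndarray, new_frames: int) -> np.ndarray:
    context = video[-CONTEXT_FRAMES:]                  # condition on the tail only
    continuation = generate_continuation(context, new_frames)
    return np.concatenate([video, continuation], axis=0)

clip = np.random.rand(240, 64, 64, 3)                  # a 10 s clip at 24 fps
for _ in range(3):                                     # extend three times
    clip = extend_video(clip, new_frames=240)
print(clip.shape)                                      # (960, 64, 64, 3)
```

With a real model, anything outside the context window would still drift, which is exactly the coherence limit discussed above.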
@chadirby6216
@chadirby6216 2 місяці тому
That was my thought - the amount of options we're going to have soon will be insane.
@peripheralarbor
@peripheralarbor 2 місяці тому
​@@MikkoRantalainen better to do the whole film in overview at 1fps or less and then use those as key frames for the final pass.
@MikkoRantalainen
@MikkoRantalainen 2 місяці тому
@@peripheralarbor Sure, if the AI network can support such a method. Most AI networks have to be trained for a specific workflow, and if you want all the frames in the end anyway, you cannot just output 1 fps and expect lower computing requirements for the draft version.
@kellymoses8566
@kellymoses8566 2 місяці тому
The patch encoding seems like it could be used as a video compression codec
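A rough back-of-envelope for that codec idea. The downsampling factors and latent channel count below are assumptions in the ballpark of published latent video autoencoders; OpenAI has not published Sora's actual compression figures, and fewer values does not automatically mean fewer bits.

```python
# Compare raw pixel count with an assumed latent-patch representation.
frames, height, width, channels = 24 * 10, 1080, 1920, 3          # 10 s of 1080p video
raw_values = frames * height * width * channels

# Assume an 8x spatial and 4x temporal downsample into a 16-channel latent.
latent_values = (frames // 4) * (height // 8) * (width // 8) * 16

print(f"raw values:    {raw_values:,}")
print(f"latent values: {latent_values:,}")
print(f"reduction:     {raw_values / latent_values:.0f}x fewer values")   # ~48x under these assumptions
```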
@spinninglink
@spinninglink 2 місяці тому
I wanna try it with old pictures I've taken!
@pandoraeeris7860
@pandoraeeris7860 2 місяці тому
Sora! Sora! Sora!
@jay_sensz
@jay_sensz 2 місяці тому
That's all very cool, but I'm way more excited for Stable Diffusion 3
@ZenBen_the_Elder
@ZenBen_the_Elder Місяць тому
Clip #2 [2:45-4:24]: Sora operates on 'patches' rather than tokens.
@e.v.k.3632
@e.v.k.3632 2 місяці тому
2025 will be the year of robots that can do all we want.
@hmazzuchetti
@hmazzuchetti 2 місяці тому
Day after day I believe more and more in simulation theory, haha.
@loubro4809
@loubro4809 2 місяці тому
Imagine when AI creates reels on Instagram; that would increase your screen time to a sinister level 🤯😵‍💫 It would never let your eyes off the screen.
@kenswireart88
@kenswireart88 2 місяці тому
In the next 20 years, computers will be able to make food that we can eat appear from prompts. When I was a child, I used to hear stories that mankind long ago could raise their hands to the clouds and bring food out of them. Sounds similar.
@mygirldarby
@mygirldarby 2 місяці тому
I was watching a lecture not long ago and the guy said iPhones will grow on trees eventually. He said there's no reason why AI couldn't figure out exactly the materials needed to "grow" an iPhone from a genetically modified tree.
@kenswireart88
@kenswireart88 2 місяці тому
@@mygirldarby what's possible is even more fantastical than we can ever imagine.
@user-ef5hd6ss5g
@user-ef5hd6ss5g 2 місяці тому
Imagine this with sound generation ... Only a few months from now
@MickeRamone
@MickeRamone 2 місяці тому
My prediction is we will have simulated computer games from prompts within....3 years, and they will look and play better than current games today, and they will be catered to our own specific preferences....wow, what a time to be alive!
@hombacom
@hombacom 2 місяці тому
The hardware will still limit what is possible in real time, compared to generated AI clips that take a lot of time and resources in the cloud.
@mygirldarby
@mygirldarby 2 місяці тому
​@hombacom yes, and AI is also limited by robotics advancement. When I think of how far we seem to be from advanced robots capable of doing physically what AI can do "computationally" it seems world apart. Maybe AI will be required to advance our robotics. AI will have to build a physical body to "embody" itself.
@gr8b8m85
@gr8b8m85 2 місяці тому
@@hombacom It's not just hardware, it's architecture. There's no reason why a sufficiently capable general AI like ChatGPT/Gemini couldn't recursively improve its own efficiency or the efficiency of other models once "good enough" reasoning is achieved.
@hombacom
@hombacom 2 місяці тому
@@gr8b8m85 "Recursively improve"? What does that even mean.. AI compute in the cloud is expensive. LLMs are trained and fine-tuned on data by humans; they don't improve by themselves, and they hallucinate easily. It's hard work to make progress..
@hombacom
@hombacom 2 місяці тому
@@mygirldarby Real tests are needed in the real world, and it's easier said than done to make a machine that does it, though.
@nanow1990
@nanow1990 2 місяці тому
Can you imagine this with an in-game engine, overlaid with polygons, so that you can actually control the game?
@hytalegermany1095
@hytalegermany1095 2 місяці тому
Is it also possible to enter a start and ending frame?
@kinngrimm
@kinngrimm 2 місяці тому
"you can count on me, being here" Seeing the speed of this development, i am not too sure about that. Either because it all goes wrong or maybe you have your personal agent doing these vids for you then.
@ayron-x5823
@ayron-x5823 2 місяці тому
6:35 That dodo scene is from Ice Age 1.
@Anthro
@Anthro 2 місяці тому
Just imagine real-time AI video gen but you can control a character like you would a third/first-person game. Games would never be the same
@fishygaming793
@fishygaming793 2 місяці тому
Ye, make the AI react to controls like WASD and mouse, that'd be sick!
@fishygaming793
@fishygaming793 2 місяці тому
We might even be able to make real time videos within 2 years!
@hypersonicmonkeybrains3418
@hypersonicmonkeybrains3418 2 місяці тому
Can't see that happening until they somehow get rid of the element of randomness. Game developers like to have full artistic control over what you are going to see in the game.
@Fermion.
@Fermion. 2 місяці тому
Perhaps in the future, we'll design the games and AI will play them. I can see tournaments with people trying to design the most difficult level for AI to beat.
@OnlineSarcasmFails
@OnlineSarcasmFails 2 місяці тому
​@@Fermion. I don't think we can design anything the AI cannot beat without making it literally impossible. It has pixel perfect inputs. Unless you make it pilot a robot and use a different method of control, mouse and keyboard or game controller.
@jz-xq4vx
@jz-xq4vx 2 місяці тому
we need to democratize this tech more; while acknowledging the potential for misuse of powerful technologies like SORA, it's important that these technologies remain accessible to well-intentioned individuals. This approach would allow for positive use and innovation, akin to how regulated substances like alcohol are managed, balancing the risks and benefits.
@ScriptureFirst
@ScriptureFirst 2 місяці тому
Countdown to full dynamical 3d model output: 3… 2… 1….
@projectarduino2295
@projectarduino2295 2 місяці тому
AI learning something completely unrelated to its task because it happens to help it perform that task? What could possibly go wrong? All jokes aside, what a time to be alive.
@Waffles_Syrup
@Waffles_Syrup 2 місяці тому
Can Sora finally be used to make non-janky-looking higher-framerate video from lower framerates?
@alanverduzco6513
@alanverduzco6513 2 місяці тому
One day, we will see conventional computing like we see black and white tv's
@jasonhemphill8525
@jasonhemphill8525 2 місяці тому
Neural rendering is the future
@keklord5817
@keklord5817 2 місяці тому
When is it coming out anyway?
@milesgrooms7343
@milesgrooms7343 2 місяці тому
So, could I eventually input my favorite novel, give the AI vague or extremely vivid details of cinematography, character development, musical scores and compositions, etc., etc. (forgive me, I do not have the film vocabulary)… request it to be directed and produced in different styles tied to eras and to actual directors' and producers' styles in an almost infinite array, and hit the "play" button over and over again until "you" create "your" masterpiece!?! How far away are we from that? Just a regular ole Joe like myself creating theatrical masterpieces??
@balbalofficial8212
@balbalofficial8212 2 місяці тому
Well, that would mean that you won't need to go through the hassle of being rich enough to fund the production.
@gr8b8m85
@gr8b8m85 2 місяці тому
Take "expert" predictions (2050, 26 years) and divide it by 8 to 10. That's the nature of exponential progress. Not even the smartest people on the planet were seeing the timeline correctly.
@ethzero
@ethzero 2 місяці тому
Martin Scorsese: Marvel is not cinema
Martin Scorsese: Sora... *sigh* 🤷‍♂
@devxsadik
@devxsadik 2 місяці тому
The more computation you do the better, time to accelerate quantum computing technology advancement
@nakazaki1254
@nakazaki1254 2 місяці тому
I'm really curious about how much compute power it requires to generate these videos. I have a feeling that it requires a supercomputer; otherwise they would have mentioned how little it takes to generate.
@klaarnou
@klaarnou 2 місяці тому
Jesus, if AI progress continues any further this guy will cream his pants.
@BryAlrighty
@BryAlrighty 2 місяці тому
I feel like we're slowly figuring out how to create the matrix lmao..
@digitalmarketing1727
@digitalmarketing1727 2 місяці тому
I work with Local Businesses & a big pain point is getting images. If only I could put an image of one of my tradesman within a video ….
@Magnum_opusAI
@Magnum_opusAI 2 місяці тому
🎯 Key Takeaways for quick navigation:
00:00 🚀 Sora, OpenAI's text-to-video AI, can create high-quality videos from text prompts and extend still images into video, both forwards and backwards.
00:24 🔄 It can also generate several natural-feeling scenarios leading to a prescribed video ending.
00:50 🌟 As a researcher in light transport simulation, the speaker expresses delight in Sora's long-term coherence and the quality of its reflections and refractions.
01:19 📈 Sora can produce videos up to 60 seconds long, showcasing advanced details like dust and grease marks.
01:45 🎬 Sora is capable of synthesizing a complete scene in one go, without needing to splice various parts together.
02:06 ∞ One impressive feature is the ability to create infinitely looping videos, alongside limited physics simulations.
02:35 🎨 It can generate highly detailed still images at a resolution of 2048x2048, surpassing the capabilities of DALL-E 3.
03:26 🧠 Unlike large language models, Sora deals with visual content in a unique way through 'patches', allowing for effective video generation.
04:21 🌐 Sora operates within a 'latent space' that allows for the generation of new, similar materials or visuals by navigating through this space.
05:11 🤖 By understanding the structure of the English language and sentiment detection, Sora can accurately extend images into coherent and physically accurate simulations.
06:09 🌌 A diffusion-based transformer model allows Sora to avoid flickering effects in videos by refining multiple 'bunches' of noise simultaneously, considering the entirety of each image in a sequence.
07:26 💡 The rapid advancements in AI video generation exemplify the power of human ingenuity and computational research, highlighting the potential for future breakthroughs.
Made with HARPA AI
@Exitof99
@Exitof99 2 місяці тому
The real question I have is where is that lightning bolt coming out of the angry cloud guy? Looks painful.
@Kneephry
@Kneephry 2 місяці тому
Another channel said that it would take 24 hours to produce 60 seconds of video. It's still a remarkable accomplishment but if it takes that long to create video there will be a lot of obstacles to using it for replacing current video production.
@senkl_
@senkl_ 2 місяці тому
The 1x compute dog in the snow is kinda scary...
@nikhilsultania170
@nikhilsultania170 2 місяці тому
Great AI