Animator vs The Machine
With the explosion of AI, we look at what people in the animation industry think might occur in the future, then talk to experts in artificial intelligence to help ground what this new technology actually is. Skynet or a tool? The downfall of animation, or a possible new renaissance?
Holy McEvoy! Let's Take A Look Under The Hood with Josh McEvoy.
Get ready to unlock the enigma of stable diffusion and AI models as we sit down with animator and AI enthusiast, Josh McEvoy. This episode we take a deep dive into how exactly generative AI works, with convolutions, autoencoders, UNets and other technical concepts. With Josh's expert insights, we peel back the curtain to figure out how it works and how it can work for the animation industry.
Have you ever wondered about AI's transformative impact on animation? We delve into exactly that - discussing the shifting roles of background artists and painters, the potential disruption to the entire animation workflow, and AI's influence on 2D animation software like Harmony and Moho. Our conversation covers the tools and libraries crucial to this area, and what Josh believes artists should learn if they are interested.
This eye-opening episode is a must-listen for anyone with even a fleeting interest in the convergence of art and AI. So, prepare for a deep dive into the compelling world of AI in animation with us and our guest, Josh McEvoy.
Speaker 2:Alright, on today's episode we have a very special guest. He's an old classmate of mine who now lives in Toronto and works, or used to work, at Guru Studio. His name is Josh McEvoy.
Speaker 1:Thanks, man. Hi. Thanks for having me.
Speaker 2:No problem, buddy. Tell the people a little bit about yourself.
Speaker 1:I've basically been an animator in the industry for nine years, mostly working out of Toronto, mostly at Guru Studio. I've had an interest in AI since about 2017 or 2018, and just in the last year I've taken a fresh run at it with all the new tools coming out of this recent wave of AI.
Speaker 2:Alright. So, to explain it to the less techno-savvy people out there who are fearful of AI, how does it work? What is diffusion, to begin with?
Speaker 1:Okay, so I have included some images, so maybe I can just step through them. I'll magically put them in.
Speaker 1:I'm going to try and go through them in the order I think is easiest to teach. The first one is convolution. Each of these three colours here is one of the RGB channels, the three values of an image, and what's happening here is something called convolution. You can see there's a window that moves across the image and takes information and condenses it into a smaller form. If you go to the next image, you can see the kernel moving across the image: the blue one on the left is the image and the right is the output. In this simple example it's multiplying the kernel values with the pixels underneath and then adding them up to get an output, so the first output value, for example, is 12. That's the simple animator's version of it.
Speaker 2:Perfect. That's what I want to hear.
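For the curious, here is a minimal NumPy sketch of the sliding-window multiply-and-sum Josh is describing; the image and kernel values below are made up purely for illustration.

```python
import numpy as np

# A tiny 5x5 single-channel "image" and a 3x3 kernel (values are arbitrary).
image = np.array([
    [1, 2, 0, 1, 3],
    [4, 1, 1, 0, 2],
    [0, 2, 3, 1, 1],
    [1, 0, 2, 4, 0],
    [2, 1, 0, 1, 1],
], dtype=float)

kernel = np.array([
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
], dtype=float)

def convolve2d(img, k):
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            window = img[y:y + kh, x:x + kw]   # the window sliding across the image
            out[y, x] = np.sum(window * k)     # multiply by the kernel, then sum
    return out

print(convolve2d(image, kernel))  # a smaller 3x3 "feature map"
```

In a real network the kernel values are learned, and each convolution layer runs many kernels over all three RGB channels at once.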
Speaker 1:So the third one is something called an autoencoder, also known in this context as a VAE, a variational autoencoder. For the sake of Stable Diffusion, you have your three inputs, RGB, and it condenses them through something that's kind of like a bottleneck or an hourglass. It has two parts: the encoder, which compresses the image, up to 64 times smaller, and the decoder, which, for the sake of Stable Diffusion, brings it back to what it was. So it's basically compressing information and then recreating it back into its original form. And you might say, what's the point of this? Why would I need an AI model that compresses information and then returns it to its original form?
Speaker 1:For example, one good reason is just sending information.
Speaker 1:If I wanted to send an image, the latent-space representation is a smaller size, so I could send you the latents and you could decode them with the same decoder and you'd have the full image. But for Stable Diffusion, the reason they do this is because it's easier: if you computed everything in what's called the pixel space, it would be very compute intensive, but in the latent space it's something like 64 times less computation. So they compress the image into what are called latents, they operate in the latent space, and then when you're done operating on the latents, when you have your image, it's decoded through the autoencoder back into your output image. If I were generating an image from scratch, I wouldn't use the encoder at all: I'd just start with noise in the latent space, the model would denoise it, and then I'd run the result through the decoder and have my output image, my beautiful piece of art.
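To make that encode/decode round trip concrete, here is a rough sketch using the Hugging Face diffusers library with a standalone Stable Diffusion VAE; the model ID is one commonly available checkpoint and the file name is a placeholder.

```python
import numpy as np
import torch
from diffusers import AutoencoderKL
from diffusers.utils import load_image

# Load a Stable Diffusion VAE (the encoder/decoder "hourglass" Josh describes).
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
vae.eval()

# Any RGB image, resized to 512x512 and scaled to the [-1, 1] range the VAE expects.
img = load_image("my_background.png").convert("RGB").resize((512, 512))
x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0
x = x.permute(2, 0, 1).unsqueeze(0)                # [1, 3, 512, 512]  (pixel space)

with torch.no_grad():
    latents = vae.encode(x).latent_dist.sample()   # [1, 4, 64, 64]    (latent space)
    recon = vae.decode(latents).sample             # [1, 3, 512, 512]  (back to pixels)

print(x.shape, "->", latents.shape, "->", recon.shape)
```

The spatial resolution drops by a factor of 8 on each axis, so the denoising happens on roughly 64 times fewer positions, which is where the saving Josh mentions comes from.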
Speaker 2:So that was a lot to take in. Let's simplify it in one sentence, if you could: you take an image, it gets broken down and unfocused, and then when it gets refocused, it becomes whatever you want it to be. Is that more or less it?
Speaker 1:Yeah, basically. There are three parts to Stable Diffusion, and these are the parts of the model you'll be using as an artist. The first one is the VAE, the variational autoencoder, and its job is to compress images into latents or to decode latents back into images. That's the first part. Okay, so the next one is called the UNet.
Speaker 1:Inside the latent space of the autoencoder sits a neural network that's shaped like a U. It's kind of the same idea as the autoencoder, except imagine we take that hourglass shape, pull it down into a U, and add what are called skip connections. So it's a U-shaped network, that's how we visualize it, and the input side also connects across to the output side. It downsizes the image with convolution again, so in the example diagram the first image is 572 by 572 pixels, and as it runs convolutions the image gets smaller, so as it travels through the UNet it goes down and then back up, with the skip connections bridging across. The UNet is what actually generates the images. When you generate an image with Stable Diffusion, you're using what's called a checkpoint file, and that's mostly this. Okay, so go for it.
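Here is a toy PyTorch network, just to make the "U" shape and the skip connections concrete; the real Stable Diffusion UNet is far larger and also takes a timestep and the text conditioning as inputs, so treat this as a sketch rather than the actual architecture.

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    # two 3x3 convolutions, as in the original U-Net paper
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(),
    )

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.down1 = block(3, 32)
        self.down2 = block(32, 64)
        self.pool = nn.MaxPool2d(2)                      # halves the resolution on the way down
        self.bottleneck = block(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = block(128, 64)                       # 128 = 64 upsampled + 64 skipped
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = block(64, 32)                        # 64 = 32 upsampled + 32 skipped
        self.out = nn.Conv2d(32, 3, 1)

    def forward(self, x):
        d1 = self.down1(x)                               # left side of the "U"
        d2 = self.down2(self.pool(d1))
        b = self.bottleneck(self.pool(d2))               # bottom of the "U"
        u2 = self.dec2(torch.cat([self.up2(b), d2], 1))  # skip connection: d2 bridges across
        u1 = self.dec1(torch.cat([self.up1(u2), d1], 1)) # skip connection: d1 bridges across
        return self.out(u1)                              # right side, back at full resolution

x = torch.randn(1, 3, 64, 64)
print(TinyUNet()(x).shape)   # torch.Size([1, 3, 64, 64])
```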
Speaker 2:Oh, go ahead. Sorry, no, go for it, you had a question. I was just going to ask: most of the AI out there that generates art, like Stable Diffusion or, give me the name of the other ones, I'm blanking right now, do they all use a similar process?
Speaker 1:Well, I actually can't say for sure, because I'm not sure how the other ones work, but this is how latent diffusion models work.
Speaker 2:Okay.
Speaker 1:So we're doing diffusion in the latents, which is what we just learned happens in the autoencoder. Basically, the UNet is the same idea as the autoencoder in the simple sense, but it has skip connections. A paper just came out called FreeU, and what I took from reading it is that the skip connections learn more of the details, the textures or higher-frequency details of an image, while the UNet backbone itself learns more of the overall structure, the 3D representation of things. So, for example, if I was generating a face, the 3D structure of the face would be learned through the UNet backbone, and the skip connections would be more for the fine details, like freckles or something.
Speaker 2:So could the UNet understand a hand, like the structure of a hand? That seems to be the issue, that they can't do hands.
Speaker 1:I think the main problem is that they need better datasets for hands, because I don't think there are many datasets of hands, especially hands that are holding stuff or doing the things hands do. Most of the datasets are maybe just a flat hand, right?
Speaker 2:Or they're probably medical, yeah.
Speaker 1:So there could probably be better datasets for hands. I know some people who have fine-tuned the network on their own images, their own datasets, and they've shown quite good results. So I wouldn't say it's a showstopper for any reason. It's solvable, I think.
Speaker 2:Yeah, it's not the AI's Achilles heel, not a reason to drop the whole thing, I'd say. All right, so then what's part three? We did part one, the VAE, and part two, the UNet.
Speaker 1:Part three is something called CLIP. Let's see here, I don't actually have a dedicated image for this, but if you go to image six, diffusion inference, you can see the whole thing. These are the three pieces here: the blue is CLIP, the yellow is the diffusion UNet, and the green is the VAE.
Speaker 1:When I'm running what's called inference, when I'm generating images, I input text as a prompt.
Speaker 1:So here it says "aircraft are performing an air show." CLIP is a model created by OpenAI, and Stable Diffusion uses it to take the text and give it what's called a text embedding, a position in a theoretical space, so that things that are similar to each other sit close to each other in that space. The text encoder embeds the prompt, and that embedding conditions the diffusion. You'll also see at the top it says "noisy image": all images with latent diffusion models, like Stable Diffusion or Midjourney, start as complete noise, and the diffusion model runs over that noise again and again in steps, removing the noise from the image that doesn't exist yet but that you've described in the prompt. Once you've done a certain number of steps, and there are different things called samplers which handle the math, I think it's differential equations, but-
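All three parts, the CLIP text encoder, the UNet and the VAE, come bundled in one checkpoint, so running inference from code is short. A rough sketch with the diffusers library; the model ID is just one publicly available checkpoint, and a CUDA GPU is assumed.

```python
import torch
from diffusers import StableDiffusionPipeline

# One checkpoint bundles the CLIP text encoder, the UNet and the VAE.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# CLIP embeds the prompt, the UNet denoises random latents over 30 steps,
# and the VAE decoder turns the final latents into pixels.
image = pipe(
    "aircraft performing an air show",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("air_show.png")
```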
Speaker 2:Okay, sure, I'll take your word for it.
Speaker 1:Yeah, so it does a bunch of time steps, it diffuses the noise in steps, and once you've done a certain number of steps you run those latents through the decoder and you get your image, which is the air show. That's basically how it works. One thing that I think is a good visualization for people is image 3C, which shows gradient descent. This is a visual example of how neural networks learn, which is by something called gradient descent.
Speaker 1:There's something called a loss function, some objective the network has to learn. For example, with diffusion, it learns how to remove noise from an image: when you're training a diffusion model, you take an image you're training on, add noise to it in steps, and it learns how to reverse that process. The way the weights get adjusted is called stochastic gradient descent. If you look at the black dot at the top of the plot, it moves in steps as the network trains, trying to get as low as possible on the loss surface. In theory you want the loss as low as possible; it's kind of like golf, you want the lowest score. So at each step you feed it information, it calculates the gradient, the slope of the loss curve it's sitting on, and then it takes a step in the downhill direction. That's a very simple definition of it. When you're training something, you're trying to tune the weights of the network so that the output of the loss function is as low as possible. And if you go to 3D, you can see.
Speaker 1:This is an example of what the loss surface looks like with skip connections on the right and without them on the left. If you remember how the UNet had skip connections, the lines that bridge across the U, they smooth the loss surface out, so it's a lot easier to find which way is down when there isn't all that noise. It averages things out so the network can learn faster. And I think that's the basics of it from the images I showed. There's some other stuff too that I can mention if it comes up.
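A one-parameter toy example of gradient descent in PyTorch, just to show the "take a step downhill" loop Josh describes; a real diffusion model does the same thing over hundreds of millions of parameters, with the loss measuring how well it predicted the noise it has to remove.

```python
import torch

# A toy "loss surface": loss(w) = (w - 3)^2, with its lowest score at w = 3.
w = torch.tensor([0.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

for step in range(25):
    loss = (w - 3.0) ** 2     # how far we are from the bottom of the bowl
    loss.backward()           # the gradient: the slope under the "black dot"
    optimizer.step()          # take one step downhill
    optimizer.zero_grad()
    if step % 5 == 0:
        print(f"step {step:2d}  w = {w.item():.3f}  loss = {loss.item():.4f}")
```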
Speaker 2:Sure, there you go, phantoms. Let's just add a crash course in computer science.
Speaker 1:A visual crash course, a visual representation, yeah.
Speaker 2:So I know the main reason people fear generative AI is they're like, oh, it's gonna steal jobs, especially the animators, they're like, oh, it's gonna start animating. Do you see this as an issue? Because when I look at generative AI, at least right now, I see it almost like a filter.
Speaker 2:It takes a pre-existing thing and maybe moves parts, but doesn't necessarily animate, right? It has this almost underwater feel. In your opinion, what would it take for AI to, quote unquote, actually animate with the twelve principles, not just move parts around?
Speaker 1:Yeah, that's super advanced. I'd say the first thing it needs is data, which we have lots of, but it's not organized in a way a machine could learn from right now; it has to be labelled and so on. So I don't know if that will be happening any time soon. But right now I feel like we have an incredible tool we can use. I find it's just a tool you can use to manifest your intent. As an artist I have an intent, I want to create something, and if I can use the tools to generate that, that's show biz, baby.
Speaker 2:Very true, you're not wrong. You're always going to use the technology of the time to create things, from Disney using rotoscoping in the forties to, you know, the photocopying era of Disney, where it's like, oh, photocopying, let's save money, and the whole ink department gets sent out the door.
Speaker 1:But then other jobs get created too. I know it is scary, because for a lot of people I feel like this just kind of came out of left field.
Speaker 2:Right, very much so, because it was just like, oh, I can take this in and I don't need a background artist anymore. Like, what the fuck?
Speaker 1:Yeah. For me, my experience was I got interested in AI around 2018, but back then the state of the art for images was something called StyleGAN, and all that model could create was faces of people, right? And then just last year I was on the internet and I saw a bunch of beautiful pieces of art that were created by something called Stable Diffusion, and I was like, okay, maybe it's time to get back into this, because it's clearly changed a lot in the last couple of years.
Speaker 2:Yeah, it definitely was like, oh, it's for hobbyists, and then all of a sudden there's this spike and now it's affecting everything and everyone.
Speaker 1:Yeah, like I did-
Speaker 2:Sorry, go ahead. No, we're not doing this Canadian standoff.
Speaker 1:I actually forgot what I was going to say, so you go.
Speaker 2:Oh no, I forgot too, but hey, let's check our notes. It doesn't matter, we're too kind. So yeah, it was this thing that came out of left field and people were just caught off guard, right? And I know that scares a lot of people, especially the people it directly affects, like I said, background artists, painters. Those are the directly affected fields.
Speaker 2:But it's also affecting other aspects. Like, I did an episode where I posed as a mysterious AI professional, and it was just me asking ChatGPT questions and using an AI voiceover to act it out, and I made myself look like I was in witness protection with a black silhouette, so you couldn't see who it was until the very end.
Speaker 1:Yeah, I like it.
Speaker 2:Yeah, the big reveal is just a sock puppet with "AI" written on top of it. I know.
Speaker 1:I get it, it's fun.
Speaker 2:But even then, voiceovers are starting to be affected by it. When I talked to Rob Tankler, he was like, yeah, it's starting to affect some aspects of it, and that's primarily where the strike came from.
Speaker 1:Yeah, I think one thing is that it will disrupt our entire pipeline eventually. But right now it's not disrupting everything equally. If it disrupted everything equally, you could just use it and there wouldn't be any displacement; we'd train everyone and then we'd just make more cartoons. But if AI can just do backgrounds, for example, and I think it could do production-ready backgrounds right now, that's a bunch of background artists who suddenly have to compete for the spot of the one person who can do the whole team's job. Meanwhile animation doesn't really work with it quite yet, so you still have to animate it, which means the bottleneck becomes animation, I think.
Speaker 2:Yeah, for now. Because there are even riggers where I'm like, yeah, maybe, but I don't know about some aspects.
Speaker 1:Yeah, I don't even think that rigging will be a job in the future.
Speaker 2:That's what I was going to ask, because my next question was: I've seen, maybe not AI, but just smart tech, like animation bots in Maya that you can download and they'll give you the in-betweens, right, you can tween a property and do all these fancy things. That's software that calculates everything you need to simulate animation, and they have that for Maya and Blender. So my question is, do you think something like that with AI will end up in 2D software like Harmony or Moho?
Speaker 2:And what are the roadblocks to that?
Speaker 1:I actually don't know if, in the future, we'll use 3D geometry at all like that. Because why would you spend all this time modelling something, creating a 3D model, if you can just have an AI model that understands 3D representations and can turn that around for you?
Speaker 2:Yeah, but at the same time it's like, why do people still do 2D? Why does Studio Ghibli still draw by pencil?
Speaker 2:You know what I mean? When you have paper, you have puppets.
Speaker 1:Yeah, it's that, but if it's cheaper, then I think it will, sure.
Speaker 2:Yeah, there will always be the cheaper option, that's always a thing. But isn't it also worse, though?
Speaker 1:I think it would be better. I'm not saying it would be worse.
Speaker 2:I'm saying, if everyone has this, it's almost like the early-90s feel of 3D animation.
Speaker 1:Other than Pixar, yeah, outside Pixar everything started looking the same, this weird, almost floaty look, until they started ironing out the kinks, like if you look at the first season of ReBoot and the first season of Beast Wars.
Speaker 2:Mm-hmm, the animation is not super, super great, it's a little floaty, but you give it a pass because you're like, it's so good.
Speaker 1:You're like, don't worry about it. But also, back then it looked real to me as a kid.
Speaker 2:There was nothing to compare it to, right?
Speaker 1:Yeah, exactly.
Speaker 2:It's this whole 3D space. And my dad was an electrical engineer, he's like, I love ReBoot because I understand all the computer jokes.
Speaker 2:And I'm like, shush, man, let me watch my show. I'm not learning anything, Dad.
Speaker 2:No, I don't care about the 0011 joke that's going on in there.
Speaker 1:Oh yeah. I don't know, my concern is that this is happening so fast, it's progressing so fast, that I don't know if the transition will be the same as it was from 2D to 3D.
Speaker 2:Right, that's what I think a lot of people are concerned about. When I've talked to people who've been in the business for thirty or forty years, I ask them, is this panic and chaos similar to what it was like when 3D was introduced, or when digital puppets replaced paper? And a lot of them say, yeah, it's similar, it feels the same, but it's not the same at the same time.
Speaker 1:It's the momentum of it that changes things, I think.
Speaker 2:Yeah, it is progressing ridiculously quickly, and it's like, within-
Speaker 1:A month.
Speaker 2:It could be, yeah. Go ahead.
Speaker 1:No, you finish. Within a month, yeah, it could be a whole different game.
Speaker 2:Yeah, a month from now you could be like, oh, it's totally changed, now it can do hands, it can do fucking everything. Oh, okay, cool.
Speaker 1:I think, honestly, the only thing it needs to learn, or that we need to train it to do, at least for animation, is to animate with temporal consistency; it has to be able to maintain the image from frame to frame. And I don't think we're that far away from that. Maybe this is the time to show those images. Right now there's something called motion modules, which are things you add onto Stable Diffusion, you kind of splice them into the model, and then it has an understanding of how things move. Here are three examples; I didn't make these, they were made by a user called man Shutee, and the motion module is available to download on a website called Civitai, which is great for downloading models. I haven't even mentioned what LoRAs are yet.
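For anyone who wants to try a motion module at home, the diffusers library now wraps this AnimateDiff-style setup. The repository names below are the commonly used public ones as of this writing and may need substituting; the base checkpoint can be any Stable Diffusion 1.5 model.

```python
import torch
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif

# The motion module is a separate download that gets spliced into a SD 1.5 checkpoint.
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
pipe = AnimateDiffPipeline.from_pretrained(
    "emilianJR/epiCRealism", motion_adapter=adapter, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

# Sixteen frames are denoised together, so the motion module keeps them temporally consistent.
frames = pipe("a girl's hair blowing in the wind", num_frames=16).frames[0]
export_to_gif(frames, "windy.gif")
```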
Speaker 2:There's a lot to explain. Well, we're going deep, boys, let's keep going.
Speaker 1:Okay, so these are things I think would be useful to explain as ways to get finer control, as an artist, over the model and its output. There's something called low-rank adapters, or LoRA for short. If you look at the images, number seven is LoRA. The orange things on the right are called tensors, and there are two low-rank tensors; again, hourglass shapes are a big thing in AI for some reason. You're training at a lower rank, a lower dimension, so you don't have to train as many parameters. Here's a LoRA that I trained on the actress Jenna Coleman; if you're familiar with Doctor Who, she's an actress from Doctor Who. The first folder is the baseline images.
Speaker 1:So those are real pictures of Jenna Coleman, three images of her. If you look in the second folder, that's Stable Diffusion XL's understanding of Jenna Coleman before I trained it, so it doesn't quite look like her. And the third folder is images of her that I generated with my low-rank adapter trained on top of Stable Diffusion XL. You can do, for example, her in a drawn style, like the witch image, and you can have her wearing different clothes, like a nun outfit here.
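A minimal sketch of the low-rank idea in PyTorch, not the actual Stable Diffusion implementation (which wraps the attention layers inside the UNet and text encoder): the original weight stays frozen and only the two small "hourglass" matrices are trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # the original model stays untouched
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))        # up-projection
        self.alpha = alpha

    def forward(self, x):
        return self.base(x) + self.alpha * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768), rank=8)
full_params = 768 * 768
lora_params = 2 * 8 * 768
print(f"full weight: {full_params:,} params, LoRA update: {lora_params:,} "
      f"({lora_params / full_params:.1%})")   # roughly 2% of the original size
```

That tiny parameter count is why a trained LoRA ends up as a small file you can pass around and splice into the base model.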
Speaker 2:Yeah, like she's from The Sound of Music or something.
Speaker 1:No, it was more like an 80s horror movie.
Speaker 2:Oh, okay, like The Exorcist or something.
Speaker 1:Yeah, something like that. I can't remember exactly what I typed, but I was going for a Brian De Palma kind of movie. And you can train not just people: you can train styles, you can train outfits, and each thing you train becomes a file that you just add into the network. So I could have Jenna Coleman, then train a LoRA for a red dress I wanted her to wear, and I can daisy-chain these together and get the output I wanted. For example, if I had a scene with a character, and it doesn't have to be an actress, it could be a cartoon character I made, I just daisy-chain all of these together and then I could animate a shot with the motion modules. And then there's also a thing called ControlNets, which are ways to control the output. Have you ever heard of these ControlNets?
Speaker 1:Okay, so the ControlNet I think will be most applicable to animation is called OpenPose. It's basically a skeleton: a black image with a skeleton where each bone, like in 3D geometry, is a different colour. The model learns that, for example, this blue bone is an arm, and it conditions the latents on that image and makes a character in that pose. And what you can do now is feed it a video of poses. So I imagine you could, in theory, animate with your body, kind of like stop motion with your body: take the footage of yourself acting it out, run it through OpenPose to get a sequence of poses, feed that into a ControlNet trained for Stable Diffusion, and then, in theory, you'd have an animation.
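A rough sketch of that pose-driven step using diffusers and the controlnet_aux helpers; the input frame name is a placeholder, and the model IDs are the commonly used public ones and may need substituting.

```python
import torch
from controlnet_aux import OpenposeDetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Extract a colour-coded skeleton from a frame of you acting out the pose.
openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
pose_image = openpose(load_image("me_acting_frame_001.png"))

# Condition Stable Diffusion on that skeleton so the character lands in the same pose.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

image = pipe("a cartoon character waving", image=pose_image).images[0]
image.save("posed_frame_001.png")
```

Run that over every frame of the reference footage and you have, in theory, the pose-for-pose animation Josh is describing.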
Speaker 1:And with a LoRA, you could have it be your character. I think there are examples of that out there. One great Discord for anyone who's interested is called Banodoco.
Speaker 2:Yeah.
Speaker 1:The link is in the show notes. From what I've seen, there's some pretty cutting-edge work happening there in real time. People are making videos and art and it looks fantastic. In my opinion some of it looks bad, but some of it looks really good.
Speaker 2:It's learning though, right? You've got to get through all the shit before it becomes good.
Speaker 1:Yeah, and it is just a bunch of people who are learning it at the same time, in parallel. So if you have any questions too, that's a great place to ask.
Speaker 2:Perfect. So you briefly mentioned pipelines. Where do you see AI being used in the pipeline? How would it work?
Speaker 1:What I think the pipeline of the future will be is, let's say I wanted to make a cartoon with AI. I would first take a pre-trained model like Stable Diffusion. For diffusion models, I think Stable Diffusion XL and the previous versions of Stable Diffusion are the only ones I know of with open weights that allow you to train. You can train with some of the other ones, but you have to train through their website, you give them images and they do it; this is the only one where I have control over the weights of the model and can train them and make my own stuff that I have agency over. So I imagine I'd take Stable Diffusion and train a checkpoint, or a low-rank adapter, on my style. I'd probably do what's called DreamBooth, which is training a whole checkpoint on my style, so anything I make from it is in that style. For example, on the Civitai website you can get models that are based on Stable Diffusion checkpoints but trained to be in a style, like DreamShaper, that's the name of one model. So I would train my own model, and then I would train my LoRAs on characters that I designed. You could either generate them through Stable Diffusion or draw them yourself; you could take a picture of something and teach it, this is what I want.
Speaker 1:So there's really no restriction: as long as you have an image of something, it doesn't really matter how you made it, as long as you have something to feed into the model to train it. Then, say I had a character, let's call him Bob. I would train it to learn what Bob is, and then I'd have a LoRA file that I can just feed into Stable Diffusion, so for any shot that has Bob in it, I splice that low-rank adapter into the model. Anything that needs to be consistent, like if Bob has an outfit he always wears, I'd train a LoRA on that too. Then I'd use what I said before about ControlNets, and the motion modules also learn things, so if I said "windy," his hair could blow in the wind. I'd be animating the higher-level features of the animation, and the motion module would do the rest, like overlap and that kind of thing.
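A sketch of what that "splice the LoRAs in" step looks like in code with diffusers; the LoRA file names ("bob_character", "red_dress") are hypothetical stand-ins for files you would have trained yourself.

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Start from a base checkpoint (or your own DreamBooth-trained style checkpoint).
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Daisy-chain the character LoRA and the outfit LoRA for this shot.
pipe.load_lora_weights("./loras", weight_name="bob_character.safetensors", adapter_name="bob")
pipe.load_lora_weights("./loras", weight_name="red_dress.safetensors", adapter_name="red_dress")
pipe.set_adapters(["bob", "red_dress"], adapter_weights=[1.0, 0.8])

image = pipe("bob wearing the red dress, walking through a forest").images[0]
image.save("bob_shot_001.png")
```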
Speaker 2:So then do you have control over that? Say you type in, I want the hair windy. Do you have control over that, or does it just generate it randomly?
Speaker 1:Okay, so you're slowly learning all the pieces that get reused for everything: there are also low-rank adapters for motion modules. So I could train a motion module LoRA on wind, and then I could tune it, I want more wind, I want less wind, just by tuning a value between zero and one, and it would be extra windy or not windy at all. These are all just things that have to be trained; that's the hard part, training all of these. But then you have a bunch of pieces you can pick off the shelf, or off the internet, and compose them into an output image or video. And that really brings down the cost of animation, if you just have to download a bunch of things and piece them together.
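That zero-to-one knob maps directly onto the adapter weight in diffusers. There is no off-the-shelf "wind" motion LoRA that I know of, so a publicly shared camera-pan motion LoRA stands in here as the thing being dialled up and down; the repository names may need substituting.

```python
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
pipe = AnimateDiffPipeline.from_pretrained(
    "emilianJR/epiCRealism", motion_adapter=adapter, torch_dtype=torch.float16
).to("cuda")

# A motion LoRA is just another file you splice in, with a strength you can dial.
pipe.load_lora_weights("guoyww/animatediff-motion-lora-pan-left", adapter_name="pan-left")

for strength in (0.3, 1.0):
    pipe.set_adapters(["pan-left"], adapter_weights=[strength])
    frames = pipe("a field of tall grass on a windy day", num_frames=16).frames[0]
    export_to_gif(frames, f"grass_{strength}.gif")
```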
Speaker 2:Yeah, I don't know. I'm trying to run some numbers in my head, like, okay, what I'd pay for people versus what I'd pay for compute.
Speaker 1:And servers, yeah, let's figure it out. The biggest thing is GPUs, that's what you need to do inference, and then a couple of big hard drives to store all these files.
Speaker 2:Yeah, it's almost like you're building a crypto farm, but for animation.
Speaker 1:Yeah, that's what I'm so excited about this winter: I can heat my house by training models.
Speaker 2:Yeah, cozy up next to the computer. Okay, so then the big question, basically the big issue I've seen online. Everyone's calling this gen AI, whatever, it's kind of a shitty name, but whatever. People are upset, and rightfully so, that it's grabbing, it's sourcing from artists' work online without credit and just stealing styles that are very similar. So what could be a solution to this? Is it sourcing the artists like you would in a paper, or giving credit? Or should there be regulations?
Speaker 1:I honestly don't think it can be stopped. As an artist, your intention is for people to see your art, and unless it's in a room where no one can take photos of it, it'll probably end up online, and then people can take that and train an AI on it. I don't think there's any way you could stop it. You could set laws and such, but at the end of the day this is information and math, which is pretty hard to ban.
Speaker 2:So, if resistance is futile, what skills do you think artists should learn, in your opinion, to adapt?
Speaker 1:I've already got a list. The first thing, first and foremost, is Python.
Speaker 1:It's the language AI is programmed in. Python is what's used to program most things; the heavier, GPU-level stuff is written in a language called CUDA, which runs on the GPU, but I don't program in that, I just program in Python. So that's the first thing you'd have to learn, and most of these are computer things. The second one is a tool called Anaconda, or Miniconda. They're basically ways to manage your Python environments. If I have a Python script, it will have some dependencies, for example-
Speaker 1:PyTorch is one. It's a library for machine learning and deep learning, made by Facebook a while ago, though it's since been spun off into its own thing, and it's a dependency that's used heavily in Python. So Anaconda is basically a box that holds all your dependencies and runs in its own sandbox, and you can pick your Python version and so on. Another skill that's definitely useful is learning how to use the command line on a computer, the black screen with the green text.
Speaker 2:The hacker window, as I used to call it with my dad. You're like, oh yeah.
Speaker 1:The command line is mostly good for using Anaconda and things like that. There are front ends, like Automatic1111, if you've ever used it, or ComfyUI, that you basically start from the command line, and when you update them, you can update ComfyUI through itself now, but you can update it through the command line too. So it's a very good skill to have, I'd say. The next one is Git. Git is basically a way to manage and update repositories. Let's say I wanted to download ComfyUI: I'd download it with Git, usually from a website called GitHub, which is a site for sharing code with each other. I can download the Git repository, and then any time there's a new update, I just type git pull in the command line and it downloads all the updates. So those are the main three, I guess four, things. And one more: you can use Windows, but I would recommend dipping your toe into Linux, just because that's what a lot of AI people use, Linux, not Windows. I know a lot of people in the art industry use Macs, and Macs are maybe actually better than Windows, I'd say, but as far as AI goes, Linux is king. You don't necessarily have to know how to use Linux; in the future it will probably just be the back end that runs Linux while you're on Windows with ComfyUI or something like that. But it would be helpful. You know, do what you want.
Speaker 1:Okay, so then the last four things I think would be useful to learn are Python libraries. There's PyTorch, which I mentioned before, the Python library for tensors and deep learning and whatnot, made by Facebook, and then there are three others I find useful. These are all made by Hugging Face, which is the go-to place for downloading models like Stable Diffusion, and they are Accelerate, Diffusers and Transformers. With Transformers, for example-
Speaker 1:I can basically, in five lines of code, download an AI model and run it on text. Say I want sentiment analysis: I have some text and I want to know, is this positive or negative? In about five lines of code I can download a model from Hugging Face and it'll tell me, say, there's a 60% chance this text is negative. So it really simplifies the amount of coding you have to do, and if you learn these, I think you'd honestly be pretty powerful, you'd be a god, infinite power, like you've got the Infinity Stones. Those are tools I think artists probably don't know how to use yet, but would be useful to learn for the future. But also, if you don't want to, don't do it, don't listen to me.
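The sentiment example really is about five lines with the Transformers library; the exact label and score depend on the default model it downloads.

```python
from transformers import pipeline

# Downloads a default sentiment model from Hugging Face on first run.
classifier = pipeline("sentiment-analysis")
result = classifier("This episode finally made diffusion models click for me.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```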
Speaker 2:No, do your own thing. So there we go, that's a keen insight, I guess, into the future that might be just around the corner, or a month from now.
Speaker 1:Yeah, who knows?
Speaker 2:So, as we wrap things up, is there anything we missed, anything you want to say to the phantom listeners out there who are maybe sharpening their pitchforks, or signing up to learn Python?
Speaker 1:Yeah, well, put your pitchforks away, there's no need for that. What would I say to someone who's afraid of the future? The future is uncertain and you'll probably have to change. I'm trying to think of how to put it without getting a pitchfork: technology always changes, art stays the same, or maybe it moves in patterns. I don't have any sage wisdom, I'm sorry.
Speaker 2:At least that's very honest. Yeah, I don't know, we'll see what happens.
Speaker 1:Learning can be fun. Try and enjoy it.
Speaker 2:Exactly. Well, that's it. I want to thank our guests for contributing to our journey so far, and I want to thank you, the phantom listeners out there rambling around the water cooler time after time, as we talk more about the subject of AI in the animation industry. Let's find out together. Don't forget, keep your eyes on the horizon. Goodbye, aborting transmission.