Everything came together and the final reveal is here! I’ll outline all of the working components below and I’ve attached the source files for download. But if you can’t wait, cut to the chase and watch the video on YouTube or perhaps check it out with all of the 2024 Silver Lining Film Festival videos. Otherwise I’ll start with a quick recap. I collaborated with GPT4 to come up with a script and choreography for 2 characters, a wood carver named Sandy and a squirrel named Whiskers. With some coding in Blender I worked out how to choreograph subtitles and object movements. And by using hand-drawn sketches I created a physical Sandy shadow puppet using black card, and a digital Whiskers puppet in Blender. For more detail you can read previous posts or start from the start with part 1.
The final production included the following components:
backdrop scene — This is drawn in Blender from a hand-drawn sketch.
animated objects — Also drawn within the same Blender scene are the 2 squirrel puppets, and a log carving animated in 4 frames, and some spotlights with animated luminosity to give the impression of sun rays through leaves. (image below)
choreography sheet — A master spreadsheet containing the timings of subtitles and squirrel movements.
chroma-key red screen — A green-screen style backdrop, but in red, on which to perform the shadow puppetry. I chose red for the chroma-key colour, because… I’m not sure why.
the recording setup — I positioned a webcam to focus on the chroma-key screen, making sure I didn’t get in the way while performing, and positioned a computer monitor so the result is visible while I’m performing. (image below)
The choreography included a good time buffer up front, to allow time for positioning the physical puppet and video editing, and a time buffer in the middle, to allow time to swap the fishing rod for an axe while the shadow puppet was off screen. I included movement “notes” subtitles for practicing, which could be switched off in the final version. In the end I set up OBS (Open Broadcaster Software) to do a few things at once: to input a video recording from Blender of the animated background scene, subtitles and squirrel puppetry; to take the webcam stream of the shadow puppet and overlay it with the chroma-key red removed; and to record the final combined product.
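To illustrate the idea behind removing the chroma-key red, here is a minimal sketch in plain Python. It is not how OBS implements its filter; the function names, the threshold of 80 and the tiny 2x2 "frames" are all assumptions made up for the example. A pixel counts as key colour when its red channel clearly dominates green and blue, and those pixels are swapped for the background.

```python
def is_key_red(pixel, threshold=80):
    """True when red dominates green and blue by more than `threshold` (assumed value)."""
    r, g, b = pixel
    return r - max(g, b) > threshold


def composite(foreground, background):
    """Replace key-coloured foreground pixels with the matching background pixel."""
    return [
        [bg if is_key_red(fg) else fg for fg, bg in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(foreground, background)
    ]


RED = (255, 0, 0)      # the chroma-key screen
BLACK = (0, 0, 0)      # the card shadow puppet
FOREST = (40, 120, 60)  # stand-in for the rendered Blender scene

webcam = [[RED, BLACK], [RED, RED]]
scene = [[FOREST, FOREST], [FOREST, FOREST]]
combined = composite(webcam, scene)
```

In the combined frame the black puppet pixel survives and every red pixel shows the scene behind it. A black puppet is about as far from the key colour as it can get, which may be why the choice of key colour matters less for shadow puppets.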
In summary, this method is pretty solid. The only major change I’d make is with the master spreadsheet. I developed the Python code for subtitles and object animation separately, so these use separate input files. While tweaking the timing and choreography I was constantly converting the master Excel file into two CSV files, which was quite annoying, so in future I’ll fix the code to use one file. But that’s it in a nutshell, so here’s the final product…
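As a rough idea of what the single-file fix could look like, here is a minimal sketch that reads one master CSV and splits the rows in memory instead of exporting two files. The column names (time, kind, content), the "subtitle" and "move" labels and the sample rows are hypothetical, not the layout of the shared master sheet.

```python
import csv
import io


def split_master(rows):
    """Split master rows into (time, text) subtitles and (time, instruction) moves."""
    subtitles, movements = [], []
    for row in rows:
        entry = (float(row["time"]), row["content"])
        if row["kind"] == "subtitle":
            subtitles.append(entry)
        elif row["kind"] == "move":
            movements.append(entry)
    return subtitles, movements


def load_master(text):
    """Parse CSV text with a header row of time,kind,content (assumed layout)."""
    return split_master(csv.DictReader(io.StringIO(text)))


MASTER = (
    "time,kind,content\n"
    "0.0,subtitle,Once upon a time...\n"
    "2.5,move,whiskers tree ground\n"
    "4.0,subtitle,Well hello little squirrel.\n"
)

subtitles, movements = load_master(MASTER)
```

The existing subtitle and movement code could then each be handed its own list, so the choreography only needs saving once per tweak.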
For others pursuing this path, Blender is an amazing tool, so almost anything is possible. However, there are many challenges and sometimes mysterious behaviour. Expect to be scanning YouTube, blog posts and help documentation. Add features to your scene incrementally, save all previous versions, test everything! People will recommend having a powerful computer to render your scene quickly and perfectly. I don’t have a powerful computer, so I used quick and dirty Eevee rendering and recorded directly from the screen using OBS, which is good enough. So be encouraged, all of the above is quite accessible. I’ve shared my choreography master sheet, code and Blender file below for anyone to use. Let me know if you try this or are taking it to the next level. I might be able to provide some best-effort help with the coding, perhaps.
It’s time to start assembling all the components for the scene. This is requiring a lot of sketching, in the main for the 2 characters and the backdrop. For the woodcarver character I used a sketched template to cut out parts from black card stock. The parts included a body, 2 legs, an arm and a head. Sounds complicated, but it only has 2 control rods, one connected to the arm, the other connected to a leg, and the body position is easily controlled between the two rods. The head pivot I connected to the arm pivot with a 4-bar-linkage and a rough ratio of 3:1, so the head moves subtly with the arm. (Thanks to Alex and Olmsted’s wonderful video.) The final leg I left uncontrolled, thinking it could be subtly influenced by surface friction, gravity, and also the axle/pivot it shared with the controlled leg. After a few practices I decided to attach it to the other leg with some thread (added after the below photo was taken), so that it could move freely but not too far from the controlled leg.
For the squirrel I used a sketch as a “reference image” in Blender to create “grease pencil” drawings of the squirrel in parts. These drawings were simple dark shapes with a few cutouts. The body, the head and the tail were separate drawings, i.e. separate “grease pencil” objects in Blender. The head and tail were attached to the parent body at a pivot point, and all transforms were locked except for rotation. I decided not to use vertex groups or bones to animate the squirrel this time, to simplify the animation code as it manipulates objects. I gave the squirrel head a “constraint”, limiting the range of rotation, and also created a “driver” for the value of the rotation so that it pivots slowly in the opposite direction to body rotation. The effect is, if the squirrel is flat to the ground, it looks forward, but if it stands up it looks more towards the woodcarver. This is perhaps a digital equivalent of the 4-bar-linkage. So the movement file (that will eventually contain timings and instructions based on the AI’s choreography) only needs to control the squirrel and its tail, and the head will move sympathetically. Finally I duplicated the squirrel and gave the second squirrel an acorn, thinking the second squirrel can be swapped in after the acorn pickup, rather than having to worry about adding more Blender movement code to pick something up.
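The driver plus constraint combination boils down to a small formula, sketched here in plain Python rather than in Blender itself. The ratio of 0.5 and the 30 degree limit are assumed values for illustration, not the settings in the shared Blender file.

```python
import math


def head_rotation(body_rotation, ratio=0.5, limit=math.radians(30)):
    """Counter-rotate the head against the body, then clamp it to +/- limit.

    body_rotation is in radians; ratio and limit are assumed example values.
    """
    counter = -ratio * body_rotation          # the "driver": opposite and slower
    return max(-limit, min(limit, counter))   # the "constraint": limited range


flat = head_rotation(0.0)                      # squirrel flat to the ground
sitting = head_rotation(math.radians(40))      # squirrel partly upright
standing = head_rotation(math.radians(90))     # squirrel fully upright
```

With these numbers the head turns back by 20 degrees when the body tips up by 40, and stops at the 30 degree limit once the body is fully upright.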
After all that the largest thing left is the scenery. I’ve pencil sketched it and again will use it as a reference image in Blender to create a grease pencil drawing. I’ll also need to create the block of wood that evolves into a statue, perhaps as a separate set of drawings, and just use Blender’s standard animation (using the “dope sheet”) to fade in and out successive drawings.
I expect there to be only one more post after which the scene will be complete. This should cover how everything comes together, the Blender scene and objects, the movement and subtitles, and how I’ll integrate the physical shadow puppet with the Blender rendered scene.
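The fade between successive drawings can be thought of as an opacity value per drawing over time. Here is a minimal sketch of that idea in plain Python; it does not touch Blender's keyframes, and the start times and the one second fade are assumptions for the example.

```python
def drawing_opacity(index, now, starts, fade=1.0):
    """Opacity (0.0 to 1.0) of drawing `index` at time `now`.

    starts[i] is when drawing i begins to fade in. The first drawing is
    visible from the outset and the last drawing never fades out.
    """
    def ramp(start):
        # 0.0 before `start`, rising linearly to 1.0 at `start + fade`.
        return max(0.0, min(1.0, (now - start) / fade))

    fade_in = 1.0 if index == 0 else ramp(starts[index])
    fade_out = 0.0 if index == len(starts) - 1 else ramp(starts[index + 1])
    return fade_in * (1.0 - fade_out)


# Four carving stages, each starting ten seconds after the last (assumed timings).
STARTS = [0.0, 10.0, 20.0, 30.0]
halfway = [drawing_opacity(i, 10.5, STARTS) for i in range(4)]
```

Half way through the first fade, the raw block is at half opacity while the second drawing has reached half opacity, and the later drawings are still hidden. In Blender the same values would be set as keyframes in the dope sheet rather than computed on the fly.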
So in part 1 we came up with our 2 characters, a wood carver (Sandy) and a squirrel (Whiskers). We co-created a script for the scene. And part 2 explored ways the characters could communicate (subtitles) and ways an AI could direct the movement of a character by describing what should happen. So for part 3 I want to share what happened when I asked the AI to describe the squirrel’s movements.
I confess I entered into this discussion by thinking of what questions I should ask, how to describe the things I wanted, and what question order would be best for getting what I want. I was thinking to deliberately lead the AI towards a successful conversation, a set of outputs. However, I had no idea what would result from the conversation. To explain my reaction: the AI provided far more than I expected, and it really challenged the way I imagined the story. As a result, I’m definitely rethinking the layout of the scene (which currently exists only in my mind), where objects such as the statue will be located, and how I imagined the squirrel moving around the scene. So this really does feel like a collaborative co-creation.
The following chat session continues on from the start of the chat session in part 1.
You: I think we should discuss the scenery in which the above scene takes place. I imagine the scene is framed by tall trees to the left and right. A large log is on the left of the screen and the stream is visible in the background. At the beginning Sandy is sitting on that log with a fishing rod and line in the stream. The ground is clear across the front of the scene where our characters will perform their interaction. Do you think that would suit our story? Is there anything else we need for either Sandy or the squirrel to act out the scene?
Bot: I think that sounds like a wonderfully picturesque setting for our story. The tall trees provide a sense of enclosure, making the scene feel intimate and private. The log becomes not only a seat for Sandy but also a stage where Whiskers can show off his acorn. The stream in the background contributes a peaceful ambiance, complementing the relaxed tone of the story.
For the characters, I think Sandy would need a bag or box for his carving tools and the block of wood. As for Whiskers, maybe there could be a little pile of acorns nearby at the beginning, which could be the source of the one he gifts to Sandy. This could signify his willingness to share what he has with his new friend.
You: Wonderful. So thinking of that scene can you list all of the locations, essentially the stage marks, that the little squirrel would move between?
Bot: Absolutely! Here are the stage marks for Whiskers, the squirrel:
1. Acorn Pile: To the right of the log, this is where Whiskers grabs one of his precious acorns to give to Sandy.
2. Log: This is where Whiskers gives the acorn to Sandy. It's also where they converse and where Whiskers poses for the statue.
3. Ground in Front of the Log: This is where Whiskers excitedly scampers in a circle when Sandy suggests making a statue.
4. Statue: Positioned near the log, this is where the statue ends up once Sandy's done carving it, and Whiskers comes here to admire the final masterpiece.
5. Sandy's Shoulder: Whiskers clambers up here when Sandy invites him to his cabin for supper.
Remember, Whiskers is a small creature and likes to dart around quickly, so he could also pop up on the log, around the ground near Sandy, or even in the trees at times to make the scene lively and dynamic.
You: What types of movement do you think would help bring the squirrel to life? You've mentioned scampering and darting around before.
Bot: Whiskers the squirrel is a ball of energy and curiosity, so his movements should reflect his lively character. Here are some types of movements that can help bring Whiskers to life:
1. Twitching: Squirrels are known for their constant twitching, especially their tails. This twitching can be used to show excitement or agitation.
2. Leaping: Squirrels are great acrobats. Leaping, whether it's from the ground to the log or among the trees, will showcase his agility.
3. Quick Head Movements: Squirrels are always alert and attentive. Quick, jerky head movements can reflect their high energy and constant vigilance.
4. Flicking Tail: Whiskers can flick his tail in excitement or use it for balance while he's on the move.
5. Nibbling: To illustrate his love for acorns, Whiskers could occasionally be seen nibbling his acorn, turning it around in his tiny paws.
6. Clasping: When Whiskers is holding the acorn or clinging onto Sandy's shoulder, his little paws can clasp and unclasp, highlighting his careful handling or secure grip.
These movements, combined with our little squirrel's expressive eyes and chittering sounds, will surely make Whiskers come alive in our story.
You: Considering this part of the above scene: Whiskers, from high in his tree, notices Sandy's tuneful whistling. His curiosity piqued, he decides to descend from his tree and scampers towards the stream, rustling some leaves as he goes.
Can you list Whiskers' movements as a series of steps, including moves from one of the above stage marks to another, any pauses, or any of the other movements you mentioned above?
Bot: Absolutely, let's break it down:
1. Whiskers begins in the tree (Stage Mark: Tree), his attention grabbed by Sandy's whistling. He twitches his tail in anticipation and flicks his ears towards the sound.
2. He decides to descend, leaping from branch to branch with agility and quickness.
3. He makes a pause on the last branch before the ground, his head making a quick back and forth movement as he calculates his jump.
4. He jumps from the tree onto the ground (Stage Mark: Ground near Tree).
5. Hearing the rustling leaves as he lands, he freezes for a moment, his tail twitching nervously.
6. After a pause to ensure all is safe, he briskly moves towards the log where Sandy is seated (Stage Mark: Log), scampering across the ground and rustling the leaves.
7. Arriving at the log, he pauses again, his quick head movements surveying Sandy before making his next move.
8. Finally gathering his courage, he leaps onto the log to interact with Sandy.
These series of movements not only showcase Whiskers' physical agility but also highlight his cautious, curious nature.
You: Brilliant. Now considering the next part of the above scene: Whiskers tilts his head, considering the invitation. After a moment, he scampers onto the log next to Sandy, settling down comfortably. He continues to watch the dancing fishing line with interest, every now and then glancing up at Sandy with a curious chirrup.
Can you list Whiskers' movements as a series of steps for this part, again including moves from one of the above stage marks to another, any pauses, or any of the other movements you mentioned above?
Bot: Of course! Let's break down Whiskers' movements for this part:
1. After hearing Sandy's invitation, Whiskers pauses on the log (Stage Mark: Log), tilting his head in contemplation. His quick, darting eyes show he's thinking.
2. Deciding to accept the invitation, he scampers along the log towards Sandy, his tail flicking in excitement.
3. Once next to Sandy, he settles down, taking a comfortable position. His body relaxes but his tail continues to twitch, indicating he's still alert.
4. His eyes fixate on the dancing fishing line in the stream (Stage Mark: Stream), reflecting his intrigue. He leans forward slightly, showing the intensity of his interest.
5. Occasionally, he breaks his gaze from the fishing line to glance up at Sandy, giving a curious chirrup. With each glance, he tilts his head a bit, signifying his curiosity.
6. His tiny paws clasp and unclasp, reflecting his desire to touch the fishing line but maintaining respectful distance from Sandy's activity.
These movements showcase Whiskers' curious and respectful nature. They demonstrate that Whiskers is both interested in Sandy's activity and aware of the boundary between them.
You: Great. Now consider this next part of the above scene: Whiskers perks up at the mention of carving. He chitters excitedly, scampering off the log and running towards one of the nearby trees. He comes back moment later, holding a fallen acorn in his paws. Whiskers drops the acorn in Sandy's lap, looking at it and then Sandy expectantly. Was his suggestion of carving an acorn statue too ridiculous?
Can you list Whiskers' movements as a series of steps for this part, once again including moves from one of the above stage marks to another, any pauses, or any of the other movements you mentioned above? It might be tricky for Whiskers to drop an acorn in Sandy's lap, so he could just hold on to it, presenting it to Sandy.
Bot: Absolutely, let's break down the movements for this part:
1. Whiskers perks up on the log (Stage Mark: Log), his ears pricking up and his tail twitching at the mention of carving.
2. He chitters excitedly, scampering off the log towards the acorn pile (Stage Mark: Acorn Pile).
3. After a quick search, he picks up an acorn, holding it carefully in his paws. His tail flicks as he looks at the acorn, pleased with his find.
4. He runs back to the log, his body low and tail streaming behind him.
5. When at the log again, he stands on his hind legs, lifting the acorn in his forepaws to present it to Sandy. He pauses in this position, his body a little tense as he waits for Sandy's reaction.
6. His eyes dart quickly between the acorn and Sandy, indicating his hope that Sandy will get his suggestion of carving an acorn statue. His tail gives a nervous twitch with the anticipation.
These actions clearly demonstrate Whiskers' excitement at the idea of an acorn carving and his hopeful anticipation of Sandy's response.
You: Wonderful. Ok, now consider the next part of the above scene: Whiskers looks at Sandy, surprise clear in his eyes. He chitters excitedly, scampering in a little circle before settling down again. He then poses with the acorn, as if giving Sandy his approval to proceed with the new project.
For this part can you list Whiskers' movements as a series of steps, including moves from one of the above stage marks to another, include any pauses, or any of the other movements you've mentioned?
Bot: Certainly! Here are Whiskers' movements for this scene:
1. Whiskers starts on the log (Stage Mark: Log), looking at Sandy with surprise in his eyes. His tail flicks and his head tilts slightly, showcasing his astonishment.
2. He releases an excited chitter and scrambles off the log onto the ground, scampering in a small circle (Stage Mark: Ground in Front of the Log). This movement emphasizes his uncontained joy.
3. After his brief bout of excitement, he pauses for a moment to catch his breath, his sides slightly heaving and his tail twitching.
4. He then bounds back on the log next to Sandy, settling down near him and the acorn.
5. He poses with the acorn, delicately holding it in his paws as he looks at Sandy, giving a pause as if to get Sandy's attention.
6. He then continues to hold the pose, his body stiffening a bit as if giving Sandy his approval for the new project, his tail flicking in excitement.
These movements show a range of emotions from Whiskers - his surprise at Sandy's acceptance, his joy at the prospect of an acorn statue, and his implicit approval for Sandy to begin.
You: Very good. So consider this next part of the above scene: Whiskers climbs onto the log, standing beside the statue. He chitters approvingly, his eyes wide with admiration. He runs his paws over the wooden version of himself, his tiny fingers tracing the intricate details that Sandy had carved.
Could you list Whiskers' movements as a series of steps for this part, including moves from one of the above stage marks to another, any pauses, and any other movements you mentioned above?
Bot: Certainly, let's break down Whiskers' movements:
1. Whiskers begins on the ground (Stage Mark: Ground in Front of the Log), looking at the statue on the log.
2. He clambers onto the log (Stage Mark: Log), his movements quick and agile as he approaches the statue.
3. Standing next to the statue, he pauses and looks it over, his tail twitching and eyes wide with admiration. His head may tilt from side to side as he takes in the details.
4. He then lets out an approving chitter, his rapid head movements indicating his excitement.
5. Next, he reaches out with his paws to touch the statue, pausing momentarily as if asking for permission.
6. He then runs his tiny fingers over the statue, tracing the intricate details carefully. His paws clasp and unclasp as they explore the carved wood.
7. Throughout this, he occasionally glances at Sandy, his eyes still wide with admiration and gratitude.
These movements show Whiskers' admiration for the statue and his recognition of Sandy's talent and effort.
You: Nice. And now consider the final part of the scene. With a swift nod, Whiskers accepts the invitation. He clings onto Sandy's shoulder as they make their way back to the cabin, their journey lit by the dwindling sunlight filtering through the autumn leaves. Their friendship promises many more shared adventures and warm evenings by the fireplace.
Once more can you list Whiskers' movements as a series of steps for this part, including moves from one of the above stage marks to another, any pauses, or any of the other movements you mentioned above? Perhaps have Whiskers follow Sandy and not cling to his shoulder as that might be hard for us to co-ordinate.
Bot: Absolutely, let's consider this final part:
1. Whiskers begins on the log (Stage Mark: Log), facing Sandy as he is invited for a warm evening by the fireplace.
2. He responds with a swift nod, his ears flicking and tail twitching to indicate his acceptance.
3. He then carefully hops down from the log onto the ground (Stage Mark: Ground).
4. Whiskers pauses for a moment, as if to gauge Sandy's direction.
5. Once Sandy begins moving, Whiskers starts to follow him closely, his movements quick and light as he scampers across the ground.
6. While following Sandy, he occasionally pauses and looks around, his movements cautious yet excited, capturing the dwindling sunlight and the autumn leaves around them.
7. As they reach the cabin (Stage Mark: Cabin), Whiskers pauses one last time, taking in the welcoming sight before scampering up the stairs after Sandy.
These movements show Whiskers' acceptance of the invitation and his anticipation for a warm evening by the fireplace, depicting the start of an exciting friendship full of shared adventures.
You: Wow, that's an awesome description of all the movements. I'll try to animate Whiskers for you following each of these steps and instructions, as much as I can. And I'll choreograph Sandy to perform together with Whiskers in the scene. Any final words of advice or things you'd like to see?
Bot: I'm glad to hear you found the descriptions helpful! For the animation, focus on capturing Whiskers' characteristics and emotions through his movements. He's described as curious, excited, respectful, and appreciative. Express these traits through his actions.
Keep his movements quick and nimble to reflect his energetic and lively nature. However, while he is active, he is also cautious and considerate, so remember to include moments of hesitations and pauses.
Lastly, it would be wonderful to see the bond between Whiskers and Sandy being portrayed strongly in the animation. Their friendship is a key element of the story.
I'm looking forward to seeing your animation. Good luck!
So with my new creative friend and collaborator on this project, GPT4, we’ve come up with a pretty good script (in part 1). But my friend has a few challenges being a language model. They have no limbs to control a puppet. And no real vision nor ability to get visual feedback, so I’m not sure precision movement is possible. But hey, life is full of challenges, so I’ve come up with a plan to give the AI as much control as I can.
I had always intended for the scene to have subtitles, as practically I don’t have to do voice acting, and having seen this before in shadow theatre I like the aesthetic, text feels like a story book. “Once upon a time…” etc. But what I didn’t expect was that the AI somehow decided their character (a squirrel) doesn’t speak, although it does make some noises, and my character the wood carver has some lines. (Obviously a squirrel can’t talk, but understands English?) Playing around in a Blender test scene, I wrote some Python code to update a simple Text object with changing text. Then I upgraded the code to read from a CSV file (a simple spreadsheet) that contains time-stamps and lines of text. So that’s a big tick, subtitles are possible! I’m never going to perform subtitles in real time, so copying out my character’s lines and the squirrel’s “noises” makes a lot of sense for me. I’ll need to guess the timings, and then practice the scene a few times to test the pace, but I think I’d be doing this anyway.
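For anyone curious about the time-stamp lookup, here is a minimal sketch in plain Python of picking the current subtitle from CSV rows. It leaves out the Blender side (writing the string into the Text object), and the two-column layout and sample lines are assumptions for illustration rather than my actual file.

```python
import bisect
import csv
import io


def load_cues(text):
    """Parse rows of "seconds,subtitle" into a time-sorted list of (time, line)."""
    return sorted((float(t), line) for t, line in csv.reader(io.StringIO(text)))


def subtitle_at(cues, now):
    """Return the most recent subtitle at time `now`, or "" before the first cue."""
    times = [t for t, _ in cues]
    i = bisect.bisect_right(times, now) - 1
    return cues[i][1] if i >= 0 else ""


SUBTITLES = (
    "0.0,Once upon a time...\n"
    '3.0,"Well hello, little squirrel."\n'
    "6.5,\n"  # an empty line of text clears the subtitle
)

cues = load_cues(SUBTITLES)
current = subtitle_at(cues, 4.2)
```

Each cue stays on screen until the next time-stamp, so a row with empty text is a simple way to clear the subtitle between lines.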
What if the squirrel’s movement worked a similar way? The AI appears to have a good imagination, so could the AI describe what the squirrel was doing for each part of the scene? Maybe it could say where the squirrel was, how it would be moving, and if it moved to somewhere else. This seems quite doable for the AI. And if I’m already testing timings for subtitles, timings for actions would line up with these. And somehow I think I could perform my character’s movements alongside the timed squirrel actions. (Still need to figure out how I’ll eventually do that.) If something doesn’t work, I can try to tweak the timings, or potentially ask the AI for ideas to fix that part of the scene.
I created a dummy squirrel object in my Blender test scene, from a flat image, and wrote some more Python code to attempt to move it. Starting simple, I just want to move an object from one place on screen to another. So I reused the code to read another CSV file with time-stamps, but instead of subtitles this file contains instructions for movement. Initially the code understands only 2 instructions, one that names a location on screen (e.g. “tree”, “ground”, “top-of-log”), and another that moves a named object from one named location to another. After figuring out how to do some basic vector arithmetic in Python and Blender… success!
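The two instructions and the vector arithmetic can be sketched in plain Python, without Blender. The instruction words ("mark", "move"), the coordinates and the Stage class are hypothetical names chosen for this example, not the format of my movement file. The arithmetic is a straight-line interpolation between two named locations.

```python
def lerp(a, b, fraction):
    """Point that is `fraction` of the way from a to b (both tuples)."""
    return tuple(p + (q - p) * fraction for p, q in zip(a, b))


class Stage:
    def __init__(self):
        self.marks = {}   # location name -> (x, y)
        self.moves = []   # (start, duration, object name, from point, to point)

    def run(self, time, instruction):
        """Handle "mark NAME X Y" or "move OBJECT FROM TO SECONDS" (assumed syntax)."""
        words = instruction.split()
        if words[0] == "mark":
            self.marks[words[1]] = (float(words[2]), float(words[3]))
        elif words[0] == "move":
            obj, src, dst, seconds = words[1], words[2], words[3], float(words[4])
            self.moves.append((time, seconds, obj, self.marks[src], self.marks[dst]))
        else:
            raise ValueError(f"unknown instruction: {instruction}")

    def position(self, obj, now):
        """Position of obj at time `now`, or None before its first move starts."""
        pos = None
        for start, seconds, name, src, dst in self.moves:  # assumed in time order
            if name == obj and now >= start:
                pos = lerp(src, dst, min(1.0, (now - start) / seconds))
        return pos


stage = Stage()
stage.run(0.0, "mark tree 0 4")
stage.run(0.0, "mark ground 2 0")
stage.run(1.0, "move whiskers tree ground 2.0")
midway = stage.position("whiskers", 2.0)
```

Here the move starts at 1 second and lasts 2 seconds, so at 2 seconds the squirrel is half way between the tree and the ground. In Blender the resulting point would be written to the object's location on each frame.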
OK, there’s a lot more work to do. Clearly a squirrel doesn’t fly in a straight line, so at the very least it needs to walk, or hop, or run, or jump. It will need to change direction, sit up, perhaps move subtly on the spot, or while “talking”, or focus on an object like an acorn. And maybe it’ll need to do these things in a “squirrel-like” way. But I’ll ask the AI what it thinks about this next.
I’m starting off a new project. And this one is inspired by a recent UNIMA Youth Commission session about AI & Puppetry. The session prompted thinking about the faces of AI, and that one day perhaps AI could be a “person” with whom we will collaborate, not just a “fake” creator or tool for creators. While self-aware AGI (artificial general intelligence) is likely a long way off, I think we could try that collaboration right now. Or at least I’d like to know how far we can push it now and what sort of result might be possible. So here’s the rough plan…
Create a short story involving 2 characters, one created by me, the other by the AI.
Find a way for us both to manipulate elements of a scene. I’ll use Blender for this, having learned a lot during my last project.
Finally see if I can get the AI to choreograph puppet movements on screen. This could be the most interesting challenge, and will take some thinking.
If others are interested or are experimenting themselves, leave a comment. How do you feel about “tinkering with the garage door open”, letting us all know what you’re up to?
I’ll start by sharing a quote shared by Matt Smith from the UNIMA AI session.
My digital puppet collaborators are not machines, but they do learn about, and defer to, my human preferences while – at times – rebelling against my controlling nature. They experience glitches, and have accidents, and it is these that give them credibility. For me, a “successful collaboration between humans and machines” looks like artificial digital puppet life and intelligence
Lucy Childs (2022). The Quest for Life and Intelligence in Digital Puppets (Doctoral dissertation). p186-187
Choosing an AI (large language model)
I thought I’d read into this a bit, because you should know your tools, right? Well it got pretty technical and long-winded, so I’ve included what I found in a different blog post. In the end I felt pretty comfortable going with a later model of OpenAI’s GPT, specifically the gpt-4-0613 model. Oh, I also use the Android app ChatBoost because I can connect it to a paid OpenAI account and select a specific model to use for chat.
Part 1 – Creating a short story
Below I’ve included the discussion I had using gpt-4 to co-create a short story or scene. I think it went pretty well. Towards the end it stopped taking turns, perhaps decided to get a bit pushy with me 😉 but we got back on track.
You: Hi there. I'd like to try an experiment where we co-create the script for a scene together. The scene will involve two characters, one will be played by me, the other by you. The characters will be played by shadow puppets so they could be human, animals or fantasy creatures. I'll start by describing my character, then you can describe yours, then we can take turns saying what our character does next. Let me know if that makes sense to you.
Bot: Absolutely, that sounds like a fun and creative experiment! I'm ready to co-create this shadow puppet scene with you. Go ahead and describe your character.
You: My character is called Sandy. Sandy is a woodcarver who lives alone in a cabin by the woods. By day Sandy loves to walk through the woods, whistle tunes and work on carvings. At night Sandy loves to sit by the fire and eat fish caught from a stream. Your turn.
Bot: My character is named Whiskers. Whiskers is an adventurous squirrel who resides high in a tree just on the edge of the woods. During the day, he explores the woods, gathering nuts and playing with the other forest creatures. At night, he enjoys nestling in the branches of his tree, munching on acorns and listening to the sounds of the night. Now, let's start our scene.
You: Sandy is sitting on a log by a stream fishing. He is whistling a tune.
Bot: Whiskers, from high in his tree, notices Sandy's tuneful whistling. His curiosity piqued, he decides to descend from his tree and scampers towards the stream, rustling some leaves as he goes.
You: Sandy stops and say, "Well hello little squirrel. What brings you to this part of the woods on such a lovely Autumn day?"
Bot: Whiskers looks up at Sandy, eyes gleaming with curiosity. He makes a series of chittering noises and gestures towards the fishing line dancing on the surface of the stream. His way of asking what Sandy is doing.
You: "Ah, you mean my fishing line." Sandy holds up his line. "Well I'm trying to catch a fish for my supper. But I also don't mind sitting by the stream and enjoying the sun. Would you like to join me?"
Bot: Whiskers tilts his head, considering the invitation. After a moment, he scampers onto the log next to Sandy, settling down comfortably. He continues to watch the dancing fishing line with interest, every now and then glancing up at Sandy with a curious chirrup.
You: After a short while relaxing in the sun Sandy turns and says, "I have enjoyed your company little squirrel. And I would stay for longer, on any other day, but I'm afraid today I must get back to work. You see, the King has asked me to carve a statue as a present for his daughter. I have found the perfect piece of wood. But I still do not know what to carve."
Bot: Whiskers perks up at the mention of carving. He chitters excitedly, scampering off the log and running towards one of the nearby trees. He comes back moment later, holding a fallen acorn in his paws. Whiskers drops the acorn in Sandy's lap, looking at it and then Sandy expectantly. Was his suggestion of carving an acorn statue too ridiculous?
You: Sandy says, "My dear little friend, what a perfect idea! So simple and beautiful. Would you mind if I carved a statue of you holding the acorn?"
Bot: Whiskers looks at Sandy, surprise clear in his eyes. He chitters excitedly, scampering in a little circle before settling down again. He then poses with the acorn, as if giving Sandy his approval to proceed with the new project.
You: Sandy races off to get his tools and the wood to carve. In a moment he is back and working with joy and purpose on the block of wood. In what seems like no time at all, the wood is carved away, and what remains is a perfect statue of squirrel proudly holding a shiny acorn. Sandy sighs and steps back to look at his work.
Bot: Whiskers climbs onto the log, standing beside the statue. He chitters approvingly, his eyes wide with admiration. He runs his paws over the wooden version of himself, his tiny fingers tracing the intricate details that Sandy had carved.
"Well done, my friend," Sandy says to Whiskers, patting him gently. "This is a masterpiece, thanks to your inspiration."
The sun sets, painting the sky in hues of orange and purple. Happiness filled the air as Sandy packed up his tools and Whiskers scampered around the statue. It was a day well spent, and an unlikely friendship formed in the heart of the forest.
You: Hey, I was supposed to say what Sandy says. Couldn't hold back there, huh? It really does seem like a good place to finish the scene. But I'll add one more thing. Sandy says, "Would you like to come back to my cabin for supper? It's small but cozy with a warm fireplace."
Bot: With a swift nod, Whiskers accepts the invitation. He clings onto Sandy's shoulder as they make their way back to the cabin, their journey lit by the dwindling sunlight filtering through the autumn leaves. Their friendship promises many more shared adventures and warm evenings by the fireplace.
You: The end. Well done, you! It has been fun co-creating this story with you.
Bot: I enjoyed it too! Thank you for the creative scenario. I'm here if you want to create another story or need assistance with anything else.
As a part of a larger project I thought I’d read a bit into AI rather than just pick a tool and not know what I was doing. Well I learned a lot, which I’ll share here, but there’s probably still a lot more to learn. And I’ve found some references which I think are handy, or at least interesting for you to also browse, so I’ve included them below. Let’s discuss!
The number of parameters in a model seems roughly comparable to the number of neurons, or brain-power size, though it’s probably not a direct comparison between products as their approaches differ. The training dataset size might make a big difference, and so might the make-up of the training data, as some models focus on carefully selected content (expertise?) over broader content (wider human culture?). The majority of recent and accelerated growth in AI models seems to be due to these 2: the growing number of parameters and increasingly large datasets. The number of tokens in a model I believe can be likened to its vocabulary size. And many models are trained on internet data available up to a certain date, the training data available date, so be aware of this if recent world information is important. Something else to be aware of: some interfaces and products say they will choose from different underlying models depending on the situation, in principle to optimise the outcome, but I imagine this might also depend on the developer’s or user’s subscription budget. So you may not always know what you’re using. I’ve found you can try asking a model for more information about itself, but OpenAI’s models, for example, will only tell you the training data available date and don’t seem able (or willing) to provide all the details.
OpenAI’s GPT4 still seems to be a go-to for industry comparison, consistently at or near the top with the most complex model (largest number of parameters), and perhaps it’s the easiest to get started with (and/or pay for). Or perhaps that’s OpenAI’s marketing at work, as ChatGPT seems to be a name lots of people recognise and many people have tried, experts and non-experts alike. There are heaps of different benchmarks, but GPT4 seems consistently of high quality and is often the benchmark that everyone aims to beat. But there is definitely no “right answer”, as many AIs are designed to be stronger in different areas.
Try to be aware of an AI’s capabilities, for example use this link to look up OpenAI model capabilities. Notice that if you just select GPT-4 on their web site, or perhaps in other apps, that is currently shorthand for gpt-4-0613, and this model has 8k context tokens (8×1024, or roughly 8,000) and uses training data available up to Sept 2021 (correct at the time of writing). A better alternative for long-form creativity could be to choose a model with 32k tokens. OpenAI says that 1 token is the equivalent of about 0.75 words. So 8k is 6,144 words, which is about 22 pages of a novel (assuming an average of 280 words per page), and 32k, at 4 times larger, would be about 88 pages of a novel. But what is context? AI is essentially “stateless”, meaning it has no memory, so without you seeing it happen, the recent history of your conversation (both prompts and responses) is sent to the AI each time as the “context” for your next prompt. That’s why you can come back to any chat conversation at a later time: the AI doesn’t magically remember where you left off, the software stores the full context and sends it to the AI again. I hope I explained that well enough. So the maximum number of context tokens is essentially the maximum size of the AI’s short-term memory. Now, depending on how each chat algorithm works, as the context limit is reached it could either remove (forget) the oldest messages, or it could summarise (shorten) a section of the older conversation. If you’re having a quick chat or writing a short poem this probably doesn’t matter. But if you’re getting the AI to help write a manuscript, having over “22 pages” of short-term memory could start to matter.
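As a rough sketch of the arithmetic above, and of the "forget the oldest messages" strategy, here is a small Python example. The 0.75 words-per-token and 280 words-per-page figures come from the text; the function names and the word-count token estimate are my own assumptions for illustration, not how any particular chat product actually counts or trims tokens.

```python
import math

WORDS_PER_TOKEN = 0.75  # OpenAI's rule of thumb quoted above
WORDS_PER_PAGE = 280    # assumed average for a novel page, as in the text


def context_in_pages(context_k):
    """Convert a context size in 'k' tokens into (words, novel pages)."""
    tokens = context_k * 1024
    words = int(tokens * WORDS_PER_TOKEN)
    return words, words / WORDS_PER_PAGE


def estimate_tokens(message):
    """Very rough token estimate from a word count (an assumption)."""
    return math.ceil(len(message.split()) / WORDS_PER_TOKEN)


def trim_history(messages, max_tokens):
    """Drop the oldest messages until the estimated total fits the limit."""
    kept = list(messages)
    while kept and sum(estimate_tokens(m) for m in kept) > max_tokens:
        kept.pop(0)  # forget the oldest message first
    return kept


for k in (8, 32):
    words, pages = context_in_pages(k)
    print(f"{k}k tokens = {words:,} words = about {pages:.0f} pages")
```

Summarising older messages instead of dropping them would follow the same shape, just replacing the oldest entries with a shorter one rather than removing them.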
Finally, a couple of AI threads for others to follow, if interested, on the responsible development and/or use of AI. I think we still need to watch these unfold, and see if the discussions, policies, companies and the world converge or diverge.
In 2021 UNESCO adopted an agreement to support 193 member states in implementing and reporting on a set of recommendations on the ethics of AI. Several companies signed an agreement with UNESCO at the Global Forum on Ethics of AI in February 2024 to develop more ethical AI (Microsoft, Mastercard, Lenovo, GSMA, INNIT, LG, Salesforce, Telefonica). While OpenAI has a major partnership with Microsoft, they don’t appear to have a position on UNESCO’s recommendations. In fact a recent UNESCO study seems to point fingers at biases within OpenAI’s older models and their closed development approach. The results for the more recent ChatGPT model (see figure 1), however, showed a significant shift towards more positive and neutral content generation. I’m sure UNESCO are hoping studies such as this draw more attention to their recommendations and may even influence global AI benchmarking. So I think it’s fair to say there has been a lot of effort put into OpenAI’s approach to safety and the pressure is not coming off anytime soon.
The final thread I’ll mention is the environmental impact of the computing power necessary to train these models. I think it’s clear that while the AI buzz is booming this will be an issue. This article suggests that in 2019 the power used for training one AI model is 20 times larger than the power an average American uses in a year, and the power used just keeps increasing for larger and larger models. Sound like a lot? Sure, but how many people then use that one model, millions? An AI once trained can be used over and over again, and in fact for some significant tasks AI is being used to significantly reduce the need for computational power (engineering and simulation, architecture and rendering). The buzz may be driving up speculative investment and growth, but eventually companies and models will only remain viable if their cost of growth remains lower than what people are willing to pay. The more a pre-trained model is used the better, and the more people who re-use it the better, and the more AI companies can reduce their training costs the better. So the end point of this thread isn’t clear to me, I don’t think the market will tolerate infinite growth, and I think there’s a chance the benefits could look similar to cloud computing – where the total size and growth is still very large, but the net benefit for applications moving to use the cloud is an 87% decrease in energy consumption (see this article).
This was a bit of an experiment inspired by 2 techniques. The first technique is the use of digital projectors to throw rich background scenes onto a screen along with traditional shadow puppets casting shadows. The production Feathers of Fire is a great example of this (image below). The second technique is the use of electronic hand controllers, called “waldos”, originally used to control animatronic puppets, but now also used to control computer graphics (CG) puppets. Jim Henson’s Creature Shop is using this for shows such as Word Party (image below). In fact these waldos are in use together with motion capture of actors for the bodies of these CG characters. So taking this a step further, could the hand controller be replaced by motion capture? And given a digital background scene, could those hand gestures be used to puppeteer CG shadow puppets on top of that scene?
TL;DR – Well the answer is yes, to a limited degree, but I don’t think the results are good enough to persist with. Not without further investigation of more expensive equipment and software. And I guess that’s an extra question, is this accessible to everyone, can it be done cheaply then? Sadly no.
I used the free 3D modelling and animation software Blender, with the free BlendArMocap plugin, and sketches were done with pen and paper but eventually for digital drawings I used an XP-Pen tablet (not necessary but it makes Blender grease pencil drawing a lot easier). To summarise two of the key issues, the creator has discontinued development on BlendArMocap, and as-is the range and accuracy of motion control is limited.
If you want more details, or to try this for yourself (and I hope you’ll push this exploration even further), read on and I’ll go a bit deeper. I’d love to know what you discover too, so leave a comment. You can also watch my results on YouTube and decide for yourself. By the way, the characters occasionally jumping around was due to a jerky mocap issue, it was not intended.
Is This Still Puppetry?
I felt I needed to answer this question. The World Encyclopedia of Puppetry Arts recognises virtual puppets. But isn’t what I’m doing just “animation” aided by motion capture? Well I hope you’ll see it as puppetry. The virtual puppets I created are very much like real world shadow puppets, essentially flat black figures placed against a screen with a few moving joints. The motion capture plugin was able to map my real world movements onto a virtual human skeleton (a “rig” inside Blender), which is like having a virtual twin. I decided to move this virtual puppeteer out of camera shot, but you can still see a screenshot of it below. Then in software I connected the puppets’ joints to the virtual puppeteer’s hand and finger movements, very much like connecting a marionette with strings to my fingers. I could watch the puppet move as I moved my real hand. The puppet did not emulate my motion capture, it’s more accurate to say that I was manipulating the puppet in real time. So I believe in essence this is puppetry, using manipulation to breathe life into an inanimate object. Brian Henson, while demonstrating a waldo device controlling a digital character on screen, compares this to muppets on a TV monitor.
“when we’re puppeteering we don’t think about our hand, we don’t think about our puppet. We look at a monitor and there’s a puppet on the monitor, and we make the puppet do stuff. So eventually you’re in the training, you’re forgetting that anything is happening up here (a raised puppet arm). All this has to happen automatically as you’re deciding what that puppet does. And it’s the same thing that’s happening here (points to the digital character on screen) but at a more sophisticated level.”
Brian Henson
Building all the Components
In thinking about putting this together I wanted it to be a real test. I wanted it to tell a story, to have multiple characters and to look decent. And perhaps because I started with a sketched story board, I decided for ease of transition that the main character, a magi, would essentially stay in place while the background would scroll from one scene to the next. Of course, some characters would be fixed to the scene, essentially scrolling with the background. So the final setup in Blender (screenshot below) included a stationary camera facing the scene, perfectly positioned to capture one background scene panel at a time, with stationary lighting that spotlit the current panel, a row of the background panels that moved from the far right to the left most panel as the story progressed, my virtual puppeteer twin just off camera to the right, the Magi puppet sitting just in front of the background panels (an opaque, black material, reflecting no light so appearing as a shadow), and a number of other moving puppets tied to their specific panels. You can see on the overview screenshot below the Magi just starting out kneeling in front of the fire on the first panel, the right most scene.
The concept puppets started out as sketches. Then in Blender I traced each of the shapes as a separate grease pencil drawing for each puppet. In the case of puppets with moving parts, I drew shapes for each of the parts/limbs then joined them. The Magi was the most complicated, requiring the parts to be joined/rigged with movable “bones” as part of their own armature or skeleton (screenshot below). The person in the sky was another traced puppet but with only one moving joint, an arm that raises and points.
The sky person was done using a special effect. The puppet object was a thick outline that used Blender’s Geometry Nodes to replace the outline with a star field. It used a Driver that changed with a time transition, to disperse the stars randomly at the beginning and to coalesce them back to the outline at the end. A similar effect was used for creating the dove shaped comet in the sky. The tiger that transformed into a woman was simply two shadow puppets joined at right-angles, so a quick 90 degree rotation transforms from one shape into the other (screenshot below).
Tech Tips: for turning a Blender grease pencil object into a puppet.
Add a new object “Grease Pencil > Blank”. For drawing the puppet, set up the main material, renamed to say “BlackPaper”, with Stroke settings of Line, Solid, Base Color of black, Self Overlap. Turn on Fill with settings of Solid and a Base Color (something like dark blue initially, but change it later to black to create the shadow), ensuring Alpha is at the maximum of 1.0. Add a new material called “BlackPaperHoles” with Stroke settings of Line, Solid, Base Color black. Turn on Fill with settings of Solid, Base Color black, Alpha 1.0, and set Holdout, which cuts/masks out anything else in the drawing. (You could create a new layer for holes and make it a mask for the first layer, but there’s no difference, and Holdout can keep the holes on the same layer. You can also set Holdout for the Line settings if you want fine cutouts.) Drawing over the first material with this Holdout material cuts holes in it, which is useful for eyes, mouth, and other detailing. Go into Draw Mode, toggle on Automerge (to easily continue lines, or connect ends, but perhaps turn it off when doing fine work or holes), then adjust your pencil Radius (say 5px), Strength to 1.0, and Advanced Active Smoothing to 0.35 (or say 0.6). You can create other coloured materials to add some luminous parts to the puppet.
In the Data Properties I created different Layers for different moving parts, to ensure they overlap in the right order, and also for easily turning off layers to work on specific limbs. In this section you will also turn off Lights for each layer, so scene lighting will not affect the puppet as a shadow. You can change to Sculpt Mode to smooth, manipulate or clean up any lines. You can also change into Edit Mode to manipulate or delete individual points on the line curves. I used Edit Mode to select all vertices associated with a limb/part (hide layers or materials to make doing that easier), then under Data Properties I created a Vertex Group by Assigning the selected points, and created one for each limb, prefixed with “SHAPE_” and then its name. Later you can more easily reselect all these vertices to Assign them to an Armature Deform vertex group.
To “rig” the puppet, back in Object Mode Add an Armature Single Bone. I won’t go into this in too much detail, as there are lots of tutorials on this. This first bone should represent the fixed bone controlling the whole body, so place, scale and rotate this into a good spot, then Extrude child bones from the end to create structure and joints, naming them all usefully. In Pose Mode, lock the transformations down to prevent movement in unexpected dimensions (eg. lock location if it’s not connected to a parent bone, lock scale so it will only scale if the parent does, lock rotation in all planes except the plane in which you expect movement). In Object Mode, select the Grease Pencil object then shift click to select the Armature last, and press Ctrl-P to Parent > Armature Deform with Empty Groups. The Grease Pencil object should now have an “Armature” modifier added to it and each bone should have been added as a Vertex Group of the same name. In Edit Mode in the Data Properties > Vertex Groups properties, select a limb group you created already, press the Select button (all of the vertices should get selected), select the correct bone group to attach it to, and press the Assign button. Do this for each bone and limb you need to attach.
The scene panels themselves were just flat planes in Blender that use Material Nodes to add a sky color ramp. Using Geometry Nodes a star field was added to the first panel. Using a sketch I traced a Grease Pencil drawing of each scene on top of each panel (screenshots of one scene sketch and final below). Some darker grease pencil objects were used between panels to help hide the transition from one scene to the next. The sun and the moon had low-power point-lights added to them, to give the effect of them glowing and shining light on the material/scene surrounding them. The campfire was similar but with a Noise modifier added to the light’s power so that it flickered.
Rigging the Motion Capture
By following the BlendArMocap plugin documentation and videos I was easily able to create a human Armature in Blender (a Human meta-rig), generate a poseable skeleton rig, then connect the rig to motion capture using the Mediapipe target for Hands. When you Start Detection your computer webcam starts motion tracking; when you stop detection and run a Transfer, the plugin translates the motions to positions of the skeleton rig along the animation timeline. These can be seen along Blender’s Animation view timeline (the Dope Sheet). So you can easily delete the movements and start again, or commence further capture from a specific position in the timeline. After your first Detection and Transfer, the plugin seems to be permanently connected and you can animate the rig in real-time during motion capture. But at this point, you can just see the human rig (the digital skeleton, virtual puppeteer twin) copying and recording your hand movements.
To connect those human movements to the puppets I used a feature called Drivers, which I applied to the puppets’ Transforms. Almost any setting value in Blender can be set up to be controlled by a Driver. A Driver can be a value on a curve, it can use an Input Variable that’s copied from another object, it can be a simple formula, or it can call complex python functions. The positions of the rig bones are input variables that can be used, for example selecting the object “cgt_ring_finger_pip.R” gives access to the values associated with the ring-finger proximal interphalangeal joint (its anatomical name) on the right hand, and using the variable type “X Rotation” is a good representation of that ring-finger’s movement. The value tends to change from about 0.25 radians for an open hand to 1.5 for a clenched fist, so you’ll need to map this range onto an appropriate range for the setting you are driving/controlling. For example, say we set xrot as the name of that ring-finger input variable, then a simple formula to map values on the range of 0.25-to-1.5 onto a driver setting range of 0.0-to-1.0 would be “(xrot - 0.25)/(1.5 - 0.25)”. To make life interesting, Blender seems to default to reading input angles in Euler radians, but for objects the transform rotation settings seem to default to Quaternions… so you will have to experiment with ranges. (Perhaps I should have figured out how to input/set in the same units, or converted using trigonometry somehow.)
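The range mapping above can be sketched as a small Python helper, with clamping added so values outside the expected range don’t overshoot. The second function is one possible answer to the Euler-versus-Quaternion puzzle, under the assumption that the driven rotation is about a single axis (X here): a rotation of θ radians about X corresponds to the quaternion (cos θ/2, sin θ/2, 0, 0) in Blender’s (w, x, y, z) order. Both function names are my own for illustration, and this is plain Python rather than the driver code actually used in the project.

```python
import math


def remap(value, in_min, in_max, out_min=0.0, out_max=1.0):
    """Linearly map value from in_min..in_max onto out_min..out_max, clamped."""
    t = (value - in_min) / (in_max - in_min)
    t = max(0.0, min(1.0, t))  # keep stray capture values inside the range
    return out_min + t * (out_max - out_min)


def x_rotation_to_quaternion(angle):
    """Quaternion (w, x, y, z) for a rotation of `angle` radians about X only."""
    half = angle / 2.0
    return (math.cos(half), math.sin(half), 0.0, 0.0)


# Ring finger: 0.25 rad (open) to 1.5 rad (clenched), mapped onto 0.0..1.0
print(remap(0.875, 0.25, 1.5))

# Drive a puppet joint through a quarter turn as the finger closes
angle = remap(0.875, 0.25, 1.5, 0.0, math.pi / 2)
print(x_rotation_to_quaternion(angle))
```

An alternative that avoids the conversion altogether is to change the driven object’s or bone’s Rotation Mode to an Euler order, so that the driver sets an angle in radians directly.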
Tech Tips: BlendArMocap values representing all the hand movements.
In Blender the cgt_DRIVERS and cgt_HANDS collections are created by a “Transfer” operation in BlendArMocap. These contain all of the moving parts of the rig. Parts follow the naming convention “cgt_****_finger_###.%” or “cgt_thumb_###.%” or “cgt_wrist.%” where **** = index, middle, ring, pinky (specific finger) and ### = dip, pip, mcp, tip, cmp, ip (joint of the finger) and % = L, R (left or right). Parts with an added “.D” suffix appear to be directly connected to those without the suffix, but the values are less sensitive, so I don’t recommend using them. The following are the values I recommend using, along with their ranges, which you’ll need to take into account when using them to drive other objects (such as puppet bones) in Blender.
Wrists:
right wrist: use Y Euler Rotation, which meaningfully ranges from 0.3 (facing outwards) down to -2.4 radians normally, but can go as low as -3.7
eg. select variable: cgt_wrist.R > Y Rotation, Auto Euler, World Space
left wrist: use Y Euler Rotation, which meaningfully ranges from -0.3 (facing outwards) to 2.6 radians normally, but can go as high as 3.5
Fingers:
use X Rotation of the PIP joint, which for most fingers usually ranges from 0.26 (open) to 1.49 radians (clenched), but the index finger can go lower to 0.1, the middle finger cannot open as much and only gets down to 0.6, and the pinky finger cannot clench as much and only gets up to 1.22. Different hands may vary and need tuning.
eg. var: cgt_ring_finger_pip.R > X Rotation, Auto Euler, World Space
Thumbs:
use X Rotation of the IP joint for thumbs, which usually ranges from 0.06 (open) to 0.94 radians (clenched), but can go as high as 1.1. 0.45 seems a reliable mid-point to detect if it’s closed.
eg. var: cgt_thumb_ip.R > X Rotation, Auto Euler, World Space
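The ranges listed above can be collected into a small lookup table, so each finger is normalised against its own open and clenched values. This is only a sketch in plain Python: the numbers are copied from the notes above (measured on one pair of hands, so they may need tuning), while the table layout and the function names are hypothetical and not part of BlendArMocap.

```python
# (open, clenched) X Rotation in radians for each PIP joint, from the notes above
PIP_RANGES = {
    "index": (0.10, 1.49),
    "middle": (0.60, 1.49),
    "ring": (0.26, 1.49),
    "pinky": (0.26, 1.22),
}
THUMB_IP_RANGE = (0.06, 0.94)
THUMB_CLOSED_THRESHOLD = 0.45  # the mid-point suggested above


def part_name(finger, joint, side):
    """Build a rig part name following the naming convention described above."""
    if finger == "thumb":
        return f"cgt_thumb_{joint}.{side}"
    return f"cgt_{finger}_finger_{joint}.{side}"


def finger_curl(finger, x_rotation):
    """Normalise a joint's X Rotation to 0.0 (open) .. 1.0 (clenched)."""
    low, high = THUMB_IP_RANGE if finger == "thumb" else PIP_RANGES[finger]
    t = (x_rotation - low) / (high - low)
    return max(0.0, min(1.0, t))


def thumb_is_closed(x_rotation):
    """Treat the thumb as closed once it passes the mid-point threshold."""
    return x_rotation > THUMB_CLOSED_THRESHOLD


print(part_name("ring", "pip", "R"))
print(finger_curl("middle", 1.0))
```

Keeping the per-finger ranges in one place means the same driver formula can be reused for every finger, with only the lookup key changing.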
Next, I started to map out which of the available hand movements should control which puppet or motion. And then I was able to write out the hand choreography needed to act out the show. (I didn’t have a lot of time for practice, so the puppet movements were not second nature for me.) The right hand was dedicated to controlling the Magi.
thumb open – magi stands up
index finger close – magi is left of screen
middle fingers down – hand is out
little fingers up – head is looking up
little fingers relaxed – head is forward
little fingers down – head is down
hand clenched (all together) – the magi kneels down, is towards the left of screen, and the hand is out
hand relaxed open (all together) – the magi stands up, is towards the right of screen, and the hand is down, head is up slightly looking forward
The left hand was used to control everything else.
fingers closed – triggers the stars coalescing to form the sky person
thumb close – the sky person’s arm moves to point
fingers open – the first time this happens, it triggers the stars coalescing to form the dove/comet
twisting the wrist – this essentially moves the background panels along to the next scene. It’s a bit complicated, as while the wrist is twisted it actually progresses the magi location along the linear story panels, the sequence of background panels. So initially it progresses the magi towards the left of screen, but once it approaches the left edge of the panel, it moves all the panels right (and the magi with it).
open thumb & fingers – to start with the tiger is in the bushes
thumb close – the tiger rises to the right
fingers close – the tiger-woman puppet rotates, essentially transitioning to show the woman
twist the wrist – also turns the woman around (as the background moves)
Wrap Up Findings
In the end I was happy with the results, as I think it was the best I could achieve considering my inexperience and working within a time limit. (I was working toward a deadline to be part of the UNIMA Australia Silver Linings Film Festival.) I probably should have spent more time practicing controlling the puppets, as the Magi video up on youtube was only my 3rd take. Experienced puppeteers using waldos can get used to a new puppet setup in a few hours, I believe.
Important finding #1 on limited movement – Fingers alone have a relatively limited number of dimensions or degrees of motion. BlendArMocap only records finger movements and wrist rotation, barely 6 degrees (6D) of motion per hand, or at least I cannot move all my fingers independently, so perhaps it’s 5D. The motion capture did not seem to accurately recognise fine movements, but it is also possible that the resistance of a physical puppet and screen serves to steady hand movement. Taking a comparative guess about traditional shadow puppetry, you’re often moving smaller characters around a screen (2D), optionally rotating a character (1D), changing the distance to the screen (zoom or fade, 1D), and moving say a head and an arm (2D), and I’m leaving out any rag-doll movement of legs perhaps… so this could be a total of 6D per puppet with 2 or more sticks. I’ve definitely seen Chinese shadow masters manipulating puppets with more than 2 sticks in one hand, and using both hands and feet. Keeping all this in mind, BlendArMocap seems like a limited tool.
Important finding #2 on pros and cons of BlendArMocap – The BlendArMocap plugin does however have some nice features. The capture function manipulates the skeletal rig in real time, so you can see the puppet movement almost as a live performance. It also records the location of the rig at corresponding times/frames for later playback as an animation, or rendering to a video. However there is one major drawback, live motion capture does not progress the Blender time cursor to keep in sync with other movement. So if objects in the environment also move on the timeline, you are not performing “live” with these moving objects. Another example of that drawback, you might record a first puppet moving successfully, but then wish to go back and record a second puppet acting together with the first. While the second puppet would record in real time, the Blender timeline does not progress during capture, so you would not see the first puppet move.
In the end this meant only one set of puppets could be recorded, so the limited number of dimensions in finger movements ended up limiting the total number and movements of the puppets. Also all animation in the environment needed to be the result of motion capture movement or it would not appear while performing live. Any animation that relied on transitions over a fixed period of time, such as a scene change, then needed to take into account what was the latest captured frame (the capture time) and how many frames had been skipped if any, as both real-time capture and real-time playback will drop frames if needed. The final render will not drop frames, but unless these discrepancies are accounted for, scene changes could look very different from performance, to playback, to final video render. In the end simple driver animations became calls to relatively complicated python code. And I still hadn’t worked out all the issues. For example, the stars coalescing (first scene, triggered by hand movement) happened quickly during rendering but slowly during live capture, perhaps due to geometry nodes being used for the transition of a large number of individual stars.
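One way to picture the dropped-frame problem is to compare a transition that advances by a fixed step every time it is evaluated with one that derives its progress from the frame number itself. The sketch below is plain Python with made-up frame numbers, and the function names are hypothetical; it is not the driver code used for the show, just an illustration of why the two approaches diverge when frames are skipped.

```python
def transition_progress(current_frame, start_frame, duration_frames):
    """Progress 0.0..1.0 computed from frame numbers, so skipped frames
    do not slow the transition down."""
    if duration_frames <= 0:
        return 1.0
    t = (current_frame - start_frame) / duration_frames
    return max(0.0, min(1.0, t))


def naive_progress(evaluations, duration_frames):
    """Progress that advances one step per evaluation, which falls behind
    whenever frames are dropped."""
    return min(1.0, evaluations / duration_frames)


# Frames actually evaluated during a live capture that dropped some frames
frames_seen = [10, 11, 14, 20]
for count, frame in enumerate(frames_seen, start=1):
    by_frame = transition_progress(frame, start_frame=10, duration_frames=10)
    by_count = naive_progress(count, duration_frames=10)
    print(frame, by_frame, by_count)
```

With the frame-based version, live capture, playback and the final render should all reach the same point of the transition on the same frame, whatever was skipped along the way.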
Other technical difficulties also needed quite a bit of coding to attempt to reduce the issues. Smoothing jerky mocap movements. Utilities to cap movement range, or unevenly distributed ranges (displaced midpoint functions). Utilities to latch transitions in one direction based on a motion capture trigger (such as rotating a wrist to transition scenes). It’s quite a lot but if you’d like copies of the code just let me know.
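To give a feel for the kinds of utilities described above, here is a minimal sketch of three of them in plain Python: an exponential moving average for smoothing, a two-segment mapping with a displaced midpoint, and a one-way latch. These are my own simplified stand-ins written for illustration, with hypothetical names and parameter values, and they are not the code from the project.

```python
class Smoother:
    """Exponential moving average to damp jerky capture values."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha  # 1.0 = no smoothing, smaller = smoother but laggier
        self.value = None

    def update(self, sample):
        if self.value is None:
            self.value = sample  # the first sample passes straight through
        else:
            self.value += self.alpha * (sample - self.value)
        return self.value


def remap_with_midpoint(value, low, mid, high):
    """Map low..mid onto 0.0..0.5 and mid..high onto 0.5..1.0, clamped.
    Useful when the natural resting point is not halfway through the range."""
    if value <= mid:
        t = 0.5 * (value - low) / (mid - low)
    else:
        t = 0.5 + 0.5 * (value - mid) / (high - mid)
    return max(0.0, min(1.0, t))


class Latch:
    """One-way trigger: once the value crosses the threshold it stays on."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.triggered = False

    def update(self, value):
        if value >= self.threshold:
            self.triggered = True
        return self.triggered


smooth = Smoother(alpha=0.5)
print([smooth.update(s) for s in (0.0, 1.0, 1.0)])

scene_change = Latch(threshold=0.8)
print([scene_change.update(v) for v in (0.5, 0.9, 0.1)])
```

The latch is the piece that suits something like a wrist twist used to change scenes: relaxing the wrist afterwards does not undo the transition.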
The BlendArMocap plugin uses Google’s MediaPipe for the realtime capture of hand and finger movements, which was originally designed for the identification of hand gestures, so fine movement capture may not be possible. The location and angle of the hand would be the most intuitive way to move a shadow puppet and would provide greater control. While these seem to be recorded in the MediaPipe code, the plugin does not record these values. And at this point the developer has abandoned support for the plugin, adding to the evidence that it’s probably not a good choice going forward.
Important finding #3 regarding equipment – The frame skipping issues may be reduced by having a faster PC. At the very least rates of change might become similar. Jerkiness of mocap might also be reduced by a faster PC, as there’s probably a lot going on, eg. video capture, capture processing, translation to rig movements, and then normal Blender animation. But depending on the complexity of the scenes, dropping frames might not be avoidable.
Rokoko Studio appeared to be a Blender compatible alternative. Rokoko does appear to capture very fine movements, as well as 2 more degrees of movement than BlendArMocap – angle of the wrist up and down and side to side – which could intuitively map to shadow puppet location on screen but this would still lack puppet rotation angle. However, hand accuracy in Rokoko Studio is not possible without purchasing hardware such as the Rokoko Smartgloves, and live streaming to Blender seems to require a paid software plan as well. (The total price is perhaps a bargain for professional artists, but it’s more than I’m willing to pay for experimenting.)
And in conclusion – I learned a lot, but I’m still left trying to think outside of the box for a better answer. How else could I be doing this? Using a proper and hand-built waldo? Animate the scenes based on standard animation timelines, and somehow persist with mocap puppeteering without the screen? Perhaps a green screen performance of real shadow puppets could be merged with Blender animation, using software such as OBS (Open Broadcaster Software)?