For those interested in just what these models can achieve, the final version of the video is here. It was generated using only the one actual photo from the first post in this thread.
I switched back to the model I've had the most success with, Kling 2.5, and this is what it produced near the end of the first segment (in the very last frame, the subject is actually looking away from the camera).
Much closer to the actual stuffed animal in both face and body.