August 31, 2025
I packaged several models as MCP tools and ran them with Claude Code and BlenderMCP to make a scene generation agent.
Model Context Protocol is a mechanism for surfacing additional tools to LLMs. As a large language model, Claude is aware of Blender as a concept, but can't natively interact with the program. MCP lets us surface arbitrary tools for Claude to interact with, enabling the development of more powerful and autonomous agents.
The popular BlenderMCP addon surfaces several useful Blender tools: get_scene_info, get_object_info, get_viewport_screenshot, and execute_blender_code. This handful of commands alone considerably uplevels Claude's Blender power: now Claude can ideate scenes, execute Blender Python to generate scene data (meshes, materials, lighting, etc.), and inspect the state of the scene.
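To make that concrete, here is the flavor of snippet Claude might pass to execute_blender_code: a few lines of bpy that block out a ground plane, a sun lamp, and a basic material (this particular snippet is illustrative, not taken from an actual run):

import bpy

# Ground plane to build the scene on
bpy.ops.mesh.primitive_plane_add(size=50, location=(0, 0, 0))
ground = bpy.context.active_object
ground.name = "Ground"

# Simple sun lighting
bpy.ops.object.light_add(type='SUN', location=(0, 0, 10))
bpy.context.active_object.data.energy = 3.0

# A flat sandy material for the ground
mat = bpy.data.materials.new(name="Sand")
mat.use_nodes = True
mat.node_tree.nodes["Principled BSDF"].inputs["Base Color"].default_value = (0.8, 0.7, 0.5, 1.0)
ground.data.materials.append(mat)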
BlenderMCP also surfaces some remote tools, like the PolyHaven and Sketchfab asset libraries, and Hyper3D, a paid service with an impressive asset generation model. But for the sake of learning and exercising my new GPU, I set about writing MCP tools to run additional generative models locally.
After 10 years of not developing on a Windows machine, I was pleased to learn that nowadays you can easily run Linux VMs on Windows via WSL (Windows Subsystem for Linux). I was able to run VSCode (with the highly useful Blender extension) and Blender in Windows, while running Claude Code and the Python model inference scripts in Linux. This workflow is about 90% there, but there were some very subtle sharp knives to watch out for.
1. Windows Localhost Is Not Linux Localhost
Under normal operation, the BlenderMCP addon listens on localhost:9876, and your MCP client (e.g. Claude) invokes tools by talking to that socket through uvx blender-mcp. However, if you run Blender in Windows and Claude in WSL, the two do not share a localhost and will be unable to communicate. This should be configurable via the BLENDER_HOST param; I tried claude mcp add blender --env BLENDER_HOST=$(ip route | awk '/default/ { print $3 }') -- uvx blender-mcp, but the environment variable didn't appear to be passed. So I hacked around this by replacing all instances of localhost in the MCP server with the Windows host IP as seen from WSL (ip route | awk '/default/ { print $3 }'), then registered the locally modified server via: claude mcp add blender -- python /mnt/c/Users/name/blender-mcp/src/blender_mcp/server.py
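In code, the workaround amounts to resolving the Windows host IP from inside WSL and using it wherever the server would have used localhost. A minimal sketch (the helper name and hardcoded port are illustrative):

import socket
import subprocess

def windows_host_ip() -> str:
    # Equivalent to: ip route | awk '/default/ { print $3 }'
    route = subprocess.run(["ip", "route"], capture_output=True, text=True).stdout
    for line in route.splitlines():
        if line.startswith("default"):
            return line.split()[2]
    raise RuntimeError("Could not determine the Windows host IP from WSL")

# The BlenderMCP addon listens on port 9876; connect to the Windows host, not localhost
sock = socket.create_connection((windows_host_ip(), 9876))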
2. Struggles to Read Images
Claude Code supports image understanding, but it usually didn't work across WSL. WSL has a shared directory, known to WSL as /mnt/c/Users/$USER/ and to Windows as C:\Users\$USER\. So if I tried to drag an image from my Windows file browser into Claude Code, the Windows filepath was passed, and Claude couldn't read it using Linux file utilities.
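WSL ships a wslpath utility for exactly this translation; a small Python helper in the same spirit (the function name and example path are illustrative):

from pathlib import PurePosixPath, PureWindowsPath

def windows_to_wsl_path(win_path: str) -> str:
    # C:\Users\name\shot.png -> /mnt/c/Users/name/shot.png
    p = PureWindowsPath(win_path)
    drive = p.drive.rstrip(":").lower()
    return str(PurePosixPath("/mnt", drive, *p.parts[1:]))

windows_to_wsl_path(r"C:\Users\name\screenshot.png")  # -> '/mnt/c/Users/name/screenshot.png'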
For learning's sake, I wanted to finetune a base model to generate Blender code locally. I started with deepseek-coder-6.7b-base and was surprised by my initial interactions:
$ ollama run deepseek-coder:6.7b-base
>>> write a python function to reverse a string
.sample string: 1234abcd
output :dcba
<jupyter_code>
def rev(s):
    st=""
    for i in s:
        st =i+st #or st=i + str --by join method
    return st
rev("1234abcd")

str[::-1]#--by slicing operator
s="1234abcd"
l=list(s)
l.reverse() # -- by list method
"".join(l)

#using while loop
def rev_whilwloop(string):
    rstr = ""
    length = len(string) - 1
    while length >= 0:
        rstr += string[length]
        length -= 1
    return rstr
rev_whilwloop("1234abcd")
<jupyter_output>
<empty_output>
The model generated various string reversal methods of varying quality, along with commentary, method calls, and unrelated tokens, rather than the neatly formatted chatbot output we're accustomed to. Let's unpack the model name to understand what's going on: deepseek-coder is a model family trained heavily on code, 6.7b is the parameter count, and base means the model has only been trained to continue text, with no instruction tuning, which is why it rambles through notebook-style completions instead of answering the question.
From this point, we have several high level control knobs: prompt the base model with few-shot examples, switch to an instruction-tuned variant, or finetune the base model on our own data.
I wrote a script that runs LoRA finetuning on the base model, with a Claude-generated dataset of about 200 examples. Yes, I used an LLM to write a script to tune another LLM with LLM-generated data. Example data:
instruction: Create a cube in Blender using Python
output: # Create a cube\nbpy.ops.mesh.primitive_cube_add(location=(0, 0, 0))
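The core of the tuning script looks roughly like this, using Hugging Face transformers and peft (hyperparameters and the dataset path here are illustrative, not the exact values I used):

import json
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/deepseek-coder-6.7b-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

# Attach low-rank adapters to the attention projections; only these weights are trained
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Each example becomes a single instruction/response sequence
with open("blender_examples.jsonl") as f:
    examples = [json.loads(line) for line in f]
texts = [f"### Instruction: {ex['instruction']}\n### Response: {ex['output']}" for ex in examples]
# ...tokenize texts and run a standard causal-LM training loop (e.g. transformers.Trainer)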
The model's performance improved from 0% to about 50% accuracy after finetuning. Not good enough for production, but it felt impressive for such a small training set, considering that prior to tuning, the model wouldn't even generate actual code for Blender-themed prompts.
prompt: Add a black material to the selected object

pre-tuning response:
I would add a black material to this object, as it looks like there's not enough contrast between its color and background.

post-tuning response:
# Add black material to active object
if bpy.context.active_object and bpy.context.active_object.type == 'MESH':
    mat = bpy.data.materials.new(name="BlackMaterial")
    mat.use_nodes = True
    principled = mat.node_tree.nodes.get('Principled BSDF')
    if principled:
        principled.inputs['Base Color'].default_value = (0, 0, 0, 1)
    # Assign to object
    if bpy.context.active_object.data.materials:
        bpy.context.active_object.data.materials[0] = mat
    else:
        bpy.context.active_object.data.materials.append(mat)
After this exercise, I downloaded BlenderLLM, a more sophisticated tuning of Qwen2.5-Coder-7B-Instruct, then packaged it as a CLI tool using pipx, and wrote a NodeJS server to invoke it as an MCP tool:
// Handle tool calls
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params;
  try {
    let result;
    switch (name) {
      case "generate_blender_code":
        // generateBlenderCode calls the inference CLI
        result = await generateBlenderCode(args.prompt);
        return {
          content: [
            { type: "text", text: result }
          ]
        };
      ...
The final agent didn't make much use of this. I suppose Claude Code is already good enough at generating Blender code and iterating on feedback that it doesn't need to delegate to an MCP tool.
Following the above pattern, generate_image is a wrapper around an inference script running Stable Diffusion XL. I wanted to try finetuning SDXL to be able to generate PBR (physically based rendering) textures: images that look like the base color image, but encode specific 3D information like height or normals.
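The wrapper around SDXL is thin; a minimal sketch with diffusers (the model ID and sampler settings are assumptions, not necessarily what my script uses):

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

def generate_image(prompt: str, out_path: str = "texture_color.png") -> str:
    # One forward pass of the pipeline; the MCP tool returns the saved filepath
    image = pipe(prompt=prompt, num_inference_steps=30).images[0]
    image.save(out_path)
    return out_path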
I downloaded some textures from AmbientCG, but wasn't able to vibecode a LoRA script for SDXL due to my lack of familiarity with diffusion model training, so I went with a simpler approach inspired by this post: generate a color texture image, then use basic image processing to derive each PBR map. For example:
import cv2
import numpy as np

def rgb_to_luminance(color_image):
    # Assumed helper: standard Rec. 709 luminance weighting
    return (0.2126 * color_image[..., 0]
            + 0.7152 * color_image[..., 1]
            + 0.0722 * color_image[..., 2])

def generate_displacement(color_image):
    """
    Generate Displacement map from color image.
    Displacement represents surface height variations.
    Uses luminance as a proxy for surface elevation.
    """
    # Convert to luminance
    luminance = rgb_to_luminance(color_image)
    # Apply bilateral filter to smooth while preserving edges
    displacement = cv2.bilateralFilter(luminance.astype(np.uint8), 9, 75, 75)
    # Enhance height variation with gamma correction
    displacement = displacement.astype(np.float32) / 255.0
    displacement = np.power(displacement, 0.8)  # Gamma < 1 enhances midtones
    displacement = (displacement * 255).astype(np.uint8)
    return displacement
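The other maps are derived in the same spirit; for instance, a normal map can be faked by treating luminance gradients as surface slope (a sketch, not the exact code from my script):

def generate_normal(color_image, strength=2.0):
    # Luminance gradients stand in for surface slope
    luminance = rgb_to_luminance(color_image).astype(np.float32) / 255.0
    dx = cv2.Sobel(luminance, cv2.CV_32F, 1, 0, ksize=5) * strength
    dy = cv2.Sobel(luminance, cv2.CV_32F, 0, 1, ksize=5) * strength
    dz = np.ones_like(luminance)
    # Normalize the (x, y, z) vectors and remap from [-1, 1] to [0, 255] RGB
    length = np.sqrt(dx**2 + dy**2 + dz**2)
    normal = np.stack([dx, -dy, dz], axis=-1) / length[..., None]
    return ((normal * 0.5 + 0.5) * 255).astype(np.uint8)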
This is a naive approach. It's pretty fake to guess at PBR maps from the color data, since the whole point of those maps is to convey 3D information that can't be derived from color alone. There are models that can generate better PBR maps based on actual 3D knowledge within the network (such as ldm3d-4c for depth map generation, or MeshGen), but this was good enough for prototyping a generate_texture tool.
I explored a few different approaches for generating mesh assets, for the prompt "3D model of a wooden hammer".
We can generate plain code for creating and manipulating meshes via sequential Blender API calls. On the left, Claude produced a mesh that looks somewhat like a hammer in 240 lines of code, and on the right, BlenderLLM produced a mesh that looks slightly more like a hammer, in a more succinct 40 lines of code.
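To give a flavor of the "plain code" approach, a few sequential bpy calls can already rough out a hammer (this is an illustrative sketch, much shorter than either generated version):

import bpy

# Handle: a tall thin cylinder
bpy.ops.mesh.primitive_cylinder_add(radius=0.05, depth=1.0, location=(0, 0, 0.5))
handle = bpy.context.active_object

# Head: a wider cylinder laid on its side across the top of the handle
bpy.ops.mesh.primitive_cylinder_add(
    radius=0.12, depth=0.5, location=(0, 0, 1.05), rotation=(1.5708, 0, 0)
)
head = bpy.context.active_object

# Join the two pieces into a single mesh object
bpy.ops.object.select_all(action='DESELECT')
handle.select_set(True)
head.select_set(True)
bpy.context.view_layer.objects.active = handle
bpy.ops.object.join()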
The LLaMA-Mesh approach is cool: it's a tuning of LLaMA that generates vertices and faces one token at a time, relying on the language model's spatial reasoning. The output is a valid OBJ file, thanks to that file format's simplicity:
# Simple cube OBJ file
# 8 vertices for a unit cube
v -0.5 -0.5 -0.5
v 0.5 -0.5 -0.5
v 0.5 0.5 -0.5
v -0.5 0.5 -0.5
v -0.5 -0.5 0.5
v 0.5 -0.5 0.5
v 0.5 0.5 0.5
v -0.5 0.5 0.5
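Faces are then just one-indexed references into that vertex list; one way to close the cube with quads (my addition for completeness, not part of the output above):

f 1 2 3 4
f 5 8 7 6
f 1 5 6 2
f 2 6 7 3
f 3 7 8 4
f 4 8 5 1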
The downside is that language models don't appear to have great spatial reasoning skills: the output topology is messy, and the approach doesn't seem to extend beyond simple meshes. It took almost an hour to do inference on my GPU (but since other tools have made practical use of this, there must have been a bug in my inference code).
OpenAI's Shap-E consistently generated nice low-poly meshes, reminiscent of old-school RuneScape assets. This is the model I ultimately used for this MCP tool.
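A rough sketch of the kind of wrapper involved, using the diffusers port of Shap-E (the function name mirrors the MCP tool, and the settings follow the diffusers example rather than my exact script):

import torch
from diffusers import ShapEPipeline
from diffusers.utils import export_to_ply

pipe = ShapEPipeline.from_pretrained("openai/shap-e", torch_dtype=torch.float16).to("cuda")

def generate_lores_mesh(prompt: str, out_path: str = "mesh.ply") -> str:
    # output_type="mesh" returns vertex/face data instead of rendered frames
    mesh = pipe(prompt, guidance_scale=15.0, num_inference_steps=64,
                frame_size=256, output_type="mesh").images[0]
    export_to_ply(mesh, out_path)
    return out_path

generate_lores_mesh("3D model of a wooden hammer")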
InstantMesh produces some beautiful results, though the output mesh frequently has holes not present in the input. This is a single-image-to-3D-mesh model; the above example used this CC0 glazed porcelain hammer as input. I wanted to write a pipeline that converts text -> image via SDXL, then image -> mesh via InstantMesh. However, the dependencies were too messy for me to package this as a CLI script. I had to do a bit of patching just to get the basic demo working; the dependency nvdiffrast didn't support newer GPUs, so I forced the script to use earlier CUDA and GCC versions:
+#!/usr/bin/env python
+# Set environment variables BEFORE any imports
 import os
+os.environ['TORCH_CUDA_ARCH_LIST'] = '8.9'  # Force Ada Lovelace compatibility for Blackwell GPUs
+os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
+os.environ['FORCE_CUDA'] = '1'
+# Try to use system GCC 11 if available, otherwise disable CUDA rendering
+import subprocess
+try:
+    result = subprocess.run(['gcc-11', '--version'], capture_output=True, text=True)
+    if result.returncode == 0:
+        os.environ['CC'] = 'gcc-11'
+        os.environ['CXX'] = 'g++-11'
+        print("Using GCC 11 for CUDA compatibility")
+    else:
+        raise FileNotFoundError
+except (FileNotFoundError, subprocess.SubprocessError):
+    print("WARNING: Compatible GCC version not found. nvdiffrast may fail to compile.")
+    print("Consider installing GCC 12: sudo apt install gcc-12 g++-12")
+    # Fall back to disabling CUDA extensions
+    os.environ['NVDIFFRAST_FORCE_OPENGL'] = '1'
+
 import imageio
 import numpy as np
 import torch
+
+# Monkey patch PyTorch's CUDA capability detection for nvdiffrast compatibility
+def patched_get_device_capability(device=None):
+    """Return Ada Lovelace capability instead of Blackwell for nvdiffrast compatibility"""
+    return (8, 9)  # Ada Lovelace instead of (12, 0) Blackwell
+
+if torch.cuda.is_available():
+    original_get_device_capability = torch.cuda.get_device_capability
+    torch.cuda.get_device_capability = patched_get_device_capability
+    print("Patched PyTorch CUDA capability detection for nvdiffrast compatibility")
I wrote a wrapper around Qwen2.5-VL-3B-Instruct, a popular model for computer vision. The hope was that the agent would run get_viewport_screenshot, then describe_scene, to understand qualitatively what the scene looks like and adapt based on feedback. But the feedback from describe_scene isn't detailed enough to be actionable. The agent already knows "I'm building a desert scene in Blender", and all describe_scene says is, "This looks like a screenshot from Blender depicting a desert scene".
I used Claude Code's agent generation CLI, then added an addendum to the generated prompt with details on tool use:
Tool Use:
- If you want to generate a simple mesh, execute Blender code to achieve this. If you want to generate a more complex mesh, invoke generate_lores_mesh and then import the resulting file into the Blender scene and place it as you wish. The lores meshes do have color data, so you must plug the "Color Attribute" node with the value "Color" (NOT the "Attribute" node) into the Base Color on these meshes. An imported mesh will have its center of mass located at Z = 0, so you ought to move these up so they don't clip with the ground.
- If you want to create an interesting distant background, invoke generate_image and place the resulting image at an appropriate place in the scene. I recommend that you import background and distant midground images as planes and orient them facing the camera to achieve the illusion of depth.
- If you want to add a texture on an object, invoke generate_image, then invoke generate_texture to produce PBR texture images, then create a material that loads them into Blender. This is mandatory for objects that should have a detailed repeating texture, like rocks, sand, stone walls, and grass.
- If you want to understand the current state of the scene, invoke get_viewport_screenshot, then call describe_scene with the resulting image filepath. Please do this at the very end so that you can tell me what the computer vision says about the scene.
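As an aside, the Color Attribute wiring that the prompt insists on looks roughly like this in bpy (object and material names are illustrative):

import bpy

obj = bpy.context.active_object
mat = bpy.data.materials.new(name="LoresMeshMaterial")
mat.use_nodes = True
nodes, links = mat.node_tree.nodes, mat.node_tree.links

# The "Color Attribute" node in the UI is ShaderNodeVertexColor in the Python API
color_attr = nodes.new("ShaderNodeVertexColor")
color_attr.layer_name = "Color"
links.new(color_attr.outputs["Color"], nodes["Principled BSDF"].inputs["Base Color"])

obj.data.materials.append(mat)
# Lift the imported mesh so it doesn't clip through the ground plane
obj.location.z += obj.dimensions.z / 2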
0:00
initial prompt
ask the blender agent to create a scene set in a desert nomad's camp. there are men gathered around a well, drinking tea. there are animals grazing on scant pasture, like goats, and camels, and there are horses as well. there are objects of varying height that create a sense of depth in the scene, such as tents, rugged topography, and occasional plants. there are sand dunes in the mid ground and distant mountains in the background.
0:23
Agent blocked out the scene with some initial meshes and a background plane.
0:38
Agent added generated assets of a tent and a man, and duplicated them, but the outputs of the mesh generator are oriented facing down, so the objects are partially submerged. I could fix this specific bug and correct the orientation in the Shap-E script, but in general, this agent doesn't know when a mesh is facing the wrong way, or is clipping another mesh.
1:32
Agent generates a sand texture with generate_image and generate_texture. I'm pleased with the visual quality for something that was autogenerated.
1:45
Agent tries to set up the camera and Blender crashes. This was the first Blender crash I saw across 5 runs of the agent, but I kept it in the demo to show how the agent resumes after failure. I manually restarted Blender, started the addon listening for the MCP server again, and reprompted the agent. Sadly, it looks like we lost the sand texture here.
2:51
Agent adds some code-generated horses, seemingly unaware it has already added mesh-generated horses.
3:17
Agent tries modeling some desert plants and is now in Edit Mode. It then tries to add lighting, but Blender operations are context-specific, so this isn't a valid operation in Edit Mode.
3:42
Agent adds materials to everything.
3:46
Agent is done; now I inspect its work. I tell it to fix some materials it overwrote, and to correct the orientation of the imported meshes.
4:31
I tell the agent to apply another fix on the material nodes.
I learned a lot from this project, and am eager to try the following next: