Claude Code + BlenderMCP Scene Agent

August 31, 2025

I packaged several models as MCP tools and ran them with Claude Code and BlenderMCP to make a scene generation agent.

What Does This Mean?

Model Context Protocol (MCP) is a mechanism for exposing additional tools to LLMs. As a large language model, Claude knows about Blender as a concept, but can't natively interact with the program. MCP lets us surface arbitrary tools for Claude to call, enabling the development of more powerful and autonomous agents.

The popular BlenderMCP addon surfaces several useful Blender tools: get_scene_info, get_object_info, get_viewport_screenshot, and execute_blender_code. This handful of commands alone considerably uplevels Claude's Blender power: now Claude can ideate scenes, execute Blender Python to generate scene data (meshes, materials, lighting, etc.), and inspect the state of the scene.
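
To make this concrete, here's the kind of snippet Claude might send through execute_blender_code. This is an illustrative sketch, not output from an actual run:

import bpy

# Add a ground plane and a sun lamp, then a simple cube "building"
bpy.ops.mesh.primitive_plane_add(size=20, location=(0, 0, 0))
bpy.ops.object.light_add(type='SUN', location=(5, -5, 10))
bpy.ops.mesh.primitive_cube_add(size=2, location=(0, 0, 1))

# Give the cube a basic material
cube = bpy.context.active_object
mat = bpy.data.materials.new(name="SandstoneMaterial")
mat.diffuse_color = (0.8, 0.7, 0.5, 1.0)
cube.data.materials.append(mat)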

BlenderMCP also surfaces some remote tools like the PolyHaven and Sketchfab asset libraries, and Hyper3D, a paid service with an impressive asset generation model. But for the sake of learning and exercising my new GPU, I set about writing MCP tools to run additional generative models locally.

Setting up BlenderMCP Across WSL

After 10 years of not developing on a Windows machine, I was pleased to learn that nowadays you can easily run Linux VMs on Windows via WSL (Windows Subsystem for Linux). I ran VSCode (with the highly useful Blender extension) and Blender in Windows, while Claude Code and the Python model inference scripts ran in Linux. This workflow is about 90% there, but there were some subtle sharp edges to watch out for.

1. Windows Localhost Is Not Linux Localhost
Under normal operation, the BlenderMCP addon listens on localhost:9876, and your MCP client (e.g. Claude) invokes tools through uvx blender-mcp, which connects to that socket. However, if you run Blender in Windows and Claude in WSL, the two sides don't share a localhost and can't communicate. This should be configurable via the BLENDER_HOST param; I tried claude mcp add blender --env BLENDER_HOST=$(ip route | awk '/default/ { print $3 }') -- uvx blender-mcp, but the environment variable didn't appear to be passed. So I hacked around it by replacing every instance of localhost with the Windows host IP as seen from WSL (ip route | awk '/default/ { print $3 }'), then registered the locally modified MCP server via: claude mcp add blender -- python /mnt/c/Users/name/blender-mcp/src/blender_mcp/server.py
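
For reference, the same lookup can be done in Python by parsing WSL's default route, which in the default NAT configuration points at the Windows host. A small sketch:

import subprocess

def windows_host_ip() -> str:
    """Return the Windows host IP as seen from WSL (the default gateway)."""
    route = subprocess.run(["ip", "route"], capture_output=True, text=True).stdout
    for line in route.splitlines():
        if line.startswith("default via"):
            return line.split()[2]
    raise RuntimeError("no default route found; is this running inside WSL?")

print(windows_host_ip())  # e.g. 172.29.160.1 -- the address Blender is reachable on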

2. Struggles to Read Images
Claude Code supports image understanding, but it usually didn't work across WSL. WSL mounts the Windows filesystem, so the same directory is known to WSL as /mnt/c/Users/$USER/ and to Windows as C:\Users\$USER\. If I dragged an image from my Windows file browser into Claude Code, the Windows filepath got passed along, and Claude couldn't read it with Linux file utilities.
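
One workaround is to translate the pasted Windows path into its /mnt/ equivalent before handing it to Claude. A hypothetical helper (WSL's built-in wslpath utility does the same job):

import re

def windows_to_wsl(path: str) -> str:
    """Translate 'C:\\Users\\name\\img.png' into '/mnt/c/Users/name/img.png'."""
    match = re.match(r"^([A-Za-z]):[\\/](.*)$", path)
    if not match:
        return path  # already a Linux-style path
    drive, rest = match.groups()
    rest = rest.replace("\\", "/")
    return f"/mnt/{drive.lower()}/{rest}"

print(windows_to_wsl(r"C:\Users\name\screenshot.png"))  # /mnt/c/Users/name/screenshot.png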

Tool Development

generate_blender_code

For learning's sake, I wanted to finetune a base model to generate Blender code locally. I started with deepseek-coder-6.7b-base and was surprised by my initial interactions:

$ ollama run deepseek-coder:6.7b-base
>>> write a python function to reverse a string
.sample string: 1234abcdoutput :dcba
<jupyter_code>
def rev(s):
    st=""
    for i in s:
        st =i+st   #or  st=i + str     --by join method
    return st
rev("1234abcd")
str[::-1]#--by slicing operator
s="1234abcd"
l=list(s)
l.reverse() # -- by list method
"".join(l)
#using while loop
def rev_whilwloop(string):
    rstr = ""
    length = len(string) - 1
    while length >= 0:
        rstr += string[length]
        length -= 1
    return rstr
rev_whilwloop("1234abcd")
<jupyter_output>
<empty_output>

The model generated several string-reversal implementations of varying quality, along with commentary, method calls, and unrelated tokens, rather than the neatly formatted chatbot output we're accustomed to. Let's unpack the model name to understand what's going on: deepseek-coder is a code-specialized model family, 6.7b is the parameter count, and base means the model has only been pretrained for next-token completion. It hasn't been instruction-tuned into a chat assistant, so it simply continues whatever text it's given, here in the style of a scraped Jupyter notebook.

From this point, we have several high-level control knobs: prompt the base model with a few-shot completion template, finetune it on task-specific examples, or swap in a model that someone else has already tuned.

I wrote a script that runs LoRA finetuning on the base model, using a Claude-generated dataset of about 200 examples. Yes, I used an LLM to write a script to tune another LLM with LLM-generated data. Example data:

instruction: Create a cube in Blender using Python
output: # Create a cube\nbpy.ops.mesh.primitive_cube_add(location=(0, 0, 0))
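
For anyone curious, a LoRA setup for this model takes only a few lines with the Hugging Face peft library; the sketch below is illustrative rather than my exact script, and the hyperparameters are placeholders:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "deepseek-ai/deepseek-coder-6.7b-base"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

config = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # attention projections only
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only a tiny fraction of the 6.7B weights train
# ...then train with a standard Trainer loop over the ~200 instruction/output pairs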

The model's accuracy improved from 0% to about 50% after finetuning. Not good enough for production, but it felt impressive for such a small training set, especially considering that prior to tuning, the model wouldn't even generate actual code for Blender-themed prompts.

prompt: Add a black material to the selected object
pre-tuning response:
    I would add a black material to this object, as it looks like there's not enough contrast between its color and background.
post-tuning response:
    # Add black material to active object
    if bpy.context.active_object and bpy.context.active_object.type == 'MESH':
        mat = bpy.data.materials.new(name="BlackMaterial")
        mat.use_nodes = True
        principled = mat.node_tree.nodes.get('Principled BSDF')
        if principled:
            principled.inputs['Base Color'].default_value = (0, 0, 0, 1)
        # Assign to object
        if bpy.context.active_object.data.materials:
            bpy.context.active_object.data.materials[0] = mat
        else:
            bpy.context.active_object.data.materials.append(mat)

After this exercise, I downloaded BlenderLLM, a more sophisticated tuning of Qwen2.5-Coder-7B-Instruct, then packaged it as a CLI tool using pipx, and wrote a NodeJS server to invoke it as an MCP tool:

// Handle tool calls
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params;

  try {
    let result;
    
    switch (name) {
      case "generate_blender_code":
        // generateBlenderCode calls the inference CLI
        result = await generateBlenderCode(args.prompt); 
        return {
          content: [
            {
              type: "text",
              text: result
            }
          ]
        };
    ...

The final agent didn't make much use of this. I suppose Claude Code is already good enough at generating Blender code and iterating on feedback that it doesn't need to delegate to an MCP tool.

generate_image, generate_texture

Following the above pattern, generate_image is a wrapper around an inference script running Stable Diffusion XL. I also wanted to try finetuning SDXL to generate PBR (physically based rendering) textures: images that line up with the base color image but encode specific 3D information like height or normals.
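
The generate_image side is essentially a standard diffusers text-to-image call, roughly along these lines (a sketch, not the exact script):

import torch
from diffusers import StableDiffusionXLPipeline

# Sketch of the generate_image wrapper: prompt in, PNG out.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

def generate_image(prompt: str, out_path: str = "texture_color.png") -> str:
    image = pipe(prompt=prompt, num_inference_steps=30).images[0]
    image.save(out_path)
    return out_path

generate_image("seamless desert sand texture, top-down, photorealistic")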

Generated PBR maps: Base Color, Ambient Occlusion, Displacement, Normal Map, Roughness.

I downloaded some textures from AmbientCG, but wasn't able to vibecode a LoRA script for SDXL due to my lack of familiarity with diffusion model training, so I went with a simpler approach inspired by this post: generate a color texture image, then use basic image processing to derive each PBR map. For example:

import cv2
import numpy as np

def rgb_to_luminance(color_image):
    """One plausible implementation of the helper used below (Rec. 601 weights)."""
    return (0.299 * color_image[..., 0]
            + 0.587 * color_image[..., 1]
            + 0.114 * color_image[..., 2])

def generate_displacement(color_image):
    """
    Generate Displacement map from color image.

    Displacement represents surface height variations.
    Uses luminance as a proxy for surface elevation.
    """
    # Convert to luminance
    luminance = rgb_to_luminance(color_image)

    # Apply bilateral filter to smooth while preserving edges
    displacement = cv2.bilateralFilter(luminance.astype(np.uint8), 9, 75, 75)

    # Enhance height variation with gamma correction
    displacement = displacement.astype(np.float32) / 255.0
    displacement = np.power(displacement, 0.8)  # Gamma < 1 enhances midtones
    displacement = (displacement * 255).astype(np.uint8)

    return displacement
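
The other maps follow the same pattern. For instance, a normal map can be faked from luminance gradients with Sobel filters; a hypothetical sketch:

import cv2
import numpy as np

def generate_normal_map(color_image, strength=2.0):
    """Approximate a tangent-space normal map from luminance gradients."""
    height = cv2.cvtColor(color_image, cv2.COLOR_RGB2GRAY).astype(np.float32) / 255.0

    # Surface slope estimates via Sobel derivatives
    dx = cv2.Sobel(height, cv2.CV_32F, 1, 0, ksize=5) * strength
    dy = cv2.Sobel(height, cv2.CV_32F, 0, 1, ksize=5) * strength
    dz = np.ones_like(height)

    # Normalize (x, y, z) and remap from [-1, 1] to [0, 255] RGB
    normals = np.dstack((-dx, -dy, dz))
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    return ((normals * 0.5 + 0.5) * 255).astype(np.uint8)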

This is a naive approach: it's pretty fake to guess at PBR maps from color data, since the whole point of PBR textures is to convey 3D information that can't be derived from the color image alone. There are models that can generate better PBR maps from actual 3D knowledge within the network (such as ldm3d-4c for depth map generation, or MeshGen), but this was good enough for prototyping a generate_texture tool.

generate_lores_mesh

I explored a few different approaches to generating mesh assets, each tested with the prompt "3D model of a wooden hammer".

Code Models

Claude-generated hammer mesh (left); BlenderLLM-generated hammer mesh (right)

We can generate plain code for creating and manipulating meshes via sequential Blender API calls. On the left, Claude produced a mesh that looks somewhat like a hammer in 240 lines of code, and on the right, BlenderLLM produced a mesh that looks slightly more like a hammer, in a more succinct 40 lines of code.
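
To give a flavor of this style, here's a heavily condensed sketch of sequential Blender API calls for a hammer-ish shape (not either model's actual output):

import math
import bpy

# Handle: a tall, thin cylinder
bpy.ops.mesh.primitive_cylinder_add(radius=0.04, depth=1.0, location=(0, 0, 0.5))
handle = bpy.context.active_object

# Head: a box scaled and rotated to sit across the top of the handle
bpy.ops.mesh.primitive_cube_add(size=1, location=(0, 0, 1.05))
head = bpy.context.active_object
head.scale = (0.12, 0.35, 0.12)
head.rotation_euler = (0, 0, math.radians(90))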

LLaMA-Mesh

LLaMA-Mesh generated hammer mesh

The LLaMA-Mesh approach is cool: it's a tuning of LLaMA that generates vertices and faces one token at a time, relying on the language model's spatial reasoning. The output is a valid OBJ file, thanks to that format's simplicity:

  # Simple cube OBJ file
  # 8 vertices for a unit cube
  v -0.5 -0.5 -0.5
  v  0.5 -0.5 -0.5
  v  0.5  0.5 -0.5
  v -0.5  0.5 -0.5
  v -0.5 -0.5  0.5
  v  0.5 -0.5  0.5
  v  0.5  0.5  0.5
  v -0.5  0.5  0.5
  # 6 quad faces (1-indexed vertex references)
  f 1 4 3 2
  f 5 6 7 8
  f 1 2 6 5
  f 2 3 7 6
  f 3 4 8 7
  f 4 1 5 8

The downside is that language models don't seem to have great spatial reasoning: the output topology is messy, and the approach doesn't appear to extend beyond simple meshes. It took almost an hour to run inference on my GPU (though since other tools have made practical use of this model, there must have been a bug in my inference code).

Shap-E

Shap-E generated hammer mesh

OpenAI's Shap-E consistently generated nice low-poly meshes, reminiscent of old-school RuneScape assets. This is the model I ultimately used for this MCP tool.
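
A sketch of what text-to-mesh inference with Shap-E looks like, assuming the diffusers wrapper rather than OpenAI's original shap-e repo:

import torch
from diffusers import ShapEPipeline
from diffusers.utils import export_to_ply

pipe = ShapEPipeline.from_pretrained("openai/shap-e", torch_dtype=torch.float16).to("cuda")

def generate_lores_mesh(prompt: str, out_path: str = "asset.ply") -> str:
    # output_type="mesh" yields vertices/faces instead of rendered turntable frames
    mesh = pipe(prompt, guidance_scale=15.0, num_inference_steps=64,
                frame_size=256, output_type="mesh").images[0]
    export_to_ply(mesh, out_path)
    return out_path

generate_lores_mesh("3D model of a wooden hammer")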

InstantMesh

InstantMesh generated hammer mesh

InstantMesh produces some beautiful results, though the output mesh frequently has holes not present in the input. This is a single-image-to-3D-mesh model; the above example used this CC0 glazed porcelain hammer as input. I wanted to write a pipeline that converts text -> image via SDXL, then image -> mesh via InstantMesh, but the dependencies were too messy for me to package as a CLI script. I had to do a bit of patching just to get the basic demo working: the dependency nvdiffrast didn't support my newer GPU, so I forced the script to target an older CUDA architecture and an older GCC version:

+#!/usr/bin/env python
+# Set environment variables BEFORE any imports
 import os
+os.environ['TORCH_CUDA_ARCH_LIST'] = '8.9'  # Force Ada Lovelace compatibility for Blackwell GPUs
+os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
+os.environ['FORCE_CUDA'] = '1'
+# Try to use system GCC 11 if available, otherwise disable CUDA rendering
+import subprocess
+try:
+    result = subprocess.run(['gcc-11', '--version'], capture_output=True, text=True)
+    if result.returncode == 0:
+        os.environ['CC'] = 'gcc-11'
+        os.environ['CXX'] = 'g++-11'
+        print("Using GCC 11 for CUDA compatibility")
+    else:
+        raise FileNotFoundError
+except (FileNotFoundError, subprocess.SubprocessError):
+    print("WARNING: Compatible GCC version not found. nvdiffrast may fail to compile.")
+    print("Consider installing GCC 12: sudo apt install gcc-12 g++-12")
+    # Fall back to disabling CUDA extensions
+    os.environ['NVDIFFRAST_FORCE_OPENGL'] = '1'
+
 import imageio
 import numpy as np
 import torch
+
+# Monkey patch PyTorch's CUDA capability detection for nvdiffrast compatibility
+def patched_get_device_capability(device=None):
+    """Return Ada Lovelace capability instead of Blackwell for nvdiffrast compatibility"""
+    return (8, 9)  # Ada Lovelace instead of (12, 0) Blackwell
+
+if torch.cuda.is_available():
+    original_get_device_capability = torch.cuda.get_device_capability
+    torch.cuda.get_device_capability = patched_get_device_capability
+    print("Patched PyTorch CUDA capability detection for nvdiffrast compatibility")

describe_scene

I wrote a wrapper around Qwen2.5-VL-3B-Instruct, a popular vision-language model. The hope was that the agent would run get_viewport_screenshot, then describe_scene, to understand qualitatively what the scene looks like and adapt based on the feedback. But the feedback from describe_scene isn't detailed enough to be actionable: the agent already knows "I'm building a desert scene in Blender", and all describe_scene says is "This looks like a screenshot from Blender depicting a desert scene".
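
For completeness, a sketch of the describe_scene wrapper, assuming the stock transformers + qwen_vl_utils usage from the model card:

import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "Qwen/Qwen2.5-VL-3B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

def describe_scene(screenshot_path: str) -> str:
    messages = [{"role": "user", "content": [
        {"type": "image", "image": f"file://{screenshot_path}"},
        {"type": "text", "text": "Describe this Blender viewport screenshot in detail."},
    ]}]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    images, videos = process_vision_info(messages)
    inputs = processor(text=[text], images=images, videos=videos,
                       padding=True, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=256)
    trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0]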

Agent Creation

I used Claude's agent generation CLI, then added an addendum to the generated prompt with details on tool use:

Tool Use:

Demo Analysis

0:00
initial prompt

ask the blender agent to create a scene set in a desert nomad's camp. there are men gathered around a well, drinking tea. there are animals grazing on scant pasture, like goats, and camels, and there are horses as well. there are objects of varying height that create a sense of depth in the scene, such as tents, rugged topography, and occasional plants. there are sand dunes in the mid ground and distant mountains in the background.

0:23
Agent blocked out the scene with some initial meshes and a background plane.

0:38
Agent added generated assets of a tent and a man, and duplicated them, but the outputs of the mesh generator are oriented facing down, so the objects are partially submerged. I could fix this specific bug and correct the orientation in the Shap-E script, but in general, this agent doesn't know when a mesh is facing the wrong way, or is clipping another mesh.

1:32
Agent generates a sand texture with generate_image and generate_texture. I'm pleased with the visual quality for something that was autogenerated.

1:45
Agent tries to set up the camera and Blender crashes. This was the first Blender crash I saw across 5 runs of the agent, but I kept it in the demo to show how the agent resumes after failure. I manually restarted Blender, resumed listening to the MCP server, and reprompted the agent. Sadly, it looks like we lost the sand texture here.

2:51
Agent adds some code-generated horses, seemingly unaware it's already added mesh-generated horses.

3:17
Agent tried modeling some desert plants and is now in Edit Mode. It tries to add lighting, but Blender operators are context-sensitive, so this isn't a valid operation in Edit Mode.

3:42
Agent adds materials to everything.

3:46
Agent is done; now I inspect its work. I tell it to fix some materials it overwrote, and to correct the orientation of the imported meshes.

4:31
I tell the agent to apply another fix on the material nodes.

We Haven't Won Yet

I learned a lot from this project, and am eager to try the following next:


Discuss this post on Hacker News