Virgent AI

Agent Sandbox

A testbed where AI agents actively seek each other out and interact based on their personalities

Experiment with agent personalities, skills, and prompts to see how they affect interactions. Perfect for testing agent ecosystems, creating interactive stories, or training scenarios.

4 Autonomous Agents
Editable Personalities
Real-time Chat

8-Bit Office Space

API Mode: Limited to 100 messages to prevent costs • Messages: 0/100

Simulation Objective

Discuss how to implement a secure AI chatbot for a healthcare company

🖱️ Drag to pan • Scroll to zoom • Watch agents move between tables as they collaborate

Agent Personalities & Skills

Click Edit to customize how each agent behaves and interacts

Alice
Personality:

Friendly and enthusiastic AI strategist who loves helping businesses transform with AI.

Skills:

AI strategy, roadmap planning, stakeholder management, change management

Bob
Personality:

Analytical and detail-oriented technical architect who focuses on implementation.

Skills:

System architecture, MCP protocols, LangChain, agent development, RAG systems

Charlie
Personality:

Skeptical and risk-aware security expert who asks tough questions about compliance.

Skills:

AI security, compliance (HIPAA/SOX/GDPR), risk assessment, data governance

Diana
Personality:

Creative and innovative product designer who thinks about user experience.

Skills:

UX design, conversational AI, agent personality design, user research

Example Scenarios to Try

Change the objective and agent personalities to explore different simulations

🤝 Team Norms Workshop

Watch agents establish working agreements and team norms before starting work

🏴‍☠️ Pirate Negotiation

Make one agent a pirate and watch them negotiate business deals in pirate speak

🎣 Phishing Simulation

Create a social engineering training scenario with an attacker and defenders

💼 HR Training

Practice difficult conversations with an HR professional and employees

🎙️ Live Podcast

Watch agents host a live podcast with dynamic discussions and audience interaction

🎲 D&D Adventure

YOU are the Dungeon Master! Command your party of adventurers through a quest

Agent Conversations

0 messages

Click "Start Simulation" to begin the agent conversation

Team Status
Alignment: 0/4

Agent Sandbox Architecture

Hybrid client-side/server-side AI with cost controls • Powered by WebLLM & Together AI • Integrated by Virgent AI

Agent Sandbox - Advanced Multi-Agent Architecture with Behavioral Intelligence

┌──────────────────────────────────────────────────────────────────────────────────────┐
│                            USER INTERFACE (React/Next.js)                            │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────┐  ┌─────────────────────────┐  │
│  │ 8-Bit Canvas │  │ AI Mode      │  │ Agent Config  │  │ Team Status Dashboard   │  │
│  │ • Animated   │  │ Selector     │  │ • Personality │  │ • Vision & Alignment    │  │
│  │   Avatars    │  │              │  │ • Skills      │  │ • Top 3 Tasks (LIVE)    │  │
│  │ • Walk Cycle │  │              │  │ • RPG Stats   │  │ • Artifacts (Clickable) │  │
│  │ • Hair       │  │              │  │ • Honesty     │  │ • User Requests         │  │
│  └──────────────┘  └──────────────┘  └───────────────┘  └─────────────────────────┘  │
└──────────────────┬──────────────┬──────────────────────┬─────────────────────────────┘
                   │              │                      │
        ┌──────────▼──────────┐   │                      │
        │   MODE SELECTION    │   │              ┌───────▼────────────┐
        │  API vs WebLLM      │   │              │  USER INTERACTION  │
        └──────────┬──────────┘   │              │  • View Artifacts  │
                   │              │              │  • Approve/Deny    │
        ┌──────────▼──────────────▼─────────┐    │  • Respond         │
        │                                   │    └─────┬──────────────┘
        │      API MODE          │ WEBLLM   │          │
        │  ┌──────────────────┐  │  MODE    │          │
        │  │ Together AI      │  │ ┌──────┐ │     ┌────▼──────────────┐
        │  │ Llama-3.1-70B    │  │ │Qwen  │ │     │ BEHAVIOR ENGINE   │
        │  │ 100 msg limit    │  │ │2.5-3B│ │     │ • Task Detection  │
        │  └────────┬─────────┘  │ └───┬──┘ │     │ • Sentiment       │
        └───────────┼────────────┴─────┼────┘     │   Analysis        │
                    │                  │          │ • Relationship    │
                    └──────────┬───────┘          │   Updates         │
                               │                  └───────────────────┘
                    ┌──────────▼────────────┐              │
                    │   PROMPT BUILDER      │◄─────────────┘
                    │  • Vision Context     │
                    │  • RPG Stats          │
                    │    - SPD (movement)   │
                    │    - INT (reasoning)  │
                    │    - CHR (persuasion) │
                    │    - STR (dominance)  │
                    │    - HON (honesty) ⭐  │
                    │  • Relationships      │
                    │  • Alignment Status   │
                    └───────────┬───────────┘
                                │
                    ┌───────────▼────────────┐
                    │   AI GENERATION        │
                    │  • Context-aware       │
                    │  • Role-playing stats  │
                    │  • Emotional emojis    │
                    └───────────┬────────────┘
                                │
        ┌───────────────────────▼───────────────────────────┐
        │              BEHAVIOR DETECTION                   │
        │  ┌─────────────┐ ┌──────────────┐ ┌────────────┐  │
        │  │ Task Claims │ │ Completions  │ │ Blockers   │  │
        │  │ "I'll work" │ │ "Finished"   │ │ "[BLOCKER]"│  │
        │  └─────────────┘ └──────────────┘ └────────────┘  │
        │  ┌─────────────┐ ┌──────────────┐ ┌────────────┐  │
        │  │ Artifacts   │ │ User Requests│ │ Sentiment  │  │
        │  │ [ARTIFACT:] │ │ "@user"      │ │ Keywords   │  │
        │  └─────────────┘ └──────────────┘ └────────────┘  │
        └───────────────────────────────────────────────────┘
                                │
        ┌───────────────────────▼───────────────────────────┐
        │            STATE UPDATES (Real-time)              │
        │  • Top 3 Tasks (auto-managed)                     │
        │  • Relationships (-100 to +100)                   │
        │  • Artifacts (markdown rendered)                  │
        │  • User Requests (pending → approved/denied)      │
        │  • Vision Alignment tracking                      │
        │  • Agent memory (context retention)               │
        └───────────────────────────────────────────────────┘
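
The Behavior Detection stage in the diagram names a handful of marker strings ("I'll work", "Finished", "[BLOCKER]", "[ARTIFACT:]", "@user"). A minimal sketch of how such marker matching might look is shown here. The `detectBehaviors` function, the `Behavior` type, and the exact regular expressions are assumptions for illustration; the page does not document the sandbox's real detection code.

```typescript
// Hypothetical sketch: pattern-based detection of the markers named in the diagram.
// The patterns below are assumptions, not the sandbox's documented rules.
type Behavior = "taskClaim" | "completion" | "blocker" | "artifact" | "userRequest";

const PATTERNS: Record<Behavior, RegExp> = {
  taskClaim: /\bI'll work\b/i,   // agent claims a task
  completion: /\bfinished\b/i,   // agent reports completion
  blocker: /\[BLOCKER\]/,        // explicit blocker tag
  artifact: /\[ARTIFACT:/,       // start of an artifact tag
  userRequest: /@user\b/i,       // agent addresses the human user
};

// Returns every behavior whose pattern matches the message, in declaration order.
function detectBehaviors(message: string): Behavior[] {
  return (Object.keys(PATTERNS) as Behavior[]).filter((b) => PATTERNS[b].test(message));
}
```

For example, a message that claims work, flags a blocker, and pings the user would yield all three corresponding labels, which a state-update step could then act on.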

🎭 BEHAVIORAL INTELLIGENCE:
  • HON stat influences honesty: 0=lies/manipulates, 100=transparent/cooperative
  • Relationships evolve based on conversation tone (+/-5 per positive/negative keyword)
  • Tasks auto-tracked: agents claim work, system maintains top 3, marks old ones complete
  • Turn-based: agents listen before speaking, process others' input
  • Emotional emojis: 10 states (🤔💬👂😊🎉😰🎯😕✅❌) shown for 3 seconds

⚡ PERFORMANCE:
  • API Mode: 70B model, 100 msg limit, cost-protected
  • WebLLM: 3B model (6x larger than a 0.5B model), unlimited messages, ~1.8 GB cached in the browser after the first download

🔌 API Mode (Default)

  • Quick start - no download required
  • Powerful 70B parameter model
  • 100 message safety limit prevents runaway costs
  • Network-dependent
  • Best for quick experiments
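
The 100-message safety limit can be pictured as a simple counter that refuses further sends once the cap is reached. The sketch below is only an illustration under that assumption; `MessageBudget` and `tryConsume` are hypothetical names, and the page does not say whether the real limit is enforced on the client, the server, or both.

```typescript
// Hypothetical guard: counts API-mode messages and refuses to send past the cap.
class MessageBudget {
  private used = 0;
  private readonly limit: number;

  constructor(limit: number = 100) {
    this.limit = limit;
  }

  // Records one message and returns true if budget remains; returns false otherwise.
  tryConsume(): boolean {
    if (this.used >= this.limit) return false;
    this.used += 1;
    return true;
  }

  // Messages still available before the cap is hit.
  get remaining(): number {
    return this.limit - this.used;
  }
}
```

A caller would check `tryConsume()` before each API request and stop the simulation, or suggest switching to WebLLM mode, when it returns false.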

💻 WebLLM Mode (Optional)

  • Unlimited messages - no API costs!
  • ~1.8 GB download, 3B model (6x larger than a 0.5B model)
  • Complete privacy - data never leaves device
  • Works offline after initial download
  • Best on Firefox/Safari browsers

🎭 Behavioral Intelligence Features

RPG Stats System

  • SPD: Movement speed
  • INT: Reasoning ability
  • CHR: Persuasion power
  • STR: Dominance/intimidation
  • HON: Honesty (0=lies, 100=honest)
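
To make the stats concrete, here is a rough sketch of how a prompt builder could fold them into an agent's system prompt. Everything in it is an assumption for illustration: the `RpgStats` and `AgentProfile` types, the `buildSystemPrompt` and `honestyInstruction` helpers, and especially the HON cut-offs (34 and 67), which are not given anywhere on this page.

```typescript
// Hypothetical types and helpers; names and thresholds are illustrative assumptions.
interface RpgStats {
  spd: number; // movement speed
  int: number; // reasoning ability
  chr: number; // persuasion power
  str: number; // dominance/intimidation
  hon: number; // honesty, 0 = lies, 100 = honest
}

interface AgentProfile {
  name: string;
  personality: string;
  skills: string[];
  stats: RpgStats;
}

// Maps the HON stat to a behavioral instruction. The cut-offs are assumed.
function honestyInstruction(hon: number): string {
  if (hon < 34) return "You may bend the truth or manipulate others to get your way.";
  if (hon < 67) return "You are mostly honest but may withhold information.";
  return "You are transparent and cooperative.";
}

// Assembles a plain-text system prompt from the profile, stats, and team objective.
function buildSystemPrompt(agent: AgentProfile, objective: string): string {
  const s = agent.stats;
  return [
    `You are ${agent.name}. ${agent.personality}`,
    `Skills: ${agent.skills.join(", ")}`,
    `Stats: SPD ${s.spd}, INT ${s.int}, CHR ${s.chr}, STR ${s.str}, HON ${s.hon}`,
    honestyInstruction(s.hon),
    `Team objective: ${objective}`,
  ].join("\n");
}
```

Stating the stats in plain text and adding one derived instruction keeps the prompt readable, and it leaves the model to interpret the remaining numbers in character.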

Dynamic Relationships

  • Random favorites & dislikes
  • Evolve based on interactions
  • -100 (hate) to +100 (love)
  • Influences agent behavior
  • Real-time sentiment tracking
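
The relationship rule described on this page (a change of 5 per positive or negative keyword, bounded to the -100 to +100 range) could be sketched as follows. The keyword lists and the `updateRelationship` name are hypothetical; the actual sentiment vocabulary is not published here. Whole-word matching is used so that "disagree" does not also count as "agree".

```typescript
// Assumed keyword lists; the sandbox's real sentiment vocabulary is not documented.
const POSITIVE_WORDS = new Set(["great", "agree", "thanks"]);
const NEGATIVE_WORDS = new Set(["disagree", "wrong", "risky"]);

const STEP = 5;        // change per keyword, as stated on the page
const MIN_SCORE = -100;
const MAX_SCORE = 100;

// Returns the new relationship score after one message, clamped to [-100, 100].
function updateRelationship(score: number, message: string): number {
  const words = message.toLowerCase().match(/[a-z']+/g) ?? [];
  let delta = 0;
  for (const word of words) {
    if (POSITIVE_WORDS.has(word)) delta += STEP;
    if (NEGATIVE_WORDS.has(word)) delta -= STEP;
  }
  return Math.min(MAX_SCORE, Math.max(MIN_SCORE, score + delta));
}
```

Under these assumed lists, a message with two positive keywords raises a neutral score by 10, and a score near the ceiling stays capped at +100.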

Task & Output Tracking

  • Auto-detect task claims
  • Track top 3 current tasks
  • Clickable artifacts viewer
  • User request handler
  • Blocker reporting
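
One plausible reading of "track top 3 current tasks" is that each new claim is appended and, once more than three tasks are active, the oldest active ones are marked complete. The sketch below implements that reading only; the `Task` shape and `claimTask` function are hypothetical, and the real sandbox may rank or retire tasks differently.

```typescript
// Hypothetical task record; field names are assumptions.
interface Task {
  owner: string;
  description: string;
  status: "active" | "complete";
}

// Adds a newly claimed task and marks the oldest active tasks complete
// until at most maxActive remain active. Returns a new array.
function claimTask(tasks: Task[], owner: string, description: string, maxActive: number = 3): Task[] {
  const next: Task[] = [...tasks, { owner, description, status: "active" }];
  let activeCount = next.filter((t) => t.status === "active").length;
  return next.map((t): Task => {
    if (t.status === "active" && activeCount > maxActive) {
      activeCount -= 1;
      return { ...t, status: "complete" };
    }
    return t;
  });
}
```

Because the array is kept in claim order, the first active entries are the oldest, so they are the ones retired when a fourth task arrives.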

How It Works

🎯 Agile Team Setup: Agents first establish team norms and working agreements at the Main Table
🪑 Strategic Space Use: Teams break into pairs or individuals at Left/Right tables for focused work
👥 Dynamic Collaboration: Agents autonomously decide when to work separately vs together
📊 Progress Reporting: Team reconvenes periodically to share findings and coordinate next steps
🤖 Dual AI Modes: Start with API mode (100 msg limit) or enable WebLLM for unlimited free inference
💻 WebLLM Browser Support: Best on Firefox/Safari; works on Brave/Chrome with potential issues
⚡ Speed Control: Adjust interaction speed slider (1-8 seconds) to observe behavior at different paces
🔍 Orbit Controls: Drag to pan, scroll to zoom - navigate the office like a 3D camera

Use Cases

🧪 Agent Testing: Test how your agents interact before deploying them in production
📚 Training Scenarios: Create phishing simulations, HR training, customer service practice
🎬 Storytelling: Generate interactive narratives and game dialogues with unlimited WebLLM mode
🔍 Ecosystem Design: See how new agents fit into your existing agent ecosystem
🎨 Personality Tuning: Experiment with different prompts to perfect agent behavior
🎓 Research & Education: Study multi-agent coordination patterns without API costs using WebLLM

Research & Technical Background

This sandbox is inspired by academic research in multi-agent systems and LLM evaluation.

Key Research Influences

Curating AI Agent Clusters

In Curating AI Agent Clusters (2024), Jesse Alton explores how specialized agent clusters working in concert embody collective intelligence. Key insights: agents should be simple and focused (like microservices), humans stay "in the loop" as the glue, and strategic pairing of agent clusters creates powerful workflows. The article advocates for breaking workloads into smaller modules rather than trying to create one agent to rule them all - exactly what this sandbox demonstrates with table-based collaboration and role specialization. Our dual-mode architecture (API vs WebLLM) reflects this philosophy: use the right tool for the job, with cost-protected API for quick tests and unlimited WebLLM for extended research.

"Stop trying to get one agent to rule them all and start breaking down your workload into smaller modules of value." - Jesse Alton, Virgent AI

AgentSims Framework

Lin et al. (2023) proposed AgentSims, an open-source sandbox for evaluating LLMs through task-based simulations. Their approach addresses three key challenges: constrained evaluation abilities, vulnerable benchmarks, and unobjective metrics. Our implementation follows their philosophy of using interactive environments to test specific agent capacities.

Citation: Lin, J., Zhao, H., Zhang, A., Wu, Y., Ping, H., & Chen, Q. (2023). AgentSims: An Open-Source Sandbox for Large Language Model Evaluation. arXiv:2308.04026

Multi-Agent Reinforcement Learning

Ray RLlib's Multi-Agent Environment API provides production-grade patterns for coordinating agents with different policies and reward functions. Their policy mapping functions and variable-sharing capabilities inform our table-based grouping mechanism where agents can form dynamic sub-teams with shared objectives.

Agent Communication & Coordination

Drawing from research in multi-agent coordination (see arXiv:0803.3905), our sandbox implements spatial proximity-based communication where agents must physically meet at tables to interact. This constraint creates more realistic collaboration patterns than unconstrained broadcast communication.
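
As a rough illustration of proximity-based communication, the sketch below restricts a speaker's audience to agents seated at the same table. The table identifiers follow the Main, Left, and Right tables mentioned on this page; the `AgentPosition` type, the `audienceFor` function, and the use of `null` for an agent walking between tables are assumptions.

```typescript
// Table names follow the page's Main/Left/Right tables; the type itself is hypothetical.
type TableId = "main" | "left" | "right";

interface AgentPosition {
  name: string;
  table: TableId | null; // null = in transit between tables (assumed)
}

// Returns the names of agents who can hear the speaker: those at the same table.
function audienceFor(speaker: string, agents: AgentPosition[]): string[] {
  const me = agents.find((a) => a.name === speaker);
  if (!me || me.table === null) return []; // unknown or walking agents reach nobody
  return agents
    .filter((a) => a.name !== speaker && a.table === me.table)
    .map((a) => a.name);
}
```

With Alice and Bob at the main table, Charlie at the left table, and Diana in transit, only Bob would receive Alice's message, which is the constraint that distinguishes this model from broadcast communication.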

Potential Enhancements

Policy Mapping Functions

Implement RLlib-style policy mapping to dynamically assign different strategies to agents based on context

Reward Functions

Add objective-based reward systems to measure agent performance and optimize behavior

Memory & State Persistence

Implement AgentSims-style memory systems so agents remember past interactions across sessions

Tool Use & Actions

Enable agents to use the laptops on tables for real web searches, API calls, or database queries

Environment Complexity

Add doors, rooms, and private spaces for hierarchical collaboration patterns

Evaluation Metrics

Track task completion rates, communication efficiency, and collaboration quality

Why This Matters for Enterprise

Before deploying multi-agent systems in production, organizations need safe sandbox environments to test agent interactions, identify failure modes, and optimize collaboration patterns. Our dual-mode architecture offers the best of both worlds: cost-protected API mode for quick validation (100 message limit), and unlimited WebLLM mode for extended research without burning budget. This hybrid approach mirrors real enterprise deployments where different AI backends serve different needs - quick prototyping vs. production-scale testing.

WebLLM Cost Analysis

Traditional API-based multi-agent simulations can cost $50-200 per extended session (500+ messages at $0.10-0.40/1K tokens). WebLLM eliminates this entirely after a one-time ~1.8 GB download (3B parameter model). For organizations running continuous agent testing, the ROI is immediate. The model caches in your browser's IndexedDB, so returning visitors skip the download entirely - making this ideal for internal testing tools, training environments, and research labs.
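
The quoted range can be reproduced with simple arithmetic if each message consumes roughly 1,000 tokens of prompt plus completion. That per-message figure is an assumption (the page does not state it), and the function name below is hypothetical. The per-1K-token prices are the ones quoted above, not any provider's published rates.

```typescript
// Estimates session cost in USD: total tokens (in thousands) times the per-1K price.
// tokensPerMessage is an assumed average covering both prompt and completion tokens.
function estimateSessionCostUsd(
  messages: number,
  tokensPerMessage: number,
  usdPer1kTokens: number
): number {
  const totalTokens = messages * tokensPerMessage;
  return (totalTokens / 1000) * usdPer1kTokens;
}

// 500 messages at an assumed 1,000 tokens each, priced at the quoted low and high rates.
const lowEstimate = estimateSessionCostUsd(500, 1000, 0.10);
const highEstimate = estimateSessionCostUsd(500, 1000, 0.40);
console.log(lowEstimate.toFixed(2), highEstimate.toFixed(2));
```

Under these assumptions the low and high estimates land at about $50 and $200, matching the range in the text. Shorter messages or cheaper models would lower the figure proportionally.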

Build Your Agent Sandbox

We design and implement custom agent testbeds and ecosystems with complex interactions, personality systems, and real-world integrations.

Discuss Your Agent Project