- Voice Agents - Talk to a voice agent about information stored in Box.
- Box integration - Search and knowledge retrieval from documents stored in Box.
- GitHub integration - Report issues by talking to a voice agent.
Prerequisites
Before starting, ensure you have:
- Signed up and installed the autonomy command.
- A Box developer account with API credentials.
- A GitHub personal access token.
- Docker running on your machine.
Project Structure
File Structure:
autonomy-and-box/
|-- autonomy.yaml # Deployment configuration
|-- secrets.yaml # Your API credentials (gitignored)
|-- secrets.yaml.example # Template for credentials
|-- images/
|   |-- main/
|       |-- Dockerfile # Container definition
|       |-- main.py # Application entry point
|       |-- box.py # Box API client
|       |-- github.py # GitHub issue creation tool
|       |-- index.html # Voice interface
|       |-- requirements.txt
|
|-- scripts/
    |-- upload_docs_to_box.py # Utility to populate Box
Step 1: Clone the Repository
git clone https://github.com/build-trust/autonomy-and-box.git
cd autonomy-and-box
Step 2: Configure Box Credentials
Create a Box application in the Box Developer Console:
- Create a new Custom App.
- Select Server Authentication (Client Credentials Grant).
- Under Configuration, note your:
- Client ID.
- Client Secret.
- Enterprise ID.
Copy the example file to create secrets.yaml:
cp secrets.yaml.example secrets.yaml
Then fill in your credentials:
secrets.yaml
BOX_CLIENT_ID: "your_box_client_id"
BOX_CLIENT_SECRET: "your_box_client_secret"
BOX_ENTERPRISE_ID: "your_box_enterprise_id"
GITHUB_TOKEN: "your_github_token"
GITHUB_REPO: "your-org/your-repo"
Never commit secrets.yaml to version control. It's already in .gitignore.
Step 3: Configure GitHub Access
Create a GitHub Personal Access Token with repo scope to allow issue creation.
Add the token and target repository to your secrets.yaml:
GITHUB_TOKEN: "ghp_xxxxxxxxxxxxxxxxxxxx"
GITHUB_REPO: "your-org/your-repo"
Step 4: Upload Documents to Box
The application searches documents stored in a Box folder. Use the included script to populate Box with sample documentation:
cd scripts
pip install box-sdk-gen httpx
python upload_docs_to_box.py
The script:
- Fetches documentation from autonomy.computer/docs/llms.txt.
- Parses all markdown file URLs.
- Creates a docs folder in Box.
- Uploads all documentation files.
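The URL-parsing step can be illustrated with a small sketch. This is not the code from upload_docs_to_box.py; it assumes the index lists absolute links ending in .md, and the sample text and URLs below are hypothetical.

```python
import re

# Assumption: the index contains absolute http(s) links that end in ".md".
# The character class stops at whitespace, parentheses, and brackets so that
# markdown link syntax such as [title](url) is handled.
MARKDOWN_URL = re.compile(r"https?://[^\s()\[\]]+\.md\b")


def extract_markdown_urls(index_text: str) -> list[str]:
    """Return the unique .md URLs in order of first appearance."""
    seen: dict[str, None] = {}
    for url in MARKDOWN_URL.findall(index_text):
        seen.setdefault(url, None)
    return list(seen)


# Hypothetical sample resembling an llms.txt index.
sample = """# Autonomy
- [Agents](https://autonomy.computer/docs/agents.md): Build agents
- [Tools](https://autonomy.computer/docs/tools.md)
- [Agents again](https://autonomy.computer/docs/agents.md)
"""

print(extract_markdown_urls(sample))
```

Deduplicating while preserving order keeps each document from being uploaded twice if the index links to it more than once.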
Step 5: Understand the Application Code
The Main Application
The application creates a voice-enabled agent with access to a knowledge base and GitHub tools:
images/main/main.py
from autonomy import (
    Node,
    Agent,
    Model,
    Knowledge,
    KnowledgeTool,
    NaiveChunker,
    HttpServer,
    Tool,
)


async def main(node: Node):
    # Create knowledge base for document search
    knowledge = Knowledge(
        name="autonomy_docs",
        searchable=True,
        model=Model("embed-english-v3"),
        max_results=5,
        max_distance=0.4,
        chunker=NaiveChunker(max_characters=1024, overlap=128),
    )

    # Create tools
    knowledge_tool = KnowledgeTool(knowledge=knowledge, name="search_autonomy_docs")
    github_tool = Tool(create_github_issue)

    # Start the voice-enabled agent
    await Agent.start(
        node=node,
        name="autonomy-docs",
        instructions=INSTRUCTIONS,
        model=Model("claude-sonnet-4-v1", max_tokens=256),
        tools=[knowledge_tool, github_tool],
        voice={
            "voice": "alloy",
            "instructions": VOICE_INSTRUCTIONS,
            "vad_threshold": 0.7,
            "vad_silence_duration_ms": 700,
        },
    )

    # Load documents from Box
    await load_documents_from_box(knowledge)
Box Integration
The Box client handles authentication and document retrieval:
images/main/box.py
from os import environ

from box_sdk_gen import BoxClient, BoxCCGAuth, CCGConfig


class Box:
    def __init__(self):
        # Authenticate with the Client Credentials Grant (CCG) flow
        self.client = BoxClient(
            auth=BoxCCGAuth(
                config=CCGConfig(
                    client_id=environ["BOX_CLIENT_ID"],
                    client_secret=environ["BOX_CLIENT_SECRET"],
                    enterprise_id=environ["BOX_ENTERPRISE_ID"],
                )
            )
        )

    async def extract_text_representation(self, file_id: str) -> str:
        """Download file content from Box."""
        return await self.box_call(box_file_download_content, self.client, file_id)

    async def list_folder_items(self, folder_id: str):
        """List items in a Box folder."""
        return await self.box_call(self.client.folders.get_folder_items, folder_id)
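The excerpt calls a box_call helper that is not shown. One plausible shape for it, assuming the Box SDK calls are blocking, is to run them in a worker thread so the event loop stays free for voice traffic. The sketch below is an assumption, not the repository's implementation, and fake_get_folder_items is a hypothetical stand-in for an SDK call.

```python
import asyncio
from typing import Any, Callable


async def box_call(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Run a blocking call in a worker thread and await its result."""
    return await asyncio.to_thread(func, *args, **kwargs)


def fake_get_folder_items(folder_id: str) -> list[str]:
    # Hypothetical stand-in for a synchronous SDK call.
    return [f"{folder_id}/a.md", f"{folder_id}/b.md"]


async def demo() -> list[str]:
    return await box_call(fake_get_folder_items, "123")


print(asyncio.run(demo()))
```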
GitHub Issue Tool
The GitHub tool allows the agent to create issues based on user requests:
images/main/github.py
import httpx


async def create_github_issue(title: str, body: str, labels: str = "") -> str:
    """
    Create a GitHub issue in the configured repository.

    Args:
        title: The title of the issue
        body: The detailed description of the issue
        labels: Comma-separated list of labels to apply (optional)

    Returns:
        A message indicating success or failure with the issue URL
    """
    # GITHUB_REPO, headers, and data are set up in parts of github.py not shown here.
    url = f"https://api.github.com/repos/{GITHUB_REPO}/issues"
    async with httpx.AsyncClient() as client:
        response = await client.post(url, headers=headers, json=data)
        if response.status_code == 201:
            issue_data = response.json()
            return f"Successfully created issue #{issue_data['number']}: {issue_data['html_url']}"
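The excerpt does not show how headers and data are built. A minimal sketch of one way to prepare them for the GitHub REST API's create-issue endpoint follows. The helper name build_issue_request is hypothetical, and the actual github.py may construct the request differently.

```python
def build_issue_request(
    repo: str, token: str, title: str, body: str, labels: str = ""
) -> tuple[str, dict, dict]:
    """Build the URL, headers, and JSON payload for creating an issue."""
    url = f"https://api.github.com/repos/{repo}/issues"
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
    data: dict = {"title": title, "body": body}
    # The tool accepts labels as one comma-separated string; the API expects a list.
    label_list = [label.strip() for label in labels.split(",") if label.strip()]
    if label_list:
        data["labels"] = label_list
    return url, headers, data


url, headers, data = build_issue_request(
    "your-org/your-repo", "ghp_example", "Bug", "Details", "bug, docs ,"
)
print(url)
print(data)
```

Accepting labels as a single string keeps the tool signature simple for the model to call, at the cost of this small parsing step.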
Agent Instructions
The agent has two sets of instructions, one for the primary agent and one for the voice interface:
INSTRUCTIONS = """
You are an expert assistant that answers questions about Autonomy.
You have access to a knowledge base containing complete documentation.
Use the search_autonomy_docs tool to find accurate information before answering.
IMPORTANT: Keep your responses concise - ideally 2-4 sentences. This assistant
is primarily used through a voice interface, so brevity is essential.
You also have the ability to create GitHub issues when users want to:
- Report bugs or problems.
- Request new features.
- Ask for documentation improvements.
"""
VOICE_INSTRUCTIONS = """
You are a voice interface for an Autonomy documentation assistant.
# Personality
- Friendly and approachable, like a helpful colleague
- Concise and clear - respect the user's time
- Confident but not condescending
# Critical Rules
1. Before answering ANY question, say a filler phrase first.
Pick one randomly: "Good question." / "Right, so." / "That's a good question."
2. THEN delegate to the primary agent for the actual answer.
3. NEVER answer questions from your own knowledge - always delegate.
"""
Step 6: Deploy the Application
Deploy to Autonomy Computer:
autonomy zone deploy
autonomy.yaml defines the infrastructure:
autonomy.yaml
name: boxdocs
pods:
  - name: main-pod
    public: true
    size: big
    containers:
      - name: main
        image: main
        env:
          - BOX_CLIENT_ID: secrets.BOX_CLIENT_ID
          - BOX_CLIENT_SECRET: secrets.BOX_CLIENT_SECRET
          - BOX_ENTERPRISE_ID: secrets.BOX_ENTERPRISE_ID
          - BOX_FOLDER_PATH: "docs"
          - GITHUB_TOKEN: secrets.GITHUB_TOKEN
          - GITHUB_REPO: secrets.GITHUB_REPO
The size: big setting allocates more resources for the embedding model and voice processing.
Step 7: Access the Voice Interface
Once deployed, open your zone URL in a browser:
https://${CLUSTER}-boxdocs.cluster.autonomy.computer
To find your cluster name, run:
autonomy cluster show
Using the Application
Voice Commands
Try these voice interactions:
- "What is Autonomy?" - Searches the knowledge base and responds.
- "How do I create an agent?" - Retrieves relevant documentation.
- "I found a bug, help me report it" - Creates a GitHub issue.
- "Can you file a feature request for better logging?" - Creates a GitHub issue.
API Access
You can also interact via HTTP:
curl --request POST \
--header "Content-Type: application/json" \
--data '{"message":"What are tools in Autonomy?"}' \
"https://${CLUSTER}-boxdocs.cluster.autonomy.computer/agents/autonomy-docs?stream=true"
Refresh Knowledge Base
The knowledge base automatically refreshes every hour. To manually refresh:
curl --request POST \
"https://${CLUSTER}-boxdocs.cluster.autonomy.computer/refresh"
How It Works
Document Loading
When the application starts:
- Connects to Box using CCG authentication.
- Navigates to the configured folder path (docs).
- Recursively lists all files in the folder.
- Downloads each file's text content.
- Chunks documents and generates embeddings.
- Stores embeddings in the knowledge base.
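The recursive listing step can be sketched against an in-memory folder tree. This is an illustration only: the FOLDERS dictionary and its IDs are hypothetical stand-ins for responses from the Box folder-items call, and the real loader may traverse folders differently.

```python
from typing import Iterator

# Hypothetical folder tree: folder ID -> list of (type, id, name) entries.
FOLDERS = {
    "0": [("folder", "10", "docs")],
    "10": [("file", "100", "intro.md"), ("folder", "11", "guides")],
    "11": [("file", "101", "voice.md")],
}


def list_files(folder_id: str, path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (file_id, path) for every file under folder_id, depth first."""
    for item_type, item_id, name in FOLDERS.get(folder_id, []):
        item_path = f"{path}/{name}" if path else name
        if item_type == "folder":
            # Descend into subfolders, carrying the accumulated path.
            yield from list_files(item_id, item_path)
        else:
            yield item_id, item_path


print(list(list_files("0")))
```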
Voice Flow
When a user speaks:
- Browser captures audio via Web Audio API.
- Audio streams to the agent via WebSocket.
- Voice Activity Detection (VAD) detects speech boundaries.
- Speech is transcribed and sent to the voice agent.
- Voice agent delegates to the primary agent.
- Primary agent searches knowledge and/or creates issues.
- Response is synthesized to speech.
- Audio streams back to the browser.
Knowledge Search
When searching documents:
- Query is embedded using Cohere's embed-english-v3.
- Vector similarity search finds relevant chunks.
- Top 5 results within distance threshold (0.4) are returned.
- Agent uses retrieved context to answer.
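The search steps above can be sketched with toy vectors. This sketch assumes the distance is cosine distance, which the text does not confirm, and uses hypothetical 2-D vectors in place of real embeddings.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def search(
    query: list[float],
    chunks: list[tuple[str, list[float]]],
    max_results: int = 5,
    max_distance: float = 0.4,
) -> list[tuple[float, str]]:
    """Return up to max_results (distance, text) pairs within max_distance."""
    scored = [(cosine_distance(query, vector), text) for text, vector in chunks]
    within = sorted(pair for pair in scored if pair[0] <= max_distance)
    return within[:max_results]


# Hypothetical chunks with toy 2-D "embeddings".
chunks = [
    ("agents", [1.0, 0.0]),
    ("billing", [0.0, 1.0]),
    ("tools", [1.0, 1.0]),
]
print(search([1.0, 0.0], chunks))
```

Here "billing" is dropped because its distance of 1.0 exceeds the 0.4 threshold, while "tools" (about 0.29) is kept.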
Configuration Options
Voice Settings
Customize voice behavior in main.py:
voice={
    "voice": "alloy",  # Voice model: alloy, echo, fable, onyx, nova, shimmer
    "instructions": VOICE_INSTRUCTIONS,
    "vad_threshold": 0.7,  # Speech detection sensitivity (0.0-1.0)
    "vad_silence_duration_ms": 700,  # Silence before end of speech
}
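To build intuition for how vad_threshold and vad_silence_duration_ms interact, here is a generic endpointing sketch. It is not the detector the voice service uses. It assumes a hypothetical stream of per-frame speech probabilities and a hypothetical frame length of 20 ms.

```python
from typing import Optional


def find_end_of_speech(
    frame_probs: list[float],
    frame_ms: int = 20,
    vad_threshold: float = 0.7,
    vad_silence_duration_ms: int = 700,
) -> Optional[int]:
    """Return the frame index where end of speech is declared, or None."""
    speaking = False
    silent_ms = 0
    for index, prob in enumerate(frame_probs):
        if prob >= vad_threshold:
            # Speech frame: start (or continue) the utterance, reset the timer.
            speaking = True
            silent_ms = 0
        elif speaking:
            # Silence after speech: accumulate until the duration is reached.
            silent_ms += frame_ms
            if silent_ms >= vad_silence_duration_ms:
                return index
    return None


# 200 ms of speech followed by 800 ms of silence.
print(find_end_of_speech([0.9] * 10 + [0.1] * 40))
```

In this model, a higher threshold ignores more background noise but may miss quiet speech, and a longer silence duration tolerates pauses at the cost of slower responses.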
Knowledge Settings
Tune document search:
knowledge = Knowledge(
    name="autonomy_docs",
    searchable=True,
    model=Model("embed-english-v3"),
    max_results=5,  # Number of results to return
    max_distance=0.4,  # Similarity threshold (lower = stricter)
    chunker=NaiveChunker(
        max_characters=1024,  # Chunk size
        overlap=128,  # Overlap between chunks
    ),
)
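The effect of max_characters and overlap can be seen in a small fixed-window sketch. NaiveChunker's actual behavior is not shown in this guide; the function below assumes plain character windows and is only an illustration.

```python
def chunk_text(text: str, max_characters: int = 1024, overlap: int = 128) -> list[str]:
    """Split text into windows of max_characters that share overlap characters."""
    if overlap >= max_characters:
        raise ValueError("overlap must be smaller than max_characters")
    step = max_characters - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_characters])
        if start + max_characters >= len(text):
            break  # The last window already reaches the end of the text.
    return chunks


print(chunk_text("abcdefghij", max_characters=4, overlap=1))
print([len(chunk) for chunk in chunk_text("a" * 2000)])
```

Overlap repeats the tail of each chunk at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk.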
Environment Variables
| Variable | Description |
|---|---|
| BOX_CLIENT_ID | Box OAuth client ID |
| BOX_CLIENT_SECRET | Box OAuth client secret |
| BOX_ENTERPRISE_ID | Box enterprise ID |
| BOX_FOLDER_PATH | Path to documents folder in Box |
| MAX_DOCUMENTS | Limit documents loaded (0 = all) |
| GITHUB_TOKEN | GitHub personal access token |
| GITHUB_REPO | Target repository (owner/repo) |
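A startup check along these lines can surface a missing variable before the agent starts. This is a sketch, not code from the repository: the load_settings helper is hypothetical, and the split into required and optional variables, plus the defaults of "docs" and 0, are assumptions based on the table and autonomy.yaml.

```python
import os
from typing import Mapping, Optional

# Assumption: these five have no sensible default and must be set.
REQUIRED = (
    "BOX_CLIENT_ID",
    "BOX_CLIENT_SECRET",
    "BOX_ENTERPRISE_ID",
    "GITHUB_TOKEN",
    "GITHUB_REPO",
)


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read settings from the environment, failing fast on missing values."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    settings: dict = {name: env[name] for name in REQUIRED}
    # Assumed defaults: the "docs" folder, and 0 meaning "load all documents".
    settings["BOX_FOLDER_PATH"] = env.get("BOX_FOLDER_PATH", "docs")
    settings["MAX_DOCUMENTS"] = int(env.get("MAX_DOCUMENTS", "0"))
    return settings
```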
Build with a coding agent
See the guide on building Autonomy apps using coding agents.
Troubleshooting
Learn More
Voice
Give agents the ability to listen and speak.
Knowledge bases
Give agents the ability to search a corpus of documents.
Tools
Give agents the ability to take actions.
File structure
How to organize an application built with the Autonomy Framework.
