
URL: https://www.sitepoint.com/continuedev-for-developers-the-complete-local-ai-coding-assistant-setup/

Continue.dev for Developers: The Complete Local AI Coding Assistant Setup



How to Set Up Continue.dev as a Local AI Coding Assistant

  1. Install Ollama for your operating system and start the service with ollama serve.
  2. Pull the required models: a 7B code model for autocomplete, a larger model for chat, and an embeddings model.
  3. Install the Continue.dev extension or plugin in VS Code, JetBrains, or Neovim.
  4. Create the ~/.continue/config.json file with your model provider, context providers, and slash commands.
  5. Configure tab autocomplete settings including debounce delay, max tokens, and temperature for low-latency suggestions.
  6. Enable context providers like @codebase, @file, and @terminal for codebase-aware prompts.
  7. Add custom slash commands for team workflows such as code review and test generation.
  8. Version-control your config.json in a dotfiles repository, omitting any API keys.


Note: This guide assumes Continue.dev v0.9.x (verify your installed version) and Ollama v0.3.x or later. Config keys and behavior may differ across versions. Pin your versions to ensure reproducibility.


Why Local AI Coding Assistants Matter Now

AI coding assistants have reshaped how developers write software. Tools like GitHub Copilot, Cursor, and Codeium reduce boilerplate keystrokes and speed up common patterns, but they route proprietary source code through third-party cloud services and carry recurring subscription costs. For developers working under strict data governance policies, or those who simply want full control over their toolchain, the costs add up: data exposure risk, $10-20/month per seat, and vendor lock-in to a single provider's model choices. Continue.dev offers a different path: an open-source, local-first AI coding assistant that works across VS Code, JetBrains IDEs, and Neovim.

Continue.dev acts as middleware between an editor and any large language model, whether running locally through Ollama or hosted in the cloud. It supports tab autocomplete, inline chat, contextual code retrieval, and custom slash commands, all governed by a single JSON configuration file. This delivers codebase-aware retrieval comparable to Copilot's core features, without mandatory cloud dependencies.

This article walks through a fully configured local AI coding assistant running in your preferred editor, tuned for JavaScript, React, and Node.js development. Every configuration file shown is copy-paste ready.

What Is Continue.dev and How Does It Work?

Architecture Overview

Continue.dev operates across three layers. The outermost layer is the editor extension or plugin, which handles UI rendering, keybindings, and inline diff presentation. Beneath that sits the configuration layer, defined by a single config.json file that specifies which models to use, what context providers to enable, and which slash commands are available. The innermost layer is the model provider layer, which communicates with the actual LLM backend.

Continue.dev connects your editor to an LLM; it ships no model of its own. It routes prompts and code context from the editor to whatever model provider the developer configures: Ollama, LM Studio, llama.cpp, any OpenAI-compatible API endpoint, or cloud providers like Anthropic and OpenAI directly. To swap models or providers, change a few lines in the config file. Nothing needs reinstalling.


Key Features at a Glance

When you pause mid-line, Continue.dev offers tab autocomplete with inline ghost text suggestions, comparable to Copilot's core experience. Highlighting code and pressing the chat keybinding lets you ask questions or request modifications without leaving the editor. Context providers such as @codebase, @file, @docs, @terminal, and @git inject relevant project context into prompts, which reduces hallucinated references and improves answer accuracy. Slash commands like /edit, /comment, and /share trigger predefined prompt templates. The entire system is extensible through custom model configurations, custom commands, and custom context providers.

Prerequisites and Local Model Setup with Ollama

Hardware Requirements

Running LLMs locally works on most machines from 2020 onward with 8 GB or more of RAM, though with trade-offs. 8 GB of system RAM is the practical floor for 7B models in Q4 quantization; 16 GB is recommended for comfortable headroom. For chat-oriented tasks where response quality matters more, 13B to 34B parameter models perform significantly better but require 32 GB of RAM or a dedicated GPU with sufficient VRAM. An NVIDIA GPU with at least 8 GB of VRAM can cut inference latency several-fold compared to CPU-only execution. Apple Silicon Macs benefit from unified memory: the GPU shares system RAM directly, which avoids CPU-to-GPU copies and makes most of the system memory available for inference.

Ensure at least 20 GB of free disk space for the recommended model set (codellama:7b-code is approximately 4 GB, deepseek-coder-v2:16b approximately 9 GB, the optional llama3.1:8b approximately 5 GB, plus the small nomic-embed-text embeddings model).

For the setup described in this article, a 7B model handles autocomplete duties while a larger model handles chat. This split targets autocomplete suggestions arriving within 200-500 ms while preserving answer quality for interactive conversations.

Installing Ollama and Pulling Models

Ollama is the recommended local model runtime for Continue.dev. It manages model downloads, quantization, and serves models via a local HTTP API on port 11434.

```bash
# macOS (via Homebrew)
brew install ollama

# Linux
# Option 1 (recommended): download the binary directly from GitHub releases:
# https://github.com/ollama/ollama/releases
# Option 2: use the install script, but download and review it first:
curl -fsSL https://ollama.com/install.sh -o ollama_install.sh
# Review the script contents before executing:
less ollama_install.sh
# If you have a checksum from a trusted source, compare it before running:
sha256sum ollama_install.sh
sh ollama_install.sh

# Windows: download the installer from https://ollama.com/download

# Start the Ollama service. It runs in the foreground, so use a second
# terminal for the commands below. Skip this step if Ollama already runs
# as a background service (e.g., the desktop app or a systemd unit).
ollama serve

# Verify the service is running before pulling models
curl http://localhost:11434/api/tags

# Autocomplete model: small, fast, optimized for fill-in-the-middle
ollama pull codellama:7b-code
# Chat model: larger, better reasoning for interactive tasks
ollama pull deepseek-coder-v2:16b
# Alternative chat model if hardware is constrained
ollama pull llama3.1:8b
# Embeddings model for @codebase semantic search
ollama pull nomic-embed-text

# Verify the models are available
ollama list
```

After these commands complete, Ollama should be serving models at http://localhost:11434. The codellama:7b-code model is specifically trained for code completion with fill-in-the-middle capability, making it well suited for autocomplete. The deepseek-coder-v2:16b model provides stronger reasoning for chat-based code generation and refactoring tasks. The nomic-embed-text model is a small embedding model used for the @codebase semantic search feature.
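Before wiring up the editor, it can help to confirm that every model the config will reference is actually installed. The following is a minimal sketch, assuming Node.js 18 or later (for the global fetch) and Ollama's /api/tags response shape, which lists installed models as { models: [{ name: ... }] }. The helper names findMissingModels and checkOllama are hypothetical, not part of Ollama or Continue.dev.

```javascript
// Hypothetical pre-flight check: which required models are missing from Ollama?
// Assumes Node.js 18+ (global fetch) and the /api/tags response shape:
// { models: [{ name: "codellama:7b-code", ... }, ...] }
const REQUIRED_MODELS = ["codellama:7b-code", "deepseek-coder-v2:16b", "nomic-embed-text"];

function findMissingModels(tagsResponse, required = REQUIRED_MODELS) {
  const installed = new Set((tagsResponse.models ?? []).map((m) => m.name));
  // Models pulled without an explicit tag are listed as "name:latest",
  // so accept either form.
  return required.filter(
    (name) => !installed.has(name) && !installed.has(`${name}:latest`)
  );
}

async function checkOllama(baseUrl = "http://localhost:11434") {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`);
  const missing = findMissingModels(await res.json());
  if (missing.length === 0) {
    console.log("All required models are installed.");
  } else {
    // Print the exact commands needed to fill the gaps.
    console.log(missing.map((m) => `ollama pull ${m}`).join("\n"));
  }
}
```

Call checkOllama().catch(console.error) from a small script; a connection error at this point usually means ollama serve is not running.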

Installation Across VS Code, JetBrains, and Neovim

VS Code Installation

In VS Code, open the Extensions panel (Ctrl+Shift+X), search for "Continue," and install the extension published by Continue.dev. On first launch, Continue.dev presents a setup wizard that walks through selecting a model provider. For a local-only setup, select Ollama as the provider. The extension creates its configuration file at ~/.continue/config.json, which is the central artifact for all subsequent customization.

The Continue.dev sidebar panel appears in the left activity bar, providing access to the chat interface. Tab autocomplete activates automatically once a tabAutocompleteModel is defined in the configuration.

JetBrains Installation (IntelliJ, WebStorm, PyCharm)

In any JetBrains IDE, navigate to Settings → Plugins → Marketplace and search for "Continue." Install the plugin and restart the IDE. The JetBrains integration reads from the same ~/.continue/config.json file as VS Code, meaning any configuration done for one editor applies to the other automatically.

JetBrains-specific quirks exist. The plugin's tool window appears in the right sidebar by default rather than the left. Indexing-heavy operations in JetBrains IDEs can occasionally conflict with Continue.dev's own file indexing for the @codebase context provider, leading to elevated memory usage during initial project opens. Disabling JetBrains' built-in AI Assistant plugin, if installed, avoids keybinding conflicts.

Neovim Installation

Neovim integration requires a plugin manager. The following is an example lazy.nvim spec. Verify the exact repo path, build command, and function names against the official Continue.dev Neovim documentation before using, as the plugin packaging may change between versions. Node.js and npm are required for the build step.

```lua
-- In your lazy.nvim plugin spec (e.g., lua/plugins/continue.lua)
-- WARNING: Verify this spec against https://docs.continue.dev/installation/neovim
-- The repo path, build command, command names, and function names below
-- are placeholders that must match the current official instructions.
-- The VS Code extension build path is NOT correct for Neovim; check the
-- repo for the proper build target.
return {
  "continuedev/continue",
  build = "make", -- Verify the correct build target in the repo root
  lazy = false,
  config = function()
    -- Open Continue.dev chat panel
    vim.keymap.set("n", "<leader>cc", function()
      vim.cmd("ContinueChat")
    end, { desc = "Continue Chat" })

    -- Send highlighted code to Continue.dev
    vim.keymap.set("v", "<leader>cs", function()
      vim.cmd("ContinueSendToChat")
    end, { desc = "Send to Continue Chat" })

    -- Trigger inline edit on selection
    vim.keymap.set("v", "<leader>ce", function()
      vim.cmd("ContinueInlineEdit")
    end, { desc = "Continue Inline Edit" })

    -- Accept autocomplete suggestion
    -- WARNING: Verify function names against the installed plugin's documentation.
    -- Run :function /continue# in Neovim to list available functions.
    vim.keymap.set("i", "<Tab>", function()
      if vim.fn["continue#has_suggestion"]() == 1 then
        return vim.fn["continue#accept_suggestion"]()
      else
        -- No suggestion available: fall back to inserting a normal Tab
        return vim.api.nvim_replace_termcodes("<Tab>", true, false, true)
      end
    end, { expr = true, desc = "Accept Continue suggestion or insert Tab" })
  end,
}
```

This configuration binds the core Continue.dev actions to leader-key combinations and wires Tab to accept autocomplete suggestions when they are available. The Neovim integration reads from the same ~/.continue/config.json as the other editors.

Unified Configuration: One Config, Every Editor

The single most powerful aspect of Continue.dev's architecture is that ~/.continue/config.json is shared across VS Code, JetBrains, and Neovim. Model, context provider, and slash command settings are shared; editor-specific behavior (keybindings, sidebar placement, plugin-specific commands) varies per editor. Because the file is plain text (JSON with optional comments), it is easy to version-control. Keeping it in a dotfiles repository means a developer's AI assistant configuration travels with them across machines. Do not commit config.json to a public repository if it contains API keys for cloud providers.

Note: On Windows, the config path resolves to %USERPROFILE%\.continue\config.json.

Note: Continue.dev uses a JSONC parser that supports // comments in config.json. If you process this file with standard JSON tools (e.g., JSON.parse in Node.js), strip comments first or use a JSONC-aware parser. The reference configs in this article use JSONC syntax for readability.
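If you want to inspect or lint config.json from your own Node.js scripts, the comments have to go before JSON.parse will accept the file. Below is a minimal sketch of a comment stripper, assuming the file is otherwise valid JSON (no trailing commas). stripJsonComments is a hypothetical helper; an established package such as jsonc-parser handles more edge cases and is the safer choice for anything beyond a quick script.

```javascript
// Minimal sketch: remove // line comments and /* */ block comments from JSONC
// so that JSON.parse accepts the result. Comment markers that appear inside
// string literals (e.g., "http://localhost:11434") are preserved.
function stripJsonComments(text) {
  let out = "";
  let inString = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    const next = text[i + 1];
    if (inString) {
      out += ch;
      if (ch === "\\") {
        // Copy the escaped character verbatim so \" does not end the string.
        out += next ?? "";
        i++;
      } else if (ch === '"') {
        inString = false;
      }
    } else if (ch === '"') {
      inString = true;
      out += ch;
    } else if (ch === "/" && next === "/") {
      // Line comment: skip to the end of the line but keep the newline.
      while (i < text.length && text[i] !== "\n") i++;
      if (i < text.length) out += "\n";
    } else if (ch === "/" && next === "*") {
      // Block comment: skip until the closing "*/".
      i += 2;
      while (i < text.length && !(text[i] === "*" && text[i + 1] === "/")) i++;
      i++; // move onto the closing "/"; the loop increment then skips it
    } else {
      out += ch;
    }
  }
  return out;
}
```

Typical usage: JSON.parse(stripJsonComments(fs.readFileSync(configPath, "utf8"))), where configPath points at your ~/.continue/config.json.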

Core Configuration: Connecting Continue.dev to Local Models

The config.json Anatomy

The configuration file has several top-level keys. models defines the chat models available in the sidebar. tabAutocompleteModel specifies the model used for inline ghost-text suggestions. contextProviders registers the context sources available via @ mentions. slashCommands defines built-in command shortcuts. customCommands allows teams to create their own prompt templates.

```jsonc
{
  "models": [
    {
      "title": "DeepSeek Coder V2",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b",
      "contextLength": 32768, // Capped below the model's 128k maximum to limit RAM usage; increase only if you have 64+ GB RAM
      "completionOptions": {
        "temperature": 0.3,
        "maxTokens": 2048
      },
      "requestOptions": {
        "timeout": 60000
      }
    },
    {
      "title": "Llama 3.1 8B",
      "provider": "ollama",
      "model": "llama3.1:8b",
      "contextLength": 32768, // Capped below the model's 128k maximum to limit RAM usage; increase only if you have 64+ GB RAM
      "completionOptions": {
        "temperature": 0.5,
        "maxTokens": 1024
      },
      "requestOptions": {
        "timeout": 60000
      }
    }
  ],
  "tabAutocompleteModel": {
    "title": "CodeLlama 7B",
    "provider": "ollama",
    "model": "codellama:7b-code",
    "contextLength": 4096,
    "completionOptions": {
      "maxTokens": 256,
      "temperature": 0.1,
      "stop": ["\n", "```"]
    }
  },
  "contextProviders": [
    { "name": "file" },
    { "name": "codebase" },
    { "name": "terminal" },
    { "name": "git" }
  ],
  "slashCommands": [
    { "name": "edit", "description": "Edit selected code" },
    { "name": "comment", "description": "Add comments to code" },
    { "name": "share", "description": "Export chat to markdown" }
  ]
}
```

This configuration gives the developer two chat models to switch between (DeepSeek Coder V2 for heavy reasoning tasks, Llama 3.1 for faster responses), CodeLlama for autocomplete, and a working set of context providers and slash commands. The requestOptions.timeout value of 60000 milliseconds accommodates the slower inference times typical of local model execution.
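A quick pre-flight check can catch the most common structural slips in this anatomy before the editor reports them. The sketch below is hypothetical: checkConfig is not part of Continue.dev (which performs its own validation), and it only encodes this article's conventions: title, provider, and model on every entry, a tabAutocompleteModel, and a 60-second timeout for local chat models. It expects an already-parsed object, so strip JSONC comments before calling JSON.parse.

```javascript
// Hypothetical pre-flight check mirroring the conventions used in this article.
// Returns a list of human-readable problems; an empty list means "looks OK".
function checkConfig(config) {
  const problems = [];
  const requireFields = (entry, label) => {
    for (const field of ["title", "provider", "model"]) {
      if (!entry[field]) problems.push(`${label} is missing "${field}"`);
    }
  };

  if (!Array.isArray(config.models) || config.models.length === 0) {
    problems.push('"models" should list at least one chat model');
  } else {
    config.models.forEach((m, i) => {
      requireFields(m, `models[${i}]`);
      // Local inference is slow; short timeouts cut off long generations.
      if (m.provider === "ollama" && (m.requestOptions?.timeout ?? 0) < 60000) {
        problems.push(`models[${i}] uses Ollama with a timeout under 60000 ms`);
      }
    });
  }

  if (!config.tabAutocompleteModel) {
    problems.push('"tabAutocompleteModel" is not set, so tab autocomplete will not activate');
  } else {
    requireFields(config.tabAutocompleteModel, "tabAutocompleteModel");
  }
  return problems;
}
```

Run against the configuration above, checkConfig should return an empty array; dropping requestOptions from a chat model or deleting tabAutocompleteModel would each add one entry.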

Configuring Tab Autocomplete

Autocomplete fires more often than any other assistant feature, so tuning it for local latency matters most. Getting it right for local models prevents the most common source of frustration: laggy or irrelevant suggestions.

```jsonc
{
  "tabAutocompleteOptions": {
    "debounceDelay": 500,
    "multilineCompletions": "auto",
    "maxPromptTokens": 1024,
    "disableInFiles": ["*.md", "*.txt"],
    "useCache": true
  },
  "tabAutocompleteModel": {
    "title": "CodeLlama 7B",
    "provider": "ollama",
    "model": "codellama:7b-code",
    "contextLength": 4096,
    "completionOptions": {
      "maxTokens": 256,
      "temperature": 0.1,
      // "\n" stops generation at the first line break, which limits suggestions
      // to one line; remove it if you want multiline completions
      "stop": ["\n", "```"]
    }
  }
}
```

The debounceDelay of 500 milliseconds prevents the model from being invoked on every keystroke, which matters when inference runs on local hardware. Disabling autocomplete for Markdown and text files avoids wasted computation where code suggestions add no value. Keeping maxTokens at 256 for autocomplete keeps suggestions short, so they arrive quickly. The low temperature of 0.1 makes completions more deterministic and predictable rather than creative but potentially wrong. Setting useCache to true lets Continue.dev reuse recent completions when the context has not changed.
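To see why a 500 ms debounceDelay saves work, the following sketch simulates which keystrokes would actually trigger a completion request. It illustrates debouncing in general, not Continue.dev's internal code; debouncedRequestTimes is a hypothetical name, and it assumes keystroke timestamps (in milliseconds) are sorted in ascending order.

```javascript
// Illustrative debounce simulation: a request fires only after the user has
// paused typing for delayMs. Each keystroke restarts the timer, so only the
// last keystroke of a burst produces a model call.
function debouncedRequestTimes(keystrokeTimesMs, delayMs) {
  const fires = [];
  for (let i = 0; i < keystrokeTimesMs.length; i++) {
    const t = keystrokeTimesMs[i];
    const next = keystrokeTimesMs[i + 1];
    // No further keystroke within delayMs: the timer survives and fires.
    if (next === undefined || next - t >= delayMs) {
      fires.push(t + delayMs);
    }
  }
  return fires;
}

// Four quick keystrokes, a pause, then two more: only two requests fire.
console.log(debouncedRequestTimes([0, 100, 200, 300, 1500, 1600], 500)); // [ 800, 2100 ]
```

Six keystrokes become two model calls. A shorter delay makes suggestions feel snappier but sends more requests to the local model; a longer one does the opposite.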

Adding Context Providers

Context providers are what elevate Continue.dev from a simple chat wrapper to a codebase-aware assistant. The @codebase provider uses embeddings to perform semantic search across the entire project, allowing the model to answer questions about code it has never seen in the current prompt window.

Caution: The @terminal context provider injects terminal output into prompts. Avoid using it in sessions where secrets, API keys, or credentials may appear in stdout.

```json
{
  "contextProviders": [
    {
      "name": "codebase",
      "params": {
        "nRetrieve": 15,
        "nFinal": 5,
        "useReranking": true
      }
    },
    { "name": "file" },
    { "name": "docs" },
    { "name": "terminal" },
    { "name": "git" }
  ],
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text"
  }
}
```

The embeddingsProvider block configures Continue.dev to use the nomic-embed-text model through Ollama for generating embeddings locally. No data leaves the machine. The nRetrieve parameter controls how many code chunks are initially retrieved from the vector index, while nFinal limits how many are actually sent to the chat model after reranking. This keeps prompt sizes manageable for models with limited context windows. Note: useReranking uses Continue.dev's built-in reranking logic. Verify against your installed version's documentation whether additional configuration is required.
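The interplay between nRetrieve and nFinal is easiest to see as a two-stage pipeline. Below is a minimal sketch, assuming chunk embeddings are already stored in memory as plain arrays and using cosine similarity for the first stage. The rerank callback is a hypothetical stand-in for a reranker; Continue.dev's built-in reranking logic may score chunks differently.

```javascript
// Two-stage retrieval sketch: vector search keeps the nRetrieve closest chunks,
// then an optional second scoring pass trims the list to nFinal for the prompt.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function retrieveChunks(queryEmbedding, chunks, { nRetrieve = 15, nFinal = 5, rerank } = {}) {
  // Stage 1: rank every indexed chunk by similarity and keep the top nRetrieve.
  const candidates = chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, nRetrieve);

  // Stage 2: optionally rescore only those candidates (hypothetical reranker),
  // then keep the nFinal chunks that will actually be sent to the chat model.
  const ranked = rerank
    ? candidates
        .map(({ chunk }) => ({ chunk, score: rerank(chunk) }))
        .sort((x, y) => y.score - x.score)
    : candidates;
  return ranked.slice(0, nFinal).map(({ chunk }) => chunk);
}
```

Note the design consequence: a chunk that misses the first nRetrieve cut can never reach the prompt, however highly the reranker would have scored it. Raising nRetrieve widens that net at the cost of more reranking work, while nFinal alone controls prompt size.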

Hands-On: Using Continue.dev with a JavaScript/React/Node.js Project

Inline Chat for Code Generation

Highlighting code or placing the cursor in a file and pressing Cmd+L (macOS) or Ctrl+L (Windows/Linux) opens the inline chat panel. (This is the default keybinding; check your editor's keybinding settings if it conflicts with an existing shortcut.) The model receives the selected code as context along with the developer's natural language prompt.

```jsx
// Prompt: "Create a debounced search input component using useState and useEffect"
// Generated output:
import { useState, useEffect } from 'react';

function DebouncedSearchInput({ onSearch, delay = 300, placeholder = "Search..." }) {
  const [query, setQuery] = useState('');
  const [debouncedQuery, setDebouncedQuery] = useState('');

  // Restart the timer on every keystroke; only the last value survives the delay
  useEffect(() => {
    const timer = setTimeout(() => {
      setDebouncedQuery(query);
    }, delay);
    return () => clearTimeout(timer);
  }, [query, delay]);

  // Call onSearch for all values including empty string
  // so callers can react to input being cleared.
  // Callers should memoize onSearch (e.g., with useCallback): an inline
  // function gets a new identity on every render and re-runs this effect.
  useEffect(() => {
    onSearch(debouncedQuery);
  }, [debouncedQuery, onSearch]);

  return (
    <input
      type="text"
      value={query}
      onChange={(e) => setQuery(e.target.value)}
      placeholder={placeholder}
    />
  );
}

export default DebouncedSearchInput;
```

The model produces a functional component using standard React hooks. Code quality scales with model size: larger models such as the 16B DeepSeek Coder V2 are more likely to get dependency-array details right (for example, flagging that onSearch should be memoized with useCallback) than smaller models such as the 8B Llama 3.1, which tends to miss them.

Inline Editing with /edit

Selecting a function and invoking /edit triggers a targeted refactoring workflow. Continue.dev shows a diff of proposed changes that can be accepted or rejected inline.

```javascript
// Before: Express route handler without proper async error handling
app.get('/api/users/:id', (req, res) => {
  db.query('SELECT * FROM users WHERE id = ?', [req.params.id])
    .then(user => {
      if (!user) {
        return res.status(404).json({ error: 'User not found' });
      }
      res.json(user);
    })
    .catch(err => {
      console.log(err);
      res.status(500).send('Server error');
    });
});

// After /edit prompt: "Refactor to async/await with proper error handling middleware pattern"
// Assumes db.query() resolves to a single user row or null; if your driver
// resolves to an array of rows, use its first element instead.
app.get('/api/users/:id', async (req, res, next) => {
  try {
    // Number() rejects values like "12abc", which parseInt() would accept as 12
    const id = Number(req.params.id);
    if (!Number.isInteger(id) || id <= 0) {
      return res.status(400).json({ error: 'Invalid user ID' });
    }
    const user = await db.query('SELECT * FROM users WHERE id = ?', [id]);
    if (!user) {
      return res.status(404).json({ error: 'User not found' });
    }
    res.json(user);
  } catch (err) {
    // Delegate to Express error-handling middleware
    next(err);
  }
});
```

The refactored handler uses async/await, validates the id parameter before querying, wraps the logic in try/catch, and delegates errors to Express error-handling middleware via next(err) rather than logging and sending a generic response. This kind of structural refactoring is more convenient with /edit than a separate chat window because the diff appears inline, right where you are reading the code.
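The refactored route only pays off if an error-handling middleware is registered to receive next(err). Here is a minimal sketch, assuming Express 4 or later, where error handlers are recognized by their four-argument signature. errorHandler is a hypothetical name, and reading err.status (a convention used by packages such as http-errors) is an assumption about how your code marks client errors.

```javascript
// Sketch of the error-handling middleware that next(err) hands off to.
// Express identifies error handlers by their four parameters, so `next`
// must stay in the signature even when it is rarely used.
function errorHandler(err, req, res, next) {
  // If the response has already started streaming, let Express's default
  // handler close the connection instead of writing a second response.
  if (res.headersSent) {
    return next(err);
  }
  console.error(err);
  const status = Number.isInteger(err.status) ? err.status : 500;
  // Hide internal details for server errors; pass client-error messages through.
  res.status(status).json({ error: status >= 500 ? "Internal server error" : err.message });
}
```

Register it after all routes with app.use(errorHandler) so that it runs only when a route calls next(err).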


Using Context Providers for Codebase-Aware Answers

Typing @codebase in the chat prompt triggers semantic search across the project. A prompt like "Using @codebase, explain how authentication middleware flows in this project" causes Continue.dev to embed the query, search the local vector index, retrieve the most relevant code chunks, and include them in the prompt to the chat model. Typing @file followed by a filename injects that specific file's contents into the prompt context.

Note that @codebase requires initial indexing on first use. Indexing a large project can take several minutes and spike CPU usage.

Custom Slash Commands for Team Workflows

Custom commands allow teams to standardize prompt patterns across developers.

```json
{
  "customCommands": [
    {
      "name": "review",
      "description": "Perform a code review on the selected code",
      "prompt": "Review the following code for bugs, security issues, performance problems, and adherence to best practices. Provide specific, actionable feedback with line references:\n{{{ input }}}"
    },
    {
      "name": "testgen",
      "description": "Generate Jest tests for the selected code",
      "prompt": "Generate comprehensive Jest unit tests for the following code. Include edge cases, error scenarios, and mock external dependencies where appropriate. Use describe/it blocks:\n{{{ input }}}"
    }
  ]
}
```

Continue.dev replaces the {{{ input }}} placeholder with the currently selected code or whatever is included in the chat. (Verify this syntax against the Continue.dev custom commands documentation for your installed version, as placeholder syntax may vary.) Teams can commit these commands to a shared repository, ensuring every developer uses consistent review criteria and test generation patterns.
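The substitution itself is simple, and a tiny sketch makes the mechanics concrete. This is an illustration only: renderPrompt is a hypothetical helper, and Continue.dev uses its own Handlebars-style templating rather than this regex.

```javascript
// Illustrative only: fill a {{{ input }}} placeholder (whitespace inside the
// braces optional) with the selected code.
function renderPrompt(template, input) {
  // A replacer function is used instead of a replacement string so that "$"
  // sequences in the selected code (e.g., "$&") are inserted literally.
  return template.replace(/\{\{\{\s*input\s*\}\}\}/g, () => input);
}

console.log(renderPrompt("Review the following code:\n{{{ input }}}", "const total = a + b;"));
```

The replacer-function detail matters for code: with a plain replacement string, JavaScript would interpret patterns like $& inside the selected code as special substitution tokens.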

Performance Optimization and Troubleshooting

Reducing Latency with Local Models

GPU offloading is the single largest factor in local inference speed. Ollama automatically uses available GPU resources when appropriate drivers are installed (CUDA for NVIDIA, ROCm for AMD, Metal for Apple Silicon). GPU layer offloading is configured via model parameters or Modelfile settings; consult the Ollama documentation for the current recommended approach. More layers on the GPU means faster inference at the cost of VRAM.

Model quantization directly affects the speed-quality trade-off. Q4_K_M quantization reduces model size and memory requirements significantly, typically with only a modest increase in perplexity relative to FP16 (see llama.cpp quantization comparisons for methodology). Q8 quantization preserves more quality but requires roughly twice the memory of Q4. A 16B Q8 model needs roughly 16-17 GB of VRAM for the weights alone, which exceeds most consumer GPUs; on most consumer hardware, Q4_K_M is the practical maximum for this model size. For autocomplete with a 7B model, Q4_K_M is sufficient.
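The memory figures above come from simple arithmetic: parameter count times bits per weight, divided by 8. The sketch below makes that explicit. The bits-per-weight values are approximations (Q8_0 stores 8-bit weights plus a per-block scale, about 8.5 bits; Q4_K_M averages roughly 4.8), and the estimate deliberately ignores the KV cache and runtime overhead, both of which grow with contextLength.

```javascript
// Back-of-the-envelope estimate of memory for model weights only.
// Excludes the KV cache and runtime overhead. Bits-per-weight values are
// approximate averages, not exact figures for any particular model file.
const APPROX_BITS_PER_WEIGHT = { FP16: 16, Q8_0: 8.5, Q4_K_M: 4.8 };

function estimateWeightsGB(paramsBillions, quant) {
  const bits = APPROX_BITS_PER_WEIGHT[quant];
  if (bits === undefined) throw new Error(`Unknown quantization: ${quant}`);
  // 1e9 parameters * bits / 8 bits-per-byte = gigabytes (1 GB = 1e9 bytes)
  return (paramsBillions * bits) / 8;
}

console.log(`16B Q8_0:   ~${estimateWeightsGB(16, "Q8_0").toFixed(1)} GB`);   // ~17.0 GB
console.log(`16B Q4_K_M: ~${estimateWeightsGB(16, "Q4_K_M").toFixed(1)} GB`); // ~9.6 GB
```

Real model files differ somewhat because some tensors are stored at higher precision, but the arithmetic is close enough to decide whether a model will fit in VRAM at all.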

Setting requestOptions.timeout to at least 60000 in the configuration prevents timeouts during longer generations on CPU-bound hardware.

Common Issues and Fixes

  • Connection errors: verify that ollama serve is running and that http://localhost:11434/api/tags returns a model list. A common mistake is pulling a model but not having the Ollama service active.
  • Slow autocomplete: this usually means the model is too large for the available hardware. Reduce contextLength and maxTokens in the tabAutocompleteModel block to compensate, or switch to a smaller model.
  • Truncated or nonsensical responses: these typically indicate a context window error, where the contextLength value in the config exceeds the model's actual trained context window. Match the config value to the model's specification.
  • Resource contention in JetBrains IDEs: disabling the built-in "AI Assistant" plugin and reducing the IDE's background indexing frequency can resolve it.
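The context-window mismatch is easy to check mechanically before it shows up as garbled output. The sketch below is hypothetical: findContextOverruns is not a Continue.dev feature, and the trainedLimits map is something you fill in yourself from each model's documentation (recent Ollama versions print a model's context length in the output of ollama show followed by the model name). It expects an already-parsed config object.

```javascript
// Hypothetical linter: flag chat or autocomplete entries whose contextLength
// exceeds the trained context window you record in `trainedLimits`
// (a map from Ollama model name to token count, supplied by you).
function findContextOverruns(config, trainedLimits) {
  const entries = [...(config.models ?? []), config.tabAutocompleteModel].filter(Boolean);
  return entries
    .filter((m) => {
      const limit = trainedLimits[m.model];
      // Skip entries without a contextLength or without a known limit.
      return m.contextLength !== undefined && limit !== undefined && m.contextLength > limit;
    })
    .map(
      (m) =>
        `${m.title ?? m.model}: contextLength ${m.contextLength} exceeds trained window ${trainedLimits[m.model]}`
    );
}
```

Models missing from the map are skipped rather than flagged, so an incomplete map produces no false alarms, only missed checks.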

Implementation Checklist: Your Complete Setup Reference

Setup Checklist

  • Install Ollama for your operating system
  • Run ollama serve to start the local model server
  • Verify the service is running: curl http://localhost:11434/api/tags
  • Pull codellama:7b-code for autocomplete
  • Pull deepseek-coder-v2:16b (or llama3.1:8b) for chat
  • Pull nomic-embed-text for local embeddings
  • Verify all models with ollama list
  • Install Continue.dev extension/plugin in your editor
  • Create or edit ~/.continue/config.json with the reference config below
  • Test tab autocomplete in a JavaScript file
  • Test inline chat with Cmd/Ctrl+L
  • Test @codebase context provider by asking a project-level question
  • Add custom slash commands for your team's workflow
  • Version-control your config.json in your dotfiles repository (omit API keys)

Complete Reference config.json

```jsonc
{
  "models": [
    {
      "title": "DeepSeek Coder V2",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b",
      "contextLength": 32768, // Capped below the model's 128k maximum to limit RAM usage; increase only if you have 64+ GB RAM
      "completionOptions": {
        "temperature": 0.3,
        "maxTokens": 2048
      },
      "requestOptions": {
        "timeout": 60000
      }
    },
    {
      "title": "Llama 3.1 8B",
      "provider": "ollama",
      "model": "llama3.1:8b",
      "contextLength": 32768, // Capped below the model's 128k maximum to limit RAM usage; increase only if you have 64+ GB RAM
      "completionOptions": {
        "temperature": 0.5,
        "maxTokens": 1024
      },
      "requestOptions": {
        "timeout": 60000
      }
    }
  ],
  "tabAutocompleteModel": {
    "title": "CodeLlama 7B",
    "provider": "ollama",
    "model": "codellama:7b-code",
    "contextLength": 4096,
    "completionOptions": {
      "maxTokens": 256,
      "temperature": 0.1,
      // "\n" stops generation at the first line break, which limits suggestions
      // to one line; remove it if you want multiline completions
      "stop": ["\n", "```"]
    }
  },
  "tabAutocompleteOptions": {
    "debounceDelay": 500,
    "multilineCompletions": "auto",
    "maxPromptTokens": 1024,
    "disableInFiles": ["*.md", "*.txt"],
    "useCache": true
  },
  "contextProviders": [
    {
      "name": "codebase",
      "params": {
        "nRetrieve": 15,
        "nFinal": 5,
        "useReranking": true
      }
    },
    { "name": "file" },
    { "name": "docs" },
    { "name": "terminal" },
    { "name": "git" }
  ],
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text"
  },
  "slashCommands": [
    { "name": "edit", "description": "Edit selected code" },
    { "name": "comment", "description": "Add comments to code" },
    { "name": "share", "description": "Export chat to markdown" }
  ],
  "customCommands": [
    {
      "name": "review",
      "description": "Perform a code review on the selected code",
      "prompt": "Review the following code for bugs, security issues, performance problems, and adherence to best practices. Provide specific, actionable feedback with line references:\n{{{ input }}}"
    },
    {
      "name": "testgen",
      "description": "Generate Jest tests for the selected code",
      "prompt": "Generate comprehensive Jest unit tests for the following code. Include edge cases, error scenarios, and mock external dependencies where appropriate. Use describe/it blocks:\n{{{ input }}}"
    }
  ]
}
```

This single file delivers a fully functional local AI coding assistant. Copy it to ~/.continue/config.json. Model, context provider, and slash command settings are shared. Editor-specific behavior (keybindings, sidebar placement) varies per editor.

What's Next: Cloud Hybrids, MCP, and the Continue.dev Roadmap

Continue.dev's configuration model makes hybrid setups straightforward. A common pattern is running a local 7B model for low-latency autocomplete while routing complex chat queries to a cloud model like Claude or GPT-4. To add a cloud model, insert one entry in the models array with the appropriate provider and API key. Do not commit config.json to version control if it contains API keys. Use environment variable substitution where supported.
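One way to keep keys out of the committed file is to add the cloud entry at install time from an environment variable. The sketch below assumes Continue.dev's "anthropic" provider and reads ANTHROPIC_API_KEY from the environment; withCloudModel is a hypothetical helper, and the model id is only an example, so check your provider's current model list.

```javascript
// Sketch: merge a cloud chat model into a local-only base config at install
// time, reading the API key from the environment instead of the committed file.
// The model id is an example; verify it against your provider's model list.
function withCloudModel(baseConfig, env) {
  const apiKey = env.ANTHROPIC_API_KEY;
  if (!apiKey) return baseConfig; // no key present: stay local-only
  return {
    ...baseConfig,
    models: [
      ...(baseConfig.models ?? []),
      {
        title: "Claude (cloud)",
        provider: "anthropic",
        model: "claude-3-5-sonnet-latest",
        apiKey,
      },
    ],
  };
}
```

A small install script could read the committed base config (stripped of comments), call withCloudModel(config, process.env), and write the result to ~/.continue/config.json with fs.writeFileSync. The generated file then contains the key, so it belongs outside version control; only the key-free base config goes in the dotfiles repository.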

Model Context Protocol (MCP) support is an active area of development, enabling Continue.dev to connect to external tools and data sources beyond the editor. This opens possibilities for integrating with databases, documentation systems, and CI/CD pipelines directly from the chat interface.


Continue.dev remains fully open source under the Apache 2.0 license. The project's GitHub repository and Discord server are the primary venues for community contributions and support. For developers who want AI-assisted coding without compromising on data sovereignty, Continue.dev stands out among alternatives like Tabby, Cody, and Codeium's self-hosted tier as an actively maintained, local-first option.

๐Ÿ‘ SitePoint Team
SitePoint Team

Sharing our passion for building incredible internet things.

© 2000 – 2026 SitePoint Pty. Ltd.