# ModelGate: Secure and Unified API Gateway for Local LLM Deployment
As local large language model (LLM) deployment becomes increasingly common, simply exposing inference services like Ollama or vLLM to external users introduces serious security, quota, and operational challenges. What developers need is not just a model server, but a secure, unified, and OpenAI-compatible gateway layer.
## The Challenges of Direct Local Model Exposure
Open-source tools such as Ollama, vLLM, and llama.cpp have significantly lowered the barrier for running large models locally. However, productionizing these services reveals several structural pain points.
### Security Risks
Directly exposing an inference service to the public internet means:
- Anyone can access your model endpoint.
- There is no built-in access control.
- Usage cannot be traced or audited reliably.
This is unacceptable in enterprise or multi-user environments.
### Multi-Tenant Management Complexity
In real-world deployments, you often need:
- Separate API Keys for different teams or applications.
- Distinct token quotas and rate limits.
- Usage tracking and cost allocation.
Implementing these directly inside model servers is complex, error-prone, and hard to maintain.
### Operational Overhead
Without a unified gateway:
- Each service must implement authentication and rate limiting independently.
- No centralized management interface exists.
- Observability and monitoring are fragmented.
Operational costs quickly escalate.
### OpenAI API Compatibility
Many existing applications are already built on top of the OpenAI API.
To migrate them to local models, or to build hybrid deployments (local + OpenAI cloud), developers need a fully compatible API layer that enables zero-cost switching.
## The Solution: ModelGate
ModelGate is an OpenAI-compatible API gateway purpose-built for local LLM deployment.
Developed in Go, ModelGate provides a secure and controlled way to expose local or hybrid model services while maintaining full compatibility with the OpenAI API format.
## Core Capabilities
### Security First
- API Key authentication (SHA256 hashed storage)
- Per-key IP whitelist support
- HTTPS deployment support
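
The SHA256 hashed key storage mentioned above follows a standard pattern: the gateway never stores plaintext API keys, only their digests, and hashes each presented key before lookup. Here is a minimal illustrative sketch in Python (ModelGate itself is written in Go; the store and function names are hypothetical, not its actual implementation):

```python
import hashlib
import hmac

def hash_key(api_key: str) -> str:
    # Store only the SHA-256 digest of the key, never the plaintext.
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()

# Hypothetical key store: digest -> key metadata (owner, quota, ...)
key_store = {
    hash_key("your-modelgate-key"): {"owner": "user1", "quota": 1_000_000},
}

def authenticate(presented_key: str):
    digest = hash_key(presented_key)
    for stored_digest, meta in key_store.items():
        # compare_digest avoids timing side channels in the comparison
        if hmac.compare_digest(stored_digest, digest):
            return meta
    return None
```

A leaked database then exposes only digests, which cannot be replayed as API keys.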
### Fine-Grained Quota & Rate Control
- Token-based quota allocation per user
- Redis-based rate limiting (RPM / Burst)
- Full token consumption tracking per request
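
The RPM/Burst rate limiting above is a token-bucket scheme: tokens refill at the per-minute rate, and the burst value caps how many requests can land at once. ModelGate keeps this state in Redis; the sketch below shows the same idea in-memory, with illustrative names rather than ModelGate's actual code:

```python
import time

class TokenBucket:
    """In-memory sketch of RPM/Burst limiting (a real gateway keeps this state in Redis)."""

    def __init__(self, rpm, burst, now=None):
        self.rate = rpm / 60.0          # tokens refilled per second
        self.burst = burst              # maximum bucket size
        self.tokens = float(burst)      # start full: a fresh key may burst
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

For example, a key configured with `rpm=60, burst=2` may send two requests back-to-back, is then rejected, and regains one request per second.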
### Multi-Backend Support
ModelGate supports multiple inference backends:
- Ollama (local inference)
- vLLM (high-performance inference)
- llama.cpp (lightweight runtime)
- OpenAI (hybrid cloud deployment)
- API3 and other third-party APIs
This allows seamless hybrid model strategies.
### Zero-Cost Migration from OpenAI
Existing applications only need to change the `base_url`:

```python
# Original OpenAI usage
client = OpenAI(
    api_key="xxx",
    base_url="https://api.openai.com/v1"
)

# Switch to ModelGate
client = OpenAI(
    api_key="your-modelgate-key",
    base_url="http://your-server:8080/v1"
)
```
No other changes required.
### Flexible Management Interfaces
* Web Admin UI for visual management
* CLI tools for automation and scripting
* RESTful Admin APIs for deep integration
---
## Architecture Overview
```text
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│    Client    │─────▶│  ModelGate   │─────▶│    Ollama    │
│   (OpenAI    │      │  (Gateway)   │      │    / vLLM    │
│     SDK)     │      │              │      │  / llama.cpp │
└──────────────┘      └──────┬───────┘      └──────────────┘
                             │
            ┌────────────────┼────────────────┐
            ▼                ▼                ▼
      ┌──────────┐     ┌──────────┐     ┌──────────┐
      │  SQLite  │     │  Redis   │     │  Admin   │
      │  (Data)  │     │  (Rate)  │     │  UI/API  │
      └──────────┘     └──────────┘     └──────────┘
```
### Adapter Pattern Design
ModelGate adopts an adapter pattern:
* Each backend (Ollama, vLLM, etc.) has its own adapter.
* Core gateway logic is decoupled from backend implementations.
* New model backends can be added easily.
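
The adapter pattern described above can be sketched as follows. This is a simplified Python illustration rather than ModelGate's actual Go code, and the class and method names are hypothetical: each adapter translates a common chat interface into its backend's native API, so the gateway core never depends on a concrete backend.

```python
from abc import ABC, abstractmethod

class BackendAdapter(ABC):
    """Common interface the gateway core depends on."""
    @abstractmethod
    def chat(self, model, messages):
        ...

class OllamaAdapter(BackendAdapter):
    def __init__(self, base_url):
        self.base_url = base_url
    def chat(self, model, messages):
        # A real adapter would POST to Ollama's HTTP API; stubbed here.
        return {"backend": "ollama", "model": model}

class VLLMAdapter(BackendAdapter):
    def __init__(self, base_url):
        self.base_url = base_url
    def chat(self, model, messages):
        # A real adapter would call vLLM's OpenAI-compatible endpoint; stubbed here.
        return {"backend": "vllm", "model": model}

# The gateway selects an adapter by name; adding a backend means adding one class.
ADAPTERS = {
    "ollama": OllamaAdapter("http://localhost:11434"),
    "vllm": VLLMAdapter("http://localhost:8000"),
}

def route(backend, model, messages):
    return ADAPTERS[backend].chat(model, messages)
```

Because core logic only sees `BackendAdapter`, registering a new backend does not touch authentication, quota, or rate-limiting code.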
---
## Quick Start
### One-Click Deployment
```bash
# Recommended: Docker Compose
docker-compose up -d

# Or manual build
make build
./modelgate
```
### Configuration
Edit `configs/config.yaml`:
```yaml
server:
  port: 8080

admin:
  api_key: "your-admin-key"

adapters:
  ollama:
    base_url: http://localhost:11434
  vllm:
    base_url: http://localhost:8000
```
### Create a User API Key
```bash
./modelgate-cli key create -n "user1" -q 1000000 -r 60
```
### Send a Request
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer your-user-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3:8b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
## Real-World Use Cases
### 1. Enterprise Internal Model Governance

**Background:** Multiple AI teams need access to internally deployed models.

**With ModelGate:**
- Independent API Keys per team
- QPS / daily call / token quotas
- Internal IP restrictions
- Real-time monitoring (usage, latency, failures)
- Internal cost allocation

**Value:** Enables enterprise-grade governance and prevents resource abuse.
### 2. SaaS Platform with Hybrid Deployment

**Background:** Tiered service model:
- Free users → local models
- Paid users → GPT-4 or cloud models

**With ModelGate:**
- Identity-based routing
- Multi-model load balancing
- Unified billing system
- Unified monitoring
- Consistent API abstraction

**Value:** Achieves hybrid cloud deployment with an optimized cost-performance balance.
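
At its core, the identity-based routing in this scenario is a dispatch table keyed by user tier. A minimal sketch (the tier names, backends, and function are illustrative, not ModelGate configuration):

```python
# Hypothetical tier -> backend mapping for the free/paid split described above.
TIER_ROUTES = {
    "free": {"backend": "ollama", "model": "qwen3:8b"},
    "paid": {"backend": "openai", "model": "gpt-4"},
}

def route_for(user):
    # Unknown or missing tiers fall back to the local model.
    return TIER_ROUTES.get(user.get("tier"), TIER_ROUTES["free"])
```

Because both backends sit behind the same OpenAI-compatible API, the client code stays identical regardless of which route is chosen.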
### 3. Productizing a Vertical Model API

**Background:** A team trains a vertical model (medical, legal, financial) and wants to commercialize it.

**With ModelGate:**
- Full authentication and access control
- OpenAI-compatible API packaging
- Token-based or subscription billing
- Usage analytics and reporting
- Admin dashboard
- Rate limiting and traffic protection

**Value:** Transforms internal models into a scalable, monetizable API platform.
### 4. Local Multi-Instance OpenClaw Scheduling

**Background:** A researcher runs multiple OpenClaw instances and wants isolated quotas and scheduling control.

**With ModelGate:**
- Separate API Keys per instance
- Load balancing across local models
- Resource monitoring per instance
- Strategy-based routing
- Per-instance rate limits
- Behavioral usage analysis

**Value:** Enables controlled multi-agent experimentation in local environments.
## Comparison
| Feature | ModelGate | OpenAI API | Nginx |
|---|---|---|---|
| Multi-Backend Support | ✅ | ❌ | ❌ |
| API Key Management | ✅ | ✅ | ❌ |
| Token Quota | ✅ | ❌ | ❌ |
| Rate Limiting | ✅ | ✅ | Basic |
| Usage Statistics | ✅ | ✅ | ❌ |
| Web Admin UI | ✅ | ❌ | ❌ |
| Deployment Complexity | Low | None | Medium |
| Open Source | ✅ | ❌ | ✅ |
## Roadmap
- Plugin system
- Monitoring and alerting
- Distributed deployment
- More backend integrations
## Conclusion
ModelGate addresses the security and governance challenges of local LLM deployment.
Instead of forcing developers to reinvent authentication, quota management, and rate limiting for each model service, ModelGate provides a unified and production-ready gateway layer.
Developers can now focus on building models and applications while ModelGate handles the infrastructure.
**Open Source Repository:** https://github.com/derekwin/ModelGate

Stars and contributions are welcome.