OneInfer: Easily Deploy Inference Models Locally (DeepSeek, Llama, Qwen)
OneInfer: All-in-One Model Inference Tool
https://github.com/derekwin/OneInfer
With the rapid development of machine learning and deep learning, model management and deployment have become important and complex challenges. To simplify this process, we introduce OneInfer, an open-source command-line tool that helps users efficiently manage and serve various types of machine learning models. It supports both local and remote model management and can serve models through different inference backends. In this article, we introduce OneInfer's features and advantages and show how to use it.
Features of OneInfer
1. Support for Multiple Model Platforms
One of the core advantages of OneInfer is its support for multiple model platforms. Users can download pre-trained models from platforms such as Hugging Face and ModelScope rather than being locked into a single source, and can easily switch between platforms to meet different needs. In contrast, Ollama is limited to its own platform.
2. Support for Multiple Inference Backends
OneInfer is not limited to language models; it is designed to work with a variety of inference backends. Whether you are serving language models, visual models, or other non-LLM models, OneInfer can handle them, which extends its reach well beyond traditional natural language processing tasks. For example, support for tasks such as image classification and object recognition is planned, giving users more options and freedom.
3. Convenient Local Deployment
OneInfer aims to provide a convenient, out-of-the-box experience. Users can directly download models from Hugging Face or ModelScope and serve them without complex configuration or compilation. This makes model management and deployment much easier, especially for users who need to experiment and iterate quickly.
4. Supported Platforms
OneInfer supports both Linux and macOS platforms, enabling development, building, and deployment on both operating systems. Whether you are developing on Linux or managing models on macOS, OneInfer provides a consistent experience, helping you efficiently manage and deploy models.
Using OneInfer
OneInfer is easy to use via the command line. Here are some common operations:
1. Add a Model
OneInfer supports adding models from multiple sources, including Hugging Face, ModelScope, and local files. To add a model to OneInfer, simply run the following command:
oneinfer add <model_repo> <platform_name> <file_name>
For example, to download a model from ModelScope:
oneinfer add unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF modelscope DeepSeek-R1-Distill-Qwen-32B-Q5_K_M.gguf
To download a model from Hugging Face:
oneinfer add <repo_id> huggingface <file_name>
To add a local model:
oneinfer add localmodelname local
# Then input the file path
./test/fakemodel.bin
2. List Added Models
To list all the models that have been added to OneInfer, use the following command:
oneinfer ls
3. Run a Model
You can start a specific model by running oneinfer run with the model name:
oneinfer run modelname [-p (default 8080)] [-h (default 127.0.0.1)]
For example:
oneinfer run -m DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf
This sends a request to the OneInfer server, which then starts serving the specified model.
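Once a model is being served, you can talk to it over HTTP. The exact API depends on the inference backend; the sketch below is hypothetical and assumes a llama.cpp-style backend that exposes an OpenAI-compatible /v1/chat/completions endpoint on the default host and port shown above. Check your backend's documentation for the actual route.
# Hypothetical request: assumes an OpenAI-compatible chat endpoint
# (as llama.cpp's built-in server provides) at the default 127.0.0.1:8080
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'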
4. View Running Model Status
To view the status of all running models, use:
oneinfer ps
This will list the currently running models along with their status.
5. Stop a Model or Server
To stop a running model, use:
oneinfer stop <model_uid>
To stop the entire OneInfer server:
oneinfer stop serve
This will stop the server and all running models.
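Putting it all together, a typical session using the commands covered above looks like this (model names are illustrative; substitute your own):
# Download a GGUF model from ModelScope
oneinfer add unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF modelscope DeepSeek-R1-Distill-Qwen-32B-Q5_K_M.gguf
# Verify it was added
oneinfer ls
# Serve it on the default host and port
oneinfer run -m DeepSeek-R1-Distill-Qwen-32B-Q5_K_M.gguf
# Check its status and note its UID
oneinfer ps
# Stop it when finished, using the UID from oneinfer ps
oneinfer stop <model_uid>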
Difference from Ollama
OneInfer offers several significant advantages over Ollama:
- Broader model platform support: OneInfer supports downloading models from platforms like Hugging Face and ModelScope, while Ollama is limited to its own platform.
- Support for more inference backends: OneInfer supports, and will continue to add, a variety of inference backends, covering not only language models but also visual models and other non-LLM models, further expanding its use cases.
These differences make OneInfer more flexible and scalable, offering users more choices and freedom.
Future Development Plans
OneInfer will continue to add more features in the future, including:
- Supporting additional inference backends.
- Expanding support for more types of models.
- Providing ready-to-use packaged applications, allowing users to download and use it immediately without the need for compilation.
Conclusion
OneInfer is a powerful tool designed to simplify model management and deployment. With its rich features and flexible platform and inference backend support, OneInfer offers users a more scalable and customizable experience. If you need an easy-to-use, efficient, and flexible tool to manage and run machine learning models, OneInfer is definitely worth trying.
Interested developers can visit our GitHub repository to try it out and contribute: https://github.com/derekwin/OneInfer
We welcome everyone to use OneInfer and enjoy the convenience of managing machine learning models!