Performance
| Model | Task · Output · Architecture | Params | Throughput | Latency |
| Alibaba-NLP/gte-Qwen2-7B-instruct | Encode · Dense · Qwen2 | 7.6B | 3.5K tok/s | 845.9ms |
| GritLM/GritLM-7B | Encode · Dense · Mistral | 7.2B | 1.4K tok/s | 2.1s |
| Linq-AI-Research/Linq-Embed-Mistral | Encode · Dense · Mistral | 7.1B | 2.9K tok/s | 817.9ms |
| Salesforce/SFR-Embedding-2_R | Encode · Dense · Mistral | 7.1B | 2.9K tok/s | 682.5ms |
| Salesforce/SFR-Embedding-Mistral | Encode · Dense · Mistral | 7.1B | 3.0K tok/s | 887.5ms |
| intfloat/e5-mistral-7b-instruct | Encode · Dense · Mistral | 7.1B | 3.0K tok/s | 915.3ms |
| vidore/colqwen2.5-v0.2 | Encode · Multi-Vec · Qwen2 | 7.0B | 7.6 mpix/s | 1.9s |
| nvidia/llama-nemoretriever-colembed-3b-v1 | Encode · Multi-Vec · llama_nemoretrievercolembed | 4.4B | 0.7 img/s | 6.2s |
| Qwen/Qwen3-Reranker-4B | Score · Score · Qwen3 | 4.0B | | |
| Qwen/Qwen3-Embedding-4B | Encode · Dense · Qwen3 | 4.0B | 5.7K tok/s | 464.5ms |
| vidore/colpali-v1.3-hf | Encode · Multi-Vec · PaliGemma | 3.0B | 23.0 mpix/s | 581.7ms |
| Qwen/Qwen3-VL-Embedding-2B | Encode · Dense · qwen3_vl | 2.1B | 494 tok/s | 35.9ms |
| Qwen/Qwen3-VL-Reranker-2B | Score · Score · qwen3_vl | 2.1B | | |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Encode · Dense · Qwen2 | 1.8B | 12.3K tok/s | 261.1ms |
| mixedbread-ai/mxbai-rerank-large-v2 | Score · Score · Qwen2 | 1.5B | 2.3K tok/s | 1.1s |
| NovaSearch/stella_en_1.5B_v5 | Encode · Dense · Qwen2 | 1.5B | 12.8K tok/s | 265.9ms |
| zai-org/GLM-OCR | Extract · Text (Markdown) · GLM-OCR | 1.3B | | |
| lightonai/LightOnOCR-2-1B | Extract · Text (Markdown) · LightOnOCR | 1.0B | | |
| laion/CLIP-ViT-H-14-laion2B-s32B-b79K | Encode · Dense · CLIP | 986M | 380 tok/s | 461.1ms |
| PaddlePaddle/PaddleOCR-VL-1.5 | Extract · Text (Markdown) · PaddleOCR-VL | 959M | | |
| google/siglip-so400m-patch14-384 | Encode · Dense · SigLIP | 878M | 399 tok/s | 452.5ms |
| google/siglip-so400m-patch14-224 | Encode · Dense · SigLIP | 877M | 400 tok/s | 394.1ms |
| Qwen/Qwen3-Embedding-0.6B | Encode · Dense · Qwen3 | 596M | 20.6K tok/s | 156.9ms |
| Qwen/Qwen3-Reranker-0.6B | Score · Score · Qwen3 | 596M | | |
| BAAI/bge-m3 | Encode · Dense/Sparse/Multi-Vec · XLM-RoBERTa | 568M | 33.2K tok/s | 93.4ms |
| BAAI/bge-m3 | Score · Dense/Sparse/Multi-Vec · XLM-RoBERTa | 568M | 2.8K tok/s | 56.8ms |
| BAAI/bge-reranker-v2-m3 | Score · Score · XLM-RoBERTa | 568M | 30.0K tok/s | 93.5ms |
| BAAI/bge-reranker-large | Score · Score · XLM-RoBERTa | 560M | 6.6K tok/s | 41.4ms |
| intfloat/multilingual-e5-large | Encode · Dense · XLM-RoBERTa | 560M | 29.8K tok/s | 108.6ms |
| intfloat/multilingual-e5-large-instruct | Encode · Dense · XLM-RoBERTa | 560M | 29.4K tok/s | 106.9ms |
| jinaai/jina-colbert-v2 | Encode · Multi-Vec · XLM-RoBERTa | 559M | 28.5K tok/s | 105.7ms |
| jinaai/jina-colbert-v2 | Score · Multi-Vec · XLM-RoBERTa | 559M | 1.4K tok/s | 226.1ms |
| mixedbread-ai/mxbai-rerank-base-v2 | Score · Score · Qwen2 | 494M | 6.0K tok/s | 454.0ms |
| nomic-ai/nomic-embed-text-v2-moe | Encode · Dense · NomicBERT | 475M | 13.0K tok/s | 149.6ms |
| numind/NuNER_Zero | Extract · Entities · DeBERTa | 449M | | |
| NovaSearch/stella_en_400M_v5 | Encode · Dense · ModernBERT | 435M | 27.1K tok/s | 115.7ms |
| EmergentMethods/gliner_large_news-v2.1 | Extract · Entities · DeBERTa | 435M | | |
| Ihor/gliner-biomed-large-v1.0 | Extract · Entities · DeBERTa | 435M | | |
| jackboyla/glirel-large-v0 | Extract · Relations · DeBERTa | 435M | | |
| urchade/gliner_large-v2.1 | Extract · Entities · DeBERTa | 435M | | |
| urchade/gliner_multi_pii-v1 | Extract · Entities · DeBERTa | 435M | | |
| openai/clip-vit-large-patch14 | Encode · Dense · CLIP | 428M | 977 tok/s | 228.0ms |
| google/siglip2-base-patch16-224 | Encode · Dense · SigLIP | 375M | 1.6K tok/s | 68.5ms |
| mixedbread-ai/mxbai-colbert-large-v1 | Encode · Multi-Vec · BERT | 335M | 43.3K tok/s | 74.9ms |
| mixedbread-ai/mxbai-colbert-large-v1 | Score · Multi-Vec · BERT | 335M | 4.0K tok/s | 45.6ms |
| intfloat/e5-large-v2 | Encode · Dense · BERT | 335M | 33.2K tok/s | 86.6ms |
| Alibaba-NLP/gte-multilingual-base | Encode · Dense · ModernBERT | 305M | 55.1K tok/s | 63.1ms |
| Snowflake/snowflake-arctic-embed-m-v2.0 | Encode · Dense · gte | 305M | | |
| google/embeddinggemma-300m | Encode · Dense · Gemma 3 | 303M | 79.6K tok/s | 55.7ms |
| urchade/gliner_multi-v2.1 | Extract · Entities · DeBERTa | 289M | | |
| jinaai/jina-reranker-v2-base-multilingual | Score · Score · XLM-RoBERTa | 278M | 8.3K tok/s | 32.0ms |
| BAAI/bge-reranker-base | Score · Score · XLM-RoBERTa | 278M | 5.0K tok/s | 33.2ms |
| mynkchaudhry/Florence-2-FT-DocVQA | Extract · text_regions · Florence-2 | 271M | | |
| IDEA-Research/grounding-dino-base | Extract · Bounding Boxes · Swin | 233M | 0.8 mpix/s | 785.8ms |
| microsoft/Florence-2-base | Extract · text_regions · Florence-2 | 232M | | |
| fastino/gliner2-base-v1 | Extract · Entities · extractor | 208M | | |
| urchade/gliner_medium-v2.1 | Extract · Entities · DeBERTa | 195M | | |
| IDEA-Research/grounding-dino-tiny | Extract · Bounding Boxes · Swin | 172M | 0.9 mpix/s | 532.6ms |
| google/owlv2-base-patch16-ensemble | Extract · Bounding Boxes · CLIP | 155M | 1.0 mpix/s | 954.6ms |
| laion/CLIP-ViT-B-32-laion2B-s34B-b79K | Encode · Dense · CLIP | 151M | 1.0K tok/s | 219.4ms |
| openai/clip-vit-base-patch32 | Encode · Dense · CLIP | 151M | 958 tok/s | 234.0ms |
| Alibaba-NLP/gte-reranker-modernbert-base | Score · Score · ModernBERT | 150M | 6.2K tok/s | 41.9ms |
| lightonai/GTE-ModernColBERT-v1 | Encode · Multi-Vec · ModernBERT | 149M | 28.0K tok/s | 103.9ms |
| lightonai/GTE-ModernColBERT-v1 | Score · Multi-Vec · ModernBERT | 149M | 231 tok/s | 313.4ms |
| lightonai/Reason-ModernColBERT | Encode · Multi-Vec · ModernBERT | 149M | 33.0K tok/s | 82.2ms |
| lightonai/Reason-ModernColBERT | Score · Multi-Vec · ModernBERT | 149M | | |
| Alibaba-NLP/gte-modernbert-base | Encode · Dense · ModernBERT | 149M | | |
| ibm-granite/granite-embedding-english-r2 | Encode · Dense · ModernBERT | 149M | | |
| opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte | Encode · Sparse · ModernBERT | 137M | 34.2K tok/s | 93.7ms |
| opensearch-project/opensearch-neural-sparse-encoding-v1 | Encode · Sparse · BERT | 133M | 48.7K tok/s | 69.0ms |
| naver-clova-ix/donut-base-finetuned-cord-v2 | Extract · text_regions · Encoder-Decoder | 110M | | |
| naver-clova-ix/donut-base-finetuned-docvqa | Extract · text_regions · Encoder-Decoder | 110M | | |
| naver/splade-cocondenser-selfdistil | Encode · Sparse · BERT | 110M | 40.0K tok/s | 72.4ms |
| naver/splade-v3 | Encode · Sparse · BERT | 110M | 29.6K tok/s | 83.7ms |
| numind/NuNER_Zero-span | Extract · Entities · DeBERTa | 110M | | |
| prithivida/Splade_PP_en_v2 | Encode · Sparse · BERT | 110M | 57.5K tok/s | 55.4ms |
| colbert-ir/colbertv2.0 | Encode · Multi-Vec · BERT | 110M | 43.0K tok/s | 65.7ms |
| colbert-ir/colbertv2.0 | Score · Multi-Vec · BERT | 110M | 3.8K tok/s | 51.4ms |
| intfloat/e5-base-v2 | Encode · Dense · BERT | 109M | 53.2K tok/s | 57.9ms |
| | Extract · Parsed Document · Docling | 80M | | |
| opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill | Encode · Sparse · DistilBERT | 67M | 49.1K tok/s | 63.3ms |
| opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill | Encode · Sparse · DistilBERT | 67M | 50.1K tok/s | 60.7ms |
| opensearch-project/opensearch-neural-sparse-encoding-v2-distill | Encode · Sparse · DistilBERT | 67M | 44.2K tok/s | 63.3ms |
| urchade/gliner_small-v2.1 | Extract · Entities · DeBERTa | 60M | | |
| ibm-granite/granite-embedding-small-english-r2 | Encode · Dense · ModernBERT | 48M | | |
| answerdotai/answerai-colbert-small-v1 | Encode · Multi-Vec · BERT | 33M | 59.1K tok/s | 47.9ms |
| answerdotai/answerai-colbert-small-v1 | Score · Multi-Vec · BERT | 33M | 1.7K tok/s | 121.7ms |
| cross-encoder/ms-marco-MiniLM-L-12-v2 | Score · Score · BERT | 33M | 8.2K tok/s | 31.7ms |
| intfloat/e5-small-v2 | Encode · Dense · BERT | 33M | 58.3K tok/s | 49.7ms |
| mixedbread-ai/mxbai-edge-colbert-v0-32m | Encode · Multi-Vec · ModernBERT | 32M | 45.9K tok/s | 59.7ms |
| mixedbread-ai/mxbai-edge-colbert-v0-32m | Score · Multi-Vec · ModernBERT | 32M | | |
| ibm-granite/granite-embedding-30m-sparse | Encode · Sparse · RoBERTa | 30M | 31.9K tok/s | 105.2ms |
| opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini | Encode · Sparse · BERT | 23M | 51.1K tok/s | 54.5ms |
| cross-encoder/ms-marco-MiniLM-L-6-v2 | Score · Score · BERT | 23M | 52.4K tok/s | 45.1ms |
| sentence-transformers/all-MiniLM-L6-v2 | Encode · Dense · BERT | 23M | 55.3K tok/s | 53.3ms |
| rasyosef/splade-mini | Encode · Sparse · BERT | 11M | 56.3K tok/s | 56.0ms |
| knowledgator/gliner-bi-base-v2.0 | Extract · Entities | | | |
| knowledgator/modern-gliner-bi-base-v1.0 | Extract · Entities | | | |

Self-hosted inference for search & document processing

Cut API costs by 50x, boost quality with 85+ SOTA models, and keep your data in your own cloud.

GitHub 2.0K

Contact us

Tell us about your use case and we'll get back to you shortly.