The MiniCPM edge model series is a world-leading, lightweight, high-performance LLM family. Since its release in February 2024, it has been widely tested and acclaimed by the global open-source community for its "achieving more with less" efficiency and outstanding on-device performance, repeatedly topping the GitHub and Hugging Face trending charts and becoming one of the most popular LLMs on Hugging Face in 2024. MiniCPM has partnered with industry-leading companies, emerging as an indispensable player in driving innovation across sectors such as AI PCs, AI phones, smart cockpits, and embodied robots.
Unbelievably Strong for a 4B Edge Model on Your Device!
ChatGPT-level Basic Performance
Surpassing GPT-3.5, Qwen2-7B, GLM4-9B
New Architecture, a New Benchmark for LLM Knowledge Density
Light! Fast! On-Device Friendly
Only 2 GB of memory after quantization
Versatile and Sharp as a Swiss Army Knife
Surpassing Kimi! Infinite Long Text
32K, 128K, 256K, 512K... Unlimited Context Expansion
GPT-4o-level Function Calling
Surpassing GPT-3.5 and GLM4-9B, close to GPT-4o
Superior RAG Suite
Number One in Chinese Retrieval; Generation Results Surpassing Llama3-8B
Leapfrogging Global Benchmark Models
Surpassing Mistral-7B, Llama2-13B, Gemma-7B-it, ChatGLM3-6B and other small open-source models.
High Efficiency and Low Cost
Supports CPU inference and fine-tuning on consumer GPUs
Inference speed up to 33 tokens/s
Inference cost as low as 1 $ = 12,500,000 tokens
*Inference speed is the measured performance on an Intel Core Ultra 9 processor
*Inference cost calculation: Snapdragon 855 chip (costing about $82) running at 7.5 tokens/s
Smaller Size, Everywhere Scenarios
Half the Size, Performance Surpassing Llama2-13B
Inference speed of 25 tokens/s, about 25 times human speaking speed
60% reduction in inference cost: 1 $ = 30,300,000 tokens
*The Apple A17 Pro costs about $130. Running with Metal, the maximum speed is 25 tokens/s.
Assuming the chip is used for 5 years, the inference cost is (25 × 3600 × 24 × 365 × 5) / 130 ≈ 30.3 million tokens per dollar.
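The amortized-cost arithmetic above can be checked with a short script (a sketch; the chip price, token speed, and 5-year lifetime are the assumptions quoted in the text):

```python
def tokens_per_dollar(tokens_per_sec: float, chip_cost_usd: float, years: int = 5) -> float:
    """Lifetime tokens a chip can generate, divided by its price."""
    lifetime_seconds = years * 365 * 24 * 3600
    return tokens_per_sec * lifetime_seconds / chip_cost_usd

# Apple A17 Pro (~$130) at 25 tokens/s over 5 years:
print(f"{tokens_per_dollar(25, 130):,.0f} tokens per dollar")  # 30,323,077, i.e. ~30.3M
```

The same formula reproduces the page's headline figure when the quoted chip price and throughput are plugged in.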
Edge-Side GPT-4o
Real-time streaming, end-to-end
Full-modal, all SOTA
The best edge visual general model
The best audio general model
Continuous watching, real videos
Not just a single frame-based model
Real-time listening, truly smooth
Hear clearly, understand distinctly
Natural speaking, emotional engagement
Real-time interruptions without confusion
Full Capability, End-to-End
High performance, low latency
More natural, more coherent
Context understanding
Interruptible at any time
Noise resistance
Easy deployment and maintenance
Top On-Device Multimodal, Comprehensive Performance
Competitive with GPT-4V; SOTA performance in real-time video understanding, multi-image understanding, and single-image understanding among models below 20B. It is the first edge model to bring real-time video understanding on device.
Light! Fast! On-Device Friendly!
Only 6 GB of memory on device after quantization
On-Device inference speed up to 18 tokens/s, 33% faster
Supports llama.cpp, ollama, and vLLM inference
Others
Extremely low hallucination rate, better than GPT-4o and GPT-4V, based on self-developed RLHF-V efficient alignment technology.
Strongest on-device multimodal general capability
Surpassing Gemini Pro and GPT-4V
Strongest on-device OCR
Comparable to GPT-4V; on the OCRBench benchmark it surpasses GPT-4o, GPT-4V, Claude 3 Opus, Gemini Pro and other benchmark models in the rankings
9 times clearer pixels
Precise recognition of difficult images and long text
Our self-developed high-definition image decoding technology enables on-device lossless recognition of 1.8-million-pixel high-definition images with any aspect ratio. It excels at interpreting and reasoning over difficult images, as well as precise OCR recognition and extraction from long image and text content.
On-Device System-Level Multimodal Acceleration
Image encoding is 150 times faster! Efficient operation on mobile phones at 6-8 tokens/s
Supports 30+ Languages
Added mainstream languages such as German, French, Spanish, Italian, Russian and more
Strongest on-device OCR, a Breakthrough in Multimodal Model Capability
Image and text recognition comparable to GPT-4V and Gemini Pro.
Hallucination level on par with GPT-4V
Efficiency First
We believe the best model is the one with superior capability, faster speed, and lower cost.
Efficiency comes from mastering the science of large language models (LLMs), with knowledge density as the key principle.
As knowledge density grows, it becomes a core competitive advantage, unlocking vast potential for edge intelligence and applications.
Modelbest Law
The capability density of LLMs increases exponentially over time. Since 2023, the maximum capability density of LLMs has doubled approximately every 3.3 months.
Capability density: the ratio of effective parameter size to actual parameter size, where effective parameter size is the minimum number of parameters the reference model (e.g., MiniCPM) would need to achieve performance equivalent to the given target model.
Moore's Law
Chip circuit density doubles approximately every 18 months; Moore's original observation was that the number of transistors on an integrated circuit doubles about every two years.
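As a rough numeric illustration of the two doubling rates (a sketch; the 3.3-month and 18-month periods are the figures quoted in the text):

```python
def density_multiplier(months: float, doubling_period_months: float) -> float:
    """Growth factor of a quantity that doubles every `doubling_period_months`."""
    return 2 ** (months / doubling_period_months)

# Over one year, LLM capability density (doubling every 3.3 months) grows about:
print(round(density_multiplier(12, 3.3), 1))  # 12.4x
# while transistor density (doubling every 18 months) grows about:
print(round(density_multiplier(12, 18), 1))   # 1.6x
```

The comparison makes the claimed gap concrete: at the quoted rates, model capability density compounds nearly an order of magnitude faster per year than chip circuit density.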