A tale of programming: Why is DeepSeek-R1 so powerful?
Introduction
In January 2025, DeepSeek-R1 burst onto the scene as a new Large Language Model (LLM), with its creators claiming it had been developed using substantially fewer resources (processing power, money, etc.) than comparable models on the market. Even more: it is open source, under the MIT license.
I use LLMs intensively for assistance with research and coding. I usually download these models and run them offline on my laptop, as a personal effort to reduce their environmental impact in terms of power, water, and other resources. So I did that again; given my laptop's processing power, I downloaded the smallest version of the model, DeepSeek-R1-Distill-Qwen-1.5B, from Hugging Face.
After installing DeepSeek-R1, I was pleasantly surprised by how differently the model generates answers to prompts. See Figure 1 below.
I therefore looked for additional references on this model, trying to understand how it achieved such notable performance with the resources it used, and I reviewed in detail how it was built by reading the "DeepSeek-V3 Technical Report".
Key factors
In my opinion, four factors explain why DeepSeek-R1 has achieved these outstanding results:
1. Research: Searching "DeepSeek" on arxiv.org returns about 77 papers published since September 2023.
2. Focus: DeepSeek-R1 works with fewer parameters than comparable models while incentivizing reasoning capabilities through reinforcement learning; read Section 2, "Approach," of "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" (a sketch of its group-relative reward idea appears after this list).
3. Code optimization: The developers coded significant sections of DeepSeek-R1 in Parallel Thread Execution (PTX) instead of programming solely with the CUDA toolkit. Note that at the PTX level the GPUs can be optimized as a Single Instruction Multiple Data (SIMD) computer; moreover, the GPUs can transfer data among themselves without CPU intervention (see the second sketch after this list).
4. Data compression: By applying Vector Quantization (VQ) to the parameters, the volume of data transferred is significantly reduced (see the third sketch after this list). Note: I have personally used the VQ algorithm to compress echocardiographic video sequences; you can review my GitHub repository CompressionVQ.
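To make factor 2 concrete, here is a minimal sketch, my own toy illustration rather than code from the paper, of the group-relative advantage at the heart of the GRPO algorithm described in that Section 2: each sampled answer's reward is normalized against the mean and standard deviation of its own group, so no separate critic network is needed. The function name and reward values are hypothetical.

```cuda
#include <cstdio>
#include <cmath>
#include <vector>

// Toy sketch of a group-relative advantage (GRPO-style): normalize each
// reward against the statistics of its own group of sampled answers.
std::vector<float> group_relative_advantage(const std::vector<float>& rewards) {
    float mean = 0.0f;
    for (float r : rewards) mean += r;
    mean /= rewards.size();

    float var = 0.0f;
    for (float r : rewards) var += (r - mean) * (r - mean);
    float stdev = std::sqrt(var / rewards.size()) + 1e-8f;  // avoid divide-by-zero

    std::vector<float> adv;
    for (float r : rewards) adv.push_back((r - mean) / stdev);
    return adv;
}

int main() {
    // Rewards for a group of sampled answers to the same prompt (toy values).
    std::vector<float> rewards = {1.0f, 0.0f, 0.5f, 1.0f};
    for (float a : group_relative_advantage(rewards))
        printf("%+.3f\n", a);  // answers above the group mean get positive advantage
    return 0;
}
```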
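For factor 3, the following toy CUDA C++ kernel shows what programming at the PTX level looks like; it is an assumption of mine about the general technique, not DeepSeek's actual source. Every thread executes the same inline PTX instruction on different data, the SIMD pattern mentioned above, and the commented host calls are the standard CUDA peer-to-peer API for moving data between GPUs without staging through the CPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Toy kernel: each thread adds one pair of elements, with the add expressed
// directly as a PTX instruction instead of plain C++ arithmetic.
__global__ void add_ptx(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float r;
        // Inline PTX: one instruction-level add per thread.
        asm volatile("add.f32 %0, %1, %2;" : "=f"(r) : "f"(a[i]), "f"(b[i]));
        c[i] = r;
    }
}

int main() {
    const int n = 1024;
    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    add_ptx<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();
    printf("c[0] = %f\n", c[0]);  // expected: 3.0

    // Peer-to-peer: with two GPUs and hardware P2P support, data can move
    // device-to-device without passing through host memory:
    //   cudaSetDevice(0);
    //   cudaDeviceEnablePeerAccess(1, 0);
    //   cudaMemcpyPeer(dst_on_gpu1, 1, src_on_gpu0, 0, bytes);

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```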
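For factor 4, here is a minimal VQ encoding sketch: each vector is replaced by the index of its nearest codebook entry, so one small integer crosses the wire instead of a full vector of floats. The codebook, sizes, and kernel name are illustrative, not DeepSeek's implementation.

```cuda
#include <cstdio>
#include <cfloat>
#include <cuda_runtime.h>

// Each thread encodes one dim-dimensional vector as the index of its nearest
// codebook entry (squared Euclidean distance).
__global__ void vq_encode(const float* vectors, const float* codebook,
                          int* indices, int n_vectors, int n_codes, int dim) {
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= n_vectors) return;
    float best = FLT_MAX;
    int arg = 0;
    for (int c = 0; c < n_codes; ++c) {
        float dist = 0.0f;
        for (int d = 0; d < dim; ++d) {
            float diff = vectors[v * dim + d] - codebook[c * dim + d];
            dist += diff * diff;
        }
        if (dist < best) { best = dist; arg = c; }
    }
    indices[v] = arg;  // transmit 1 index instead of dim floats
}

int main() {
    const int n_vectors = 4, n_codes = 2, dim = 2;
    float h_vectors[]  = {0.1f, 0.2f,  0.9f, 1.1f,  0.0f, 0.0f,  1.0f, 1.0f};
    float h_codebook[] = {0.0f, 0.0f,  1.0f, 1.0f};  // two 2-D centroids

    float *d_vec, *d_code; int *d_idx;
    cudaMalloc(&d_vec, sizeof(h_vectors));
    cudaMalloc(&d_code, sizeof(h_codebook));
    cudaMalloc(&d_idx, n_vectors * sizeof(int));
    cudaMemcpy(d_vec, h_vectors, sizeof(h_vectors), cudaMemcpyHostToDevice);
    cudaMemcpy(d_code, h_codebook, sizeof(h_codebook), cudaMemcpyHostToDevice);

    vq_encode<<<1, 32>>>(d_vec, d_code, d_idx, n_vectors, n_codes, dim);

    int h_idx[n_vectors];
    cudaMemcpy(h_idx, d_idx, sizeof(h_idx), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n_vectors; ++i)
        printf("vector %d -> code %d\n", i, h_idx[i]);
    // Compression here: dim * sizeof(float) bytes shrink to one int per
    // vector; real codecs pack the indices into even fewer bits.
    cudaFree(d_vec); cudaFree(d_code); cudaFree(d_idx);
    return 0;
}
```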
Conclusion
When you must solve a problem and there are not enough resources, your only alternative is to make intensive use of the best and most powerful computers available: human brains!