LLMs in (March) 2025
Open source is the new black
Alexandre Strube
March 26th, 2025
Website
https://helmholtz-blablador.fz-juelich.de
EVERYTHING CHANGED
OpenAI is no longer the only game in town
The GPT-4 barrier was broken by open source (the 2023 version of GPT-4 is #63 on LM Arena)
Training costs are WAY down
Inference costs, too
The rise of China 🇨🇳
China is now a major player in AI
And they do a lot of open source!
2020: USA 11 LLMs, China 2 LLMs
2021: 31 LLMs across both countries (I need more recent data)
Diagram of Thought
Trying to overcome limitations of Chain-of-Thought (Andrew Chi-Chih Yao, Tsinghua University)
Huawei
Probably the most sanctioned company in the world
Selling AI Chips
Ascend 910C: claimed to be on par with Nvidia’s H100
(In practice, their chips are closer to the A100)
Already made by SMIC (previous models were made in Taiwan by TSMC)
Has offered LLMs on Huawei Cloud since 2023
You can’t download them, but you can fine-tune them and download the fine-tuned models
Servers in the EU comply with EU regulations / the AI Act
China Telecom
Already has a 1-trillion-parameter model using Huawei chips
TeleChat2-115b
Meanwhile, Deutsche Telekom gives me 16 Mbps in Jülich, like I had in 2005 in Brazil 😥
Using copper, like the ancient Aztecs or whatever 🗿𓂀𓋹𓁈𓃠𓆃𓅓𓆣
Baichuan AI
Was the first model on Blablador after Llama/Vicuna in 2023
Has had a 1-trillion-parameter model for a year already
Baichuan 2 is Llama 2-level; Baichuan 4 (closed, China-only) is ChatGPT-4 level
Baidu
Used to be China’s AI leader; now playing catch-up
Fell behind BECAUSE THEY ARE CLOSED SOURCE
Doing a lot of AI research
Ernie 4.5 from March 2025 (version 3 is open); they swear they’ll open it on June 30th(?)
Ernie-Health
Miaoda (no-code dev)
I-Rag and Wenxin Yige (text to image)
Ernie X1 is 50% of DeepSeek’s price, with comparable reasoning
100,000+ GPU clusters
Yi (01.ai)
Yi 1.5 was pre-trained on 3.5T tokens, then generates its own data and is re-trained on it
Was the best model on benchmarks at the beginning of 2024
IMHO was not so good in practice 😵💫
Yi Lightning is #12 on LM Arena as of 26.01.2025
Yi VL, multimodal, in June (biggest vision model available, 34B)
Yi Coder was the best code model until 09.2024
Went quiet after that (probably busy making money; last update 11.2024)
https://01.ai
StepFun
Step-2 is #8 on LM Arena (higher than Claude, Llama, Grok, Qwen, etc.)
Step-1.5V is a strong multimodal model
Step-1V generates images
No open source LLM available
Alibaba
Qwen2.5-1M: 1M tokens of context
Qwen2.5-VL (21.03.2025): a 32B-parameter vision-language model; can chat through the camera, play games, control your phone, etc.
Blablador: alias-code serves Qwen2.5-Coder (see the API sketch after this slide)
Better than Llama
Open weights on HuggingFace and ModelScope, and free inference too
28.01.2025: Qwen 2.5 MAX released (only on their website and API)
QwQ-32B is a reasoning model, available on Blablador (#13 on LM Arena)
LHM: from a picture to an animated 3D model in seconds
The CEO was warning yesterday about a “data center bubble”
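Since Blablador exposes an OpenAI-compatible API, a minimal sketch of calling alias-code could look like the following; the base URL and the key handling are assumptions, so check https://helmholtz-blablador.fz-juelich.de for the current endpoint and how to obtain a key.

```python
# Hedged sketch: querying Blablador's OpenAI-compatible API (assumptions: the base
# URL below and that "alias-code" is the served model name; check the Blablador
# website for the current endpoint and how to obtain an API key).
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_BLABLADOR_API_KEY",  # placeholder; get a real key via the Blablador portal
    base_url="https://api.helmholtz-blablador.fz-juelich.de/v1",  # assumed endpoint
)

response = client.chat.completions.create(
    model="alias-code",  # alias that currently points to Qwen2.5-Coder, per the slide above
    messages=[{"role": "user", "content": "Write a Python function that checks whether a number is prime."}],
    temperature=0.2,
)
print(response.choices[0].message.content)
```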
InternLM
InternLM-3 is an 8B model with 4x less training time than Llama 3
Released on 15.01.2025
Strube’s opinion: previous versions were bad; I need to check again
Tencent
Hunyuan LLM; Hunyuan-DiT can create images too
Hunyuan-T1 (24.03.2025) is a Mamba-like model with reasoning, beating o1
The Five: Home Robot
Zhipu
Real-time video conversations in commercial products, a scientific paper reader, etc.
Open source:
CodeGeeX (runs directly in VS Code)
LLMs: GLM-4, GLM-130B
Visual models: CogVLM 17B / CogAgent 18B / GLM-4V-Plus
Image generator: CogView-3-Plus
CogVideoX: Video generator with 5b parameters
Auto agent: AutoGLM (does things for you on your phone or the web)
https://zhipuai.cn/en/
ByteDance: Doubao
Multi-modal
200x cheaper than OpenAI’s GPT-4o
Video-generation diffusion models: Doubao-PixelDance and Doubao-Seaweed
OpenBMB
MiniCPM-o 2.6: a GPT-4o-level multimodal model with 8B parameters
MiniCPM-V 2.6: an image/video/OCR model with 8B parameters
You ain’t here to hear about ModelScope or OpenBMB
DeepSeek R1
Launched on 21.01.2025
Took the world by storm (commercial usage is 97% cheaper than OpenAI o1)
R1 is on par with OpenAI-o1
671B parameters, distilled down to 70B, 32B, and 8B (a loading sketch follows this slide)
Cost a fraction of other models to train
MIT LICENSE
Free chat
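To make the distilled sizes tangible, here is a minimal sketch of running the 8B distill locally with Hugging Face transformers; the Hub model ID is an assumption, and a GPU with enough memory (or quantization) is required.

```python
# Hedged sketch: running one of the R1 distills locally with Hugging Face transformers.
# Assumptions: the Hub model ID below exists under that name and a GPU with enough
# memory is available; increase max_new_tokens for the long "thinking" traces these models emit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"  # assumed repo name for the 8B distill
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "How many prime numbers are smaller than 30?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```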
When something seems to be too good to be true…
DeepSeek R1
Does very well on standard benchmarks
Not so well on not-so-popular hard benchmarks (Strube’s opinion: maybe it was trained on the benchmarks too?)
DeepSeek R1 on the AIW benchmark (source: Jenia’s Twitter)
Other opinions: DeepSeek R1
A VERY good open model
Not as good as they claim
Everybody is reimplementing its techniques, as it’s so much cheaper than everything from 2024
HuggingFace is creating Open-R1 at https://github.com/huggingface/open-r1
Will shake the whole industry in ways I can’t fathom
Pickle is hungry
THEY DON’T STOP AND I CAN’T KEEP UP WITH THIS
New model released in January
DeepSeek Janus Pro 7B on 01.02
It not only understands multiple modalities, but also generates them
Best model at understanding images, best model at generating images
https://huggingface.co/deepseek-ai/Janus-Pro-7B
THEY DON’T STOP AND I CAN’T KEEP UP WITH THIS
DeepSeekMath 7B (05.02.2024!!!) introduced “Group Relative Policy Optimization” (GRPO) for advanced math reasoning (a simplified sketch of the objective follows this slide)
Now GRPO is widely used to improve reinforcement learning on general LLMs
DeepSeek-V3-0324 came out on Monday and doesn’t even have a README yet (a post-training update?)
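For readers unfamiliar with GRPO, below is a simplified sketch of the objective from the DeepSeekMath paper, written at the level of whole sampled answers (the paper formulates it per token); the notation is mine and slightly abridged.

```latex
% Simplified GRPO sketch (abridged; the DeepSeekMath paper works per token).
% For a question q, sample a group of G answers o_1..o_G from the old policy and score
% them with a reward model, giving rewards r_1..r_G. The "group-relative" advantage
% replaces PPO's learned value function:
%   \hat{A}_i = \frac{r_i - \operatorname{mean}(r_1,\dots,r_G)}{\operatorname{std}(r_1,\dots,r_G)}
% The policy is then updated with a PPO-style clipped objective plus a KL penalty
% towards a reference model:
\[
\mathcal{J}_{\mathrm{GRPO}}(\theta) =
\mathbb{E}\left[
\frac{1}{G}\sum_{i=1}^{G}
\min\!\left(
\frac{\pi_\theta(o_i \mid q)}{\pi_{\theta_{\mathrm{old}}}(o_i \mid q)}\,\hat{A}_i,\;
\operatorname{clip}\!\left(\frac{\pi_\theta(o_i \mid q)}{\pi_{\theta_{\mathrm{old}}}(o_i \mid q)},\,1-\varepsilon,\,1+\varepsilon\right)\hat{A}_i
\right)
\right]
- \beta\, D_{\mathrm{KL}}\!\left(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right)
\]
```

Because the baseline comes from the group mean instead of a separate critic network, GRPO avoids training a value model, which is part of why it has been so widely reused.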
The LLM ecosystem: 🇨🇦
Cohere AI
Aya: a family of models from 8B to 32B parameters, multimodal, multilingual, etc.
Strube’s opinion: they were not so good in 2024
C4AI Command A: 111b model
The Open LLM ecosystem: 🇺🇸
Google: Gemini is closed and state-of-the-art; Gemma is open and good
Gemini 2.5, released 24.03.2025, is #1 on LM Arena
Microsoft has tiny, bad ones (but I wouldn’t bet against them - they’re getting better)
They are putting money on X’s Grok
Twitter/X has Grok-3 for paying customers; Grok-1 is enormous and “old” (from March 2024)
The Colossus supercomputer has 200k GPUs, aiming for 1M
Grok was #1 on the leaderboard until yesterday
Apple is going their own way + using ChatGPT
Falling behind? Who knows?
The LLM ecosystem: 🇺🇸
Meta has Llama, and somehow it’s making money out of it
Training Llama 4 with lessons learned from DeepSeek
Anthropic is receiving billions from Amazon, MS, the Pentagon, etc., but Claude is completely closed
Amazon released its “Nova” models in February: 100% closed, with interesting tiering (similar to Blablador)
Nvidia has a bunch of interesting stuff, like accelerated versions of popular models and small speech/translation models, among others
OpenAI is not open at all. Published a new image-generation model yesterday.
The “open” LLM ecosystem: 🇺🇸
Outside of academia, there’s OLMo from AllenAI
Has training code, weights, and data, all open
Intellect-1 was trained collaboratively
Up to 112 H100 GPUs simultaneously
They claim an overall compute utilization of 83% across continents and 96% when training only in the USA
Fully open
Nous Research
Hermes 3, announced in January 2025
Fine-tuned from Llama 3.1 on synthetic data
Training live, on the internet
The LLM ecosystem 🇪🇺
Mistral.ai just came out with a new small model
Fraunhofer/OpenGPT-X/JSC has Teuken-7b
DE/NL/NO/DK: TrustLLM
SE/ES/CZ/NL/FI/NO/IT: OpenEuroLLM
A lot of people tell me: “HuggingFace is French and is a great success”
HuggingFace was based in NYC from the beginning
🇪🇺
At the beginning of February, the French government released their LLM.
(Side note: no model on Blablador has ever said that the “square root of goat is one”)
Potato
Take the slides with you
https://go.fzj.de/2025-eum
Questions?
No dogs have been harmed for this presentation
LLMOps resource
A No-BS Database of How Companies Actually Deploy LLMs in Production: 300+ Technical Case Studies, Including Self-Hosted LLMs, at https://www.zenml.io/llmops-database
“I think the complexity of Python package management holds down AI application development more than is widely appreciated. AI faces multiple bottlenecks — we need more GPUs, better algorithms, cleaner data in large quantities. But when I look at the day-to-day work of application builders, there’s one additional bottleneck that I think is underappreciated: The time spent wrestling with version management is an inefficiency I hope we can reduce.”
Andrew Ng, 28.02.2024
“Building on top of open source can mean hours wrestling with package dependencies, or sometimes even juggling multiple virtual environments or using multiple versions of Python in one application. This is annoying but manageable for experienced developers, but creates a lot of friction for new AI developers entering our field without a background in computer science or software engineering.”
Andrew Ng, 28.02.2024
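As a practical aside to Ng’s point, a minimal sketch of containing that “package wrestling” with only the standard library; the environment directory name and the pinned package are just illustrative examples.

```python
# Hedged sketch: one way to isolate per-project dependencies using only the standard
# library, so each AI project pins its own package versions instead of fighting the
# global interpreter. The directory name and pinned package are example values.
import subprocess
import sys
import venv

env_dir = ".venv-llm-demo"           # example environment directory
venv.create(env_dir, with_pip=True)  # create an isolated environment with pip available

# Install a pinned dependency into that environment instead of the global interpreter.
pip = f"{env_dir}/bin/pip" if sys.platform != "win32" else rf"{env_dir}\Scripts\pip.exe"
subprocess.run([pip, "install", "requests==2.31.0"], check=True)
```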
What can you do with it?