LLMs in (March) 2025

Open source is the new black

Alexandre Strube

March 26th, 2025

Website

https://helmholtz-blablador.fz-juelich.de
  • Play around! 🐶

OUTLINE

  • Present
  • Future

Present

EVERYTHING CHANGED

  • OpenAI is no longer the only game in town
  • The GPT-4 barrier was broken by Open Source (The 2023 version of GPT-4 is #63 on LM Arena)
  • Training costs are WAY down
  • Inference costs, too

The rise of China 🇨🇳

  • China is now a major player in AI
  • And they do a lot of open source!
  • 2020: USA 11 LLMs, China 2 LLMs
  • 2021: 31 LLMs across both countries (I need more recent data)

Diagram of thought

Trying to overcome the limitations of Chain-of-Thought (from Andrew Chi-Chih Yao's group at Tsinghua University)
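
Very roughly: where Chain-of-Thought is a single linear sequence of steps, DoT lets one model build a directed acyclic graph of propositions, critiques, and refinements. A toy data-structure sketch of that idea (my own illustration, not the paper's code):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    text: str
    role: str                        # "proposition", "critique", or "summary"
    parents: list["Node"] = field(default_factory=list)

# Chain-of-Thought is a single path; Diagram of Thought is a DAG,
# so a critique can branch off a step and a refined step can merge back.
s1 = Node("Assume x = 3", "proposition")
c1 = Node("x = 3 violates the constraint x > 5", "critique", [s1])
s2 = Node("Refine: take x = 6", "proposition", [s1, c1])
ans = Node("x = 6 satisfies all constraints", "summary", [s2])
```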

Huawei

  • Probably the most sanctioned company in the world
  • Selling AI Chips
  • Ascend 910C: claimed to be on par with Nvidia’s H100
  • (In practice, their chips are closer to the A100)
  • Already made by SMIC (previous models were made in Taiwan by TSMC)
  • Has offered LLMs on Huawei Cloud since 2023
    • You can’t download them, but you can fine-tune and download the fine-tuned models
  • Servers in EU comply with EU regulations/AI act

China Telecom

  • Already has a 1 trillion parameter model using Huawei chips
  • TeleChat2-115b
  • Meanwhile, Deutsche Telekom gives me 16 Mbps in Jülich, like I had in 2005 in Brazil 😥
  • Using copper, like the ancient Aztecs or whatever 🗿𓂀𓋹𓁈𓃠𓆃𓅓𓆣

Baichuan AI

  • Was the first model on Blablador after Llama/Vicuna in 2023
  • Has had a 1 trillion parameter model for a year already
  • Baichuan 2 is Llama 2-level; Baichuan 4 (closed, China-only) is ChatGPT-4-level

Baidu

  • Used to be China’s AI leader; now playing catch-up
    • Fell behind BECAUSE THEY ARE CLOSED SOURCE
  • Doing a lot of AI research
  • Ernie 4.5 from March 2025 (version 3 is open) - they swear they’ll open them on June 30th(?)
  • Ernie-Health
  • Miaoda (no-code dev)
  • I-Rag and Wenxin Yige (text to image)
  • Ernie X1 claims the same reasoning as DeepSeek at 50% of the price
  • 100,000+ GPU clusters

Yi (01.ai)

  • Yi 1.5 was pre-trained on 3.5T tokens, then produces its own data and re-trains on it
  • Was the best model at the beginning of 2024 on benchmarks
  • IMHO was not so good in practice 😵‍💫
  • Yi Lightning is #12 on LM Arena as of 26.01.2025
  • Yi-VL, multimodal, in June (biggest vision model available, 34B)
  • Yi Coder was the best code model until 09.2024
  • Went quiet after that (probably busy making money, last update 11.2024)
  • https://01.ai

StepFun

  • Step-2 is #8 on LM Arena (higher than Claude, Llama, Grok, Qwen, etc.)
  • Step-1.5V is a strong multimodal model
  • Step-1V generates images
  • No open source LLM available

Alibaba

  • Qwen2.5-1M: 1M tokens of context
  • Qwen2.5-VL (21.03.2025): 32B-parameter vision-language model; can chat through the camera, play games, control your phone, etc.
  • BLABLADOR: alias-code serves Qwen2.5-Coder (see the sketch after this list)
  • Better than Llama
  • Open weights on HuggingFace, modelscope and free inference too
  • 28.01.2025: Qwen 2.5 MAX released (only on their website and API)
  • QwQ-32B is a reasoning model, available on Blablador (#13 on LM Arena)
  • LHM: From picture to animated 3d in seconds
  • Alibaba’s CEO warned yesterday about a “data center bubble”
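
Since Blablador exposes an OpenAI-compatible API, the alias-code model above can be queried with the standard openai Python client. A minimal sketch; the endpoint URL and key workflow below are my assumptions, so check the Blablador docs for the current values:

```python
# Minimal sketch of querying Blablador's OpenAI-compatible API.
# Assumptions: the base URL and key handling follow Blablador's docs;
# "alias-code" is the model alias named on this slide.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.helmholtz-blablador.fz-juelich.de/v1",  # assumed endpoint
    api_key="YOUR_BLABLADOR_KEY",  # assumed: obtained via Helmholtz login
)

resp = client.chat.completions.create(
    model="alias-code",  # currently backed by Qwen2.5-Coder, per this slide
    messages=[{"role": "user", "content": "Write FizzBuzz in Python."}],
)
print(resp.choices[0].message.content)
```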

InternLM

  • InternLM-3 is an 8B model with 4x less training time than Llama 3
  • Released on 15.01.2025
  • Strube’s opinion: Previous versions were bad, need to check again

Tencent

  • Hunyuan is their LLM; Hunyuan-DiT can create images too
  • Hunyuan-T1 (24.03.2025) is a Mamba-like reasoning model, claimed to beat o1
  • The Five: Home Robot

Zhipu

  • Real-time video conversations in commercial products, scientific paper reader, etc
  • Open source:
  • CodeGeeX (runs directly in VSCode),
  • LLMs: GLM-4, GLM-130B
  • Visual models: CogVLM 17B / CogAgent 18B / GLM-4V-Plus
  • Image generator: CogView-3-Plus
  • CogVideoX: Video generator with 5b parameters
  • Auto Agent: AutoGLM (does things for you on your phone or web)
  • https://zhipuai.cn/en/

ByteDance: Doubao

  • Multi-modal
  • 200x cheaper than OpenAI’s GPT-4o
  • Video generation diffusion models: Doubao-PixelDance and Doubao-Seaweed

OpenBMB

  • MiniCPM-o 2.6: a GPT-4o-level multimodal model with 8B parameters
  • MiniCPM-V 2.6: an image/video/OCR model with 8B parameters

ModelScope

You ain’t here to hear about ModelScope or OpenBMB

🐕‍🦺🐩🐕🐶

DeepSeek R1

  • Launched on 21.01.2025
  • Took the world by storm (Commercial usage is 97% cheaper than OpenAI-o1)
  • R1 is on par with OpenAI-o1
  • 671B parameters, distilled down to 70B, 32B, and 8B
  • Cost a fraction of other models to train
  • MIT LICENSE
  • Free chat

When something seems to be too good to be true…

  • It probably is.

DeepSeek R1

  • Does very well on standard benchmarks
  • Not so well on not-so-popular hard benchmarks
  • (Strube’s opinion: maybe it was trained on the benchmarks too?)

DeepSeek R1 on AIW Benchmark

Source: Jenia’s Twitter

Other opinions:

DeepSeek R1

  • A VERY good open model
  • Not as good as they claim
  • Everybody is reimplementing its techniques, as it’s so much cheaper than everything from 2024
  • HuggingFace is creating Open-R1 at https://github.com/huggingface/open-r1
  • Will shake the whole industry in ways I can’t fathom

Community notes matter

Pickle is hungry

THEY DON’T STOP AND I CAN’T KEEP UP WITH THIS

  • New model released in January
  • DeepSeek Janus Pro 7B on 01.02
  • It not only understands multi-modalities, but also generates them
  • Best model at understanding images, best model at generating them
  • https://huggingface.co/deepseek-ai/Janus-Pro-7B

THEY DON’T STOP AND I CAN’T KEEP UP WITH THIS

  • DeepSeekMath 7B (05.02.2024!!!) created “Group Relative Policy Optimization” (GRPO) for advanced math reasoning (see the sketch after this list)
  • Now GRPO is widely used to improve Reinforcement Learning on general LLMs
  • DeepSeek-V3-0324 came out Monday and doesn’t even have a README yet (post-training update?)
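
GRPO's core idea: instead of PPO's learned value critic, sample a group of answers per prompt and score each answer against its group's mean reward. A minimal illustrative sketch of that advantage computation (my own toy code, not DeepSeek's):

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages: for each prompt, sample G completions,
    then center and scale their rewards within the group. Dropping the
    learned value critic is what makes GRPO cheap."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

# Toy example: 2 prompts, G=4 sampled answers each, 0/1 correctness rewards
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 1.0, 0.0, 0.0]])
print(grpo_advantages(rewards))
```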

The LLM ecosystem: 🇨🇦

  • Cohere AI
    • Aya: a family of models from 8B to 32B parameters, multimodal, multilingual, etc.
      • Strube’s opinion: they were not so good in 2024
    • C4AI Command A: a 111B model
      • Text-only, never tried it

The Open LLM ecosystem: 🇺🇸

  • Google: Gemini is closed and state-of-the-art; Gemma is open and good
    • Gemini 2.5 released 24.03.2025, #1 on LM Arena
  • Microsoft has tiny, bad ones (but I wouldn’t bet against them - getting better)
    • They are also putting money into X’s Grok
  • Twitter/X has Grok-3 for paying customers; Grok-1 is enormous and “old” (from March 2024)
    • Colossus supercomputer has 200k GPUs, aiming for 1M
    • Grok was #1 on the leaderboard until yesterday
  • Apple is going their own way + using ChatGPT
    • Falling behind? Who knows?

The LLM ecosystem: 🇺🇸

  • Meta has Llama, and somehow it’s making money out of it
    • Training Llama 4 with lessons learned from DeepSeek
  • Anthropic is receiving billions from Amazon, Google, the Pentagon, etc., but Claude is completely closed
  • Amazon released its “Nova” models in February. 100% closed, interesting tiering (similar to blablador)
  • Nvidia has a bunch of interesting stuff, like accelerated versions of popular models and small speech/translation models, among others
  • OpenAI is not open at all. Published a new image generator model yesterday.

The “open” LLM ecosystem: 🇺🇸

  • Outside of Academia, there’s OLMo from AllenAI
    • Has training code, weights and data, all open
  • Intellect-1 was trained collaboratively
    • Up to 112 H100 GPUs simultaneously
    • They claim overall compute utilization of 83% across continents and 96% when training only in the USA
    • Fully open
  • Nous Research Hermes 3, announced in January 2025
  • Fine-tuned from Llama 3.1 on synthetic data
  • Training live, on the internet

The LLM ecosystem 🇪🇺

  • Mistral.ai just came out with a new small model
  • Fraunhofer/OpenGPT-X/JSC has Teuken-7B
  • DE/NL/NO/DK: TrustLLM
  • SE/ES/CZ/NL/FI/NO/IT: OpenEuroLLM

🇪🇺

  • A lot of people tell me: “HuggingFace is French and is a great success”
  • HuggingFace was based in NYC from the beginning

🇪🇺

  • At the beginning of February, the French government released their LLM.

(Side note: No model in blablador has ever said that the “square root of goat is one”)

Potato

Non-transformer architectures

  • First one was Mamba, in December 2023
  • Jamba from AI21 in March 2024
  • Last I checked, LFM 40B MoE from Liquid/MIT was the best one (from 30.09.2024)
  • Performs well on benchmarks
  • What about real examples?
  • Some mathematical discussion about whether they are Turing-complete (probably not)
  • Another example is Hymba from NVIDIA (a tiny 1.5B model); see the toy sketch below
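
For intuition: these models replace attention with a recurrent state-space update, so cost grows linearly with sequence length. A toy, non-selective sketch (my own illustration; real Mamba makes A, B, C input-dependent and uses a fast parallel scan):

```python
import torch

def ssm_scan(x, A, B, C):
    """Toy linear state-space layer: h_t = A h_{t-1} + B x_t, y_t = C h_t.
    Cost grows linearly with sequence length, unlike attention's O(n^2)."""
    batch, seqlen, _ = x.shape
    h = torch.zeros(batch, A.shape[0])
    ys = []
    for t in range(seqlen):
        h = h @ A.T + x[:, t] @ B.T  # recurrent state update
        ys.append(h @ C.T)           # readout for this time step
    return torch.stack(ys, dim=1)

# Shapes: x (batch, seq, dim), A (state, state), B (state, dim), C (dim, state)
x = torch.randn(2, 16, 8)
y = ssm_scan(x, torch.eye(4) * 0.9, torch.randn(4, 8), torch.randn(8, 4))
print(y.shape)  # torch.Size([2, 16, 8])
```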

Take the slides with you

https://go.fzj.de/2025-eum

Questions?

No dogs have been harmed for this presentation

Extra slides

LLMOps resource

A No-BS Database of How Companies Actually Deploy LLMs in Production: 300+ Technical Case Studies, Including Self-Hosted LLMs, at https://www.zenml.io/llmops-database

“I think the complexity of Python package management holds down AI application development more than is widely appreciated. AI faces multiple bottlenecks — we need more GPUs, better algorithms, cleaner data in large quantities. But when I look at the day-to-day work of application builders, there’s one additional bottleneck that I think is underappreciated: The time spent wrestling with version management is an inefficiency I hope we can reduce.”

Andrew Ng, 28.02.2024

“Building on top of open source can mean hours wrestling with package dependencies, or sometimes even juggling multiple virtual environments or using multiple versions of Python in one application. This is annoying but manageable for experienced developers, but creates a lot of friction for new AI developers entering our field without a background in computer science or software engineering.”

Andrew Ng, 28.02.2024

What can you do with it?

Like the slides? Want to use them?

Gitlab link to source code of the slides (needs JUDOOR account)

https://gitlab.jsc.fz-juelich.de/strube1/2025-01-talk-hai-retreat