LLMs in (March) 2025

Open source is the new black

Alexandre Strube

March 26th, 2025

Website

https://helmholtz-blablador.fz-juelich.de
  • Play around! 🐶

OUTLINE

  • Present
  • Future

Present

EVERYTHING CHANGED

  • OpenAI is no longer the only game in town
  • The GPT-4 barrier was broken by Open Source (The 2023 version of GPT-4 is #63 on LM Arena)
  • Training costs are WAY down
  • Inference costs, too

The rise of China 🇨🇳

  • China is now a major player in AI
  • And they do a lot of open source!
  • 2020: USA 11 LLMs, China 2 LLMs
  • 2021: 31 LLMs across both countries (I need more recent data)

Diagram of thought

Trying to overcome the limitations of Chain-of-Thought (from Andrew Chi-Chih Yao's group at Tsinghua University)
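
Very roughly: where Chain-of-Thought is a single linear sequence of steps, DoT lets one model build a directed acyclic graph of propositions, critiques, and refinements. A toy data-structure sketch of that idea (my own illustration, not the paper's code):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    text: str
    role: str                        # "proposition", "critique", or "summary"
    parents: list["Node"] = field(default_factory=list)

# Chain-of-Thought is a single path; Diagram of Thought is a DAG,
# so a critique can branch off a step and a refined step can merge back.
s1 = Node("Assume x = 3", "proposition")
c1 = Node("x = 3 violates the constraint x > 5", "critique", [s1])
s2 = Node("Refine: take x = 6", "proposition", [s1, c1])
ans = Node("x = 6 satisfies all constraints", "summary", [s2])
```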

Huawei

  • Probably the most sanctioned company in the world
  • Selling AI Chips
  • Ascend 910C: claimed to be on par with Nvidia’s H100
  • (In practice, their chips are closer to the A100)
  • Already made by SMIC (previous models were made in Taiwan by TSMC)
  • Has offered LLMs on Huawei Cloud since 2023
    • You can’t download them, but you can fine-tune and download the fine-tuned models
  • Servers in EU comply with EU regulations/AI act

China Telecom

  • Already has a 1 trillion parameter model using Huawei chips
  • TeleChat2-115b
  • Meanwhile, Deutsche Telekom gives me 16 Mbps in Jülich, like I had in 2005 in Brazil 😥
  • Using copper, like the ancient Aztecs or whatever 🗿𓂀𓋹𓁈𓃠𓆃𓅓𓆣

Baichuan AI

  • Was the first model on Blablador after Llama/Vicuna in 2023
  • Has had a 1 trillion parameter model for a year already
  • Baichuan 2 is Llama 2-level; Baichuan 4 (closed, China-only) is ChatGPT-4-level

Baidu

  • Used to be China’s AI leader; now playing catch-up
    • Fell behind BECAUSE THEY ARE CLOSED SOURCE
  • Doing a lot of AI research
  • Ernie 4.5 from March 2025 (version 3 is open) - they swear they’ll open them on June 30th(?)
  • Ernie-Health
  • Miaoda (no-code dev)
  • I-Rag and Wenxin Yige (text to image)
  • Ernie X1 claims the same reasoning as DeepSeek at 50% of the price
  • 100,000+ GPU clusters

Yi (01.ai)

  • Yi 1.5 was pre-trained on 3.5T tokens, then produces its own data and re-trains on it
  • Was the best model at the beginning of 2024 on benchmarks
  • IMHO was not so good in practice 😵‍💫
  • Yi Lightning is #12 on LM Arena as of 26.01.2025
  • Yi-VL, multimodal, in June (biggest vision model available, 34B)
  • Yi Coder was the best code model until 09.2024
  • Went quiet after that (probably busy making money, last update 11.2024)
  • https://01.ai

StepFun

  • Step-2 is #8 on LM Arena (higher than Claude, Llama, Grok, Qwen, etc.)
  • Step-1.5V is a strong multimodal model
  • Step-1V generates images
  • No open source LLM available

Alibaba

  • Qwen2.5-1M: 1M tokens of context
  • Qwen2.5-VL (21.03.2025): 32B-parameter vision-language model; can chat through the camera, play games, control your phone, etc.
  • BLABLADOR: alias-code serves Qwen2.5-Coder (see the sketch after this list)
  • Better than Llama
  • Open weights on HuggingFace, modelscope and free inference too
  • 28.01.2025: Qwen 2.5 MAX released (only on their website and API)
  • QwQ-32B is a reasoning model, available on Blablador (#13 on LM Arena)
  • LHM: From picture to animated 3d in seconds
  • Alibaba’s CEO warned yesterday about a “data center bubble”
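
Since Blablador exposes an OpenAI-compatible API, the alias-code model above can be queried with the standard openai Python client. A minimal sketch; the endpoint URL and key workflow below are my assumptions, so check the Blablador docs for the current values:

```python
# Minimal sketch of querying Blablador's OpenAI-compatible API.
# Assumptions: the base URL and key handling follow Blablador's docs;
# "alias-code" is the model alias named on this slide.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.helmholtz-blablador.fz-juelich.de/v1",  # assumed endpoint
    api_key="YOUR_BLABLADOR_KEY",  # assumed: obtained via Helmholtz login
)

resp = client.chat.completions.create(
    model="alias-code",  # currently backed by Qwen2.5-Coder, per this slide
    messages=[{"role": "user", "content": "Write FizzBuzz in Python."}],
)
print(resp.choices[0].message.content)
```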

InternLM

  • InternLM-3 is an 8B model with 4x less training time than Llama 3
  • Released on 15.01.2025
  • Strube’s opinion: Previous versions were bad, need to check again

Tencent

  • Hunyuan is their LLM; Hunyuan-DiT can create images too
  • Hunyuan-T1 (24.03.2025) is a Mamba-like reasoning model, claimed to beat o1
  • The Five: Home Robot

Zhipu

  • Real-time video conversations in commercial products, scientific paper reader, etc
  • Open source:
  • CodeGeeX (runs directly in VSCode),
  • LLMs: GLM-4, GLM-130B
  • Visual models: CogVLM 17B / CogAgent 18B / GLM-4V-Plus
  • Image generator: CogView-3-Plus
  • CogVideoX: Video generator with 5b parameters
  • Auto Agent: AutoGLM (does things for you on your phone or web)
  • https://zhipuai.cn/en/

ByteDance: Doubao

  • Multi-modal
  • 200x cheaper than OpenAI’s GPT-4o
  • Video generation diffusion models: Doubao-PixelDance and Doubao-Seaweed

OpenBMB

  • MiniCPM-o 2.6: a GPT-4o-level multimodal model with 8B parameters
  • MiniCPM-V 2.6: an image/video/OCR model with 8B parameters

ModelScope

You ain’t here to hear about ModelScope or OpenBMB

🐕‍🦺🐩🐕🐶

DeepSeek R1

  • Launched on 21.01.2025
  • Took the world by storm (Commercial usage is 97% cheaper than OpenAI-o1)
  • R1 is on par with OpenAI-o1
  • 671B parameters, distilled down to 70B, 32B, and 8B
  • Cost a fraction of other models to train
  • MIT LICENSE
  • Free chat

When something seems to be too good to be true…

  • It probably is.

DeepSeek R1

  • Does very well on standard benchmarks
  • Not so well on not-so-popular hard benchmarks
  • (Strube’s opinion: maybe it was trained on the benchmarks too?)

DeepSeek R1 on AIW Benchmark

Source: Jenia’s Twitter

Other opinions:

DeepSeek R1

  • A VERY good open model
  • Not as good as they claim
  • Everybody is reimplementing its techniques, as it’s so much cheaper than everything from 2024
  • HuggingFace is creating Open-R1 at https://github.com/huggingface/open-r1
  • Will shake the whole industry in ways I can’t fathom

Community notes matter

Pickle is hungry

THEY DON’T STOP AND I CAN’T KEEP UP WITH THIS

  • New model released in January
  • DeepSeek Janus Pro 7B on 01.02
  • It not only understands multi-modalities, but also generates them
  • Best model at understanding images, best model at generating them
  • https://huggingface.co/deepseek-ai/Janus-Pro-7B

THEY DON’T STOP AND I CAN’T KEEP UP WITH THIS

  • DeepSeekMath 7B (05.02.2024!!!) created “Group Relative Policy Optimization” (GRPO) for advanced math reasoning (see the sketch after this list)
  • Now GRPO is widely used to improve Reinforcement Learning on general LLMs
  • DeepSeek-V3-0324 came out Monday and doesn’t even have a README yet (post-training update?)
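
GRPO's core idea: instead of PPO's learned value critic, sample a group of answers per prompt and score each answer against its group's mean reward. A minimal illustrative sketch of that advantage computation (my own toy code, not DeepSeek's):

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages: for each prompt, sample G completions,
    then center and scale their rewards within the group. Dropping the
    learned value critic is what makes GRPO cheap."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

# Toy example: 2 prompts, G=4 sampled answers each, 0/1 correctness rewards
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 1.0, 0.0, 0.0]])
print(grpo_advantages(rewards))
```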

The LLM ecosystem: 🇨🇦

  • Cohere AI
    • Aya: a family of models from 8B to 32B parameters, multimodal, multilingual, etc.
      • Strube’s opinion: they were not so good in 2024
    • C4AI Command A: a 111B model
      • Text-only, never tried it

The Open LLM ecosystem: 🇺🇸

  • Google: Gemini is closed and state-of-the-art; Gemma is open and good
    • Gemini 2.5 released 24.03.2025, #1 on LM Arena
  • Microsoft has tiny, bad ones (but I wouldn’t bet against them - getting better)
    • They are also putting money into X’s Grok
  • Twitter/X has Grok-3 for paying customers; Grok-1 is enormous and “old” (from March 2024)
    • Colossus supercomputer has 200k GPUs, aiming for 1M
    • Grok was #1 on the leaderboard until yesterday
  • Apple is going their own way + using ChatGPT
    • Falling behind? Who knows?

The LLM ecosystem: 🇺🇸

  • Meta has Llama, and somehow it’s making money out of it
    • Training Llama 4 with lessons learned from DeepSeek
  • Anthropic is receiving billions from Amazon, Google, the Pentagon, etc., but Claude is completely closed
  • Amazon released its “Nova” models in February. 100% closed, interesting tiering (similar to blablador)
  • Nvidia has a bunch of interesting stuff, like accelerated versions of popular models and small speech/translation models, among others
  • OpenAI is not open at all. Published a new image generator model yesterday.

The “open” LLM ecosystem: 🇺🇸

  • Outside of Academia, there’s OLMo from AllenAI
    • Has training code, weights and data, all open
  • Intellect-1 was trained collaboratively
    • Up to 112 H100 GPUs simultaneously
    • They claim overall compute utilization of 83% across continents and 96% when training only in the USA
    • Fully open
  • Nous Research Hermes 3, announced in January 2025
  • Fine-tuned from Llama 3.1 on synthetic data
  • Training live, on the internet

The LLM ecosystem 🇪🇺

  • Mistral.ai just came out with a new small model
  • Fraunhofer/OpenGPT-X/JSC has Teuken-7B
  • DE/NL/NO/DK: TrustLLM
  • SE/ES/CZ/NL/FI/NO/IT: OpenEuroLLM

🇪🇺

  • A lot of people tell me: “HuggingFace is French and is a great success”
  • HuggingFace was based in NYC from the beginning

🇪🇺

  • At the beginning of February, the French government released their LLM.

(Side note: No model in blablador has ever said that the “square root of goat is one”)

Potato

Non-transformer architectures

  • First one was Mamba, in December 2023
  • Jamba from AI21 in March 2024
  • Last I checked, LFM 40B MoE from Liquid/MIT was the best one (from 30.09.2024)
  • Performs well on benchmarks
  • What about real examples?
  • Some mathematical discussion about whether they are Turing-complete (probably not)
  • Another example is Hymba from NVIDIA (a tiny 1.5B model); see the toy sketch below
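
For intuition: these models replace attention with a recurrent state-space update, so cost grows linearly with sequence length. A toy, non-selective sketch (my own illustration; real Mamba makes A, B, C input-dependent and uses a fast parallel scan):

```python
import torch

def ssm_scan(x, A, B, C):
    """Toy linear state-space layer: h_t = A h_{t-1} + B x_t, y_t = C h_t.
    Cost grows linearly with sequence length, unlike attention's O(n^2)."""
    batch, seqlen, _ = x.shape
    h = torch.zeros(batch, A.shape[0])
    ys = []
    for t in range(seqlen):
        h = h @ A.T + x[:, t] @ B.T  # recurrent state update
        ys.append(h @ C.T)           # readout for this time step
    return torch.stack(ys, dim=1)

# Shapes: x (batch, seq, dim), A (state, state), B (state, dim), C (dim, state)
x = torch.randn(2, 16, 8)
y = ssm_scan(x, torch.eye(4) * 0.9, torch.randn(4, 8), torch.randn(8, 4))
print(y.shape)  # torch.Size([2, 16, 8])
```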

Take the slides with you

https://go.fzj.de/2025-eum

Questions?

No dogs have been harmed for this presentation

Extra slides

LLMOps resource

A No-BS Database of How Companies Actually Deploy LLMs in Production: 300+ Technical Case Studies, Including Self-Hosted LLMs, at https://www.zenml.io/llmops-database

“I think the complexity of Python package management holds down AI application development more than is widely appreciated. AI faces multiple bottlenecks — we need more GPUs, better algorithms, cleaner data in large quantities. But when I look at the day-to-day work of application builders, there’s one additional bottleneck that I think is underappreciated: The time spent wrestling with version management is an inefficiency I hope we can reduce.”

Andrew Ng, 28.02.2024

“Building on top of open source can mean hours wrestling with package dependencies, or sometimes even juggling multiple virtual environments or using multiple versions of Python in one application. This is annoying but manageable for experienced developers, but creates a lot of friction for new AI developers entering our field without a background in computer science or software engineering.”

Andrew Ng, 28.02.2024

What can you do with it?

Like the slides? Want to use them?

Gitlab link to source code of the slides (needs JUDOOR account)

https://gitlab.jsc.fz-juelich.de/strube1/2025-01-talk-hai-retreat