Lysandre (@LysandreJik)

Lysandre@LysandreJik

2026.05.12 07:32

Non-AI-generated PR descriptions with explanatory graphs are actually so satisfying

Arthur Zucker@art_zucker

2026.05.12 07:30

To make amends for failing, here is a gift: a visualization of the attention mask for DSv4 CSA and HCA layers! 🤗


Lysandre Reposted

Arthur Zucker@art_zucker

2026.05.12 02:37

This is going to be a little bit long, but I want to give hope to my fellow anxious ML engineers. We see a lot of propaganda about how this or that AI one-shotted something, about how incredibly strong the models are getting, and how we don't even need to review PRs and can just ship to production. Although this can be true in some cases, it's also far from representative of all the challenges we have to face.

I started using Claude Code 4 months ago, and quickly realized how much it really does change the way we work. I can experiment 10x faster, fix small issues without coding, and refactor code without sweating. BUT, these tasks were "just" tedious, not hard. The challenge in my day-to-day work is to take research code and integrate it into transformers using our standards. It's challenging because code beauty is abstract and subjective, just like philosophy.

By relying too much on Claude, and on how seemingly good the code it produces looks, I pushed the deepseekv4 integration without realizing that Claude really did not understand the model. I gave it access to `transformers`, the original paper, the original code, the different blog posts, my past chats and the skills I created for adding a model, a B200 node, and a LOT of tokens, but it did NOT nail it. It did not understand the eager attention path, and it did not understand the basics of causal attention. It was even wrong in implementing the manifold-constrained hyper-connections. It helped reduce the burden of exploring implementations and debugging, but it did not help reason about the model.

I am not a doomer. I think our job as software engineers has never been this great. I am just saying that we still have a job, and we should still be a bit careful when it looks too good to be true 😉


Lysandre@LysandreJik

2026.05.11 14:32

I want to live in a world where I only need to think about the cost of my infra for my tooling to run; not my token limits. Excited to work with @onusoz on enabling open agent harnesses on local hardware: handle your own setup, ensure reliability, get more out of your agents.

Onur Solmaz@onusoz

2026.05.11 12:20

I have a new job! Excited to announce that I will be working with Hugging Face to make local models work great in OpenClaw and other open agent harnesses! I will be building in public and documenting everything along the way, stay tuned!



Lysandre Reposted

Arthur Zucker@art_zucker

2026.04.27 08:38

Reading @deepseek_ai's v4 paper... absolute hats off. Every problem has a mathematical solution, nothing is left to chance. I have so much respect for them, putting out months or years of effort entirely for free, in the open for anyone to benefit. Real goats 🫡


Lysandre Reposted

Isalia20@Is36E

2026.04.24 12:11

This marks the end of my first week at @huggingface! I'm joining as a founding engineer on HF's PyTorch team. My first project: safetensors on Mac is up to 3x faster 🚀 Parallel reads straight into MPS unified memory, no CPU staging.

MB Pro M5 Pro
- Cold 16 GB: **2.97 → 8.23 GB/s** (2.8×)
- Warm 3 GB: **10.3 → 26.6 GB/s** (2.6×)


Lysandre Reposted

merve@mervenoyann

2026.04.24 06:47

DSv4 genuinely shines with its 1M context window and its peak efficiency for running many agents/users 😍 It's coming to transformers shortly, and we're making sure you get all of that efficiency 🔥 @art_zucker


Lysandre Reposted

Pedro Cuenca@pcuenq

2026.04.20 17:44

Kimi K2.6 was released 1h ago, and it looks amazing! Here it's running with MLX (mlx-vlm) on two M3 Ultras (full 1T param VLM) 🔥


Lysandre Reposted

Pedro Cuenca@pcuenq

2026.04.16 15:01

🔈 Every model added to transformers has to be available on Apple Silicon 🍎 at once. We built a Skill and test harness for mlx-lm to get us closer 🔥 It's designed to help contributors AND support reviewers. Read on to see what we did and why it matters.


Lysandre@LysandreJik

2026.04.16 13:50

Great to see inference engines starting to leverage kernels on the Hub, in this case sglang. It's probably the easiest and fastest way to install flash attention and other specialized kernels right now.


Lysandre Reposted

Sayak Paul@RisingSayak

2026.04.14 09:25

We shipped a new repo type called "kernel" on the Hub. We want to democratize the whole ping-pong around packaging, distributing, and using custom kernels. This repo type is only available to a few community partners, @sgl_project being the first! Hop in 🧵 for more details.