Lysandre
@LysandreJik
Chief Open-Source Officer (COSO) at Hugging Face
636 Following    12K Followers
Non AI-generated PR descriptions with explanatory graphs are actually so satisfying
To make amends for failing, here is a gift: a visualization of the attention mask for DSv4 CSA and HCA layers! 🤗
This is going to be a little bit long, but I want to give hope to my fellow anxious ML engineers. We see a lot of propaganda about how this or that AI one-shotted something, about how incredibly strong the models are getting, and about how we don't even need to review PRs and can just ship to production. Although this can be true in some cases, it's also far from representative of all the challenges we have to face. I started using Claude Code 4 months ago, and quickly realized how it really does change the way we work. I can experiment 10x faster, fix small issues without coding, and refactor code without sweating. BUT, these tasks were "just" tedious, not hard. The challenge in my day-to-day work is to take research code and integrate it into transformers using our standards. It's challenging because code beauty is abstract and subjective, just like philosophy. By relying too much on Claude, and on how seemingly good the code it produces looks, I pushed the deepseekv4 integration without realizing that Claude really did not understand the model. I gave it access to `transformers`, the original paper, the original code, the different blog posts, my past chats and the skills I created to add a model, a B200 node, and a LOT of tokens, but it did NOT nail it. It did not understand the eager attention path, it did not understand the basics of causal attention. It was even wrong implementing the manifold-constrained hyper-connections. It helped reduce the burden of exploring the implementation and debugging, but it did not help reason about the model. I am not a doomer, I think our job as Software Engineers has never been this great. I am just saying that we still have a job, and we should still be a bit careful when it looks too good to be true 😉
I want to live in a world where I only need to think about the cost of my infra for my tooling to run; not my token limits. Excited to work with @onusoz on enabling open agent harnesses on local hardware: handle your own setup, ensure reliability, get more out of your agents.
I have a new job! Excited to announce that I will be working with Hugging Face to make local models work great in OpenClaw and other open agent harnesses! I will be building in public and documenting everything along the way, stay tuned!
Reading @deepseek_ai's v4 paper... absolute hats off. Every problem has a mathematical solution, nothing is left to chance. I have so much respect for them, putting out months or years of effort entirely for free, in the open for anyone to benefit. Real goats 🫡
This marks the end of my first week at @huggingface! I'm joining as a founding engineer on HF's PyTorch team. My first project: safetensors on Mac is up to 3x faster 🚀 Parallel reads straight into MPS unified memory, no CPU staging.
MB Pro M5 Pro
- Cold 16 GB: **2.97 → 8.23 GB/s** (2.8×)
- Warm 3 GB: **10.3 → 26.6 GB/s** (2.6×)
DSv4 genuinely shines in 1M context window and peak efficiency to run many agents/users 😍 shortly coming to transformers and we're making sure you get all the peak efficiency 🔥 @art_zucker
Kimi K2.6 was released 1h ago, and it looks amazing! Here it's running with MLX (mlx-vlm) on two M3 Ultras (full 1T param VLM) 🔥
🔈 Every model added to transformers has to be available on Apple Silicon 🍎 at once. We built a Skill and test harness for mlx-lm to get us closer 🔥 It's designed to help contributors AND support reviewers. Read on to see what we did and why it matters.
Great to see inference engines starting to leverage kernels on the Hub, in this case sglang. It's probably the easiest and fastest way to install flash attention and other specialized kernels right now.
We shipped a new repo type called "kernel" on the Hub. We want to democratize the whole ping-pong around packaging, distributing, and using custom kernels. This repo type is only available to a few community partners, @sgl_project being the first! Hop in 🧵for more details.
Super relevant post in the context of evals, and the difficulty of actually evaluating what you want