Xuan-Son Nguyen
@ngxson
Engineer @huggingface
237 Following    6.5K Followers
Qwen3.6-27B running 100% on WebGPU. Not the best speed but still 😁
I think Reachy is the one who needs chess lessons… 😅 Robotics meets WebAI: Gemma 4 running fully offline on WebGPU with Transformers.js, controlling Reachy Mini over WebSerial. No internet, just a browser and a USB-C cable. What should Reachy play next?
Surreal to see Reachy Mini on the cover of the last @LinusTech video!
My data point: working on two projects in parallel with Pi + llama.cpp + Qwen-3.6-35B-A3B (I prefer the MoE 🙈). This works on my M1 Max (64 GB), which I bought 4.5 years ago. "Works" as in "you can get work done", not just "runs for a demo".
Normal activity at GOSIM Paris: ❌ Give or attend a talk ❌ Networking 🤔 Watch cool Reachy Mini demo ✅ Show off your Home Assistant setup
Last week, we had a very playful yet productive off-site between @huggingface and @ggml_org. We brainstormed many UI/UX-related subjects, with many more to tackle in the near future! It was a pleasure to meet everyone IRL and visit the beautiful capital of Bulgaria 🌹 @julien_c @victormustar @ggerganov and Alek
Kinda funny to think about it, but a hash function is also deterministic. It's just not Turing complete.
Interesting article on treating agent output like compiler output (and why)
Come and watch our cool robot demo!
I'll be at GOSIM Paris May 5-6 with a Reachy Mini booth at @joinstationf, presenting the Reachy Mini Conversation App on May 6. Stop by and chat with the robot – don't forget to ask it to show you its dances! See you soon!
I'm giving a talk at GOSIM 2026 about llama.cpp. It will be a high-level overview of what we achieved in the past year. Get your ticket here -->
Taking my flight this WE too, will try 😁
This is where we are right now. And I’m not gonna lie, it feels pretty magical 🧚‍♀️ Qwen3.6 27B running inside of Pi coding agent via Llama.cpp on the MacBook Pro. For non-trivial tasks on the @huggingface codebases, this feels very, very close to hitting the latest Opus in Claude Code, or whatever shiny monopolistic closed source API of the day is. In full airplane mode. Most people haven’t realized this yet. If you have, it means you have a huge headstart to what I call the second revolution of AI. Powerful local models for efficiency, security, privacy, sovereignty 🔥
did you know that huggingface_hub (just the Python client) is sending almost 6B requests/week? wow 😮 @huggingface
We're opening a Hugging Face office in Tokyo! Our goal: help open-source AI develop in Japan and grow the local community. Let's meet!
I stopped using Claude Code on all of my llama.cpp workflows for the past few days. The quality degradation is just too significant. I'm experimenting with a mix of Gemma 4 26B-A4B and Gemini 3.1 Pro, and so far it is much better than what Anthropic can offer.
Shocking result on my pelican benchmark this morning, I got a better pelican from a 21GB local Qwen3.6-35B-A3B running on my laptop than I did from the new Opus 4.7! Qwen on the left, Opus on the right
opus 4.7 slightly more dangerous, slightly more expensive OR: run local models!
Given the right harness, you can just do everything you want
"But here is what we found when we tested: We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis. Eight out of eight models detected Mythos's flagship FreeBSD exploit, including one with only 3.6 billion active parameters costing $0.11 per million tokens. A 5.1B-active open model recovered the core chain of the 27-year-old OpenBSD bug."
Having a small break today! I'm taking a step back to reflect on my motivations and what I value when working on open source. Read my latest blog post 👇
llama.cpp now supports Qwen3-ASR, Qwen3-Omni and Gemma 4 audio/vision input 🔥 Mixed modalities are the future 😼😼
llama.cpp now supports various small OCR models that can run on low-end devices. These models are small enough to run on GPU with 4GB VRAM, and some of them can even run on CPU with decent performance. In this post, I will show you how to use these OCR models with llama.cpp 👇
While working on the pre-release support of Gemma 4, I was surprised by its capabilities for its size. We're only scratching the surface here; there is much more to discover about Gemma 4. I'm excited to see what the community will do with it in the next few days 🚀🚀