
Zihan "Zenus" Wang
@wzenus
Reasoning agent / RL / efficiency research @NorthwesternU & incoming @nvidia. Ex @Microsoft @yutori_ai @deepseek_ai @uiuc_nlp @RUC1937.
Joined March 2022
665 Following    23K Followers
[Long Tweet Ahead] I just have to say, I’m genuinely impressed by DeepSeek. 💡 It’s no wonder their reports are so elegant and fluff-free. Here’s what I noticed about their culture, a space where real innovation thrives, during my time there ↓

-----

🌟 1. Be nice and careful with talent
- The recruiting team seeks top talent from China and globally. Many are PhD / grad / undergrad students from China’s top 10 universities, e.g., Tsinghua / Peking University.
- Hiring is minimalist: my interview took only a few rounds. They basically check two criteria: Do you genuinely WANT to push fundamental AI problems forward? CAN you make it happen (at least one standout skill + solid skills to get things done)?
- Roles seem shaped around the talent instead of vice versa. Rather than “we need a role, so we find a talent”, they basically ask: “Here’s an exceptional talent; how can they contribute?” This can lead to something unconventional: they can hire someone with expertise in MBTI who ends up focusing on creating more personalized / role-playing models.
- Something basic: top-tier benefits in China, including for interns, so people can concentrate on their work and worry less about material concerns.

🤝 2. Individualized HR culture
- With the talent-first hiring approach above, even at a 200-person scale, I still feel everyone is unique; there is no standardization where anyone can be replaced like a cog in a machine.
- No pressure or forced KPIs. I hardly ever got a sense of “this must be done by this Thursday” from my mentor / seniors / colleagues.
- Being collaborative. DeepSeek tries its best to prevent racing against each other inside the company. It’s like everyone contributes to the final model with their own (orthogonal) ideas, and everyone hopes their idea is useful. If an idea proves useful, everyone celebrates and is happy about it.

⚙️ 3. Disentangled development systems
- DeepSeek covers a highly diverse set of talent directions. It’s like how “expert specialization” happens in their MoE models. People focus on what they’re best at, and it’s natural to ask others about things outside one’s own expertise. Helping others with one’s expertise is not something people do only after completing their own work.
- There is a shared basic pipeline that works pretty well for everyone. When a group adds new things to the system, they write really good documentation, so others can understand within a minute what changed and how it affects their own roles (most of the time it won’t affect their work; they just feel things improve automatically).
- Feedback loops are FAST: verifying whether an idea could work basically means testing it on the very latest simplified baseline. I strongly feel that whenever I have an idea in the morning, I can find out whether it’s effective by the afternoon: no organizational approval, no hard GPU utilization restrictions, little debugging (thanks to the rigorously debugged baseline), just seamlessly adding my own idea to the model. This makes working there super reflective and feedback-rich at the beginning of an idea, even if many ablations are required later to finally merge the idea into the giant model.

So all of the above makes the organization super spontaneous-person-friendly, and maybe this is why you can always trust their tech paths even when many improvements / ideas are applied in every single model release. I do appreciate such a disentangled organization, which enables fast and solid iterations on the model from different angles.

🌍 4. Diversity sparks innovation
It’s not really about something like “we must consider every party”. They pay attention to inclusion, but it’s not the biggest matter. The biggest matter is “How can people from diverse backgrounds contribute to the DeepSeek model?” I have many colleagues called know-it-alls (“百晓生”), a talent role that DeepSeek hires for. For an AI company, it’s interesting to see so many AI developers coming purely from literature / social science backgrounds. They know little about machine learning formulas, yet they can understand model training through their intuition about babysitting a child. It’s fun to discuss Zhenhuan Zhuan (a Chinese historical drama) during lunch and do a lot of thought experiments, like how to survive in a squid game. The initial idea of this talent role is to build a global knowledge base on history, culture, and science to expand AGI capabilities. Beyond that, I do feel they contribute to the working efficiency and idea nurturing of the whole team, at the very least by making everyone happy and more focused when getting back to work from lunch.

-----

Something random I hope to share at the end: it’s fun to solve challenges to realize individual value or get a sense of achievement. But it matters what “challenge” you are facing. The “challenge” here could just be “how to achieve AGI”. In that case, you actually do not need to worry too much about “what if this idea has been tried by someone else”, “what if someone achieves AGI faster than me”, “what if this idea is too simple”, or “what if someone gets paid more than me”, things many people are indeed worried about. When what someone cares about is achieving AGI, they can just relentlessly try whatever is really useful and incorporate it into the model.

-----

Resources and References:
Two interviews with DeepSeek founder Liang Wenfeng:
DeepSeek hiring ads:
And my experiences there.