vitalik.eth (@VitalikButerin): Updates since then:

* Deepseek v4 is out. There *is* a 2-bit quant that can run within 90 GB ( https://t.co/yM1HMZXkXn ), and it works; however, it's only fast on Apple hardware (I've heard ~35 tok/s). On AMD, it's ~7 tok/s. IMO actually taking the effort to properly support more than one hardware manufacturer is a great example of the difference between mere "decentralized AI" and genuine "CROPS AI". I hope we can become better at this.
* https://t.co/CFYF1smBH3 also has alpha Telegram support now. However, the path to adding your account is quite janky.
* https://t.co/za4h233eYz looks promising as a way to run "dense" models (e.g. Qwen 27B) more efficiently. It's janky, but on my 5090 laptop it seems to get ~2x more tok/s than llama.cpp.
* VoxTerm (local AI recording, no third-party servers) continues to be developed: https://t.co/GSdKzkD9Ql

And there are a lot more projects on the horizon.

One other thing that has been on my mind is that there's actually a lot of intersection between "CROPS ethereum access layer" and "CROPS AI". For example, we want a ZK way to make (paid) calls to remote LLMs. But if we have this, then it's just as useful for solving another problem: private RPC reads in Ethereum.

Another example: application-specific finetuned LLMs. Leanstral ( https://t.co/ilfww8ekJu ; I get ~38 tok/s on AMD) fits into < 70 GB, but can hold its own against 1T models on writing Lean code. Things like this are a huge boon for writing more secure code ( https://t.co/6YPWgVSzCg ). We should have models finetuned for Ethereum-related use cases as well.

2026.05.27 20:51
