Understand entire hour-long videos and wield tools and search: an efficient multimodal model with 30B total params but only 3B active at inference

🎬 Title: Kwai Keye-VL-2.0 Technical Report
URL:

🎬 Overview
An open-source multimodal foundation model from Kuaishou, built for long-video understanding and agentic intelligence. It is a Mixture-of-Experts (MoE) model with 30B total parameters, of which only 3B are activated at inference.

❓ Challenges Solved
Processing hour-level videos demands enormous compute.
・The large number of frames makes long-range temporal dependencies hard to capture
・The goal was to ease that compute constraint while keeping strong performance across diverse tasks

💡 Methodology & Proposed Approach
・Long-context: adapts DeepSeek Sparse Attention (DSA) to GQA-based architectures for lossless 256K context processing, capturing key frames and long-range temporal dependencies
・Infrastructure: scalable video I/O, heterogeneous ViT-LM parallelism, custom DSA kernels
・Training: Cross-Modal Multi-Teacher On-Policy Distillation (MOPD) with Context-RL and Video-RL to address catastrophic forgetting during multi-task alignment

📊 Experimental Results
・State-of-the-art among models of similar scale
・Especially strong on fine-grained temporal localization (TimeLens)
・Excels at long-video comprehension on Video-MME-v2 and LongVideoBench
・Also capable of multimodal agent collaboration across Code, Tool, and Search, with self-correction

🌍 Use Cases
It fits long-video understanding, search, and moderation, and can serve as a backbone for autonomous agents that handle video. Its main strength, as the first application of sparse attention to multimodal models at this scale, is making hour-level video processing cost-realistic.

#VideoUnderstanding# #Multimodal#
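
For readers unfamiliar with sparse attention, here is a rough, generic sketch of the idea: each query attends to only its k highest-scoring keys instead of the full context. This is not the DSA kernel or anything taken from the Keye-VL report. The function name `topk_sparse_attention` and the toy vectors are hypothetical, and the full dot product used for scoring stands in for the cheaper selection step a real system would use.

```python
import math

def topk_sparse_attention(query, keys, values, k):
    """Attend from one query to only the k highest-scoring keys (illustrative only)."""
    scale = 1.0 / math.sqrt(len(query))
    # Score every key. Assumption: a real system would use a cheaper scorer here.
    scores = [scale * sum(q * x for q, x in zip(query, key)) for key in keys]
    # Keep the indices of the k largest scores.
    keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the kept scores only (subtract the max for numerical stability).
    top = max(scores[i] for i in keep)
    weights = [math.exp(scores[i] - top) for i in keep]
    total = sum(weights)
    # Weighted sum of the kept value vectors.
    dim = len(values[0])
    out = [0.0] * dim
    for w, i in zip(weights, keep):
        for d in range(dim):
            out[d] += (w / total) * values[i][d]
    return out, sorted(keep)

# Toy example: four "frames", of which only two are kept.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.9, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [0.0, 1.0]]
out, kept = topk_sparse_attention(query, keys, values, k=2)
```

With k fixed, the softmax and weighted sum touch k entries per query rather than the whole context, which is the kind of saving that makes very long contexts more affordable.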