Search Retrieval on X — X Web Viewer

2025.11.19 23:17

Grok 4.1 Fast excels at real-time info retrieval and deep research. Paired with native X integration, code execution, and advanced web browsing, Grok 4.1 Fast + Agent Tools API tops agentic search benchmarks.

cv usk @cv_usk

2026.06.15 22:31

Make classic BM25 search smarter without rebuilding expensive neural indexes, by optimizing query rewriting one token at a time. A genuinely clever approach 🔎
Title: STORM: Stepwise Token Optimization with Reward-Guided Beam Search
URL:
🔎 Overview
STORM trains a query-rewriting model guided by retrieval quality. At each generated token, it scores candidate expansions against a BM25 index, concentrating exploration on the vocabulary that actually improves search.
❓ Challenges Solved
Modern retrieval leans on dense and learned-sparse neural models, while lexical methods like BM25 are fast but weak on synonyms and paraphrases.
・Dense neural retrievers need expensive index rebuilds whenever the model changes
・LLM query rewriting tends to produce well-formed terms that are ineffective or even harmful for retrieval
・Training gives only delayed sequence-level feedback, obscuring which individual terms actually helped
💡 Methodology & Proposed Approach
・Self-supervised training via reward-guided beam search driven by retrieval performance
・At each token, candidate expansions are scored against BM25 and low performers are pruned
・This turns retrieval metrics into token-level signals, focusing search on effective vocabulary
・Because it works on BM25 indexes, no neural index rebuilding is needed and infrastructure stays light
📊 Experimental Results
・0.6B-8B models match or exceed competitive LLM rewriters
・Maintains BM25's speed advantage
・The 8B variant rivals much larger proprietary systems
・Zero-shot transfer to 18 languages (MIRACL) beats dedicated multilingual dense retrievers on average
🌍 Use Cases
It fits search stacks that want to avoid index-rebuild costs, and systems that need cheap multilingual boosts. Since it lifts performance while keeping an existing BM25 pipeline, it's an easy-to-adopt answer for teams running search in production.
#Retrieval# #BM25#
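
The core loop (score each candidate expansion against BM25, keep the best, prune the rest) can be sketched in Python. This is a simplified illustration under stated assumptions, not the paper's implementation: whole words stand in for tokens, a fixed candidate list stands in for the rewriting model's proposals, and the reward is the reciprocal rank of one known relevant document under a small in-memory BM25. `BM25`, `reward`, `beam_search_rewrite`, and the toy corpus are hypothetical names and data.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased whitespace split: no stemming, so "vehicle" does not match "vehicles".
    return text.lower().split()


class BM25:
    """Okapi BM25 over a small in-memory corpus."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.tfs = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lengths) / len(self.tfs)
        self.k1, self.b = k1, b
        n = len(self.tfs)
        df = Counter(term for tf in self.tfs for term in tf)  # documents containing each term
        # Non-negative IDF variant (the "+ 1" inside the log, as used by Lucene).
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: list[str], i: int) -> float:
        tf, norm = self.tfs[i], 1 - self.b + self.b * self.lengths[i] / self.avgdl
        return sum(
            self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + self.k1 * norm)
            for t in query
            if t in tf
        )


def reward(index: BM25, query: list[str], relevant: int) -> float:
    """Reciprocal rank of the relevant document; 0.0 if the query does not match it at all."""
    scores = [index.score(query, i) for i in range(len(index.tfs))]
    if scores[relevant] == 0:
        return 0.0
    rank = 1 + sum(s > scores[relevant] for s in scores)  # ties do not push it down
    return 1.0 / rank


def beam_search_rewrite(index, query, candidates, relevant, beam_width=2, steps=2):
    """Extend each beam by one term per step, score by retrieval reward, prune to beam_width."""
    start = tokenize(query)
    beams = [(start, reward(index, start, relevant))]
    best = beams[0]
    for _ in range(steps):
        expanded = [
            (terms + [c], reward(index, terms + [c], relevant))
            for terms, _ in beams
            for c in candidates
            if c not in terms
        ]
        if not expanded:
            break
        expanded.sort(key=lambda x: x[1], reverse=True)
        beams = expanded[:beam_width]  # low-reward expansions are pruned here
        if beams[0][1] > best[1]:
            best = beams[0]
    return best


corpus = [
    "automobile maintenance guide for engine service",  # the relevant document (id 0)
    "car dealership prices and financing",
    "home repair tips for plumbing",
    "engine oil change schedule for vehicles",
]
index = BM25(corpus)
candidates = ["automobile", "maintenance", "pizza", "vehicle", "plumbing"]
terms, r = beam_search_rewrite(index, "car repair", candidates, relevant=0)
print(terms, r)
```

On this toy corpus, "car repair" never reaches the automobile maintenance guide (reward 0.0); one added term lifts it to rank 3 and a second to rank 1, while "pizza" earns nothing and is pruned. How STORM turns these rewards into training signal for the rewriting model is not reproduced here.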

Caura @CauraAI

2026.05.20 11:11

Governed shared memory for AI agent fleets. Open source. Apache 2.0. Hybrid retrieval, contradiction detection, audit trail, 12 MCP tools. Five minutes from `git clone` to a running stack — on your laptop, your cluster, or your air-gapped network.

Kimi.ai @Kimi_Moonshot

2026.02.03 14:46

We're introducing WorldVQA, a new benchmark to measure atomic vision-centric world knowledge in Multimodal Large Language Models. Current evaluations often conflate visual knowledge retrieval with reasoning. In contrast, WorldVQA decouples these capabilities to strictly measure "what the model memorizes." The benchmark consists of 3,500 VQA pairs across 9 categories, with careful attention to linguistic and cultural diversity:

cv usk @cv_usk

2026.06.18 02:56

# Practices for Embedding AI Agents in Software
# Context Budget Allocator
🎯 The Hook
"Just put everything in the context window" sounds reasonable until your costs spike, your system instructions get pushed out, and the LLM ignores the most important retrieved documents buried in the middle.
🔥 The Problem
In RAG-powered agents, search results, conversation history, system instructions, and long-term memory all compete for the same finite token window. More input means higher cost but not necessarily better output. As conversations grow, system instructions make up a shrinking share of the context and behavior degrades. The "Lost in the Middle" phenomenon means information placed in the center of a long context gets less attention than content at the beginning or end.
💡 The Pattern
Divide the context window into named slots (system instructions, retrieval, history, memory), each with a maximum token ratio and a priority. Reserve system instructions as a non-compressible fixed slot at 10-20%. Cap retrieval results at a reranked top-k of 3-8 documents. Compress conversation history via summarization when window usage exceeds a threshold. Arrange content to counter Lost in the Middle: critical information first, recent user input last. The higher the cost sensitivity, the tighter the top-k, the lower the compression threshold, and the shorter the history retention.
✅ When to Use
Use when:
- RAG or memory is active and candidate content could exceed 50% of the model's context window
- Cost sensitivity is medium or higher, with token volume affecting both cost and inference latency
- Multi-turn conversations accumulate history that crowds out other content types
Don't use when:
- Input is just system instructions plus a single user message, fitting within 30% of the window
- Using a long-context model with input under 20% of the window and low cost sensitivity
⚠️ Pitfalls
- Never compress system instructions. Losing tool definitions or safety rules breaks agent behavior entirely.
- Raw top-k without reranking has low signal density. Retrieve 20 candidates, then rerank to 3-8 with a cross-encoder.
- Summarization is lossy: key decisions and proper nouns can vanish. Combine it with keyword extraction to preserve critical terms.
🔧 Implementation Approach
- Model the context window as named slots (system/user/retrieval/history/memory) with a struct defining max token ratio, priority, and compressibility per slot
- Reserve system instructions as the highest-priority non-compressible fixed allocation, then distribute the remaining budget to the other slots in descending priority order
- Cap retrieval content by reranking vector search candidates with a cross-encoder before fitting them within the slot budget, maximizing signal density
- Trigger summarization on the history slot when it exceeds its budget, combined with keyword extraction to prevent loss of critical terms
#AIAgents# #SoftwareArchitecture#
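
A minimal Python sketch of the slot model, under these assumptions: whitespace word counts stand in for a real tokenizer, retrieval documents arrive already reranked best-first, and dropping the oldest history turns stands in for summarization plus keyword extraction. The slot ratios and priorities are illustrative, a memory slot is omitted for brevity, and treating the user message as non-compressible is an added assumption. `Slot`, `allocate`, `fit_items`, and `build_context` are hypothetical names, not a library API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    name: str
    max_ratio: float  # maximum share of the context window
    priority: int  # higher priority receives budget first
    compressible: bool


def count_tokens(text: str) -> int:
    # Assumption: one whitespace-separated word ~ one token; use the model's tokenizer in practice.
    return len(text.split())


def allocate(slots: list[Slot], demand: dict[str, int], window: int) -> dict[str, int]:
    """Reserve non-compressible slots first, then fill the others by descending priority."""
    fixed = [s for s in slots if not s.compressible]
    flexible = sorted((s for s in slots if s.compressible), key=lambda s: -s.priority)
    budgets, remaining = {}, window
    for slot in fixed + flexible:
        cap = min(int(slot.max_ratio * window), remaining)
        want = demand.get(slot.name, 0)
        if not slot.compressible and want > cap:
            # Never trim a fixed slot (e.g. system instructions) to make room.
            raise ValueError(f"slot {slot.name!r} needs {want} tokens but only {cap} are available")
        budgets[slot.name] = min(want, cap)
        remaining -= budgets[slot.name]
    return budgets


def fit_items(items: list[str], budget: int, keep_newest: bool) -> list[str]:
    """Keep whole items in order until the budget runs out.

    Retrieval docs are assumed reranked best-first (keep_newest=False); history turns are
    oldest-first, so keep_newest=True walks from the end. Dropping items stands in for
    summarization + keyword extraction.
    """
    ordered = items[::-1] if keep_newest else items
    kept, used = [], 0
    for item in ordered:
        cost = count_tokens(item)
        if used + cost > budget:
            break  # stop at the first item that does not fit, preserving rank/recency order
        kept.append(item)
        used += cost
    return kept[::-1] if keep_newest else kept


def build_context(system: str, docs: list[str], history: list[str], user: str,
                  slots: list[Slot], window: int) -> list[str]:
    demand = {
        "system": count_tokens(system),
        "user": count_tokens(user),
        "retrieval": sum(map(count_tokens, docs)),
        "history": sum(map(count_tokens, history)),
    }
    budgets = allocate(slots, demand, window)
    kept_docs = fit_items(docs, budgets["retrieval"], keep_newest=False)
    kept_history = fit_items(history, budgets["history"], keep_newest=True)
    # Counter "lost in the middle": instructions and top evidence first, latest user input last.
    return [system, *kept_docs, *kept_history, user]


# Illustrative slot table: ratios and priorities are assumptions, not recommendations.
SLOTS = [
    Slot("system", 0.15, 100, compressible=False),
    Slot("user", 0.10, 90, compressible=False),
    Slot("retrieval", 0.45, 80, compressible=True),
    Slot("history", 0.30, 70, compressible=True),
]

system = "You are a support agent. Cite sources."
user = "How do I reset my password?"
docs = [f"doc{i} " + "word " * 19 for i in range(5)]  # 20 tokens each, reranked best-first
history = [f"turn{i} " + "word " * 9 for i in range(6)]  # 10 tokens each, oldest first
context = build_context(system, docs, history, user, SLOTS, window=100)
print([part.split()[0] for part in context])
```

Raising an error when a non-compressible slot overflows, rather than trimming it, reflects the first pitfall: system instructions are never silently cut. With a 100-token window this keeps the top two documents and the three newest turns, and puts the user message last.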

cv usk @cv_usk

2026.06.17 01:04

Harness Engineering Anti-Patterns
AP1. The Context Hoarder
🎯 Point
"Include everything just in case": that feeling of safety is silently killing your agent's performance. More information isn't safer; it's often harmful.
❗ Problem
The context window overflows with irrelevant information, diluting the agent's attention. Important information gets buried in the middle, leaving insufficient space for the code and specs the agent actually needs. The result: degraded judgment accuracy, with cost and latency worsening super-linearly.
🔍 Mechanism & Symptoms
This anti-pattern is seductive because the intuition "more info = safer" is strong. Designing retrieval (what to fetch and when) takes effort, so stuffing everything in feels easier. But the context window is a scarce resource, like a CPU's L1 cache, and its utility does not grow monotonically with the amount of content. Beyond a threshold, irrelevant tokens dilute the attention mechanism, causing "lost in the middle": critical info buried in the center gets ignored. Symptoms include injecting entire repositories, full conversation histories, or every tool definition on every call, and "the agent inexplicably ignores information it already read."
📋 Scenarios
- An issue-to-PR agent is fed not just the issue but the entire repo's README, config files, and past PR history. The agent misses the issue's key points and starts editing unrelated files.
- A migration agent receives thousands of files at once, overflowing its context and losing consistency mid-task.
- In pair programming, unopened files and long conversation history consume context, slowing responses.
🛡 How to Avoid
- Measure context usage by category and visualize the allocation like a memory profiler
- Default to pull (let the agent fetch what it needs) and limit push (forced injection) to invariants that are fatal to violate
- Summarize and compress older conversation turns; dynamically load only task-relevant tool definitions
- When you catch yourself thinking "include everything for safety," recognize that as this anti-pattern's signature
#HarnessEngineering# #AIAgent#
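
A minimal Python sketch of two of these countermeasures: a per-category token tally in the spirit of a memory profiler, and pull-style tool selection. Assumptions: whitespace word counts stand in for a real tokenizer, the category and tool names are hypothetical, and naive word overlap stands in for whatever relevance signal (embeddings, a router model) a real harness would use. `ContextProfiler` and `select_tools` are illustrative names, not an existing library.

```python
from collections import Counter


def count_tokens(text: str) -> int:
    # Assumption: whitespace words approximate tokens; use the model's tokenizer in practice.
    return len(text.split())


class ContextProfiler:
    """Tallies tokens injected per category, the way a memory profiler tallies allocations."""

    def __init__(self, window: int):
        self.window = window
        self.usage = Counter()

    def add(self, category: str, text: str) -> None:
        self.usage[category] += count_tokens(text)

    def hotspots(self, threshold: float) -> list[str]:
        """Categories whose share of the window exceeds `threshold` (0.0-1.0)."""
        return [c for c, n in self.usage.most_common() if n / self.window > threshold]

    def report(self) -> str:
        rows = self.usage.most_common() + [("total", sum(self.usage.values()))]
        return "\n".join(f"{c:<12}{n:>8} tok {100 * n / self.window:6.1f}%" for c, n in rows)


def select_tools(task: str, tools: dict[str, str], limit: int = 3) -> list[str]:
    """Pull-style loading: offer only tools whose descriptions share words with the task."""
    task_words = set(task.lower().split())
    scored = [
        (len(task_words & set(desc.lower().split())), name) for name, desc in tools.items()
    ]
    relevant = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [name for _, name in relevant[:limit]]


profiler = ContextProfiler(window=1000)
profiler.add("repo_dump", "x " * 600)  # e.g. an entire README plus config files
profiler.add("issue", "y " * 50)
profiler.add("history", "z " * 200)
print(profiler.report())
print("hoarding suspects:", profiler.hotspots(threshold=0.25))

TOOLS = {
    "git_diff": "show changes in the git repository",
    "run_tests": "run the unit tests for the project",
    "send_email": "send an email message",
}
print(select_tools("fix failing unit tests", TOOLS))  # ['run_tests']
```

Here the profiler flags the repository dump as taking 60% of the window while the issue itself gets 5%, which is the hoarding signature described above.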

cv usk @cv_usk

2026.06.16 21:38

Grounding an AI agent's answers in verifiable, explainable facts: an open-source platform offering the full knowledge-graph + GraphRAG + agent stack 🕸️
Title: trustgraph-ai/trustgraph
URL:
🕸️ Overview
An open-source semantic deployment platform for AI agents. Its core is the "context graph," a structured, queryable representation of domain knowledge. It delivers the full agentic stack (context graphs, memory, retrieval, orchestration, and inference) for deterministic agent workloads.
❓ Challenges Solved
With an LLM alone, it's hard to trace why you got an answer, and hallucination is a risk.
・Grounding an agent's answers in verifiable, explainable facts is difficult
・TrustGraph combines knowledge-graph construction with GraphRAG so agents access context that is semantically rich and verifiable
・It runs in private deployments with sovereign control
💡 Key Features
・Multi-model DB (tabular, KV, document, graph, vectors) with multimodal support and automated entity/relationship extraction
・DocumentRAG, GraphRAG, and OntologyRAG pipelines, plus 3D GraphViz visualization
・Single- and multi-agent setups with ReAct, Plan-then-Execute, and Supervisor patterns, plus MCP integration
・Context Cores: bundles of schema, graph, embeddings, evidence, and retrieval policies, so context is versioned like code
🌍 Tech Stack / Usage
Storage on Cassandra, Qdrant, and Garage; messaging via Pulsar and others; LLMs from Anthropic, OpenAI, Google, etc., plus local inference (vLLM, Ollama, etc.). Configure via `npx @trustgraph/config` and use the UI on port 8888. Apache 2.0 licensed.
#GraphRAG# #KnowledgeGraph#
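
A minimal, library-agnostic Python sketch of the GraphRAG grounding idea: facts are stored as subject-predicate-object triples that keep their source, entities named in the question seed a breadth-first expansion over the graph, and the retrieved facts are passed to the LLM with citations so an answer can be traced back. This does not use TrustGraph's API; `Fact`, `find_seeds`, `retrieve_subgraph`, `format_context`, and the sample facts are hypothetical, and naive substring matching stands in for real entity linking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    source: str  # provenance: the document the fact was extracted from


def find_seeds(question: str, entities: list[str]) -> list[str]:
    # Naive substring matching stands in for real entity linking.
    q = question.lower()
    return [e for e in entities if e.lower() in q]


def retrieve_subgraph(facts: list[Fact], seeds: list[str], hops: int = 2) -> list[Fact]:
    """Breadth-first expansion: collect facts within `hops` edges of the seed entities."""
    frontier, visited, selected = set(seeds), set(seeds), []
    for _ in range(hops):
        next_frontier = set()
        for fact in facts:
            if fact in selected or not ({fact.subject, fact.obj} & frontier):
                continue
            selected.append(fact)
            for entity in (fact.subject, fact.obj):
                if entity not in visited:
                    visited.add(entity)
                    next_frontier.add(entity)
        frontier = next_frontier
    return selected


def format_context(facts: list[Fact]) -> str:
    # Every line carries its source so the answer can cite and be checked against evidence.
    return "\n".join(f"{f.subject} {f.predicate} {f.obj} [source: {f.source}]" for f in facts)


facts = [
    Fact("Pump-7", "located_in", "Plant A", "maintenance_log.pdf"),
    Fact("Plant A", "operated_by", "Acme Corp", "org_chart.xlsx"),
    Fact("Pump-9", "located_in", "Plant B", "maintenance_log.pdf"),
]
entities = sorted({f.subject for f in facts} | {f.obj for f in facts})
question = "Who operates the site where Pump-7 is located?"
seeds = find_seeds(question, entities)
print(format_context(retrieve_subgraph(facts, seeds, hops=2)))
```

Two hops are needed here: the first links Pump-7 to Plant A, the second links Plant A to its operator, and the unrelated Pump-9 fact stays out of the context.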
