Practical AI Methodology Meets Cognitive Science
AI/ML Reading List
Curated links with summaries.
- Incentive-Aligned Multi-Source LLM Summaries
Researchers propose a novel Truthful Text Summarization (TTS) framework that incentivizes accurate source reporting in multi-source LLM synthesis. The approach mathematically structures source validation to improve factual robustness without relying on ground-truth labels.
- Probabilistic distances-based hallucination detection in LLMs with RAG
Researchers propose a novel probabilistic method for detecting hallucinations in retrieval-augmented generation (RAG) systems by analyzing token embedding distributions. The approach offers an unsupervised, computationally efficient technique for improving LLM reliability.
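The summary does not give the paper's exact probabilistic formulation, but the core idea (scoring generated tokens by how far their embeddings sit from the retrieved context) can be sketched with a toy distance score. Everything here (`hallucination_score`, the 2-D embeddings) is illustrative, not the paper's method:

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def hallucination_score(generated_embs, context_embs):
    # Mean distance from each generated-token embedding to its
    # nearest retrieved-context embedding; larger = less grounded.
    dists = [min(euclidean(g, c) for c in context_embs)
             for g in generated_embs]
    return sum(dists) / len(dists)

# Toy 2-D embeddings: grounded tokens sit near the retrieved
# context, a hallucinated token does not.
context = [[0.0, 0.0], [1.0, 0.0]]
grounded = [[0.1, 0.0], [0.9, 0.1]]
hallucinated = [[5.0, 5.0], [0.1, 0.0]]

assert hallucination_score(grounded, context) < hallucination_score(hallucinated, context)
```

An unsupervised score like this needs no labels, which is what makes the approach cheap relative to training a hallucination classifier.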
- When Can Transformers Count to n?
Researchers discovered a critical performance threshold in transformers where embedding dimension relative to vocabulary size determines the model's ability to perform basic counting tasks. The study reveals fundamental architectural constraints that could impact future model design and understanding of transformer capabilities.
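One intuition for why such a threshold appears (this is an illustration, not the paper's construction): counting-by-summation works cleanly when token embeddings are orthogonal, which requires the embedding dimension to be at least on the order of the vocabulary size. With one-hot embeddings, summing over the sequence recovers exact counts:

```python
def one_hot(i, d):
    v = [0.0] * d
    v[i] = 1.0
    return v

def count_token(tokens, token_id, vocab_size):
    # With d = vocab_size, embeddings are orthogonal, so the sum of
    # token embeddings holds each token's exact count in its own
    # coordinate. When d < vocab_size, embeddings must overlap and
    # counts interfere.
    d = vocab_size
    total = [0.0] * d
    for t in tokens:
        total = [a + b for a, b in zip(total, one_hot(t, d))]
    return total[token_id]

assert count_token([0, 2, 2, 1, 2], token_id=2, vocab_size=3) == 3.0
```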
- Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering
ML-Master 2.0 introduces Hierarchical Cognitive Caching to enable ultra-long-horizon autonomous scientific reasoning by dynamically distilling execution traces into stable knowledge. The research demonstrates marked gains in AI agents' ability to maintain strategic coherence over extended experimental cycles.
- 1-2-3 Check: Enhancing Contextual Privacy in LLM via Multi-Agent Reasoning
Researchers develop a multi-agent framework that decomposes privacy reasoning to reduce information leakage in large language models by up to 19%. The approach offers a systematic method for detecting and preventing contextual privacy breaches across different information flow topologies.
- CCCaption: Dual-Reward Reinforcement Learning for Complete and Correct Image Captioning
Researchers propose CCCaption, a reinforcement learning framework that optimizes image captioning by independently measuring caption completeness and correctness using large vision-language models. The approach offers a systematic method to generate more accurate and comprehensive image descriptions beyond traditional human-annotated references.
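CCCaption scores the two axes with large vision-language models as judges; as a stand-in, the dual-reward structure can be sketched over object sets, where completeness is recall of the image's objects and correctness is precision of the objects the caption mentions. The `dual_reward` function and the `alpha` weighting are illustrative assumptions:

```python
def dual_reward(caption_objects, image_objects, alpha=0.5):
    # Completeness: how much of the image the caption covers (recall).
    # Correctness: how much of the caption is actually in the image
    # (precision). Measured independently, then combined.
    caption_objects, image_objects = set(caption_objects), set(image_objects)
    hit = len(caption_objects & image_objects)
    completeness = hit / len(image_objects)
    correctness = hit / len(caption_objects) if caption_objects else 0.0
    return alpha * completeness + (1 - alpha) * correctness

# Naming every object with no extras maximizes both terms.
assert dual_reward({"dog", "ball"}, {"dog", "ball"}) == 1.0
# Hallucinating "cat" lowers correctness but not completeness.
assert dual_reward({"dog", "ball", "cat"}, {"dog", "ball"}) < 1.0
```

Separating the two rewards is the point: a single overlap score lets a policy trade hallucinations for coverage, while independent terms penalize each failure mode on its own.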
- Language Models Exhibit Inconsistent Biases Towards Algorithmic Agents and Human Experts
Researchers investigated how large language models process information from human experts versus algorithmic agents, uncovering inconsistent trust behaviors. The study reveals complex biases in LLMs that could have significant implications for AI deployment in critical decision-making contexts.
- ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning
Researchers develop ARLArena, a comprehensive framework for analyzing and stabilizing agentic reinforcement learning, introducing the SAMPO method to mitigate training instabilities. The work provides a unified approach to policy gradient design with implications for training large language model-based agents.
- Large Language Model Compression with Global Rank and Sparsity Optimization
Researchers propose a new method for compressing large language models using global rank and sparsity optimization, addressing key challenges in model compression. The approach automatically detects layer redundancy and manages interactions between sparse and low-rank components.
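The paper optimizes rank and sparsity globally across layers; as a minimal single-matrix sketch of the underlying low-rank-plus-sparse idea, one can split a weight matrix into a truncated-SVD part and a thresholded residual, W ≈ L + S. The rank `r` and threshold here are arbitrary illustrative choices, not the paper's procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "weight matrix" with genuine low-rank structure plus an
# outlier entry that a pure low-rank factorization would miss.
W = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 64))
W[3, 7] += 10.0

# Low-rank component: truncated SVD at rank r.
r = 8
U, s, Vt = np.linalg.svd(W, full_matrices=False)
L = (U[:, :r] * s[:r]) @ Vt[:r]

# Sparse component: keep only large entries of the residual.
R = W - L
S = np.where(np.abs(R) > 1.0, R, 0.0)

# L + S reconstructs W well, and S stays sparse.
err = np.linalg.norm(W - (L + S)) / np.linalg.norm(W)
assert err < 0.1
assert np.count_nonzero(S) < W.size // 10
```

Storing `L` as two thin factors plus a sparse `S` is what yields the compression; the hard part the paper addresses is choosing ranks and sparsity budgets jointly across all layers rather than per matrix.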
- A General Equilibrium Theory of Orchestrated AI Agent Systems
Researchers develop a general equilibrium theory for orchestrated AI agent systems using advanced mathematical economics frameworks. The work provides a theoretical foundation for understanding complex interactions between large language model agents under centralized coordination.
- Whether Anthropic holds its ground is itself training material.
- I feel that Claude is better at teaching than ChatGPT
- I built a load tester for MCP servers - useful if you're connecting tools to Claude
- “Naming my son Claude”
- How Claude conquered Washington, Wall Street and Silicon Valley
Anthropic's Claude is transforming industry landscapes in national security, finance, and tech development, challenging existing power structures through advanced AI capabilities. The piece reveals emerging tensions between AI innovation and institutional constraints across multiple sectors.
- Claude Status Update: Elevated error rates across multiple models (2026-02-25)
- Anyone else having trouble with claude this morning?
- Official: Anthropic acquires Vercept AI to advance Claude's computer use capabilities
- The Pentagon Threatens Anthropic
- Gemini Can Now Book You an Uber or Order a DoorDash Meal on Your Phone. Here’s How It Works
- Anthropic ditches its core safety promise in the middle of an AI red line fight with the Pentagon
Anthropic appears to have modified its core safety commitments, potentially signaling a strategic realignment with defense/government interests. This development could have broader implications for AI safety principles and corporate ethics in AI development.
- I open-sourced the MCP server and prose scanner I built for my 301k-word novel project: fiction-forge gives Claude Code real-time access to your story bible, characters, and continuity rules
A developer open-sourced a Model Context Protocol server and prose scanning system for AI-assisted novel writing, providing tools for continuity checking, pattern detection, and parallel editing workflows. The project offers insights into advanced prompt engineering and AI collaboration techniques for large-scale creative writing.
- [Discussion] A notation for contextual inference in probabilistic models
A new notation (D ⊙ M(ψ)) is proposed to explicitly represent contextual conditioning in probabilistic inference, aiming to clarify how observations are interpreted relative to modeling assumptions. The notation seeks to enhance transparency in machine learning and scientific modeling frameworks by making contextual integration more explicit.
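One plausible reading of the proposal in standard Bayesian terms (the discussion's exact semantics may differ) is that ⊙ lifts the usual implicit conditioning on a model into the notation itself:

```latex
% Standard conditioning leaves the model/context implicit:
%   p(\theta \mid D)
% The proposed notation makes the contextual model M with
% assumptions \psi explicit in the conditioning:
\[
  p(\theta \mid D \odot M(\psi))
  \;=\;
  \frac{p(D \mid \theta, M(\psi))\, p(\theta \mid M(\psi))}
       {p(D \mid M(\psi))}
\]
```

Written this way, changing the modeling assumptions \(\psi\) visibly changes every term, which is the transparency the notation is after.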
- Riley Walz, the Jester of Silicon Valley, Is Joining OpenAI
- Hegseth threatens to force AI firm to share tech, escalating Anthropic standoff
- Official: An update on model deprecation commitments for Claude Opus 3
- Claude Opus 3 is being deprecated, and getting a blog!
- Judge: xAI can’t claim OpenAI stole trade secrets just by hiring ex-staffers
- Claude Code with subagents inside subagents cooked for 3 days - Delivered 3D renderer that draws with terminal symbols
A developer used Claude AI with multiple nested subagents to build 'tortuise', a terminal-based 3D renderer using Gaussian splatting techniques. The project highlights advanced AI collaborative coding strategies and produced an innovative open-source rendering tool.
- Scoop: Pentagon takes first step toward blacklisting Anthropic
The Pentagon is considering blacklisting Anthropic as a supply chain risk due to disagreements over AI model usage in military contexts. The conflict centers on Anthropic's refusal to remove safeguards around surveillance and autonomous weapons deployment.
- Sensory-Motor Control with Large Language Models via Iterative Policy Refinement
Researchers develop a method for large language models to generate and iteratively refine control policies for physical agents by mapping observations to actions. The approach shows promise for translating symbolic reasoning into sub-symbolic motor control across multiple benchmark environments.
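The generate-evaluate-refine loop can be sketched with the LLM stubbed out. In the paper the policy proposals come from a language model conditioned on rollout feedback; here `propose_policy` and the gain sweep are stand-ins for that step, and the 1-D plant is a toy environment:

```python
def propose_policy(gain):
    # Stand-in for an LLM emitting a control policy as code; the
    # "policy" here is a simple proportional controller.
    return lambda obs: -gain * obs

def rollout_error(policy, start=1.0, steps=20):
    # Toy 1-D plant: x <- x + action. Error = final distance from 0.
    x = start
    for _ in range(steps):
        x += policy(x)
    return abs(x)

# Iterative refinement: evaluate, feed the error back, revise.
gain, best_err = 0.1, float("inf")
for _ in range(10):
    err = rollout_error(propose_policy(gain))
    best_err = min(best_err, err)
    gain += 0.1  # stand-in for the LLM revising its proposal

assert best_err < 0.01
```

The structural point survives the simplification: the model never needs gradients from the plant, only scalar rollout feedback to condition its next proposal on.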
- AdapTools: Adaptive Tool-based Indirect Prompt Injection Attacks on Agentic LLMs
Researchers introduce AdapTools, an adaptive framework for testing prompt injection attacks on AI agents that significantly improves attack success rates. The study provides critical insights into emerging security challenges in agentic large language models.
- OptiLeak: Efficient Prompt Reconstruction via Reinforcement Learning in Multi-tenant LLM Services
OptiLeak presents a novel reinforcement learning approach to prompt leakage attacks in multi-tenant LLM services, demonstrating 12.48x efficiency improvement in reconstructing sensitive information. The research significantly advances understanding of side-channel vulnerabilities in shared model caching infrastructures.
- Multimodal Crystal Flow: Any-to-Any Modality Generation for Unified Crystal Modeling
Researchers propose Multimodal Crystal Flow (MCFlow), a unified transformer model for crystal structure generation that can handle multiple modalities and tasks. The approach introduces a novel atom ordering technique that incorporates compositional and crystallographic priors without relying on explicit structural templates.
- Evaluating the Reliability of Digital Forensic Evidence Discovered by Large Language Model: A Case Study
Researchers propose a novel framework using large language models and a Digital Forensic Knowledge Graph to improve the reliability and traceability of AI-generated digital evidence. The study demonstrates over 95% accuracy in forensic artifact extraction, addressing key concerns about AI credibility in investigative contexts.
- Safe Reinforcement Learning for Real-World Engine Control
Researchers developed a safety-constrained reinforcement learning approach for engine control in Homogeneous Charge Compression Ignition mode, achieving precise pressure regulation with real-time safety monitoring. The work offers a novel toolchain for applying RL in safety-critical environments with demonstrated performance and adaptability.
- Augmenting Lateral Thinking in Language Models with Humor and Riddle Data for the BRAINTEASER Task
Researchers augmented language model lateral thinking capabilities by training on humor and riddle datasets for the SemEval 2024 BRAINTEASER task. The approach achieved significant accuracy improvements, demonstrating potential for enhancing non-linear reasoning in AI systems.
- A Survey on the Optimization of Large Language Model-based Agents
Researchers provide a holistic review of Large Language Model agent optimization techniques, categorizing approaches into parameter-driven and parameter-free methods. The survey bridges critical gaps in understanding how to improve LLM agent performance across complex decision-making environments.
- XMorph: Explainable Brain Tumor Analysis Via LLM-Assisted Hybrid Deep Intelligence
XMorph introduces a hybrid deep learning framework for brain tumor classification that combines advanced explainability techniques with high accuracy. The approach bridges critical gaps in medical AI by providing interpretable insights while maintaining 96% classification performance.
- An artificial intelligence framework for end-to-end rare disease phenotyping from clinical notes using large language models
RARE-PHENIX introduces an end-to-end AI framework for rare disease phenotyping using large language models, demonstrating significant performance improvements in extracting and standardizing clinical phenotypes. The research offers a clinically aligned approach to automating complex diagnostic processes, with validation across multiple medical centers.
- Show HN: Context Mode – 315 KB of MCP output becomes 5.4 KB in Claude Code
Context Mode compresses verbose MCP tool output before it enters the model's context, shrinking 315 KB of tool responses to 5.4 KB in Claude Code. The work suggests substantial context-window and token-cost savings when many tools are connected.
- Claude Code Remote Control
- New: Slack has rolled out a new plugin for Claude Code, details below
- LLM=True
- Show HN: A real-time strategy game that AI agents can play
- [R] Understanding targeted LLM fine-tuning
Researchers develop a systematic framework for selecting instructions during LLM fine-tuning, demonstrating how gradient-based representations and selection strategies impact model performance across different budgets and tasks. The work provides a unified theoretical perspective with concrete practical guidelines for researchers and practitioners.
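One common selection rule in this family (the paper compares several strategies) ranks candidate instructions by how well their gradients align with a target-task gradient. The toy 3-D vectors and instruction names below are placeholders; in practice the vectors would be per-example gradient features from the model being tuned:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def select_instructions(candidate_grads, target_grad, budget):
    # Keep the `budget` candidates whose gradients point most in the
    # direction of the target task's gradient.
    scored = sorted(candidate_grads.items(),
                    key=lambda kv: cosine(kv[1], target_grad),
                    reverse=True)
    return [name for name, _ in scored[:budget]]

target = [1.0, 0.0, 1.0]
candidates = {
    "summarize": [0.9, 0.1, 0.8],    # well aligned
    "translate": [0.0, 1.0, 0.0],    # orthogonal
    "classify":  [-1.0, 0.0, -1.0],  # opposed
}
assert select_instructions(candidates, target, budget=1) == ["summarize"]
```

The budget dependence the summary mentions falls out directly: with a larger `budget`, progressively less-aligned instructions are admitted, which is where the selection strategies start to diverge.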
- I’ve just switched from ChatGPT. Here are my observations (heavy user)
- Get quantitative insights from text responses with Gemini in Google Forms
- Claude is the new windows / work interface
- [R] 91k production agent interactions (Feb 1–23, 2026): distribution shift toward tool-chain escalation + multimodal injection — notes on multilabel detection + evaluation
Comprehensive study of AI agent interaction threats reveals evolving attack vectors including tool chain escalation, multimodal injection, and novel planning-phase attacks. Research provides critical insights into emerging AI security challenges through a multilabel classification approach.