Practical AI Methodology Meets Cognitive Science
AI/ML Reading List
Curated links with summaries.
- Mandatory In-Person Presentation in CVPR 2026 [D] (tags: ai-ml, community)
- Thinking Deeper, Not Longer: Depth-Recurrent Transformers for Compositional Generalization [R] (tags: ai-ml, community)
Depth-Recurrent Transformers iterate on the TRM approach to improve out-of-distribution generalization by increasing computational depth rather than sequence length, with findings that intermediate supervision can degrade genuine reasoning by making statistical heuristics too accessible. The work addresses a fundamental tension in model training: whether supervision at intermediate steps helps or hurts learning of robust compositional reasoning—a signal relevant to understanding limitations of current foundation models.
- Claude code skill for neurotech/BCI machine learning [P] (tags: ai-ml, community)
- "I don't know!": Teaching neural networks to abstain with the HALO-Loss. [R]
A researcher open-sourced HALO-Loss, a drop-in replacement for cross-entropy that lets a neural network abstain on out-of-distribution inputs by placing an abstain class at the origin of a bounded latent space. The method achieves strong OOD detection (>50% FPR improvement on SVHN) without sacrificing base accuracy or requiring ensembles, which is directly relevant to practitioners building safety-critical classifiers and multi-modal systems like CLIP.
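To make the abstain mechanism concrete, here is a minimal sketch in the spirit of the summary above, not the released HALO-Loss: real classes score by proximity to learned prototypes in a tanh-bounded latent space, and an extra abstain class scores by proximity to the origin. The class name `AbstainPrototypeLoss` and all hyperparameters are illustrative assumptions.

```python
# Illustrative sketch only, not the released HALO-Loss: a (K+1)-way
# cross-entropy where real classes score by distance to learned prototypes
# in a bounded (tanh-squashed) latent space and the abstain class sits at
# the origin, so far-from-prototype inputs drift toward abstention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AbstainPrototypeLoss(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        z = torch.tanh(features)                                # bounded latent space
        d_class = torch.cdist(z, torch.tanh(self.prototypes))   # (B, K) distances to prototypes
        d_abstain = z.norm(dim=-1, keepdim=True)                # (B, 1) distance to the origin
        logits = -torch.cat([d_class, d_abstain], dim=1)        # closer means higher logit
        # targets hold ground-truth classes in [0, K); index K is the abstain
        # class, predicted at inference when its logit wins.
        return F.cross_entropy(logits, targets)
```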
- OpenAI acquires Hiro, an AI personal finance startup (tags: ai-ml, research)
- Dialectic-Med: Mitigating Diagnostic Hallucinations via Counterfactual Adversarial Multi-Agent Debate (tags: ai-ml, research, velocity:hn-medium)
Researchers propose Dialectic-Med, a multi-agent framework using adversarial debate between proponent, opponent, and mediator agents to reduce diagnostic hallucinations in medical vision-language models. The work addresses a critical failure mode in healthcare AI—confirmation bias and visual hallucination—through explicit falsification mechanisms, demonstrating SOTA performance on three medical VQA benchmarks with improved explanation faithfulness.
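The shape of the debate loop is easy to picture. The sketch below is a structural illustration, not the authors' implementation; `call_llm` is a hypothetical wrapper around whatever medical vision-language model client you use, and the prompts and round count are placeholders.

```python
# Structural sketch of a proponent/opponent/mediator debate; not Dialectic-Med's code.
# `call_llm` is a hypothetical stand-in for a VLM client.
def call_llm(role: str, prompt: str) -> str:
    raise NotImplementedError("plug in a vision-language model client here")

def debate_diagnosis(findings: str, question: str, rounds: int = 2) -> str:
    claim = call_llm("proponent", f"Propose a diagnosis.\nQuestion: {question}\nFindings: {findings}")
    for _ in range(rounds):
        # The opponent hunts for counterfactual evidence that would falsify the claim.
        rebuttal = call_llm("opponent", f"Attack this diagnosis with counter-evidence.\nClaim: {claim}\nFindings: {findings}")
        # The proponent must revise or defend the claim against the rebuttal.
        claim = call_llm("proponent", f"Revise your diagnosis given this rebuttal.\nRebuttal: {rebuttal}\nPrevious claim: {claim}")
    # The mediator weighs both sides and issues the final, evidence-grounded answer.
    return call_llm("mediator", f"Adjudicate the debate and answer.\nQuestion: {question}\nFinal claim: {claim}")
```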
- HiEdit: Lifelong Model Editing with Hierarchical Reinforcement Learning (tags: ai-ml, research, velocity:hn-medium)
HiEdit introduces a hierarchical RL framework that dynamically selects layer-specific targets for LLM knowledge editing, reducing side effects and catastrophic forgetting while cutting parameter perturbations by 50%. This addresses a critical deployment challenge, updating models in service without degradation, and is directly relevant to practitioners managing production LLMs and researchers working on model adaptability and robustness.
- Why Don't You Know? Evaluating the Impact of Uncertainty Sources on Uncertainty Quantification in LLMs (tags: ai-ml, research, velocity:hn-medium)
Researchers introduce a controlled dataset that categorizes uncertainty sources (knowledge gaps, output variability, input ambiguity) and demonstrate that existing UQ methods fail or mislead when uncertainty stems from sources beyond model knowledge. This work is directly relevant to practitioners and researchers building reliable LLM systems, as it exposes a critical blind spot: most confidence estimation methods conflate distinct uncertainty types and degrade unpredictably in real-world deployment scenarios where multiple sources coexist.
- FAITH: Factuality Alignment through Integrating Trustworthiness and Honestness (tags: ai-ml, research)
FAITH proposes a post-training framework that maps LLM confidence and semantic entropy into natural-language knowledge states, then uses PPO with a dual reward function (correctness + uncertainty) plus retrieval augmentation to improve factual accuracy on knowledge-intensive tasks. This addresses a critical reliability gap in LLMs and demonstrates measurable gains, contributing to the long-signal horizon of alignment and trustworthiness research.
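As a toy illustration of the dual-reward idea only (the paper's actual reward shaping and knowledge-state mapping may differ), a PPO reward could combine correctness with a term that rewards stated confidence matching the outcome. The inputs `answer_correct`, `stated_confidence`, and the weight `lam` below are assumptions, not FAITH's interface.

```python
# Hedged sketch of a correctness-plus-uncertainty reward; the weighting and
# knowledge-state mapping in FAITH itself may differ.
def dual_reward(answer_correct: bool, stated_confidence: float, lam: float = 0.5) -> float:
    correctness = 1.0 if answer_correct else 0.0
    # Reward honesty: confident-and-right or hedged-and-wrong both score well,
    # while confidently wrong answers are penalized.
    honesty = 1.0 - abs(stated_confidence - correctness)
    return correctness + lam * honesty
```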
- ZARA: Training-Free Motion Time-Series Reasoning via Evidence-Grounded LLM Agents (tags: ai-ml, research, long-signal:rdd)
ZARA introduces a knowledge-retrieval-augmented agent framework for human activity recognition from motion sensors without retraining, using LLMs grounded in statistical evidence rather than black-box projections. This addresses a real generalization problem in sensor-based AI (domain shift, new subjects, cross-dataset transfer) with a training-free approach that could shift how practitioners deploy activity recognition systems in production.
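The "statistical evidence" grounding can be pictured as turning a raw sensor window into a few human-readable statistics for the agent to cite. The sketch below uses an assumed minimal feature set (magnitude mean/std and dominant frequency), not ZARA's actual features.

```python
# Illustrative evidence-extraction step (not ZARA's feature set): summarize an
# accelerometer window into plain-language statistics an LLM agent can reason over.
import numpy as np

def describe_window(acc: np.ndarray, fs: float = 50.0) -> str:
    """acc: (T, 3) accelerometer samples at fs Hz; returns a textual summary."""
    mag = np.linalg.norm(acc, axis=1)                  # per-sample magnitude
    spectrum = np.abs(np.fft.rfft(mag - mag.mean()))   # spectrum of the detrended signal
    freqs = np.fft.rfftfreq(len(mag), d=1.0 / fs)
    dominant = freqs[spectrum.argmax()]
    return (f"mean |a| {mag.mean():.2f}, std {mag.std():.2f}, "
            f"dominant frequency {dominant:.1f} Hz over {len(mag) / fs:.1f} s")
```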
- K-Way Energy Probes for Metacognition Reduce to Softmax in Discriminative Predictive Coding Networks (tags: ai-ml, research)
Researchers demonstrate that K-way energy probes in discriminative predictive coding networks reduce approximately to softmax under standard training, contradicting intuitions that they access richer decision signals. The negative result includes theoretical decomposition and six empirical conditions on CIFAR-10, advancing understanding of how different inference mechanisms relate to discriminative learning—relevant to practitioners working on neural mechanistic interpretability and alternative inference schemes beyond backpropagation.
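The core intuition is easy to check numerically: if a K-way probe assigns linear energies E_k = -(w_k · h) and answers with a Boltzmann readout p_k proportional to exp(-E_k), that is exactly the softmax of the linear logits. The toy check below illustrates that intuition; it is not the paper's theoretical decomposition or its empirical conditions.

```python
# Toy numerical check of the "reduces to softmax" intuition; not the paper's
# derivation. Linear energies plus a Boltzmann readout equal a plain softmax.
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=8)                 # a hidden state
W = rng.normal(size=(5, 8))            # K = 5 linear readout vectors
energies = -(W @ h)                    # E_k = -(w_k . h)
boltzmann = np.exp(-energies) / np.exp(-energies).sum()
softmax = np.exp(W @ h) / np.exp(W @ h).sum()
assert np.allclose(boltzmann, softmax)
```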
- CFMS: A Coarse-to-Fine Multimodal Synthesis Framework for Enhanced Tabular Reasoning (tags: ai-ml, research, velocity:hn-medium)
CFMS proposes a coarse-to-fine multimodal synthesis framework that combines MLLMs for high-level visual perception with symbolic reasoning engines for tabular QA and fact verification, showing competitive results on WikiTQ and TabFact benchmarks. This addresses a genuine capability gap—bridging visual table understanding with symbolic reasoning—and has clear applicability to enterprise reasoning tasks and multimodal model design.
- Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents (tags: ai-ml, research, velocity:hn-high)
Skill-SD introduces a self-distillation framework that uses dynamically summarized natural language skills from agent trajectories as privileged teacher supervision, stabilized via importance-weighted reverse-KL loss. The approach substantially improves multi-turn LLM agent performance (+14-42% over RL and OPSD baselines), addressing sample efficiency and training stability—key practical bottlenecks for anyone scaling agentic systems.
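For readers unfamiliar with the loss family, here is a minimal sketch of an importance-weighted reverse-KL term, assuming per-example importance ratios and teacher logits are already available; Skill-SD's exact stabilization and its skill-conditioned teacher are not reproduced here.

```python
# Hedged sketch of an importance-weighted reverse-KL distillation loss;
# Skill-SD's exact formulation may differ.
import torch
import torch.nn.functional as F

def iw_reverse_kl(student_logits, teacher_logits, importance_w, clip=5.0):
    log_p_s = F.log_softmax(student_logits, dim=-1)
    log_p_t = F.log_softmax(teacher_logits.detach(), dim=-1)   # teacher is fixed
    # Reverse KL takes the expectation under the *student* distribution,
    # which tends to be mode-seeking rather than mode-covering.
    rkl = (log_p_s.exp() * (log_p_s - log_p_t)).sum(dim=-1)    # (batch,)
    w = importance_w.clamp(max=clip)                            # clip ratios for stability
    return (w * rkl).mean()
```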
- NSFL: A Post-Training Neuro-Symbolic Fuzzy Logic Framework for Boolean Operators in Neural Embeddings (tags: ai-ml, research)
Researchers introduce NSFL, a post-training framework combining fuzzy logic t-norms/t-conorms with neural embeddings to enable native logical constraint handling in dense retrievers without retraining. The work demonstrates significant mAP improvements (up to +81%) and shows additive gains even on models fine-tuned for logical reasoning, advancing the intersection of symbolic reasoning and neural retrieval systems.
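The fuzzy-logic building blocks themselves are simple to state. The sketch below shows a product t-norm / probabilistic-sum t-conorm composition over similarity scores assumed to be mapped into [0, 1]; NSFL's post-training calibration is not reproduced.

```python
# Minimal fuzzy-logic scoring over retrieval similarities (assumed in [0, 1]);
# NSFL's post-training calibration step is not shown here.
def t_and(a: float, b: float) -> float:   # product t-norm
    return a * b

def t_or(a: float, b: float) -> float:    # probabilistic-sum t-conorm
    return a + b - a * b

def t_not(a: float) -> float:
    return 1.0 - a

# Example query: "transformers AND (vision OR audio) AND NOT medical"
def query_score(s_transformers, s_vision, s_audio, s_medical):
    return t_and(s_transformers, t_and(t_or(s_vision, s_audio), t_not(s_medical)))
```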
- Pioneer Agent: Continual Improvement of Small Language Models in Production (tags: ai-ml, research)
Pioneer Agent is a closed-loop system that automates the engineering lifecycle for adapting small language models to specific tasks, operating in cold-start (data acquisition, model training) and production (failure diagnosis, targeted retraining) modes. The work addresses a critical practitioner problem—iterative model improvement without manual engineering—and demonstrates substantial gains (1.6-83.8 points on benchmarks, 84.9% to 99.3% on production intent classification), making it directly relevant to ML engineers deploying cost-constrained models at scale.
- Digital hybridity and relics in cultural heritage: using corpus linguistics to inform design in emerging technologies from AI to VR (tags: ai-ml, research)
Researchers used corpus linguistics to analyze how the word 'relic' has been semantically framed across Early Modern English texts and contemporary web content, finding a shift from spiritual/political control objects to heritage symbols. The work discusses ethical considerations for digitizing culturally significant objects via hybrid and AI technologies, but lacks technical novelty in ML methods or meaningful implications for practitioners building AI systems.
- LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling (tags: ai-ml, research)
LangFlow closes the performance gap between continuous and discrete diffusion language models through three key innovations: ODE-based evaluation bounds, learnable Gumbel-based noise scheduling, and self-conditioning. This shifts the technical narrative around diffusion for language—previously underperforming continuous approaches now match discrete baselines and outperform autoregressive models on transfer tasks, opening a competitive alternative training paradigm for practitioners and researchers exploring non-autoregressive architectures.
- RPA-Check: A Multi-Stage Automated Framework for Evaluating Dynamic LLM-based Role-Playing Agents (tags: ai-ml, research, velocity:hn-high)
RPA-Check introduces a four-stage automated evaluation framework for assessing LLM-based role-playing agents across dimensions like role adherence and narrative stability, validated on forensic training scenarios. The work demonstrates that smaller instruction-tuned models (8-9B) can outperform larger ones on procedural consistency—a finding with direct implications for practitioners choosing models for constrained, high-stakes agent deployments.
- Computational Lesions in Multilingual Language Models Separate Shared and Language-specific Brain Alignment (tags: ai-ml, research, long-signal:rdd)
Researchers used targeted computational lesions on six multilingual LLMs to identify shared versus language-specific processing components, then validated predictions against fMRI data from 112 multilingual participants. The work provides causal evidence for a shared neural backbone with embedded language specializations, advancing both AI interpretability and neuroscience understanding of multilingual cognition.
- Reason Only When Needed: Efficient Generative Reward Modeling via Model-Internal Uncertainty (tags: ai-ml, research, velocity:hn-medium)
Researchers propose E-GRM, a framework that uses model-internal uncertainty to selectively apply Chain-of-Thought reasoning only when needed, reducing inference costs while improving accuracy on reasoning benchmarks. This addresses a key practical constraint for deploying reasoning-capable LLMs at scale by eliminating wasteful computation on simple tasks while maintaining performance gains.
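One way to picture uncertainty-gated reasoning: estimate uncertainty from the token distributions of a cheap direct answer and only pay for chain-of-thought when it crosses a threshold. In the sketch below, `direct_fn`, `cot_fn`, and `logprobs_fn` are assumed callables, and mean token entropy is an illustrative stand-in for the paper's model-internal uncertainty signal.

```python
# Hedged sketch of uncertainty-gated generation; E-GRM's internal uncertainty
# signal may differ from the simple mean token entropy used here.
import math

def mean_token_entropy(token_dists):
    """token_dists: list of dicts mapping candidate tokens to log-probabilities."""
    entropies = []
    for dist in token_dists:
        probs = [math.exp(lp) for lp in dist.values()]
        z = sum(probs)
        entropies.append(-sum((p / z) * math.log(p / z) for p in probs))
    return sum(entropies) / len(entropies)

def answer(question, direct_fn, cot_fn, logprobs_fn, threshold=1.0):
    draft = direct_fn(question)                              # cheap direct answer
    if mean_token_entropy(logprobs_fn(question, draft)) <= threshold:
        return draft                                         # confident: skip reasoning
    return cot_fn(question)                                  # uncertain: spend tokens on CoT
```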
- GIANTS: Generative Insight Anticipation from Scientific Literature (tags: ai-ml, research)
Researchers introduce insight anticipation, a task where language models predict downstream papers' core insights from foundational works, backed by GiantsBench (17k examples across 8 domains) and GIANTS-4B, a 4B-parameter model trained via RL that outperforms proprietary baselines and generalizes cross-domain. This matters to the AI/ML field because it operationalizes a concrete component of scientific synthesis—a long-standing aspiration in automated discovery—with released artifacts enabling practitioner iteration and validation against third-party citation-impact proxies.
- SafeConstellations: Mitigating Over-Refusals in LLMs Through Task-Aware Representation Steering (tags: ai-ml, research)
SafeConstellations is an inference-time method that reduces LLM over-refusal by 73% through task-aware representation steering, using mechanistic insights about embedding space trajectories across layers. This directly addresses a real deployment friction point where safety guardrails reject benign requests, and the trajectory-based approach offers practitioners a principled conditional intervention applicable to high-stakes applications like content moderation and task-specific inference.
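A generic version of inference-time representation steering can be sketched with a PyTorch forward hook: nudge a chosen layer's hidden states along a precomputed direction. The layer choice, direction, and `alpha` scale below are assumptions; SafeConstellations' task-aware, trajectory-conditional logic is not reproduced.

```python
# Generic steering-hook sketch (not SafeConstellations' conditional logic):
# shift one layer's hidden states along a precomputed direction at inference.
import torch

def add_steering_hook(layer: torch.nn.Module, direction: torch.Tensor, alpha: float = 4.0):
    unit = direction / direction.norm()

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * unit.to(hidden.device, hidden.dtype)
        return (steered, *output[1:]) if isinstance(output, tuple) else steered

    # Returns a handle; call handle.remove() to switch the intervention off.
    return layer.register_forward_hook(hook)
```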
- Improving LLM Unlearning Robustness via Random Perturbations (tags: ai-ml, research, velocity:hn-medium)
Researchers demonstrate that current LLM unlearning methods inadvertently reduce model robustness by treating forget-tokens as backdoor triggers, and propose Random Noise Augmentation (RNA) as a lightweight defense mechanism. This work directly addresses a critical vulnerability in machine unlearning—a field increasingly important for regulatory compliance and safety—by providing both theoretical insight and a practical, method-agnostic solution.
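As summarized, the defense itself is lightweight: perturb embeddings with small random noise during unlearning updates so the exact forget-set tokens stop acting like a trigger. A minimal sketch follows, with the noise scale as an assumed hyperparameter rather than the paper's setting.

```python
# Minimal sketch of Random Noise Augmentation as summarized above; the exact
# placement and noise scale in the paper may differ.
import torch

def perturb_embeddings(embeddings: torch.Tensor, sigma: float = 0.01) -> torch.Tensor:
    # Add small Gaussian noise so the unlearning objective cannot key on the
    # exact forget-token embeddings as a backdoor-like trigger.
    return embeddings + sigma * torch.randn_like(embeddings)
```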
- How LLMs Might Think (tags: ai-ml, research, velocity:hn-high)
Philosophers argue that LLMs may engage in arational, associative thinking rather than rational cognition, reframing the debate on machine thought beyond Stoljar and Zhang's prior argument. This contributes to the slow-burn cognition signal about what constitutes thinking in AI systems—foundational for long-term alignment and consciousness research.
- Microsoft Ships Agent Framework 1.0, Merging Semantic Kernel and AutoGen into a Single Production-Ready SDK (tags: ai-ml, research)
Microsoft unified Semantic Kernel and AutoGen into Agent Framework 1.0, a single production-ready SDK supporting multi-agent orchestration and protocol interoperability. This matters to practitioners building enterprise AI systems—consolidating two major frameworks into one standard lowers integration friction and signals Microsoft's commitment to open-source agent infrastructure as a commodity layer.