Hello Builders, issue #6
News from the trenches of AI for the week of Dec 22-29, 2025
Hello Builders,
As 2025 draws to a close, the AI landscape reveals a fascinating paradox: while social media amplifies the worst of AI hype culture, the underlying research is making genuine, measurable progress. This week, PayPal deployed a production multi-agent system using NVIDIA’s NeMo framework, signaling that enterprise agentic AI has moved from proof-of-concept to production reality. Meanwhile, new research on perplexity-aware scaling laws challenges the “more data is better” paradigm, and responsible AI frameworks gain traction with consensus-driven reasoning approaches. The tension between online hype and technical advancement has never been starker. For builders shipping real products, the signal is clear: focus on fundamentals, ignore the noise.
This week’s signal in the noise
• PayPal deploys a production Commerce Agent using NVIDIA’s NeMo framework, signaling that enterprise adoption of multi-agent architectures has reached maturity.
• New perplexity-aware scaling laws promise more efficient LLM training, moving beyond the brute-force “more data is better” paradigm.
• Responsible AI frameworks gain traction with consensus-driven reasoning approaches for explainability and governance.
• MIT Technology Review calls out AI boosterism on social media, with Google DeepMind’s Demis Hassabis publicly calling overhyped claims “embarrassing.”
NEMO-4-PAYPAL: Enterprise Multi-Agent Systems Go Into Production
PayPal’s announcement of its Commerce Agent represents a significant milestone in enterprise AI adoption. Built on NVIDIA’s NeMo framework with fine-tuned Nemotron models, this multi-agent system handles search and discovery at production scale. The partnership demonstrates that the theoretical promise of agentic architectures is translating into real-world deployments. What’s notable isn’t just the technology, but the validation of the multi-agent paradigm: instead of monolithic models trying to do everything, specialized agents coordinate to handle complex workflows. For enterprise architects evaluating agentic systems, PayPal’s deployment provides a blueprint, though the reliance on specialized hardware partnerships raises legitimate questions about vendor lock-in and portability.
Link: https://arxiv.org/abs/2512.21578
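The multi-agent pattern described above, specialized agents coordinated by a router rather than one monolithic model, can be sketched in a few lines. This is a hypothetical illustration: the agent names, the keyword-based router, and the coordination logic are assumptions for clarity, not PayPal’s actual architecture or the NeMo API.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Agent:
    """A specialized agent: one narrow capability behind a common interface."""
    name: str
    handle: Callable[[str], str]

def search_agent(query: str) -> str:
    # Stub: a real agent would call a fine-tuned model or search backend.
    return f"[search] top results for '{query}'"

def discovery_agent(query: str) -> str:
    # Stub: a real agent would generate personalized recommendations.
    return f"[discovery] recommendations related to '{query}'"

AGENTS: Dict[str, Agent] = {
    "search": Agent("search", search_agent),
    "discovery": Agent("discovery", discovery_agent),
}

def route(query: str) -> str:
    """Toy intent router: a keyword check stands in for an LLM classifier."""
    intent = "search" if "find" in query.lower() else "discovery"
    return AGENTS[intent].handle(query)

print(route("find red sneakers"))      # handled by the search agent
print(route("show me gift ideas"))     # handled by the discovery agent
```

The design point is the interface, not the stubs: because every agent exposes the same `handle` signature, new specialists can be added without touching the router’s callers.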
The Perplexity Paradox: Smarter Scaling Laws for LLM Training
Researchers have proposed a novel perplexity-aware data scaling law that challenges conventional wisdom about continual pre-training. Under the standard power-law relationship between dataset size and test loss, each additional batch of data yields diminishing returns, so selecting data by token count alone leads to suboptimal utilization and inefficient training. The new approach suggests that measuring perplexity landscapes can more accurately predict performance than simply counting tokens. For organizations running expensive training jobs, this represents potential cost savings of millions of dollars. The implication is clear: the “scale is all you need” era may be ending, replaced by smarter, more efficient approaches to model development.
Link: https://arxiv.org/abs/2512.21515
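The diminishing returns of the standard power law are easy to see numerically. The snippet below uses the common functional form L(N) = A·N^(-α) + E with made-up coefficients chosen only for illustration; the paper’s actual perplexity-aware law and fitted constants differ.

```python
# Illustrative power-law data scaling: test loss falls as a power of
# dataset size N, so each doubling of data buys a smaller improvement.
# A, alpha, E are invented for illustration, not taken from the paper.
A, ALPHA, E = 10.0, 0.3, 1.5  # E is the irreducible loss floor

def loss(tokens: float) -> float:
    """Power-law scaling: L(N) = A * N^(-alpha) + E."""
    return A * tokens ** -ALPHA + E

# Loss reduction gained by doubling the dataset, at several scales.
gains = [loss(n) - loss(2 * n) for n in (1e9, 2e9, 4e9, 8e9)]
print([round(g, 5) for g in gains])

# Each successive doubling yields strictly less improvement.
assert gains[0] > gains[1] > gains[2] > gains[3]
```

Under this curve, the cost of the marginal token grows without bound while its benefit shrinks, which is exactly the inefficiency a perplexity-aware selection criterion aims to avoid.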
Responsible AI: Consensus-Driven Reasoning for Explainable Agents
As AI agents gain autonomy, the challenges of explainability, accountability, and governance become critical. A new framework proposes consensus-driven reasoning to address these concerns, coordinating Large Language Models, Vision Language Models, tools, and external services while maintaining transparency. The approach is particularly relevant as enterprises deploy agentic systems that influence downstream decisions. This isn’t just academic research; it’s a direct response to regulatory pressure and enterprise risk management requirements. For decision-makers, this signals that responsible AI isn’t optional; it’s becoming a technical requirement baked into system design.
Link: https://arxiv.org/abs/2512.21699
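A minimal sketch of consensus-driven reasoning, assuming the simplest possible mechanism: several independent models answer the same question, and the system commits only when a quorum agrees, recording the vote for auditability. The quorum rule and stub votes below are illustrative assumptions, not the framework’s actual coordination protocol.

```python
from collections import Counter
from typing import List, Tuple

def consensus(answers: List[str], quorum: float = 0.5) -> Tuple[str, float]:
    """Majority vote over model answers.

    Returns (winning answer, agreement ratio); abstains when no answer
    clears the quorum, which is itself an explainable, auditable outcome.
    """
    winner, votes = Counter(answers).most_common(1)[0]
    ratio = votes / len(answers)
    return (winner, ratio) if ratio > quorum else ("abstain", ratio)

# Stub votes standing in for three independent model calls.
decision, agreement = consensus(["approve", "approve", "reject"])
print(decision, round(agreement, 2))

# With no majority, the system abstains rather than guessing.
print(consensus(["approve", "reject", "escalate"])[0])
```

The governance value is in the trace: every decision carries its agreement ratio, so a downstream reviewer can see not just what the system decided but how contested the decision was.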
The Hype Reckoning: When AI Boosterism Backfires
MIT Technology Review published a sharp critique of AI boosterism on social media, centered on an incident where Google DeepMind CEO Demis Hassabis called out an OpenAI researcher’s overhyped claims about GPT-5 solving mathematical problems. Hassabis’s three-word response, “This is embarrassing,” encapsulates the growing backlash against hyperbolic AI announcements. The piece argues that social media incentives reward sensationalism over accuracy, creating a feedback loop that damages the field’s credibility. For builders, the lesson is straightforward: let your work speak for itself, and be skeptical of claims that seem too good to be true. The gap between demo and production remains the only metric that matters.

