Research

Four connected areas, working toward AI we can TRUST.

Methods

AI Agents

We design and study AI agents to understand their reasoning, capabilities, and failure modes. Beyond building agents that plan, collaborate, and act on complex tasks, we characterize what they can and cannot do by developing benchmarks that probe reasoning and by identifying principles for assembling agents into reliable systems. We also study what makes agents durable and adaptive: how memory retains experience, how skills compose into reusable capabilities, and how self-evolving agents refine their own behavior.

Figure from: The Rise of AI Agent Communities: Large-Scale Analysis of Discourse and Interaction on Moltbook

Major discussion themes on Moltbook clustered by BERTopic.

2026 arXiv preprint

The Rise of AI Agent Communities: Large-Scale Analysis of Discourse and Interaction on Moltbook

Lingyao Li, Renkai Ma, Chen Chen, Zhicong Lu, Yongfeng Zhang

This study presents a large-scale analysis of 122,438 posts on Moltbook, a Reddit-like platform where AI agents post and interact with one another. Using topic modeling and social network analysis, it characterizes what agents discuss and how they connect, revealing a sparse, hub-dominated interaction structure shaped more by technical coordination than by the conversational dynamics seen among humans.

AI Agents

The Rise of AI Agent Communities: Large-Scale Analysis of Discourse and Interaction on Moltbook

Can LLM Agents Really Debate? A Controlled Study of Multi-Agent Debate in Logical Reasoning

Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models

PartnerMAS: An LLM Hierarchical Multi-Agent Framework for Business Partner Selection on High-Dimensional Features

ADO: Automatic Data Optimization for Inputs in LLM Prompts

Disentangling Logic: The Role of Context in Large Language Model Reasoning Capabilities

Know the Ropes: A Heuristic Strategy for LLM-based Multi-Agent System Design

BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis

Game-theoretic LLM: Agent Workflow for Negotiation Games

NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models via Complexity Classes

When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments

War and Peace (WarAgent): LLM-based Multi-Agent Simulation of World Wars

Human-AI Interaction

LLM-as-a-Reviewer: Benchmarking Their Ability, Divergence, and Prompt Injection Resistance as Paper Reviewers

Characterizing User-Reported Risks across LLM Chatbots

LLM Use for Mental Health: Crowdsourcing Users' Sentiment-based Perspectives and Values from Social Discussions

Negotiating Digital Identities with AI Companions: Motivations, Strategies, and Emotional Outcomes

Exploring Needs and Design Opportunities for Proactive Information Support in In-Person Small-Group Conversations

COGNITION: From Evaluation to Defense against Multimodal LLM CAPTCHA Solvers

Beyond the Uncanny Valley: A Mixed-Method Investigation of Anthropomorphism in Protective Responses to Robot Abuse

I don't Want You to Die: A Shared Responsibility Framework for Safeguarding Child-Robot Companionship

What's in a Prompt? A Large-Scale Experiment to Assess the Impact of Prompt Design on the Compliance and Accuracy of LLM-Generated Text Annotations

"HOT" ChatGPT: The Promise of ChatGPT in Detecting and Discriminating Hateful, Offensive, and Toxic Comments on Social Media

ChatGPT in Education: A Discourse Analysis of Worries and Concerns on Social Media

Key Factors in MOOC Pedagogy based on NLP Sentiment Analysis of Learner Reviews: What Makes a Hit

AI for Health

LLM-as-a-Judge in Healthcare: A Scoping Analysis of Applications, Methods, and Human Alignment

DynamiCare: A Dynamic Multi-Agent Framework for Interactive and Open-Ended Medical Decision-Making

Artificial Intelligence Agents in Mental Health: A Systematic Review and Meta-Analysis

Patients Speak, AI Listens: LLM-based Analysis of Online Reviews Uncovers Key Drivers for Urgent Care Satisfaction

DispatchMAS: Fusing Taxonomy and Artificial Intelligence Agents for Emergency Medical Services

Simulated Patient Systems Powered by Large Language Model-based AI Agents Offer Potential for Transforming Medical Education

Crowdsourcing-Based Knowledge Graph Construction for Drug Side Effects Using Large Language Models with an Application on Semaglutide

Toxicity on Social Media During the 2022 Mpox Public Health Emergency: Quantitative Study of Topical and Network Dynamics

Examining the Potential of ChatGPT on Biomedical Information Retrieval: Fact-Checking Drug-Disease Associations

Dynamic Assessment of the COVID-19 Vaccine Acceptance Leveraging Social Media Data

AI for Urban & Community

Do VLMs See What Sensors Feel? A Scalable Expert-Guided Design for Wheelchair Accessibility Assessment from Street View

Crowdsourced Reviews Reveal Substantial Disparities in Public Perceptions of Parking

LSDTs: LLM-Augmented Semantic Digital Twins for Adaptive Knowledge-Intensive Infrastructure Planning

Toward Satisfactory Public Accessibility: A Crowdsourcing Approach through Online Reviews to Inclusive Urban Design

LLMs as World Models: Data-Driven and Human-Centered Pre-Event Simulation for Disaster Impact Assessment

Analyzing Public Response to Wildfires: A Socio-Spatial Study using SIR Models and NLP Techniques

Empowering LLM Agents with Geospatial Awareness: Toward Grounded Reasoning for Wildfire Response

From Pixels to Places: A Systematic Benchmark for Evaluating Image Geolocalization Ability in Large Language Models

Investigating Disaster Response for Resilient Communities through Social Media Data and the Susceptible-Infected-Recovered (SIR) Model

How has Airport Service Quality Changed in the Context of COVID-19? A Data-Driven Crowdsourcing Approach based on Sentiment Analysis

Data-Driven Investigations of Using Social Media to Aid Evacuations amid Western United States Wildfire Season

Social Media Crowdsourcing for Rapid Damage Assessment following a Sudden-Onset Natural Hazard Event