MAS.664 · AI for Impact · MIT · Spring 2026

lily.

Behavior change, in the moment.

An agentic AI pipeline for nicotine reduction. Four specialized agents observe your behavioral state, predict relapse risk, and deliver a personalized CBT-grounded intervention — every time you feel a craving.

4 agents · 120+ API calls tested · 30 pipelines run · CBT-grounded strategies

Agent Pipeline Architecture

When you log a craving, Lily runs a four-agent pipeline. Each agent makes its own cloud LLM call with structured JSON inputs and outputs and passes its result forward to the next agent. The Orchestrator closes the feedback loop by writing a memory update that informs future interventions.

01 · User State Agent
Synthesizes your craving event, stress level, context, and memory history into a structured behavioral snapshot.
craving event + memory history → state JSON

02 · Prediction Agent
Estimates relapse probability, craving trajectory, and the window to act based on your behavioral state.
state JSON → risk % + trajectory

03 · Intervention Agent
Selects the most appropriate CBT strategy (urge surfing, cognitive reframing, delay tactic, behavioral substitution) and generates a personalized message.
state + prediction → strategy + message

04 · Orchestrator Agent
Makes the final decision, sets a proactive check-in window, and writes a memory update that closes the feedback loop.
all outputs → decision + memory write
USER INPUT → User State Agent → Prediction Agent → Intervention Agent → Orchestrator Agent → ACTION OUTPUT
                                                                                    ↓
                                                                           memory write
                                                                                    ↓
                                                              feeds back to User State Agent on next call

Powered by: Llama 3.3-70B via Groq (server-side — no API key needed)
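
The data flow above can be sketched in a few lines of Python. This is a minimal illustration, not Lily's actual code: every function name, field name, and threshold below is hypothetical, and each agent is a plain function standing in for a cloud LLM call. It only shows how state passes forward and how the Orchestrator's memory write feeds the next run.

```python
from typing import List


def user_state_agent(craving: int, stress: int, contexts: List[str], memory: List[str]) -> dict:
    # Stand-in for an LLM call that builds the behavioral snapshot (state JSON).
    return {
        "craving": craving,
        "stress": stress,
        "contexts": contexts,
        "recent_events": memory[-3:],  # hypothetical: only the latest entries are used
    }


def prediction_agent(state: dict) -> dict:
    # Toy heuristic (an assumption, not Lily's model): mean of craving and stress on a 10-point scale.
    risk = round((state["craving"] + state["stress"]) / 20, 2)
    return {"risk": risk, "trajectory": "rising" if risk >= 0.5 else "stable"}


def intervention_agent(state: dict, prediction: dict) -> dict:
    # Hypothetical rule; the real agent selects among four CBT strategies via an LLM.
    strategy = "urge surfing" if prediction["risk"] >= 0.5 else "delay tactic"
    return {"strategy": strategy, "message": f"Try {strategy} for the next few minutes."}


def orchestrator_agent(state: dict, prediction: dict, intervention: dict) -> dict:
    # Final decision, check-in window, and the memory entry that closes the loop.
    return {
        "decision": intervention,
        "check_in_minutes": 15 if prediction["risk"] >= 0.5 else 60,
        "memory_update": f"craving={state['craving']} strategy={intervention['strategy']}",
    }


def run_pipeline(craving: int, stress: int, contexts: List[str], memory: List[str]) -> dict:
    state = user_state_agent(craving, stress, contexts, memory)
    prediction = prediction_agent(state)
    intervention = intervention_agent(state, prediction)
    result = orchestrator_agent(state, prediction, intervention)
    memory.append(result["memory_update"])  # memory write, read back on the next call
    return result
```

Calling `run_pipeline(8, 7, ["After coffee"], memory)` twice with the same `memory` list shows the feedback loop: the second call sees the entry written by the first.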

Limitations

We believe in being honest about what doesn't work yet. These are the known limitations of the current version.

Strategy monoculture
The Intervention Agent defaults heavily to urge surfing across all risk levels. A diversity mechanism tied to prior strategy effectiveness is needed.
Prediction Agent skews high
Risk scores cluster at 60–80% even when the reported craving is 1/10. The model doesn't differentiate meaningfully at the low end of the scale.
No contradiction detection
The User State Agent averages contradictory inputs rather than flagging them. An input validation layer is the next architectural priority.
Rate limiting at scale
10+ concurrent pipelines hit the token-per-minute ceiling. Production deployment needs request queuing or a higher API tier.
Pipeline latency
4 sequential agent calls take ~12–15 seconds. Parallelizing User State + Prediction agents is the highest-leverage optimization.
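
As one illustration of the contradiction-detection gap, a validation step could return a list of issues instead of letting conflicting inputs be averaged. The sketch below is an assumption throughout: the 1 to 10 scale is inferred from the "1/10" rating mentioned above, and the context tags and rules are hypothetical examples, not Lily's actual vocabulary.

```python
from typing import List

# Hypothetical pairs of context tags that should not be selected together.
CONFLICTING_CONTEXTS = [("Alone", "With friends"), ("At home", "At work")]
# Hypothetical tag that contradicts a very high stress rating.
CALM_TAGS = {"Feeling calm"}


def validate_input(craving: int, stress: int, contexts: List[str]) -> List[str]:
    """Return human-readable issues; an empty list means the input looks consistent."""
    issues = []
    # Range check, assuming both ratings use a 1 to 10 scale.
    for name, value in (("craving", craving), ("stress", stress)):
        if not 1 <= value <= 10:
            issues.append(f"{name} out of range: {value}")
    tags = set(contexts)
    # Flag mutually exclusive context tags rather than merging them.
    for a, b in CONFLICTING_CONTEXTS:
        if a in tags and b in tags:
            issues.append(f"conflicting contexts: {a!r} and {b!r}")
    # Flag a high stress rating reported alongside a calm context.
    if stress >= 8 and tags & CALM_TAGS:
        issues.append("high stress reported alongside a calm context")
    return issues
```

A caller could run this before the User State Agent and ask the user to confirm when the list is non-empty.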

Roadmap

Input validation layer before User State Agent
Strategy diversity mechanism using prior effectiveness data
Prediction Agent recalibration at low-risk thresholds
Parallelized User State + Prediction agent calls
Clinical outcome validation — A/B test vs. control group
Integration with Apple Health, Oura, Whoop
Expansion to alcohol and other behavioral addictions
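
The strategy diversity item could take many forms. One simple option, shown here as a sketch and not as the team's planned design, is epsilon-greedy selection: usually pick the strategy with the best average past score, and occasionally explore another. The `history` structure and the 0 to 1 effectiveness scores are hypothetical; the four strategy names come from the Intervention Agent description.

```python
import random
from typing import Dict, List, Optional

STRATEGIES = [
    "urge surfing",
    "cognitive reframing",
    "delay tactic",
    "behavioral substitution",
]


def pick_strategy(
    history: Dict[str, List[float]],
    epsilon: float = 0.2,
    rng: Optional[random.Random] = None,
) -> str:
    """Epsilon-greedy choice over CBT strategies.

    history maps a strategy name to past effectiveness scores in [0, 1]
    (hypothetical, e.g. a normalized drop in self-reported craving).
    """
    rng = rng or random.Random()
    # Explore: with probability epsilon, pick any strategy uniformly at random.
    if rng.random() < epsilon:
        return rng.choice(STRATEGIES)

    def mean_score(strategy: str) -> float:
        scores = history.get(strategy, [])
        # Untried strategies get a neutral prior of 0.5 so they are not ignored forever.
        return sum(scores) / len(scores) if scores else 0.5

    # Exploit: pick the strategy with the highest mean score (ties go to the first listed).
    return max(STRATEGIES, key=mean_score)
```

With `epsilon=0.2`, roughly one in five interventions would try something other than the current best, which directly counters the monoculture described under Limitations.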

Connect your OpenClaw agent

Lily is fully compatible with OpenClaw — the open-source personal AI assistant framework. Deploy Lily as a coaching agent on any channel you already use: WhatsApp, Telegram, iMessage, Slack, Discord, or SMS.

Step 1 — Install OpenClaw

npm install -g openclaw@latest
openclaw onboard

Step 2 — Add Lily as an agent

openclaw agents add --from https://raw.githubusercontent.com/denihoxh/lily-app/main/SOUL.md

Step 3 — Install the Lily pipeline skill

openclaw skills add https://raw.githubusercontent.com/denihoxh/lily-app/main/SKILL.md

Step 4 — Start the gateway

openclaw gateway:watch

Once running, Lily will respond to craving events on any connected channel and maintain behavioral memory across sessions.

Key files

SOUL.md: Agent identity, rules, personality, channel config, and memory schema
SKILL.md: Full skill definition with input/output schema and integration examples

Bring your own agent: pipeline API

Any agent framework (LangChain, CrewAI, AutoGen) can call Lily's pipeline directly as a REST API.

POST https://lily-app-xi.vercel.app/api/pipeline
Content-Type: application/json

{
  "craving": 8,
  "stress": 7,
  "contexts": ["After coffee"],
  "memory": ["Prior event logs..."]
}

See AGENT_INTEGRATION.md for full examples with LangChain, CrewAI, AutoGen, and raw Python.
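
For a dependency-free starting point, the request above can be built with Python's standard library. This is a sketch under assumptions: the function names are made up for illustration, and the response is assumed to be a JSON object whose schema is not documented in this section, so `call_pipeline` returns it unparsed beyond `json.loads`. The 30-second timeout is a guess chosen to sit above the ~12–15 second pipeline latency noted under Limitations.

```python
import json
import urllib.request
from typing import List

PIPELINE_URL = "https://lily-app-xi.vercel.app/api/pipeline"


def build_pipeline_request(
    craving: int, stress: int, contexts: List[str], memory: List[str]
) -> urllib.request.Request:
    """Build the POST request with the JSON body shown above (no network I/O)."""
    payload = {
        "craving": craving,
        "stress": stress,
        "contexts": contexts,
        "memory": memory,
    }
    return urllib.request.Request(
        PIPELINE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_pipeline(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the body, assuming the endpoint returns JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Typical use would be `call_pipeline(build_pipeline_request(8, 7, ["After coffee"], ["Prior event logs..."]))`, wrapped in error handling for `urllib.error.URLError` since the endpoint can be rate limited.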

Clinical Framing

Lily is a coaching tool, not a medical device
Lily is not a substitute for professional care. It is designed as an AI co-assistant that works best alongside clinical support — not as a standalone treatment.

Interventions are grounded in Cognitive Behavioral Therapy (CBT) techniques. Heavy users are flagged at onboarding and encouraged to supplement Lily with professional support. When a user expresses severe distress, the system escalates by suggesting professional support.

Clinical framing informed by Jana Krystofova Mike, MD — Pediatric Critical Care, UCSF; published researcher in agentic AI for clinical intervention.

Team

Deni Hoxha
MBA Candidate · MIT Sloan '27
Product manager with experience launching and scaling digital products. Ex-Morgan Stanley (Zelle platform). Harvard BA/MA '21.
Leila Veerasamy
MBA Candidate · MIT Sloan '27
Background in CPG manufacturing, sustainability, and international development. Former founder of a DTC startup focused on user experience design. Brown University.

MAS.664 AI for Impact · MIT Media Lab & MIT Sloan · Spring 2026