Lily — Documentation

How it works

Agent Pipeline Architecture

When you log a craving, Lily fires a 4-agent pipeline. Each agent makes an independent cloud LLM call with structured JSON inputs and outputs, passing state forward. The Orchestrator closes the feedback loop by writing a memory update that improves future interventions.

01

User State Agent

Synthesizes your craving event, stress level, context, and memory history into a structured behavioral snapshot.

craving event + memory history → state JSON

02

Prediction Agent

Estimates relapse probability, craving trajectory, and the window to act based on your behavioral state.

state JSON → risk % + trajectory

03

Intervention Agent

Selects the most appropriate CBT strategy (urge surfing, cognitive reframing, delay tactic, behavioral substitution) and generates a personalized message.

state + prediction → strategy + message

04

Orchestrator Agent

Makes the final decision, sets a proactive check-in window, and writes a memory update that closes the feedback loop.

all outputs → decision + memory write

USER INPUT → User State Agent → Prediction Agent → Intervention Agent → Orchestrator Agent → ACTION OUTPUT
                                                                                    ↓
                                                                           memory write
                                                                                    ↓
                                                              feeds back to User State Agent on next call

Powered by: Llama 3.3 70B via Groq (server-side, no API key needed)
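
The flow in the diagram above can be sketched in Python as a chain of four functions. This is only an illustrative sketch: the function names, field names, and stubbed return values are assumptions, and each stub stands in for what would be a separate cloud LLM call in Lily.

```python
from typing import Any

Json = dict[str, Any]

def user_state_agent(event: Json, memory: list[str]) -> Json:
    """Stand-in for call 1: craving event + memory history -> state JSON."""
    return {"event": event, "memory_size": len(memory)}

def prediction_agent(state: Json) -> Json:
    """Stand-in for call 2: state JSON -> risk % + trajectory (placeholder values)."""
    return {"risk_pct": 0, "trajectory": "placeholder"}

def intervention_agent(state: Json, prediction: Json) -> Json:
    """Stand-in for call 3: state + prediction -> strategy + message."""
    return {"strategy": "urge surfing", "message": "placeholder"}

def orchestrator_agent(state: Json, prediction: Json, intervention: Json) -> Json:
    """Stand-in for call 4: all outputs -> decision + memory write."""
    return {
        "state": state,
        "decision": intervention,
        "memory_write": f"used {intervention['strategy']}",
    }

def run_pipeline(event: Json, memory: list[str]) -> Json:
    # Each stage receives the outputs of the stages before it.
    state = user_state_agent(event, memory)
    prediction = prediction_agent(state)
    intervention = intervention_agent(state, prediction)
    result = orchestrator_agent(state, prediction, intervention)
    # Feedback loop: the memory write is stored so the next call can read it.
    memory.append(result["memory_write"])
    return result

memory: list[str] = []
first = run_pipeline({"craving": 8, "stress": 7, "contexts": ["After coffee"]}, memory)
second = run_pipeline({"craving": 5, "stress": 4, "contexts": []}, memory)
```

On the second call the User State stub sees one memory entry, which is the feedback loop from the diagram in miniature.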

Known issues

Limitations

We believe in being honest about what doesn't work yet. These are the known limitations of the current version.

Strategy monoculture

The Intervention Agent defaults heavily to urge surfing across all risk levels. A diversity mechanism tied to prior strategy effectiveness is needed.
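
One possible shape for such a mechanism is epsilon-greedy selection over the four strategies the Intervention Agent chooses from. The sketch below is an assumption, not Lily's implementation: the `effectiveness` scores, the `epsilon` value, and the function name are all hypothetical.

```python
import random
from typing import Optional

STRATEGIES = [
    "urge surfing",
    "cognitive reframing",
    "delay tactic",
    "behavioral substitution",
]

def pick_strategy(
    effectiveness: dict[str, float],
    epsilon: float = 0.2,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the best-scoring strategy most of the time, a random one otherwise.

    `effectiveness` maps a strategy name to a hypothetical prior success
    score; strategies with no recorded score count as 0.0.
    """
    rng = rng or random.Random()
    if rng.random() < epsilon:
        # Explore: any strategy, so no single one can monopolize.
        return rng.choice(STRATEGIES)
    # Exploit: the strategy with the highest prior effectiveness.
    return max(STRATEGIES, key=lambda s: effectiveness.get(s, 0.0))

scores = {"delay tactic": 0.9, "urge surfing": 0.4}  # made-up example data
choice = pick_strategy(scores)
```

A higher `epsilon` trades short-term effectiveness for more variety; a contextual scheme that conditions on risk level would be a natural next step.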

Prediction Agent skews high

Risk scores cluster at 60–80% even when the logged craving is 1 out of 10. The model doesn't differentiate meaningfully at the low end of the scale.

No contradiction detection

The User State Agent averages contradictory inputs rather than flagging them. An input validation layer is the next architectural priority.
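
A validation layer of this kind could run before the User State Agent and return a list of problems instead of silently averaging. The sketch below is hypothetical throughout: the 1 to 10 range is inferred from the "1/10" scale mentioned above, and the `EXCLUSIVE_CONTEXTS` pairs are made-up examples of what a contradiction rule might look like.

```python
# Hypothetical pairs of contexts that should not be logged together.
EXCLUSIVE_CONTEXTS = [("At home", "At work")]

def validate_input(payload: dict) -> list[str]:
    """Return human-readable problems; an empty list means the input passed."""
    problems: list[str] = []

    # Range checks (assumed 1 to 10 integer scale for both fields).
    for field in ("craving", "stress"):
        value = payload.get(field)
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not 1 <= value <= 10:
            problems.append(f"{field} must be an integer from 1 to 10")

    # Contradiction checks: flag the conflict rather than averaging it away.
    contexts = set(payload.get("contexts", []))
    for a, b in EXCLUSIVE_CONTEXTS:
        if a in contexts and b in contexts:
            problems.append(f"contradictory contexts: {a!r} and {b!r}")

    return problems

ok = validate_input({"craving": 8, "stress": 7, "contexts": ["After coffee"]})
flagged = validate_input(
    {"craving": 3, "stress": 3, "contexts": ["At home", "At work"]}
)
```

Returning the problems as data lets the caller decide whether to reject the event or ask the user to confirm it.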

Rate limiting at scale

10+ concurrent pipelines hit the token-per-minute ceiling. Production deployment needs request queuing or a higher API tier.
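
Request queuing can be approximated with a concurrency cap. The sketch below uses `asyncio.Semaphore` so that extra pipeline runs wait their turn. `MAX_CONCURRENT`, `fake_pipeline`, and `run_all` are hypothetical names, and a cap on concurrency only indirectly limits tokens per minute, so a real deployment might also need token accounting.

```python
import asyncio

MAX_CONCURRENT = 5  # hypothetical cap; tune against the provider's limit

async def fake_pipeline(event: dict) -> dict:
    # Stand-in for the four sequential LLM calls.
    await asyncio.sleep(0.01)
    return {"handled": event["id"]}

async def run_all(events: list[dict], limit: int = MAX_CONCURRENT):
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def guarded(event: dict) -> dict:
        nonlocal in_flight, peak
        async with sem:  # waits here while `limit` pipelines are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_pipeline(event)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(e) for e in events))
    return results, peak

# Twelve events arrive at once, but at most MAX_CONCURRENT run together.
results, peak = asyncio.run(run_all([{"id": i} for i in range(12)]))
```

`asyncio.gather` returns results in the order the events were submitted, so callers can match responses to requests without extra bookkeeping.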

Pipeline latency

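Four sequential agent calls take ~12–15 seconds. Parallelizing the User State and Prediction agents is the highest-leverage optimization. Because the Prediction Agent currently takes the state JSON as its input, it would first need to work from the raw inputs instead.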
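
A minimal sketch of that optimization with `asyncio.gather`, assuming the Prediction Agent can read the raw event rather than the state JSON (an assumption, since the pipeline description has it consuming the state JSON). The agent functions are stubs with artificial delays.

```python
import asyncio

async def user_state_agent(event: dict) -> dict:
    await asyncio.sleep(0.05)  # stand-in for LLM latency
    return {"state_for": event["craving"]}

async def prediction_agent(event: dict) -> dict:
    await asyncio.sleep(0.05)  # stand-in for LLM latency
    return {"risk_pct": 0, "trajectory": "placeholder"}

async def first_two_stages(event: dict) -> tuple[dict, dict]:
    # Both calls are in flight at the same time, so this stage takes about
    # as long as the slower call rather than the sum of the two.
    state, prediction = await asyncio.gather(
        user_state_agent(event),
        prediction_agent(event),
    )
    return state, prediction

state, prediction = asyncio.run(first_two_stages({"craving": 8, "stress": 7}))
```

The Intervention and Orchestrator agents still depend on earlier outputs, so only this first stage would overlap.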

What's next

Roadmap

Input validation layer before User State Agent

Strategy diversity mechanism using prior effectiveness data

Prediction Agent recalibration at low-risk thresholds

Parallelized User State + Prediction agent calls

Clinical outcome validation — A/B test vs. control group

Integration with Apple Health, Oura, Whoop

Expansion to alcohol and other behavioral addictions

Agent integration

Connect your OpenClaw agent

Lily is fully compatible with OpenClaw — the open-source personal AI assistant framework. Deploy Lily as a coaching agent on any channel you already use: WhatsApp, Telegram, iMessage, Slack, Discord, or SMS.

Step 1 — Install OpenClaw

npm install -g openclaw@latest
openclaw onboard

Step 2 — Add Lily as an agent

openclaw agents add --from https://raw.githubusercontent.com/denihoxh/lily-app/main/SOUL.md

Step 3 — Install the Lily pipeline skill

openclaw skills add https://raw.githubusercontent.com/denihoxh/lily-app/main/SKILL.md

Step 4 — Start the gateway

openclaw gateway:watch

Once running, Lily will respond to craving events on any connected channel and maintain behavioral memory across sessions.

Key files

SOUL.md

Agent identity, rules, personality, channel config, and memory schema

SKILL.md

Full skill definition with input/output schema and integration examples

Bring your own agent — pipeline API

Any agent framework (LangChain, CrewAI, AutoGen) can call Lily's pipeline directly as a REST API.

POST https://lily-app-xi.vercel.app/api/pipeline
Content-Type: application/json

{
  "craving": 8,
  "stress": 7,
  "contexts": ["After coffee"],
  "memory": ["Prior event logs..."]
}

See AGENT_INTEGRATION.md for full examples with LangChain, CrewAI, AutoGen, and raw Python.
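
For orientation, the request above can be built with the Python standard library alone. This is a sketch under assumptions: the helper names are hypothetical, the response is assumed to be a JSON body (its schema is not documented here), and the 30-second timeout is a guess chosen to leave room for the ~12–15 second pipeline latency.

```python
import json
import urllib.request

PIPELINE_URL = "https://lily-app-xi.vercel.app/api/pipeline"

def build_pipeline_request(
    craving: int, stress: int, contexts: list[str], memory: list[str]
) -> urllib.request.Request:
    """Build the POST request without sending it."""
    payload = {
        "craving": craving,
        "stress": stress,
        "contexts": contexts,
        "memory": memory,
    }
    return urllib.request.Request(
        PIPELINE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def call_pipeline(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and parse the body, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

request = build_pipeline_request(8, 7, ["After coffee"], ["Prior event logs..."])
# result = call_pipeline(request)  # uncomment to make the network call
```

Separating request construction from sending makes the payload easy to inspect or log before anything goes over the network.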

Important

Clinical Framing

Lily is a coaching tool, not a medical device

Lily is not a substitute for professional care. It is designed as an AI co-assistant that works best alongside clinical support — not as a standalone treatment.

Interventions are grounded in Cognitive Behavioral Therapy (CBT) techniques. Heavy users are flagged at onboarding and encouraged to supplement Lily with professional support. The system escalates to professional support suggestions when users express severe distress.

Clinical framing informed by Jana Krystofova Mike, MD — Pediatric Critical Care, UCSF; published researcher in agentic AI for clinical intervention.

Who built this

Team

Deni Hoxha

MBA Candidate · MIT Sloan '27

Product manager with experience launching and scaling digital products. Ex-Morgan Stanley (Zelle platform). Harvard BA/MA '21.

Leila Veerasamy