Sourcy WhatsApp Bot

Conversational agent prototype for activation — lead qualification and SR collection
Eric San · Updated 16 February 2026 (v7)

This is a prototype WhatsApp bot that qualifies sourcing leads and guides them toward completed SRs. We tested it by replaying real lead messages from the test period — the lead side is real data, the bot side shows how the prototype responds.

  • Instant Response — 24/7, no offline hours
  • SR in 3 Turns — qualified lead → complete SR in 3 turns
  • Trust-First — answers price/MOQ/delivery at turn 1, before asking questions
  • Smart Exit — filters in 1-2 turns, with a specific re-entry door

Context: WhatsApp direct is the strongest channel for quality SRs (4.6% CVR vs 0.5% for forms). But ~76% of WA leads ghost after 1-2 messages — often because the first response doesn't give them enough value to continue. The bot addresses this with category expertise upfront and conversational SR collection that doesn't feel like a form.


I. What We Did

We studied the gold standard conversation, Eugene's requirements document and key findings on lead behavior, and 13 raw WhatsApp conversations from the test period — plus the A/B experiment data (variants 1-6, CVR breakdown, quality SR classification). From these we built a conversational agent and iterated the prompt through 7 versions based on Karl's feedback. The current version (v7) was tested against 8 lead personas drawn from real WA chat data: 8/8 pass, avg score 8.8/10. Key changes in v7: prices-first (when lead names category, first response includes price range), 2-3 line message cap, one-liner answers for catalog/process questions. The iteration methodology is fully reproducible — test personas, prompt versions, and scoring rubric are documented in the GitHub repo.


II. Conversations

We tested v7 against 8 leads drawn from real WhatsApp chat data. Below are the key conversations grouped by outcome: leads that reached a completed SR, leads handed off to the human team, leads correctly exited, and leads that would have been lost under manual handling but were kept alive by the bot. Each conversation includes bounce analysis — what could cause the lead to leave at each turn, and how the bot mitigates it.

Leads → Completed SR

Jesús — Mexico, sportswear for new store, 70K MXN

SR IN 3 TURNS
Real lead from Feb 6, 2026. Lead messages replayed exactly from chat data.
What the bot did: Converted a lead with no company, asking for a catalog that doesn't exist, in a language other than English — into an advancing conversation in 3 turns. Catalog question got a one-liner (not a paragraph). Budget math ($4,000 = 500 pieces across categories) prevented a premature exit. Prices delivered immediately once categories were named.

femmoraaa — Teenager in Pakistan, Instagram jewelry brand

ADVANCING IN 3 TURNS
Real lead from Feb 3-4, 2026. Originally classified as low-quality by the growth team.
What the bot did: Despite the low-quality signals — teenager, no target price, hasn't started — she had a specific aesthetic and an Instagram brand. The bot answered "how much investment?" with prices-first, handled "what's the process?" as a one-liner instead of a paragraph, and estimated her budget for her. Three turns in, she's being guided toward a concrete starter order.

Leads → Human Handoff

Jammaica — Little Luna's Bakeshop, pastry packaging, wants Zoom

CALL SCHEDULED
When a lead prefers calls, the bot facilitates the handoff immediately — confirms the call AND delivers prices in one turn. No wasted messages. Collects email to advance.

Leads the Bot Kept Alive

Some leads don't have specs, references, or even a clear idea of what they want. The current process often rejects them. The bot gives them options to react to instead.

Anam — "Jewelry and bags. Makeup. No reference. No idea."

KEPT ALIVE
Real lead from Feb 3, 2026. In the actual conversation, Michael rejected her: "unable to proceed — not enough detail."
Whether Anam converts is unknown — her intent may be too low. But she got 5 turns of engagement instead of a rejection at turn 3. The v7 difference: prices-first on every category she named (jewelry, bags, makeup), specific product options when she said "no reference," and a clean qualifying pivot to budget when she said "no idea." The bot never gave up.

Leads → Smart Exits

Not every lead is viable for international sourcing. The bot identifies this through budget math and exits honestly — but always leaves a specific price door open.

Syed — Handwash, Pakistan, 25-30 PKR budget

HONEST EXIT
Real lead from Jan-Feb 2026. Persistent — followed up after 3 days with "??" when ignored. Current process: 6 days, wrong product sourced.
Comparison: In the actual conversation, Syed waited 22 hours for a first response, was asked for his company website before getting any pricing, waited 6 more days, and received quotes for the wrong product (packaging only, not handwash liquid). The bot resolved this in 2 turns with honest numbers, specific budget math, and kept the door open at a realistic price point.

Battery Lead — Restricted product, pushes back

HELD FIRM + PIVOTED
Real lead from Feb 2, 2026. Tests whether the bot holds its position when challenged.
The bot held its restriction even under pushback — and explained WHY rather than just repeating "we can't." Earlier versions actually reversed course here and started quoting battery prices. v7 acknowledges the lead's perspective ("you may be right") while holding firm on the safety reality.

Anthony — AirPods reseller, Malaysia, 50 units

CLEAN EXIT
Real lead from Feb 2, 2026.
2 turns. v7 still does the margin math but keeps the door slightly more open than an outright exit — offers ready-stock as a viable path for small quantities.

Candle Student — Beginner, 25-35 wicks

CLEAN EXIT
2 turns. Matched his "brother" tone. No condescension about being a student. Specific re-entry thresholds so he knows when to come back.

Leads the Bot Would Have Saved

Some leads in the test data were lost during manual handling. The bot handles them differently:

Lead | What Happened (Manual) | What the Bot Does
Anam (jewelry, "no idea") | Rejected: "unable to proceed — not enough detail" | Prices-first for every category she named. 5 turns of engagement instead of rejection at turn 3.
Jesús (Mexico, no company) | Kept, but hit with a 4-field form before any pricing | Catalog one-liner. Prices-first. Budget math ($4,000 = 500 pieces). Advancing in 3 turns.
femmoraaa (teenager, no price) | Classified as low-quality, conversation ended | Estimated a starter package for her ($500-2,000). Advancing toward an order in 3 turns.
Syed (handwash, Pakistan) | 6-day response, wrong product sourced | 2-turn honest answer with specific math (80 PKR = $0.30 realistic price point).

III. Iteration: v5 → v7 (Based on Karl's Feedback)

Karl's feedback on the prototype: "not business logical." We interpreted this as: each turn carries dropout risk. If a lead reads our message and it doesn't either hook them (category expertise, specific prices) or qualify them (budget math, quantity check), we've wasted the turn and they may never reply again.

What Changed

v5 (Original Prototype)

  • 4-5 line messages, sometimes 6+
  • Asked for specs before giving prices
  • "What's the process?" got a multi-step explanation
  • "Catalog?" got a paragraph about why we don't have one
  • Narrowing questions before value delivery

v7 (Current)

  • Hard cap: 2-3 lines per message
  • Prices FIRST when category is named
  • "Process?" → one-liner redirect
  • "Catalog?" → "No fixed catalog — we source from 2,000+ factories. What do you need?"
  • Every turn: wow > qualify > advance SR

Scored Results (v7)

We built a rubric scoring each bot turn on 5 dimensions (0-2 each, max 10/turn): message length, value delivery, qualification, conversation discipline, and the "last message test" — if this is the last message the lead reads, did we hook them or learn if they're worth chasing?
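As a sketch, the rubric translates directly into code. The dimension names follow the text above, but the class and function names below are illustrative stand-ins, not the actual scoring harness:

```python
from dataclasses import dataclass

# Sketch of the 5-dimension turn rubric: each dimension scores 0-2,
# max 10 per turn. Names/structure here are illustrative assumptions.
DIMENSIONS = (
    "message_length",
    "value_delivery",
    "qualification",
    "conversation_discipline",
    "last_message_test",
)

@dataclass
class TurnScore:
    scores: dict  # dimension -> 0, 1, or 2

    def total(self) -> int:
        assert set(self.scores) == set(DIMENSIONS)
        assert all(v in (0, 1, 2) for v in self.scores.values())
        return sum(self.scores.values())  # max 10 per turn

def persona_passes(turn_totals, threshold=7.0):
    """A persona passes if its average per-turn total meets the threshold."""
    return sum(turn_totals) / len(turn_totals) >= threshold

perfect = TurnScore({d: 2 for d in DIMENSIONS})
print(perfect.total())              # 10
print(persona_passes([8, 9, 9.5]))  # True (avg ~8.83, above the 7.0 bar)
```

The 7.0 pass bar and the per-turn averaging match the 8/8-pass, 8.8-average results reported below.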

  • Pass rate: 8/8 — all personas pass ≥7.0
  • Avg score: 8.8 out of 10 per turn
  • Avg lines per message: 1-4, down from 10-15
  • Biggest win: Anam, 4.8 → 7.8 (prices-first)
Persona | v5 Behavior | v7 Behavior | Score
Anam ("no idea") | Asked "which category?" then asked again without giving prices | Immediately gives price ranges for all 3 categories, then narrows | 7.8/10
femmoraaa (teenager) | Gave a 5-line process explanation when asked "what's the process?" | One-liner: "Share refs → factory quotes → samples → bulk." Then asks what matters. | 9.0/10
Jesús (catalog request) | Paragraph explaining "we don't have a fixed catalog" before giving any prices | "No fixed catalog — we work direct with factories." Then prices for all 3 categories. | 8.0/10
Syed (price-sensitive) | Gave the honest exit, but in 4-5 lines | Same honest math in 2 lines, plus a specific re-entry price | 8.5/10
Battery (restricted) | Held firm (good) but with a verbose explanation | Same restriction, tighter language; pivots to what we CAN source | 8.5/10
Anthony (branded) | Clean exit | Same quality, tighter | 9.0/10
Jammaica (wants call) | Good handoff | Perfect: call + value in parallel | 10.0/10
Candle (student) | Clean exit | Same respect, tighter | 8.5/10
Key insight: the model is steerable
Going from v5 to v7 required only text changes in two files (SOUL.md + AGENTS.md). No code changes, no retraining, no new infrastructure. Karl's feedback was turned into concrete rules ("prices first", "2-3 line cap", "one-liner for catalog questions") and the bot immediately followed them. This steerability is the core value — any business logic the team identifies can be encoded as a rule and tested within hours.

IV. Live on Telegram

The bot is deployed and responding live on Telegram: @sourcy_activation_bot

Message it from any Telegram account to test. Behavior is the same as shown in this report. Zero ban risk — Telegram's Bot API is official, unlike the unofficial Baileys route used for WhatsApp.

Infrastructure: Railway (Singapore), Claude Opus 4.6, OpenClaw gateway. Response time: 2-5 seconds. Uptime: always-on with health checks.


V. Honest Limitations

The bot doesn't depend on Michael's knowledge
The bot's conversational rules are principle-based — general sourcing expertise from the gold standard, not Michael-specific supplier relationships, past quotes, or client history. Michael's real conversations were used as test data (we replayed lead messages through the bot), not as training data for the bot's knowledge. A completely new lead the bot has never seen would get the same quality of response. We verified this with a separate test set of 6 new lead scenarios — performance was consistent.

We need more conversation data
The bot was tested against ~13 real conversations from the test period. Eugene has agreed to share the remaining ~22 WhatsApp conversations from the live experiment. More data means more edge cases covered and a stronger bot. The testing process is designed for this — new conversations get turned into test personas and run against the prompt.

Price accuracy — LLM estimates vs reality
The bot's price ranges come from the LLM's general knowledge, not Sourcy's supplier database. We spot-checked against live Alibaba listings: for stainless steel water bottles, the bot quoted $4-8/unit, while Alibaba shows $0.89-4.80 — right ballpark but missed the low end. For most product categories the ranges are directionally correct, but misquotes are possible. Real supplier data (via the planned Sourcy data skill) would eliminate this risk. In the meantime, leads should be told these are estimate ranges — final quotes come from the sourcing team.
Other open items
  • Simulated conversations only — real leads behave unpredictably. Needs live testing with real traffic.
  • No image handling tested — many leads send photos. CLI testing can't verify this.
  • Production resilience — concurrent sessions, API rate limits, and downtime handling need to be validated. Will need support from the tech team on production hardening.

VI. How to Keep Improving

The bot's conversational quality comes from two workspace files (SOUL.md + AGENTS.md) — plain text, editable, no code changes needed. Improvement follows a simple loop:

  1. Find an issue in production (e.g., "bot asked for budget when lead said they don't know")
  2. Write a test persona that reproduces it
  3. Fix the prompt (edit AGENTS.md)
  4. Re-run all personas to check nothing else broke
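Step 4 of the loop above is essentially a regression suite over personas. A minimal sketch — `run_persona` is a hypothetical stand-in for the real harness that replays a persona through the bot and scores each turn with the rubric:

```python
# Sketch of the "re-run all personas" step after a prompt edit.
# PASS_THRESHOLD matches the rubric's pass bar; names are illustrative.
PASS_THRESHOLD = 7.0

def regression_suite(personas, run_persona, threshold=PASS_THRESHOLD):
    """Return names of personas whose average score falls below the threshold."""
    return [name for name, spec in sorted(personas.items())
            if run_persona(spec) < threshold]

# Stubbed example: pretend these averages came back from the harness.
stub_scores = {"anam": 7.8, "syed": 8.5, "battery": 6.5}
failures = regression_suite(stub_scores, run_persona=lambda avg: avg)
print(failures)  # ['battery']
```

An empty failure list means the prompt edit fixed the new issue without breaking any previously passing persona.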

This process found and fixed 7 issues over 5 iterations. Each fix is a one-line change in the workspace file — no code deploys needed.

Ownership
I'll own the prompt updates and testing process. To keep the bot improving, I'll need ongoing access to the full conversation data — both the remaining ~22 test-period conversations Eugene agreed to share, and new conversations as they come in from production. The more real data the bot is tested against, the fewer surprises in production.

VII. Technical Stack

Component | What | Why
Agent Framework | OpenClaw (open-source, 188K GitHub stars) | Native WhatsApp/Telegram/Slack support. Gateway handles sessions, routing, channels. Used by 30K+ deployments.
LLM | Claude Opus 4.6 (Anthropic) | Best-in-class for long conversation context, instruction following, and multilingual use (tested: English, Spanish, Bahasa).
WhatsApp | Baileys library (built into OpenClaw) | Unofficial WA Web API. Phone number + QR scan to connect. See ban-risk note below.
Deployment | Docker container | Fully isolated, dedicated config directory. Runs on any server — Railway, VPS, or local machine.
Prompt | SOUL.md + AGENTS.md (plain-text workspace files) | Editable, version-controlled, no code changes needed. All conversational rules live here.

The bot is currently running in Docker on local infrastructure. Connecting to WhatsApp requires a phone number and a QR scan — no server migration needed. Deploy to Railway or any VPS for production.

neicras/sourcy-activation-bot
WhatsApp conversational agent for lead qualification and SR collection. Context data, test runs, prompt versions, and this report.
WhatsApp ban risk
Baileys is an unofficial WhatsApp Web API — Meta does not officially support bot usage on personal WhatsApp numbers. Eugene's team previously got a number banned within 1 week using Botpress. Mitigation for testing: use a dedicated number (not the main Sourcy number), keep message volume low. For production: the official WhatsApp Business API removes this risk entirely but has its own setup process. Decision for the team — see Step 3 below.
OpenClaw platform risks
OpenClaw is a fast-moving open-source project (188K stars, 9,000+ commits). That's a strength (active development, many channels supported) but carries risks we should be aware of:
  • It's infrastructure, not a chatbot builder. OpenClaw behaves as a long-lived stateful backend service. Failures that don't show up in testing can surface under production load — message loss during gateway restarts, session isolation issues, and cron/heartbeat timing problems have been reported.
  • WhatsApp channel stability. Known issues: messages can be silently dropped during gateway restarts (config changes trigger restarts), and cron-triggered messages sometimes don't reach WhatsApp even when the system reports success.
  • Breaking changes. The project moves fast — updates can change config format, CLI behavior, or plugin APIs. We pin to a specific version and test before updating.
  • Single maintainer risk. Core development is driven by a small team. If the project stalls or pivots, we'd need to maintain our fork or migrate.
Mitigations: Pin to a tested version (currently 2026.2.10). Monitor the gateway with health checks. Keep the prompt/workspace portable — if we ever need to switch platforms, the conversational logic (SOUL.md + AGENTS.md) transfers to any LLM system. The investment is in the conversation design, not the framework.

Roadmap: Tools & Skills

The bot currently operates on general sourcing knowledge from the LLM. Next step is to connect it to real data sources via OpenClaw skills:

Skill | What it does | Impact
Sourcy supplier data | Pull real pricing, MOQs, and lead times from Sourcy's database | Eliminates LLM price estimation — bot quotes real numbers
Alibaba / 1688 lookup | Live price search on supply-side marketplaces | Bot can benchmark pricing and answer "can you beat my Alibaba price?"
Demand-side intelligence | Trending products and category demand signals from Amazon/Shopee/social commerce | Value beyond pricing — "this product is trending in your market" or "most buyers in your category go with X"
Shipping calculator | Estimate landed cost by destination country + volume | Bot gives DDP pricing instead of a vague "shipping adds $X"

These are built as OpenClaw workspace skills — same architecture as the research tools we already use. No changes to the core bot, just new capabilities plugged in.
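For a sense of what the shipping-calculator skill would compute, here is a hedged sketch of landed-cost (DDP) math. The duty-then-VAT cascade is one common scheme, not a universal rule, and every rate in the example is a made-up placeholder — real tariffs depend on destination country and HS code:

```python
# Illustrative landed-cost (DDP) estimate. All rates are placeholders.
def landed_cost_per_unit(unit_price, units, freight_total, duty_rate, vat_rate):
    goods = unit_price * units
    cif = goods + freight_total      # simplified CIF basis (goods + freight)
    duty = cif * duty_rate           # duty assessed on the CIF value
    vat = (cif + duty) * vat_rate    # VAT on the duty-inclusive total
    return (cif + duty + vat) / units

# 500 bottles at $3.00 FOB, $400 total freight, 6% duty, 16% VAT (placeholders)
print(round(landed_cost_per_unit(3.00, 500, 400.0, 0.06, 0.16), 2))  # 4.67
```

The point for the bot: a lead asking "$3 per bottle?" actually pays ~$4.67 delivered under these placeholder rates — exactly the kind of concrete number that replaces a vague "shipping adds $X".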


VIII. Next Steps

Step 1 — Review (now): Review this report and the repo. Flag anything that feels off in the conversations — wrong pricing, wrong tone, missing edge cases. This feedback goes directly into the next testing round.

Step 2 — Internal testing via Telegram (LIVE): ✅ The bot is live at @sourcy_activation_bot. Message it from any Telegram account. Zero ban risk.

Step 3 — WhatsApp infrastructure decision (in parallel): We need a decision on which WhatsApp path to take for production:

Option | Ban Risk | Cost | Setup Time
Baileys (unofficial API) | High — Eugene's team got banned within 1 week with Botpress (same approach); Meta can disable at any time | Free | Minutes (QR scan)
WhatsApp Business API (official) | Zero — Meta-approved | Free tier: 1,000 convos/month, then ~$0.02-0.08/convo | 1-3 days (Meta approval)

For prototype testing, Baileys on a dedicated number (not the main Sourcy number) works. For production, the official Business API removes the ban risk but has its own setup process and cost structure. This is a decision for the team; on our side either path is just a config change (QR scan for Baileys, webhook config for the Business API). In the meantime, Telegram is available for internal testing with zero risk on either front.

Step 4 — Connect WhatsApp + 50/50 split: Once WhatsApp infra is decided, connection is a QR scan (Baileys) or webhook config (Business API). For the 50/50 experiment, suggested success criteria: the bot runs smoothly, handles conversations without critical failures, and collects data we can learn from. Even if conversion is comparable rather than higher, we gain 24/7 coverage, instant response, and a dataset to improve from. The goal for round 1 is validation and learning, not beating the manual baseline.
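For the readout of the 50/50 split, a two-proportion z-test is the standard quick check on whether a CVR difference is real. The arm sizes and conversion counts below are invented for illustration; with samples this small, even a visibly higher bot CVR may not reach significance — which supports framing round 1 as learning rather than a verdict:

```python
import math

# Hedged sketch: two-proportion z-test for comparing arm CVRs.
# Example counts are invented, not experiment data.
def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-statistic for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# e.g. bot arm 12/200 converted (6.0% CVR) vs manual arm 9/200 (4.5%)
z = two_proportion_z(12, 200, 9, 200)
print(round(z, 2))  # 0.67 — well below 1.96, not significant at the 95% level
```

In other words, a 6.0% vs 4.5% split at 200 leads per arm is statistically indistinguishable — useful to set expectations before declaring either arm the winner.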

Also needed from the team:

1. Remaining ~22 WA conversations Eugene agreed to share — more data = stronger bot before live traffic

2. Real category pricing from supplier quotes — eliminates misquoting risk

3. Handoff process — who receives when the bot routes a lead to the team?

I'll own the prompt updates and testing process, with ongoing access to conversation data as it comes in.