R1 (19 Feb): Initial landscape + critical assessment. R2 (19 Feb): Data access estimation, HK pain points, feature-level landscape. R3 (19 Feb): Restructured around “what to build”; added VLM accuracy data; 40 sources. R3.1 (19 Feb): Updated with current model performance. DrawEduMath leaderboard Dec 2025 (Gemini 3 Pro: 71.3%, GPT-5: 64%). Typed grading: GPT-4.1-mini 94.5% on worked algebra (EDM 2025). Scan-to-grade upgraded from P2 back to P1 (with mandatory teacher review). 42 sources.
Every recommendation below is grounded in three filters: (1) what HK teachers actually ask for, (2) what’s technically feasible at primary level, and (3) what competitors don’t already do well. The narrative spine is: what should Essai build for math in HK schools?
| Feature | What it does | Why P0 | Evidence |
|---|---|---|---|
| Question generation (HK curriculum, P3–P6, by topic + difficulty) | Teacher selects strand/topic/difficulty → AI generates a worksheet in seconds | >50% of HK teachers want AI to generate test questions.1 SmartQuest only covers DSE (secondary).8 Clear gap at primary level. | EdU Jockey Club pilot proved this works for P5–P6 in one HK school.3 Quezzio has ~4K generators with 5 difficulty levels.9 |
| Auto-grading (MC + typed worked solutions) | Machine-mark MC, numeric answers, and typed worked solutions submitted digitally | >50% of teachers want AI grading help.1 GPT-4.1-mini hits 94.5% accuracy on college algebra worked solutions41 — primary-level arithmetic will be higher. Production-ready today. | EDM 2025 study: 18K responses graded, 94.5% human agreement (GPT-4.1-mini w/ self-consistency).41 SmartQuest does OMR for bubble sheets.8 Quezzio uses symbolic equivalence.9 |
| Class dashboard (scores by topic/strand) | Teacher view: average scores, topic breakdown, weakest areas per class | Low build effort — extends existing Essai dashboard architecture.22 Teachers explicitly ask for “who struggled with what.” | SmartQuest8 and Third Space Learning11 both have class dashboards. Table stakes. |
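The self-consistency technique behind the 94.5% typed-grading result41 is simple to sketch: sample the grader several times and take the majority label. A minimal runnable sketch — `grade_once` is a hypothetical stand-in for one LLM call (e.g. GPT-4.1-mini at temperature > 0), replaced here by a toy deterministic check:

```python
from collections import Counter

def grade_once(answer: str, expected: str, seed: int) -> str:
    # Stand-in for one stochastic LLM grading call; a toy exact-match
    # check keeps the sketch runnable. Hypothetical, not Essai's grader.
    return "correct" if answer.strip() == expected.strip() else "incorrect"

def grade_with_self_consistency(answer: str, expected: str, n_samples: int = 5) -> str:
    """Sample the grader n times and return the majority label."""
    votes = Counter(grade_once(answer, expected, seed=i) for i in range(n_samples))
    return votes.most_common(1)[0][0]
```

With a real LLM behind `grade_once`, the majority vote smooths out individual sampling errors — the mechanism the EDM 2025 study credits for pushing agreement to 94.5%.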
| Feature | What it does | Why P1 | Evidence |
|---|---|---|---|
| Scan-to-grade (handwritten exercise books, teacher review required) | Teacher scans a stack of exercise books. AI pre-marks. Teacher reviews and overrides via review UI. | This IS the moat for HK (paper-first culture). VLM accuracy now at 71.3% (Gemini 3 Pro, Dec 2025)32 — viable as pre-mark + human review workflow. Mach/HKU digit OCR is >97%.6 Build pipeline now; accuracy auto-improves as models upgrade. | Graded Pro uses similar pre-mark + review model for 3,000+ UK/US schools.7 FERMAT benchmark: Gemini-1.5-Pro achieved 77% error correction rate on grades 7–12.42 |
| Per-student tracking + parent report | Auto-generate weekly per-student progress → shareable via WhatsApp | HK parents expect WhatsApp updates. Differentiates from SmartQuest (no parent-facing output).8 Low build cost — generated from grading data. | IXL and Third Space Learning both offer parent reports.11 WhatsApp is HK’s primary parent-school channel. |
| TSA / HKDSE past-paper variants | Given a real past paper, generate 5 variants with same difficulty, different numbers. Each student gets unique version. | Solves “students memorise past papers” problem. TSA drives 80% of primary homework.4 High demand. | Quezzio already does algorithmic variation for US standards (10 generators/standard × 5 levels).9 Requires HKEAA license (HK$5,275–6,685/year).20 |
| Homework quality optimizer | Instead of “30 questions on fractions,” AI recommends “these 8 cover the same learning objectives.” | Aligns with EDB “quality over quantity” homework policy.21 Reduces student burden while maintaining coverage. Unique positioning. | EDB Curriculum Guide explicitly recommends reducing homework quantity at primary level.21 HK students receive 7+ homework pieces/day.5 |
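The TSA past-paper variant idea follows the Quezzio model: hold the question structure and difficulty band fixed, vary only the numbers, and recompute the answer. A minimal sketch with a single hypothetical word-problem template (the template and ranges are illustrative, not from any real TSA paper):

```python
import random

def make_variant(seed: int) -> dict:
    """One variant of a fixed multiplication word-problem template:
    same structure and difficulty band, different numbers."""
    rng = random.Random(seed)       # seed per student -> unique but reproducible
    boxes = rng.randint(3, 9)       # constrained ranges keep difficulty comparable
    apples = rng.randint(12, 48)
    return {
        "question": f"A box holds {apples} apples. {boxes} boxes arrive. "
                    f"How many apples are there in total?",
        "answer": apples * boxes,
    }

# Each student ID seeds a distinct paper, defeating past-paper memorisation.
variants = [make_variant(seed=student_id) for student_id in range(30)]
```

Seeding by student ID makes every paper reproducible for marking while keeping neighbours' papers different.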
| Feature | What it does | Why P2 | Gate |
|---|---|---|---|
| Autonomous scan-to-grade (no teacher review required) | Fully automated marking of scanned exercise books with no human in the loop. | P1 ships with teacher review; P2 removes it. Requires VLM accuracy >90% on handwritten primary math to maintain teacher trust (80.3% worry about accuracy).1 | Ship when: VLM accuracy >90% on primary math handwriting. At current improvement rate (~5–10%/6mo), estimated mid–late 2027.32 |
| Step-by-step error detection | Mark where in a student’s working the error occurred, not just right/wrong | HK users already complain AI “doesn’t check each step.”16 But EMNLP 2025: <10% step-level error detection on harder math.19 | Feasible at P3–P6 arithmetic only. Not DSE. Ship with teacher override as required. |
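The "mid–late 2027" gate estimate can be sanity-checked with simple arithmetic, under the (strong) assumption that gains stay linear at roughly +5 to +10 points per 6 months from the 71.3% Dec 2025 baseline:

```python
def months_to_threshold(current: float, target: float, pts_per_6mo: float) -> int:
    """Months until accuracy crosses the target, assuming linear gains."""
    months = 0
    while current < target:
        current += pts_per_6mo
        months += 6
    return months

slow = months_to_threshold(71.3, 90.0, 5.0)    # conservative: +5 pts / 6 months
fast = months_to_threshold(71.3, 90.0, 10.0)   # optimistic: +10 pts / 6 months
# slow = 24 months (~Dec 2027); fast = 12 months (~Dec 2026)
```

The >90% gate lands somewhere in the Dec 2026–Dec 2027 window on these assumptions, consistent with the "mid–late 2027" estimate at the conservative end. Improvement curves rarely stay linear, so treat this as a planning heuristic, not a forecast.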
| Feature | What it does | Why P3 | Evidence |
|---|---|---|---|
| Misconception taxonomy + error twin matching | Model why students err. Surface other students making the same mistake. Group for targeted reteaching. | Nobody asked for this — but Third Space Learning built their entire business on it (700K+ lessons, 93% met Expected Standard).11 Classic innovator’s dilemma: the teachers who’d benefit most don’t know to ask. | MalruleLib: 101 malrules across 498 templates, 1M+ instances.38 Cross-template prediction drops to 40% — needs domain-specific training. ScaffoldiaMyMaths shows the approach works for HK primary fractions.13 |
| Cross-subject correlation | “Students weak in math word problems also score low on Chinese reading comprehension.” | Only possible if you own both essay + math data for the same students. Essai is the only platform positioned to do this.22 | Requires meaningful data in both subjects first. v3 at earliest. |
| Socratic tutoring (AI follow-up) | When a student gets it wrong, AI asks guiding questions instead of showing the answer | Organic demand in HK already.18 Khan Academy’s “Explain Your Thinking”: 20–36% showed more understanding through AI conversation.12 But MathGPT has 50+ schools doing this already.10 | MathGPT is Socratic and “cheat-proof.”10 Graded Pro does viva voce follow-ups.7 Essai should not compete here in v1. |
>50% of HK teachers surveyed want AI to help with grading homework and generating test questions.1 66.9% want AI to organize teaching resources; 65.8% want AI to design teaching presentations.1 Australian secondary teachers spend 10+ hours per week marking.40
80.3% of HK teachers worry AI-generated content contains errors.1 83.5% worry students will use AI to copy homework.1 Over 50% find it difficult to apply AI tools.1 Individual teachers report schools actively discourage AI use in teaching.14
95% of HK students use AI tools vs 90% of teachers, but students rate themselves significantly more proficient.2 Both groups worry about over-reliance weakening critical thinking.2 OHKF recommends: framework for teachers, progressive AI literacy curriculum, one-stop resource platform.2
HK primary students receive 7+ homework assignments per day — and 85% of teachers confirm the same load applies on weekends.5 ~80% of homework is TSA-related drill.4 Nearly half of families report children experiencing “drastic” stress; some children cry during homework.4 Despite most parents not believing TSA is useful, ~50% still prepare children for it — pure “herd mentality.”4
The EdU Jockey Club Primary AI math pilot (P5–P6 only, with Microsoft partner “遊戲湯麵”) is one of very few examples of digital math AI in HK primary schools.3 That pilot generates questions in seconds and marks handwritten answers with a 5-level grading scale. This is at one school only — not a commercial product.3
“老師用AI出卷, 學生用AI作文做功課, 老師再用返AI改卷, 咁老師同學生存在嘅意義係?” [“Teachers use AI to set papers, students use AI to do homework, teachers use AI to mark — what’s the point of either existing?”] — @wxharp, 3,100 likes, Sept 202515
Implication: AI that removes human involvement is seen as threatening. Essai must position as teacher-augmentation, not teacher-replacement.
“唔係每一個步驟都take嘅,就係一筆take過,有鬼用啊。如果有中間部份寫錯咗點樣批改呢” [“It doesn’t check each step, just sweeps through in one go — useless. What if the middle steps are wrong?”] — @fk_pk_1919, Jan 202616
Implication: Step-by-step verification matters to HK users. “Just marked the final answer” will get rejected.
Math tutors overheard in an HK coffee shop: “ChatGPT 很弱,但 Deepseek 更爛” [“ChatGPT is weak at math, but Deepseek is even worse”]. Private tutors are actively testing AI tools and finding them unreliable for actual math teaching. — @jagolee48, 275 likes, Feb 202517
Implication: The tutor market is testing and rejecting existing generic AI tools. There’s demand, but no trusted product yet.
“數學神技!AI + 蘇格拉底提問法 = 免費私人補習老師。好多人問 AI 數學題,淨係識叫佢俾答案。錯晒!” [“Math superpower! AI + Socratic questioning = a free private tutor. Most people who ask AI math questions only ask it for the answer — totally wrong!”] — @studywithai1314, Dec 202518
Implication: Socratic/guided AI (not answer-giving) is already being promoted organically in HK. Khanmigo-style interaction has organic demand.
The EDB organises primary math across 5 strands (Number, Algebra, Measures, Shape & Space, Data Handling) with emphasis on “quality over quantity.”21 Secondary reorganises into 3 dimensions. Key assessment gates:
| Level | Assessment | What it tests | Stakes |
|---|---|---|---|
| P3 | TSA25 | Basic Competencies in math25 | Low (school feedback only) |
| P6 | TSA + pre-S125 | Competency + banding readiness | Medium (determines S1 banding) |
| S3 | Internal + TSA | Junior secondary proficiency | Medium (streaming) |
| S6 | HKDSE24 | Paper 1 conventional questions (65%, 2¼ hr) + Paper 2 MC (35%, 1¼ hr)24 | HIGH — university admission |
HKDSE Compulsory Math: Paper 1 has 3 sections (A1 elementary, 35 marks; A2 harder, 35 marks; B advanced, 35 marks). Topics span indices, factorization, quadratics, functions, trigonometry, coordinate geometry, statistics, probability.24
“AI for Empowering Learning and Teaching” Programme: HK$500K per school (one-off). Application deadline: Feb 28, 2026. Must implement AI in at least 3 subjects (covering 2+ levels each).23 If a school already uses Essai for Chinese + English, they need a third subject. Math is the obvious third. Essai offering Math lets schools tick all three subjects with one vendor, simplifying the grant application.22
| Study | Finding | Implication for Essai |
|---|---|---|
| IXL Florida (77K+ students)26 | Students outperformed non-users on FAST assessment; higher usage = bigger gains | Usage intensity matters. Dashboard + tracking drives engagement loop. |
| IXL Holland MI RCT (Johns Hopkins)27 | ESSA Tier 1 evidence; significant math gains | Adaptive practice has causal evidence behind it. |
| DreamBox ESSA Studies28 | “Strong” rating; 13K+ students; +0.10 effect size | Even modest effect sizes are meaningful at scale. |
| Springer Meta-Analysis 202430 | Effect size 0.343 favouring AI over traditional instruction (21 studies) | AI math instruction consistently outperforms traditional. |
| GenAI Math Meta-Analysis 202531 | Pooled g=0.603; moderate-to-large positive impact | GenAI specifically (not just adaptive software) has strong evidence. |
| Khan Academy “Explain Your Thinking”12 | 20–36% of students showed more understanding through AI conversation | Socratic AI reveals hidden understanding. v2/v3 feature. |
| Photomath29 | 220M+ downloads; 2.2B problems/month; acquired by Google 2023 | Consumer demand for math AI is massive. School-facing product is the gap. |
| Platform | Level | Focus | Schools | Strength | Weakness |
|---|---|---|---|---|---|
| SmartQuest8 | Secondary (DSE) | Paper gen + auto-mark (OMR) | 80+ (free trial) | DSE-specific, Google Classroom integration | No primary, no diagnostic, no teaching loop |
| AIMaths | Primary (P1–P6) | Adaptive learning + diagnostics | Unknown (new) | 5 HK curriculum strands, learning paths | Very new, limited adoption, no teacher loop |
| Sayo Academy | All levels | General AI teaching tools | 100+ | Broad subjects, quiz generation | Generic — not math-specialised |
| Vinci AI | All levels | School-based LLM infra | Unknown | On-premise, 100+ pre-built AI apps | Infrastructure play, not a product |
| Platform | Key Feature | How Good | HK Gap |
|---|---|---|---|
| Graded Pro7 | Handwritten math grading + viva voce | 3,000+ schools UK/US; “remarkable accuracy” claim; teacher override with annotations | No HK curriculum alignment; no Cantonese support |
| Quezzio (Wolfram)9 | Algorithmic question generation | ~4K generators; symbolic equivalence grading; anti-cheating via unique variants | US Common Core only; no HK strands |
| MathGPT10 | Socratic “cheat-proof” tutoring | 50+ schools; $25/student; never gives direct answers | US-focused; no HK curriculum; no teacher grading loop |
| Third Space Learning11 | Misconception detection via voice tutoring | 700K+ lessons; 4-stage misconception/calculation distinction; 93% met Expected Standard 2025 SATs | UK curriculum only; voice-first (not paper); B2C model |
| IXL26 | Adaptive practice + score tracking | ESSA Tier 1 evidence (Johns Hopkins RCT)27; Florida study: higher usage = bigger gains | US/UK standards; no HK localisation |
| ScaffoldiaMyMaths13 | HK primary fraction scaffolding | Research-stage; adaptive scaffolding for lower-ability HK primary students | Not a commercial product; fractions only |
| Platform | What | Math? |
|---|---|---|
| LingoTask | Chinese + English essay/oral AI grading | No — language only. 150+ schools. QEF-funded until 2028. |
| Goodclass.ai (HKUST) | Generic AI education platform | Vaguely — not math-specialised |
| HKTA AI Homework | Consumer tutoring | Yes — B2C only. HK$138/mo. |
| Subject | Pipeline | Volume | Teacher Submissions |
|---|---|---|---|
| English | userdata → pdfdata → imagedata_full → report_score22 | 11.4K essays | 5.6K (11 schools) |
| Chinese | c_userdata → c_pdfdata → c_imagedata_full → report_score_c22 | 9.3K essays | 959 (8 schools) |
| Oral | teacherAssignment → oralReport22 | 14.5K reports | — |
| Math | does not exist | 0 | 0 |
Adoption is school-driven, not teacher-driven. 海怡寶血小學 alone accounts for ~50% of all essay volume.22 A handful of champion schools drive most activity. Expect the same for math.
Chinese adoption is 6× lower than English (959 vs 5,600 teacher submissions).22 Not because fewer schools use Chinese — because the Chinese AI grading product is newer and less trusted. Teachers need to see accuracy before committing workflow. Math will face the same ramp.
Realistic Y1 target for math: 2–3 champion schools, ~500–1K graded exercises/month. Based on Chinese essay pattern: 6-month ramp to steady state, then plateau until next round of school adoption.22
| Current Workflow | Essai Math Role |
|---|---|
| Teacher sets paper manually (textbook / past papers) | Question generation — generate by topic + difficulty in seconds |
| Students write answers in exercise books (paper) | v1: typed answers on device. v2+: scan paper32 |
| Teacher collects books, marks with red pen | v1: auto-grade typed MC/numeric. v2+: scan-to-grade when VLM accuracy permits |
| Teacher records scores in class register | Auto-tracking — scores recorded per student by topic/strand |
| Teacher re-explains weak areas in class | Class dashboard — “class is weakest on fractions; here’s a targeted warm-up” |
| Parent asks “how is my child?” | Parent report — per-student progress shareable via WhatsApp |
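The dashboard row above ("class is weakest on fractions") reduces to a per-strand average over graded items. A minimal sketch using plain dicts — the record shape is an assumption, not Essai's actual schema:

```python
from collections import defaultdict

def weakest_strands(results, bottom_n=2):
    """Average score per EDB strand across graded items, weakest first."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in results:
        sums[r["strand"]] += r["score"]
        counts[r["strand"]] += 1
    averages = {strand: sums[strand] / counts[strand] for strand in sums}
    return sorted(averages.items(), key=lambda kv: kv[1])[:bottom_n]

# Toy graded data in the assumed record shape.
graded = [
    {"student": "A", "strand": "Number", "score": 0.9},
    {"student": "A", "strand": "Measures", "score": 0.4},
    {"student": "B", "strand": "Number", "score": 0.8},
    {"student": "B", "strand": "Measures", "score": 0.5},
]
```

`weakest_strands(graded)` surfaces Measures (avg 0.45) first — exactly the "here's a targeted warm-up" signal the dashboard needs.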
Math handwriting recognition requires: digit recognition (0–9), symbol recognition (+, −, ×, ÷, =, fractions, brackets), layout understanding (vertical addition, long division, working steps), and diagram interpretation (geometry shapes, angles).
HKU / Mach Innovation achieved >97% accuracy on digit + symbol recognition using Xception/ResNet/UNet models trained on real HK student handwriting.6 But this is OCR (reading what’s written), not grading (judging whether the working is correct). DrawEduMath (Dec 2025 leaderboard) shows VLMs achieve 57–71% when they need to evaluate handwritten student work, up from 51–66% in the original paper, improving ~5–10% every 6 months.32 A hybrid approach (Mach OCR for recognition + LLM for evaluation) could outperform pure VLM approaches at primary level.
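The hybrid approach splits into exactly the two layers above: recognition, then evaluation. A minimal sketch with both stages stubbed — `ocr_read` and `evaluate` are hypothetical stand-ins (a Mach/HKU-style recogniser and an LLM judge respectively), replaced by toy logic so the control flow is runnable:

```python
def ocr_read(image_region: bytes):
    # Stage 1 — recognition: a Mach/HKU-style digit + symbol recogniser
    # would return a transcript plus a confidence. Hypothetical stub.
    return "12 + 7 = 19", 0.98

def evaluate(transcript: str) -> dict:
    # Stage 2 — evaluation: in production an LLM judges the transcribed
    # working; a toy single-addition check keeps the sketch runnable.
    lhs, rhs = transcript.split("=")
    value = sum(int(term) for term in lhs.split("+"))
    return {"correct": value == int(rhs), "expected": value}

def hybrid_grade(image_region: bytes, min_ocr_conf: float = 0.9) -> dict:
    transcript, conf = ocr_read(image_region)
    if conf < min_ocr_conf:
        # Low-confidence recognition goes straight to the teacher, ungraded.
        return {"status": "needs_review", "reason": "low OCR confidence"}
    return {"status": "graded", **evaluate(transcript)}
```

The design point: the >97% recognition layer6 filters what reaches the weaker evaluation layer, so overall pipeline accuracy is bounded by evaluation, not OCR.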
Stance being assessed: “P0 should be question generation + auto-grading + class dashboard at P3–P6 level.”
Yes, >50% of HK teachers say they want question generation and grading help.1 But 62% of teachers globally say student engagement drives edtech adoption, not time-saving.36 Only 18% of US K-12 teachers have tried AI at all.35 EdTech sits unused because companies fail to research teacher workflows.37
Reformed position: Question generation is still P0 — it’s the lowest-risk entry point. But don’t assume it alone creates sticky usage. Monitor engagement metrics aggressively. If teachers generate questions but students don’t engage, the product dies regardless.
R3 downgraded scan-to-grade to P2 based on older accuracy data (51–66%). Updated DrawEduMath leaderboard (Dec 2025) shows Gemini 3 Pro at 71.3%, with ~5–10% improvement every 6 months.32 At 71%, roughly 3 in 10 answers need teacher correction — but that’s still faster than marking 10 in 10 from scratch. The key is UX: ship as “pre-mark + review” where AI does the first pass and the teacher overrides. Graded Pro uses this model for 3,000+ schools.7
Reformed position: Scan-to-grade back to P1, but with mandatory teacher review UI. The trust risk is real (80.3% worry about accuracy1), so never position it as “AI grades your homework.” Position it as “AI saves you 60% of marking time.” Autonomous grading (no review) stays at P2, gated on >90% accuracy. Invest in Mach/HKU OCR partnership6 for the recognition layer.
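The mandatory-review constraint can be enforced at the data-model level: the AI pre-mark is only a suggestion, and no mark is releasable until a teacher confirms or overrides. A minimal sketch (class and field names are hypothetical):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PreMarkedAnswer:
    """One answer in the review queue. The AI first pass is a suggestion;
    the final mark only exists after a teacher action."""
    question_id: str
    ai_mark: bool
    ai_confidence: float
    teacher_mark: Optional[bool] = None

    def confirm(self) -> None:
        # Teacher agrees with the pre-mark.
        self.teacher_mark = self.ai_mark

    def override(self, mark: bool) -> None:
        # Teacher corrects the pre-mark.
        self.teacher_mark = mark

    @property
    def final_mark(self) -> Optional[bool]:
        return self.teacher_mark   # None until reviewed — never auto-released
```

Making `final_mark` derive solely from `teacher_mark` means autonomous grading (P2) becomes a one-line policy change later, without reworking the pipeline.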
SmartQuest covers only DSE (secondary).8 Essai’s “start at primary” avoids direct competition. But: primary schools have lower budgets than secondary. DSE is a high-stakes exam with strong willingness-to-pay. TSA is a low-stakes feedback assessment25 — parents care intensely, but schools don’t need to spend money on it. ~500+ primary schools in HK39 means a large addressable market, but per-school contract value may be lower.
Reformed position: Primary is still the right beachhead — Essai has existing primary school relationships22, the grant requires 3 subjects23, and competition is weakest here. But be honest: this is a volume play, not a high-ARPU play. Plan pricing accordingly.
700K+ lessons delivered. 93% of students met Expected Standard in 2025 SATs.11 They distinguish misconceptions from calculation errors through a 4-stage process — and charge premium for it. MalruleLib catalogues 101 malrules across 498 templates38 — proving that systematic error modelling is possible. ScaffoldiaMyMaths proved it works for HK primary fractions specifically.13
HK teachers who would benefit most from misconception diagnosis don’t know to ask for it. They ask for grading relief because that’s the pain they feel. Misconception diagnosis addresses the pain they can’t articulate. Classic innovator’s dilemma.
Reformed position: Still P3 — you can’t sell what teachers don’t understand yet. But start collecting error pattern data from day one (even at P0). When enough data accumulates, the misconception layer is the moat nobody can copy.
| Priority | Feature | R3 Position | R3.1 Position | Why Changed |
|---|---|---|---|---|
| P0 | Question generation (digital, P3–P6) | P0 | P0 HOLD | Still the right entry point |
| P0 | Auto-grading (typed worked solutions) | P0 (MC + numeric only) | P0 EXPANDED | GPT-4.1-mini 94.5% on worked algebra41 — can now grade typed steps, not just final answers |
| P0 | Class dashboard | P0 | P0 HOLD | Low effort, high value |
| P1 | Scan-to-grade (w/ teacher review) | P2 | P1 UPGRADED | Gemini 3 Pro 71.3%32 — viable as pre-mark + review workflow; Graded Pro model proven7 |
| P1 | Parent WhatsApp report | P1 | P1 HOLD | High HK-specific value |
| P1 | TSA past-paper variants | P1 | P1 HOLD | High demand, exam-culture fit |
| P1 | Homework quality optimiser | P1 | P1 HOLD | EDB policy alignment21 |
| P2 | Autonomous scan-to-grade (no review) | — | P2 NEW | Gated on VLM >90% accuracy; estimated mid–late 2027 |
| P2 | Step-by-step error detection | P2 | P2 HOLD | EMNLP 2025: <10% on harder math19 |
| P3 | Misconception taxonomy | P3 | P3 HOLD | The moat, but invest only after adoption proven |
| P3 | Cross-subject correlation | P3 | P3 HOLD | Needs data in both subjects first22 |
| Component | Effort | Priority | Dependency |
|---|---|---|---|
| HK math question bank (P3–P6, 5 strands, tagged by topic + difficulty)21 | Medium — bootstrap with TSA past papers25 + AI generation | P0 | HKEAA licensing for past papers20 |
| Auto-grading engine (MC + typed numeric; symbolic equivalence) | Medium — LLMs reliable at primary-level arithmetic19 | P0 | Question bank |
| Class dashboard (math strands — extend existing Essai UI)22 | Low — reuse essay dashboard architecture | P0 | Auto-grading engine |
| Per-student tracking + parent WhatsApp report | Low — generated from grading data | P1 | Class dashboard |
| TSA past-paper variant generator25 | Medium — Quezzio model applicable9 | P1 | Question bank + HKEAA licence20 |
| Homework quality optimiser (EDB-aligned)21 | Low-Medium — algorithm over question bank | P1 | Question bank + learning objectives mapping |
| Scan-to-grade OCR for handwritten math6 | High — Mach/HKU partnership or licence | P1 (teacher review required) | Mach partnership6 or VLM pre-mark accurate enough for a review workflow |
| Step-by-step error detection | Medium — only P3–P6 feasible19 | P2 | Scan-to-grade OCR |
| Misconception taxonomy (per topic)38 | High — leverage MalruleLib + teacher input | P3 | Error data accumulation from v1 |
| HKDSE paper generator | Medium — SmartQuest benchmark8 | P3 | HKEAA licence20 + secondary curriculum mapping |
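The homework quality optimiser row above ("these 8 cover the same learning objectives") is essentially set cover, which a greedy heuristic handles well at question-bank scale. A minimal sketch — the objective tags are illustrative, not from the EDB curriculum mapping:

```python
def select_minimal_set(question_bank: dict, targets: set) -> list:
    """Greedy set cover: repeatedly pick the question that covers the
    most still-uncovered learning objectives."""
    uncovered = set(targets)
    chosen = []
    while uncovered:
        best = max(question_bank, key=lambda q: len(question_bank[q] & uncovered))
        if not question_bank[best] & uncovered:
            break  # remaining objectives not coverable by this bank
        chosen.append(best)
        uncovered -= question_bank[best]
    return chosen

# Toy bank: question id -> learning objectives it exercises.
bank = {
    "q1": {"add_fractions", "simplify"},
    "q2": {"add_fractions"},
    "q3": {"simplify", "compare_fractions"},
    "q4": {"compare_fractions"},
}
picked = select_minimal_set(bank, {"add_fractions", "simplify", "compare_fractions"})
# Two questions cover all three objectives instead of assigning all four.
```

Greedy set cover is not optimal in general, but its logarithmic approximation guarantee is more than adequate for trimming a 30-question worksheet to 8.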
This section consolidates the accuracy evidence that gates several feature decisions.
| Benchmark | What it tests | Result | Implication |
|---|---|---|---|
| DrawEduMath (NAACL 2025, leaderboard Dec 2025)32 | VLMs on 2,030 real K-12 handwritten math images; teacher-posed questions | 57–71% accuracy (Gemini 3 Pro: 71.3%, GPT-5: 64%); improving ~5–10%/6mo | Viable for pre-mark + teacher review; not yet autonomous |
| EDM 2025 Typed Grading41 | GPT-4.1-mini on 18K college algebra worked solutions | 94.5% accuracy (w/ self-consistency); GPT-4.1-nano 93.1%; GPT-4o 91.9% | Typed/digital auto-grading is production-ready for P3–P6 |
| FERMAT (ACL 2025)42 | VLM error detection + correction on handwritten math (grades 7–12) | Gemini-1.5-Pro: 77% error correction rate | Error correction (not just detection) approaching viability |
| GPT-4o Handwritten Grading (Nov 2024)33 | GPT-4o on handwritten college math (now superseded by newer models) | “Too inaccurate for classroom deployment” | Older result; GPT-5/Gemini 3 Pro show significant improvement since |
| EMNLP 202519 | LLM step-level error detection | <10% accuracy on harder math | Step-by-step grading unreliable for complex problems; P3–P6 arithmetic is simpler |
| GSM1k (NeurIPS 2024)34 | Frontier model generalisation on grade-school math | Genuine generalisation confirmed; up to 8% accuracy drops from data contamination | LLMs can do primary-level math; contamination is a concern for benchmarking |
| HKU/Mach Innovation6 | Digit + symbol OCR on HK student handwriting | >97% accuracy | OCR is solved for HK; evaluation/grading is the gap |
| MalruleLib (Jan 2026)38 | Cross-template prediction of student malrules | Drops to 40% cross-template | Misconception detection needs domain-specific training, not general LLMs |
Layer 1 (OCR): Reading what the student wrote. Solved for digits/symbols (>97%).6 Harder for full expressions and layouts.
Layer 2 (Evaluation): Judging whether the student’s work is correct, identifying where errors occurred, and determining the nature of the error (calculation vs misconception). VLMs are at 57–71% on handwritten input (Dec 2025 leaderboard)32, 94.5% on typed worked solutions.41 Handwritten evaluation is the remaining binding constraint, but improving rapidly.
Essai’s v1 uses typed digital input for P0 auto-grading (94.5% accuracy, production-ready). P1 introduces handwritten scan-to-grade with mandatory teacher review (71% pre-mark accuracy, improving). This layered approach is both technically honest and pragmatically sound.
| Opportunity | What it is | Why Essai specifically |
|---|---|---|
| Multi-subject grant bundle | Schools using Essai for Chinese + English need a 3rd subject for the HK$500K grant23 | Only Essai is positioned to offer 3 subjects in one vendor. LingoTask is language-only. SmartQuest is math-only. |
| Scan → Grade → Feedback (one step) | Teacher scans exercise books. AI pre-marks + teacher reviews. Returns graded + per-student feedback + class summary. | Meets HK paper-first reality. Graded Pro does this for UK/US7 but nobody for HK curriculum. Mach proves OCR works locally.6 P1 with teacher review; autonomous at P2. |
| “Error twin” matching | Surface other students who made the same error. Group for targeted reteaching. | HKU/Mach already uses hierarchical clustering for this concept.6 No commercial product surfaces this for teachers. |
| Parent WhatsApp report | Auto-generate weekly per-student report shareable in parent group | HK parents expect WhatsApp. “Your child improved on fractions this week” would be viral in parent circles. Low build cost. |
| Cross-subject correlation | “Math word problem weakness correlates with Chinese reading comprehension.” | Only possible with both essay + math data for the same students. Essai is the only platform in position.22 v3 at earliest. |
CONDITIONAL YES — with honest technical constraints.
Reformed Stance (R3): Essai should build AI Math for HK schools. But the v1 must be scoped tighter than R2 suggested. Lead with digital-input question generation + typed-answer auto-grading + class dashboard at P3–P6. These are technically feasible, match teacher demand1, and competition is weakest here.8
What R3.1 Changed: Two critical data updates. (1) Typed grading is production-ready: GPT-4.1-mini hits 94.5% on worked algebra solutions41 — P0 auto-grading can now handle typed steps, not just final answers. (2) Handwritten grading is closer than R3 suggested: DrawEduMath leaderboard (Dec 2025) shows Gemini 3 Pro at 71.3%, improving ~5–10% every 6 months.32 Scan-to-grade upgraded back to P1 (with mandatory teacher review UI). Autonomous grading (no review) gated at P2 on >90% accuracy. TSA past-paper variants and homework quality optimiser remain at P1.
The Honest V1: A digital-first math tool that generates HK-curriculum-aligned questions, auto-grades typed answers, and shows teachers where the class is weak. Not revolutionary. Not a moat. But shippable, trustworthy, and grant-qualifying. The moat (scan-to-grade + misconception detection) comes in v2/v3 as the technology catches up and error data accumulates.
Remaining Risks: Zero HK math teacher interviews (all pain data is survey-level). Unknown Essai engineering capacity. HKEAA licensing needed for past-paper content.20 Primary schools have lower willingness-to-pay than secondary. “Digital-only” v1 doesn’t match the paper-first classroom reality — this is a deliberate trade-off for trust.
What Would Resolve Uncertainty: (1) 3–5 HK primary math teacher interviews (direct validation of demand and willingness to use a digital-input tool). (2) SmartQuest product teardown (how good is their OMR accuracy?). (3) Mach Innovation/HKU partnership feasibility check. (4) Pilot test: LLM grading on 20 real HK P3–P6 math papers (accuracy benchmark before building). (5) Engineering capacity check (who builds this, when?).