LinguaLeap Assumption Test Cards
We selected our three core assumptions to systematically de-risk LinguaLeap across desirability, viability, and feasibility, the three lenses that determine whether a product should exist, can exist, and will sustain itself. The desirability assumption tests whether students truly value in-person peer conversations; the viability assumption validates willingness to pay and pricing power; and the feasibility assumption addresses the technical question of whether AI can actually understand beginner speech and provide useful feedback. Together, these tests de-risk the biggest potential failure points: demand, business model, and technical capability.

————————————————————————————————————————-

1. Feasibility Assumption: AI can reliably understand novice language speech & provide useful feedback (Ines and Ishita)

Critical Assumption: AI models can correctly interpret novice language learner speech, infer intended meaning, and deliver accurate feedback at a level that supports real improvement for students. If this assumption is wrong, we cannot deliver the core value proposition (confidence and feedback), and the product collapses into just “peer matching.”

Why It’s Risky

  • LLMs + speech models are trained mostly on native speakers
  • Models currently struggle with accents, mistakes, and early proficiency speech
  • Human instructors infer intended meaning and scaffold incorrectly formed language — AI often can’t
  • Fixing this is not a UI issue — it’s a deep technical capability + data problem
  • Timeline risk: we have ~5 months to validate feasibility before the business dies
  • This isn’t “will users trust AI feedback?”
  • It’s “can AI even generate correct feedback for beginners?”

Hypothesis: AI will correctly understand and evaluate beginner-level language speech at least 70% of the time without misinterpreting intent or giving incorrect advice.

Test Method: We will run a technical prototype test by recruiting 10–15 beginner French learners, each of whom will record 5 short speaking tasks (such as introducing themselves or describing their day). For each recording we will collect the AI speech-to-text output, the AI grammar/meaning interpretation, and the AI feedback with suggested corrections, then have French instructors score speech recognition accuracy, comprehension of intended meaning, correctness of feedback, and harmfulness (did the AI reinforce wrong patterns?). We will also compare baseline vs. acclimated runs (e.g., a mini training step like “Say these 5 F-sounds so the system adapts to you”).
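To make the data collection concrete, here is a minimal sketch (Python, purely illustrative) of the per-task record we would capture for each learner recording. The field names and rubric are planning assumptions, not a finalized schema.

```python
# Minimal sketch of the per-task record collected during the prototype test.
# All field names are illustrative placeholders, not a finalized schema.
from dataclasses import dataclass


@dataclass
class SpeakingTaskResult:
    learner_id: str            # anonymized beginner French learner
    task_prompt: str           # e.g. "Introduce yourself"
    calibrated: bool           # True if recorded after the mini voice-training step
    reference_transcript: str  # instructor transcript of what was actually said
    stt_output: str            # AI speech-to-text output
    ai_feedback: str           # AI feedback + suggested corrections
    # Instructor ratings on the agreed rubric:
    meaning_understood: bool   # did the AI grasp the intended meaning?
    feedback_correct: bool     # was the AI's feedback linguistically correct?
    feedback_harmful: bool     # did the AI reinforce a wrong pattern?
```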

Metrics & Validation Criteria
Our metrics and success thresholds will be:

  • Speech recognition accuracy on beginner accents: ≥ 70% word-level accuracy
  • Correct interpretation of intended meaning: ≥ 70% of tasks correctly understood
  • Feedback correctness (linguistic + intent): ≥ 75% judged correct by teachers
  • Harmful feedback (misleading/wrong): ≤ 10% of occurrences
  • Improvement after short “voice training”: ≥ 10% boost in model performance post-calibration
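To show how these thresholds could be checked, below is a rough scoring sketch that reuses the hypothetical SpeakingTaskResult record from the earlier sketch. Word-level accuracy is approximated as 1 minus word error rate via a simple edit distance; the thresholds mirror the list above, but the roll-up itself is an assumption, not a finished evaluation pipeline.

```python
# Rough roll-up of instructor-scored results against the success thresholds above.
# Assumes the SpeakingTaskResult record from the earlier sketch.

def word_accuracy(reference: str, hypothesis: str) -> float:
    """Approximate word-level accuracy as 1 - word error rate (edit distance over words)."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return 1.0 - d[len(ref)][len(hyp)] / max(len(ref), 1)


def evaluate(results: list[SpeakingTaskResult]) -> dict:
    """Aggregate scores and compare them to the validation thresholds."""
    def mean_acc(rs):
        return sum(word_accuracy(r.reference_transcript, r.stt_output) for r in rs) / len(rs)

    baseline = [r for r in results if not r.calibrated]
    calibrated = [r for r in results if r.calibrated]
    report = {
        "word_accuracy": mean_acc(baseline),
        "intent_understood": sum(r.meaning_understood for r in results) / len(results),
        "feedback_correct": sum(r.feedback_correct for r in results) / len(results),
        "harmful_feedback": sum(r.feedback_harmful for r in results) / len(results),
        "calibration_boost": ((mean_acc(calibrated) - mean_acc(baseline)) / mean_acc(baseline)
                              if calibrated else 0.0),
    }
    report["passes"] = (
        report["word_accuracy"] >= 0.70
        and report["intent_understood"] >= 0.70
        and report["feedback_correct"] >= 0.75
        and report["harmful_feedback"] <= 0.10
        and report["calibration_boost"] >= 0.10
    )
    return report
```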

Failure mode signals: The signs that our assumption has failed will be that the model can't understand the mistakes students typically make, it frequently misinterprets intent, it gives harmful feedback more than 5% of the time, or voice calibration doesn't meaningfully help.

What would this test teach us? Whether AI can handle beginner-accented speech, whether it can understand intent, and whether a personal calibration step (like a “Hey Siri” setup) meaningfully improves performance.

If it falls short → do we need human-in-the-loop review, more training data, or a different scope?

Contingency if Invalidated: If the model can't handle novice speech today, we will pursue the following.

Short-term pivots: start with pronunciation and filler-word feedback only (objective signals); focus on intermediate learners first; add a human review loop for feedback quality; build accent-adaptation training into onboarding.

Long-term strategy: collect a large dataset of learner accents and errors; train the model on structured error patterns (CEFR error corpora); build a custom pronunciation/accent model.

One-Sentence Version: We are testing whether AI can reliably understand beginner foreign-language speech and give accurate, constructive feedback. If it can't, we don't have a product, just a matching tool, so we will run controlled speech-task experiments scored by real instructors and evaluate accuracy, intent recognition, and harmful feedback risk.

————————————————————————————————————————–

2. Viability Assumption (Luke, Amesha and Alan)

WE BELIEVE THAT students will pay $20/month for an AI-powered language learning platform

TO VERIFY THAT, WE WILL create an AI-guided conversation demo app, test it with 2 users, and track conversion to opting into further conversations along with ratings out of 5

AND MEASURE self-reported interest on a scale from 0-10 and the price range each student says they would be willing to pay

WE ARE RIGHT IF students give an average interest score of at least 7.5 and their proposed price ranges fall around $15-$25 a month
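As a quick illustration of the pass/fail arithmetic (placeholder numbers, not real survey data), the check reduces to an average and a range overlap:

```python
# Toy viability check: average interest >= 7.5 and every proposed price range
# overlapping the $15-$25/month band. Numbers below are placeholders.
interest_scores = [8, 7]             # self-reported interest, 0-10, one per tester
price_ranges = [(15, 25), (20, 30)]  # (low, high) monthly price each tester proposed

avg_interest = sum(interest_scores) / len(interest_scores)
overlapping = sum(1 for lo, hi in price_ranges if lo <= 25 and hi >= 15)
validated = avg_interest >= 7.5 and overlapping == len(price_ranges)

print(f"avg interest = {avg_interest:.1f}, "
      f"ranges overlapping $15-$25 = {overlapping}/{len(price_ranges)}, "
      f"validated = {validated}")
```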

————————————————————————————————————————-

3. Desirability Assumption (Silin and Barry)

Hypothesis

Students want to meet in person to have peer conversations using their target language.

Assumption Selection (Desirability)

Our entire value proposition is that we provide conversation partner matching and conversational feedback for students, and that this is a viable way of improving or maintaining their oral proficiency. If students do not see the value in having peer conversations and/or are unwilling to meet up, our product is undesirable, making this an extremely high-risk assumption.

Testing Method

We will first gather a list of 10+ students who are currently trying to learn a given target language (TL)

  • Conduct a survey about whether they want to chat in person or meet online
  • Pair them up to participate in conversation (can be randomized based on ELO)
  • Have a native speaker of the TL act as a facilitator and feedback mechanism (standing in for an AI agent)
  • After conversation concludes, have participants fill out a survey separately and anonymously about their experience
  • If they responded that they would be willing to meet up with someone again, schedule another conversation to hold them to it.

Metrics

  • Number of times per week students are willing to meet
  • Satisfaction with conversational quality
  • Willingness to find new partners to chat with
  • Length of conversations

Success Criteria

  • Students are willing to meet at least once a week (ideally 2-3+ times a week) and follow through on scheduled meetings
  • Students rate their conversations an average of 4 out of 5 across different measures of conversational quality
  • Students rate their willingness to meet with new conversation partners at least 4 out of 5 (and if they have additional meetings, they are willing to speak to someone new)
  • Student conversations organically last for at least 10 minutes with minimal prompting from the facilitator
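A small illustrative roll-up of the anonymous post-conversation surveys against these criteria could look like the sketch below. The field names and the use of simple averages are assumptions, and follow-through on rescheduled meetings would still need to be tracked separately.

```python
# Illustrative aggregation of post-conversation survey responses against the
# desirability success criteria. Field names and rubric are assumptions.
from statistics import mean


def desirability_passes(responses: list[dict]) -> bool:
    return (
        mean(r["meetings_per_week"] for r in responses) >= 1                # ideally 2-3+
        and mean(r["conversation_quality"] for r in responses) >= 4         # out of 5
        and mean(r["willing_to_meet_new_partner"] for r in responses) >= 4  # out of 5
        and mean(r["conversation_minutes"] for r in responses) >= 10
    )


# Example with two participants' anonymous responses (placeholder data):
print(desirability_passes([
    {"meetings_per_week": 2, "conversation_quality": 4,
     "willing_to_meet_new_partner": 5, "conversation_minutes": 12},
    {"meetings_per_week": 1, "conversation_quality": 4,
     "willing_to_meet_new_partner": 4, "conversation_minutes": 11},
]))
```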

 
