In our usability testing, we walked participants through four key tasks:
- Automatic Scheduling: Finding a time to schedule coffee with a friend based on shared availability.
- Pre-Event Check-In: Checking in with a friend leading up to a planned social event.
- Post-Event Check-In: Reflecting on a recent social event.
- Flake Score: Viewing a summary of recent flaking activity and metrics evaluating “flakiness.”
We tested with three participants across two rounds: in the first, one participant completed the tasks individually; in the second, the other two participants worked through the tasks together.
Overall, we identified the following issues while completing our testing procedure:
Application-wide:
- Unclear what information is shared with other person/people in the text conversation. (severe)
- Throughout several tasks, users were unsure whether their information would be shared with others (e.g., reflections, the Flake Score). We plan to address this so users can be confident their information stays private. Possible solutions we will explore include a brief note at the bottom of each reflection with text such as “This response will not be shared…” or onboarding information clarifying the purpose of pre- and post-event check-ins.
Task 1: Automatic scheduling
- Unclear whether the calendar is synced, and whether scheduling suggestions come from the user’s calendar, message content, or both (moderate).
- To clarify the scheduling process, we plan to indicate how suggestions were made by linking the content of messages to recommended time slots (e.g., “6am on Friday the 14th” → “I might be free in the morning tomorrow”). On the screen where we prompt the user to analyze their messages and sync their calendar, we will also note, likely through a text label, that only their individual calendar and the content of their messages will be used, rather than multiple users’ calendars.
- Suggested scheduling times were not scrollable, restricting which options the user could select (trivial).
- We identified this small technical bug in one of our usability tests: some off-screen calendar options were inaccessible because horizontal scrolling was not enabled. Our direct fix is to enable horizontal scrolling; however, we are also exploring more vertical formats, such as a two-column grid, for consistency with vertical-first mobile layouts.
- Post-plan check-in popup bug (trivial)
- We identified a bug where the post-plan check-in modal appears when users click into a text conversation. We will address this bug by removing this modal, ensuring the check-in is only accessible from the plans page.
Task 2: Pre-event check-in
- Unclear how close to the event the check-in occurs (moderate).
- Testers noted that the prototype gave no information about how far in advance of the event the check-in was to be completed. One tester thought the check-in would occur an hour in advance; another thought it occurred at the start of the event. We plan to address this by allowing users to choose the interval between the check-in and the event start time (e.g., 1hr, 12hrs) to accommodate the different user preferences we identified in testing.
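To make the proposed setting concrete, here is a minimal sketch of the scheduling logic; the interval options and function name are our own assumptions for illustration, not part of the prototype:

```python
from datetime import datetime, timedelta

# Hypothetical interval options the user could choose from (assumption).
CHECK_IN_INTERVALS = {"1hr": timedelta(hours=1), "12hrs": timedelta(hours=12)}

def check_in_time(event_start: datetime, interval_key: str) -> datetime:
    """Return when the pre-event check-in should be sent:
    the event's start time minus the user's chosen lead time."""
    return event_start - CHECK_IN_INTERVALS[interval_key]
```

For an event starting at 6:00 PM, a user who picks “1hr” would be prompted at 5:00 PM, while a “12hrs” user would be prompted at 6:00 AM the same day.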
- Wording was confusing: “Check-in now not yet recorded” (trivial)
- Our prototype included a flag indicating that the user had not checked in; however, our first tester found the wording confusing. Between the first and second usability tests, we changed the wording to “No pre-event check-in recorded,” which seemed to resolve the confusion. We may simplify further to “No response recorded” for clarity and consistency across Tasks 2 and 3.
Task 3: Post-event check-in
- Check-in felt overly long; some questions seemed superfluous (moderate)
- Two of three participants felt that the text-based reflection prompts added complexity that would discourage them from regularly completing post-event check-ins. They also said it was unclear how subjective reflection information would be used in calculating the Flake Score. While we are still weighing the importance and viability of free-form responses, we are considering removing the written reflection components for greater clarity and ease of use.
- Unclear whether you are evaluating yourself or the other plan attendees. (moderate)
- When prompted to choose between “showed up on time,” “showed up late,” “flaked,” etc., testers were unsure whether they were documenting their own behavior or that of the other plan attendees. We plan to address this by including pronouns in the choices (e.g., “I showed up on time,” “I flaked”) to indicate that the reflection centers on personal behavior.
Task 4: Flake score
- Score generation is opaque; unclear where score increases/decreases come from (moderate)
- Participants commented on several aspects of the Flake Score breakdown section, namely punctuality and specificity. They wondered how the platform determined whether they had checked in to plans on time and what factors went into the specificity of their responses while making plans. To clarify how these aspects of the score are calculated, we will let users tap each category to see an overview of its criteria. We are also exploring showing users the recent plans that contributed to each category in the breakdown; for example, the user could open a popup listing recent plans and how each added to or subtracted from their punctuality percentage.
- One participant additionally commented that a clearer time range for the Flake Score would show how relevant it was to their recent activity (e.g., weekly score vs. monthly score), specifically in the graph portion of the screen showing their Flake Score over time. We will attempt to address this by adding clearer axes to the graph and simulating data based on daily, weekly, and monthly flaking activity.
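One way to back both the category popup and the selectable time range with data is to derive each percentage from the recent plans that contributed to it. The sketch below is illustrative only; the plan record fields and function name are our assumptions, not the app’s actual data model:

```python
from datetime import date, timedelta

def punctuality_breakdown(plans: list, today: date, window_days: int = 30) -> dict:
    """Compute a punctuality percentage from plans inside the chosen
    time range (e.g., window_days=30 for a monthly score, 7 for weekly),
    and return the contributing plans so a popup can list them."""
    recent = [p for p in plans if today - p["date"] <= timedelta(days=window_days)]
    on_time = [p for p in recent if p["on_time"]]
    percent = round(100 * len(on_time) / len(recent)) if recent else None
    return {"percent": percent, "contributing_plans": recent}
```

Returning the contributing plans alongside the percentage means the popup can show users exactly which events raised or lowered the category, and re-running with a different `window_days` yields the weekly versus monthly views the participant asked for.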
Tyler Abernethy, Vardhan Agrawal, Katherine Sullivan, Sunny Yu. Usability Report. March 11, 2026.
