ChatGPT finally counts ‘r’s in ‘strawberry’ but still trips on ‘cranberry’

Key Points
- OpenAI announced on April 28, 2026 that ChatGPT can correctly count three "r"s in "strawberry".
- Users quickly showed the bot still miscounts "cranberry," reporting only one "r" instead of three.
- A test on the "car wash" reasoning problem revealed the model still recommends walking, ignoring the need for the car.
- Competing AI systems gave mixed results: Gemini and Grok identified the logical flaw, while Claude repeated the same mistake.
- The strawberry fix appears to be a hard‑coded case rather than a general improvement in character‑level processing.
- Experts note the episode highlights both progress and persistent gaps in AI counting and contextual reasoning.

OpenAI announced via the official ChatGPT X account on April 28, 2026 that the chatbot could correctly count the three “r” letters in “strawberry,” a task that has long stumped language models. Within minutes, users demonstrated the bot still miscounted “cranberry,” reporting only one “r” instead of three. Tests of the same model on a classic “car‑wash” reasoning question also showed mixed results, with some competitors flagging the logical flaw that the model missed. The episode highlights both progress and lingering gaps in AI’s handling of simple counting and contextual reasoning.
On April 28, 2026, the official ChatGPT X account posted a short video captioned “At long last,” declaring that the latest version of the chatbot could finally answer the long‑standing trivia question: how many “r” letters appear in the word “strawberry.” The bot responded with the correct count of three, a milestone that many AI observers had marked as a symbolic win for large language models that often stumble on elementary letter‑counting tasks.
Almost immediately, the celebration turned into a new round of testing. X user @NathanEspinoza_ posted a screenshot showing the bot’s answer to the same question with the word “cranberry.” ChatGPT claimed there was only one “r,” a clear miscount given the word actually contains three. The discrepancy prompted a quick replication on a personal instance of ChatGPT running on GPT‑5.5, which reported two “r”s—still incorrect, but different from the earlier answer. In both cases, the model acknowledged the mistake when challenged, attributing it to a simple counting error.
The pattern suggests that the recent fix may be hard‑coded for the specific term “strawberry” rather than reflecting a broader improvement in how the model parses individual characters. Large language models, including ChatGPT, split text into subword tokens and encode those tokens as high‑dimensional vectors that capture meaning and context but do not inherently preserve the granular structure of letters. Consequently, tasks that require precise character‑level analysis remain difficult without explicit programming.
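The contrast is stark because the task itself is trivial for conventional software, which is exactly why the failure reads as a representation issue rather than a hard problem. A minimal Python sketch (the function name and the subword split shown are illustrative, not taken from any actual tokenizer):

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())

# Direct character-level counting is exact: "strawberry" and "cranberry"
# each contain three "r"s.
print(count_letter("strawberry", "r"))  # 3
print(count_letter("cranberry", "r"))   # 3

# A language model, by contrast, typically sees subword tokens rather than
# letters. The split below is hypothetical, but it illustrates the point:
# a model operating on opaque units like these never directly observes
# how many "r"s each one contains.
hypothetical_tokens = ["cran", "berry"]
print(hypothetical_tokens)
```

The code is only a sketch of the gap between the two views of text: exact string operations work on characters, while a model's input pipeline discards that granularity before the model ever sees it.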
Beyond counting, the same day saw renewed scrutiny of the model’s reasoning abilities. OpenAI’s post also boasted that ChatGPT could now solve the “car‑wash” problem—a scenario that asks whether it’s faster to walk or drive to a car wash located 50 meters away. The logical trap lies in recognizing that walking would be quicker only if the car itself were not needed for the wash. When the author tested the latest GPT‑5.5 model, it again recommended walking, ignoring the necessity of the vehicle. Competing systems gave mixed results: Claude (Sonnet 4.6) echoed the same mistake, while Google’s Gemini flagged the oversight, and Grok not only identified the flaw but also noted the question’s popularity as a benchmark for contextual understanding.
The mixed results underscore a broader debate within AI research: are models genuinely getting smarter, or are they simply being tuned to pass a growing catalog of benchmark tests? The strawberry success, coupled with the cranberry slip and the car‑wash reasoning gap, paints a picture of incremental advancement punctuated by lingering blind spots.
The car‑wash reasoning test
Experts have long used the car‑wash scenario to probe whether an AI can differentiate between surface‑level efficiency and the underlying goal of a task. While walking covers the distance faster, the user still must bring the car to the wash, rendering the walking recommendation impractical. Gemini’s response highlighted this nuance, stating that walking would be quicker but that the car must be present for the wash to occur. Grok went a step further, labeling the question a “popular test” for assessing whether an AI grasps the actual objective versus offering generic advice about health or environmental benefits.
OpenAI’s claim of fixing the strawberry test may reflect a targeted patch rather than a systemic overhaul of the model’s tokenization and reasoning pipelines. As AI developers continue to iterate, each public demonstration—whether a triumph or a stumble—offers valuable data points for refining how language models handle both linguistic subtleties and real‑world logic.
For now, users can expect ChatGPT to answer “strawberry” correctly, but they should remain skeptical of its performance on similar tasks that require precise character counting or nuanced contextual judgment. The episode serves as a reminder that while AI capabilities are expanding, the gap between human intuition and machine inference still contains noticeable cracks.