THE MIGRATION LINE
INCIDENT // AIRBNB-2024 SHIPPED

Airbnb used an LLM-driven pipeline to migrate nearly 3.5K React test files from Enzyme to React Testing Library (RTL) in 6 weeks instead of an estimated 1.5 years.

AIRBNB · 2024 · TESTING / REACT / LLM / AUTOMATION
Files migrated: ~3.5K
Est. manual time: 1.5 years
Actual time: 6 weeks
Automated success: 97%
BASELINE

Enzyme had outlived its design

Airbnb adopted Enzyme in 2015, and it served the team well for years. But Enzyme was designed for earlier versions of React, and its deep access to component internals no longer matched modern React testing practices. In 2020, Airbnb began using React Testing Library for all new test development. The old framework wasn’t being replaced outright; it was just no longer where new work happened.

REQUIRE

What the migration actually needed

Airbnb needed an automated way to refactor nearly 3.5K test files from Enzyme to RTL — one that preserved both the original intent of each test and the team’s existing code coverage. A full manual rewrite was estimated at 1.5 years of engineering time, which set the bar any alternative approach had to beat.

DECISION

Choosing an LLM-driven pipeline

A mid-2023 hackathon team showed that large language models could convert hundreds of Enzyme files to RTL in just a few days. Building on that result, in 2024 Airbnb built a scalable, LLM-driven pipeline. The migration was broken into discrete, parallelizable per-file steps and modeled as a state machine, in which a file advanced to the next step only once validation of the current step had passed.
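The per-file state machine described above can be sketched roughly as follows. This is a minimal illustration, not Airbnb's implementation: the step names, the `FileState` and `advance` helpers, and the toy transforms and validators are all hypothetical stand-ins for the real LLM calls and for checks such as running the test suite or a linter.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Step(Enum):
    # Hypothetical step names; the real pipeline's steps may differ.
    REFACTOR = 1
    FIX_TESTS = 2
    FIX_LINT = 3
    DONE = 4


@dataclass
class FileState:
    path: str
    source: str
    step: Step = Step.REFACTOR
    history: List[Tuple[Step, List[str]]] = field(default_factory=list)


Transform = Callable[[str], str]        # stand-in for an LLM rewrite
Validator = Callable[[str], List[str]]  # returns a list of error messages


def advance(state: FileState,
            transforms: Dict[Step, Transform],
            validators: Dict[Step, Validator]) -> bool:
    """Run the current step once; move on only if its validation passes."""
    if state.step is Step.DONE:
        return False
    candidate = transforms[state.step](state.source)
    errors = validators[state.step](candidate)
    state.history.append((state.step, errors))
    if errors:
        return False  # the file stays on this step
    state.source = candidate
    state.step = Step(state.step.value + 1)
    return True


# Toy transforms and validators, purely for illustration.
transforms = {
    Step.REFACTOR: lambda s: s.replace("mount(", "render("),
    Step.FIX_TESTS: lambda s: s,
    Step.FIX_LINT: lambda s: s.rstrip() + "\n",
}
validators = {
    Step.REFACTOR: lambda s: ["enzyme call remains"] if "mount(" in s else [],
    Step.FIX_TESTS: lambda s: [],
    Step.FIX_LINT: lambda s: [] if s.endswith("\n") else ["missing newline"],
}

state = FileState("Button.test.tsx", "mount(<Button />)  ")
while advance(state, transforms, validators):
    pass
print(state.step, repr(state.source))
```

Because each file carries its own state, many files can be processed in parallel, and a file that fails validation simply stays on its current step until a later attempt succeeds.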

ROLLOUT

Retries, context, and the long tail

The team found that brute-force retry loops worked better than carefully tuned prompt engineering alone: each failed step was re-run with the validation errors and the latest version of the file fed back into the prompt. For complex files, they pushed prompts to 40,000–100,000 tokens, pulling in up to 50 related files, few-shot examples, and passing tests from the same project. The first bulk run hit 75% success in four hours. To close the gap, they stamped files with migration-status comments and built tooling to re-run only the files stuck on a given step, running a “sample, tune, sweep” loop against the remaining failures.
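A retry loop of this kind, together with status stamping and filtering, might look something like the sketch below. Everything here is an assumption made for illustration: the `// migration-status:` comment format, the helper names, and the toy `toy_fix` function, which stands in for an LLM call that receives the latest file version plus the validation errors.

```python
from typing import Callable, Dict, List, Optional, Tuple

STATUS_PREFIX = "// migration-status: "  # hypothetical comment format


def retry_step(source: str,
               fix: Callable[[str, List[str]], str],
               validate: Callable[[str], List[str]],
               max_attempts: int = 10) -> Tuple[str, List[str], int]:
    """Re-run a step, feeding the latest file and its errors back to `fix`."""
    current = source
    errors = validate(current)
    attempts = 0
    while errors and attempts < max_attempts:
        current = fix(current, errors)  # stand-in for the LLM call
        errors = validate(current)
        attempts += 1
    return current, errors, attempts


def stamp(source: str, status: str) -> str:
    """Replace any existing status comment with a new one on the first line."""
    lines = [l for l in source.splitlines() if not l.startswith(STATUS_PREFIX)]
    return "\n".join([STATUS_PREFIX + status] + lines)


def read_status(source: str) -> Optional[str]:
    for line in source.splitlines():
        if line.startswith(STATUS_PREFIX):
            return line[len(STATUS_PREFIX):]
    return None


def stuck_on(files: Dict[str, str], status: str) -> List[str]:
    """Return the paths whose stamped status matches, for a targeted re-run."""
    return sorted(p for p, s in files.items() if read_status(s) == status)


# Toy validator and fixer: the fixer resolves one reported error per attempt.
def toy_validate(src: str) -> List[str]:
    errs = []
    if "mount(" in src:
        errs.append("enzyme mount() still present")
    if "wrapper.find" in src:
        errs.append("enzyme wrapper.find still present")
    return errs


def toy_fix(src: str, errors: List[str]) -> str:
    if "mount()" in errors[0]:
        return src.replace("mount(", "render(")
    return src.replace("wrapper.find", "screen.getByText")


fixed, errs, n = retry_step("mount(<A />); wrapper.find('x')", toy_fix, toy_validate)
print(n, errs, fixed)
```

The attempt cap keeps a stubborn file from looping forever; a file that exhausts its attempts can be stamped with the step it failed on and picked up later by a filtered re-run.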

OUTCOME

What they accepted and what they gained

The remaining files, roughly 3% of the total given the 97% automated success rate, took another week of manual work to finish. Even accounting for LLM API usage costs and six weeks of engineering time, the total cost was far lower than the original 1.5-year manual estimate. Airbnb replaced Enzyme while preserving original test intent and overall code coverage, the two constraints that had made deleting the old files a non-option in the first place.

The lesson: when two systems are too different to swap directly, the win isn’t a smarter one-shot conversion — it’s a pipeline that retries, gathers context, and lets automation and humans split the work where each is strongest.

Source — read the original

https://web.archive.org/web/20250313172640/https://medium.com/airbnb-engineering/accelerating-large-scale-test-migration-with-llms-9565c208023b

A plain-language, AI-drafted and human-edited retelling of the article published on web.archive.org, reorganized and explained in our own structure and words, with original analysis in the editor's note above. The facts, numbers, and decisions belong to the original author and are not altered. For the full depth, read the source.
