Dependency Update and Artifacts Promotion in Multi-repo Project

It is well known that Google tracks its code in a single repository (depot). Every closed-source Google product that you love lives in this one repo, which is so large that it cannot fit on a single disk drive and must be hosted in the cloud. During my first encounter with this monstrous system, I was just as baffled as you might be, but I have come to appreciate the benefits (and costs) of a single repo since I started working on the Istio project.

An open-source management platform for microservices, Istio presents a uniform abstraction over heterogeneous cloud vendors to support canary releases, policy enforcement, telemetry, and much more (shameless plug). It consists of multiple repositories on GitHub, with the vision that each module could be used independently outside of Istio. Within Istio itself, however, that independence does not exist, because the repos depend on one another. By dependency, I mean that each repo needs other repos in order to build. Istio uses the Bazel build tool. All dependencies are declared in the WORKSPACE file at the root directory of each repo, and each declaration includes a commit SHA that pins the exact version of the dependency used in the build.
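To make the pinning concrete, here is a minimal sketch of reading those pins out of a WORKSPACE file. It assumes the dependencies are declared with Bazel's git_repository rule (this post does not say which rule the Istio repos actually use), and the repo names, URLs, and shortened SHAs are made up for illustration.

```python
import re

# Hypothetical WORKSPACE fragment. Names, URLs, and SHAs are invented;
# a real pin would be a full 40-character commit hash.
WORKSPACE = '''
git_repository(
    name = "example_proxy",
    remote = "https://example.com/org/proxy.git",
    commit = "0a1b2c3d",
)

git_repository(
    name = "example_api",
    remote = "https://example.com/org/api.git",
    commit = "4e5f6a7b",
)
'''

# Matches the argument list of each git_repository(...) call.
RULE = re.compile(r'git_repository\((.*?)\)', re.DOTALL)
# Matches simple string attributes such as: name = "value"
ATTR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def pinned_commits(workspace_text):
    """Return {dependency name: pinned commit SHA} for each rule found."""
    pins = {}
    for body in RULE.findall(workspace_text):
        attrs = dict(ATTR.findall(body))
        if "name" in attrs and "commit" in attrs:
            pins[attrs["name"]] = attrs["commit"]
    return pins


print(pinned_commits(WORKSPACE))
```

A regex is enough for a sketch like this, but it would break on anything fancier than plain string attributes, so a real tool would want a proper parser.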

One nice feature of a single repo is the ability to make atomic changes, thanks to its single point of serialization. Each commit/snapshot/diff/view may touch multiple files and is applied to the code base atomically (analogous to a transaction). For changes spanning multiple repos, atomicity is no longer guaranteed. The major challenge in a multi-repo scheme is staleness. When the stable branch of a dependency advances to a new version, the parent repo keeps building against the older version, unaware of the update. Moving the parent's SHA pointer forward takes a separate PR on the parent repo, distinct from the PR that changed the dependency, which is why atomicity is gone.

Yet a single repo makes continuous integration slow, since each PR must pass the pre-submit tests before merging. Even a one-line change runs tests on the entire project to guard against regressions. As the repo grows in size, so does the test suite. When each CI run takes two hours to complete, productivity suffers. Google's solution is Blaze (whose open-source version is Bazel, shout-out to the anagram), which defines a hierarchical build dependency graph so that the targets affected by any code change can be identified exactly, and only the affected tests need to run. Multiple repos, on the other hand, make CI much easier, since tests are partitioned across repos and a PR on one repo only triggers the tests of that repo.
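The idea of running only the affected tests can be sketched as a walk over the reversed dependency graph. This is a toy illustration and not how Blaze or Bazel is implemented; the target labels and the affected_tests helper are hypothetical.

```python
from collections import deque


def affected_tests(deps, tests, changed):
    """Return the tests that transitively depend on any changed target."""
    # Invert the edges: for each target, record who depends on it directly.
    rdeps = {}
    for target, direct in deps.items():
        for dep in direct:
            rdeps.setdefault(dep, set()).add(target)

    # Breadth-first walk from the changed targets up to everything above them.
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in rdeps.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(t for t in tests if t in seen)


# Hypothetical build graph: each target maps to its direct dependencies.
DEPS = {
    "//app:server": ["//lib:auth", "//lib:log"],
    "//app:server_test": ["//app:server"],
    "//lib:auth_test": ["//lib:auth"],
    "//lib:log_test": ["//lib:log"],
    "//lib:auth": [],
    "//lib:log": [],
}
TESTS = {"//app:server_test", "//lib:auth_test", "//lib:log_test"}

# A change to //lib:auth leaves //lib:log_test untouched.
print(affected_tests(DEPS, TESTS, ["//lib:auth"]))
```

In this example a change to the auth library selects its own test and the server test, while the log test is skipped because nothing it depends on changed.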

Back to our problem of stale dependencies. For changes involving separate shards/participants, the first idea that comes to mind is distributed transactions (say, two-phase locking and two-phase commit). But that means a complete revamp: additional tooling for reviewing code across repos, aggregating pre-submit CI tests, and so on. It would likely take up the entire quarter, and everyone on the dev team would still suffer in the meantime. We prefer something less invasive and quicker to deploy. The design gives up strong consistency in favor of eventual consistency: cron jobs periodically check the dependency versions and, if one has changed, create a PR on the parent repo. It may take a few runs for a change in the leaves to propagate to the root, but it is okay for a repo to be stale by a couple of hours, because we know that eventually the entire project becomes consistent. If you are interested in the binary used to do this, check this out.
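Here is a small simulation of why a leaf change can take a few runs to reach the root. It is a sketch under simplifying assumptions, not the actual binary: versions are plain integers instead of commit SHAs, every update PR is merged immediately, and each cron pass sees the dependency heads as they were when the pass started. The repo names and both functions are hypothetical.

```python
def run_once(heads, pins):
    """One cron pass: each repo compares its pins with a snapshot of heads.

    Returns the repos that needed an update PR in this pass.
    """
    snapshot = dict(heads)  # heads as seen when the pass starts
    updated = []
    for repo, deps in pins.items():
        stale = {d: snapshot[d] for d, pinned in deps.items()
                 if pinned != snapshot[d]}
        if stale:
            deps.update(stale)  # stands in for merging the update PR
            heads[repo] += 1    # the merged PR advances this repo's own head
            updated.append(repo)
    return updated


def runs_until_consistent(heads, pins, max_runs=10):
    """Count the passes that had to update something before a quiet pass."""
    for run in range(1, max_runs + 1):
        if not run_once(heads, pins):
            return run - 1
    raise RuntimeError("did not converge within max_runs")


# Hypothetical chain: root depends on mid, mid depends on leaf.
heads = {"root": 1, "mid": 1, "leaf": 1}
pins = {"root": {"mid": 1}, "mid": {"leaf": 1}, "leaf": {}}

heads["leaf"] += 1  # a new commit lands on the leaf's stable branch
runs = runs_until_consistent(heads, pins)
print(runs)
```

With a chain two levels deep, the first pass updates mid and the second updates root, so the number of runs grows with the depth of the dependency chain.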