3 reasons polyrepo advocates oppose the monorepo

To understand what a monorepo and a polyrepo are, we must first look at the modules that make up a typical application. An application has several components: mobile client apps, a web application frontend, one or more backend services, a database and data management layer, business services (such as reporting and management), and so on.

Each component has its own source code. That source code, which all developers responsible for the component access, is stored in a code repository. There are many ways to manage code repositories, but most are based on an open source system known as Git.

So each component's code lives in a code repository, and here the question arises. Should you store all the code for every component of your application in one company-wide or application-wide repository? Or should each component have its own separate repository? Until recently, each component of an application typically had its own repository. This model is called a polyrepo because the application is spread across multiple independent code repositories.

However, in recent years, some companies, most notably Google, have favored putting all the code for all application components into one big repository. This code management model is called a monorepo.

The polyrepo and monorepo approaches each have their pros and cons, and many articles analyze them. Which of the two should you use?

Personally, I think the traditional polyrepo model is far superior. The monorepo model promotes bad habits and unhealthy processes, and it makes it substantially harder to scale both the application itself, as it grows in complexity, and the development organization behind it. Let's take a closer look at the three reasons that support this claim.

Reason #1: A monorepo violates the single-team ownership principle

I strongly support single-team service ownership. I believe that ownership of a service, system, module, or component should belong to a single development team. That team must be responsible for all aspects of the component, including design, creation, testing, deployment, operation, support, and modification.

This model also includes owning all changes to the component's source code. This doesn't mean that only team members can change that source code. It means that any change to the component is the responsibility of the owning team, so the owning team must have the right to review and approve every change. If you are ultimately responsible for the component, you should be able to control the component's code.

Enforcing this is extremely difficult when the source code for all components lives in the same repository. If our team's source code and the neighboring team's source code sit in one repository, with the same access rights, check-in procedures, and approval functions, it is very hard to maintain the ownership and integrity of each component's source code.

In the polyrepo model, each system component or service has its own repository. The owner of the component owns and manages that repository. The owner decides who can and cannot make changes, and who can and cannot review and approve them. Repository ownership is an important part of the overall ownership that the single-team service ownership model requires.
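As a rough illustration of the ownership rule described above, the following Python sketch models a per-repository approval check. Everything in it is hypothetical and only for illustration: the `Repository` and `ChangeRequest` classes, the `can_merge` function, and the extra assumption that authors cannot approve their own changes. Real code-hosting tools express this kind of policy through their own permission settings.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Repository:
    """One repository per component, controlled by a single owning team."""
    name: str
    owning_team: str
    approvers: frozenset[str]  # people the owning team has designated as approvers


@dataclass
class ChangeRequest:
    """A proposed change; anyone may author one, even from another team."""
    repo: Repository
    author: str
    approved_by: set[str] = field(default_factory=set)


def can_merge(change: ChangeRequest) -> bool:
    """Allow a merge only if a designated approver other than the author signed off.

    The "not your own change" rule is an assumption of this sketch.
    """
    return any(
        person in change.repo.approvers and person != change.author
        for person in change.approved_by
    )
```

A change written by someone outside the owning team can still merge under this sketch, but only after one of the repository's designated approvers has signed off, which is the distinction the ownership model draws between who may write a change and who is responsible for it.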

Reason #2: A monorepo promotes bad large-scale refactoring practices

One advantage of a monorepo is that it makes it easier to refactor very large sections of code in a single change request. Even big tasks, such as changing an internal API entry point, become much easier, because you can update the entry point and every call to it in one huge request.

But I think such a huge change is bad practice. You shouldn't do massive refactorings that cross team boundaries in a single big update. In large projects, these requests usually require large-scale coordination among multiple development teams. For the change to take effect, multiple teams often have to deploy at the same moment by coordinating their release schedules. Deploying services individually then becomes virtually unmanageable.

Instead, a multi-step process should be used for global changes, such as changing the definition of an internal API entry point. For example:

  • Add support for the new entry point while continuing to support the old one. Change the API version to reflect the change. Deploy the new version and run both entry points side by side for the time being.
  • Inform all affected teams that the new entry point has been published and should be used from now on. Mark the old entry point as deprecated and announce the date on which it is scheduled to be removed.
  • Work with the affected teams to ensure they make the changes needed to use the new entry point on the desired timeline. Each team must deploy its own changes independently.
  • Once all teams have deployed their updates and no one is using the old entry point, remove the old entry point from the service.
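The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed implementation: the entry points `get_user_v1` and `get_user_v2`, the `REMOVAL_DATE`, the `caller_team` parameter, and the in-memory call counter are all hypothetical names introduced for the example.

```python
import warnings
from collections import Counter
from datetime import date

# Hypothetical removal date announced to the affected teams (step 2).
REMOVAL_DATE = date(2030, 1, 1)

# Hypothetical in-memory record of which teams still call the old entry point.
v1_calls_by_team: Counter = Counter()


def get_user_v2(user_id: int) -> dict:
    """New entry point (step 1): returns a structured record."""
    return {"id": user_id, "name": f"user-{user_id}", "api_version": 2}


def get_user_v1(user_id: int, caller_team: str) -> str:
    """Old entry point, kept running alongside the new one during migration."""
    v1_calls_by_team[caller_team] += 1
    warnings.warn(
        "get_user_v1 is deprecated and scheduled for removal on "
        f"{REMOVAL_DATE.isoformat()}; use get_user_v2 instead",
        DeprecationWarning,
        stacklevel=2,
    )
    # Delegate to the new implementation so both entry points stay consistent.
    return get_user_v2(user_id)["name"]


def teams_still_on_v1() -> list[str]:
    """Step 4: the old entry point can be removed once this list is empty."""
    return sorted(team for team, calls in v1_calls_by_team.items() if calls > 0)
```

Each calling team can switch from `get_user_v1` to `get_user_v2` in its own repository and on its own release schedule, and the owning team removes the old function only when `teams_still_on_v1()` reports no remaining callers.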

This is obviously a longer and more complex process than making a single big change to a single repository. But think about what you were originally trying to achieve. You are changing the way an internal entry point works, and the impact of that change is felt by multiple teams across the entire organization. These kinds of changes should not be fast and easy to implement. They should be made slowly and with caution.

Sweeping changes like these require coordination with each affected team. Being able to create such a massive, high-impact change easily is not a good thing. Short-circuiting the change review process can lead to unexpected problems and can erode communication and trust between teams.

Don't use "it's more convenient" as an excuse. Given their impact, some things should be neither easy nor fast. The multi-step change process takes longer because it involves a more thoughtful, wide-ranging evaluation that should not be rushed. A polyrepo helps to mandate this extra care and evaluation, while a monorepo makes it easy to short-circuit these considerations in ways that disrupt the organization.

Reason #3: Small repositories are better than large ones

Large applications have large repositories. If a large application had hundreds of components, and all of those components were in the same code repository, how huge would that repository be?

Google's monorepo is huge. A single repository holds all of Google's primary code: over 2 billion lines of code and over 85 terabytes. That is 40 times the size of the entire Microsoft Windows operating system.

The larger the repository, the harder it is for individual engineers to manage it alongside the work of developing the code that goes into it. The more people who work in a single repository, and the more often its code changes, the more effort each person using that repository must spend keeping up.

At Google, more than 45,000 changes are made to the monorepo every day. As the number of developers and the number of components in an application grow, this overhead becomes an exponential problem.

In a polyrepo application, each component has its own repository and only a few people work in each repository every day, so it is not difficult to manage. Instead of thousands of developers working in a monorepo and managing thousands of changes every day, a typical polyrepo team repository in a well-organized team architecture sees about 100 changes per day, managed by roughly 5 to 10 people.

For Google, the challenge of scaling its monorepo grew so large that it had to develop a new repository management tool. Rather than using a conventional Git-based system, it created a tool called Piper. Piper is designed specifically for handling huge code bases, but it would not have been necessary if Google had used the polyrepo model. Each repository in a polyrepo is small enough for traditional mainstream code management tools, including Git, so no special tools are needed.

Monorepo vs. polyrepo: Which one to choose?

In fact, there is no definitive answer. Both approaches have strong advocates and distinct advantages and disadvantages. Google in particular is strongly committed to its monorepo model and has invested heavily in building the set of tools and processes that model needs.

However, in a well-organized development environment, the monorepo's disadvantages far outweigh its advantages. The polyrepo, on the other hand, provides a much more beneficial, longer-lasting, and highly scalable environment for modern application architectures.


Source: ITWorld Korea by www.itworld.co.kr.
