Why are Microsoft Products so Large?

A few months ago I anonymously answered a question on Quora, and it turned out to be my most popular answer ever, by several orders of magnitude. I’ve reposted it here, in order to expand on it a little bit.

Question (paraphrased): Why is Office more than 800MB in size, when LibreOffice can come preloaded with all of Ubuntu on a 750MB CD?

This question seems a little loaded, as if it were looking for an excuse to accuse Office of bloat. However, it’s important to keep in mind how difficult it is to make a software suite: you have an extremely broad user base, ranging from the proverbial grandma who fires up Word to type up an email to the banker who uses the most advanced pivot-table-sparkline-sprinkled features of Excel. Here are a few major reasons I can think of, in no particular order:

  • Office ships with a huge and growing number of templates, graphics, macros, default add-ons, help documents, etc. This is a major driver of bloat — it has nothing to do with lines of code, and everything to do with a vibrant, comprehensive, and growing ecosystem.
  • Office is decades old. Think about this for a moment. I’ve debugged code that was written in the early 90s. Since Office is pretty well designed and written (contrary to public perception), we almost never throw away old code. So the cumulative effect of years of new features is that the codebase only grows. Properly leveraged, this is a major competitive advantage.
  • The sheer number of features in Office is mind-boggling. For most releases, Office closes more bugs (not sure if I’m allowed to disclose numbers) than most products have lines of code. Failure to understand how many features Office has is the #1 cause of death for direct competitors.
  • Office installs all the code it needs out of the box, with no external dependencies aside from the Windows API. This might seem counter-intuitive, but it actually makes the suite much larger, because we don’t rely on any third-party library or framework. External dependencies can obscure the real size of an installation such as LibreOffice: it relies on Java (and its standard library), but nobody counts that against the size of LibreOffice’s installation. The same can be said of .NET applications: .NET itself is a massive codebase.
  • Licensing and code obfuscation play a small part. In addition to having to write licensing and antipiracy code that LibreOffice doesn’t need to implement, we have to obfuscate that code and protect it against attacks. No easy feat, considering the attacker has local administrator rights. Also, Office is designed to be resistant to failure, and there is significant updating and security support built into the platform. This all adds weight.
  • Running in native code also means there are fewer abstractions; Office has code to deal with weird hardware and software configurations. It accounts for settings that stupid “registry cleaners” tweaked that would otherwise break it. It ships in 40+ languages. It knows how to deal with paths that exceed 255 characters or contain Unicode characters. It contains security checks to defend against users opening malicious Excel documents. There are hundreds more examples of things Office does that nobody realizes, but which would be sorely missed if they disappeared. This all takes code.

In general terms, Microsoft products optimize for the long tail of use cases. This means they have lots and lots of features that are seldom used. The 80/20 rule applies here: 80% of users use only 20% of the features. There is a nuance to this rule, though: every user uses a different 20% of the product. This means a software suite needs exponentially more features to capture a larger and larger share of the market; Office owns the market.
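To make the “different 20%” point concrete, here is a toy model in Python. It is only a sketch under a deliberately extreme assumption, not a description of how Office features are actually used: a hypothetical product has 100 possible features, and each user needs a uniformly random set of 20 of them. The function name and all of the numbers are made up for illustration. Under that assumption, the share of users whose whole feature set is covered by a product shipping m of the features is C(m, 20) / C(100, 20).

```python
from math import comb


def fraction_fully_served(total_features: int,
                          features_per_user: int,
                          features_shipped: int) -> float:
    """Share of users whose entire feature set is covered by the product.

    Toy assumption: every user needs a uniformly random subset of
    `features_per_user` features out of `total_features`. Real usage is
    far more skewed, so treat the result as an illustration only.
    """
    if not 0 <= features_per_user <= total_features:
        raise ValueError("features_per_user must be between 0 and total_features")
    if not 0 <= features_shipped <= total_features:
        raise ValueError("features_shipped must be between 0 and total_features")
    # Number of user feature sets that fit entirely inside the shipped
    # features, divided by the number of all possible user feature sets.
    # comb() returns 0 when fewer features are shipped than a user needs.
    return comb(features_shipped, features_per_user) / comb(total_features, features_per_user)


if __name__ == "__main__":
    total, per_user = 100, 20
    for shipped in (20, 50, 80, 90, 95, 100):
        share = fraction_fully_served(total, per_user, shipped)
        print(f"{shipped:3d} of {total} features shipped -> {share:.2%} of users fully served")
```

In this model, a competitor that ships 80 of the 100 features still fully serves under 1% of users, and the share only climbs steeply as the last few features are added. Real usage overlaps much more than uniform random sets do, so the true curve is gentler, but the shape of the argument is the same.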

The reasons for Office’s dominance are poorly understood; it is often attributed to format lock-in, or to being the existing standard. But Office really wins by fully exploiting its economy of scale, and size is a side effect of this. The massive user base allows Microsoft to invest in features that are relevant to only a small segment of users and still turn a positive ROI. But Microsoft often invests in features even when ROI is negative. Subsidizing unprofitable features means Microsoft can do lots and lots of things that competing software won’t do. This means competitors have to burn money to catch up to Office, money they won’t have because they don’t have as large a customer base[1]. This essentially guarantees Office’s dominance.

It also explains why Office takes up so much space. It’s not a bug; it’s a feature.

[1] Except Google, which is apparently happy to subsidize from search.
