
On Scaling Code and Static Typing

Seemingly legit question on proggit just now:

"Out of curiosity, is static typing really that large an advantage? Yeah, I get that run-time errors could be worse than compile-time, but isn't that something that oughtn't to get past testing?"

The answer is yes, static typing is a huge advantage.

When you start out in software development by writing hobby projects, hobby websites, or code for school, the code bases are not very large. It's easy to come to believe that anything under the sun can be done with your favorite language, because small code bases are so easy to get started on and work with by yourself or with a few people.

But I've postulated for several years now that there's some number -- call it n -- such that, once your code base reaches that many lines of code, modules, or whatever metric you want, your organization can no longer effectively scale its code. The value of n depends on a lot of things: the tooling available, the structure of the code, the experience of the devs, the language itself, the institutional knowledge that needs to be passed on through code (because of high churn), and a bunch of other factors we probably can't even name.

It's impossible to say what n really is as a hard number applicable to all organizations. Maybe it's 50,000 lines of code for a new grad or maybe it's 50,000,000 if you have a super experienced, senior team that's been working on that code for 25 years.

But what is possible to claim is that n is larger by default for some languages than others. A larger n simply means that an engineer can work on a larger code base without needing Total Code Awareness, thanks to abstractions offered by the language and its tooling.

Static typing makes a huge impact towards that end. It:

Allows for more sophisticated IDEs, code browsers, static analysis and so on.
Allows abstractions to be more thoroughly vetted before code is executed.
Better documents the intent of the original programmer.
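
As a small illustration of the second and third points, here is a minimal sketch in Python with type hints. The names `Invoice`, `total_due`, and `call_with_wrong_shape` are made up for this example. The annotations state what the original programmer intended, and a static checker such as mypy can reject the bad call without running anything. Without a checker, plain Python only notices when that line actually executes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Invoice:
    # Hypothetical record type, invented for this example.
    amount_cents: int
    paid_cents: int


def total_due(invoices: List[Invoice]) -> int:
    """Sum what is still owed across all invoices, in cents."""
    return sum(inv.amount_cents - inv.paid_cents for inv in invoices)


def call_with_wrong_shape() -> str:
    # A caller passes raw tuples instead of Invoice objects.
    # A static checker would flag this argument before the program runs;
    # at runtime the mistake only shows up once this line is reached.
    try:
        total_due([(1000, 250)])
    except AttributeError as exc:
        return f"runtime failure: {exc}"
    return "no failure"
```

The point is not that the bug is hard to find here. It's that in a large code base nobody knows every caller of `total_due`, and the annotation lets tooling find them for you.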

From the get-go, these can make a huge impact on what n can ever eventually be. Add onto that the best practices that have been established for already-scaled languages and n is even higher.

Yet all companies, whatever language they've chosen, will have to raise n at some point. The pressure can come from outside the code: churn, the talent of the people on hand, customer demands, or whatever. (Assuming, of course, that the company needs to scale at all; it would be a shame if it didn't.)

However, those that have implemented in a language with a low initial threshold for n will ultimately have to work more to increase that threshold. Each additional increase requires more effort, and there is often no standard way to do it because not many have gone past that threshold.

An example that I've now seen at two companies that chose to use dynamic languages is using test running and test writing to scale. Scaling the organization becomes an effort in mandating code testing and then working to raise the number of tests that can be run. The tests are essential because, without Total Code Awareness, they are the only way someone can make a change and have confidence they didn't break something else. That is not the case for a statically typed language, where tests are only one of the ways people can have that confidence. The compiler and static analysis are other ways.
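
To make that concrete, here is a hedged sketch of the kind of test that ends up doing a compiler's job. Everything in it is hypothetical: `send_email`, `send_sms`, the `HANDLERS` registry, and `check_handler_signatures` are invented for illustration. The check inspects every registered handler to make sure it can be called the way the rest of the code calls it.

```python
import inspect
from typing import Callable, Dict, List


def send_email(user_id: int, subject: str) -> str:
    return f"email to {user_id}: {subject}"


def send_sms(user_id: int, subject: str) -> str:
    return f"sms to {user_id}: {subject}"


# Hypothetical registry that the rest of the code dispatches through.
HANDLERS: Dict[str, Callable[[int, str], str]] = {
    "email": send_email,
    "sms": send_sms,
}


def check_handler_signatures(handlers: Dict[str, Callable]) -> List[str]:
    """Return the names of handlers whose parameters don't match the
    (user_id, subject) shape that callers rely on."""
    expected = ["user_id", "subject"]
    problems = []
    for name, fn in handlers.items():
        params = list(inspect.signature(fn).parameters)
        if params != expected:
            problems.append(name)
    return problems
```

In an unchecked code base, a test like this has to be written, maintained, and run on every change. With the `Callable[[int, str], str]` annotation on the registry, a static checker can give roughly the same guarantee about argument types without executing anything, which leaves the test suite free to cover behavior instead.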

As a result, these companies have had tens of thousands of tests that take 45 minutes to an hour to run. The tests become extremely brittle because, without that brittleness, no one has confidence that things won't break. Ultimately, to raise n, the company has traded off what a compiler can help with by forcing people to write lots and lots of code and run that code every time a change is made. This will take longer than something like incremental compilation ever would.

Raising n is why we've seen larger organizations run away from mainstream dynamic languages for code that needs to scale. Python is all but dead at Google. Facebook has added their own type checking to PHP at this point. Twitter scrapped Ruby for Java and Scala. And so on.

So there's your choice. You choose a threshold when you pick a language. Going above that threshold means a lot of work in the future. Choosing a language with a high initial threshold can help a lot down the road. It's up to you to determine what n works for you. Personally, I prefer static typing for all of these reasons, because I don't like working on small projects, and I believe it increases n dramatically.