The useless text book algorithms
I was listening to the Programming Touchdown podcast, and on Episode 43, around the 13:00 minute mark, there was the following quote:
I can count on one, maybe two hands, the number of times in my entire career where I need to use… like the algorithm I used made an impactful difference. Most of the time, it doesn’t matter, architecture can matter, sometimes. Like, true, textbook algorithms.
This is roughly what it felt like to hear this…
I mean, seriously! Okay, I write database engines for a living. Part of my job is to look at academic literature to see if there are good ways to improve what we are doing. But you know what, let us say that building database engines is something special, and that your regular web developer doesn’t need any of it.
Let us see how right that is, shall we?
I’m going to use The Art of Computer Programming by Donald E. Knuth as an example here, because that certain match the definition of a text book. Reading just the table of contents:
- Data structure - Stack, Queues, Linked Lists, Arrays, Binary Trees.
- Concepts: Co-routines (async in C#, promises in JS, etc), Random Numbers, how floating points number actually work.
- Sorting algorithms, searching sorted & unsorted data.
Saying that algorithms don’t matter is about taking decades of research and throwing it down the tube, because the machine is powerful enough for me to not notice that I’m brute forcing a solution at O(N**2)
Comments
"It depends."
The key word is context. If you build something for your local intranet and for 10 users, bad algorithm could be "good enough". Nobody will notice. If you work on "google search" type app choosing wrong algorithm could cost millions.
But, for professional sw developers there should be no excuse to know at least basic data structures and their pros and consts. Also maybe "cloud economy" will improve this status, because better code has more direct consequences in cost.
You provide the algoritms for other programmers like me.
I have almost never sorted data by programming. I started using PolyFiles for Turbo Pascal in the 1980s. It was a B-tree implementation as far as I remember.
Nice to know about algoritms and I like to read about them in my spare time but at work I just use the algoritms that the frameworks provide.
If you know about something about algoritms your data structures become wiser. Never do Random search in a large linked list. Sometimes you can guess the algoritm your program is using by taking the time :-)
99.9% of the total time developers spend world-wide is not related to non-trivial algorithms.
I do not know why people recommend Knuths book so much. It has no practical relevance for most people. Algorithms are a very different skill from architecting and programming most real-world software. The book clearly has non-zero benefit, it's just not time well spent.
In other words you don't need to understand the internals of a hash table in order to understand it's perf profile and use it successfully. 99.9% of the time.
Petar, Basic algorithms are important pretty much everywhere. Even dealing with very little data, and very few users, you can produce utter crap if you don't know what you are doing. Sure, something a brute force approach is faster to write, and you make a judgement call on this and say, this is better. But not actually doing that judgement call means that you are not doing your job. And not knowing that you need to make that judgement call means that you aren't qualified for the job
Carsten, Sure,most of the time a dev will use prepared data structures and basic routines. If you write your own sort routines, instead of using the framework's, that is suspicious most of the time. But using them? That is crucial. Choosing a linked list when you need to do a lot of indexed get/set is a good example, but something like caching, which is frequently used, if you don't understand LRU/MRU, how do you do that? And that is a classic text book algorithm
Mark, If you don't understand the basic algorithms that are described in Knuth, how can you write software?
A basic example is people not understanding string, and what is means for your system, here is a good example: http://www.joelonsoftware.com/articles/fog0000000319.html
You don't build the basic stuff (at least not unless you have special requirements), but you better be able to understand what is going on there.
Ayende, when did you last look at the code of an "enterprise" internal codebase? It is banal. It's validating data, passing it around, reading it, writing it, calling services... Little need to know what's under the hood.
I know that I do not know anything about cache invalidation because of:
See http://martinfowler.com/bliki/TwoHardThings.html http://www.quora.com/Why-is-cache-invalidation-considered-difficult
Mark, Point to a OSS project doing enterprise stuff (not libraries, framework or something like that) that isn't trivial. And I can easily find places where proper text book algo are used (or should have been used)
I must get out more - I am genuinely shocked reading the comments here. Really amazing.
Knowledge of basic algorithms and design patterns is one of the key differences between average developers and the top 20%. If you want to have a long, productive and lucrative career, you need to step up and learn the tools of your trade.
+1 to Tom's comment. In my experience, developers with that kind of attitude usually produce code that will inevitably have to get fixed later by developers who actually understand what's going on under the covers. Software development omniscience is impossible, but we shouldn't be comfortable in our ignorance either.
+1 to Tom's comment.
I do agree that data structures are important (how can you not agree with that?) but I also agree with some of the other comments that in so many cases it just doesn't matter. Servers are so powerful, and datasets are so small that just getting it working is more than enough for many teams.
I used to work at a professional services company and those types of places are brutal when it comes to code health. Since clients are paying for all changes to the software, they want it as cheap and fast as possible, no matter the technical debt cost. As long as it works, and it's not slow as all hell, they're happy with it. Most of the programmers at professional service style companies are also just happy to have a job, so they don't care either. Those are degrading places to work (which is why I'm no longer there.)
But if you work on anything "important", or want to take pride in your work, you need to care and know these things. Tom has it right - knowledge of algo's is what separates the average with the top developers.
Of course algorithms are important. Isn't truth also saying "premature optimization is the root of all evil" ?
i think it's important to really understand what you're doing and how important this is for your goal. Working on a web facing groupware i - of course - are focusing performance and proper use of the resources we have. haste makes waste and understanding the sytems from bottom up is good. But if you're not dealing a platform that has to handle millions of records and deliver them in seconds to the user you may focus on things like security or - and that's aside a very important part - user experience. having a system that is blazing fast but wastes tons of user time due non-understandable UI may be less effective than a system letting the user wait for a second but is much more intuitive. i'm not that UI guy and i like - like propably most readers here - working with large datasets. but i also see that the success or failure of software or new features on software depends on a high adoption rate. and adoption is just a result of how a more or less untrained user can do this daily job on the software.
I listened to the episode. Before tuning in, I was upset that he doesn't want to learn computer science for computer science's sake. But that's not where he's at. He clearly understands the fundamentals and was just putting an exclamation point on the silliness of interview questions that go way off target. I agreed with that. But I took A LOT of offense to his other comment that (paraphrasing) "I only solve for the case where there are going to be 5 items and then when the logging says there are 10 items I fix it, but not before". I'm not really a fan of rework particularly when it's the product of laziness, which is usually the root cause. Do the job right the first time and then we can do other more interesting problems in the future rather than wasting cycles fixing. This is not "gold plating", it's a quality of life issue in my opinion. Don't add entropy to the world.
This is one of those cases that we can apply the Dunning-Kruger effect. People that does not master algorithms/data structures does not know all the benefits involved in mastering them. So they try to minimize the Cognitive Dissonance minimizing the importance of algorithms/data structures.
This question can only be answered by those that master algorithms/data structures, and they generally advocates for the importance of mastering algorithms/data structures.
Why People Fail to Recognize Their Own Incompetence https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
OS projects are rarely banal. They are meant to do a certain job well in a general way so that they are reusable. The most banal OS project I can think of is https://github.com/ua-parser which basically applies a Regex based rule set to parse user agent strings (e.g. https://github.com/ua-parser/uap-csharp/blob/master/UAParser/UAParser.cs). Just a few DTOs, push a little data around, ...
The regex clearly is based on hard CS theory but using it is not. A hash table is hard to write but easy to use. etc.
If you need to know internals you can often demand-page them in from Wikipedia.
I have not made it clear in previous comments so let me add: If you want to be a very top programmer you probably need to have a big arsenal of CS stuff. But even then you don't need to be able to write those algorithms 99% of the time. Using them is enough.
Mark, Regex are a great example. You should be able to look at a poorly performing regex and figure out why it is poorly performing. See good examples here: http://blog.codinghorror.com/regex-performance/
This is pretty much my point. If you don't have grasp of the algorithms, you are going to be making very poor use of the environment you are on. Furthermore, it is going to have a huge impact on your codebase even for "bland" projects
I'd of been in the gotta know you algorithms and methods and ways of design camp..... I have done, and still do, a lot of programming on embedded systems all the way through to web apps. Love Knuth. Design Patterns are a great stepping stone to learning to design..... and to be an all round programmer you should know these things. However, there are cases where this isn't really that important at all. I've been helping a friend learn programming using Meteor, and it hasn't required any knowledge of algorithms, or any specific awareness of design patterns, in fact its required very little traditional programming knowledge at all. Its required knowing how to query / get at things, use of libraries that encapsulate various logical algorithms ( sort, map, etc). But more importantly, its required how to structure UI into components and wire it to a backend (which is simple), and that's probably way way way way more important than any knowledge of SOLID, algorithms, patterns, design principles, etc. It surprises me how much my friend puts together without really understanding at all whats going on underneath.
In fact, what I see, is almost an opposite effect, that developers, who having an awareness of design principles, patterns, etc, they tend to bloat things with what they think is good principles, good abstractions. I can clean my friends code up in a matter of moments, but developers who "Architect" too much for the problem at hand, that takes a lot lot more effort to correct.
"I can count on one, maybe two hands, the number of times" I have used an algorithm that made a difference ... in any given week in the last year. Knowing the difference between a list, hash table or a b-tree has effectively become the crux of my job, fixing the code of developers that don't. I'm shocked each day seeing how devs depend on an intellisense these days: "xyz, press dot, there's Contains method on the list, done". No, you're not done. Making that judgment call that Oren mentions whether such is enough IS part of developer's job, period.
@Keith. What kind of efford takes? Understanding the architecture or changing things effectively? Beause if the efford is for changing things, then, no good principles or good abtraction is done. If the efford is for understanding the architecture I don't think this is inherently bad. Understanding an architecture could be hard, but when you reach to a good comprehesion and you get the architecture mindset; all the pieces of the puzzle fits and you can confidently touch things in a easy way because you always know where touch and their consequences. It is just one big efford for easier future lfe; otherwise is a pain in the ass the rest of your life.
Some truth to what you are saying, Ayende. I'll change my opinion. It's important to know lots of algorithms and their properties. It's a waste of time studying them in detail is the way Knuth does it. You can demand-page that detailed knowledge in.
Raymond Chen gave a presentation several years ago that talked about how the implementation of things on your platform can affect the algorithm you chose: https://channel9.msdn.com/Blogs/scobleizer/Raymond-Chen-PDC-05-Talk-Five-Things-Every-Win32-Programmer-Needs-to-Know
Coding a dictionary is hard, but using a dictionary is trivial ... until you find someone returning values from a dictionary by opening them all up and searching for specific matching sub-properties.
Indeed, 98% of developers don't need algorithms, that's the fact you need to accept. As well as they don't develop database engines. Actually they most likely develop CRUD business applications. There is a lotbof other things they need to know besides algorithms. You don't need to know chemistry and physics to be a bus driver.
Alexander, As a bus driver, you don't need to know physics and chemistry, sure. But the basic algorithms? That is like saying that you don't need to know the meaning of the road signs
Basic algorithms sure, you do. But basic only. Usually this topic quickly overgrowns into something more complex, like NP-completance, theory of probability, and all this Boyer-Moore algorithms I strongly believe developers don't need to understand. In another words, they are nice to have, and it makes you a better developer. But there are other more important things making you even better developer. I would name SOLID here. Good class hierarchy improves maintainability what reduces cost. Insufficient performance you can easily buy, especially in the age of cheap Chinese hardware and affordable clouds.
Comment preview