Thanks for that. Though LOC seems to fall into the "what is a line of code?" trap.
Study 2 also supports correlating errors with complexity, not with size (small or large).
I hadn't realized how old these studies were, however. For some reason, I thought they were from the 90s. I'd be wary about relying on what are, frankly, outdated assumptions.
> I'd be wary about relying on what are, frankly, outdated assumptions.
You'd be wary, and so you'd replace these studies with what exactly? Which of their assumptions are outdated?
Let's not fall into the trap of thinking that programmers before us didn't understand as much. As far as I can tell, the issues were much the same. A greater problem with older studies is that their sample sizes were mostly small and their experiments were never replicated.
> Which specifically of their assumptions are outdated?
Methodologies. Languages. Practices. Tools.
You'd have to assume that nothing has changed in the last 20-30+ years in the world of programming to combat errors and defects, which is decidedly not the case.
This isn't at all to say that programmers before us didn't understand as much; it's just that they had different limitations.
I'll admit, I've been doing far more reading on the subject because of this thread. I still stand by the concept that a function should do one thing, and only one thing. The size of the function isn't important, but the nature of a function doing one thing generally leads to smaller functions.
I also think duplication should be removed. This also tends to remove excessive code.
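To sketch what I mean (a toy example with hypothetical names, not from any real codebase):

```python
# Before: the same normalisation repeated in two functions.
def save_username(db, name):
    db[name.strip().lower()] = True

def lookup_username(db, name):
    return name.strip().lower() in db

# After: the duplicated expression is pulled into one small helper,
# so a change to the rule happens in exactly one place.
def normalise(name):
    return name.strip().lower()

def save_username_v2(db, name):
    db[normalise(name)] = True

def lookup_username_v2(db, name):
    return normalise(name) in db
```

Removing the duplication shrinks the code and usually leaves smaller, better-named pieces behind.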
On that note, an interesting discussion can be found here:
Sorry to reply twice, but I remembered something about this:
> a function should do one thing, and only one thing.
I'm sure we all agree that functions should be well-defined. However, this is curiously separate from the question of how long they should be, because there are often different possible decompositions of functions into legitimate "one things" at distinct granularities.
For example, I have a rather long function in my code that builds a graph. That's clearly "one thing": you pass it what it needs to build a graph, and it returns the graph. But I could also extract smaller functions from it that also do "one thing"—to add a vertex, say, or establish an edge. In both decompositions—buildGraph by itself vs. buildGraph, addVertex and establishEdge as an ensemble—all the functions do "one thing". It's just that in the first example there is one coarse-grained "one thing", while in the second there are three finer-grained "one thing"s.
So cohesion (a function should do one thing only) doesn't tell us much about whether to prefer longer or shorter functions. All other things being equal, the coarse decomposition is arguably better than the finer-grained one, because it's simpler in two ways: it has fewer parts (1 vs. 3 in the above example), and the total code is smaller (it saves 2 function declarations).
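To make the graph example concrete, here's a rough sketch of both decompositions (build_graph, add_vertex and establish_edge mirror the buildGraph/addVertex/establishEdge names above; this is illustrative, not my real code):

```python
# Coarse-grained: one "one thing".
def build_graph(edges):
    """Build an adjacency-set graph from (src, dst) pairs."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set())
        graph.setdefault(dst, set())
        graph[src].add(dst)
    return graph

# Finer-grained: three "one thing"s that together do the same work.
def add_vertex(graph, v):
    graph.setdefault(v, set())

def establish_edge(graph, src, dst):
    add_vertex(graph, src)
    add_vertex(graph, dst)
    graph[src].add(dst)

def build_graph_fine(edges):
    graph = {}
    for src, dst in edges:
        establish_edge(graph, src, dst)
    return graph
```

Both versions behave identically; the only difference is how many named parts you end up with.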
> because there are often different possible decompositions of functions into legitimate "one things" at distinct granularities.
Actually, it's fairly well established. DRY: don't repeat yourself. Yes, if you need to assign 50 values, and each is unique, then breaking them up might be difficult. However, these aren't the rules. These are the exceptions. Heck, even your graph example could be an exception.
However, in my experience, most code does contain hints about how things can be broken up into smaller, more manageable pieces.
> the coarse decomposition is arguably better than the finer-grained one
=) The finer-grained one is arguably better than the coarse decomposition, because it removes repeated lines of code and makes it easier to test the smaller methods. That the total number of lines of code increases is unimportant, because you rarely look at all the code in a module at once. Rather, you are focused on individual components. If the individual component is smaller, it is easier to understand.
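For instance, the smaller pieces can each get a focused test. A rough sketch, copying the helper definitions from your example above so it runs on its own:

```python
import unittest

def add_vertex(graph, v):
    graph.setdefault(v, set())

def establish_edge(graph, src, dst):
    add_vertex(graph, src)
    add_vertex(graph, dst)
    graph[src].add(dst)

class TestGraphHelpers(unittest.TestCase):
    def test_add_vertex_is_idempotent(self):
        graph = {}
        add_vertex(graph, "a")
        add_vertex(graph, "a")
        self.assertEqual(graph, {"a": set()})

    def test_establish_edge_creates_missing_vertices(self):
        graph = {}
        establish_edge(graph, "a", "b")
        self.assertEqual(graph, {"a": {"b"}, "b": set()})

if __name__ == "__main__":
    unittest.main()
```

Testing the monolithic version means constructing whole edge lists for every case; testing the small helpers is nearly free.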
I appreciate your point of view. And I agree, it's not easy to make things smaller without losing something. However, that doesn't mean we shouldn't try.
That's really just being lazy on your part, though. I'm not going to sit here and list everything that's come onto the scene since 1990. Every new language. Every new methodology. Every new practice, pattern, and tool. And not just new things, but also improvements to the way things were already done.
Unless you are going to sit here and suggest that ARC or GC don't help decrease memory errors, or that languages like Python or Java haven't helped move things along. Heck, even C has been steadily improved over the years. Compilers are getting smarter. Tools like Valgrind. Agile methodologies and formalized code testing. Static analysis and even improved code reviews. Heck, even simple things like IDEs and editors.
So much has changed, so much has evolved. Does that mean everything is wrong? No. But relying on studies that can't be replicated and don't account for today's common programming practices and environments is dangerous.
You claimed that the studies kazagistar listed rely on outdated assumptions. So, specifically which studies are invalidated by specifically which assumptions? It ought to be easy to name one, since there are quite a few studies and, by your account, myriad outdated assumptions to choose from.
Your answer is that "so much has changed", you're "not going to sit here and list everything", and my question is "really just being lazy"? That's a little hard to take seriously.
I'm going to be frank: in the 80s, programmers sucked compared to today's programmers. If you took an 80s programmer and time-shifted him to today, he'd look like he had no experience and would really struggle until he'd spent a lot of time learning new stuff.
You've grown up with software engineering in the last 10 years. Have you honestly not seen the sea change in that time?
Programming languages have shifted dramatically from the original OO of Java/C# 1.0; JavaScript has inspired massive changes in those languages, with anonymous types, closures, etc.; then there's unit testing and TDD. There's so much that's changed, so, so much I can't even think of half of it.
I said it below: who today passes ByRef? Or uses a global variable? 10 years ago that was common.
Just 10 years ago! The field is moving so amazingly rapidly!
And one of the real things I've seen, something you can't describe unless you've got experience, which I've no doubt you have, is that methods now actually do what they say they do. They didn't used to. They used to do other things, and touch global vars, and get passed in variables to muck around with.
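The kind of contrast I mean, as an exaggerated toy sketch (hypothetical names, not real code from anywhere):

```python
# Old style: the routine quietly mucks with shared state.
running_total = 0

def add_line_item_old(price, quantity):
    global running_total            # hidden side effect on a global
    running_total += price * quantity

# Newer style: the function does exactly what its name says and nothing else.
def line_item_total(price, quantity):
    return price * quantity

order_total = sum(line_item_total(p, q) for p, q in [(9.99, 2), (4.50, 1)])
```

The second version you can read, test, and reuse without wondering what else it touched.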
And on top of that, people now seem to actually understand OO. Not just blindly reciting what polymorphic means; they can actually split up and create objects sensibly. They really couldn't do it very well 10 years ago. No one really understood it; it sounded like a good idea, but only a tiny percentage actually understood what it really meant.
And yet you think studies done back then, using techniques that would get people fired today, are worth referencing?
LOOK at the dates in the parent post. 1986? No Java, no Javascript, no Python, no Ruby, no C#. But COBOL had just released an update!
You'd better watch out for those GOTOs!
I honestly am perplexed that you can even start to think those studies are still relevant today.
The grandparent wasn't claiming that 80s programmers were as good as us; he was claiming that the mere passage of time is a bad reason to forget a lesson.
You mention a long list of rules and practices, but I'm not sure they've had as unadulterated a benefit as you think. Replacing GOTOs with objects has more often than not resulted in a more structured form of spaghetti (https://en.wikipedia.org/wiki/Yo-yo_problem, for example) that's no better. There were already people programming with closures and anonymous types, there's just more of them now. That's probably better, but not night-and-day. Lots of people still write spaghetti with closures and callbacks.
Arguing about whether the past is better or the present is better is a waste of time. We're just trading generalities at this point about a topic that permits no easy generalization. One way I've been trying to avoid this trap is with the mantra (generality :), "languages don't suck, codebases suck." It forces me to remember that it's possible to write good code in a bad language, and (often) to write bad code in a good language. Intangible abilities, domain properties, accidental design decisions and cultural factors have a huge effect that it's easy to ignore and underestimate by clinging to reassuring rules about the parts that seem more tangible. But you won't grow until you try to stare into the abyss.
(Edit: By 'you' in that last sentence I didn't mean you specifically, mattmanser.)