Tradeoffs in the Software Workflow

Date: October 26, 2022
Duration: 1 HR

SPEAKER
Titus Winters, Senior Software Engineer, Google

MODERATOR
George Fairbanks
Software Engineer, Google

RESOURCES

Software Design and Development: Software Engineering & SDLC Phases (free course for ACM members)
Software Engineering as a Concept (free video for ACM members)
Computer Systems and Software Engineering: Concepts, Methodologies, Tools, and Applications, Volume I (free book for ACM members)
Software Quality Assurance (free book for ACM members)
Software Engineering in the Era of Cloud Computing (free book for ACM members)
Software Mistakes and Tradeoffs: How to make good programming decisions (free book for ACM members)

Q: What are actually the top 10 biggest refactoring projects? Which companies would do that?
A: It’s hard to get precise numbers, and sometimes hard to precisely describe what is and is not a refactoring project. Is forking and upgrading the Windows codebase from version to version refactoring or a new development? I suspect a fair chunk of that starts as refactoring and then changes gears. But the primary reason I make claims like this is that Google’s monorepo is the largest known. We’re better than most at refactoring tools, and many of the biggest projects in my career have been the largest refactorings that have been attempted in Google’s monorepo.

Q: What is OSS Fuzz?
A: OSS-Fuzz is Google’s continuous fuzzing service for open source software: https://github.com/google/oss-fuzz
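
For a concrete picture of what OSS-Fuzz runs continuously, here is a minimal sketch of a fuzz target written for Atheris, the Python fuzzing engine that OSS-Fuzz supports; `parse_query` is a hypothetical stand-in for real code under test.

```python
import sys
import atheris  # Python fuzzing engine used by OSS-Fuzz for Python projects


def parse_query(data: bytes) -> dict:
    # Hypothetical stand-in for the code you actually want to fuzz.
    text = data.decode("utf-8", errors="replace")
    result = {}
    for pair in text.split("&"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            result[key] = value
    return result


def TestOneInput(data: bytes) -> None:
    # The fuzzer calls this entry point with coverage-guided random inputs;
    # any uncaught exception, hang, or crash is reported as a finding.
    parse_query(data)


if __name__ == "__main__":
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```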

Q: Do you agree with the statement that software is an asset and code is a liability? (at least for some definitions of the two terms)
A: Absolutely. I often say that the job of an aeronautical engineer is not to use more aluminum or titanium, it’s to build machines that get people safely from place to place at a reasonable cost. Using the common raw materials is necessary, but should be minimized. Similarly, the job of a software engineer is to solve problems, not to write code.

Q: How do you reconcile “make software under development available to users” with “never show a fool an unfinished job”?
A: I wouldn’t say that we should be giving users access to unfinished features. Rather, focus on delivering small increments of additional value when possible. When that isn’t possible, build the partial features into your binary but don’t enable those incomplete pieces for your users - hide them behind configuration flags, etc. This ensures that there’s still one version and that you can continue working on and deploying from trunk. But the point about providing value stands: if you’ve got 10 half-finished features, your software isn’t valuable. If you’ve got 4 completely finished features, that may mathematically seem like less productivity, but it’s more value.
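
A minimal sketch of the “build it in, but keep it dark” approach: incomplete code ships in the binary behind a configuration flag. The flag name and both checkout functions are hypothetical.

```python
import os

# Hypothetical flag, read from configuration (an environment variable here).
# The incomplete feature ships in the binary but stays off for users, so there
# is still one version and trunk remains releasable.
NEW_CHECKOUT_ENABLED = os.environ.get("ENABLE_NEW_CHECKOUT", "false") == "true"


def legacy_checkout(cart: list) -> str:
    return f"charged for {len(cart)} items via the existing flow"


def new_checkout(cart: list) -> str:
    # Half-finished code path: it can be developed and tested from trunk,
    # but no user sees it until the flag is flipped.
    raise NotImplementedError("still under construction")


def checkout(cart: list) -> str:
    if NEW_CHECKOUT_ENABLED:
        return new_checkout(cart)
    return legacy_checkout(cart)


print(checkout(["book", "mug"]))  # takes the legacy path unless the flag is set
```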

Q: If there is no central technical authority, then don’t you think the system architect will always be busy answering queries and questions from all of the technical champions in the company?
A: In my experience, the ecosystems where the most senior people are answering questions are the ones that are most productive. If you’ve got the choice between doing some very hard, very important work yourself, or making the 100 people around you a little bit better … I’m pretty sure in the long run the company is going to be more successful with the latter.

Q: Oftentimes in the computational physics industry, end-to-end tests take hours. What would you do differently if you had ~100s of such tests that are part of your quality process?
A: Rely more on smaller tests and catch the bugs earlier if possible. That sounds glib, I know. But in practice I think we have to get out of the habit of thinking of all tests as independent (and valuable) boolean results - some tests aren’t really detecting anything, some tests are flaky, some are just slowing you down. Go through the test results for the last year and look at all of the failures. Classify those: could it have been caught with a smaller/cheaper/earlier test? Was it actually indicative of a real failure? If you’ve got tests that add hours to your release process and never catch anything, delete them. If you’ve got tests that take hours and only catch bugs that could often also be caught in a fast presubmit test, invest the effort to write those presubmit tests. Even if you can only catch a fraction of them ahead of time, that’s valuable - the smaller tests are easier to maintain and easier to react to when they fail.
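
A sketch of that triage, assuming you can export a year of failure records with a few fields (the field names below are assumptions):

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Failure:
    test_name: str
    was_real_defect: bool          # False for flakes and environment problems
    unit_test_could_catch: bool    # could a smaller/cheaper test have caught it?


def triage(failures: list[Failure]) -> dict[str, Counter]:
    """Bucket each expensive test's failures to decide its fate."""
    verdicts: dict[str, Counter] = {}
    for f in failures:
        counts = verdicts.setdefault(f.test_name, Counter())
        if not f.was_real_defect:
            counts["noise"] += 1        # never catches anything real: delete it
        elif f.unit_test_could_catch:
            counts["shift_left"] += 1   # write the cheaper presubmit test instead
        else:
            counts["keep"] += 1         # genuinely needs the big end-to-end test
    return verdicts


history = [
    Failure("thermal_sim_e2e", was_real_defect=False, unit_test_could_catch=False),
    Failure("thermal_sim_e2e", was_real_defect=True, unit_test_could_catch=True),
]
print(triage(history))
```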

Everything from commit to deploy should be mechanical: there’s no creativity in it, it should be largely automated, and there should be few surprises. If you want high quality and good velocity, look at every defect that is caught late in that process as a process problem - could it have been caught earlier and more cheaply? If there’s no history of defects of that type, fix it and move on. But if you keep finding defects of a similar form, invest in process to catch them earlier.
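
One lightweight way to spot those “defects of a similar form” is to record where each defect was caught and flag types that repeatedly escape past the cheap stages. The stage names and threshold below are assumptions for illustration.

```python
from collections import defaultdict

STAGES = ["presubmit", "ci", "staging", "production"]  # earliest (cheapest) to latest


def late_catch_report(defects: list[tuple[str, str]], threshold: int = 3) -> list[str]:
    """defects is a list of (defect_type, stage_caught) pairs."""
    late_counts: dict[str, int] = defaultdict(int)
    for defect_type, stage in defects:
        if STAGES.index(stage) >= STAGES.index("staging"):
            late_counts[defect_type] += 1
    # A recurring type caught late is a process problem worth investing in.
    return [t for t, n in late_counts.items() if n >= threshold]


log = [("schema_mismatch", "production"), ("schema_mismatch", "staging"),
       ("schema_mismatch", "production"), ("null_deref", "ci")]
print(late_catch_report(log))  # ['schema_mismatch'] -> add an earlier, cheaper check
```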

Q: Don’t we need to watch not only (1) the number of defects, but also (2) how often a given defect manifests in use and (3) how severe the consequences of a given defect are?
A: Intuitively, yes. But in practice I don’t think we can ever agree on how to compare such things numerically (see the arguments raised in the talk), and so in practice the best we’re going to do is to measure time and people (engineers) affected.
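
A worked example of the “time and people affected” measure, with illustrative numbers only:

```python
def defect_cost_engineer_hours(engineers_affected: int, hours_each: float) -> float:
    # The crude but comparable measure from the answer: time * people.
    return engineers_affected * hours_each


# A broken build that blocks 40 engineers for half an hour:
print(defect_cost_engineer_hours(40, 0.5))      # 20.0 engineer-hours
# The presubmit check that would have caught it, costing one author ten minutes:
print(defect_cost_engineer_hours(1, 10 / 60))   # ~0.17 engineer-hours
```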

Q: Why is security not part of this process? DevSecOps?
A: Security was the original impetus for the term “shift left.” And if you’ve got defect-detection in place for common security problems, everything I said in the talk applies directly. I certainly believe that security needs to be addressed as early in the workflow as possible.

A professor of mine once said, “AI is the set of problems that we don’t have easy solutions for yet.” Security is sort of the same: if it’s a known problem and we’re testing for it, then monitoring and testing and the standard workflow activities I described cover it. If it’s a novel problem (or we aren’t up-to-date and are ignoring it) then we’re not going to spot it until something goes wrong in production.

Q: As an educator, where do we find resources to incorporate secure coding and a defect-free software development process for beginners?
A: Stop teaching C and C++??

Q: I have a question related to your point about culture. Quality expectations are much lower on most software teams than on most teams that create and maintain physical products, and IMO this is mainly caused by two factors that are essentially cultural: 1) a “we’ll fix it later” attitude that is typically due to a misunderstanding of what Agile is about, 2) a belief that software is inherently of such great complexity that defects are (and always will be) unavoidable. What are your thoughts about both of these?
A: I strongly agree. If the model I presented here catches on, I would think those attitudes would drop: defects that make it to users are expensive to the bottom line, and it’s vastly cheaper and less risky to catch them as part of the workflow. I think that both of those points speak directly to #1. And for #2, I would point out that friction and entropy are unavoidable as well. That doesn’t justify failing to put oil in your car. Just because it’s unavoidable doesn’t mean it can’t be minimized and mitigated.

Q: What’s an integration test?
A: Generally speaking we want tests to be hermetic, fast, reliable, etc. That tends to push us toward somewhat fake tests that invoke our APIs in small (single-process) ways. However, that style of testing fails to let us test the seams of the system: you can know that the database works, and you can know that your client appears to generate proper database queries, but it’s always possible that the link between those two is buggy. Thus we tend to add “integration tests” - tests that bring up multiple processes and that lose some (or all) of the hermetic, fast, reliable requirements. These tend to be more realistic, but much more expensive, more flaky, and harder to diagnose when something goes wrong. So it’s a tradeoff between cost and fidelity, as we see so often.
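
To make the tradeoff concrete, here is a hedged sketch contrasting the two styles as pytest-style tests; the fake database, the connection string, and the `orders` table are all assumptions.

```python
class FakeDatabase:
    """In-memory stand-in that keeps the unit test hermetic and fast."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, value):
        self.rows[key] = value


def test_order_insert_unit():
    # Single process, no network, milliseconds to run - but it can't prove the
    # real database accepts the queries we generate.
    db = FakeDatabase()
    db.insert("order-1", {"total": 42})
    assert db.rows["order-1"]["total"] == 42


def test_order_insert_integration():
    # Exercises the real seam: slower, flakier, harder to debug, more realistic.
    import psycopg2  # assumes a Postgres test instance with an `orders` table

    conn = psycopg2.connect("dbname=test host=localhost")
    with conn, conn.cursor() as cur:
        cur.execute("INSERT INTO orders (id, total) VALUES (%s, %s)", ("order-1", 42))
        cur.execute("SELECT total FROM orders WHERE id = %s", ("order-1",))
        assert cur.fetchone()[0] == 42
```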

Q: What are your thoughts on requirements defects, and requirements management in general?
The process illustrated began with Design and not Requirements (Elicitation, Analysis, Verification). Should Requirements be considered in assessing the aggregate risk mitigation or is this step assumed?
A: Google parlance tends to view Requirements gathering as a part of the Design phase. Knowing what you are building and why is part of designing how to build it. We tend to view it as a less formal thing than the literature often suggests.

Q: There are measures of code complexity (cyclomatic complexity, lines of code, essential complexity) that can be used before integration testing. This may cut down the number of unnecessary integration tests and focus on the ones that matter, right?
A: I don’t think such measures are particularly telling about defect rates; at most they are suggestive of where the design may need work. I think we’d need to focus on the history of test failures from the integration tests to decide which ones are valuable - are they catching actual defects? Could those defects be caught earlier with a better unit test approach?

Q: Why is code review cheaper than having a human from the QA team take a look?
A: Why would a human that isn’t actively involved in writing that code be better at evaluating it? There’s pretty extensive literature on pair programming and code review practices and it’s fairly consistent - those practices are good for education, communication, and quality. I’ve never seen any research or credible argument why offloading that to a separate group would be valuable.

Q: What are your thoughts on model checking and formal verification in the software workflow process?
A: There are a few rare cases where it pays off. Even the formal methods experts closest to me view it as being a very high cost approach with limited applicability. When it’s the right answer (concurrency and lock-free code, for instance), it can be a great boon, but it doesn’t come up often.
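
A toy sketch of the kind of exhaustive state-space exploration a model checker automates, applied to the concurrency case mentioned above: enumerate every interleaving of two unsynchronized increments and check for the lost update.

```python
from itertools import permutations

# Each thread performs: read the counter, then write counter = read_value + 1.
OPS = [("read", 0), ("write", 0), ("read", 1), ("write", 1)]


def run(schedule):
    counter = 0
    local = {}
    for op, tid in schedule:
        if op == "read":
            local[tid] = counter
        else:
            counter = local[tid] + 1
    return counter


def interleavings():
    # Keep only schedules that respect each thread's own program order.
    for perm in permutations(OPS):
        if perm.index(("read", 0)) < perm.index(("write", 0)) and \
           perm.index(("read", 1)) < perm.index(("write", 1)):
            yield perm


bad = [s for s in interleavings() if run(s) != 2]
print(f"{len(bad)} interleavings lose an update")  # exhaustive search finds the race
```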

Q: You estimate the cost of a defect as time * people. You mentioned other costs, like resource cost and latency cost. Did you not include them because they are mostly negligible? If you were to include those costs, would the cost be more difficult to estimate? How do you estimate latency cost in terms of dollars?
A: If you’re fully working out the math implied by the talk, you’d need to look at costs and benefits over a given period of time.

Resource costs are relatively easy to estimate or measure: look at the TCO of the hardware and software being used for development and testing, then amortize that into a per-unit-time cost over the period being evaluated.

Latency is trickier, in that it changes the onset time for new value. If you only release quarterly, it doesn’t matter how many features you get done; users aren’t going to pay for them until the quarter after they are available. If you release daily, that can be almost a full quarter more value added.
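
A back-of-the-envelope version of both costs, with every number an assumption for illustration:

```python
# Resource cost: amortize the TCO of the test/dev fleet over the evaluation period.
fleet_tco_per_year = 120_000          # dollars, hardware + software for testing
evaluation_period_years = 0.25        # evaluating one quarter
resource_cost = fleet_tco_per_year * evaluation_period_years        # 30,000

# Latency cost: value arrives earlier with faster releases. A feature worth
# $10k/month shipped at the start of the quarter earns for ~3 months; the same
# feature waiting for the next quarterly release earns nothing this quarter.
feature_value_per_month = 10_000
months_gained_by_releasing_daily = 3
latency_value_recovered = feature_value_per_month * months_gained_by_releasing_daily

print(resource_cost, latency_value_recovered)   # 30000.0 150000
```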

Q: How often do best practices turn out to be passing fads?
A: Relatively rarely, I think. What is far more common is taking the words of new practices and corrupting them. “Certified Agile” or “Certified Scrum Master” are pretty clear evidence of corruptions of the original Agile or Scrum ideas.

Q: Fast release cycles don’t obviously apply to single-delivery contracts. Where is the value in adopting the practice anyway?
A: I disagree - the point of all of this is to have the capability of delivering quickly. Even in a single-delivery contract there are likely many individual features that need to be developed. The workflow and perspective presented in the talk apply to reducing the cost of developing each of those features and keeping them working up until the point that you do the final delivery.