One of the final talks I attended at CUSEC 2010 in Montreal was probably the most important and, unfortunately, (seemingly) the most underrated. Daniel Berry from University of Waterloo Software Engineering gave a talk entitled “Ambiguous Natural Language in Requirements Engineering”. Unfortunately, he was speaking to an Agile-minded audience that has been exposed to a culture stating that the Waterfall software process is dead or, more inaccurately, that requirements gathering is a dead science. This talk was not only enlightening, but also made me rethink many topics I haven’t thought about since university.
In summary, for those who don’t want the verbose version: turning the ‘idea’ (as a requirement) into ‘code’ is a problem that extends well beyond just the ‘code’ part. Many software errors (5-10%) are a result of ambiguous requirements – requirements that neither party in the process even knew were ambiguous. This talk was a branch of Software Engineering shining at its best – and Berry offered solutions!
First to address those developers still pretending this problem doesn’t apply to them: Not all software projects involve 1-2 developers. Not all projects allow you (the developer) to talk directly to a client. Some projects will have you writing code based on requirements written by someone else in your company. This problem exists whether you want to believe it or not.
Requirements gathering is such an important aspect of software development that I could not believe others were not as excited as I was. I cannot fault the audience too much; many present at the conference were young developers still in school, with little to no industry experience. The concept of a formal Requirements Specification or a Functional Requirements document is a topic that makes most modern-day agile (web) developers laugh. This is unfortunate. Regardless of the process, regardless of the development cycle, at some point high-level ideas must be translated into computer code. I have written User Stories, Tasks, Use Cases, Flows, etc., all of which conveyed the ‘idea’ as something one could ‘code’. This is where I sat up in my chair and really tuned in. Berry continued...
He pointed out that regardless of whether the requirements are done early by business development types in the field, or later on by programmers in the lab, some human being must translate these ideas into writing/English/code/binaries. I could not see it more clearly: this is where many errors occur! The two options for formalizing ideas are Natural Language (plain readable English) or Formal Language (math/equations). Berry noted that UML is not a formal language.
With each option we see inherent problems. Formal language is generally not written by business types; however, once it is fully written, it (technically) cannot be wrong. It will truly capture the needs and requirements of the system. The problem, of course, is that clients/stakeholders do not understand formal logic, and it is hard to verify that the correct problem is actually being modeled. If the software project called for creating an application for the Math Department, this might not be a problem (Berry noted that most people in the world aren’t that crazy – audience laughed, presentation moved on).
Natural language, on the other hand, is easy to understand and easy to agree upon between the stakeholders, the business team, and the programmers. Most requirements are written in NL. But the parties involved will rarely be agreeing upon the same thing. Berry notes the phenomenon known as “subconscious disambiguation”: people interpret things differently without even realizing that there is an alternate way of seeing them. This means that a programmer may interpret a requirement in a way that is completely different from the stakeholder’s, yet both agree that the written requirement is mutually understood.
The problem seemed so clear and obvious, but he continued with an example. Suppose we are hired to create the student account CMS application (the current incumbent at UofT is ROSI, which I believe is by SAP; this piece of software crap needs to be restarted daily, and the website has an hours-of-operation sign). Take these two requirements:
Students enroll in 8 courses per semester.
Students enroll in thousands of courses per semester.
As Berry noted, both are syntactically identical and correct, yet they mean completely different things. Turning these domain requirements into functional specs, what do we do? Do we create a table that gives each student 8 columns per semester for foreign IDs to their courses? Do we prepare a table to support thousands of unique courses? Side by side, and read by a former student who understands the problem domain, neither statement seems ambiguous. If these requirements were separated by hundreds of other requirements, maybe I wouldn’t catch the difference so easily.
Let’s switch to a (possibly) less familiar domain:
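To make the two readings concrete, here is a rough Python sketch. It is not something Berry showed; the names `Student`, `enroll`, `distinct_courses`, and `MAX_COURSES_PER_STUDENT` are hypothetical, and treating 8 as a hard cap (rather than a typical load) is itself an assumption the requirement never settles.

```python
from dataclasses import dataclass, field

# Reading 1: the number describes each individual student.
# Assumption for illustration: 8 is a hard cap, not a typical course load.
MAX_COURSES_PER_STUDENT = 8


@dataclass
class Student:
    name: str
    courses: set = field(default_factory=set)

    def enroll(self, course_code: str) -> None:
        """Enroll this student, enforcing the per-student reading."""
        already_in = course_code in self.courses
        if not already_in and len(self.courses) >= MAX_COURSES_PER_STUDENT:
            raise ValueError(f"{self.name} already has {MAX_COURSES_PER_STUDENT} courses")
        self.courses.add(course_code)


def distinct_courses(students) -> int:
    """Reading 2: the number describes the student body as a whole."""
    all_courses = set()
    for student in students:
        all_courses |= student.courses
    return len(all_courses)
```

The first reading ends up as a validation rule on one record; the second ends up as a sizing question for the whole catalogue. Same sentence structure, two very different pieces of code.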
The 8 onboard ATMEGA48 chips can take 5.5 V before shorting.
programmer 1:
    // Increase voltage to chips that can take more
    for (i = 0; i < numchips; i++) {
        if (chips[i].getVolts() + threshold < maxVolts) {
            chips[i].increaseVolts(threshold);
        }
    }
programmer 2:
    // Increase voltage if chips can take more
    voltSum = 0;
    for (i = 0; i < numchips; i++) {
        voltSum += chips[i].getVolts();
    }
    if (voltSum < maxVolts) {
        Controller.increaseVolts(chips);
    }
Oops, someone just blew the embedded chips and our navigation system just died. Who do you fire? Both programmers were correct. (Side note: let’s hope that NASA doesn’t use such crappy code or ambiguous requirements; see the Mars lander.) This was an exaggerated case, but in my opinion the more foreign the domain, the worse the problem gets. Since the requirements gathering with the clients and the coding to the spec take place between 3 parties (BA, Stakeholder, Programmer), there is a possibility of ambiguity between 3 groups or more. Berry insists that plurals are dangerous and should never be used. Say which of these you mean instead:
Each onboard ATMEGA48 chip can take 5.5 V before shorting.
Collectively, the group of 8 ATMEGA48 chips can take 5.5 V before shorting.
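The two rewritten requirements above map onto two different checks. A minimal Python sketch of the difference, assuming a plain list of per-chip voltages and a fixed step (the names `per_chip_ok`, `collective_ok`, and `step` are mine, not from the talk):

```python
MAX_VOLTS = 5.5  # limit taken from the requirement text


def per_chip_ok(chip_volts, step):
    """'Each chip' reading: every chip must individually stay under the limit."""
    return [volts + step < MAX_VOLTS for volts in chip_volts]


def collective_ok(chip_volts, step):
    """'Collectively' reading: the total across all chips must stay under the limit."""
    total_after_increase = sum(chip_volts) + step * len(chip_volts)
    return total_after_increase < MAX_VOLTS


# Hypothetical state: 8 chips at 1.0 V each, raised by 0.5 V.
chips = [1.0] * 8
print(all(per_chip_ok(chips, 0.5)))  # True: 1.5 V per chip is under 5.5 V
print(collective_ok(chips, 0.5))     # False: 12.0 V in total is over 5.5 V
```

With the singular wording, only one of these functions can be the right one to write, which is the whole point of dropping the plural.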
In addition to the ambiguous plural, we should always be aware of the ambiguous “only”, “also”, and a few other terms that make it really difficult to understand the true meaning of a sentence. One comment Berry made that made me happy was that, now that we are aware of such issues (ambiguous requirements), he hopes we wake up in the middle of the night like he does, worrying about ambiguously placed plurals and “only” statements.
I love that. Berry clearly has a passion for Requirements Engineering. I feel like everyone should wake up in the middle of the night worrying about something they just programmed/wrote/designed/planned. It reminds us that we actually care about what we are doing. When he made that statement, everyone in the room laughed. But hell, if we aren’t doing something that makes its way into our subconscious, maybe we aren’t trying hard enough or maybe we aren’t doing the right thing.
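Going back to the risky words themselves: a crude way to build the habit is to flag them mechanically before a human reviews the requirement. The sketch below is my own illustration, not a tool from the talk. The watch list holds only the words named in this post (“only”, “also”, “every”), and `flag_ambiguity` is a hypothetical name. It does not attempt to detect plurals, which would need real language parsing.

```python
import re

# Watch list limited to the words called out in this post.
WATCH_WORDS = ("only", "also", "every")


def flag_ambiguity(requirement: str):
    """Return the watch-list words found in a requirement, in order of appearance."""
    words = re.findall(r"[a-z']+", requirement.lower())
    return [word for word in words if word in WATCH_WORDS]


print(flag_ambiguity("The system shall only log failed requests."))
# ['only']
print(flag_ambiguity("Each onboard ATMEGA48 chip can take 5.5 V before shorting."))
# []
```

A hit does not mean the requirement is wrong, only that someone should ask which reading was intended.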
Why was this talk so important for me? Because these are trivial errors that make their way into our development cycle. They can be fixed or prevented, and awareness of subconscious disambiguation is a good start. I read requirements and I write them too. I have written ambiguous things in the past that came back to haunt me later on. I don’t advocate overly heavy requirements gathering that is distant from the dev cycle. I also do not like the ultra-light-documentation version of development I see peppered around the industry today. Requirements aren’t ‘cool’? Too bad, you are being paid to write working software that does what the customer wants. Find a middle ground, and make sure that when you capture the business needs you articulate your understanding well. Secondly, make sure that you agree with your clients/stakeholders early and often. While an Agile process might spot these errors sooner, it is still time wasted if we write requirements that mean 10 different things. Let me end with a note which goes back to Software Engineering.
The famous 1:10:100 rule (sometimes seen as 1:10:100:1000) describes the ratio of the cost to fix a bug at the design, development, and production stages. Why do we spend so much time arguing about and comparing programming languages when we are writing code that is doomed to be incorrect anyway? I feel like people are always in such a rush to figure out how to solve a problem that they care more about code syntax than proper grammar. After all, a misplaced comma or an ambiguous ‘every’ could end up costing you thousands.
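To put rough numbers on that rule of thumb, here is a tiny Python sketch. The multipliers come straight from the 1:10:100 ratio; the function name `cost_to_fix` and the base cost of 50 in the usage line are made up purely for illustration.

```python
# Relative cost multipliers from the 1:10:100 rule of thumb.
RELATIVE_COST = {"design": 1, "development": 10, "production": 100}


def cost_to_fix(stage: str, base_cost: float) -> float:
    """Scale the cost of a design-stage fix to the stage where the bug is found."""
    return base_cost * RELATIVE_COST[stage]


# Hypothetical: a wording fix that costs 50 at design time.
print(cost_to_fix("production", 50))  # 5000
```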
Requirements -> Design -> Develop -> Test -> Verification -> Validation
Let’s make it clear the first time, because whether you are on a 6-month waterfall or a 3-week sprint, realizing you built the wrong thing always feels stupid – and that requirements stage is a lonnnnggg way away from the validation stage.
Side note (since I seem to enjoy commenting on my own posts): I asked Berry during the lecture, “In your experience, have you found a language less ambiguous than English that reduces software bugs?” He replied that he speaks 5+ languages, all of which have the same or similar inherent problems. The biggest point is that English is a domain constraint that we must learn to work with.
Well, that sucks, because if you told me that writing requirements in German would eliminate all my requirements bugs, maybe I would learn German. Why are people so eager to switch from PHP -> Ruby, while the fundamental problem is probably the language of the coder?
Berry stopped me – “We don’t live in a perfect world, and right now, you will likely have to write your requirements in English”
Crap, maybe in the future we will be writing requirements in Esperanto… (lol)
At last! Someone who understands! Thanks for posting!
Some examples:
REQ1: Website should support multiple users. (How many users? 100? 10,000?)
REQ2: Website should support multiple browsers.
(IE, Safari, FF: which versions? Which OS? Is it required to test every supported browser on Mac and Linux?)
– Ayan Nigam