Sunday, November 21, 2010

Designing For Change - Part 3 of 3 (Applying the GoF Advice)

In part2, we discussed the approaches to design suggested by the GoF, what they mean when applied to a problem, and how they help in achieving code qualities. If you have not read that post, please read it first, because in this post we will work through a design problem and see how the GoF advice helps solve it.

Let us consider a file transfer program which sends a stream of data over the network. It has a 'Sender' which works in the following steps:
  1. Compress the data using the LZW (Lempel-Ziv-Welch) algorithm
  2. Encrypt the data using AES (Advanced Encryption Standard)
  3. Transmit the data using TCP (Transmission Control Protocol)


Now, say we have a requirement to switch the transmit method between TCP and UDP depending on some user input. As an obvious solution, we create a new class, say 'SenderUDP', which inherits from the 'Sender' class and overrides the 'Transmit()' method. This solves the problem.
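The inheritance-based solution described above can be sketched roughly as follows. This is a minimal C# illustration, not the actual program: the 'Send()' method, the 'Encrypt()' hook and the string payload are assumptions, and the method bodies only tag the data instead of really compressing, encrypting or transmitting it.

```csharp
using System;

// Hypothetical sketch of the original design: every step lives in Sender.
class Sender
{
    public string Send(string data) => Transmit(Encrypt(Compress(data)));

    // Placeholder bodies: they tag the data instead of really processing it.
    protected virtual string Compress(string data) => "LZW(" + data + ")";
    protected virtual string Encrypt(string data) => "AES(" + data + ")";
    protected virtual string Transmit(string data) => "TCP[" + data + "]";
}

// The subclass changes only the transmission step.
class SenderUDP : Sender
{
    protected override string Transmit(string data) => "UDP[" + data + "]";
}

static class Program
{
    static void Main(string[] args)
    {
        // The command-line argument stands in for "some user input".
        bool useUdp = args.Length > 0 && args[0] == "udp";
        Sender sender = useUdp ? new SenderUDP() : new Sender();
        Console.WriteLine(sender.Send("payload"));
    }
}
```

With a single varying step this is short and readable, which is why it looks like the obvious solution at this stage.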


Now, say we get a new requirement that the 'Sender' should also be able to work with multiple compression algorithms (e.g. RLE, SFC, Huffman coding) and encryption algorithms (e.g. DES, Triple DES, Serpent). Proceeding as before, for compressing with RLE and transmitting with UDP, we create a new class, say 'SenderUDP_RLE', which inherits from 'SenderUDP' and overrides the 'Compress()' method. For compressing with RLE and transmitting with TCP, we create a new class, say 'SenderRLE', which inherits from 'Sender' and overrides the 'Compress()' method, and so on. Continuing this way, we end up with a class hierarchy like the one shown in the picture below.


So, is there any problem with this approach? Looking carefully at the hierarchy, we can see:
  • There is a lot of code duplication (redundancy). For example, the classes 'SenderUDP_RLE' and 'SenderRLE' both use RLE for compression, but each has its own local copy of the code.
  • Class explosion. With each new variation, the number of classes grows multiplicatively. If we have 'p' types of compression, 'q' types of encryption and 'r' ways of transmitting data, we will end up with p*q*r classes.
  • Weak cohesion and tight coupling. Each class is responsible for three unrelated tasks: compression, encryption and transmission. The code for all these tasks has easy access to the state of 'Sender', but at the cost of tight coupling.
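The duplication in the first point can be made concrete with a small C# sketch. The class names come from the text; the 'Send()' method, the string payload and the tagging bodies are assumptions made for illustration, and encryption is left out to keep the sketch short.

```csharp
using System;

class Sender
{
    public string Send(string data) => Transmit(Compress(data));
    protected virtual string Compress(string data) => "LZW(" + data + ")";
    protected virtual string Transmit(string data) => "TCP[" + data + "]";
}

class SenderUDP : Sender
{
    protected override string Transmit(string data) => "UDP[" + data + "]";
}

// RLE over TCP: first copy of the RLE code.
class SenderRLE : Sender
{
    protected override string Compress(string data) => "RLE(" + data + ")";
}

// RLE over UDP: the same RLE code again, because this class
// sits on a different branch of the hierarchy.
class SenderUDP_RLE : SenderUDP
{
    protected override string Compress(string data) => "RLE(" + data + ")";
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(new SenderRLE().Send("payload"));
        Console.WriteLine(new SenderUDP_RLE().Send("payload"));
    }
}
```

A bug in the RLE code would now have to be fixed in both 'SenderRLE' and 'SenderUDP_RLE', and every further algorithm adds more such copies.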
Now let us try to apply the approaches to design as suggested by GoF. 
  • Consider what should be variable in your design and 'encapsulate the concept that varies'
  • Design to interfaces, not to implementations
  • Favor object composition over class inheritance

There is no specific order in which these approaches should be applied; they work in parallel. Let us start with the first piece of advice: consider what should be variable in your design and 'encapsulate the concept that varies'. Let us analyze the 'Sender' class, which is responsible for compressing, encrypting and transmitting the data. We can see that there are different ways (algorithms) to compress, encrypt and transmit data, and in future we may need to use new algorithms for these tasks. Hence, we would like our design to accommodate new ways of compressing, encrypting and transmitting data with the least possible TCO (total cost of ownership). To achieve this, we identify compression, encryption and transmission as the three concepts that should be variable in our design.


To encapsulate the variation in these three concepts, we need to define three conceptual entities. Remember, the GoF said: design to interfaces, not to implementations. So let us define three interfaces to represent these concepts:
  • ITransmit containing Transmit() method for transmitting data
  • ICompress containing Compress() method for compressing data
  • IEncrypt containing Encrypt() method for encrypting data

Since the 'Sender' class is the one that will be marshaling these operations, we need to define relationships between the 'Sender' class and the interfaces (concepts) defined above. Remember, the GoF said: favor object composition over class inheritance. So let us make the 'Sender' class hold a reference to an instance of each of these concepts (interfaces).
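A minimal C# sketch of the three interfaces and of a 'Sender' composed from them might look like this. The interface and method names follow the text; the string payload, the 'Send()' method, constructor injection and the three stand-in implementations are assumptions made so that the sketch runs.

```csharp
using System;

// One interface per varying concept.
interface ICompress { string Compress(string data); }
interface IEncrypt { string Encrypt(string data); }
interface ITransmit { string Transmit(string data); }

// Stand-in implementations; they only tag the data.
class LzwCompress : ICompress
{
    public string Compress(string data) => "LZW(" + data + ")";
}

class AesEncrypt : IEncrypt
{
    public string Encrypt(string data) => "AES(" + data + ")";
}

class TcpTransmit : ITransmit
{
    public string Transmit(string data) => "TCP[" + data + "]";
}

// Sender holds a reference to each concept and delegates to it.
class Sender
{
    private readonly ICompress compressor;
    private readonly IEncrypt encryptor;
    private readonly ITransmit transmitter;

    public Sender(ICompress compressor, IEncrypt encryptor, ITransmit transmitter)
    {
        this.compressor = compressor;
        this.encryptor = encryptor;
        this.transmitter = transmitter;
    }

    public string Send(string data) =>
        transmitter.Transmit(encryptor.Encrypt(compressor.Compress(data)));
}

static class Program
{
    static void Main()
    {
        var sender = new Sender(new LzwCompress(), new AesEncrypt(), new TcpTransmit());
        Console.WriteLine(sender.Send("payload"));
    }
}
```

Note that 'Sender' mentions only the interfaces, never a concrete algorithm class.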


Finally, we implement these interfaces in concrete classes, with each concrete class responsible for one particular algorithm. The final design will look as shown in the picture below:



Let us analyze some of the main benefits that this design brings us:
  • The 'Sender' class is now decoupled from the choice of algorithm (concrete class) used for compression, encryption or transmission. It only knows that the object it holds a reference to will do the work for it. Hence, we have loose coupling.
  • For each of the concepts (interfaces) we defined, we can now have as many variations as needed, each implemented in a separate concrete class, without the 'Sender' class being affected at all. Hence, the design can easily accommodate any new way of compressing, encrypting or transmitting data.
  • Also, for 'p' types of compression, 'q' types of encryption and 'r' ways of transmitting data, we will end up with only (p + q + r) concrete algorithm classes (plus the three interfaces and 'Sender'), as opposed to p*q*r classes previously. Hence, the problem of class explosion is solved.
  • No redundancy. Each piece of code doing a particular thing occurs only once in the design.
  • Testability is greatly improved. The 'Sender' class and each of the algorithms for compression, encryption and transmission can now be tested in isolation.
  • The 'Sender' can switch between different algorithms for compression, encryption and transmission at run-time, without re-instantiating the 'Sender' class.
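As one illustration of the testability point, the following C# sketch exercises 'Sender' with hand-written test doubles, so no real compression code or network is involved. 'FakeCompress', 'RecordingTransmit' and the 'void' transmit signature are hypothetical, and encryption is omitted for brevity.

```csharp
using System;
using System.Collections.Generic;

interface ICompress { string Compress(string data); }
interface ITransmit { void Transmit(string data); }

class Sender
{
    private readonly ICompress compressor;
    private readonly ITransmit transmitter;

    public Sender(ICompress compressor, ITransmit transmitter)
    {
        this.compressor = compressor;
        this.transmitter = transmitter;
    }

    public void Send(string data) => transmitter.Transmit(compressor.Compress(data));
}

// Test double: marks the data so we can see that Compress() was called.
class FakeCompress : ICompress
{
    public string Compress(string data) => "fake(" + data + ")";
}

// Test double: records what would have gone over the wire.
class RecordingTransmit : ITransmit
{
    public List<string> Sent { get; } = new List<string>();
    public void Transmit(string data) => Sent.Add(data);
}

static class Program
{
    static void Main()
    {
        var wire = new RecordingTransmit();
        var sender = new Sender(new FakeCompress(), wire);
        sender.Send("payload");

        bool ok = wire.Sent.Count == 1 && wire.Sent[0] == "fake(payload)";
        Console.WriteLine(ok ? "Sender test passed" : "Sender test failed");
    }
}
```

Each real algorithm class can be tested the same way on its own, without constructing a 'Sender' at all.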

If you have studied design patterns before, you will see that we have just come up with a design pattern called the "Strategy Pattern". Now you can see how the advice given by the GoF helps a design evolve step by step. Here are a few basic points to keep in mind about design patterns:
  • Design patterns are examples. Each pattern is an example of a design that follows the general advice of the GoF well in a given context. Each pattern is a manifestation of this advice and of the code qualities playing out in a particular context.
  • Design patterns are discovered, not invented. They are often what you would do, if you thought of it, to solve the same problem.
  • Studying design patterns is a good way to study good design, and how it plays out under various circumstances.
  • Even if you do not see any cataloged design pattern in your design, you should always follow the advice of the GoF, as it captures the key forces that eventually lead to a good design.

Friday, November 12, 2010

Designing For Change - Part 2 of 3 (Approaches to Design)

In part1, we discussed the problem of user requirements and the code qualities which are the first step towards a good design. In this post, we will discuss approaches to design. If you have not read the previous post, please read it first, as this post builds on the discussion in part1.

Approaches to Design
You might have heard of the famous book on software design patterns, "Design Patterns: Elements of Reusable Object-Oriented Software" by Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides, who are often referred to as the GoF (Gang of Four). Published in 1994, it was the first book on design patterns in software engineering and is essentially a catalog of best-practice design patterns. The first two chapters describe object-oriented programming (OOP) design as the authors understood it, which is quite different from the OOP design many of us first learnt in college when we got our hands on OOP languages like C++ and Java. However, since this was primarily an academic book (it grew in part out of Erich Gamma's PhD thesis), a lot of people have a hard time extracting this advice from those first two chapters. For people who grew up with the earlier view of OOP, it is sometimes difficult to understand what exactly the advice means and how it is supposed to help. Basically, the GoF said three things:
  • Design to interfaces, not to implementations
  • Favor object composition over class inheritance
  • Consider what should be variable in your design and 'encapsulate the concept that varies'

Design to interfaces
As we read 'Design to interfaces', we might think it simply means designing to the interfaces and public methods of the entities in our design. But this is not what the GoF actually meant. They were talking about what is revealed, what is hidden, and how decisions are made in a design. What they meant is that an entity should be used with complete disregard for its implementation. We should design classes and method signatures purely from the perspective of the consuming entities.

To understand this, let us consider an example. Say you are sitting with your friends. You go to your friend 'A' and ask him, "What is your driving license number?" 'A' takes his wallet out of his pocket, removes the driving license from it and reads out the number. Then you ask friend 'B' the same question, and he calls home to get the number. Then you ask friend 'C', and he tells you the number from memory, and so on. The point to note is that you asked each person the same question in the same way and got the same kind of result, while each person obtained it in a different way. However, suppose you had instead said, "Take out your wallet, remove your driving license and tell me your driving license number." This would have worked fine with friend 'A' but not with 'B' or 'C'. The reason is that you have now been 'implementation specific' in using the person, and are hence tightly coupled to one 'implementation' of getting the driving license number. So the lesson is: design to outward behavior (interfaces) and not to the specifics of the inner implementation.
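The driving license story translates almost directly into code. In the C# sketch below, 'ILicenseHolder' and the three friend classes are hypothetical names, and the license numbers are made-up sample values; the point is only that the caller asks the same question of everyone.

```csharp
using System;

// The question we ask, with no hint of how it gets answered.
interface ILicenseHolder
{
    string GetLicenseNumber();
}

// Friend 'A' reads the number off the card in his wallet.
class FriendA : ILicenseHolder
{
    private readonly string cardInWallet;
    public FriendA(string cardInWallet) { this.cardInWallet = cardInWallet; }
    public string GetLicenseNumber() => cardInWallet;
}

// Friend 'B' calls home; the delegate stands in for the phone call.
class FriendB : ILicenseHolder
{
    private readonly Func<string> callHome;
    public FriendB(Func<string> callHome) { this.callHome = callHome; }
    public string GetLicenseNumber() => callHome();
}

// Friend 'C' simply remembers the number.
class FriendC : ILicenseHolder
{
    public string GetLicenseNumber() => "DL-333";
}

static class Program
{
    static void Main()
    {
        ILicenseHolder[] friends =
        {
            new FriendA("DL-111"),
            new FriendB(() => "DL-222"),
            new FriendC()
        };

        // Same question, asked the same way, whatever happens inside.
        foreach (ILicenseHolder friend in friends)
            Console.WriteLine(friend.GetLicenseNumber());
    }
}
```

An interface with a method like 'TakeOutWalletAndReadLicense()' would be the implementation-specific version of the question, and only 'FriendA' could honor it.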

The GoF also said to hide the implementation of your entities, and for a good reason: so that you have the freedom to change the implementation later on. This loosens coupling, as we cannot couple to what is hidden (the implementation), and relationships will exist only between public interfaces.

As we saw in part1, Programming by Intention is a systematized way of designing to interfaces at the method level. Now, let us analyze the relationships between classes in a design. Let us consider two abstractions, AbstractionA and AbstractionB, as shown in the figure below. Impl1 to Impl3 are implementations (derived classes) of AbstractionA, and Impl4 and Impl5 are implementations (derived classes) of AbstractionB.


Now if these two abstractions (or whatever they represent in the design) need to have a relationship between them, we might be tempted to create relationships between the implementations (the concrete classes). But the GoF are saying: if you can get away with it, try not to do that. Try to keep the relationship up high, between the abstractions. There are multiple benefits of doing this. The obvious one is that it results in fewer relationships: a single relationship (one-to-one or one-to-many) between the abstractions, as opposed to up to n*m (3*2 in this case) relationships between the implementations.

To summarize, two points should be kept in mind.
  • First, when considering the signature of a method (its name, parameter list, and return type), base the decision on the needs of the entity or entities that will consume it, not on its implementation details.
  • Second, when creating relationships between entities (typically classes), try to create them between abstract types, not concrete types, whenever possible.
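The second point can be sketched in C# using the AbstractionA/AbstractionB example. The type names follow the figure described earlier; the 'Run()' and 'Serve()' methods and the 'Helper' field are assumptions added so that the relationship has something to do.

```csharp
using System;

abstract class AbstractionB
{
    public abstract string Serve();
}

class Impl4 : AbstractionB { public override string Serve() => "Impl4"; }
class Impl5 : AbstractionB { public override string Serve() => "Impl5"; }

abstract class AbstractionA
{
    // The only relationship in the design: abstraction to abstraction.
    protected readonly AbstractionB Helper;

    protected AbstractionA(AbstractionB helper) { Helper = helper; }

    public abstract string Run();
}

// None of these classes mentions Impl4 or Impl5.
class Impl1 : AbstractionA
{
    public Impl1(AbstractionB helper) : base(helper) { }
    public override string Run() => "Impl1 uses " + Helper.Serve();
}

class Impl2 : AbstractionA
{
    public Impl2(AbstractionB helper) : base(helper) { }
    public override string Run() => "Impl2 uses " + Helper.Serve();
}

class Impl3 : AbstractionA
{
    public Impl3(AbstractionB helper) : base(helper) { }
    public override string Run() => "Impl3 uses " + Helper.Serve();
}

static class Program
{
    static void Main()
    {
        // Any of the 3*2 pairings works through the one relationship.
        AbstractionA a = new Impl1(new Impl5());
        Console.WriteLine(a.Run());
    }
}
```

All six pairings are available, yet the code contains a single dependency from 'AbstractionA' to 'AbstractionB'.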
 
Favor object composition over class inheritance
The next thing the GoF said is to favor composition over inheritance. Before we delve deeper into it, let us first see what these things are. Let us take the example of a network socket which sends a stream of data over the network but can use two different ways of compressing the data before sending. Using inheritance, what we have is a base class 'socket' and two derived classes, 'cmp1' and 'cmp2', each with a different way of compressing data.


The class 'socket' will have all the base behavior of a socket, with probably a default behavior for compressing data, or it may be an abstract class. The compression behavior is overridden one way in 'cmp1' and another way in 'cmp2'. This is class inheritance, which is fixed at design time. Another way of doing the same thing is to use contained polymorphism: pull the compression idea out into a base class of its own, create two versions of it, and have the socket use it through delegation at run time.
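The two alternatives can be put side by side in a short C# sketch. 'Socket', 'Cmp1' and 'Cmp2' mirror the names in the text; 'Compression', 'Compression1', 'Compression2', 'ComposedSocket' and the tagging method bodies are hypothetical and only illustrate the shape of each design.

```csharp
using System;

// Design 1: class inheritance. The socket itself is specialized.
abstract class Socket
{
    public string Send(string data) => "sent:" + Compress(data);
    protected abstract string Compress(string data);
}

class Cmp1 : Socket
{
    protected override string Compress(string data) => "cmp1(" + data + ")";
}

class Cmp2 : Socket
{
    protected override string Compress(string data) => "cmp2(" + data + ")";
}

// Design 2: contained polymorphism. Compression is its own concept.
abstract class Compression
{
    public abstract string Compress(string data);
}

class Compression1 : Compression
{
    public override string Compress(string data) => "cmp1(" + data + ")";
}

class Compression2 : Compression
{
    public override string Compress(string data) => "cmp2(" + data + ")";
}

class ComposedSocket
{
    private readonly Compression compression;

    public ComposedSocket(Compression compression) { this.compression = compression; }

    // The socket delegates the compression step to the contained object.
    public string Send(string data) => "sent:" + compression.Compress(data);
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(new Cmp1().Send("payload"));
        Console.WriteLine(new ComposedSocket(new Compression1()).Send("payload"));
    }
}
```

Both designs produce the same result; the difference lies in where the variation lives and when it is bound.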

What the GoF were really saying is that this kind of inheritance, where the socket is inherited into different versions, is inheritance for specialization. This distinction is critical to understanding what the GoF are actually talking about, because if you read their book you will find that inheritance is used all over it. It seems like they are not following their own advice :).

Remember that they said “favor”; they did not say never use inheritance. In the first design we are using inheritance to specialize something real (the socket). What the GoF are saying is: do not use inheritance for that, at least most of the time. It is better to use composition. You may object that we are using inheritance in the composition design as well. But the point to note is that there we are not specializing something real. The 'compression' idea is just a concept, not a real thing. Specific compression algorithms are real, but the generic idea of compression is not something you can create an instance of. This is inheritance for categorization, in the way that dogs and cats are animals: 'animal' as such is not a real thing, but dogs and cats are. So the GoF are saying: use inheritance for categorization, and use composition to handle variation at run time rather than inheritance at design time.

Looking at the two cases
Specialization:
  • As we can see, with specialization we have one class fewer than with composition, because composition needs an extra class to represent the conceptual entity (compression) for delegation to work. We often think that having fewer classes in a design is good, but that is not necessarily the case. Otherwise the best designs in the world would consist of a single class :). Adding classes is not a bad thing as long as it gets you something.
  • Since the compression code is part of the socket, it has easy access to the state of the socket. But it also means that it is completely coupled to the socket code. The problem is that one class now has two different responsibilities (weak cohesion): compressing data and transmitting it. Hence a bug in the compression code can harm the transmitting code and vice versa, or create dangerous side effects.
  • Inheritance for specialization works well only if nothing else in the socket varies. As more things start varying, it creates redundancy and tight coupling, which we do not want. Choosing specialization here also means predicting that nothing else will vary in future, and we know we are not good at the business of predicting. Hence, by choosing inheritance for specialization, we are not setting ourselves up for a design that accommodates future changes.
  • Another point to note is that this works well only if the socket does not have to switch between compression methods at run-time. If we need to change the compression method at run time, say based on some user input, we will have to instantiate a new socket object, dispose of the old one, and restore the state the previous socket object was in.
Composition:
  • Composition does add an extra class, but it improves cohesion. Each class now has only one responsibility (strong cohesion). The socket class is about transmitting only, and each compression class is about one particular compression technique only.
  • We can vary the socket (create different versions of it) without changing the compression object. This considerably increases the testability of the design. We can test the socket and the compression in complete isolation. A unit test that passes for compression today will pass tomorrow as well, irrespective of changes in the transmitting code, and vice versa.
  • With delegation we can handle different compression types in one run-time session without affecting the state of the socket instance (by using setters on the socket class). It allows us to defer the decision until run-time.
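The last point, switching compression at run time while the socket keeps its state, might look like this in C#. The settable 'Compressor' property plays the role of the setter mentioned above; 'MessagesSent' is a hypothetical piece of socket state, and the compression classes are placeholders.

```csharp
using System;

abstract class Compression
{
    public abstract string Compress(string data);
}

class Compression1 : Compression
{
    public override string Compress(string data) => "cmp1(" + data + ")";
}

class Compression2 : Compression
{
    public override string Compress(string data) => "cmp2(" + data + ")";
}

class Socket
{
    // The setter lets callers swap the algorithm on a live socket.
    public Compression Compressor { get; set; }

    // Hypothetical state that must survive a change of algorithm.
    public int MessagesSent { get; private set; }

    public Socket(Compression compressor) { Compressor = compressor; }

    public string Send(string data)
    {
        MessagesSent++;
        return "sent:" + Compressor.Compress(data);
    }
}

static class Program
{
    static void Main()
    {
        var socket = new Socket(new Compression1());
        Console.WriteLine(socket.Send("a"));

        // Switch algorithm; the socket object and its state stay as they are.
        socket.Compressor = new Compression2();
        Console.WriteLine(socket.Send("b"));

        Console.WriteLine(socket.MessagesSent);
    }
}
```

With the inheritance design, the same switch would require creating a new socket object and copying the message count across by hand.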
So what the GoF are saying is that we should lean towards delegation rather than inheritance. If we can go one way or the other, we should not choose inheritance, for the reasons we just covered. Specializing functions with inheritance is a short path to problems. Inheritance for specialization does work, but it does not scale up. As long as only one thing varies and there is no requirement for dynamism, it is fine, but as more things start to vary, the design starts to fall apart. With the composition approach, you can keep doing it again and again. There might be other behaviors of the socket that vary in future, and we can pull out those variations too. This actually improves things by making the socket simpler and simpler. The point to keep in mind is that our focus is not just on what the design will do for us today, but also on where it is leading us and how easily it can accommodate changes that we cannot foresee now.

Encapsulate the variation
Whenever we talk of encapsulation, we usually think of hiding data. But encapsulation is actually any kind of hiding at all, and whenever we do it, it helps us, as hiding anything gives us the freedom to change it later. So what the GoF said is:
  • Identify the varying behavior or consider what should be variable in your design.
  • Define an entity that encapsulates this variation conceptually.
This is often interpreted as a "Design Up-Front" point of view, because of the notion that certain things "should be variable". In fact, given that the book was published in the 1990s, this may be the case. However, in lean-agile software development, we can still follow this advice in a new context: something should vary when we have a requirement, based on business value, for it to do so. The critical aspect here is that such variations should be encapsulated. We must also acknowledge that this refers to any variation, not simply varying behavior. For example, it could be varying relationships, cardinality, sequence, construction, dependencies, structure etc. All these things can vary, and when they do, the variations should be encapsulated. In a sense, every design pattern encapsulates a different varying thing, or set of things, conceptually.

Taken all together, what is being said is that we want to treat things which are conceptually similar as if they were exactly the same thing. This allows us to deal with them purely at the abstract level.


In the figure above, the client is not even coupled to the fact that 'AbstractionA' is an abstraction; from its point of view it might be a simple, single concrete class. We want to have that kind of relationship even when we know we have variation here, because we want to be able to add another variation by adding a new implementation rather than by modifying an existing one. This is called the Open-Closed Principle: open for extension, closed for modification.
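A small C# sketch of the open-closed idea follows. 'AbstractionA' and the 'Impl' names come from the example; the 'Client' class and the 'Handle()' and 'Process()' methods are assumptions for illustration.

```csharp
using System;

abstract class AbstractionA
{
    public abstract string Handle(string request);
}

class Impl1 : AbstractionA
{
    public override string Handle(string request) => "Impl1:" + request;
}

class Impl2 : AbstractionA
{
    public override string Handle(string request) => "Impl2:" + request;
}

// The client sees only AbstractionA; to it, this could be one concrete class.
class Client
{
    private readonly AbstractionA service;

    public Client(AbstractionA service) { this.service = service; }

    public string Process(string request) => "[" + service.Handle(request) + "]";
}

// Added later: a new variation, with no edits to Client, Impl1 or Impl2.
class Impl3 : AbstractionA
{
    public override string Handle(string request) => "Impl3:" + request;
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(new Client(new Impl1()).Process("ping"));
        Console.WriteLine(new Client(new Impl3()).Process("ping"));
    }
}
```

Supporting 'Impl3' meant adding one class; the existing code stayed closed for modification.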

How GoF Advice Promotes Quality
Design to Interfaces:
  • Helps eliminate redundant relationships
  • Avoids subclass coupling
Favor composition over inheritance:
  • Promotes strong cohesion
  • Helps eliminate redundancies
  • Makes designs more dynamic
Encapsulate Variation:
  • Promotes encapsulation
  • Decouples client objects from the services they use
  • Leads to component-based architecture

Coming Up Next
In part3 I will take up a design problem and demonstrate how to apply these approaches to design while working out the final design. Till then, stay tuned :).

Monday, November 8, 2010

Designing For Change - Part 1 of 3 (Code Qualities)

Software development is a vast field which has evolved enormously in the last few decades, both in the complexity of software and in its fields of application. As the business needs of users change and become more complex with time, so does the software. Hence a great deal of emphasis is now placed on the engineering processes followed by product development teams throughout the product cycle. Closely related to this is the fact that the requirements given by customers (users) keep changing. Since a software product is shipped in incremental versions, each built on top of the previous one, one of the challenges that software developers face is: how do we design for changing requirements?

The Problem of Requirements

If we ask experienced software developers what they know to be true about the requirements they get from customers, the most frequent answers we will hear are:
  • Requirements are incomplete and usually wrong.
  • Requirements and users are often misleading.
  • Requirements do not tell the complete story.
As Alan Shalloway mentions in his book 'Design Patterns Explained', one thing we will never hear is "Not only were our requirements complete, clear and understandable, but they laid out all of the functionality we were going to need for the next five years!"

So the bottom line is that requirements always change, and for a very simple set of reasons:
  • As users interact and discuss with developers and see new possibilities for the software, their view of their requirements changes.
  • As the developers become more familiar with the software, their understanding of the user's problem domain changes.
  • The environment in which the software is being developed changes. Fifteen years back the trend was towards developing desktop applications, but with the advent of the internet (and its associated benefits) the shift is now towards web-based applications, to the extent of trying to build an online operating system.
That said, it does not mean that we can give up on collecting good and relevant requirements. Rather than complaining about changing requirements, we should change our development process to address them more effectively. We can design our code in such a way that the impact of a changing requirement is minimal. The code can evolve and new features can be bolted on while incurring the least possible TCO (total cost of ownership).

The first step towards a good design is coding practices that ensure high code qualities.

Code Qualities & Coding Practices

First Principles of Coding Rules
As developers, we should keep the following factors in mind while coding:
  • Objects must be testable.
  • There must be no redundancy.
  • Objects should be understandable.
  • Objects should not be excessively entangled with each other.
Each of these principles maps to a particular code quality.

Cohesion
Cohesion refers to how closely the methods in a class, or the operations within a method, are related to each other. A very good definition of cohesion is given by Steve McConnell in his book 'Code Complete: A Practical Handbook of Software Construction' (Redmond: Microsoft Press, 1993, p. 81), as follows:

"Cohesion refers to how 'closely the operations in a routine are related.' I have heard other people refer to cohesion as clarity because the more that operations are related in a routine (or a class), the easier it is to understand the code and what it is intended to do."

Weakly cohesive classes and methods are those that do many unrelated tasks. The code often appears to be a confused mass.

Strong cohesion is related to clarity and understanding. A strongly cohesive class is one that has a single responsibility, and everything in it is about fulfilling that responsibility. Commonality-Variability analysis can help in determining this. A strongly cohesive method is about fulfilling one functional aspect of that responsibility. Programming by intention can help us code with better method cohesion.

I will briefly describe what Commonality-Variability analysis and Programming by intention mean. These topics need a separate post to do them justice, and I will write about them separately. In a nutshell, Commonality-Variability analysis is a process in which we take a given set of entities, examine each of them, and try to find out what is common among them. There are different methods for figuring out the commonalities between a given set of entities. These commonalities become our abstract classes and interfaces. Once we are done finding commonalities, we look for variations within them. These variations then become the concrete implementations of the commonalities they belong to.

The term Programming by intention was coined by the XP (Extreme Programming) group. The idea behind it is similar to what is called top-down programming. The best way to understand Programming by intention is to work through an example.

public void PrintReport (string custID)
{
    //.....
}

Say I want to code up a routine to print a report of all the employees who handled a given customer. As I start, the first thing I need to do is get all those employees from the database. So I pretend that I already have a method called 'GetEmployees()'.

public void PrintReport (string custID)
{
    Employee[] emps = GetEmployees(custID);
    //....
}

Note that this method does not really exist yet, so in effect I am making a prediction about how I would like it to work. Next, I think about what parameters I would pass it. Well, it should take the customer ID, and it will return an array of employees. The point to note is that I cannot possibly be influenced by the way 'GetEmployees()' is implemented, because it is not implemented yet.

So the message is that we need influence in the right direction when figuring out the interface of a class or method. Normally, what we tend to do is code up the routine first and then think about what kind of interface it could have. This usually produces interfaces that are not very stable, because the implementation changes over time. However, if we think of the interface purely from the perspective of how it will be used, we get a more stable perspective, as use does not change as much as implementation does.

public void PrintReport (string custID)
{
    Employee[] emps = GetEmployees(custID);
    if (SortReq(emps)) SortEmployees(emps);
    PrintHeader(custID);
    PrintFormattedEmployees(emps);
    PrintFooter(custID);
    Paginate();
}

Proceeding this way, we pretend that the methods 'SortEmployees(...)', 'PrintHeader(...)' etc. already exist. All these methods that I pretended to be present will of course be private methods on this object. That way I have made the object easy to use by breaking it into small, bite-sized pieces. We call the bigger method here the 'sergeant' method: it basically calls the other private methods and marshals them, and it may have bits and pieces of code of its own here and there. Each of the private methods is about one particular thing. This promotes cohesion and makes the sergeant method read like comments. Just one pass through the method tells us both its logical flow and what it is trying to do. And note that there is no extra work involved here. We are writing the same amount of code that we would have written otherwise, but with a higher degree of cohesion.
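For completeness, here is one way the pretended methods might eventually be filled in, as a self-contained C# sketch. The 'ReportPrinter' and 'Employee' classes, the injected 'TextWriter', the in-memory sample data (standing in for the database) and all the private method bodies are assumptions; only the shape of the sergeant method comes from the text.

```csharp
using System;
using System.IO;

class Employee
{
    public string Name { get; }
    public Employee(string name) { Name = name; }
}

class ReportPrinter
{
    private readonly TextWriter output;

    public ReportPrinter(TextWriter output) { this.output = output; }

    // The 'sergeant' method: it reads like a list of intentions.
    public void PrintReport(string custID)
    {
        Employee[] emps = GetEmployees(custID);
        if (SortReq(emps)) SortEmployees(emps);
        PrintHeader(custID);
        PrintFormattedEmployees(emps);
        PrintFooter(custID);
        Paginate();
    }

    // Sample data; a real version would query the database.
    private Employee[] GetEmployees(string custID) =>
        new[] { new Employee("Zoe"), new Employee("Adam") };

    private bool SortReq(Employee[] emps) => emps.Length > 1;

    private void SortEmployees(Employee[] emps) =>
        Array.Sort(emps, (a, b) => string.CompareOrdinal(a.Name, b.Name));

    private void PrintHeader(string custID) =>
        output.WriteLine("Report for customer " + custID);

    private void PrintFormattedEmployees(Employee[] emps)
    {
        foreach (Employee e in emps)
            output.WriteLine(" - " + e.Name);
    }

    private void PrintFooter(string custID) =>
        output.WriteLine("End of report for " + custID);

    // Nothing to paginate in this sketch.
    private void Paginate() { }
}

static class Program
{
    static void Main()
    {
        new ReportPrinter(Console.Out).PrintReport("C42");
    }
}
```

Each private method can be changed later (for example, to read from a real database) without touching 'PrintReport()'.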

Coupling
Steve McConnell defines coupling, in his book 'Code Complete: A Practical Handbook of Software Construction' (Redmond: Microsoft Press, 1993, p. 81), as follows:

"Coupling refers to 'the strength of a connection between two routines [or classes]. Coupling is a complement to cohesion. Cohesion describes how strongly the internal contents of a routine [or classes] are related to each other. Coupling describes how strongly a routine [or class] is related to other routines [or classes]. The goal is to create routines [and classes] with internal integrity (strong cohesion) and small, direct, visible, and flexible relations to other routines (loose coupling).'"

We can broadly classify Coupling into four types:
  • Identity: One type coupled to the fact that another type exists.
  • Representational: One type coupled to the interface of another.
  • Inheritance: Subtypes are coupled to their super class, in that any change in the super class will propagate downward.
  • Subclass: One type coupled to the fact that another type has subclasses (is polymorphic), and to which specific subclasses exist.
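The four kinds of coupling listed above can be pointed out in a few lines of C#. The 'Shape' hierarchy and the 'Report' class are purely hypothetical examples; the comments mark where each kind of coupling appears.

```csharp
using System;

abstract class Shape
{
    public abstract double Area();
}

// Inheritance coupling: Circle and Square depend on Shape;
// a change to Shape propagates down to both.
class Circle : Shape
{
    private readonly double radius;
    public Circle(double radius) { this.radius = radius; }
    public override double Area() => Math.PI * radius * radius;
}

class Square : Shape
{
    private readonly double side;
    public Square(double side) { this.side = side; }
    public override double Area() => side * side;
}

static class Report
{
    // Identity coupling: this method knows the type Shape exists.
    // Representational coupling: it calls Area(), part of Shape's interface.
    public static string Describe(Shape shape) => "area " + shape.Area();

    // Subclass coupling: this method also knows that Shape has subclasses
    // and that one of them is Circle.
    public static string DescribeKind(Shape shape) =>
        shape is Circle ? "round" : "not round";
}

static class Program
{
    static void Main()
    {
        Shape shape = new Square(2);
        Console.WriteLine(Report.Describe(shape));
        Console.WriteLine(Report.DescribeKind(shape));
    }
}
```

'Describe()' keeps working when new shapes are added, whereas 'DescribeKind()' may need revisiting for each new subclass, which is why subclass coupling is the one to watch most closely.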

No Redundancy
No Redundancy basically means "One rule in one place" i.e. a piece of code doing one particular thing should occur once and only once throughout the product code. Redundancy is not just redundant functions or redundant state. It can also be redundant relationship between classes/interfaces, design etc.

Anything that is redundant will cause maintenance problems and waste developers' time and effort during bug fixing, as any fix will have to be repeated as many times as the code is duplicated.

Avoiding Redundancy
  • “Copy and paste” is a common source of redundancy.
  • Before doing copy and paste, always ask yourself: What is the code I want to reuse? Can I make a new method that can be used by both the current method and the new place where I need it? If two entities need this, how likely is it that there will be a third, a fourth, and so on? Maybe I need a service anyway.
  • However, “Cut and paste” is okay :)

Testability
Testability of an object refers to how easily and cleanly the object can be tested. Testability is critical, as it directly determines the ability of test code to test the product, and hence it affects the overall quality of the final product. Testability directly depends on cohesion, coupling and redundancy. Testing is hard and expensive when code is:
  • Tightly Coupled – “I cannot test this without instantiating half the system.”
  • Weakly Cohesive – “This class does so many things, the test will be enormous and complex!”
  • Redundant – “I will have to test this in multiple places to ensure it works everywhere.”
Unit testing is a good thing. As software developers, whether we agree or not, while writing product code we should always ask ourselves, “If I were to test this, how would I do it?” If we find the design would be very hard to test, we should ask ourselves, “Why isn’t this more testable?” Thinking this way is very powerful and produces results.

Standards for Good Code
Summarizing what we have discussed so far, the standards of good code, in order of importance, are:
  • Be testable.
  • Contains no redundancy (once and only once)
    • Deal with each bug, change or extension once
    • Need to look fewer places to figure out an issue
  • Loose coupling.
    • No unexpected side effects to search for
    • Program flow is more logical, predictable
  • Strong cohesion
    • Cohesive classes and methods do one thing
    • Finding bugs is easier
  • Clarity, Confidence, Maintainability.
    • Make changes with greater confidence.

Coming Up Next
In part2 I will discuss 'Approaches to design' in detail, and how the code qualities we discussed here tie in with them.

