Java SE application launcher KickStart published

Today, I took the time to publish an old project of mine after there were some requests to do so. The project is a small and lightweight launcher framework to deploy and start Java SE applications in a cross-platform manner.

JAR hell is well known, and I ran into some issues when I had to deploy small console applications (mainly file converters) on Linux, Windows, and SunOS. My goal in this former project was to deploy some 3rd party libraries like loggers together with some of my own applications on different platforms. The whole project was quite complex, and the main libraries were already available. To prevent duplicated development, the idea was to reuse the JARs and to deploy them on the different OSes. The question was how…

I definitely did not want to unpack all JARs to add Classpath entries to the MANIFEST.MF files. I also did not want to shade everything together, so that 3rd party and own libraries would stay easy to reuse. Additionally, I wanted to exchange my own libraries on the fly without unpacking the formerly deployed libraries. So the basic concept of KickStart was born.

KickStart is a small executable JAR and a properties file. These two are copied somewhere into the filesystem of the OS. Additionally, some directories are created and configured in the properties file. After that, all libraries are copied into the configured locations and everything is ready to run with a simple:

java -jar kickstart-1.0.0.jar MainClass parameter1 parameter2 ...

KickStart looks into the directory for the 3rd party libraries and into the directory for the own libraries and binds everything together internally into one classloader. Afterwards, the given MainClass is looked up and its static main method is called with the provided parameters.
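
The basic idea can be sketched in a few lines of Java (a minimal sketch, not KickStart's actual implementation; the directory names lib and 3rdparty are made up):

import java.io.File;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class Launcher {

    public static void main(String[] args) throws Exception {
        // collect all JARs from the configured library directories
        List<URL> jarUrls = new ArrayList<URL>();
        for (String directory : new String[] { "lib", "3rdparty" }) { // hypothetical names
            File[] jars = new File(directory).listFiles();
            if (jars == null) {
                continue; // directory does not exist
            }
            for (File jar : jars) {
                if (jar.getName().endsWith(".jar")) {
                    jarUrls.add(jar.toURI().toURL());
                }
            }
        }
        // bind everything together internally into one classloader
        URLClassLoader loader = new URLClassLoader(jarUrls.toArray(new URL[0]));
        // look up the given MainClass and call its static main method
        Class<?> mainClass = loader.loadClass(args[0]);
        Method main = mainClass.getMethod("main", String[].class);
        String[] applicationArgs = new String[args.length - 1];
        System.arraycopy(args, 1, applicationArgs, 0, applicationArgs.length);
        main.invoke(null, (Object) applicationArgs);
    }
}

Because all JARs end up in one classloader, the applications can share the 3rd party libraries without any repackaging.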

With this simple approach it is easy to have a collection of JARs which may contain multiple applications. A simple shell script hides the call to KickStart and can be put into a directory on the PATH environment variable.

With this approach, the redeployment of single libraries is possible as long as the interfaces are stable. So during development it is possible to exchange libraries with newer, more optimized ones.

KickStart was published on the PureSol Technologies Open Source website at: http://opensource.puresol-technologies.com/kickstart

Common Anti-Patterns: String Usage for Number Manipulation

This is a short post about a topic which frustrates me from time to time: number manipulation done via string operations.

I do not know why, but from time to time I see code, and talk to people, where the task at hand deals with numbers, and the author has no better idea than printing them into a string or character stream and applying string manipulation to get the needed results. To me this is strange, because it is against common sense to solve number issues with strings. Mathematics is the way to do that.

I present two extreme examples in this post, which also caused major issues in the developed software. There were other examples, but these two were selected as illustrations.

Example 1: Rounding of Numbers

I have seen this twice in my career: once implemented in C and once in Java. In one case the rounding was to a plain integer, in the other to a given number of digits for a needed precision. In both cases the implementation was done the same way:

  1. The original floating point number was printed into a string.
  2. The position of the decimal point was looked up.
  3. The string was cut at the decimal point, and the number in front of the point was saved into a new variable.
  4. For checking whether to round up or down, the digit after the point was compared (as a character) against ‘5’.
  5. The number in front of the point was converted back into an integer.
  6. If it was to be rounded up, 1 was added.

For the arbitrary precision case, the position of the decimal point was shifted by the number of digits related to the precision.

When I see something like that, I get frustrated. It is a waste of computing power, because string operations are much more expensive than mathematical calculations, and the procedure has several flaws:

  1. What happens if the number is in exponential notation? We need to cut at the ‘E’, too, and take care of the number behind it.
  2. If localization is involved, what happens then? I live in Germany and we use a comma as decimal sign and the point is used for thousands.
  3. It is quite hard to assure correct behavior if arbitrary precision is to be applied.

The simplest solution for this issue is always to look for an already available function like Math.round in Java. A general procedure in C can be:

double num = ...;
int n = (int) ((num < 0) ? (num - 0.5) : (num + 0.5));

It is assumed for this algorithm that the cast just truncates the digits right of the point. This rounding procedure is much faster than string manipulation.

Arbitrary precision rounding is most easily achieved by multiplying by the precision first. For example, assume the function:

double round(double d) {
    /* the cast to long truncates the digits right of the point */
    return (double) (long) ((d < 0) ? (d - 0.5) : (d + 0.5));
}

To round num to an arbitrary precision we can do the following for two digits:

double num = ... ;
double rounded = round(num * 100.0) / 100.0;

We have now rounded to two digits.

Attention: I see the discussion coming already, but I do not take floating point effects into account here. I know that floating point numbers are not 100% exact due to their internal representation, but this is not solved by string manipulation either, as soon as the numbers are put back into doubles, for example.

Example 2: Calculating Mantissa and Exponent

The task here is to extract the mantissa and the exponent from a floating point number into a 4-byte signed integer for the mantissa and a signed byte for the exponent, to send everything over the network wire to systems which might have a different binary floating point representation, no floating point representation at all due to usage of fixed decimal values, or no exponential representation in strings. This may happen in integration projects. The separation into integers is not a bad idea, and the target system can calculate its internal representation as needed.

The procedure I have seen in C code was:

  1. Print the number into a string (via sprintf with “%e”).
  2. Cut the string into two pieces at the ‘e’ to separate mantissa and exponent.
  3. Convert the exponent into an integer assuming it fits into the byte.
  4. The mantissa string’s dot is removed.
  5. The mantissa string is checked for length and shortened to 9 characters to later fit into the 4-byte integer.
  6. The mantissa is converted into the final 4 byte integer.
  7. The length of the mantissa string minus 1 is subtracted from the exponent to get the correct exponent for the large mantissa.

Besides the performance impact, especially in applications which deal heavily with numbers (which was the case in the observed situation), there are also some flaws:

  1. Again, localization: What happens in Germany with the dot?
  2. It is assumed that the exponential string representation has always one digit in front of the decimal point.
  3. The exponent may become greater than 127 or smaller than -128.

It is much better to do some math here. I have to admit, the math here is not trivial any more, because the exponential and logarithm functions need to be applied and the mathematical rules need to be present, but this can be assumed to be known by software developers and software engineers. They do not need to know the rules by heart, but they should know they exist and where to look them up. Here is the mathematical algorithm in words:

  1. Calculate the logarithm of base 10 of the number to be converted. Remember: log_10(x) = log(x) / log(10). Most programming languages only offer the natural logarithm with base e.
  2. Round the exponent from 1. up to an integer. Rounding up is needed so the mantissa does not later exceed the capacity of the 4-byte integer.
  3. Calculate the mantissa by dividing the number by 10 to the power of the exponent obtained in 2.
  4. The number from 3. is guaranteed not to be larger than 1 due to the rounding up of the exponent in 2.
  5. Multiply the mantissa by 1 billion (10^9) to get the maximum precision of our two-integer representation, and round it into our 4-byte integer.
  6. Subtract 9 from the exponent from 2. to get the correct exponent for the new mantissa.
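
As a sketch, the algorithm above could look like this in Java (names are chosen for illustration; the check that the exponent fits into the signed byte, listed among the flaws above, is omitted):

public final class MantissaExponent {

    /* Returns { mantissa, exponent } such that value ≈ mantissa * 10^exponent. */
    public static int[] split(double value) {
        if (value == 0.0) {
            return new int[] { 0, 0 };
        }
        // steps 1+2: exponent = ceil(log10(|value|)); rounding up keeps the
        // scaled mantissa at or below 1, so it fits the 4-byte integer later
        int exponent = (int) Math.ceil(Math.log10(Math.abs(value)));
        // step 3: scale the value down; the sign is preserved
        double mantissa = value / Math.pow(10.0, exponent);
        // step 5: nine decimal digits are the maximum a 4-byte signed int holds
        int intMantissa = (int) Math.round(mantissa * 1_000_000_000.0);
        // step 6: compensate for the nine-digit shift of the mantissa
        exponent -= 9;
        return new int[] { intMantissa, exponent };
    }
}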

That’s it. It is maybe not that easy to understand mathematically, but it is optimized for computers. A computer is better at computing numbers than at manipulating strings.

Conclusion

In our daily work we are confronted with a lot of tasks which are not so easy to solve at first glance. We should always look out for the solution that is best for the computer to perform, for the developer to implement correctly, and for other developers to understand.

Since computers are optimized for computations, mathematical solutions are to be preferred for performance reasons. It can also be assumed that software developers and software engineers have enough understanding of mathematics to find correct solutions. If not, there are forums where one can ask for help. It is not very probable that a problem at hand has not been solved by someone else yet. Most problems have already occurred somewhere else, and a proven solution is always better than something new.

JUnit test reporting

Some time ago, in 2011, I asked a question on Stack Overflow about test report enrichment for JUnit tests. Because I was asked several times about this topic afterwards, and today again, I write down the solution I developed and used at that time. I still do not have a better idea.

The issue to solve

For a customer, detailed test reports are needed for integration tests, which not only show that everything passes, but also what each test did and to which requirement it is linked. These reports should be created automatically to avoid repetitive tasks, to save money, and to not frustrate valuable developers and engineers.

Therefore, I thought about a way to document the more complex integration tests (which is also valuable for component tests). Documentation would be needed for test classes and test methods. For the colleagues responsible for QA it is a good help to see to which requirement, Jira ticket or whatever a test is linked and what the test actually tries to do, so they can double-check it. If the report is good enough, it may also be sent out to customers to prove quality.

The big question now is: How can the documentation for each method and each test class be put into the JUnit reports? JUnit 4.9 with Maven 3.0 was used at the beginning of this issue, and now I am on JUnit 4.11 and Maven 3.1.0.

The solution

I did what was suggested on Stack Overflow, because I did not find a way to enhance the Surefire reports directly. The solution for me was the development of a JUnit run listener which collects some additional information about the tests, which can later be used to aggregate the reports.

The JUnit listener was attached to the maven-failsafe-plugin like:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-failsafe-plugin</artifactId>
    <version>2.12</version>
    <configuration>
        <properties>
            <property>
                <name>listener</name>
                <value>fully.qualified.class.name.to.your.listener</value>
            </property>
        </properties>
    </configuration>
</plugin>

The listener needs to extend org.junit.runner.notification.RunListener; there you can override the methods testRunStarted, testRunFinished, and so forth. The listener needs to be put into a separate Maven project, because a listener cannot be built and used in the same project, so another, separate project is to be created.

For the tests themselves, I also created an annotation where I put in the requirement id and other needed information:

@Retention(RetentionPolicy.RUNTIME)
@Target({ ElementType.TYPE, ElementType.METHOD})
@Documented
public @interface TestDoc {
    public String testId();
    public String rationale();
    public String bugId() default "";
    /* Put in what you need... */
}

A test class can then look like this:

@TestDoc(testId="123", rationale="Assure handling of invalid data does not cause a crash.", bugId="#42")
public class InvalidStreamingTests {

    @Test
    @TestDoc(testId="124", rationale="Check for stream which aborts and leads to incomplete content.")
    public void testStreamAbort()
    {
        /* The actual test code */
    }
}

In the run listener one gets a reference to the actual test, which can be scanned for annotations via reflection. The Description class is quite handy; one can retrieve the test class name as easily as:

String testClassName = description.getClassName();

The information found there can be stored away, which I did in a simple CSV file. After the test run, this information can be aggregated into a fancy report. The information which can be collected ranges from start timestamps, over the actual test class calls and the results of the tests (assertions and assumptions), to the finish timestamps. The classes can be checked via reflection at all times, and all information can be written into a file or even to a database or server.
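
A minimal sketch of such a listener, assuming the TestDoc annotation from above is on the classpath and a hypothetical CSV file name:

import java.io.FileWriter;

import org.junit.runner.Description;
import org.junit.runner.notification.RunListener;

public class TestDocListener extends RunListener {

    @Override
    public void testStarted(Description description) throws Exception {
        TestDoc doc = description.getAnnotation(TestDoc.class);
        if (doc == null) {
            return; // undocumented test; class-level annotations could be
                    // read via description.getTestClass() in the same way
        }
        FileWriter writer = new FileWriter("testdoc.csv", true); // hypothetical file
        try {
            writer.write(description.getClassName() + ";"
                    + description.getMethodName() + ";"
                    + doc.testId() + ";"
                    + doc.rationale() + ";"
                    + System.currentTimeMillis() + "\n");
        } finally {
            writer.close();
        }
    }
}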

The reporting was done offline later on, when I first used this solution. But if some more work effort is possible, a Maven report plugin (or a normal plugin) can be written which collects the information, for example in the post-integration-test phase. Here, the only limitation is imagination. It depends on whether only a technical report is needed as a TXT file or a fancy marketing campaign needs to be built around it. In daily life it is something in between.

Continuous Improvement in Software Development

I already wrote about “The Killer Difference between Mechanical Engineering and Software Engineering” and “Similarities in Quality Assurance between Cambodian Silk Painting and Software Development”. Related to these, here are some comments on continuous improvement in software development and refactoring.

As mentioned earlier, there is no classical production process in software development. In classic production, we have repetitive work and quality measures like Cp and Cpk, which are monitored over time; if they are outside certain limits (warning limits and out-of-specification limits), actions are taken. Over time, this leads to a better production process through continuous monitoring and continuous improvement based on the metrics from the monitoring process. In software development we need to do something else. We need a new approach (see the post about the difference between mechanical engineering and software engineering).

During software development we need rough models to guide the development, like source conventions, design patterns, and standard designs for error handling and such. A clean, layered architecture also helps to guide development and leads to good code in green field projects (see the article about QA for silk painting). But as soon as the already present code is changed, the code degrades slowly, but steadily. Here a totally new approach needs to come in: we need to improve the existing code as steadily as it degrades. As entropy in physics tells us: entropy increases steadily as long as no energy is used to decrease it. That is why we need to put a lot of effort into the permanent improvement of existing code.

The buzzword for this process is “Refactoring”. A fixed amount of time needs to be planned for continuous improvement of existing code by refactoring. I suggest an amount of 5% to 10% of time as a minimum for each project. Depending on the actual state of the project, it may be necessary to plan more. Up to 50% is realistic if a new brown field project is started (a new development is started on existing code) to smooth the way for the new project. There are some points to consider:

  1. Is the refactoring done independently? Are some people planned to do only refactoring on the code, independently of the other developers? This can be done to change the architecture or some designs. Exchanging frameworks, cleaning up parts of the software, or separating some parts from one another are tasks which can be considered here, too.
  2. Is the refactoring done during development? The refactoring can be done during the development of new functionality. This approach needs a strong mindset among the developers. For each new piece of functionality to be implemented, the code is evaluated first and refactored to provide clean ground for the new functionality. It is better to clean the affected parts first, before adding new functionality.

Both approaches have their advantages and disadvantages. I suggest doing both: It is best to train developers to do some refactoring before new functionality is implemented to get a clean implementation of the new functionality. These refactorings are allowed to be made once it is clarified that enough tests back up the changes; if there are not enough, these tests need to be created. In the daily standup, each developer can tell her colleagues which parts are to be refactored, to avoid conflicts due to simultaneous code changes. Additionally, a senior developer or architect may need to be asked for approval if the changes are more risky or affect too much code.

Independent refactorings should be added as tasks to the backlog as soon as they touch the overall design or architecture. These refactorings should be planned separately, but it is to be assured that they are not forgotten. A good backlog is needed for that, and a good set of rules on how to determine the priority of each ticket.

Can Programs Be Made Faster?

Short answer: No. But, more efficient.

A happy new year to all of you! This is the first post of 2014, and it is a (not so) short post about a topic which follows me all the time in discussions about high performance computing. In discussions and in projects I get asked how programs can be made to run faster. The problem is that this mindset is misleading. It always takes me some minutes to explain the correct mindset: programs cannot run faster, but they can run more efficiently and thereby save time.

If we neglect that we can scale vertically by using faster CPUs, faster memory, and faster disks, the speed of a computer is constant (also neglecting CPUs which change their speed to save power). All programs always run at the same speed, and we cannot do anything to speed them up by just changing the programming. What we can do is use the hardware we have as efficiently as possible. The effect is: we get more done in less time. This reduces the program run time, and the software seems to run faster. That is what people mean, but looking at efficiency brings the mindset to find the correct levers for decreasing run time.

As soon as a program returns the correct results, it is effective, but there is also the efficiency to be looked at. Have a look at my post about effectiveness and efficiency for more details about the difference between the two. To gain efficiency, we can do the following:

Use all hardware available

All cores of a multi-core CPU can be utilized, and all CPUs of the system if we have more than one. GPUs or physics accelerator cards can be used for calculation if present.

Especially in brown field projects, where the original code comes from single-core systems (before 2005 or so) or from systems which did not have appropriate GPUs (before 2009), developers did not pay attention to multi-threaded, heterogeneous programming. These programs have a lot of potential for performance gains.

Look out for:

CPU utilization

Introduce multi-threaded programming into your software. Check the CPU utilization during an actual run and look for CPU idle times. If there are any, check whether your software could do something useful at the times the idle occurs.
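
A minimal sketch in Java of spreading independent work items over all cores (the work items themselves are placeholders):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public final class ParallelWork {

    /* Spreads independent work items over all available cores. */
    public static void runAll(List<Runnable> workItems) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        for (Runnable workItem : workItems) {
            pool.submit(workItem); // idle cores pick up waiting work items
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for all items to finish
    }
}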

GPU utilization

Introduce OpenCL or CUDA into your software to utilize the GPU board or physics accelerator cards if present. Check the utilization of the cards during calculation and look for optimizations.

Data partitioning for optimal hardware utilization

If a calculation does not need too much data, everything should be loaded into memory to have the data present for efficient access. Data can also be organized to allow access in different modes for the sake of efficiency. But if there are calculations with amounts of data which do not fit into memory, a good strategy is needed to avoid performing calculations on disk.

The data should be partitioned into smaller pieces. These pieces should fit into memory, and the calculations on these pieces should run completely in memory. The bandwidth from CPU to memory is about 100 to 1,000 times higher than from CPU to disk. Once you have done this, check with tools for cache misses and see whether you can optimize further.

Intelligent, parallel data loading

The bottlenecks for calculations are the CPU and/or GPU. They need to be utilized, because only they bring relevant results; all other hardware just facilitates them. So, do everything to keep the CPUs and/or GPUs busy. It is not a good idea to load all data into memory (and let the CPU/GPU idle), then start a calculation (everything is busy), and store the results afterwards (with the CPU/GPU idle again). Develop your software with dynamic data loading. While calculations run, new data can be fetched from disk to prepare the next calculations, and the next calculations can run while the former results are written to disk. This may keep one CPU core busy with IO, but the other cores do meaningful work and the overall utilization increases.
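
A minimal sketch of such pipelining with a bounded queue; loadChunkFromDisk and calculate are placeholders for the actual IO and computation:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public final class PipelinedCalculation {

    private static final double[] POISON = new double[0]; // end-of-data marker

    public static void main(String[] args) throws InterruptedException {
        final BlockingQueue<double[]> queue = new ArrayBlockingQueue<double[]>(2);

        Thread loader = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    for (int chunk = 0; chunk < 10; chunk++) {
                        // IO happens here while the worker below is computing
                        queue.put(loadChunkFromDisk(chunk));
                    }
                    queue.put(POISON);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        loader.start();

        // the worker keeps the CPU busy; the next chunk is already being loaded
        for (double[] chunk = queue.take(); chunk != POISON; chunk = queue.take()) {
            calculate(chunk);
        }
        loader.join();
    }

    private static double[] loadChunkFromDisk(int chunk) {
        return new double[1024]; // placeholder for the actual disk read
    }

    private static void calculate(double[] chunk) {
        /* placeholder for the actual computation */
    }
}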

Do not do unnecessary things

Have a look at my post about the seven Muda to get an impression of wastes. All these wastes can be found in software, and they lead to inefficiency. Everything which does not directly contribute to the expected results of the software needs to be questioned. Everything which uses CPU power, memory bandwidth, or disk bandwidth, but is not directly connected to the requested calculation, may be treated as potential waste.

As a starter, look for, check, and optimize the following:

Decide early

Decide early when to abort loops, which calculations to do, and how to proceed. Some decisions are made at a certain position in the code, but sometimes these checks can be done earlier in the code or before loops, because the information is already present. This is something to be checked. During refactorings, there might be other, more efficient positions for these checks. Look out for them.
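
A hypothetical example of such a hoisted decision: the flag cannot change inside the loop, so the check is moved in front of it instead of running in every iteration:

static void processAll(double[] data, boolean verbose) {
    // decide once, before the loop, instead of checking verbose per iteration
    if (verbose) {
        for (double value : data) {
            System.out.println("processing " + value);
            process(value);
        }
    } else {
        for (double value : data) {
            process(value);
        }
    }
}

static void process(double value) {
    /* placeholder for the actual calculation */
}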

Validate economically

Do not check the validity of your parameters in every function. Check the model parameters once at the beginning of the calculations, and do it thoroughly. If these checks are sufficient, there should be no illegal state afterwards related to the input data, so it does not need to be checked permanently.
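
A hypothetical sketch: the input model is validated once and thoroughly at the boundary, and the inner routines run without repeated checks:

static void runCalculation(double[] inputModel) {
    validateModel(inputModel); // once and thoroughly
    // ... inner calculation functions may now assume valid data ...
}

static void validateModel(double[] inputModel) {
    if (inputModel == null || inputModel.length == 0) {
        throw new IllegalArgumentException("input model is empty");
    }
    for (double value : inputModel) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            throw new IllegalArgumentException("input model contains NaN/Inf");
        }
    }
}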

Let it crash

Check input parameters of functions or methods only if a failure would be fatal (like returning wrong results). Let there be a NullPointerException, IllegalArgumentException, or whatever, if something happens. This is OK; exceptions are meant for situations like that. The calculation can be aborted that way, and the exception can be caught in a higher function to abort the software or the calculation gracefully, but the cost of checking everything permanently is high. On the other side: what will you do when a negative value comes into a square root function with double output, or the matrix dimensions do not fit in a matrix multiplication? There is no meaningful way to proceed but to abort the calculation. Check the input model once and everything is fine.

Crash early

Include sanity checks in your calculations. As soon as the calculation is not gaining more precision, runs into a wrong result, produces the first NaN or Inf values, or behaves strangely in any way, abort the calculation and let the computer compute something more meaningful. It is a total waste of resources to let a program run which does not do anything meaningful anymore. It is also very social to let other people calculate their stuff in the meantime.
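
A sketch of such a sanity check, assuming an iterative solver whose hypothetical step function returns its current residual:

static void iterate(int maxIterations) {
    double previousError = Double.MAX_VALUE;
    for (int i = 0; i < maxIterations; i++) {
        double error = iterationStep();
        // abort as soon as the iteration diverges or stops converging
        if (Double.isNaN(error) || Double.isInfinite(error) || error >= previousError) {
            throw new ArithmeticException(
                    "aborted at iteration " + i + ": no progress, error=" + error);
        }
        previousError = error;
    }
}

static double iterationStep() {
    return 0.0; // placeholder for one solver step returning its residual
}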

Organize data for efficient access

I have seen software which looks up data in arrays element-wise by scanning from the first element to the position where the data is found. This leads to linear time behavior O(n) for the search. With binary search on sorted data, for instance, this can be done with logarithmic time behavior O(log n). Sometimes it is also possible to hold data in memory in a denormalized way to have access to it in different ways: sometimes a mapping is needed from index to data, and sometimes the other way around. If memory is not an issue, think about keeping the data in memory twice for optimized access.
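
A minimal Java example; Arrays.binarySearch requires the array to be sorted once in advance:

import java.util.Arrays;

public final class LookupDemo {
    public static void main(String[] args) {
        double[] values = { 9.1, 2.7, 5.3, 1.4, 8.2 };
        Arrays.sort(values);                          // pay the O(n log n) sort once
        int index = Arrays.binarySearch(values, 5.3); // every lookup is O(log n)
        System.out.println("found at index " + index);
    }
}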

Conclusion

I hope I could show how the focus on efficiency brings the right insights on how to reduce software run times. The correct mindset helps to identify the weak points in software, and the points above should give some directions to look into software for inefficiencies. A starting point is presented, but the way to go is different for every project.

5S Methodology and Software Development

The 5S methodology is used to keep the working environment clean, ordered, and efficient. I came across this methodology when I was working in the production area of a semiconductor factory. This system was referenced from time to time when internal audits showed some weaknesses in regard to efficiency, order, or cleanliness. For a short reference on the 5S methodology, have a look at: http://en.wikipedia.org/wiki/5S_%28methodology%29.

The key topics Sorting, Straightening, Shine, Standardize, and Sustain are quite weak translations of the Japanese words Seiri, Seiton, Seiso, Seiketsu, and Shitsuke, due to the wish to translate them into English words which also start with the letter S. Nevertheless, they roughly express the idea behind them, and the details are explained below. As in the post about the Seven Muda, I translate these topics into the field of software engineering as I understand it.

Sorting (Seiri)

This principle has a strong relation to the Muda of Inventory, Over-Processing, and Over-Production.

Classic meaning:

The meaning here is: remove everything unnecessary. Check all tools, materials, and machinery and remove everything which is not needed. This cleans out the workspace, makes space, and removes distraction. The rate of defects decreases due to a lowered risk of using wrong tools or materials, and more space means fewer incidents.

Software Engineering:

For software engineering it is the same, but it is twofold:

  1. For the development process: Remove all tools and stuff from your workspace, IDE, or PC which are not needed. Unused tools distract the developer; removing them makes the workspace cleaner, as mentioned above, and sometimes also more stable (everyone who uses Eclipse with a lot of plugins knows what I mean).
  2. For architecture and design: Remove all components, interfaces, libraries, and systems which are not needed. These can be added as soon as they are needed. For example: to implement everything in patterns right from the beginning does not make much sense when it is not needed yet. The requirements may change, and what was thought at the beginning to be needed will not be needed further on. Only use and implement what is needed, and postpone everything else until it is needed.

Not hitting the exact meaning, but also part of it, is duplicate code. Duplicate code is redundant, and redundancy is also something which needs to be cleaned out. Duplicate code is a nightmare for maintainability and should be avoided by all means.

Straightening (Seiton)

This principle has a relationship to the Muda of Transport, Motion, and Waiting.

Classic meaning:

This principle is about straightening the processes. Everything should be processed in an efficient way. Transport distances need to be shortened, motions avoided, unnecessary tasks eliminated, and wait times reduced. This principle can only be applied in iterations with close observation.

Software Engineering:

In software engineering, this principle can be used as a driving factor for lean architecture and design. The Muda of Transport, Motion, and Waiting in the post about the Seven Muda give hints where to look in software engineering.

Shine (Seiso)

Classic meaning:

This is about cleaning and ordering the workspace. As soon as everything is clean and ordered, the station is ready for usage. Cleaning and ordering should be scheduled for each shift or on a daily basis. At such a workspace, process flaws and defects are easier to find, and the work is easier, cleaner, and safer.

Software Engineering:

For software engineering, I would refer to Clean Code and Refactoring. Write clean code and clean the code as soon as bad smells are detected. This keeps the code clean, and erosion is prevented. Bugs are easier to spot in clean code and also easier to fix.

With refactoring, the architecture and design stay clean, too. This assures an understandable architecture which supports bug fixing and improvements.

These actions should take place during normal work, but can also be scheduled at the end of sprints, for instance. Code reviews can also be set in place for critical parts of the software to assure the right measure of quality. A time budget from 5% to 50%, depending on the state of the code, should be scheduled. In brown field projects with a lot of legacy code, massive cleanup can dramatically improve the later development of new functionality. But erosion also takes place in green field projects and should be fought with a time budget of at least 5%. Have a look at the books Refactoring by Martin Fowler, Clean Code by Robert C. Martin, and Working Effectively with Legacy Code by Michael C. Feathers for details.

Standardize (Seiketsu)

Classic meaning:

In fabrication, all work stations should be standardized, which means that they should look, function, and feel all the same. That way it is easier to train people on a new station, the quality is higher due to a lower defect rate, and it is also cheaper if it is possible to reuse procedures, tools, and material.

Software Engineering:

For software engineering, there are two possible meanings:

  1. All developers should use the same tools for development. It is easier to maintain a development environment where only one IDE is present, one build system, one OS, and so forth. Only for testing may there be some variation, but for pure development it is easier to deal with one kind of tool for each purpose.
  2. Within the software, everything should be handled in a standardized way. So the architecture should define standards, and so should the design. For instance, it could be a standard that all components of a larger system communicate with each other via a REST interface, and that only one REST library is used. It would be worse if the components each talked a different protocol like SOAP, RMI, EJB, and so forth. For the design it is the same: it should be defined how exception handling is to be done, how files are handled, the coding guidelines, and so forth.

Standards help people to identify parts of larger systems more easily. Understandability and maintainability improve dramatically.

Sustain (Shitsuke)

The first four points are hard enough to accomplish, but this sustaining point is even harder. What you accomplished in the first four sections is a large step towards an efficient production environment. This is something which is done in the form of a project, but a project is time limited. The real art of 5S is to keep the state which was accomplished and to even improve it. That is a huge leap! The goal is to establish a control system which, for instance, checks the current situation from time to time in a kind of internal audit and raises the issues found. The issues should be fixed as soon as possible. With a regular check of the other 4S and an improvement of the findings, the current state can be sustained and even improved. But this needs a lot of attention and energy.

In software engineering, the buzzwords Code Smell (or just Smell), Refactoring, and Architecture Refactoring come to mind. As soon as there are bad smells, action needs to be taken to fix the issue. The longer an issue is present, the more it manifests itself and the harder it is to fix. As the Asian proverb says: it is easy to change the direction of a river at its source…

Further Enhancements

Sometimes some enhancements are added. These points express some additions which should be taken care of, too. I only explain them briefly, because they are quite self-explanatory.

Enhancement to 6S: ‘Get Used to it’ (Shukan)

This point is often added to the original 5S. With the checks and fixes of the Sustain part, people get trained to keep an open eye on 5S. Over time, everybody should develop a habit of fixing everything which is not in order, to have an easier and more efficient life. It is good to create a company culture around 5S.

Safety

Classic meaning:

This is very easy: keep everything in order and, additionally, watch out for sources of accidents and prepare everything so that these accidents cannot happen.

Software Engineering:

Here it is about the quality of source code and architecture with regard to ‘accidents’ like crashes, wrong results, instability, and so forth.

Security

Classic meaning:

Keep people out who are not supposed to be in a certain area. Keep secrets secret and confidential data confidential.

Software Engineering:

Build your software in a way that only authenticated people are allowed to change the settings they are authorized for, and keep data protected from people who are not allowed to see it.

The Seven Muda (Wastes) and Software Engineering

I was introduced to the term Muda (waste) when I was working for a semiconductor fab. I learned that there are seven of them and that a close lookout for these wastes can reduce costs and increase quality dramatically. A short introduction can be found here: http://en.wikipedia.org/wiki/Muda_%28Japanese_term%29.

I want to write about these wastes with a shifted focus. The seven Muda were formulated for classical production and the ‘real world’, but as we will see, these principles can be applied to software engineering, too. Looking out for these wastes can help to make architecture, design, and code lean and clean.

Transportation

Classic meaning:

The classic meaning is the reduction of transportation. One should look at everything that is transported and how these transports can be minimized. In fabrication it means, for example, that the transport of goods can be optimized by ordering larger quantities, using vendors which are closer by, and so forth. The transportation costs can be reduced and the margin increased. Transportation does not add value to the product, but it adds risk: during any transport operation there is a risk of breaking, losing, or delaying the product.

Software engineering:

Transportation can be translated here, for example, to IO. Avoid transportation over the network, to disk, and so on. If IO can be reduced, the timing is better, bandwidth is saved, and other applications and systems are not affected negatively by bandwidth exhaustion due to excessive IO of one system. To save IO, good data models should be used for a high reuse of data: double fetches should be avoided, and only the data which is needed is fetched from a DB, for example, instead of reading the whole database just in case… Savings here lead directly to more responsive systems, reduced costs for IO facilities, and high throughput.

Inventory

Classic meaning:

The meaning of Waste of Inventory is quite simple. It is about the waste of having raw material, finished products, and work in progress lying around without the prospect of monetizing it. It may or may not be sold. In this state it is a potential waste and should be avoided. A waste of inventory may lead to selling the products under value or even to dumping them.

Software engineering:

In software engineering inventory is twofold:

  1. Source code: Writing source code which is not requested by the customer (directly or indirectly) may not lead to revenue. So it is a waste of time and resources to produce it. Only functionality which brings in revenue is to be developed. Everything else is potential waste, like finished products lying in a warehouse without demand from customers. A lot of money can be saved by letting developers produce software which adds value and brings in money. Also, trial-and-error development (or Programming by Coincidence, as described in The Pragmatic Programmer by Andrew Hunt and David Thomas) is waste. All development versions which are dumped on the way to the final version are waste. Some thoughts in advance can save a lot of time, money, and trouble.
  2. Data: With the ‘Big Data’ discussion, the Waste of Inventory has entered the public debate again. Every piece of data stored costs some money. Even if storage is paid by cents per gigabyte, the sheer amount of data makes it expensive. Data should only be stored if needed, and selected carefully. That way, costs for storage can be reduced, and since data has to be transported to the storage facilities, transport costs are reduced as well.

Motion

Classic meaning:

While Transport is about bringing the goods from one facility or machine to another, Motion is about the handling of products during the production process. The more handling is involved during production, the more time is needed for it, and the more risk of damage and low quality is added. Motion is also mechanical motion and leads to wear of the machines used. For people it is the same: too much handling of products leads to illnesses and other issues which also cost money in the form of sick days. Avoiding motion reduces costs for maintenance, sick days, and broken products.

Software engineering:

Motion in software engineering is not so easy to define. The closest equivalents, in my opinion, are:

  1. Motion = unnecessary things done in software. This may be one animation too many, which is not needed, might break the application by its mere presence, and wastes CPU time. It may be one storage operation too many, just to save some data temporarily for the case of a power failure, although the data could be recalculated if needed. There might be one watchdog too many. Defensive programming is fine, but too much of it is not needed. Things done unnecessarily lead to a waste of time and resources: the software runs longer and wastes CPU time. Sorry, I do not have a better explanation, but maybe you get the point.
  2. Motion = unnecessary things done during the software development process. This can mean unnecessary work caused by Programming by Coincidence due to missing design sessions. It can also mean writing unnecessary documentation, design papers, and such stuff. It can mean unnecessary meetings, conference calls, and status presentations.

Waiting

Classic meaning:

Every work product which waits for something does not add value, consumes space, and its delays may lead to a bad reputation. All wait times need to be reduced. This can be done by queue management. Have a look at the book The Principles of Product Development Flow: Second Generation Lean Product Development by Donald G. Reinertsen for more information.

Software engineering:

In software engineering, the most obvious waste due to waiting is a programmed delay or sleep in a program waiting for something. This is obviously not a good design. Better do a design with asynchronous execution and notification. A program should always do something meaningful if possible. Do not wait for something to happen: do something in the meantime and wait for a notification, for example, or have some processes run in parallel which fill up the CPU time of a sleeping thread. All waits are a waste of the customer's time. This should be avoided; otherwise it leads to frustration without adding value to anything.

Over-processing

Classic meaning:

In classic fabrication this means: you do something better, more accurately, or more beautifully than required. The customer is paying for a product with a negotiated specification. This specification needs to be met, but not exceeded. More work on the product leads to higher production costs, more time needed, and more risk of damage, without monetary compensation.

Pay attention: over-processing can be a part of a marketing strategy and a customer satisfaction program. By over-delivering, a customer may be surprised positively, which may lead to a returning customer, a higher order next time, and so forth. This is not over-processing as it is meant above; this is part of a strategy which brings higher revenue in the future.

Software engineering:

Over-processing is quite the same as in classic engineering: a software product calculates more accurately than needed, the performance tuning was done extensively to get the last microseconds out of the calculations, and there are many more things like that. As long as the product is good enough, we should stop working on features which are already done. It does not bring more value to the customer.

Here, too: please pay attention to over-delivering. This is a magic tool if done right. See the classic engineering section above.

Over-production

Classic meaning:

Over-production is simply producing more pieces of a product than are needed at the time of production. There is a risk that not all products which were produced can be sold. Avoiding over-production reduces costs, reduces the amount of resources needed for production, and is also good for the environment.

Software engineering:

Over-production has two meanings in software engineering, as far as I can see:

  1. Over-production of results: A software product which produces more results than needed wastes resources and time. This is not what customers want, and it is also nothing they want to pay for. At least, provide configuration possibilities.
  2. Over-production of features: In software engineering (as in all engineering disciplines), engineers tend towards over-engineering. Full-blooded engineers want to make the product perfect, feature-rich, shiny, and so forth. This might lead to feature bloat. Every feature which is not requested by the customer does not add value. A customer will not pay more money for functions they do not want to use. That is why a lot of products come in different flavors like community, basic, and enterprise versions. The customer chooses which features are needed and pays for exactly those.

Defects

Classic meaning:

Defective products need repairing, or if they cannot be repaired, need to be dumped. Both choices cost money. Additionally, the reputation is influenced negatively, which costs future money due to customers not wanting to pay again for a product from the same manufacturer. It becomes even worse as soon as the defect damages something on the customer's site and the customer asks (un-)politely for compensation. A good customer support division can compensate for a lot, but this is expensive, too. So: defects should be avoided. They always waste a lot of money.

Software engineering:

For software defects, the same facts are valid as for mechanical engineering. Defects cost money and reputation, so the best is not to have any. Avoiding defects by extensive testing and quality control is cheaper than handling angry customers, doing failure analysis, bug fixing, patch releasing, and losing future customers.

Additionally as 8th Muda: Latent skill

There is an additional, unofficial 8th Muda: latent skill. Officially, it is about utilizing the skills of employees. People who were hired to fill a certain position might be able to do much more or more valuable work than the position requires. These people should be given an opportunity to grow and do what they are capable of. Additionally, a lot of employees want to learn more and want to be trained. It is not only about getting a higher salary, but also about personal growth and satisfaction.

In my opinion, there is another side to the waste of latent skill: machinery. Some high-tech machines are capable of doing more than what they were bought to do, and they can be utilized further if possible. In IT, this is where cloud computing was invented. It is partly waste by waiting and partly waste by latent skill when servers are not utilized due to too little work. With cloud computing, the utilization of servers can be increased. This utilization comes in two flavors: doing more of the same work (reducing waste by waiting) or running other services in parallel (reducing waste by latent skill). A higher utilization means more revenue and therefore more profit, because the depreciation costs stay the same.

A Final Thought

The Muda are not meant to be used for cost reduction in the first place; that mindset is not correct, in my opinion. The Muda are about efficiency. For cost reduction, efficiency needs to be increased, that is correct. But thinking about cost reduction only leads to decisions which might hurt quality and effectiveness. I might write about the difference in mindset and practical approach later in another post.
