
  ESSAYS ON SOFTWARE ENGINEERING

  Software Bloat - Is it Here to Stay?    (c)  1994  by Herb Chong


  Have you ever wondered how all that space on your hard disk is used? Doesn't
  it seem like yesterday that a 10M hard disk, or any hard disk at all, was an
  incredible amount of storage that would take a long time to use up? I went
  through my old Byte magazines to research this article and tallied up the
  hard disk sizes in some of the systems reviewed and previewed. Combining
  that with some data published in PC Week and a little black magic, I ended
  up with this chart. It purports to show the average hard disk size shipped
  with the "average" system today. The 1994 figure is PC Week's estimate for
  the end of 1994.  Extraordinary -- isn't it?

  If you take the numbers and do a little arithmetic, as I did, you will find
  that the average disk size almost doubled each year from 1986 to about 1990,
  and then more than doubling since then. I don't think it is any coincidence
  that Microsoft Windows 3.0 hit the market place in May of that year. People
  have more applications on their systems if they are Windows users than if
  they are DOS users and those applications are bigger. Windows applications
  tend to come with more features and are just generally bigger than their DOS
  counterparts.

  Is there an end in sight to this rapid growth? To answer this question, we
  need to look at some of the reasons for the rapid software size growth these
  past several years, what causes lie behind those reasons, and finally, what
  assumptions they create about how people use their systems.

  More - Cheaper - Faster -, and Sooner!

  Take a look at the Windows applications on your computer. Have you upgraded
  any of them since you got your first version? What has changed from version
  to version? Each upgrade promises that you simply cannot do without the new
  features that the older version doesn't do.The packages are getting skinnier,
  especially if you get the CD-ROM version. They seem to come out at an ever
  increasing rate on ever increasingly tight delivery schedules. It's a trend
  that started in the `80s and is continuing. Let's look at these and other
  factors and how they influence software size.

  Staying Even with the Competition

  It seems that the only real justification for an upgrade is to get new
  features that you want. Marketing's job is to convince you that you really
  need these features. Otherwise, they are not going to make any more money
  from you. A software vendor in the PC world doesn't sell you a subscription
  yet....they make a one-time transaction. To stay in business, a software
  vendor must continue to sell to new customers. What better way to get new
  customers than to convert all your old customers to new ones by obsoleting
  everything they own? If they're going to make people pay for their software,
  they have to convince people that they need something they don't already
  have.

  Just in case you have any doubt, the marketing department spends a lot of
  time and money convincing you why the latest features are ones you really
  need to have and what new things you are going to be able to do with their
  new version that you can't do with the old. There is no doubt in my mind
  that all new features are useful. The real questions are how useful they are
  and to how many people? As the software market and consumer sophistication
  mature, it's harder and harder to find genuinely useful features for a large
  portion of the users.

  Nonetheless, if you decide that one or two features are sufficiently useful
  to you to upgrade, you'll upgrade to get them. When you do, you get all the
  features you don't need as well. The programmers spent time writing and
  debugging the code,and the code ends up on the installation disks and your
  hard disk. You pay for them all. The software vendors and the programmers
  will argue that the cost of adding all these features isn't a lot more than
  adding some of them, and this will allow them to satisfy more people than
  they would otherwise be able to. No doubt this is true, but there is a fine
  line between adding a feature simply for the sake of adding a feature and
  adding a feature because many users can't do without it.

  Let's use Word for Windows as an example. I use Word for Windows the most
  of all the applications on my desktop.  One of the handiest features to come
  along in Version 6 is the AutoCorrect feature. If I forget to hold down the
  shift key when I begin a sentence, it capitalizes it for me when I press the
  space bar.  If I forget to hold down the shift key in the middle of the
  sentence when I press the "i" key, and then space, it upper cases it for me.
  If I hold down the shift key too long and the first two letters of a word
  are capitalized, it lower cases the second letter. It remembers that I type
  "don;t" frequently when I really mean "don't" and fixes the mistake.
  Autocorrect is a really useful feature because I'm a self-taught touch
  typist and I have picked up some bad habits.

  Again using Word as an example, I have yet to find someone who prefers to
  move text by using drag-and-drop instead of cut and paste, either via the
  keyboard, toolbar or menus. It's harder to position the cursor for an exact
  paste and so people frequently drop the text in the wrong place. I know that
  it took someone a some nontrivial amount of time to get it working...and it
  does what it is supposed to do. How many people really benefit from it? Not
  nearly as many as Microsoft hoped when they introduced Word for Windows
  Version 2.0 and highlighted this as one of the most significant new features.

  There's the competition too. After Lotus introduced SmartIcons into its Ami
  Pro word processor and received favorable press, Microsoft and WordPerfect
  had to follow suit, whether or not it fit into their style of working. As
  soon as one of the big three word processors introduces a new feature into
  their program, it becomes a point of comparison between the programs. Adding
  features becomes a game of marketing and programming one-upmanship to come
  up with new features for these programs. The features themselves makes the
  competition play catch-up and allows the program to reach out to yet more of
  the users who might otherwise choose something different. Every extra line
  on the features comparison chart cost you more money and disk space, whether
  you use it or not.

  RTFM (Read The Fine Manual)

  Have you noticed that manuals are getting thinner and thinner? I have. As I
  upgrade my one hundred or so Windows applications on my main computer, I
  manage to find more and more shelf space to put the third party books I have
  to keep on buying to understand something that isn't in the manual anymore.
  That shelf space comes from the new version's manuals occupying less space
  than the ones the old version occupied. I somehow manage to net out at about
  the same amount of space as I used to.

  The information that used to be in the manual has to go somewhere. If you
  are willing to live with slow access times and keeping the right CD-ROM in
  your drive at all times (I'll ignore those of you with jukeboxes), you need
  any extra disk space for the on-line versions of the manual.  If you don't
  want to do this or don't have a CD-ROM drive, you have to put the manuals
  onto the hard disk. Yes, it's nice to be able to look up things from wherever
  you are, but how many of you actually prefer the on-line manuals to the
  paper ones?  There are too many things that just aren't easily suited to
  on-line use. This includes tutorials and detailed reference information.

  I ordered an upgrade from Microsoft Visual C++ 1.0 to 1.5 recently. It only
  comes on CD-ROM media and doesn't come with manuals. You need to pay $100
  for the manual set.  This is a continuing trend in Windows software
  distribution.  Hardcopy manuals have been shrinking and shrinking. The
  information formerly in hardcopy is being shifted to on-line documentation
  because it's cheaper for the vendor. In this day and age of increasing
  competition and ever diminishing profit margins, trading a $30 manual for a
  few $1 diskettes is something that can't be ignored.  Reducing the cost of
  the software is somewhat offset by the extra disk space for the floppy
  disks and the space taken up on the hard disk.

  Guess who has to pay for the exchange of disk space for manuals?  Marketing
  has always managed to sell or at least confuse the issue by concentrating on
  the great things you can do with on-line help like hypertext and searching,
  that you can't do with a hardcopy manual.  Frankly, the Windows Help Engine
  isn't anything to brag about. I can do a few things with the on-line help
  that I can't do with the manual. There are also, a lot of things I can't do,
  like reading it without turning on the computer, or having to use low
  resolution text and graphics instead of phototypeset output, or being able
  to mark it up with notes and little drawings. On-line help is great when I
  can't carry the manuals with me, but when I'm in my office surrounded by my
  bookshelves, on-line help is annoying if it's all I have.  I rely on the
  Visual Basic On-line Help because the manual is too thin to be helpful.  It
  keeps referring me to the on-line help for the real answers and I get to
  pay for this privilege.

  I Want It Yesterday !

  The average Windows program is much more complicated than the average DOS
  program trying to do the same thing.  The event driven model of application
  interaction places a heavy burden on the application programmer to take
  care of all sorts of details about making their application run. A few years
  ago, object-oriented class libraries and C++ became the next great thing in
  Windows programming. Some people went to a lot of trouble packaging up all
  the details and providing defaults for everything so that unless you, the
  programmer, wanted something different from the default, you didn't have to
  write anything. The class library took care of everything.  Programmer
  productivity shot up. What used to take a year to design and write now took
  a couple of months.  Marketing folks went nuts. Now they could promise even
  more to their customers and still have a good chance of delivering.

  With such pressure from all sides, programmers really haven't got much
  choice. They have to use development tools that let them get as much correct
  function as possible with as little effort as possible. Everybody else is
  using them.  The tools, however, have a major drawback: they are profligate
  in their use of memory and disk space. People used to complain that the
  Windows equivalent of the famous "hello, world" program took up 20K of
  memory, which in the DOS equivalent would occupy a measly 800 bytes. Yet a
  program that takes up 10 times that much space barely rates a blink, because
  that is what C++ class libraries like Microsoft Foundation Classes and
  Object Windows Library impose on the programmers. Turn on debugging and then
  you see disk and memory requirements grow by another factor of five.

  It's all part of how the C++ language and the Intel object format are
  defined. Whenever a programmer references a variable, it has to be included
  as part of the program whether it is used or not. The compiler can't even
  try to tell until you bring everything together at linking time whether
  something might or might not be used. In the days of C programming, it
  wasn't so bad because all the various variables a program could use were
  split across many header files and a programmer could be selective about
  which ones they used.  That helped cut down on the number of referenced but
  not used variables in a program. With C++, whenever you use a class library,
  you have to include the entire class hierarchy every time.  Doing otherwise
  is extremely error prone and just plain inelegant. Declaring a variable of a
  type in the leaf of the class hierarchy brings in everything above it right
  up to the top, - all their member variables and all their member functions!
  In the case of MFC and OWL, this can be a total of several hundred for every
  variable a programmer declares in their program.

  When a linker processes object files to produce an executable, it knows some
  thing about which functions and external variables are used throughout. C
  and C++, however, do not permit the linker to eliminate unused code. Partly
  it is because of how C and C++ allow you to abuse the language and cause
  references to such objects outside of the compiler's knowledge, and partly
  because the Intel OBJ format doesn't store enough information for the linker
  to unambiguously tell if a function is really unused or not.

  There isn't much choice but to leave them in. Borland thought this was
  enough of a problem to invent an extension to the OBJ format to allow the
  linker to know for sure whether something was needed or not and eliminate
  redundant code. So Borland Pascal for Windows programs using the same OWL
  class library can come in at 2/3 to 1/2 of the size of C++ programs using
  OWL.  Do you see a stampede toward using Borland Pascal as the standard
  Windows development tool? Most developers don't seem to care. Most can't
  afford to care.  Productivity is what they are measured on. Once again, you
  pay extra for the programmer's productivity. Unfortunately, the programmer
  doesn't benefit from what you pay.

  It Works! What More Do You Want?

  Imagine you are a new programmer on a project. The program you will be
  working on has been around for about three years. Remember this is Windows
  and C++, not COBOL.  There have been four programmers before you who have
  worked on the code. They are no longer working on it because they have been
  promoted or moved on to other things.  Your job is to take the list of
  features the team leader has negotiated hard with the marketing folks about
  and turned your part of that list into something that works. There's no
  documentation, - one hundred thousands lines of code, - and no-one to ask!

  Do you dare take out any code?  After all, it works now, more or less.  Much
  safer to work in this bit here, work in that bit there, and generally change
  something only after you are absolutely sure of how it works. During testing,
  you find that sometimes garbage appears in your input.  If you fiddle with
  it a bit, the program doesn't crash, and things seem to keep on working. All
  your predecessors except the original programmer probably did the same thing.

  Any Windows program that has been around for more than a version or two is
  going to become harder and harder to add features to.  First of all, new the
  features are more and more pervasive and more and more complex. They just
  can't be hacked in an afternoon. Second, adding these features stretch the
  original design more and more, frequently pushing it in directions that were
  never intended or deliberately avoided.  Programs more than a few versions
  old quickly become frightful patchworks of elegance and ragged code right
  next to each other.  It becomes harder and harder to enhance.

  Put another way, programmer productivity is not as high as it should be.
  With today's deadlines for software delivery, especially in the Windows
  software arena, delays in delivery are very unhealthy. The faster a company
  can deliver new releases, the more money they make and the happier the share
  holders are. It doesn't leave much room for tuning, redesign, and other such
  things that refine the way a program works inside. If it's not visible to
  the user, it's not a feature. Features sell. Taking that long pause to
  re-architect for the future means no new releases for a while. No releases
  means no income. Guess where management wants you to spend your time?

  When Will It End?

  Just how far can these trends continue? Remember reading about how cars were
  made in the `50's?  Each year, there seemed to be a different bump or lump
  (some people called them fins) on a car.  This year's lump was in and last
  year's lump was out.  It kind of came to an abrupt halt in the mid 60's.
  People suddenly wised up. Cars weren't really all that different from year
  to year.  It was marketing of features that didn't really have much to do
  with what people wanted in a car.

  I think that we are in a situation with Windows software where there are so
  many people new to software and using tools when they really don't know much
  about computers yet.  They are swayed by the advertising and press that new
  versions of programs receive in review after review.  When most people are
  able take a serious, educated look at what they do and what they need in
  software, I think that software sales are going to drop off.

  Corporations are slower to adopt new version software because they spend
  more time defining the real costs of software. They understand that the real
  costs includes payment for upgrades they don't need, advertising to convince
  them that they can't do without some feature or another, or that they will
  be left behind by an implicit warning against obsolescence without some
  wonderful  upgrade or another. They know they will pay again because after
  the upgrade, they won't have enough room for all the other software that
  they need, or enough CPU to run that essential piece of software, and never
  enough colors to bring those games truly to life.

  When consumers get fed up with being led around by the nose by the major
  software vendors, we'll see the rate of growth in computing power, RAM and
  hard disk space slow. Until then, we're going to continue to make everyone
  in the business richer.

  HERB CHONG has been a contributing writer for Windows Sources, is a
  Contributing writer for The Cobb Group's Inside Microsoft Windows; and is
  the Contributing Editor of WindoWatch.


