The SWIG Redevelopment Effort
David Beazley
Department of Computer Science
University of Chicago
Chicago, IL 60637
beazley@cs.uchicago.edu
$Header: /cvs/projects/SWIG/Doc/Devel/Attic/whitepaper.html,v 1.1.2.2 2001/09/02 12:45:13 beazley Exp $
1. An Introduction
One of the biggest problems faced by people writing software is the
problem of how to make software easier to use, more interactive, and more
modular. Typically, the computer science community has approached
these problems by focusing on formal design methodology and highly
specified frameworks built around notions of software components,
object-oriented programming, and anything labeled as "best practice"
(whatever that means). Although this type of approach is perhaps
appropriate for very large software projects involving hundreds of
programmers, software engineers, and managers, I've never met a sane
programmer who really enjoys writing software in such an environment.
Furthermore, a large number of software projects are undertaken by
small groups of people who would not classify themselves as
professional software developers or software engineers. Typical
examples might include scientific computing software, specialized
systems for engineering applications, or just about any kind of
experimental research and development project. These are the types of
programming projects "in the small" that are my primary interest.
First, programming projects in the small should not be confused with
the toy programs one might write as part of a class project or when
solving exceedingly trivial problems. More often than not, a software
package written by only a few people may have been developed over a
period of several years and may contain hundreds of thousands of
lines of source code. Furthermore, due to limited manpower, these
projects are likely to rely on a variety of third-party packages and
programming libraries to accomplish certain tasks. Finally, it is not
uncommon for such software to have been developed in a relatively
piecemeal fashion with little if any formal design. The developers
may also be burdened with the task of supporting a large base of
legacy code that is critical to the application, but which is too
complicated to simply rewrite from scratch. As a result, the software
developed in such an environment may be a tangled web of code that
gets the job done, but which is less than ideal in terms of its
usability and overall design.
Of course, one does not need to look very far to see examples of this
kind of development. For instance, I would claim that just about
every successful project within the Open Source community has been
developed in this way. As a more specific example, Swig itself was
developed in a relatively ad hoc manner over a period of two years.
Although it was my intent to have a relatively clean design at the
start, the system has since evolved into a very tangled mess of
monolithic C++ code. It's not that I wanted to end up in this
situation. Rather, the experience gained by Swig's early users pushed
the system in an unanticipated direction that the original design
failed to address. In many ways, it is ironic that SWIG should end up
in this particular state given that this is exactly the type
of situation that Swig was built to address!
Naturally, this brings us to the overall motivation behind SWIG itself.
In a nutshell, SWIG is a software development tool that aims to make it
easier to do the following:
- Build user interfaces to existing software. For example, the
primary reason for Swig's emphasis on scripting languages is not that
scripting languages are cool (which they are). It is that interpreters
make great user interfaces for a wide variety of applications.
Furthermore, interpreters can be used to build more advanced user
interfaces using toolkits such as Tk.
- Repackage an existing system as a collection of modules. The primary
motivation for this is that working with software organized as a collection
of loosely coupled modules generally results in greater flexibility and
reduced maintenance cost in comparison to a huge monolithic package. Since
scripting languages naturally promote the creation of modules and Swig makes it
easy to integrate scripting languages with existing software, Swig also serves
as a module building tool.
- Work with software in a rapidly changing, experimental, and
underspecified environment. One of the reasons why people don't like
formal component frameworks and over-specification is that they may
not know how a system is actually going to look or evolve when they
start a project. As a result, excessive formality is viewed as more
of a burden than a benefit. Swig, in a sense, turns this whole
scenario around by being highly adaptable and allowing the programmer
to write the software however they want as opposed to forcing programs to
be written within a rigidly defined set of rules.
- Serve as a rapid prototyping and testing tool. Given the
non-invasive way in which Swig works with existing software, it allows
developers to experiment with different modules, languages, and
methods of organizing a system. As a result, Swig can be used in the
prototyping and development stages of a project even if the final
package makes no use of Swig, scripting, or any of its related
modules.
I also want to emphasize that the target users of Swig are not professional
software engineers. Rather the system is designed to be very easy to use for
more ordinary people who just happen to be working on programming projects as
part of their work or for fun (physicists, engineers, hackers, etc.). It is also
designed to provide a certain element of "instant gratification", if you will. I believe that
the following quotes from a SWIG user survey put things in the right perspective:
- "Easy to use, no need to worry about language internals. It is a boon for application
developers, like me."
- "I really love the fact that the learning curve is short and flat."
- "Since SWIG has proven to be rather easy to use, I find I can carry out
the types of wrapping activities which would otherwise have been the responsibility
of a computer scientist."
- "I came, I saw, I wrapped. And it ran. Woo hoo!"
2. Problems with SWIG
Despite the early success of SWIG, the system suffers from a number of serious
limitations. Furthermore, these problems are not easily fixed within the current
design.
- The C/C++ parser is incomplete. SWIG only understands a
limited subset of C and is based on an incorrect representation of C
datatypes that prevents the proper handling of "const", references,
pointers to functions, and other more complex types. In addition,
fundamental things like C++ function overloading still don't
work. Although 99% of the common cases work and there are workarounds
for certain situations, these limitations are still annoying.
- The SWIG module system is all wrong. In the current
implementation, SWIG modules are created using C++ inheritance. This
has a number of unintended consequences. First, it restricts the
functionality of a module to a fixed
set of virtual function calls made deep inside the parsing engine. As
a result, it is not possible to write highly specialized modules that
don't quite fit into the normal module scheme. Second, it makes the
module system unnecessarily complicated and too tightly coupled. For
instance, there is no way to write a module that operates outside of
the SWIG framework or which might be useful on its own. Finally, I
believe that the C++ module system alienates the user community
because it is too complicated and there aren't that many C++ programmers. With a simpler
module interface, I believe that the system would be much more accessible
to the user community and people who want to write modules.
- Why stop at C and scripting? Although SWIG does a great
job of building scripting interfaces, there is no practical reason to
restrict its functionality in this way. For one, it is probably
worthwhile to consider alternative input languages including Fortran
and CORBA IDL. Second, there are a variety of secondary tasks that
one might be able to do with such a system such as analyze the
structure of application interfaces, generate documentation, provide
interfaces to databases, and provide tools to help modularize existing
software. Although these sound like lofty goals, I believe that the system
should be flexible enough to allow such applications.
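The first item above says the current parser rests on an incorrect representation of C datatypes. As a rough illustration of a representation that can express "const", references, and pointers to functions, the following minimal sketch stores a type as a chain of constructors applied from the outside in, so that `const char *` becomes a pointer layer over a const layer over `char`. The `Type` structure and the `type_new` and `type_describe` functions are hypothetical names chosen for this example; this is not SWIG's actual internal encoding.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type constructors (not SWIG's real encoding). */
typedef enum { T_BASE, T_POINTER, T_REFERENCE, T_CONST, T_FUNCTION } TypeKind;

/* One layer of a type; layers are chained from the outermost inward. */
typedef struct Type {
    TypeKind kind;
    const char *text;  /* base type name (T_BASE) or parameter list (T_FUNCTION) */
    struct Type *of;   /* the type this layer applies to; NULL for T_BASE */
} Type;

/* Allocate one layer; returns NULL if memory is exhausted. */
Type *type_new(TypeKind kind, const char *text, Type *of) {
    Type *t = malloc(sizeof *t);
    if (t != NULL) {
        t->kind = kind;
        t->text = text;
        t->of = of;
    }
    return t;
}

/* Append s to buf without overflowing a buffer of cap bytes. */
static void append(char *buf, size_t cap, const char *s) {
    size_t used = strlen(buf);
    if (used + 1 < cap) strncat(buf, s, cap - used - 1);
}

/* Write an English description of t into buf (cap must be at least 1);
   output that does not fit is truncated. */
void type_describe(const Type *t, char *buf, size_t cap) {
    assert(buf != NULL && cap > 0);
    buf[0] = '\0';
    for (; t != NULL; t = t->of) {
        switch (t->kind) {
        case T_BASE:      append(buf, cap, t->text); break;
        case T_POINTER:   append(buf, cap, "pointer to "); break;
        case T_REFERENCE: append(buf, cap, "reference to "); break;
        case T_CONST:     append(buf, cap, "const "); break;
        case T_FUNCTION:
            append(buf, cap, "function(");
            append(buf, cap, t->text);
            append(buf, cap, ") returning ");
            break;
        }
    }
}
```

With this layout, `int (*)(double)` is a pointer layer over a function layer (parameters `double`) over the base type `int`, and `type_describe` renders it as "pointer to function(double) returning int". Because each qualifier is its own layer, `const char *` ("pointer to const char") and `char *const` ("const pointer to char") stay distinct, which is exactly the kind of distinction a flat representation tends to lose.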
Of course, the real trick is how one goes about solving these issues
without making Swig excessively complicated, from the point of view of both
development and use.
3. SWIG Redevelopment: Modules
Simply stated, the primary goal of SWIG redevelopment is to redesign
the SWIG compiler as an extensible set of loosely coupled modules
(Note: it is not my intent to radically change the way in which an
end-user uses SWIG).
In this context, my intent is to allow a module to be virtually anything
that might be part of a compiler or which would interact with a
compiler in some manner. For example:
- Preprocessors.
- Parsers.
- Code generators.
- Code browsers.
- Documentation generators.
- Optimizers.
- Testing tools.
- Other development environments.
Unfortunately, as programs go, compilers tend to be extremely
complicated. Therefore, to make any sort of module system work, the
mechanism by which modules interact and exchange data needs to be
extremely powerful and extremely simple.
To address these problems, SWIG redevelopment is based on a few fundamental ideas:
- All data will be internally represented using an XML-like scheme
in which every piece of data is identified by a unique element "tag"
and a set of associated attributes. Manipulation of the data in turn
will involve nothing more than making an appropriate association of
the "tags" with some sort of "action" to be performed. Unlike an
approach in which objects are placed into a rigid C++ class hierarchy,
the XML-based approach allows a virtually unlimited number of
different object types and attributes to be created and manipulated without ever
having to recompile anything. As a result, this would allow modules to easily
extend the system in novel ways. It should also be added that this
data representation greatly simplifies the underlying core of
the system because an XML-like representation can be
built entirely using nothing more than a hash-table object and a
few fundamental datatypes such as strings and lists.
- All underlying data structures will be built using a dynamic type
handling mechanism and a small collection of fundamental datatypes
including strings, lists, and hash tables. There are several
advantages to this approach. First, dynamic typing generally results
in substantially less code if done correctly. For instance, in my
own experiences using Objective-C vs. C++, I found that my dynamically
typed Objective-C programs were up to 5 times smaller than their C++
counterparts. Furthermore, dynamic typing is also one of the reasons
why scripting languages are so powerful.
- Loose coupling. Modules will interact with each other and exchange data using the XML-like scheme
previously described. The flexibility of this approach allows
modules to be written in a relatively stand-alone manner. Furthermore, the
use of XML may simplify the development of external tools that do not share
any commonality with the SWIG executable or its internal data structures.
- Dynamic loading. Closely associated with loose coupling, the SWIG module
system should optionally support dynamic loading of compiler modules. This might
be accomplished in two ways. First, I believe that SWIG itself should
provide a scripting interface that allows its modules to be dynamically
loaded into a variety of scripting languages. Second, SWIG
should probably implement some sort of module loading system that allows modules
to be used without the optional scripting interface.
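The XML-like data model and the idea of associating "tags" with "actions" described in the list above can be sketched in a few dozen lines of C. This is a minimal illustration under stated assumptions, not the DOH API: `Node`, `node_new`, `node_set`, `node_get`, `Rule`, `dispatch`, and the example action `count_function` are hypothetical names, attributes live in a small fixed-size chained hash table, and a module is modeled as nothing more than a table pairing tag names with action functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 16  /* fixed table size; a real hash table would grow */

/* One attribute: a key/value pair chained into a hash bucket. */
typedef struct Attr {
    char *key;
    char *value;
    struct Attr *next;
} Attr;

/* A node is just an element tag plus an open-ended set of attributes. */
typedef struct Node {
    char *tag;
    Attr *buckets[NBUCKETS];
} Node;

static char *copy_str(const char *s) {
    char *d = malloc(strlen(s) + 1);
    if (d != NULL) strcpy(d, s);
    return d;
}

/* djb2 string hash, reduced to a bucket index. */
static unsigned bucket_of(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return (unsigned)(h % NBUCKETS);
}

/* Create a node with the given tag and no attributes; NULL on failure. */
Node *node_new(const char *tag) {
    int i;
    Node *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->tag = copy_str(tag);
    if (n->tag == NULL) { free(n); return NULL; }
    for (i = 0; i < NBUCKETS; i++) n->buckets[i] = NULL;
    return n;
}

/* Set or overwrite any attribute; returns 1 on success, 0 if out of memory. */
int node_set(Node *n, const char *key, const char *value) {
    Attr *a;
    char *v;
    unsigned b;
    assert(n != NULL && key != NULL && value != NULL);
    b = bucket_of(key);
    v = copy_str(value);
    if (v == NULL) return 0;
    for (a = n->buckets[b]; a != NULL; a = a->next) {
        if (strcmp(a->key, key) == 0) { free(a->value); a->value = v; return 1; }
    }
    a = malloc(sizeof *a);
    if (a == NULL || (a->key = copy_str(key)) == NULL) { free(a); free(v); return 0; }
    a->value = v;
    a->next = n->buckets[b];
    n->buckets[b] = a;
    return 1;
}

/* Look up an attribute; NULL if it was never set. */
const char *node_get(const Node *n, const char *key) {
    const Attr *a;
    for (a = n->buckets[bucket_of(key)]; a != NULL; a = a->next)
        if (strcmp(a->key, key) == 0) return a->value;
    return NULL;
}

/* A module is a table associating tags with actions. */
typedef int (*Action)(const Node *n, void *ctx);
typedef struct { const char *tag; Action action; } Rule;

/* Run the action registered for n's tag; tags with no rule are ignored (0). */
int dispatch(const Node *n, const Rule *rules, size_t nrules, void *ctx) {
    size_t i;
    for (i = 0; i < nrules; i++)
        if (strcmp(rules[i].tag, n->tag) == 0) return rules[i].action(n, ctx);
    return 0;
}

/* Example action: count function declarations. */
static int count_function(const Node *n, void *ctx) {
    (void)n;
    ++*(int *)ctx;
    return 1;
}
```

Because attributes are plain strings keyed by name, a module can attach a new attribute such as `"storage"` to a node without any change to the `Node` type or a recompile of other modules, and a module with no rule for a given tag simply ignores that node. A production version would also need a growable table, a way to free nodes, and child lists for building trees.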
Finally, it should be noted that the implementation language of choice for
the SWIG redevelopment effort is ANSI C. There are several reasons for this:
- ANSI C is highly portable and available everywhere.
- C provides the performance necessary to implement a few critical aspects of a compiler.
- C is the ultimate glue-language in the sense that it can be interfaced
with just about anything if you know what you are doing. This will be especially important
if we want to interface with third-party compiler construction tools.
- It is perhaps the most widely spoken programming language--making it a good choice
to encourage community involvement and the creation of additional SWIG modules.
- Dave likes it.
4. The Initial Module Set
The following list describes the proposed modules that will be part of the new
system:
- Swig. The Swig module contains a small core of functionality that is used
by the rest of the system. Features include access to the Swig library, command line
parsing, error handling, and a few common datatypes including a somewhat generic representation of
types.
- DOH. DOH is the dynamic type library that provides the fundamental
data structures used by the system as well as run-time support for dynamic typing.
- Preprocessor. A full C/C++ preprocessor with some extended macro handling
capabilities.
- CParse. A completely redesigned C/C++ parser that attempts to fix all of the parsing
problems in SWIG 1.1. In particular, it will treat C/C++ datatypes correctly and support a
number of new C++ constructs. However, it is somewhat unlikely that this parser will
fully support all of C++ (at least not initially).
- XMLParse. A parsing module that can read XML files and turn them into a SWIG parse
tree structure. The initial plan is to simply put a thin wrapper around the expat XML parser library for this.
- XMLWriter. A code generation module that can simply dump all of the internal
data structures out as a huge XML document.
- Tcl. A code generator for Tcl.
- Perl. A code generator for Perl.
- Python. A code generator for Python.
- Guile. A code generator for Guile.
- Java. A code generator for Java.
- Testing. A testing module that is designed to aid in the construction
of testing scripts. More details to be provided later.
- Documentation. A replacement for the SWIG 1.1 documentation generation
system. The precise details need to be determined, but it is likely that this
system will produce both plain ASCII files and XML files.
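To give a rough idea of what the XMLWriter module involves, the following sketch serializes a small tree of tagged nodes into a caller-supplied buffer, escaping attribute values as it goes. It assumes a hypothetical `XNode` layout (parallel key/value attribute arrays plus first-child and next-sibling links) rather than SWIG's actual data structures, and `xml_write` is a made-up name for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tree node: a tag, parallel attribute arrays, and child links.
   Tags and attribute names are assumed to already be valid XML names. */
typedef struct XNode {
    const char *tag;
    const char **keys;       /* attribute names */
    const char **values;     /* attribute values, same length as keys */
    size_t nattrs;
    struct XNode *child;     /* first child, or NULL */
    struct XNode *sibling;   /* next sibling, or NULL */
} XNode;

/* Bounded append: copies as much of s as fits and keeps buf NUL-terminated. */
static void put(char *buf, size_t cap, const char *s) {
    size_t used = strlen(buf);
    if (used + 1 < cap) strncat(buf, s, cap - used - 1);
}

/* Append s with the five XML special characters replaced by entities. */
static void put_escaped(char *buf, size_t cap, const char *s) {
    char one[2] = {0, 0};
    for (; *s != '\0'; s++) {
        switch (*s) {
        case '&':  put(buf, cap, "&amp;");  break;
        case '<':  put(buf, cap, "&lt;");   break;
        case '>':  put(buf, cap, "&gt;");   break;
        case '"':  put(buf, cap, "&quot;"); break;
        case '\'': put(buf, cap, "&apos;"); break;
        default:   one[0] = *s; put(buf, cap, one); break;
        }
    }
}

/* Recursively write n and its descendants; childless nodes use <tag/>. */
static void write_node(char *buf, size_t cap, const XNode *n) {
    size_t i;
    const XNode *c;
    put(buf, cap, "<");
    put(buf, cap, n->tag);
    for (i = 0; i < n->nattrs; i++) {
        put(buf, cap, " ");
        put(buf, cap, n->keys[i]);
        put(buf, cap, "=\"");
        put_escaped(buf, cap, n->values[i]);
        put(buf, cap, "\"");
    }
    if (n->child == NULL) {
        put(buf, cap, "/>");
        return;
    }
    put(buf, cap, ">");
    for (c = n->child; c != NULL; c = c->sibling)
        write_node(buf, cap, c);
    put(buf, cap, "</");
    put(buf, cap, n->tag);
    put(buf, cap, ">");
}

/* Entry point: clears buf and serializes the tree rooted at root into it;
   output that does not fit in cap bytes is truncated. */
void xml_write(const XNode *root, char *buf, size_t cap) {
    assert(root != NULL && buf != NULL);
    if (cap == 0) return;
    buf[0] = '\0';
    write_node(buf, cap, root);
}
```

For instance, a hypothetical `function` node with attributes `name="foo"` and `type="int"` and a single `parm` child with `name="x"` and `type="const char *"` is written as `<function name="foo" type="int"><parm name="x" type="const char *"/></function>`. Since the text calls for dumping all of the internal data as one large document, a real module would more likely stream to a `FILE *` with `fputs` than fill a fixed buffer; the traversal would be the same.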