This is better than the first tries, but it still isn't up to strawman level; call it the "paperman" description. If you have comments, or find things wrong or inconsistent (even if it's just a typo), write to me on the scons-dev mailing list, or note things that need to be fixed below the horizontal line; I'll get to them as quickly as possible. Thanks!

The section on "Typical Tool Processing" is not completely updated: it's only partially edited and it still needs some reorganization, but I've been sitting on this partial update for too long. I'll get back to this edit as soon as I can.

Platform and Tool Configuration

The overall objective is to allow SCons users more control over the tools they configure. It's intimately tied up with configuration of the platform, since many of the decisions about the right tool are predicated on which platform is selected.

This description starts with a series of overall requirements, then discusses first platform configuration and then tool configuration. The description continues with a discussion of the flow within a typical SCons Tool, since it differs slightly from the existing technique. The description ends with a section on backward compatibility.

(Comment: As I was writing this page, I found myself flipping back and forth as to whether a Tool module configured a tool (that is, a single command) or a toolchain (that is, a series of commands). The current Tool modules actually implement toolchains (e.g., the gcc.py module provides the environment variables for the compiler, the linker, the static archiver, the shared archiver, and the bundle archiver). This isn't good modularization, which suggests that there should be a higher-level module explicitly for toolchains that can invoke one or more tool modules as building blocks. That isn't in this proposal (should it be?), but it's something that should be kept in mind for the future.)

Overall requirements

This is a set of goals that the new scheme must achieve to be viable. It's a checklist, with the things I think are taken care of checked off. Some goals are stronger requirements than others. Consider them the holy grail, always sought after, never achieved:

General flow

I'm going to invent unusable names below, mostly so they'll have to be changed before implementing them. Inventing good names is not one of my skills; I'll leave that to others who have a better imagination than I do.

Moreover, I don't claim a new class is the best way to implement this enhancement; it's likely that it can all be combined in an existing class (configure context comes to mind...).

The idea is that the user can create an InformationAboutPlatformAndTools (IAPAT) instance for each platform that is of interest, seeding it with suitable information. This establishes a number of "interesting" values about the platform. Tools are then configured for this platform by running a method and giving it the name of the toolchain wanted.

An InformationAboutPlatformAndTools instance is passed to the Environment constructor, which uses the configured platform and tools for its operations. (If no tools are configured when the instance is passed to the Environment, a default set of tools is configured at that time.)
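
The flow above might look roughly like the following sketch. Everything in it is a stand-in so that it runs outside SCons: the IAPAT class, its configure() method, the default platform of 'posix', and the default toolchain name are assumptions made for illustration (the real names are expected to change), and FakeEnvironment is not the SCons Environment class.

```python
class IAPAT:
    """Stand-in for InformationAboutPlatformAndTools (hypothetical API)."""

    DEFAULT_TOOLCHAIN = 'STDtoolchain'   # assumed default, for illustration

    def __init__(self, platform=None):
        # Assumption: fall back to 'posix' when no platform is given.
        self.platform = platform or 'posix'
        self.toolchains = []

    def configure(self, toolchain):
        # Record the toolchain wanted; real code would probe for the tools.
        self.toolchains.append(toolchain)


class FakeEnvironment:
    """Stand-in for an Environment that accepts a configured IAPAT."""

    def __init__(self, iapat):
        if not iapat.toolchains:
            # Nothing configured yet: configure the default tools now.
            iapat.configure(IAPAT.DEFAULT_TOOLCHAIN)
        self.platform = iapat.platform
        self.toolchains = list(iapat.toolchains)


p = IAPAT('irix')
p.configure('GNUtoolchain')
env = FakeEnvironment(p)

q = IAPAT()                      # no tools configured explicitly
default_env = FakeEnvironment(q)
```

The point of the sketch is only the ordering: platform first, then tools, then the Environment, with the default set filled in at the last moment if nothing was configured.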

Platform configuration

There's a tension here between the SCons platform configuration and the GNU autoconf platform configuration. The former has a small set of key identifiers (aix, posix, os2, win32, ...) that completely categorizes the tool selection. The latter uses a canonical string of either three or four values (cpu-vendor-os or cpu-vendor-kernel-os).

The SCons scheme is adequate for limited cases of cross-compiling (a win32 program on mingw, for example) but is inadequate for more complicated cases (a Sparc SunOS program on a PPC Mac). A scheme that will be simple for the simple cases yet flexible enough for the complex cases is not easy to do. The complete description is complicated enough to require a separate specification to get all the details right, but I believe that it can be done.

To outline this approach, we need a short aside to discuss the --build, --host, and --target command-line options.

GNU uses a shell script called config.guess if the --build option isn't given, which goes through a complex series of gyrations (including probes of the filesystem) to determine the canonical name of the current machine. It uses a shell script called config.sub if a value is given to one of the options, which canonicalizes the name and provides aliases for common architectures. An advantage of the scheme proposed here is that it avoids having to implement anything like config.guess, an incredibly messy piece of code, at the risk of not distinguishing between some platforms that config.guess considers different.

We also postulate that the InformationAboutPlatformAndTools initializer will canonicalize input values (similar to config.sub, but not as extensive), so a CPU of ppc is converted to powerpc, for example. It takes zero, one, three, or four arguments:

The complete set of GNU autoconf aliases recognized by config.sub includes hundreds, perhaps thousands, of aliases for obsolete machines. I don't propose that we implement all of the autoconf aliases, since the effort to completely reverse-engineer config.sub would be a herculean task (and we won't even think about the issues with the license). We probably should implement a subset, but the choice of subset belongs in the separate specification for the initializer. There are four general types of autoconf alias, which we will list here to give a flavor:

The InformationAboutPlatformAndTools initializer converts the input value(s) into a set of output values. It's probable that not all of them should be calculated every time; some of them would be calculated on demand. Comment: the idea is that this would make it easier for configuration routines to generate their definitions, but it's just as logical that the calculation could be done in the configure routine that needs them. It's an interesting trade-off.
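
As a rough illustration of canonicalizing the input and computing an output value on demand, here is a minimal sketch. The PlatformInfo class and its attribute names are hypothetical, it takes a single already-joined string rather than the zero, one, three, or four arguments described above, and the alias table contains only the ppc example from the text.

```python
from functools import cached_property

# Illustrative alias table; only ppc -> powerpc comes from the text above.
CPU_ALIASES = {'ppc': 'powerpc'}


class PlatformInfo:
    """Hypothetical holder for a cpu-vendor-os or cpu-vendor-kernel-os name."""

    def __init__(self, name):
        parts = name.split('-')
        if len(parts) == 3:
            cpu, vendor, os_name = parts
            kernel = None
        elif len(parts) == 4:
            cpu, vendor, kernel, os_name = parts
        else:
            raise ValueError('expected three or four fields: %r' % name)
        self.cpu = CPU_ALIASES.get(cpu, cpu)   # canonicalize the CPU field
        self.vendor = vendor
        self.kernel = kernel
        self.os = os_name

    @cached_property
    def canonical(self):
        # Computed on first access only, then stored on the instance.
        fields = [self.cpu, self.vendor, self.kernel, self.os]
        return '-'.join(f for f in fields if f is not None)


info = PlatformInfo('ppc-apple-darwin')
```

Here info.canonical is not built until something asks for it, which is one way to realize the "calculated on demand" option; doing the calculation in the configure routine that needs it would work just as well.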

These are potential choices for calculated values, just to give a flavor:

The selection above is intended to be a bit controversial. Almost certainly there are other things that should be present. Discussion is welcome.

Tool configuration

Once the InformationAboutPlatformAndTools instance has been initialized with the platform information, tools can be configured.

Again, there's a tension between the GNU tool configuration, which is based on the name of the command, and the SCons tool configuration, which is based on the name of a module that sets up the information (environment variables) for the tool. The former is easier for a user to understand, but it requires that the knowledge of all possible command names be hard-wired into the configure macros; the latter requires that a list of module names be consulted when specifying the tool selection order. Each scheme has its strengths and weaknesses.

We propose a flow in the next section below that combines the two methods. I'd like to say that it combines the strengths, but others will probably suggest that it compounds the weaknesses. It requires that the Tool modules be extended (additional entry points will be required), but I'm pretty sure it won't affect backward compatibility.

This is a very concrete description, far more detailed than a model or specification usually is. Although I think it would work, I'm not wed to it, so feel free to suggest alternatives.

Tool configuration is achieved by processing a list of tool IDs. (A tool ID is a string identifying what processing should be done.) Processing is done by (recursively) mapping the tool IDs through a dictionary to produce (usually) module names. Each dictionary entry maps a tool ID to a list of strings that are to be processed further.

When examining a tool ID, it may be one of four things:

Callables and modules are not evaluated when they are encountered; they are accumulated in a list. The list is evaluated only if the top-level list expansion succeeds.

Default Tools

In addition to the dictionary of tool IDs (which we will assume is called p.tool, where p is an instantiated IAPAT), there is a list of default tools that are configured if nothing is configured specifically. (If one tool is configured, all of them must be.) We will assume this list is called p.tools although it could just as well be p.tool['default'] by convention.

Here's a simple example of part of what would be set up for IRIX:
      p.tools = ['any', 'STDtoolchain', ... ]
      p.tool['STDtoolchain'] = ['any', 'SGItoolchain', 'GNUtoolchain']
      p.tool['SGItoolchain'] = ['all', 'sgicc', 'sgic++', 'FC', 'sgilink']
      p.tool['GNUtoolchain'] = ['all', 'gcc', 'g++', 'FC', 'gnulink']
      p.tool['FC'] = ['any', 'f95', 'f90', 'f77', 'g77', 'fortran']
      ...
The default set of tools can be adjusted, including replacing them whole cloth, by changing p.tools. New tools are created just by adding an entry in p.tool. The order of tool selection can be modified by changing p.tool['CC'], for example. Normally, a missing toolchain is ignored, but it can be made mandatory by setting the first element of the list to 'one':
      p.tool['FC'][0] = 'one'
to require a FORTRAN compiler to be present, for example.
(Comment: this is still not fully baked and may change as it is worked out more carefully.)
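
The recursive expansion can be sketched as follows, using the IRIX entries above. The semantics of the first list element are my reading of the text and are assumptions: 'all' needs every member, 'any' takes the first member that yields something and is silently skipped if none does, and 'one' behaves like 'any' but fails when nothing is found. The expand() function and the exists callback are hypothetical names.

```python
def expand(tool_id, table, exists):
    """Expand tool_id into a list of module names.

    Returns None when a mandatory part is missing, and [] when an
    optional ('any') entry finds nothing.
    """
    entry = table.get(tool_id)
    if entry is None:
        # Not in the dictionary: treat it as a module name (assumption).
        return [tool_id] if exists(tool_id) else None

    mode, members = entry[0], entry[1:]
    if mode == 'all':
        found = []
        for member in members:
            sub = expand(member, table, exists)
            if sub is None:
                return None          # one missing member fails the group
            found.extend(sub)
        return found

    # 'any' and 'one': the first member that yields something wins.
    for member in members:
        sub = expand(member, table, exists)
        if sub:
            return sub
    return None if mode == 'one' else []


tool = {
    'STDtoolchain': ['any', 'SGItoolchain', 'GNUtoolchain'],
    'SGItoolchain': ['all', 'sgicc', 'sgic++', 'FC', 'sgilink'],
    'GNUtoolchain': ['all', 'gcc', 'g++', 'FC', 'gnulink'],
    'FC': ['any', 'f95', 'f90', 'f77', 'g77', 'fortran'],
}
installed = {'gcc', 'g++', 'g77', 'gnulink'}   # pretend probe results
modules = expand('STDtoolchain', tool, installed.__contains__)
```

Note that the function only accumulates names; nothing is configured until the whole expansion has been resolved, in keeping with the rule that modules are not evaluated when they are encountered.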

Command Variables

The command variable is the environment variable that specifies the command on the command line. Here is a set of potential command variable names, based on the tools supported by GNU. Autoconf has been around a while, so it's probably pretty comprehensive in this regard. (And I probably missed some!) (TODO: Support for other configure macros that set the command variable as a side-effect of selecting a particular mode for a tool.)

For comparison, this table shows command variables and the existing SCons Tools that set them up:

      Variable   Tools
      AS         386asm as gas masm nasm
      CC         aixcc bcc32 cc gcc hpcc icc icl intelc* mingw* msvc* mwcc* sgicc suncc
      CXX        aixc++ c++ g++ hpc++ sgic++ sunc++
      DC         dmd
      FORTRAN    aixf77 cvf f77 f90 f95 fortran g77 ifl ifort
      LINK       aixlink gnulink hplink ilink ilink32 link linkloc mslink mwld sgilink sunlink
      AR         ar mslib sgiar sunar tlib

* Sets up both CC and CXX.

In addition, each of these Tools sets up a command variable that is the upper case of its name:

      Category   Tools
      Java       jar javac javah
      TeX        dvipdf dvips gs latex pdflatex pdftex tex
      parse      lex yacc
      macros     m4
      GUI        qt
      FFI        swig
      package    tar zip
      ???        midl msvs rmic rpcgen

The configured platform and set of tools are passed to an Environment when it is instantiated. The values set up by the platform and tools are copied (??? maybe there's a more efficient scheme ???) into the Environment instance, and off we go.

Typical Tool Processing

(See note above about not yet knowing which of these subsections I like best.)

Processing III

The essence of the merged flow between SCons and GNU is handled when the Tool module is processed. It's more complex than this sketch covers, but let's focus on the differences and worry about the other details later. Moreover, this short outline doesn't distinguish between what's done by the code in the Tool module and what's handled by the infrastructure. Here's a review of the things that Tool processing must resolve:

Optimization: I can imagine that the higher-level IAPAT logic may end up considering a given Tool several times before settling on the Tools to configure. It's likely that caching the results of the tool's cogitations may be a significant speedup. There are other things that should be cached as well: the GNU C Tool runs the compiler to determine the version and there's no need to do that more than once (per full path name of the compiler). And the probe for executables is likely to be repeated in different IAPATs, so that could be cached.
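
One simple way to get the per-path caching described above is to memoize the probe function. This is a sketch only: probe_version() is a hypothetical helper, the --version flag is assumed to be what the tool accepts, and the Python interpreter stands in for a compiler so the example runs anywhere.

```python
import subprocess
import sys
from functools import lru_cache


@lru_cache(maxsize=None)
def probe_version(full_path):
    """Run `<full_path> --version` once per path and remember the output."""
    result = subprocess.run(
        [full_path, '--version'],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout.strip()


# The interpreter stands in for a compiler here (assumption for the demo).
first = probe_version(sys.executable)
second = probe_version(sys.executable)   # served from the cache
```

Because the cache is keyed on the full path, two IAPATs that resolve to the same executable would share the result, while a cross compiler installed under a different path would be probed separately.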

exists/generate v. detect/apply

Currently, the Tool module is probed when it is encountered to see if the tool exists. That testing needs to be extended to detect if the command variable is present and if the command name is one recognized by the module.

The command variable may be a space-separated list (think of LEX='lex flex'). In this case, the command names must be tried in succession and the first match chosen. We'll need a convention to deal with spaces in a command name; maybe it should be cracked with shlex.split().
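
A minimal sketch of trying the names in succession, assuming shell-style quoting is the convention for names containing spaces. The first_available() helper is hypothetical; shlex.split() and shutil.which() are the standard library calls, and a fake lookup is injected so the example does not depend on the local PATH.

```python
import shlex
import shutil


def first_available(command_variable, which=shutil.which):
    """Return (name, full_path) for the first listed command found, else None."""
    for name in shlex.split(command_variable):
        path = which(name)
        if path:
            return name, path
    return None


# A fake lookup keeps the example independent of the local PATH.
fake_path = {'flex': '/usr/bin/flex'}
choice = first_available('lex flex', which=fake_path.get)

# Quoting keeps a name with an embedded space together.
quoted = shlex.split('"my lex" flex')
```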

(((TODO: It's not immediately obvious what to do if the command name is recognized but the command doesn't exist, or if multiple modules recognize the tool name. This is the least-baked portion of this proposal; there will need to be a lot of detail resolved before it can be implemented.)))

Not all tools are cross-compilers, so only some tools will need to do anything about it. At the risk of oversimplifying, Tools that support cross-compiling need to apply the appropriate cross-compile prefix (and suffix?) to their command name(s) when detecting if the tool exists and when constructing command lines. Otherwise, there's little the Tool modules themselves need to do to support cross-compiling (most of the pain is elsewhere).

Tool setup is in two stages. The first stage is when the tool is configured into the IAPAT. The second stage is when the IAPAT is applied to an Environment. The first stage is only done once, while the second stage may take place any number of times. The idea is to make the second stage as fast as possible, so the trade-off will be to get as much as possible done during the first stage.

During the first stage, variables are set in the IAPAT. (Note that the override variables for an IAPAT are distinct from the environment variables, so the Tool module will always be able to set the environment variables intelligently.) Probably iapat['BUILDERS'] should be filled in, but there may be no need to set up the Builder methods. There doesn't appear to be much else done by Tool modules; anything else should be considered on a case-by-case basis.

During the second stage, environment variables are copied to the new Environment and Builders are set up. There may be other tool-specific initialization behavior that will have to be considered on a case-by-case basis. (The strategy I would recommend would be that Tool modules could nominate a function to be run during phase two.)
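
The two stages might be separated along these lines. This is a sketch under several assumptions: the Iapat class, its phase_two list, and configure_xxx() are invented for illustration, a plain dict stands in for an Environment, and the '-O' flag is a placeholder value.

```python
import copy


class Iapat(dict):
    """Hypothetical IAPAT: a dict of variables plus deferred setup hooks."""

    def __init__(self):
        super().__init__()
        self.phase_two = []          # functions nominated by Tool modules

    def apply(self, env):
        # Stage two: copy the variables, then run the nominated hooks.
        env.update(copy.deepcopy(dict(self)))
        for hook in self.phase_two:
            hook(env)
        return env


def configure_xxx(iapat):
    # Stage one: done once, so any expensive probing would belong here.
    iapat['XXX'] = 'xxx'
    iapat['XXXFLAGS'] = ['-O']       # placeholder flag for illustration
    iapat.phase_two.append(lambda env: env.setdefault('BUILDERS', {}))


iapat = Iapat()
configure_xxx(iapat)
env_a = iapat.apply({})              # a plain dict stands in for an Environment
env_b = iapat.apply({})
```

The deep copy keeps each environment's list values independent; whether a cheaper sharing scheme is safe is exactly the open efficiency question for stage two.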

In general, a Tool module sets up one or more command variables. Unlike the current semantics, where the last Tool configured is the winner and sets up its construction variables, the new semantics should be that the first Tool configured for a given command variable should be the winner. Probably the easiest way to do this is for the IAPAT to keep a list of configured command variables and for each Tool to check the list to see if it should configure a given variable.
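
The first-one-wins rule could be as small as the following sketch, in which the ConfiguredVariables class and its claim() method are hypothetical names:

```python
class ConfiguredVariables:
    """Hypothetical record of which Tool claimed each command variable."""

    def __init__(self):
        self.owner = {}

    def claim(self, command_variable, tool_name):
        """Return True if tool_name now owns the variable, False if taken."""
        if command_variable in self.owner:
            return False
        self.owner[command_variable] = tool_name
        return True


configured = ConfiguredVariables()
results = [configured.claim('CC', 'gcc'), configured.claim('CC', 'sgicc')]
```

A Tool whose claim is refused would simply skip setting up that variable, leaving the earlier Tool's settings in place.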

If there's no override for the command variable (or if it's empty), the processing is very similar to the existing SCons machinery. The exists() method of the module is probed to see if the tool should be configured; these methods would have to be enhanced to detect if the command variable has already been configured. If it hasn't been, the method should determine the command name by checking to see if it exists in the PATH and return the name. (This is what most of them already return; it's not much of a leap to require it in all cases.) [Note: cross-compile] [Note: return value]

If the command variable is overridden, it's a space-separated list of command names. (This usage is not common, but autoconf supports it for things like LEX='lex flex'.) In this case, the exists() method must check each name in the list against the names it recognizes. If it recognizes a name, it should return that name. [Note: cross-compile] [Note: return value]

[cross-compile]: Tools that support cross-compiling must deal with the cross-compiling prefix and suffix. When generating names, they must add the prefix and suffix to the name they use; when recognizing names, they must strip the prefix and suffix (if it exists) before checking to see if it's a name they know.

[return value]: A possible alternative is that the exists() method return a bound function (or wrapper of some kind) that will generate the environment variables for the specified command name, but this is an implementation detail of the API that should be worked out later.
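
The prefix and suffix handling described in the cross-compile note can be sketched as a pair of helpers. The function names are hypothetical, and the prefix and suffix values are only illustrative.

```python
def add_affixes(name, prefix='', suffix=''):
    """Build the command name to look for, e.g. when probing the PATH."""
    return prefix + name + suffix


def strip_affixes(command, prefix='', suffix=''):
    """Recover the root name so a Tool can check whether it knows it."""
    if prefix and command.startswith(prefix):
        command = command[len(prefix):]
    if suffix and command.endswith(suffix):
        command = command[:-len(suffix)]
    return command


prefix = 'arm-linux-gnueabi-'     # illustrative cross-compile prefix
full = add_affixes('gcc', prefix)
root = strip_affixes(full, prefix)
```

Stripping is deliberately tolerant: a name without the prefix or suffix is returned unchanged, matching the "if it exists" wording above.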

xxxxxxxxxxxxxxxxxx TODO: got to here

If the command variable is overridden, it's a space-separated list of command names (not common, but think of LEX='lex flex'). Each name in the list is tried in turn to see if it exists in the PATH using the module's cross_compile_prefix member to know if the prefix should be applied. (Since I anticipate that this scan will be employed heavily, the result should be cached.) If none of the names are found, the method returns failure.

If a valid command is found, the module is asked if it recognizes the root name. If it doesn't, the method returns failure. Otherwise, an object is returned wrapping the object that will cause that command name to be generated when requested.

This requires five changes to each Tool module:

In these examples, we're going to assume a Tool module named 'xxx' whose command_variable is 'XXX' and which sets up a context variable named XXXFLAGS:

Processing II

The essence of the merged flow between SCons and GNU configure is in the ApplyModule method. (The flow for the other Apply* methods is more complex than suggested above as well, but let's focus on ApplyModule here and worry about the others later.)

The command variable that this module sets up is defined by the command_variable member of the module; it is set when the module is initialized. To avoid setting up the command multiple times, the command variable is checked to see if it's already in the context. If it is, we just return success.

If there's no override for the command variable (or if it is empty), it's very similar to the existing SCons scheme. The exists() method in the module is probed to find the tool's command name if it exists (this is what most of them already return; it's not much of a leap to require it in all cases). If the command doesn't exist, failure is returned; otherwise, a wrapper around the module is returned that will generate context variables for the specified command name.

If the command variable is overridden, it's a space-separated list of command names (not common, but think of LEX='lex flex'). Each name in the list is tried in turn to see if it exists in the PATH using the module's cross_compile_prefix member to know if the prefix should be applied. (Since I anticipate that this scan will be employed heavily, the result should be cached.) If none of the names are found, the method returns failure.

If a valid command is found, the module is asked if it recognizes the root name. If it doesn't, the method returns failure. Otherwise, an object is returned wrapping the object that will cause that command name to be generated when requested.
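
Put together, the ApplyModule flow described so far might look like this sketch. It simplifies the return value to a command name (or None for failure) instead of a wrapper object, and the apply_module() signature, the module's names attribute, and the which callback are assumptions; command_variable, cross_compile_prefix, and exists() follow the text.

```python
import shlex
from types import SimpleNamespace


def apply_module(module, context, overrides, which):
    """Return the command name to configure, or None on failure."""
    var = module.command_variable
    if var in context:
        return context[var]              # already configured: success

    override = overrides.get(var)
    if not override:
        # No override (or empty): ask the module, as exists() does today.
        return module.exists()

    prefix = module.cross_compile_prefix
    for name in shlex.split(override):
        command = prefix + name
        if which(command):
            # The first command found decides the outcome.
            return command if name in module.names else None
    return None                          # none of the names were found


lex_module = SimpleNamespace(
    command_variable='LEX',
    cross_compile_prefix='',
    names=('lex', 'flex'),               # hypothetical: names the module knows
    exists=lambda: 'lex',
)
on_path = {'flex'}                       # pretend only flex is installed
picked = apply_module(lex_module, {}, {'LEX': 'lex flex'}, on_path.__contains__)
```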

This requires five changes to each Tool module:

In these examples, we're going to assume a Tool module named 'xxx' whose command_variable is 'XXX' and which sets up a context variable named XXXFLAGS:

Original processing

In this section, we're going to consider configuring a tool named 'xxx' whose command variable is 'XXX' and which sets up an environment variable named XXXFLAGS to contain its command-line flags. Not all of the logic described here would go in a tool module; some of it would be contained in the higher-level logic described above.

Let's look at what a tool module must do:

Let's consider what has to be done to configure the 'xxx' tool.

If the command variable (XXX in our example) is present as a keyword, it contains a space-separated list of names (think of LEX='lex flex'). (If a list is given, that should work, too.) This list overrides the normal list otherwise used.

If the command name is not overridden, then each module is tried in turn. The module knows the command name(s) that should be tried. Each name is tried in turn, and if one is found, the module is used to configure the tool.

If the command name has been overridden, each name is checked to see if it is in the PATH. If it is, each module is polled to see if it recognizes the name. If a match is found, that module is used to configure the tool, otherwise a generic configuration is used.

If the tool compiles to binary code, the cross-compile prefix is prepended to the name when searching the PATH and setting the command variable. (The GNU convention is that cross compilers have a prefix similar to the canonical name in front of the command name.)

If the tool cannot be configured, an exception is raised. (??? or return an error? ???)

Examples:

Extensibility

The default toolchains can be changed by assigning a new list of toolchain names to the appropriate member in the platform configuration information.

The default modules to examine for a toolchain can be changed by assigning a new list of module names to the appropriate index for that toolchain in the platform configuration information.

The set of known toolchains can be extended by adding a list of module names to a new entry of the index of toolchains.
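
In terms of the p.tools list and p.tool dictionary used earlier, the three kinds of extension might look like the sketch below. A SimpleNamespace stands in for an instantiated IAPAT, and 'DOCtoolchain' is a made-up toolchain name.

```python
from types import SimpleNamespace

# Stand-in for an instantiated IAPAT; p.tools and p.tool follow the naming above.
p = SimpleNamespace(
    tools=['any', 'STDtoolchain'],
    tool={'STDtoolchain': ['any', 'SGItoolchain', 'GNUtoolchain']},
)

# Change the default toolchains.
p.tools = ['any', 'GNUtoolchain']

# Change which modules are examined for an existing toolchain.
p.tool['STDtoolchain'] = ['any', 'GNUtoolchain', 'SGItoolchain']

# Add a new toolchain ('DOCtoolchain' is a made-up name).
p.tool['DOCtoolchain'] = ['all', 'latex', 'dvips']
```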

Backward compatibility

This proposal moves SCons much more toward the set of GNU canonical names. Although the --build command-line option would default to the current SCons platform identifier, this has the risk of causing some subtle backward incompatibilities.

If no InformationAboutPlatformAndTools instance is passed to the Environment constructor, a default is used. This is somewhat different from the way an Environment is currently instantiated, so it has the risk of causing some subtle backward incompatibilities.

For compatibility with GNU autoconf, the default set of toolchains is configured with **ARGUMENTS for the keywords, so command-line overrides will be used during toolchain setup. A SConscript that already uses an environment variable defined by a tool will perform differently.


Add comments and fixes below here.

PlatformToolConfig (last edited 2008-07-19 21:38:15 by GregNoel)