Welcome to the Google Summer of Code for 2007
This is the SCons idea page. If you're new to SCons, let us introduce ourselves to you. If you're interested in submitting a proposal, look at our proposal requirements. There's also a copy of the announcement that we sent out originally about our participation.
Otherwise, just dive right in! Each idea listed below has only a short description of the task; if there's more information beyond the few paragraphs here, it will be in a separate page. And if you think of an idea that's not listed here, don't hesitate to suggest it---just bring it up either on the developers' mailing list or on the mentors mailing list. SCons is a general-purpose tool, so we don't have as concrete a list of objectives as a more focused project would have. Moreover, it's not possible that we've thought of all the good ideas!
- Fix bugs
- Glob() function
- Automake and WAF
- SubstInFile builder
- Revamped option handling
- Smarter treatment of intermediate targets
- Don't rebuild if only comments differ
- Shed Skin Python-to-C++ compiler
- Performance Improvements
- Distributed Compilation
- Better Java Support
- Improved Target-File Caching
- Another idea
As with all projects, bugs and requests for enhancements and new features accumulate. Many of them only involve a couple of weeks worth of effort, but the primary developers are busy enough that there's just not enough time to get to them. Dealing with a few small bugs, a couple of medium enhancements, or a single large new feature (and including tests and documentation) is a tremendous way for inexperienced programmers to get their feet wet and gain experience.
The idea here is to choose a reasonable set of bugs to tackle: not too hard and not too easy. You can get advice from the mailing list as to the difficulty of various bugs, but judging how long it will take to do a task is one of the skills that young programmers must develop.
The issues database contains the list of defects that would be the starting place to look for bugs to fix. It also contains the requests for enhancement and new features that can also be mined for potential ideas.
The overall problem is that a SConscript doesn't generally know what the current directory is when it's run, yet it would like to scan for particular names. Moreover, the names may not exist yet, but may be names known to SCons as something it plans to create. The idea here is to write a function (named Glob()) that scans the SCons Nodes, the build directory, the source directories, and any repositories to find all the names that match. Bug#512 describes one instance of this problem.
The BuildDirGlob page offers a starting point, but the solution there only looks at Nodes (and isn't terribly optimal). It needs to be extended to merge in any names found in the actual filesystem. It also needs to be optimized and have documentation written.
This is not a large or complicated task. By itself, it's not sufficient to support a summer's worth of effort, but it could be grouped with other small tasks to fill up the time.
Automake and WAF
For the most part, SCons is very make-like. If you know make, many of the SCons commands feel very familiar. There are commands to make a program out of a set of source files, commands to make libraries, and so forth; for the most part, they are direct transliterations of the equivalent make incantation.
GNU has a tool called automake that takes a description of a build and translates it into a makefile for make to process. In the GNU world, automake is considered the higher-level interface, but in reality, they are both just ways to generate a graph of the build. The graph-runner happens to be built in to make, so automake must create an intermediate file to get the graph to the graph-runner.
The idea here is to have SCons understand the automake model. There appear to be at least two different ways to tackle this need. Neither seems to have an overwhelming advantage over the other; in the long run, both could be supported. Both are difficult, high-risk, high-payoff tasks, and should not be undertaken lightly, but for someone with the right skill set, it could be a fun challenge.
Source files for automake (called .am files because of their most-common suffix) are a well-understood way of describing a build. The most direct approach would be for SCons to parse something very similar to a .am file. If the language understood by SCons was close enough to that understood by automake, it would be possible for an automake user to convert to SCons with a minimum of hassle.
So the task here is to analyze the automake language and create a parser that directly generates the build graph in SCons. Note that the automake language seems simple enough that a full-fledged parser is probably not needed; all that may be required is a scanner to crack the input lines, supervised by higher-level logic to put the pieces together.
WAF is technically a fork of an older version of SCons, but it's been rewritten so extensively that, for all practical purposes, it should be considered a separate code base. It has no documentation, but the samples suggest that it has a model very much like automake: there are generators that yield objects of various stripes (programs, libraries, and so forth), and attributes (sources, compile flags, and so forth) are added to the objects to describe what they should do.
So the task here is to analyze the WAF codebase and examples, determine how to add its features to SCons, and do the integration.
This is a builder that takes a file containing a template and fills it in using a dictionary of names and values. There's a starting implementation in SubstInFileBuilder. This project would turn that into the be-all end-all of template builders. Here's some ideas for how it could be better:
- Currently it just does simple value substitution; it could do full python expression substitution (although that may be overkill?).
- It could take prefix/suffix arguments so template variables could look like %FOO% or $(FOO) or whatever people want. Handling parens like this means maybe handling nested parens in the expressions. Even should allow for no prefix/suffix.
- It should be able to take a construction environment as the dictionary, so construction variables get expanded. This shouldn't be the default because it's a little unsafe (you never know what stuff might appear in an Environment in the future and break your build).
- There could be a better interface for passing the substitution dict than just overriding SUBST_DICT; apply some creativity here! A Value node or something possibly?
It also needs a test suite and documentation. There's some documentation in this wiki at DeveloperGuide/TestingMethodology.
This is a reasonably-sized project, and well-contained so it should be very easy to integrate the result into the release with minimal disruption. It shouldn't take more than two or three weeks: a week for preparation (exploring ideas, specification, writing some tests), a week for implementation and testing, and a week for documentation and fixing bugs.
Revamped option handling
SCons has three completely-independent mechanisms for providing variations in a build:
- An Optik parser for command-line flags. This is a fine system, but it's not extensible to allow flags to be declared after the initial parse.
- A home-grown parser for dealing with variable assignments on the command line and in files. It isn't bad and probably is a reasonable basis for expansion.
A configure subsystem that extracts information from the machine and operating system. It currently has only a few actions implemented, and its integration with other components isn't as smooth as one would like.
All of them have in common that they convert external information into an internal form, so that user code (or even system code) can make decisions based on it. All of them have ways to validate the external information, change the shape of the external information into a more palatable internal form, make decisions based upon the values, and usually have the effect (or sometimes side-effect) of changing the values of variables in the construction environment. The first two also have in common that they want to provide help text to the user upon request.
The idea here is to integrate the three mechanisms, or, at worst, integrate the first two and provide a foundation for the third. Ideally, it should be fully backward-compatible with the existing Options() functionality.
There's been some discussion on the mailing list about this, and a (partial) specification is available. However, there are still a few points for which no consensus emerged; these would need to be resolved. The most likely way to tackle it would be to translate the single-character flags into their token equivalents as a special case so that all the later processing only has to deal with tokens.
Mentor(s): Probably GregNoel
Smarter treatment of intermediate targets
Bug#583 is about implementing something similar to the .INTERMEDIATE and .SECONDARY special targets in GNU make (see the description in the manual). A SECONDARY target is only built if it is needed by a downstream target that's being rebuilt. An INTERMEDIATE target is a SECONDARY that is automatically deleted once all downstream targets using it are successfully built.
This idea will need four things to be implemented:
- The Intermediate() and Secondary() functions to mark Nodes appropriately.
- The logic to delete an intermediate target when the need for it is done.
- The logic to figure out if a target is up-to-date even if intermediate or secondary targets are missing.
- The logic to rebuild missing intermediate targets if a downstream target needs them.
Don't rebuild if only comments differ
If only the comments in a file change, there's no need to rebuild anything that depends on it. This can easily be accomplished by stripping out comments (and other extraneous content) when calculating the signature. Bug#193 discusses this, including some tips for implementation, although the approach suggested there may not be adequate in all cases.
Since this change would cause almost all targets to be rebuilt the first time the new release of SCons is run, this change should be packaged with other updates that do the same. The BigSignatureRefactoring page appears to be the meeting place for coordinating this.
Mentor(s): Probably StevenKnight
Shed Skin Python-to-C++ compiler
This isn't a SCons-related project, per se, but the potential benefit to SCons is so great that we'd like to call attention to it.
Shed Skin is a young project trying to create a compiler that will translate Python source into C++, which can then be compiled into machine code and operate at the fastest possible speed. The potential performance improvements are enormous and SCons could use such a boost. Using Shed Skin on SCons is still a long way away (years), but if Shed Skin isn't stressed by real-world applications, it will never become more than an academic exercise.
Shed Skin's focus is to create a compiler that produces a stand-alone program, completely eliminating any need for the Python compiler and run-time to be present. In contrast, SCons will continue to need Python to be present so it can run SConscripts, and will need to be able to freely call back and forth between compiled and interpreted code, so the focus is very different.
So SCons is a real-world application that stresses areas of Shed Skin that might not otherwise be on their roadmap. It's also more than an order of magnitude larger than the largest program they've ever handled successfully (about 35,000 lines of heavily-commented source, probably corresponding to about 25,000 lines of actual code).
So the idea here is to apply Shed Skin to SCons, study the breakage, pick out an aspect that would help Shed Skin, SCons, and the Python community at large (two out of three ain't bad), and apply to Shed Skin (via the Python umbrella project) to do the work. We can help in some aspects (isolating test cases, suggesting areas to stress, and the like; we're even willing to make some source code changes), but the supervision of the project itself would remain with Shed Skin.
SCons would get much wider adoption if its performance were improved.
* One idea is a SCons daemon that monitors filesystem changes. SCons could query this daemon to quickly learn what files have changed since the last build. The daemon might also keep the SCons environment in memory to make "startup" instantaneous. On Windows, the daemon could easily be implemented using public APIs (FindFirstChangeNotification, ReadDirectoryChangesW). On IRIX there is a bit of prior art somewhere--several years back someone prototyped something like this on IRIX using the fam utility to monitor changes to files--but it never became widespread.
* For big projects with multiple directories it would be nice to have an 'instant build' option that would just build the changed files in that directory, sort of as a syntax check before building and linking everything that depends on the changed files in other directories.
Any project to work on performance should describe how the project plans to measure the performance. It's not enough to just say, "I'll use the Python profiler," the idea is to discuss how you'd use it, what pieces you think you'll measure, how you plan to set reasonable performance targets and track progress against them, etc. Development of any necessary tools, subsystems, or other infrastructure for measuring performance is fine, but should ideally aim for tools that are generally re-usable and useful in the future, not one-offs just for the project.
Mentor(s): probably StevenKnight
A model for being able to parallelize software builds by distributing work across a network of homogeneous machines would benefit many large projects. One possible approach would be to try to integrate (and generalize) some functionality like distcc, perhaps similar to the way that SCons has already integrated some ccache-like functionality. Another more ambitious approach would be to try to make use of existing grid computing systems. Since the scope of this project is only a summer's worth of coding, there probably won't be enough time to solve this big a problem completely, so a better idea is to try to carve out a manageable first step in your proposal.
(Note: Incredibuild recently added support for distributed SCons builds, which would meet the needs of some users.) (Note: Incredibuild is a closed source commercial Product, so it is no option for open source software)
Mentor(s): probably StevenKnight
Better Java Support
SCons' Java support gets dinged by Java programmers for being relatively limited to one particular model: building entire subdirectories of .java files into one or more .jar files. The Java support could use some refactoring (even redesign) by someone who's familiar with the needs of large-scale Java projects, and has an itch to try to use the underlying flexibility of SCons to do better than what's out there. There are a number of different pieces that could be carved out as well:
More Ant-like Behavior?
Ant is obviously the standard for Java compilation. As a general dependency tool, it leaves something to be desired, largely because so much of the nitty-gritty Java dependency management is actually in javac, not Ant. Nevertheless, Java programmers are familiar with the Ant model, and to the extent that we can find SCons-like ways to adopt parts of it, it would help make SCons more attractive (especially to Java programmers who are part of multi-language software projects).
Port to Jython
Making SCons run under Jython, thereby allowing it to run under the Java VM, might also help make it more attractive to the Java community. Since SCons is all Python, it's actually not far from being able to do this, but there's a serious barrier with respective to Java's complete lack of a notion of a current directory and being able to chdir(). Working around that might involve some serious refactoring to completely eradicate chdir() calls from the SCons source.
Mentor(s): ??? Possibly StevenKnight
Improved Target-File Caching
The SCons CacheDir() function provides a framework for sharing built targets between developers, but it's relatively primitive. A Project to tackle one or more potential improvements to it would be extremely welcome. Possible objectives include:
- Administrative tools for managing the cache directory, including aging/clean up of old targets, limiting the size of cache, etc. All subject to configuration options, of course.
- Some mechanism for locking write access to target files when multiple clients try to create the same target--or better, some distribution of locking logic so clients can detect that some other client is in the progress of building a target file that will end up in the cache when finished.
- Measure the efficiency and scalability of caching and address any issues so that it can handle thousands of source files in a tree and tens of thousands of targets.
Mentor(s): probably StevenKnight
Idea subhead one
Idea subhead two