Foreword
--------

Since I started maintaining the OS/2 port of gcc I've learned many useful
things about gcc internals (and even from RTFM) that many programmers who
just use gcc don't know, but which could be useful to them. Of course,
much of the information in this file is just my humble opinion, but I
couldn't resist the temptation to write such a document: knowledge that is
not shared is almost equivalent to no knowledge at all.

I will try to keep this file as up-to-date as possible, adding new
paragraphs at the beginning of the list (when I find any interesting
information) and removing old entries as they become outdated by new
versions of gcc. It would be hard for me to maintain this file (and for you
to read new versions of it) if I sorted the entries by theme, so I'll
sort them just by date (in decreasing order). Thus when viewing a new version
of this file, you can look just at the first few entries (until you find
something that you have already read).

--- 01 March 2003
    --- A note regarding compatibility with the win32 ABI: by default the
        compilers on Windows align doubles in structures on two-word
        (8-byte) boundaries. Such code executes faster on Pentiums and later
        but wastes memory and breaks compatibility with the published ABI
        for structures containing doubles, so gcc does not do this by
        default. However, you can still enable this feature (for example, if
        you port Win32 assembly code which refers to such member fields by
        numeric offsets) by using the -malign-double compiler flag. When
        this flag is in effect, #pragma pack(4) selects the old behaviour
        and #pragma pack(8) selects the 8-byte alignment.
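
        The effect of the two pragmas on a member offset can be checked with
        offsetof. The following is only a sketch: the structure and function
        names are hypothetical, and the offset under pack(8) depends on
        whether the target (or -malign-double) aligns double on 8 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical records, used only to illustrate member offsets. */
#pragma pack(push, 4)
struct rec_pack4 {
    char   tag;
    double value;   /* member alignment capped at 4: offset 4 */
};
#pragma pack(pop)

#pragma pack(push, 8)
struct rec_pack8 {
    char   tag;
    double value;   /* offset 8 if double is 8-byte aligned (for example
                       with -malign-double), offset 4 otherwise */
};
#pragma pack(pop)

size_t offset_pack4(void) { return offsetof(struct rec_pack4, value); }
size_t offset_pack8(void) { return offsetof(struct rec_pack8, value); }

/* Sanity check of the offsets described in the comments above. */
void check_layout(void)
{
    assert(offset_pack4() == 4);
    assert(offset_pack8() == 4 || offset_pack8() == 8);
}
```

        Assembly code that addresses `value' by a numeric offset must agree
        with whichever of these two offsets the C side actually uses.
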
    --- GCC has a very useful option for finding out why gcc cannot find its
        libraries, paths and so on. If you think it links with the wrong
        gcc.a or launches the wrong compiler (from a different path), you
        can run gcc with the --print-search-dirs switch and see where it
        will really look for all its subcomponents.

--- 24 November 2001

    --- The OS/2 port of gcc (and only it) has a new flag: -malign-strings.
        By default gcc used to align all string constants on a 32-byte boundary
        (!) which is far too much IMHO. I've added a CPU-type-dependent
        alignment (4 bytes for [34]86 and 32 bytes for pentium and later)
        and also added the -malign-strings=x switch to override this default.
        `x' is the base-2 logarithm of the alignment, e.g. -malign-strings=5
        will align on 32-byte bounds (the default value with -mcpu=pentium).
        The compiler itself is built with -malign-strings=0, which alone made
        the C++ compiler cc1plus.exe about 150K smaller! It looks like there
        is someone in the GCC team who works for Intel and thus aims to make
        gcc generate extremely bloated code. This 32-byte alignment is said
        to be `extremely' useful on Pentium III when doing inline strlen's
        and strcmp's. Phuf. If you care about that `gain', use -mcpu=pentium
        or -mcpu=pentiumpro. You can still optimize for pentium and not align
        strings (because you either know strlen speed is not critical or you
        simply don't do strlen on static strings) by using the
        -malign-strings=x flag (in fact, all the compilers were built this
        way -- most static strings here are just messages).
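
        To get a feeling for the cost, here is a small sketch of the
        arithmetic. The helper names are hypothetical, and the model (each
        constant is padded up to the next 2^x boundary) is a simplifying
        assumption, not a description of what the assembler actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* Alignment in bytes selected by an exponent x, as in -malign-strings=x. */
size_t align_bytes(unsigned x)
{
    assert(x < 16);             /* keep the shift small and well defined */
    return (size_t)1 << x;
}

/* Padding bytes needed after `size' bytes so that the next object
   starts on a 2^x-byte boundary. */
size_t padding_for(size_t size, unsigned x)
{
    size_t a = align_bytes(x);
    return (a - size % a) % a;
}
```

        For a 3-byte constant such as "ok" (two characters plus the
        terminating NUL), padding_for(3, 5) gives 29 wasted bytes,
        padding_for(3, 2) gives 1, and padding_for(3, 0) gives none.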

--- 20 November 2001
    --- This entry is going to be long, since it's the first entry and
        I already have a lot of interesting information to share :-)

    --- gcc 3.0 has the annoying habit of aligning its stack boundary to
        128 bits (i.e. 16 bytes). This generally produces larger files.
        To avoid this, use -mpreferred-stack-boundary=2 or whatever you
        like (the argument is the base-2 logarithm of the alignment in
        bytes, so 2 means 4 bytes and the default of 4 means 16 bytes).
        The 16-byte stack alignment makes sense only when using SSE
        instructions (but is always enabled, no matter whether you use
        them or not).

    --- The default CPU type is Intel 386. You can generate faster (but
        generally larger) code by using -mcpu=xxx where xxx is one of
        (as of gcc 3.0.2): i386, i486, i586, pentium (=i586), i686,
        pentiumpro (=i686), k6, athlon.

    --- For all Intel CPUs later than the i386, and for the Athlon, the
        default alignment for functions, loops and jump labels is 16 bytes,
        and for the K6 it is even 32 bytes! This can generate lots of unused
        space (NOPs) in your executables, especially if you use lots of
        small functions and many loops and labels. If you use -mcpu=xxx and
        you care about space more than speed, use the -malign-loops=2,
        -malign-jumps=2 and -malign-functions=2 switches, which will align
        everything on a double word (4-byte) boundary.

    --- The semi-legendary MMX and SSE support which is declared in the
        documentation does NOT mean that gcc generates SSE or MMX from C/C++
        code. It just means you can use a number of special gcc builtin
        functions such as __builtin_ia32_emms(), __builtin_ia32_psllw()
        and so on instead of directly writing __asm("emms") etc. Of course,
        it is nicer to write MMX code in a C-like manner than directly
        in assembly language (especially considering that GAS uses AT&T
        syntax, which is hard to understand for those used to Intel syntax)
        but you have to get the respective header files and/or docs from
        somewhere (I believe they should exist somewhere in the Linux world).
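
        As a minimal illustration of the builtin form, the sketch below
        clears the MMX state through __builtin_ia32_emms() when the compiler
        reports MMX support. The function name is hypothetical, and the
        __MMX__ guard is an assumption about how a given gcc build signals
        that the builtin is usable.

```c
/* Returns 1 if the MMX state was cleared through the builtin,
   0 if the builtin is not available in this build. */
int clear_mmx_state(void)
{
#if defined(__GNUC__) && defined(__MMX__)
    __builtin_ia32_emms();      /* same instruction as __asm__("emms") */
    return 1;
#else
    return 0;                   /* no MMX support: nothing to clear */
#endif
}
```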

    --- Starting with gcc 3.0.1-beta I added support for weak symbols in GNU
        ld and emxomf. This raised a number of hard-to-solve problems,
        especially in OMF, since OMF does not support weak symbols. I did a
        kludge (that works surprisingly well) but you should keep in mind
        some details about how they are implemented in order to avoid
        linking errors.

        Here is how it works. As many people know, gcc never generates OMF
        files directly. Instead, it generates a.out files which then are
        run through emxomf in order to convert them from a.out to OMF.
        Finally, in the link stage, LINK386 is run.

        Now, the weak symbol attribute takes effect at the link stage. Having
        a symbol marked as weak means that if there is another symbol with the
        same name and also marked as weak, a duplicate symbol error is not
        generated. Instead, the linker arbitrarily chooses one of them
        (usually the first one encountered). In fact, the "real" weak
        attribute also means that if somebody refers to a weak external
        symbol, and that symbol is not found anywhere, it is resolved to
        NULL. But these semantics of weak symbols are supported only in the
        a.out format, not in OMF.
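
        In source code the two meanings of "weak" look roughly like this.
        This is a sketch only: it assumes GNU C and an object format whose
        linker supports weak symbols (a.out as described here, or ELF), and
        optional_hook and default_value are hypothetical names.

```c
#include <stddef.h>

/* Weak external reference: defined nowhere in this example, so the
   linker resolves its address to NULL instead of reporting an error. */
extern int optional_hook(void) __attribute__((weak));

/* Weak definition: a non-weak definition of the same name in another
   object file would silently replace this one. */
__attribute__((weak)) int default_value(void)
{
    return 42;
}

/* Call the hook if somebody provided it, otherwise use the default. */
int call_hook_or_fallback(void)
{
    if (optional_hook != NULL)
        return optional_hook();
    return default_value();
}
```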

        Since I don't have any control over LINK386 (and no sources for it) I
        had to modify emxomf. The only way to do it was to keep a list of the
        weak symbols of your project in an external file, which is not lost
        between the numerous runs of emxomf. Suppose emxomf converts an a.out
        file to OMF and finds a weak symbol. Now it looks into that table:
        have we encountered this weak symbol already, or is this the first
        time? If we have encountered it, the symbol is converted to a
        non-exported local symbol (so that LINK386 won't barf with a duplicate
        symbol error). If not, the symbol is marked as a normal symbol, and it
        is added to that external table, along with the name of the object
        file where it was encountered for the first time. The file name is
        needed so that when the same file is re-compiled (for example after
        you modify the source) emxomf can detect that this symbol should be
        declared as external and not as a local non-exported symbol, even
        though it is mentioned in the weak symbol table.
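
        The decision described above can be sketched as follows. This is not
        the actual emxomf code: the table lives in memory here instead of in
        a file, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_WEAK 64

/* One remembered weak symbol and the object file that owns it.
   The strings are not copied, so they must outlive the table. */
struct weak_entry { const char *symbol; const char *object; };
struct weak_table { struct weak_entry entry[MAX_WEAK]; int count; };

enum weak_action { EMIT_GLOBAL, EMIT_LOCAL };

/* Decide how a weak symbol found in `object' should be emitted. */
enum weak_action classify_weak(struct weak_table *t,
                               const char *symbol, const char *object)
{
    assert(t != NULL && symbol != NULL && object != NULL);

    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->entry[i].symbol, symbol) == 0) {
            /* Already known: only the object that first defined it
               (possibly being re-compiled now) keeps it global. */
            return strcmp(t->entry[i].object, object) == 0
                       ? EMIT_GLOBAL : EMIT_LOCAL;
        }
    }

    /* First occurrence: remember the owner, emit a normal symbol. */
    if (t->count < MAX_WEAK) {
        t->entry[t->count].symbol = symbol;
        t->entry[t->count].object = object;
        t->count++;
    }
    return EMIT_GLOBAL;
}
```

        With an empty table, a symbol first seen in a.o stays global, the
        same symbol seen later in b.o becomes local, and re-compiling a.o
        yields a global symbol again.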

        This file is called weaksyms.omf. If you see such a file in your
        project's directory, it means your project uses weak symbols. Don't
        delete it; after some time it will cease to grow and it will contain
        a complete list of the weak symbols of your project.

        If your project uses multiple directories, and your makefile changes
        the current directory, then emxomf can miss the weaksyms.omf file and
        start the list from scratch. In this case you can get duplicate symbol
        errors. The solution is to maintain one unique weak symbol list by
        setting the GCC_WEAKSYMS environment variable to point to some file
        where all your weak symbols will be collected:

            set GCC_WEAKSYMS=d:/myproject/weaksyms.omf
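
        A tool that honours this variable could locate the list roughly as
        shown below. This is a sketch, not the emxomf source: the function
        name is hypothetical, and treating an empty value like an unset one
        is an assumption.

```c
#include <stdlib.h>

/* Path of the weak symbol list: the value of GCC_WEAKSYMS if it is
   set and non-empty, otherwise weaksyms.omf in the current directory. */
const char *weaksyms_path(void)
{
    const char *path = getenv("GCC_WEAKSYMS");
    return (path != NULL && path[0] != '\0') ? path : "weaksyms.omf";
}
```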

        And a final remark. libstdc++ contains a lot of weak symbols, and
        if your application uses libstdc++, emxomf must know about them in
        advance. For this, before starting the compilation of your project you
        must copy the weaksyms.omf file supplied with gcc (the file name is
        lib/gcc-lib/i386-pc-os2_emx/x.x.x/weaksyms.omf) to your project's
        directory.
