Porting R Packages to x64 Windows

As from R 2.11.0 there is an R-core distribution of R for x64 Windows (Windows running in 64-bit mode on x86_64 aka amd64 chips, as distinct from ia64 Windows for Itanium chips).

This is built using a snapshot of the mingw-w64 toolchain. To distinguish the compiler which targets x64 Windows from that which targets x86 (the usual 32-bit Windows) it is called x86_64-w64-mingw32-gcc rather than gcc, so any explicit references to the compiler need to be changed.

The rest of this document covers hints gleaned from porting almost all CRAN and many BioC packages.

Compiler version

The compiler used on x64 Windows for R 2.11.x is gcc 4.4.4, and gcc 4.5.1 will be used for R 2.12.x. The 32-bit Windows build uses gcc 4.2.1 (but that for R 2.12.0 will use gcc 4.5.0). The GCC compilers have developed quite a bit since 4.2.1, and in particular the C++ compiler is less lax in its requirements for standard-conforming C++ code. gcc 4.4.4 is now used for Linux testing and has shown up some issues, and toolchains based on gcc 4.5.x show more. So better C++ and Fortran code is required (although nothing like as strictly conformant as required by some other C++ compilers).

C/C++ types

On most 64-bit platforms, C type long has 64 bits, but not on x64 Windows. If you need an integer to hold a pointer, the correct C99 type is uintptr_t or perhaps intptr_t. Otherwise be careful to use the correct modern type such as size_t rather than long or unsigned long.

If you need a type of a specific size, use int32_t etc, and include the header stdint.h (see the next section).
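
As an illustration of the advice above, here is a minimal sketch. The function names are hypothetical and not taken from any package: one function round-trips a pointer through uintptr_t, and the other computes a byte count in size_t rather than long.

```c
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uintptr_t, int32_t */

/* Round-trip a pointer through an integer type. uintptr_t is wide enough
   to hold a pointer on both 32- and 64-bit Windows, whereas long has only
   32 bits on x64 Windows. (Hypothetical helper for illustration.) */
static int pointer_survives_round_trip(void *p)
{
    uintptr_t as_int = (uintptr_t) p;
    return (void *) as_int == p;
}

/* Byte count for n doubles, computed in size_t rather than long.
   (Hypothetical helper for illustration.) */
static size_t bytes_for_doubles(size_t n)
{
    return n * sizeof(double);
}
```

Had the pointer been stored in a long, the upper 32 bits would be discarded on x64 Windows.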

One problem that arises is using printf and friends with types such as size_t which on other platforms correspond to unsigned long and so can be converted using the format %lu. ISO C99 has a modifier ll for use with long long, which is 64-bit on x64 Windows, but this is not implemented in MSVCRT.dll. What is implemented are modifiers I, I32 and I64, for size_t and 32- and 64-bit integers respectively. These modifiers are specific to Windows, so special-cased code will be required.
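
One way the special-casing might look is sketched below. The macro names FMT_SIZE_T and ARG_SIZE_T and the function format_size are our own inventions for illustration, and we assume snprintf is available in the toolchain. On Windows the I modifier described above is used. Elsewhere the value is cast to unsigned long and printed with %lu.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper macros (the names are illustrative, not from any
   header). Windows: use the MSVCRT-specific I modifier for size_t.
   Other platforms: cast to unsigned long and use %lu. */
#ifdef WIN32
# define FMT_SIZE_T "%Iu"
# define ARG_SIZE_T(x) (x)
#else
# define FMT_SIZE_T "%lu"
# define ARG_SIZE_T(x) ((unsigned long) (x))
#endif

/* Write a size_t into buf as decimal text; returns snprintf's result. */
static int format_size(char *buf, size_t buflen, size_t value)
{
    return snprintf(buf, buflen, FMT_SIZE_T, ARG_SIZE_T(value));
}
```

Keeping the platform test in one pair of macros means the rest of the code can call format_size without any further conditional compilation.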

Mingw64 uses the *printf functions in MSVCRT.dll whereas recent versions of Mingw32 use an enhanced version which is statically linked as part of libmingwex. This means that a number of C99 features are unavailable on x64 Windows: one encountered in package igraph is format specifier %Ld for long doubles.
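
One possible workaround for long doubles, if the extra precision is not needed in the printed output, is to convert to double before printing. The sketch below is an assumption on our part rather than the fix adopted in any particular package, and format_long_double is a hypothetical name.

```c
#include <stdio.h>

/* Print a long double by converting it to double first, so that no
   L length modifier is needed. Digits beyond double precision are lost.
   (Hypothetical helper for illustration.) */
static int format_long_double(char *buf, size_t buflen, long double value)
{
    return snprintf(buf, buflen, "%.6f", (double) value);
}
```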

Headers

The C headers available in the x64 toolchain are similar to those used for 32-bit MinGW, but differ in a number of details.

Code which makes use of C99 types such as int32_t should include

#include <stdint.h>
With MinGW this header is often included as a side-effect of including other headers (e.g. wchar.h), but correct code will include it explicitly.

Standard Windows headers are present, but older Unix-alike headers might not be. One example was values.h (which has been replaced by limits.h in C99).
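
For instance, code that took MAXINT from values.h can use INT_MAX from limits.h instead (and DBL_MAX from float.h in place of MAXDOUBLE). A minimal sketch, in which clamp_to_int is a hypothetical function used only for illustration:

```c
#include <limits.h>   /* INT_MAX, INT_MIN: standard replacements for MAXINT */

/* Clamp a double into the range of int using the limits.h constants.
   (Hypothetical helper for illustration.) */
static int clamp_to_int(double x)
{
    if (x >= (double) INT_MAX) return INT_MAX;
    if (x <= (double) INT_MIN) return INT_MIN;
    return (int) x;   /* truncates towards zero */
}
```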

Identifying macros

Compiled code which needs to depend on the OS or compiler does so by code like
#ifdef SYMBOL
...
#endif
where the recommended symbol for Windows has been WIN32. That remains the recommended symbol as it is also defined by the x64 Windows compilers, which also define WIN64. Thus code which needs to differ by pointer size can be coded as
#ifdef WIN64
x64 Windows definitions
#elif defined WIN32
x86 Windows definitions
#else
other platforms
#endif
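
A concrete instance of the pattern above might look as follows. This is only a sketch: platform_name is a hypothetical function that reports which branch the preprocessor selected.

```c
/* Report which branch of the WIN64 / WIN32 test was taken at compile time.
   WIN64 must be tested first, because the x64 compilers define WIN32 too.
   (Hypothetical helper for illustration.) */
static const char *platform_name(void)
{
#ifdef WIN64
    return "x64 Windows";
#elif defined WIN32
    return "x86 Windows";
#else
    return "other";
#endif
}
```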

Problems have arisen where packages have imported code that uses macros that identify compilers rather than Windows, such as __MINGW32__ and _MSC_VER. If possible such code should be re-written to use the preferred macros. Something like

#ifdef WIN32
#define _MSC_VER
#endif
may work, but it may also activate unsupported features of the Microsoft compilers.

Makefiles

A package using src/Makefile.win or a makefile in some other directory (for example a sub-directory of src) will need to be changed to use the correct toolchain. To do so, add at the top of the makefile the line

include $(R_HOME)/etc$(R_ARCH)/Makeconf
and make sure that macros such as $(CC), $(CXX), $(DLLTOOL), $(AR) and $(RANLIB) are used rather than references to explicit tools.

Such a package will need to depend (in the DESCRIPTION file) on R (>= 2.9.0).

This applies equally to packages such as mapdata which compile up an executable to convert data files. Whereas a 32-bit executable might be suitable, there is no guarantee that 32-bit compilers are available when a package is installed.

Note the $(R_ARCH) in the include line. From R 2.12.0 joint 32/64-bit binary packages will be distributed (in the same way as is done for Mac OS X), and to make this possible the 64-bit build is regarded as a sub-architecture (see the 'R Installation and Administration Manual') with name x64 and hence in particular the package DLL is installed in pkg_name/libs/x64. Packages which use configure.win to install dependent DLLs will need to be adjusted to use that path.

Those few packages (e.g. Rcpp, RInside and rJava) which install DLLs and static libraries elsewhere are advised to make use of this convention.

Make macros WIN (values 32 or 64) and R_ARCH are defined in $(R_HOME)/etc$(R_ARCH)/Makeconf (which is included automatically when Makevars.win is used). In addition, R_ARCH is set as an environment variable for use by configure.win. These will be unset in earlier versions of R, so code which uses them should take that into account.

Registry entries

64- and 32-bit Windows programs have different views of the system part of the Windows Registry, so there is unlikely to be confusion for those packages which look in the Registry for locations of software. For example, a 64-bit program will see the information on 64-bit Java, and a 32-bit program will see information on 32-bit Java.

This is equally true for the (optional) Registry entries added by the R installer. However, to enable 32-bit and 64-bit versions of R to be installed simultaneously, the Start Menu and Quick Launch entries have 'x64' added to the application name.

The position is less simple for non-administrative installs of R, for the portion of the Registry they use is shared by different sub-architectures. For ways to overcome this, see a recent version of Writing R Extensions.

Dynamic vs static linking

Some versions of the toolchain used prefer dynamic linking (as do Unix-alike toolchains and recent builds of MinGW). Thus the Fortran runtime may be in a DLL libgfortran-3.dll and the C++ runtime in DLLs libstdc++-6.dll and libgcc_s_sjlj-1.dll (and possibly others and with slightly different names). We have modified the toolchains used to build R and the CRAN binary packages to use static linking for these libraries, but that will not apply to all unmodified toolchains.

The use of dynamic linking has a number of consequences, the most important of which is that users of packages will need these DLLs in a place where they will be found. Another issue is the licence requirements for these DLLs. GCC is distributed under GPL, and this is considered to apply to these DLLs, so distributing them imposes the requirement of promising to supply their sources.

Dynamic linking raises the possibility of having 32- and 64-bit DLLs of the same name in the PATH at the same time, and experiment shows that Windows gets confused by that, trying the first DLL on the path and reporting an error (confusingly, the mingw-w64 loader currently reports that the DLL is not a Win32 application, when it means that it is not a Win64 DLL). We have used static linking of external libraries where possible as a conservatively safe strategy, and otherwise, where possible (libxml2, libcurl), arranged to install library DLLs in parallel to the package's DLL. This is not possible for Gtk2, and care with paths will be needed if it is necessary to use package RGtk2 with both 32- and 64-bit Windows R. One way to do this is to set PATH in e.g. R_HOME/etc/x64/Renviron.site.

Sources of precompiled software

Most precompiled software for x64 Windows has been built with the Windows Platform SDK (a version of Visual C++) or Visual Studio 2005/2008/2010. This often contains DLLs linked against later Microsoft runtime DLLs than the standard MSVCRT.dll, for example MSVCR80.dll, and such DLLs are often not installed in generally accessible locations (other software may have private copies).

Some Open Source projects are using mingw-w64 to target x64 Windows. For example, Gtk+ provides suitable DLLs at http://www.gtk.org/download-windows-64bit.html.

There has been a one-man effort to provide a comprehensive set of tools for x64 Windows called WPG System64. Its main site was www.cadforte.com, which disappeared for January/February 2010, then returned and disappeared again in May 2010 (but see also www.horizonchess.com/wpg64/). This contains DLLs for libxml2 (used for packages XML and igraph) and curl (used for RCurl). It also contains graphviz, which we have used to port Bioconductor package Rgraphviz. Note that DLLs from such projects may be dynamically linked against compiler-runtime DLLs: fortunately, this has not been encountered in the DLLs we needed to use.

A small subset of WPG containing enough of graphviz to use Rgraphviz at least for its own tests and CRAN packages using it can be found at http://www.stats.ox.ac.uk/pub/Rtools/goodies/Win64/. The rendering plugins should all work, but not all the visualization ones.

Note that there are cross-compilers from i686 and x86_64 Linux as part of the MinGW-w64 project, and these may well provide the simplest way to build external libraries. Typically these will need to be configured by

./configure --host=x86_64-w64-mingw32 --enable-static --disable-shared
This has been used for GDAL and its dependencies expat and proj4, and for SYMPHONY (with some modifications), fftw-2 (ditto), fftw-3, gsl, netcdf 4.0, mpfr and udunits.

Changes for R 2.12.x

Packages which have no src/Makefile.win and only an empty configure.win will automatically be built for both sub-architectures (32- and 64-bit) in R 2.12.x. Other packages are likely to be built only for 32-bit (which suffices for some such as FunctSNP and mapdata) unless extra flags such as --force-biarch or --merge-multiarch are set manually by the CRAN maintainers.

The mingw-w64 toolchain has now adopted the MSVC convention of not using leading underscores on symbols: indeed snapshots of `1.0' since late April 2010 use it, and R 2.12.0 will be built using such a toolchain (see Win64No_toolchain.zip for a current snapshot).

Toolchains which use the no-underscore convention will be incompatible with static libraries compiled under the previous toolchain (although direct linking to DLLs is unaffected). On the other hand, static and import .lib libraries compiled under MSVC will be usable. (R 2.11.x cannot be built with such a toolchain.) Software used for CRANxtras compiled with the no-underscore convention can be found at http://www.stats.ox.ac.uk/pub/Rtools/goodies/Win64No_/

This does mean that writing packages to link directly to external DLLs rather than to import libraries is more portable.


Last edited on 07 October 2010 by Prof Brian Ripley (ripley@stats.ox.ac.uk)