C programming in Mkos version p1
================================
I started writing Mkos in assembly only, but decided early in the
project that I wanted to be able both to write kernel code in C and to
support C as a user space language. Let's see how C programming is
supported in Mkos version p1.
The C standard library
----------------------
All standard library code is written for Mkos; it does not use a
third-party standard library implementation. The standard library is
incomplete: I'm writing each part of it as the OS needs it.
Overview
--------
The Mkos build system uses the 16-bit OpenWatcom toolchain to build a C
program. The wcc compiler compiles C code to Relocatable Object Module
Format (OMF) object files. The wlink linker links the OMF files and
emits a program image.
After linking, a patch tool populates a small header in the program
image with data the kernel needs to load the program.
Compiling
---------
The wcc compiler takes C programs as input and compiles them to OMF
object files. These are the wcc options we use that affect the output
object file. They are common across the different areas of the code
base (kernel code, user code, etc.).
``0``: emit 8088/8086 instructions.
This option makes wcc emit only 8086 instructions, which is required for
our goal of staying backward compatible with the 8086.
``od``: disable all optimizations.
The rationale for disabling optimizations is that they may make the
generated code diverge from the source code in unexpected ways, making
debugging more difficult. At this stage of development, we accept
larger code size and suboptimal performance in return for a more
straightforward debugging experience.
``ms``: small memory model.
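OpenWatcom defines several memory models, each of which refers to a
specific combination of code and data models. The small memory model
combines small code (a single code segment of at most 64 KiB, reached
with near calls) with small data (a single data segment of at most
64 KiB, reached with near pointers). That's the combination we use.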
``s``: remove stack overflow checks.
By default, wcc emits code at the beginning of every function that calls
a routine named ``__STK`` to check for a stack overflow. ``__STK`` is
presumably defined in the libraries OpenWatcom includes for its
supported OSes. Since we don't link those libraries, the symbol is
undefined and wlink fails. Enabling this option removes the stack
overflow checks.
``we``: treat all warnings as errors.
This option causes wcc to exit with an error code when it generates a
warning. A wcc failure in turn causes the entire Mkos build to fail,
making it easier to find bugs at build time.
``zl``: suppress generation of library file names and references in
object file.
By default, wcc inserts into the object file the names of the C
libraries that correspond to the selected memory model and
floating-point options. Since we don't use the OpenWatcom libraries,
these references always cause wlink to fail. Enabling this option stops
wcc from placing these names into the object file.
``zld``: suppress generation of file dependency information in object
file.
By default, wcc inserts into the object file the names and time stamps
of all files referenced by the source file. This information is used by
the wmake utility. We don't use wmake, so we enable this option for a
cleaner object file.
``zls``: remove automatically inserted symbols (e.g. runtime library
references)
We don't use OpenWatcom's default library information or runtime
libraries, so we enable this option for a cleaner object file.
In addition to the options above, we use the ``ad``, ``adt``, and
``add`` options to control automatic generation of Make-style dependency
rules (.d files). Make uses these rules in the usual way, so that a C
file is recompiled when any file it pulls in with an ``#include``
directive changes.
Linking
-------
The wlink linker takes OMF files generated by wcc and links them
together, along with any of their dependencies and the MKZ header
template. The output is an intermediate program image. wlink also
generates a memory map text file for the program.
The wlink linker is driven by a system of directives. Each directive is
given by name, followed by any arguments it has. The ``option``
directive is used to set the values of more general options.
Directives may be specified on the command line or in linker script
files. We specify the ``map`` option on the wlink command line for every
program, and use the resulting map file to build the final MKZ program
image. The map file is also useful for debugging. Kernel and user
programs have slightly different linker scripts.
This is the linker script for the kernel:

::

   output raw
   format dos
   option fillchar=0xde
   option nodefaultlibs
   order
     clname DATA segaddr=0x1000
     clname CODE

``output raw``
The ``output`` directive overrides the normal operating system specific
executable format and creates a raw binary image.
``format dos``
The ``format`` directive specifies the format of the output file. While
we aren't actually using a DOS file format, this directive is required.
Without it, wlink assumes an OS/2 format and begins output at an
undesired offset in the output file.
``option fillchar=0xde``
The ``fillchar`` option specifies the byte value used to fill gaps in
the output image. We change it from its default value of 0 so that these
areas are easier to spot when debugging at runtime.
``option nodefaultlibs``
This option instructs wlink to ignore default libraries when searching
for any library files.
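Since ``option fillchar=0xde`` fills gaps with ``0xde``, runs of that
byte in an image or memory dump hint at where padding lies. The
following C sketch shows one way a host-side debugging helper might
find such runs. It isn't part of the Mkos build; the function name
``find_fill_run`` and its minimum-length parameter are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define FILL_BYTE 0xDE /* matches "option fillchar=0xde" */

/*
 * Hypothetical debugging helper: find the first run of at least min_len
 * FILL_BYTE bytes in image[0..len), searching from offset *start.
 * On success, stores the run's offset in *start and its length in
 * *run_len and returns 1. Returns 0 if no such run exists.
 */
int find_fill_run(const uint8_t *image, size_t len, size_t min_len,
                  size_t *start, size_t *run_len)
{
    size_t i = *start;

    while (i < len) {
        if (image[i] != FILL_BYTE) {
            i++;
            continue;
        }
        /* Measure the run of fill bytes that begins at i. */
        size_t j = i;
        while (j < len && image[j] == FILL_BYTE)
            j++;
        if (j - i >= min_len) {
            *start = i;
            *run_len = j - i;
            return 1;
        }
        i = j; /* run too short: skip past it */
    }
    return 0;
}
```

Real code or data can also contain ``0xde`` bytes, so a short run proves
little; requiring a minimum run length is a simple way to cut down on
false positives.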

::

   order
     clname DATA segaddr=0x1000
     clname CODE

The ``order`` directive specifies the order in which classes are placed
in the output image. Any class name not listed is placed after the
listed ones.
We make sure the ``DATA`` class is placed first so the information for
the program loader is in a memory location that the kernel knows at
`load <#loading>`__ time. The value of ``segaddr`` refers to the segment
address where the kernel starts.
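In real mode, a segment address becomes a physical address by
multiplying it by 16 and adding the offset. So with
``segaddr=0x1000``, the kernel's ``DATA`` class starts at physical
address ``0x10000`` (64 KiB). A minimal C sketch of that arithmetic (the
function name ``linear_address`` is hypothetical):

```c
#include <stdint.h>

/*
 * Real-mode address translation: physical = segment * 16 + offset.
 * The mask keeps 20 bits, modeling the 8086 wrapping around at 1 MiB.
 */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

For example, ``linear_address(0x1000, 0x0000)`` yields ``0x10000``, and
the user programs' segment address ``0x2000`` corresponds to physical
address ``0x20000``.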
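This is the linker script for user programs. It is very similar to the
kernel script, with two differences: it places the ``DATA`` class at
segment address ``0x2000`` instead of ``0x1000``, and it sets the entry
point with ``option start``.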

::

   output raw
   format dos
   option fillchar=0xde
   option nodefaultlibs
   option start=main_
   order
     clname DATA segaddr=0x2000
     clname CODE

``option start=main_``
The ``start`` option defines the entry point for the output image. The
value ``main_`` corresponds to the ``main`` function in the C program
being linked; wcc's default register-based calling convention appends
an underscore to function names. Setting the entry point is necessary
for the wlink map file to report the correct entry point address, which
in turn is necessary to correctly populate the `MKZ <#mkz>`__ data.
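The post-link patch step needs the entry point address from the map
file. The map layout isn't described here, so the following C sketch
assumes the map contains a line of the form
``Entry point address: 0000:0123`` (a hexadecimal segment:offset pair);
check this against the map your wlink version actually produces. The
function name ``parse_entry_point`` is hypothetical, not part of the
Mkos build tools.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan wlink map-file text for the entry point.
 * ASSUMPTION: the line looks like "Entry point address: 0000:0123".
 * Returns 1 and fills *seg and *off on success, 0 otherwise.
 */
int parse_entry_point(const char *map_text, unsigned *seg, unsigned *off)
{
    static const char key[] = "Entry point address:";
    const char *p = strstr(map_text, key);

    if (p == NULL)
        return 0;
    p += sizeof key - 1; /* skip the key, not counting its '\0' */
    return sscanf(p, " %x:%x", seg, off) == 2;
}
```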
Finalizing the executable
-------------------------
The MKZ file format
~~~~~~~~~~~~~~~~~~~
.. raw:: org

   #+CUSTOM_ID: mkz

The MKZ file format is the executable file format for Mkos. It is a
simple file format consisting of a small header, followed by the rest of
the program image generated by wlink. These are the fields:
====== ======= ====================================================
Offset Segment Description
====== ======= ====================================================
0x0 DATA 32-bit memory offset of program entry point
0x4 DATA Near address of the "return to kernel" instruction
0x10 DATA Program arguments count (argc)
0x12 DATA Array of pointers to program argument strings (argv)
0x0 TEXT Far jump back to kernel
====== ======= ====================================================
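The ``DATA`` part of the header can be modeled as a C struct. This is a
sketch under assumptions the table doesn't spell out: 16-bit near
pointers (the small memory model), the 64-byte total described below,
bytes ``0x6`` to ``0xf`` unused, and the argv array filling the rest of
the header (23 entries). The type and field names are hypothetical. The
C11 static assertions, checked with a host compiler, tie the layout to
the offsets in the table.

```c
#include <stddef.h>
#include <stdint.h>

#define MKZ_ARGV_MAX 23 /* assumption: (0x40 - 0x12) / 2 near pointers */

/* Hypothetical model of the MKZ header's DATA fields (x86 little-endian). */
struct mkz_data_header {
    uint32_t entry;              /* 0x00: entry point offset            */
    uint16_t ret_to_kernel;      /* 0x04: near addr of "return to kernel" */
    uint8_t  reserved[10];       /* 0x06: assumed unused                */
    uint16_t argc;               /* 0x10: argument count                */
    uint16_t argv[MKZ_ARGV_MAX]; /* 0x12: near pointers to arg strings  */
};

/* Layout checks against the offset table (C11). */
_Static_assert(offsetof(struct mkz_data_header, entry) == 0x00, "entry");
_Static_assert(offsetof(struct mkz_data_header, ret_to_kernel) == 0x04, "ret");
_Static_assert(offsetof(struct mkz_data_header, argc) == 0x10, "argc");
_Static_assert(offsetof(struct mkz_data_header, argv) == 0x12, "argv");
_Static_assert(sizeof(struct mkz_data_header) == 64, "size");
```

Every field is naturally aligned at its table offset, so no compiler
padding is needed to match the layout.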
The first 4 bytes of the MKZ header's ``DATA`` segment contain the
32-bit absolute offset of the program's entry point at runtime. The next
60 bytes (64 bytes in total) in the ``DATA`` segment hold the other
fields listed in the table, including the program arguments (argc/argv,
in C terminology). The kernel uses these fields to load and execute the
program. The first 5 bytes of the ``TEXT`` (code) segment hold a far
jump instruction: the opcode ``0xea`` followed by a 16-bit offset
(``IP``) and a 16-bit segment (``CS``). ``CS:IP`` is the 32-bit far
address of a subroutine that re-initializes the kernel. This is how the
user program transfers control back to the kernel.
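To make the byte layout of the far jump concrete, here is a C sketch
that builds those 5 bytes. The function name is hypothetical, and the
address used in the example is a placeholder, not the real location of
the kernel's re-initialization subroutine.

```c
#include <stdint.h>

/*
 * Encode "jmp far segment:offset": opcode 0xEA, then the 16-bit offset
 * (IP), then the 16-bit segment (CS), each stored little-endian.
 */
void encode_far_jmp(uint8_t out[5], uint16_t segment, uint16_t offset)
{
    out[0] = 0xEA;                       /* JMP ptr16:16 opcode */
    out[1] = (uint8_t)(offset & 0xFF);   /* IP, low byte        */
    out[2] = (uint8_t)(offset >> 8);     /* IP, high byte       */
    out[3] = (uint8_t)(segment & 0xFF);  /* CS, low byte        */
    out[4] = (uint8_t)(segment >> 8);    /* CS, high byte       */
}
```

For example, ``encode_far_jmp(buf, 0x1000, 0x0123)`` produces the bytes
``EA 23 01 00 10``.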
As mentioned above, we link an object file establishing the MKZ header
into every program. The MKZ header is first in link order, so it starts
at the beginning of the program image at build time, and therefore at
the beginning of the user space data area at load time. This lets the
kernel refer to the header's data by absolute memory addresses during
the load-and-execute procedure.
The information for the MKZ header isn't available at link time, so what
is linked is a template that merely reserves the required memory. After
linking, the build system uses the wlink-generated memory map file to
populate the MKZ header's program entry point field. The kernel
populates the remaining fields at runtime.
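The patch tool itself isn't shown here. A minimal sketch of the patching
step, written in C for illustration, assuming the header sits at byte 0
of the image (it's first in link order) and that the field is stored
little-endian as on x86; ``mkz_patch_entry`` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical post-link patch: store the 32-bit entry point value
 * little-endian at offset 0 of the program image.
 * Returns 0 on success, -1 if the image is too small.
 */
int mkz_patch_entry(uint8_t *image, size_t image_len, uint32_t entry)
{
    if (image_len < 4)
        return -1;
    image[0] = (uint8_t)(entry & 0xFF);
    image[1] = (uint8_t)((entry >> 8) & 0xFF);
    image[2] = (uint8_t)((entry >> 16) & 0xFF);
    image[3] = (uint8_t)((entry >> 24) & 0xFF);
    return 0;
}
```

Writing the bytes one at a time, rather than copying a ``uint32_t``
with ``memcpy``, keeps the result independent of the byte order of the
machine running the build.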
Loading
-------
.. raw:: org

   #+CUSTOM_ID: loading

The kernel provides only a single syscall to execute a program:
``exec``. ``exec`` replaces the process image in user space with the new
program, then executes it. The MKZ file format supports this
load-and-execute procedure.
The ``exec`` system call
~~~~~~~~~~~~~~~~~~~~~~~~
The ``exec`` system call takes the following arguments:
- ``path``: the path to the file containing the user program to execute
- ``argv``: an array of pointers to strings containing program
  arguments
- ``argc``: the count of elements in ``argv``
It loads and executes the program as follows:
- copies the program image from the file specified by the ``path``
  argument into memory, using the user segment as the load base.
- copies ``argc`` and ``argv`` into the corresponding area of the MKZ
  header in user space.
- saves the kernel's stack pointer (the SP register) into memory in the
  kernel's address space.
- initializes the user's DS, SS, and SP registers.
- pushes the address of the "return to kernel" instruction in the MKZ
  header onto the user's stack.
- writes the address of the "return to kernel" instruction into the
  user's data segment at the location referenced by the above pointer.
- sets the AX and DX registers to ``argc`` and ``argv``, respectively.
  OpenWatcom's default register calling convention passes a function's
  first two arguments in AX and DX, so this is how ``main`` receives
  them.
- jumps to the program entry point through the pointer stored in the
  MKZ header.
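The "copy ``argc`` and ``argv`` into the MKZ header" step can be modeled
on a host machine, treating the user data segment as a byte array with
the header at offset 0 and using the offsets from the MKZ table. This is
a sketch, not the kernel's code: the names are hypothetical, the
argument strings are assumed to already be in the user segment, and the
23-entry limit assumes the argv array fills the rest of a 64-byte
header.

```c
#include <stdint.h>

#define MKZ_ARGC_OFFSET 0x10
#define MKZ_ARGV_OFFSET 0x12
#define MKZ_HEADER_SIZE 0x40 /* assumption: 64-byte DATA header */
#define MKZ_ARGV_MAX ((MKZ_HEADER_SIZE - MKZ_ARGV_OFFSET) / 2)

/* Store a 16-bit value little-endian, as the 8086 would. */
static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

/*
 * Hypothetical model of the exec step that fills in argc/argv.
 * user_ds points at the start of the user data segment; argv holds
 * 16-bit near pointers to argument strings in that segment.
 * Returns 0 on success, -1 if the arguments don't fit in the header.
 */
int mkz_store_args(uint8_t *user_ds, uint16_t argc, const uint16_t *argv)
{
    if (argc > MKZ_ARGV_MAX)
        return -1;
    put16(user_ds + MKZ_ARGC_OFFSET, argc);
    for (uint16_t i = 0; i < argc; i++)
        put16(user_ds + MKZ_ARGV_OFFSET + 2u * i, argv[i]);
    return 0;
}
```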
Now the user program begins executing. It has a working stack and can
access its ``argc`` and ``argv`` values. When the program reaches the
end of its ``main`` function, the ``ret`` instruction at the end of the
function pops the address of the ``jmp`` instruction in the MKZ header
off the stack and jumps to it (a near jump). The CPU then executes that
``jmp`` instruction, a far jump, to return to the kernel. At this point,
the user program has ended and the kernel restarts.