Prologue

Until now, I have written the bootloader and the protected-mode entry point in assembly language. This may be uncomfortable, but it is unavoidable for booting. Now that the processor has booted and is in protected mode, I am ready to write C/C++ code. In this post, I will explain how to write C++ code and integrate it into the “Dragon Slayer” image. What I will do is:

Add a C++ kernel source code.

Build the C++ kernel.

Attach the C++ kernel to the end of the entry code written in assembly (POST6).

Jump from the entry to C++ kernel.

There is no need to hurry. I will show how to implement it step by step and explain as much as I can.

Git repository

How to Create a C++ Kernel

Until now, I have written assembly code and built it with NASM. NASM generates a binary that the processor can execute directly. C/C++ code, however, cannot be built by NASM; it requires a compiler and a linker. The compiler creates an object file, which holds the code and data together with memory and section information; this step is called “compile”. The linker combines several object files and libraries into an executable file; this step is called “link”. Figure 1 shows the build procedures for assembly and for C/C++.

Figure 1. Build procedure

Run C/C++ Code

There are three constraints on running C/C++ code.
The first constraint is that the kernel must not use any C/C++ libraries. Right after booting, the kernel has only a minimal environment in which to run. It has no C/C++ libraries that provide printf, cout, and so on. Once the kernel becomes stable and everything necessary is prepared, the kernel can load libraries and applications can use them, but not yet. Therefore, in the kernel code, I use only what I implement in the kernel myself.
The second constraint is that the kernel is located at 0x10200. The bootloader starts at 0x0000, and the protected-mode entry point is located at 0x10000. The size of the entry code is 0x200 (= 512 bytes), so 0x10200 is the start of the remaining space. While implementing the kernel, it is important to place it right after the previous component. When C/C++ code references a global variable or function, the reference is resolved to a linear address, and the start address is the base of that calculation. Code 1 shows how a variable is accessed in two cases, with 0x0000 and with 0x10200 as the start address.

{% highlight cpp %}
// C++ code
int g_iIndex = 0;

void AddIndex(void)
{
    g_iIndex++;
}
{% endhighlight %}

{% highlight asm %}
;; Assembly Code
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Loading at 0x0000
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
g_iIndex: DD 0x00000000 ; Assume that
; g_iIndex is located
; at the first address

AddIndex:
…

mov eax, dword [ 0x0000 ] ; Read g_iIndex and store it
; at EAX
add eax, 1
mov dword [0x0000], eax ; Store new g_iIndex value
; to the address of g_iIndex

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Loading at 0x10200
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
g_iIndex: DD 0x00000000 ; Assume that
; g_iIndex is located
; at the first address

AddIndex:
…

mov eax, dword [ 0x10200 ] ; Read g_iIndex and store it
; at EAX
add eax, 1
mov dword [0x10200], eax ; Store new g_iIndex value
; to the address of g_iIndex

{% endhighlight %}
Code 1. How to access a variable
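
The effect of the start address in Code 1 can also be sketched as plain arithmetic: the linear address of a symbol is the load address plus the offset of the symbol inside the image. The helper `LinearAddress` and the constant `kIndexOffset` below are illustrative names, not part of the “Dragon Slayer” source.

```cpp
#include <cstdint>

// Hypothetical helper: the linear address of a symbol is the address where
// the image is loaded plus the symbol's offset from the start of the image.
constexpr std::uint32_t LinearAddress(std::uint32_t loadAddress,
                                      std::uint32_t offsetInImage)
{
    return loadAddress + offsetInImage;
}

// g_iIndex is assumed to sit at offset 0 of the image, as in Code 1.
constexpr std::uint32_t kIndexOffset = 0x0;

// Loaded at 0x0000, the variable is read from [0x0000].
static_assert(LinearAddress(0x00000, kIndexOffset) == 0x00000, "image at 0x0000");
// Loaded at 0x10200, the same variable is read from [0x10200].
static_assert(LinearAddress(0x10200, kIndexOffset) == 0x10200, "image at 0x10200");
// The kernel follows the 0x200-byte entry code that starts at 0x10000.
static_assert(0x10000 + 0x200 == 0x10200, "kernel start address");
```

If the linker assumed one start address and the image were loaded at the other, every such reference would point at the wrong memory.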

The last constraint is that the final image must be a pure binary file. Generally, when GCC generates an executable file, its format depends on the target OS; ELF and PE are popular examples. These formats include extra information that helps the file run on that OS. However, GCC does not know what “Dragon Slayer” is, and the extra information meant for other operating systems is not necessary for “Dragon Slayer”. Therefore, I will not put that information into the “Dragon Slayer” image; only code and data will be placed in the image, as raw binary.

How to compile C++ code without a library

Normally, when GCC should only compile source code, the “-c” option is used. This option generates only an object file. If the C++ file name is “test.cpp”, the output file is “test.o”. If you want to change the name, you can use the “-o” option. Also, as I explained, the kernel must not use any libraries, so the “-ffreestanding” option is necessary. “Freestanding” means that the code does not rely on any library and operates by itself. Additionally, GCC must know that the target of the kernel is 32-bit protected mode, so the “-m32” option is mandatory. Code 2 shows how I compile the kernel code with GCC.

{% highlight bash %}
x86_64-pc-linux-gcc -c -m32 -ffreestanding Main.c
{% endhighlight %}
Code 2. How to compile freestanding object
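
Because no library is available, even small helpers such as a string-length function have to be written by hand. The following is a minimal sketch of what such a helper could look like; the name `kStrLen` is hypothetical and is not taken from the “Dragon Slayer” source.

```cpp
// Hypothetical kernel helper: counts characters up to the terminating '\0'.
// It includes no library headers, so it does not depend on a C/C++ library.
unsigned int kStrLen(const char* pcString)
{
    unsigned int uiLength = 0;
    while (pcString[uiLength] != '\0')
    {
        uiLength++;
    }
    return uiLength;
}
```

A call such as `kStrLen("Dragon")` walks the six characters and stops at the terminator, without touching anything outside the kernel's own code.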

How to link object files

Linking is the task of merging object files to create an executable file. In this step, the sections are relocated within the file, and the entry point (the first code to run) and the loading address are defined. Why do I relocate sections? The reason is to remove debugging and symbol information, which is not necessary for a custom OS such as “Dragon Slayer”. The removal is made possible by laying out the code and data one after another, in order.

Relocate sections by linker script

A section is a component of an executable file or an object file, and it stores code, data, symbols, or debugging information. The main sections in executable and object files are .text, .data, and .bss.

.text is the section that stores executable code, such as main() and other functions. The code does not change while the program runs, so this section is read-only.
.data is the section that stores initialized data. A global or static variable that is initialized with a non-zero value is stored in .data. Because this section holds data, it is readable and writable.
.bss is the section that stores uninitialized data. It takes no space in the executable file or the object file. However, when the file is loaded into memory, .bss is filled with 0.
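
As a rough illustration of where typical definitions end up, the sketch below annotates a few declarations with their usual section. It assumes GCC's default behavior for ELF targets, and the exact placement can vary with compiler options; all names are illustrative.

```cpp
// Section placement below is GCC's usual behavior for ELF targets and may
// vary with compiler flags; all names are illustrative.

int g_iInitialized = 7;            // initialized, non-zero   -> .data
int g_iUninitialized;              // no initializer          -> .bss (zero at load)
const char g_vcMessage[] = "Hi";   // read-only constant data -> .rodata

const char* GetMessage()           // machine code            -> .text
{
    return g_vcMessage;
}

int AddToCounter(int iValue)       // machine code            -> .text
{
    static int s_iCounter = 0;     // zero-initialized static -> .bss
    s_iCounter += iValue;
    return s_iCounter;
}
```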

Because an object file is an intermediate product, it records only the size and offset of each section after compilation, not actual addresses. The final addresses of the sections depend on the order in which the object files are merged.
Merging the object files and determining where the image will be loaded are the responsibilities of the linker. Figure 2 shows how the linker works.

Figure 2. How linker works

The main role of the linker is to merge the object files, adjust the address of each section, and connect calls to functions in external libraries. These are not easy tasks, but I do not need to implement them myself. Linking is driven by a linker script, and the toolchain already ships one that I can start from: /BuildTools/cross/x86_64-pc-linux/lib/ldscripts/elf_i386.x. If you open it, you can recognize patterns like the one in Code 3.

Code 3. Template of linker script

A linker script entry has the name of the output section, the names of the input sections to merge into it, the sort criteria, and the initial value of the section.
Now, I will show how to use the default linker script. First, copy elf_i386.x to the kernel32 directory. The sections are relocated by moving the code and data sections¹ to the front. To keep it simple, I align each section to one sector with ALIGN (512). With ALIGN, each section starts on a 512-byte boundary, so its size in the image is rounded up to a multiple of 512 bytes. Code 4 shows how I modified elf_i386.x for “Dragon Slayer”. The parts I changed are in bold.

{% highlight ld %}

{% endhighlight %}
Code 4. elf_i386.x for “Dragon Slayer”
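
The address arithmetic behind ALIGN (512) can be sketched in C++ as rounding up to the next multiple of the alignment. The function name `AlignUp` is an assumption made for this illustration; the linker performs the equivalent calculation on its location counter.

```cpp
#include <cstdint>

// Rounds an address up to the next multiple of `alignment`, which is
// what ALIGN(512) does to the location counter in a linker script.
// `alignment` must be a power of two.
constexpr std::uint32_t AlignUp(std::uint32_t address, std::uint32_t alignment)
{
    return (address + alignment - 1) & ~(alignment - 1);
}

static_assert(AlignUp(0x10200, 512) == 0x10200, "already aligned");
static_assert(AlignUp(0x10201, 512) == 0x10400, "rounded up to next sector");
static_assert(AlignUp(0x103FF, 512) == 0x10400, "rounded up to next sector");
```

So a section that ends one byte past a sector boundary pushes the next section a full sector further, which keeps every section sector-aligned at the cost of some padding.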

Then, I can make an executable file like code 5.

{% highlight bash %}
{% endhighlight %}
Code 5. Making an executable file

Define the loading address and the entry

It is important to define the loading address of the C++ code, because the OS works with linear addresses. This address is used when a linear address is translated to a physical address.
There are two ways to define the loading address. The first way is to modify the linker script. Code 6 shows how to do it.

{% highlight ld %}
{% endhighlight %}
Code 6. How to modify linker script to define the loading address for the .text section

When the address of the .text section is changed, .data and .bss are automatically placed after the end of the .text section. The protected-mode entry code is loaded at 0x10000, and its size is 512 bytes (= 0x200). Therefore, the loading address of .text is 0x10200. Figure 3 shows the memory space.

Figure 3. The memory space

The other way is to pass the address on the command line. Code 7 shows how to do it.

{% highlight bash %}

{% endhighlight %}
Code 7. How to define the loading address by command line

-Ttext is the option that sets the loading address of the .text section.
If “Dragon Slayer” were an executable for Linux or Windows, it would be necessary to set an entry point function. This can also be done either in the linker script or on the command line. Code 8 shows both ways of setting the entry point function.

{% highlight ld %}
…

OUTPUT_ARCH(i386)
ENTRY(Main) /* Main is the entry point function */
SEARCH_DIR("~/BuildTools/cross/x86_64-pc-linux/lib");

…
{% endhighlight %}

{% highlight bash %}
{% endhighlight %}
Code 8. How to set the entry point function

However, Code 8 only suppresses a warning message; it does not actually set the entry point. The entry point is meaningful only in an executable file, and the output of this work is a binary image. When an executable file is converted to a binary image, the entry point information is discarded. So, how can I define the first function in the C++ code? When the protected-mode entry code ends, the processor jumps to 0x10200. How can I place a specific function at that address? The answer consists of two steps.

Put the specific function at the beginning of the file. When a compiler compiles source code, it generally follows the order in which the functions appear, so earlier code is usually compiled and placed earlier.

In the Makefile, pass the object file that contains the specific function as the first input to the linker. The linker also builds the executable file in the order of its inputs.

Transform the executable file to the binary file

As I already explained, the final image must contain only pure binary. Objcopy is a useful tool for this. It converts a file from one object format to another, or extracts specific sections and writes them to a new file. It is included in binutils, which I built in a previous post (post2).
It has many options, but the ones needed to make the “Dragon Slayer” binary image are -j, -S, and -O. The -j option extracts a specific section, the -S option removes relocation and symbol information, and the -O option defines the output format. Code 9 shows how to make a binary image with objcopy.

{% highlight bash %}

{% endhighlight %}
Code 9. How to make a binary image with objcopy

Merge Entry Assembly and C++ code

I have finished preparing the C++ environment, and now I will implement the C++ code for protected mode. This is the first step with C++, so it will just print a greeting message.

Add C++ Source File

Before writing the C++ code, I will write a header file for the whole protected-mode kernel. This header file defines basic data types and data structures. Code 10 shows how I implement Types.hpp.

{% highlight cpp %}

{% endhighlight %}
Code 10. Types.hpp

CHARACTER is the structure that represents one character on the screen. It consists of the character itself and its color (post4). #pragma pack wraps the definition of CHARACTER before and after. This pragma sets the alignment of the structure to 1 byte, so the compiler adds no padding bytes.
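
A structure of this kind might look like the following sketch. The field names `bCharacter` and `bAttribute` are assumptions made for illustration, not necessarily the ones used in Types.hpp.

```cpp
// A sketch of a packed text-mode cell; the field names are assumptions.
#pragma pack(push, 1)
struct CHARACTER
{
    unsigned char bCharacter;   // ASCII code of the character
    unsigned char bAttribute;   // foreground/background color
};
#pragma pack(pop)

// Each cell of the text-mode screen is exactly two bytes.
static_assert(sizeof(CHARACTER) == 2, "CHARACTER must be 2 bytes");
```

The `static_assert` makes the build fail if the structure ever stops matching the two-byte layout of video memory.
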
Now it is time for Main.cpp. Code 11 shows Main.cpp, which prints a greeting message.

{% highlight cpp %}
{% endhighlight %}
Code 11. Main.cpp

As I explained before, the main() function is located at the beginning of the Main.cpp file. In main(), I use the kPrintString() function to print the greeting message, “C Language Kernel Started~!!!”. After printing the message, the kernel runs an infinite loop with while(1).
The kPrintString() function receives the X and Y coordinates and a string. X and Y give the position of the first character of the string. They are combined with the base video memory address (= 0xB8000) to show the characters on the monitor.
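
The position calculation described above can be sketched as follows. This is not the author's kPrintString(); it is a hypothetical variant, `PrintStringAt`, that takes the screen base as a parameter so the same logic can be tried against an ordinary array. An 80-column text mode and the field names of CHARACTER are assumptions.

```cpp
#pragma pack(push, 1)
struct CHARACTER                 // 2-byte text-mode cell; field names assumed
{
    unsigned char bCharacter;
    unsigned char bAttribute;
};
#pragma pack(pop)

const int SCREEN_WIDTH = 80;     // assumed 80x25 text mode

// Hypothetical variant of kPrintString() that takes the screen base as a
// parameter. In the kernel the base would be (CHARACTER*) 0xB8000; passing
// it in lets the same logic run against an ordinary array on a host machine.
void PrintStringAt(CHARACTER* pstScreen, int iX, int iY, const char* pcString)
{
    // Skip iY full rows and iX columns to reach the first cell.
    CHARACTER* pstCell = pstScreen + (iY * SCREEN_WIDTH) + iX;

    // Write only the character byte; the color attribute is left untouched.
    for (int i = 0; pcString[i] != '\0'; i++)
    {
        pstCell[i].bCharacter = static_cast<unsigned char>(pcString[i]);
    }
}
```

With X = 2 and Y = 1, the first character lands in cell 1 * 80 + 2 = 82, which is byte offset 164 from the screen base.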

Modify Entry Code

In the previous post, the protected-mode entry code printed a greeting message and then ran an infinite loop. Now that I have the C++ kernel code, the entry code should jump to the C++ kernel instead. This is done with a jmp instruction that sets the CS segment selector and the linear address. Code 12 shows how the entry code is changed.

{% highlight asm %}
{% endhighlight %}
Code 12. Entry.s

Makefile for Kernel32

Now, I have to write a Makefile that can handle both assembly and C++. To make it useful, I will use some features of make.
Make provides a function, wildcard, that builds a list of the C++ files in a directory. Later I will add many C++ source files, so this is very useful. Code 13 shows how to use wildcard.

{% highlight make %}
{% endhighlight %}
Code 13. Wildcard

With wildcard, I have a list of C++ files to be built by GCC. Code 14 shows the make pattern rule that builds C++ source code.

{% highlight make %}
{% endhighlight %}
Code 14. The pattern rule for building C++ with GCC

With wildcard and the pattern rule, I can compile all C++ source files in a directory. How about linking? To link object files, I need to know their file names. This is done with patsubst, which replaces one string pattern with another. I have a list of C++ source files, and with patsubst, I can get a list of object files. Code 15 shows how to use patsubst.

{% highlight make %}
{% endhighlight %}
Code 15. How to use patsubst

These steps are done automatically, but I have an important rule: the entry point function must come first. Therefore, I need to make the file containing main() build first. How can I give make an order? To solve this problem, I use subst. I know which file has the entry point function, main(), so I give it a special name and keep the others in the list. However, patsubst and the pattern rule would also pick up the entry point file. Therefore, I remove the entry point file from the lists. Code 16 shows how to remove a specific file from the list with subst.

{% highlight make %}
{% endhighlight %}
Code 16. How to build the entry point file first( = Main.cpp)

The same approach should be applied to the assembly code. What I need to do is change .cpp to .asm and add the option -f elf32 to NASM. GCC generates object files in the ELF32 format, so NASM has to generate an object file in the same format in order to link them.
One of the useful features of make is detecting updated files. If a file has not been updated since the last build, make does not compile it again. However, header files do not get this benefit by default. To let make know about them, I scan all files in the directory and generate a dependency report with GCC's -MM option. The -MM option outputs the header files each source depends on, excluding system headers such as iostream. Code 17 shows how to generate the dependency report.

{% highlight bash %}
{% endhighlight %}
Code 17. How to make a report for dependencies

The Dependency.dep file holds the dependencies of all C++ files. If it exists, it helps to build the right image. What if the file is absent? Make will report an error. Therefore, I add a check for Dependency.dep, as in Code 18.

{% highlight make %}
{% endhighlight %}
Code 18. Check the existence of Dependency.dep

Finally, the -C option helps me create the output files in a specific directory.

The final Makefile is available in the git repository.

Build and Run

Now, I am almost done. However, the image does not work yet, because of its size. QEMU can only handle a sector-aligned image. The size of my final image is TODO, so it is less than 2 sectors. To solve this, I need to fill the remaining space with 0 up to 2 sectors. I will write a simple program that fills the remaining space with 0, detects the size of the image, and updates TOTALSECTORCOUNT in “BootLoader.asm” automatically.
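
The padding calculation can be sketched as below. The function names are hypothetical, and the 700-byte size in the checks is only an example value, not the actual size of the “Dragon Slayer” image.

```cpp
#include <cstdint>

constexpr std::uint32_t BYTES_PER_SECTOR = 512;

// Number of sectors needed to hold `sizeInBytes` bytes (rounded up).
constexpr std::uint32_t SectorCount(std::uint32_t sizeInBytes)
{
    return (sizeInBytes + BYTES_PER_SECTOR - 1) / BYTES_PER_SECTOR;
}

// Number of zero bytes to append so the size becomes a multiple of a sector.
constexpr std::uint32_t PaddingBytes(std::uint32_t sizeInBytes)
{
    return SectorCount(sizeInBytes) * BYTES_PER_SECTOR - sizeInBytes;
}

static_assert(SectorCount(700) == 2, "700 bytes occupy 2 sectors");
static_assert(PaddingBytes(700) == 324, "1024 - 700 = 324 bytes of padding");
static_assert(PaddingBytes(1024) == 0, "already sector aligned");
```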

Image Maker

The image maker is an automated tool that counts the sectors, replaces the sector count in “BootLoader.asm”, and copies the sectors into the image. Before doing this, I need to know where TOTALSECTORCOUNT is located in “BootLoader.bin”. This is easily checked with a hex editor. To inspect my “BootLoader.bin”, I use VIM. Code 19 shows what it looks like.

{% highlight hex %}
{% endhighlight %}
Code 19. Hex editor window for BootLoader.bin
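
Once the position of TOTALSECTORCOUNT is known, patching it amounts to overwriting a few bytes of the boot loader image. The sketch below assumes that the field is a 16-bit little-endian value, and it leaves the offset as a parameter because the real value has to be read from the hex dump. `PatchSectorCount` is a hypothetical name, not the function used in the repository.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Writes a 16-bit sector count into the boot loader image in little-endian
// order. `offset` is where TOTALSECTORCOUNT sits in BootLoader.bin; the real
// value must be read from the hex dump and is deliberately left as a parameter.
// The 16-bit width of the field is an assumption.
// Returns false if the field would fall outside the image.
bool PatchSectorCount(std::vector<std::uint8_t>& image,
                      std::size_t offset,
                      std::uint16_t sectorCount)
{
    if (offset + 2 > image.size())
    {
        return false;
    }
    image[offset]     = static_cast<std::uint8_t>(sectorCount & 0xFF);  // low byte
    image[offset + 1] = static_cast<std::uint8_t>(sectorCount >> 8);    // high byte
    return true;
}
```

Checking the bounds first keeps a wrong offset from silently writing past the end of the 512-byte boot sector.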

You can find how I implemented it, and more detailed information, in the git repository.

Result

This is the last step. Now, I will write the Makefile for the root directory. It does not require many changes. Code 20 shows what I changed.

{% highlight make %}
{% endhighlight %}
Code 20. Makefile for the root

Figure 4 shows the result of this post.

Figure 4. Result

Epilogue

Alright. This post is longer than the previous ones, and as long as it is, it is important: it explains the first step of making a C++ kernel. Even though I can now write C++ source code, I will still need to write assembly code for a while, because “Dragon Slayer” is still in its initialization stage. Cheer up. This is a tough time, but I am learning many things about computers. It will lead me to a nice future.

.text, .data, .bss, .rodata ↩

For Veritas

Friday, February 19, 2016

7. Writing a C++ Kernel Code