Operating System in 1,000 Lines - C Standard Library

In this chapter, let's implement basic types and memory operations, as well as string manipulation functions. In this book, for the purpose of learning, we'll create these from scratch instead of using the C standard library.

The concepts introduced in this chapter are very common in C programming, so ChatGPT should be able to give solid answers. If you struggle with implementing or understanding any part, feel free to ask it or ping me.

Basic types

First, let's define some basic types and convenient macros:

common.h
typedef int bool;
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
typedef unsigned int uint32_t;
typedef unsigned long long uint64_t;
typedef uint32_t size_t;   // 32-bit RISC-V: sizes fit in 32 bits
typedef uint32_t paddr_t;  // physical memory address
typedef uint32_t vaddr_t;  // virtual memory address
#define true  1
#define false 0
#define NULL  ((void *) 0)
#define align_up(value, align)   __builtin_align_up(value, align)
#define is_aligned(value, align) __builtin_is_aligned(value, align)
#define offsetof(type, member)   __builtin_offsetof(type, member)
#define va_list  __builtin_va_list
#define va_start __builtin_va_start
#define va_end   __builtin_va_end
#define va_arg   __builtin_va_arg
void *memset(void *buf, char c, size_t n);
void *memcpy(void *dst, const void *src, size_t n);
char *strcpy(char *dst, const char *src);
int strcmp(const char *s1, const char *s2);
void printf(const char *fmt, ...);

Most of these mirror definitions from the C standard library (stdbool.h, stdint.h, stddef.h, and stdarg.h), but we've also added a few useful ones of our own:

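align_up and is_aligned are useful when dealing with memory alignment: align_up(value, align) rounds value up to the nearest multiple of align that is greater than or equal to value, and is_aligned(value, align) checks whether value is already a multiple of align. In both cases, align must be a power of two. For example, align_up(0x1234, 0x1000) returns 0x2000. Also, is_aligned(0x2000, 0x1000) returns true, but is_aligned(0x2f00, 0x1000) returns false.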
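For a power-of-two align, both operations reduce to simple bit masking. The sketch below shows that arithmetic in plain C for 32-bit values; align_up_u32 and is_aligned_u32 are hypothetical names chosen so they don't collide with the macros in common.h, and this is not necessarily how Clang implements its built-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical plain-C equivalent of align_up for 32-bit values.
// `align` must be a non-zero power of two.
static uint32_t align_up_u32(uint32_t value, uint32_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    // Adding align - 1 carries any value that is not already a multiple of
    // align into the next block; clearing the low bits then rounds down to
    // the start of that block. (Wraps around for values near UINT32_MAX.)
    return (value + align - 1) & ~(align - 1);
}

// Hypothetical plain-C equivalent of is_aligned for 32-bit values.
static bool is_aligned_u32(uint32_t value, uint32_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    // A multiple of align has all bits below align cleared.
    return (value & (align - 1)) == 0;
}
```

With the numbers from the text: 0x1234 + 0xfff = 0x2233, and masking off the low 12 bits leaves 0x2000. Likewise, 0x2f00 & 0xfff = 0xf00, which is non-zero, so 0x2f00 is not page-aligned.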

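The functions starting with __builtin_ used in each macro are compiler extensions (built-in functions) provided by Clang. Some of them, such as __builtin_offsetof and __builtin_va_arg, are also supported by GCC, but __builtin_align_up and __builtin_is_aligned are Clang-specific. See Clang built-in functions and macros.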

These macros can also be implemented in C without built-in functions. The pure C implementation of offsetof is particularly interesting ;)
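One classic pure-C trick for offsetof pretends that a struct lives at address 0 and takes the address of the member; that address is then the member's offset. The sketch below calls it my_offsetof (a hypothetical name) and uses a made-up struct. Strictly speaking, applying -> to a null pointer like this is undefined behavior in ISO C, which is one reason compilers provide __builtin_offsetof instead.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical pure-C offsetof: treat address 0 as the start of a `type`
// object and take the address of `member`. A classic trick, but technically
// undefined behavior in ISO C, so prefer __builtin_offsetof in real code.
#define my_offsetof(type, member) ((size_t) &(((type *) 0)->member))

// Example struct, made up for illustration.
struct example {
    uint8_t  tag;    // offset 0
    uint32_t value;  // typically offset 4, after 3 bytes of padding
};
```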

Memory operations

Next, we implement the following memory operation functions.

The memcpy function copies n bytes from src to dst:

common.c
void *memcpy(void *dst, const void *src, size_t n) {
    uint8_t *d = (uint8_t *) dst;
    const uint8_t *s = (const uint8_t *) src;
    while (n--)
        *d++ = *s++;
    return dst;
}

The memset function fills the first n bytes of buf with c. This function was already implemented in the Boot chapter to initialize the bss section. Let's move it from kernel.c to common.c:

common.c
void *memset(void *buf, char c, size_t n) {
    uint8_t *p = (uint8_t *) buf;
    while (n--)
        *p++ = c;
    return buf;
}
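A quick way to sanity-check both functions is to run them on a small buffer. The sketch below repeats the two functions under hypothetical k_-prefixed names (so they can be compiled on a host next to the real C library) and takes the fixed-width types from stdint.h instead of common.h; fill_then_copy is likewise a made-up example function.

```c
#include <stddef.h>
#include <stdint.h>

// Same logic as memset in common.c, renamed to avoid clashing with libc.
void *k_memset(void *buf, char c, size_t n) {
    uint8_t *p = (uint8_t *) buf;
    while (n--)
        *p++ = c;
    return buf;
}

// Same logic as memcpy in common.c, renamed to avoid clashing with libc.
void *k_memcpy(void *dst, const void *src, size_t n) {
    uint8_t *d = (uint8_t *) dst;
    const uint8_t *s = (const uint8_t *) src;
    while (n--)
        *d++ = *s++;
    return dst;
}

// Example: fill an 8-byte buffer, then overwrite its first bytes with a copy.
const char *fill_then_copy(void) {
    static char buf[8];
    k_memset(buf, 'x', sizeof buf);  // buf = x x x x x x x x (no '\0' yet)
    k_memcpy(buf, "abc", 3);         // buf = a b c x x x x x
    buf[7] = '\0';                   // buf now holds the string "abcxxxx"
    return buf;
}
```

Note that memcpy copies exactly n bytes and knows nothing about strings, which is why the terminator has to be written separately here.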

Sometimes we perform pointer dereferencing and pointer manipulation in a single statement, like *p++ = c;. If we break this down for clarity, it's equivalent to:

*p = c;    // Dereference the pointer
p = p + 1; // Advance the pointer after the assignment

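This is a common C idiom. Note that postfix ++ binds more tightly than the * dereference operator, so *p++ advances the pointer p itself, not the value it points to, and the store uses the value p had before the increment.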
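To see that the two forms are interchangeable, here is the fill loop written both ways; the second function spells out the dereference and the increment as separate statements. Both function names are hypothetical and exist only for this comparison.

```c
#include <stddef.h>
#include <stdint.h>

// Compact form: `*p++ = c` stores c at p, then advances p by one byte.
void fill_compact(uint8_t *p, uint8_t c, size_t n) {
    while (n--)
        *p++ = c;
}

// Expanded form: the same loop with each step on its own line.
void fill_expanded(uint8_t *p, uint8_t c, size_t n) {
    while (n > 0) {
        *p = c;     // dereference: write c to the current byte
        p = p + 1;  // then advance the pointer to the next byte
        n = n - 1;
    }
}
```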

String operations

Let's start with strcpy. This function copies the string src, including its terminating null character ('\0'), to dst:

common.c
char *strcpy(char *dst, const char *src) {
    char *d = dst;
    while (*src)
        *d++ = *src++;
    *d = '\0';
    return dst;
}

The strcpy function continues copying even if src is longer than the memory area of dst. This can easily lead to bugs and vulnerabilities, so it's generally recommended to use alternative functions instead of strcpy. Never use it in production!

For simplicity, we'll use strcpy in this book, but if you have time, try implementing and using a safer alternative such as strcpy_s instead.
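As a starting point, here is a minimal sketch of a bounded copy in the spirit of strcpy_s. It is not the C11 Annex K strcpy_s (which also involves rsize_t and runtime-constraint handlers); the name strcpy_bounded and its return convention (0 on success, -1 on error) are assumptions for illustration. Like strcpy_s, it leaves an empty string in dst on failure, so callers never see a truncated or unterminated result.

```c
#include <stddef.h>

// Hypothetical bounded copy: copies src into dst only if the whole string,
// including its '\0', fits in dst_size bytes.
// Returns 0 on success, -1 on error (dst is then set to "" when possible).
int strcpy_bounded(char *dst, size_t dst_size, const char *src) {
    if (dst == NULL || dst_size == 0)
        return -1;                 // nowhere to write, not even a '\0'
    if (src == NULL) {
        dst[0] = '\0';
        return -1;
    }
    size_t i = 0;
    while (src[i] != '\0') {
        if (i + 1 >= dst_size) {   // no room for this char plus the '\0'
            dst[0] = '\0';         // don't leave a truncated string behind
            return -1;
        }
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return 0;
}
```

The key difference from strcpy is that the caller must pass the size of dst, so the function can refuse to write past it.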

Next is the strcmp function. It compares s1 and s2 lexicographically, byte by byte, and returns:

Condition   Result
s1 == s2    0
s1 > s2     Positive value
s1 < s2     Negative value
common.c
int strcmp(const char *s1, const char *s2) {
    while (*s1 && *s2) {
        if (*s1 != *s2)
            break;
        s1++;
        s2++;
    }
    return *(unsigned char *)s1 - *(unsigned char *)s2;
}
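The rows of the table can be checked directly against this implementation. The sketch below repeats it under the hypothetical name k_strcmp so it can be compiled on a host next to the real C library. One case worth noticing: when one string is a prefix of the other, the loop stops at the shorter string's '\0', so "abc" compares less than "abcd" (the result is 0 - 'd').

```c
// Same logic as strcmp in common.c, renamed to avoid clashing with libc.
int k_strcmp(const char *s1, const char *s2) {
    while (*s1 && *s2) {
        if (*s1 != *s2)
            break;       // first differing byte found
        s1++;
        s2++;
    }
    // Either a mismatch or the end of at least one string: the difference
    // of the current bytes (read as unsigned char) decides the sign.
    return *(unsigned char *)s1 - *(unsigned char *)s2;
}
```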

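The cast to unsigned char * before subtracting is required by the specification: C (and POSIX, which follows it) specifies that strcmp compares bytes interpreted as unsigned char. Without the cast, on platforms where char is signed, bytes 0x80 and above would be read as negative numbers and the sign of the result would be wrong.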
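To see why the cast matters, consider the byte 0xE9 (é in Latin-1). Read as a signed char it is -23, so it would sort before 'a' (97); read as an unsigned char it is 233 and sorts after. The two hypothetical variants below differ only in how they read the final pair of bytes; their loop is a more compact form of the one in common.c with the same behavior.

```c
// Hypothetical variant that reads bytes as signed char (the wrong way).
int strcmp_signed(const char *s1, const char *s2) {
    while (*s1 && *s1 == *s2) {
        s1++;
        s2++;
    }
    // Bytes >= 0x80 become negative here, flipping the sign of the result.
    return *(const signed char *)s1 - *(const signed char *)s2;
}

// Hypothetical variant that reads bytes as unsigned char, as the spec requires.
int strcmp_unsigned(const char *s1, const char *s2) {
    while (*s1 && *s1 == *s2) {
        s1++;
        s2++;
    }
    return *(const unsigned char *)s1 - *(const unsigned char *)s2;
}
```

Comparing "\xe9" with "a", strcmp_unsigned returns a positive value (233 - 97), while strcmp_signed returns a negative one (-23 - 97).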

The strcmp function is often used to check if two strings are identical. It's a bit counter-intuitive, but the strings are identical when !strcmp(s1, s2) is true (i.e., when the function returns zero):

if (!strcmp(s1, s2))
    printf("s1 == s2\n");
else
    printf("s1 != s2\n");
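If the !strcmp idiom feels too easy to misread, one option is a small wrapper that states the intent. streq is a hypothetical helper, not part of common.h; it simply returns true when strcmp returns zero. The sketch includes the host's string.h and stdbool.h; in the kernel you would include common.h instead.

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical helper: true when s1 and s2 contain the same characters.
static inline bool streq(const char *s1, const char *s2) {
    return strcmp(s1, s2) == 0;
}
```

With it, the check above reads as if (streq(s1, s2)), which says exactly what it tests.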