Hello, World!

Today I learned how to manually create a simple ELF executable that prints the string "Hello, World!". ELF (Executable and Linkable Format) is the standard binary format for executables on Linux, and it is what compilers for languages like C, C++, Rust, and Go produce there. Windows and macOS use different formats (PE and Mach-O, respectively).

To begin, the executable must contain the desired string to be printed, so it must contain the 14 bytes

48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 21 0a

corresponding to the 13 characters in "Hello, World!" plus a newline character.
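As a quick sanity check on those bytes, here is a minimal Python sketch (Python is my choice for illustration, and the variable names are made up) that decodes them back to text and counts them:

```python
# The 14 bytes listed above, written as a hex string.
hello_hex = "48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 21 0a"
hello_bytes = bytes.fromhex(hello_hex)

# Decode as ASCII to recover the text, including the trailing newline.
text = hello_bytes.decode("ascii")
print(repr(text), len(hello_bytes))
```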

There must also be machine instructions for the program: a write syscall to print the string and an exit syscall to end the program. Looking at this Linux syscall reference, invoking the write syscall requires four registers to be set. The rax register holds the syscall number (0x01 for write), the rdi register holds the file descriptor (1 for stdout), the rsi register holds a pointer to the hello world string, and the rdx register holds the number of bytes to write (14). This syscall corresponds to the bytes:

48 c7 c0 01 00 00 00  # mov rax, 1
48 c7 c7 01 00 00 00  # mov rdi, 1
48 8d 35 xx xx xx xx  # lea rsi, [rip+offset], 4-byte offset TBD
48 c7 c2 0e 00 00 00  # mov rdx, 14
0f 05                 # syscall

This reference explains the format of various instructions. The instruction for setting the rsi register depends on where the hello world string is located, which will be resolved later.

In order to invoke the exit syscall, the rax register should be set to 0x3c to denote the exit syscall number and the rdi register should be set to 0 for the exit code. This syscall corresponds to the bytes:

48 c7 c0 3c 00 00 00  # mov rax, 60
48 c7 c7 00 00 00 00  # mov rdi, 0
0f 05                 # syscall
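
The mov lines in both syscalls follow one pattern: a REX.W prefix (48), the opcode c7, a ModRM byte that selects the register, and a 4-byte little-endian immediate. The following is only a sketch of that pattern, not a general assembler; the helper name mov_imm32 and the REG table are hypothetical, and it handles just the registers used in this post.

```python
import struct

# Register numbers as they appear in the low three bits of the ModRM byte.
REG = {"rax": 0, "rdx": 2, "rsi": 6, "rdi": 7}

def mov_imm32(reg: str, value: int) -> bytes:
    """Encode `mov reg, imm32` (REX.W + C7 /0), immediate sign-extended to 64 bits."""
    rex_w = 0x48              # REX prefix with W=1: 64-bit operand size
    opcode = 0xC7             # MOV r/m64, imm32
    modrm = 0xC0 | REG[reg]   # mod=11 (register direct), reg field 0, rm=register
    return bytes([rex_w, opcode, modrm]) + struct.pack("<i", value)

SYSCALL = bytes([0x0F, 0x05])

# bytes.hex with a separator needs Python 3.8 or newer.
for reg, value in [("rax", 1), ("rdi", 1), ("rdx", 14), ("rax", 60), ("rdi", 0)]:
    print(mov_imm32(reg, value).hex(" "), f" # mov {reg}, {value}")
```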

By placing the hello world string right after the instruction bytes in memory, the lea instruction for the write syscall can be resolved by setting the rsi register to an offset from the rip register, which points to the next instruction. The offset in this case is the number of instruction bytes after the lea instruction: 7 (mov rdx) + 2 (syscall) + 7 (mov rax) + 7 (mov rdi) + 2 (syscall) = 25, or 0x19. This resolves the lea instruction to be:

48 8d 35 19 00 00 00  # lea rsi, [rip+25]
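
The same count can be reproduced with a few lines of Python. This is just an illustrative sketch: the dictionary of instruction lengths is taken from the byte listings above, and the names are my own.

```python
import struct

# Lengths in bytes of the instructions that follow the lea, in order.
following = {
    "mov rdx, 14": 7,
    "syscall (write)": 2,
    "mov rax, 60": 7,
    "mov rdi, 0": 7,
    "syscall (exit)": 2,
}
displacement = sum(following.values())

# lea rsi, [rip+disp32]: REX.W (48), opcode 8D, ModRM 35
# (mod=00, reg=rsi, rm=101 selects RIP-relative addressing),
# followed by the displacement as a 4-byte little-endian integer.
lea = bytes([0x48, 0x8D, 0x35]) + struct.pack("<i", displacement)
print(displacement, lea.hex(" "))
```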

Combining everything so far gives the bytes:

# write syscall instructions
48 c7 c0 01 00 00 00  # mov rax, 1
48 c7 c7 01 00 00 00  # mov rdi, 1
48 8d 35 19 00 00 00  # lea rsi, [rip+25]
48 c7 c2 0e 00 00 00  # mov rdx, 14
0f 05                 # syscall

# exit syscall instructions
48 c7 c0 3c 00 00 00  # mov rax, 60
48 c7 c7 00 00 00 00  # mov rdi, 0
0f 05                 # syscall

# hello world string
48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 21 0a

ELF files must also have headers, so there’ll need to be an ELF header and a program header, which should be placed before the instruction and string bytes. This Wikipedia page explains the fields in the headers.

# ELF header
7F 45 4c 46              # Magic number of ELF file
02                       # Class (2 = 64-bit)
01                       # Data encoding (1 = little-endian)
01                       # ELF version (1)
00                       # OS ABI identifier (0 = System V)
00 00 00 00 00 00 00 00  # ABI version and padding bytes
02 00                    # Object file type (2 = executable)
3e 00                    # Machine architecture (0x3e = x86-64)
01 00 00 00              # ELF version (1)
78 00 40 00 00 00 00 00  # Entry point address (0x400078)
40 00 00 00 00 00 00 00  # Program header table offset (0x40)
00 00 00 00 00 00 00 00  # Section header table offset (0)
00 00 00 00              # Flags (0)
40 00                    # ELF header size (64 bytes)
38 00                    # Program header entry size (56 bytes)
01 00                    # Number of program headers (1)
00 00                    # Section header entry size (0)
00 00                    # Number of section headers (0)
00 00                    # Section header string table index (0)

# Program header
01 00 00 00              # Program header type (1 = loadable segment)
05 00 00 00              # Flags (5 = read + execute)
00 00 00 00 00 00 00 00  # Segment offset in file (0)
00 00 40 00 00 00 00 00  # Virtual address (0x400000)
00 00 40 00 00 00 00 00  # Physical address (0x400000)
b4 00 00 00 00 00 00 00  # Size in file (180 bytes)
b4 00 00 00 00 00 00 00  # Size in memory (180 bytes)
00 10 00 00 00 00 00 00  # Alignment (4096 bytes)

The 64-byte ELF header, 56-byte program header, 46 bytes of instructions, and 14 string bytes combine to form a 180-byte ELF executable.

A few of the header values are worth explaining. The segment's file offset is 0 and its virtual address is 0x400000, so the whole file, headers included, is loaded starting at 0x400000. The first instruction therefore sits 64 + 56 = 120 (0x78) bytes past that address, hence the 0x400078 entry point address. The program header table offset 0x40 = 64 is the number of bytes in the file before the program header, which is the size of the ELF header. No section headers or sections are used.
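
The layout arithmetic can be written out as a short Python sketch. The constant names are illustrative; the values are the sizes and addresses given above.

```python
ELF_HEADER_SIZE = 64
PROGRAM_HEADER_SIZE = 56
CODE_SIZE = 46
STRING_SIZE = 14
LOAD_ADDRESS = 0x400000

# The code starts right after both headers.
entry_point = LOAD_ADDRESS + ELF_HEADER_SIZE + PROGRAM_HEADER_SIZE
# The string starts right after the code.
string_address = entry_point + CODE_SIZE
# Everything is in one segment, so file size and memory size are equal.
file_size = ELF_HEADER_SIZE + PROGRAM_HEADER_SIZE + CODE_SIZE + STRING_SIZE

print(hex(entry_point), hex(string_address), file_size, hex(file_size))
```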


Putting this all together, I tested that it works!

$ xxd helloworld
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0200 3e00 0100 0000 7800 4000 0000 0000  ..>.....x.@.....
00000020: 4000 0000 0000 0000 0000 0000 0000 0000  @...............
00000030: 0000 0000 4000 3800 0100 0000 0000 0000  ....@.8.........
00000040: 0100 0000 0500 0000 0000 0000 0000 0000  ................
00000050: 0000 4000 0000 0000 0000 4000 0000 0000  ..@.......@.....
00000060: b400 0000 0000 0000 b400 0000 0000 0000  ................
00000070: 0010 0000 0000 0000 48c7 c001 0000 0048  ........H......H
00000080: c7c7 0100 0000 488d 3519 0000 0048 c7c2  ......H.5....H..
00000090: 0e00 0000 0f05 48c7 c03c 0000 0048 c7c7  ......H..<...H..
000000a0: 0000 0000 0f05 4865 6c6c 6f2c 2057 6f72  ......Hello, Wor
000000b0: 6c64 210a                                ld!.

$ ./helloworld
Hello, World!
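
For anyone who would rather not type the bytes into a hex editor, the file can also be assembled with a short script. The sketch below is one possible way to do it in Python, not the method used above; the function names build_elf and write_executable are hypothetical, and the struct format strings assume the 64-bit little-endian field layout listed in the header tables.

```python
import os
import struct

LOAD_ADDRESS = 0x400000
MESSAGE = b"Hello, World!\n"

CODE = bytes.fromhex(
    "48c7c001000000"  # mov rax, 1
    "48c7c701000000"  # mov rdi, 1
    "488d3519000000"  # lea rsi, [rip+25]
    "48c7c20e000000"  # mov rdx, 14
    "0f05"            # syscall
    "48c7c03c000000"  # mov rax, 60
    "48c7c700000000"  # mov rdi, 0
    "0f05"            # syscall
)

def build_elf() -> bytes:
    """Return the complete executable image: ELF header, program header, code, string."""
    payload = CODE + MESSAGE
    size = 64 + 56 + len(payload)
    entry = LOAD_ADDRESS + 64 + 56
    # Magic, class (64-bit), little-endian, version 1, System V ABI, 8 padding bytes.
    e_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    # type, machine, version, entry, phoff, shoff, flags,
    # ehsize, phentsize, phnum, shentsize, shnum, shstrndx
    elf_header = e_ident + struct.pack(
        "<HHIQQQIHHHHHH", 2, 0x3E, 1, entry, 64, 0, 0, 64, 56, 1, 0, 0, 0
    )
    # type (loadable), flags (read + execute), offset, vaddr, paddr, filesz, memsz, align
    program_header = struct.pack(
        "<IIQQQQQQ", 1, 5, 0, LOAD_ADDRESS, LOAD_ADDRESS, size, size, 0x1000
    )
    return elf_header + program_header + payload

def write_executable(path: str) -> None:
    """Write the image to `path` and mark it executable."""
    with open(path, "wb") as f:
        f.write(build_elf())
    os.chmod(path, 0o755)

print(len(build_elf()))
```

Calling write_executable("helloworld") on an x86-64 Linux machine should produce a file whose xxd output matches the dump above, assuming nothing in the script was mistyped.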

Originally published on Substack: https://alyxya.substack.com/p/hello-world.