This is going to be a small article on the basics of working with Assembly Language. We won't go deep into building extensive programs in assembly. The main idea of this post is to clarify the workflow for creating an assembly program and some key concepts so you can comfortably begin your assembly explorations.
Let's first learn about the different assembly syntaxes and types.
Assembly languages were created as an abstraction above machine code (the actual 1s and 0s). But the abstraction is not separate from the hardware that runs it, which means that different hardware gets different assembly languages. The ISA (Instruction Set Architecture) is the definition of the registers, data types, and instructions supported by a specific computer architecture. The ISA, as you can imagine, changes depending on the hardware. Differing architectures are one reason multiple assembly languages exist.
Another reason to have different types of assembly languages is the assembler. The assembler is the program that translates from the higher-level assembly language to machine code.
In this post, we are going to focus on Intel's x86 architecture (in its 64-bit form, x86-64), because Mac laptops, at the time of writing, run on x86 processors. For x86 there are many assemblers (NASM, GAS, YASM, and many more), and each supports its own "style" of assembly. There are two main syntax branches for x86, Intel and AT&T (you can read about some of the differences in this IBM article).
In summary, we have different assembly languages depending on the architecture, and also depending on the assembler program.
If you want to write assembly, assemble it, and run it on your computer, you need to make sure the assembly language and the assembler you use match your architecture.
Enough background, let's write some code.
Create a file hello_intel.asm with the following content (We'll use Intel syntax):
    section .data
    message: db "Hello, World!", 0Ah, 00h

    global _main
    section .text
    _main:
        mov rax, 0x02000004     ; system call for write
        mov rdi, 1              ; file descriptor 1 is stdout
        mov rsi, qword message  ; get string address
        mov rdx, 14             ; number of bytes (13 characters plus the newline)
        syscall                 ; execute syscall (write)
        mov rax, 0x02000001     ; system call for exit
        mov rdi, 0              ; exit code 0
        syscall                 ; execute syscall (exit)

Now we can generate the object file using yasm. If you don't have it installed on your computer, you can install it using the Homebrew package manager.
    $ yasm -f macho64 hello_intel.asm # this generates the hello_intel.o object file

Now we have to use the linker to link it against the system's dylibs (dynamic libraries).
    $ ld -lSystem -o hello_intel hello_intel.o # this generates the hello_intel executable

If we run it, we'll get our desired output:
    $ ./hello_intel
    Hello, World!
To show you the difference between Intel and AT&T syntax, we are going to write the same program again, this time in AT&T syntax. We'll use the as command, the assembler that ships with macOS. As is common on *nix systems, it uses AT&T syntax. Let's create a new file, hello_atnt.asm, with the following content:
    .section __DATA, __data
    message:
        .asciz "Hello world!\n"
    .section __TEXT, __text
    .globl _main
    _main:
        mov $0x02000004, %rax              # system call for write
        mov $1, %rdi                       # file descriptor 1 is stdout
        movq message@GOTPCREL(%rip), %rsi  # get string address
        mov $13, %rdx                      # number of bytes
        syscall                            # execute syscall (write)
        mov $0x02000001, %rax              # system call for exit
        mov $0, %rdi                       # exit code 0
        syscall                            # execute syscall (exit)

As you can see, AT&T syntax uses more prefixes ($ marks an immediate value, % marks a register), and the order of the operands is reversed: the source comes first and the destination second. Intel syntax feels like we are doing rax = 0x02000004, while AT&T feels more like $0x02000004 -> %rax. Let's generate the object file:
    $ as hello_atnt.asm -o hello_atnt.o # we specify the object file to be hello_atnt.o

Now we can link it in the same way we did with the Intel assembly.
    $ ld -lSystem -o hello_atnt hello_atnt.o # we get the executable hello_atnt

And if we run the executable, we get what we were expecting:
    $ ./hello_atnt
    Hello world!

Great! We created a simple executable from assembly code. From here you can start exploring the exciting world of assembly language on macOS.
Final thoughts
When you search for assembly language examples, most of them come from the reverse engineering perspective. That makes sense, since few people write whole programs in assembly. Still, I think our understanding is only complete if we can also write at least a simple assembly program ourselves.
Anyways, I hope this small post was helpful or at least entertaining :). Let me know what you think, and if you know of any useful assembly language resources, send them my way.
If you want to check what else I'm currently doing, be sure to follow me on Twitter @rderik or subscribe to the newsletter. If you want to send me a direct message, you can send it to derik@rderik.com.