Say hello to x64 Assembly [part 1]

Introduction


There are many developers between us. We write a tons of code every day. Sometime, it is even not a bad code :) Every of us can easily write the simplest code like this:


Every of us can understand what's this C code does. But... How this code works at low level? I think that not all of us can answer on this question, and me too. I thought that i can write code on high level programming languages like Haskell, Erlang, Go and etc..., but i absolutely don't know how it works at low level, after compilation. So I decided to take a few deep steps down, to assembly, and to describe my learning way about this. Hope it will be interesting, not only for me. Something about 5 - 6 years ago I already used assembly for writing simple programs, it was in university and i used Turbo assembly and DOS operating system. Now I use Linux-x86-64 operating system. Yes, must be big difference between Linux 64 bit and DOS 16 bit. So let's start.

Preparation


Before we started, we must to prepare some things like As I wrote about, I use Ubuntu (Ubuntu 14.04.1 LTS 64 bit), thus my posts will be for this operating system and architecture. Different CPU supports different set of instructions. I use Intel Core i7 870 processor, and all code will be written processor. Also i will use nasm assembly. You can install it with:
sudo apt-get install nasm
It's version must be 2.0.0 or greater. I use NASM version 2.10.09 compiled on Dec 29 2013 version. And the last part, you will need in text editor where you will write you assembly code. I use Emacs with nasm-mode.el for this. It is not mandatory, of course you can use your favourite text editor. If you use Emacs as me you can download nasm-mode.el and configure your Emacs like this:


That's all we need for this moment. Other tools will be describe in next posts.

x64 syntax


Here I will not describe full assembly syntax, we'll mention only those parts of the syntax, which we will use in this post. Usually NASM program divided into sections. In this post we'll meet 2 following sections:
  • data section
  • text section
The data section is used for declaring constants. This data does not change at runtime. You can declare various math or other constants and etc... The syntax for declaring data section is:
section .data
The text section is for code. This section must begin with the declaration global _start, which tells the kernel where the program execution begins.
section .text
global _start
_start:
Comments starts with ; symbol. Every NASM source code line contains some combination of the following four fields:
[label:] instruction [operands] [; comment]
Fields which are in square brackets are optional. A basic NASM instruction consists from two parts. The first one is the name of the instruction which is to be executed, and the second are the operands of this command. For example:
MOV COUNT, 48 ; Put value 48 in the COUNT variable

Hello world


Let's write first program with NASM assembly. And of course it will be traditional Hello world program. Here is the code of it:


Yes, it doesn't look like printf("Hello world"). Let's try to understand what is it and how it works. Take a look 1-2 lines. We defined data section and put there msg constant with Hello world value. Now we can use this constant in our code. Next is declaration text section and entry point of program. Program will start to execute from 7 line. Now starts the most interesting part. We already know what is it mov instruction, it gets 2 operands and put value of second to first. But what is it these rax, rdi and etc... As we can read at wikipedia:
A central processing unit (CPU) is the hardware within a computer that carries out the instructions of a computer program by performing the basic arithmetical, logical, and input/output operations of the system.
Ok, CPU performs some operations, arithmetical and etc... But where can it get data for this operations? The first answer in memory. However, reading data from and storing data into memory slows down the processor, as it involves complicated processes of sending the data request across the control bus. Thus CPU has own internal memory storage locations called registers:
So when we write mov rax, 1, it means to put 1 to the rax register. Now we know what is it rax, rdi, rbx and etc... But need to know when to use rax but when rsi and etc...
  • rax - temporary register; when we call a syscal, rax must contain syscall number
  • rdx - used to pass 3rd argument to functions
  • rdi - used to pass 1st argument to functions
  • rsi - pointer used to pass 2nd argument to functions
In another words we just make a call of sys_write syscall. Take a look on sys_write:


It has 3 arguments:
  • fd - file descriptor. Can be 0, 1 and 2 for standard input, standard output and standard error
  • buf - points to a character array, which can be used to store content obtained from the file pointed to by fd.
  • count - specifies the number of bytes to be written from the file into the character array
So we know that sys_write syscall takes three arguments and has number one in syscall table. Let's look again to our hello world implementation. We put 1 to rax register, it means that we will use sys_write system call. In next line we put 1 to rdi register, it will be first argument of sys_write, 1 - standard output. Then we store pointer to msg at rsi register, it will be second buf argument for sys_write. And then we pass the last (third) parameter (length of string) to rdx, it will be third argument of sys_write. Now we have all arguments of sys_write and we can call it with syscall function at 11 line. Ok, we printed "Hello world" string, now need to do correctly exit from program. We pass 60 to rax register, 60 is a number of exit syscall. And pass also 0 to rdi register, it will be error code, so with 0 our program must exit successfully. That's all for "Hello world". Quite simple :) Now let's build our program. For example we have this code in hello.asm file. Then we need to execute following commands:
nasm -f elf64 -o hello.o hello.asm
ld -o hello hello.o
After it we will have executable hello file which we can run with ./hello and will see Hello world string in the terminal.

Conclusion


It was a first part with one simple-simple example. In next part we will see some arithmetic. If you will have any questions/suggestions write me a comment.

All source code you can find - here.

Short list of mercurial cheat sheets

I'm using git scm for work and pet project quite long time. If wikipedia doesn't lie, first release of git was in 2005. If i remember correctly, than I am using git since 2009. I knew about git at github :). Two weeks ago I watched classic presentation Linus Torvalds on git. There was a quote:
Hg (Mercurial) is pretty good, but git is better
It was not my first time that I heard Mercurial name. Yes, i head it many times in my life. But unfortunatelly i never used it. I was curious about this dvcs and i decided little plunge into it. Today i finished to read: Mercurial: The Definitive Guide by Bryan O'Sullivan:
It is a really great book, where Bryan describes work with mercural from the start (how to install it, hot to create mercurial repository etc...) to advanced themes (hg hooks, worksflows and etc...). I have no big expirience with mercurial but i had noticed two things for this time:
  • It is much easier than git
  • It is much slower than git
In other parts mercurial similar to git, same commands commit, init, tag and etc...

List of useful mercurial commands


Here is little list of mercurial cheat sheets for mercurial begginers as me: To clone remote mercurial repo to local Dir directory:
hg clone remote Dir
To initialize new mercurial repository in current directory:
hg init
Begin tracking all files
hg add
Begin tracking file test
hg add test
Stop tracking and delete file test
hg remove test
Make commit with message to current repository
hg commit -m "message..."
Show history of changes
hg log
Show status of files
hg status
See who changed, what changed and when
hg annotate file
Lists known remote Repos
hg paths
Lists tracked file changes
hg diff
List changes to test file
hg diff test
List changesets available at remote
hg incoming
Pull all new changesets into local, but does not update work tree
hg pull
Pull all new changesets into local, and update work tree
hg pull -u
Merge working directory with another revision
hg merge
Undo all uncomm­itted changes
hg revert
Push changesets to remote
hg push
To create a new named branch
hg branch feature


GHCi runtime linker found a duplicate definition error

Today I started to play with attoparsec package and tried to write simple parser. I wrote really toy data type like this:


I tried to type:
*SimpleParser> End
in ghci and will get End as I expected. Instead End I got following error:


As we can see in error output GHCi runtime linker found duplicate definition of _hs_bytestring_long_long_uint_hex. Let's see what's wrong bytestring package with:
cabal info bytestring
It will give us output like this:


We can note that there two installed versions of bytestring package. Let's remove old version with:
sudo ghc-pkg unregister --force bytestring-0.10.0.2
That's all. After this error with GHCi runtime linker found a duplicate definition error...... will disappear. Hope this blog post will be useful for all who will meet error like this. Happy coding!

Binary tree and some generic tricks with golang

It is quite easy to create binary tree implementation with Go programming language, but it's not clearly for the first view how to make it generic. I think most of us know what is it Binary tree, so I will no explain it here, you can find in the Internet. Let's try to implement binary tree. As you can read about I want to implement generic binary try, so I don't want to make binary tree for some concrete type like int, string and etc... it must generic and be able to work with any data types. It is not a problem to make it generic, because we have {}interface type in Golang. interface{} type means something like any type. That's way you free to do something like this:


Ok, we have interface{} type for any type. Let's create binary tree structure now:


We can see here that binary tree consist from node and pointers to left and right nodes:
Now need to create initialization helper for our binary tree which will return empty binary tree to user:


Ok, now we have binary tree data structure and some code for it's initialization. Let's go to the most interesting part. Now we create Insert function. It will get 1 parameter, some data with interface{} type and insert new node to the our binary tree. How insert works for binary tree: It get new node and tree and check if this tree is empty it creates new first node with this data. If tree is not empty, it compares new node data with current node value and if it is greater it goes recursively to right branch, and if it is smaller it goes to left node in the same way. Implementation of this is following:


Very simple implementation of inserting but it will not work :) If we try to compile our code, we'll get error that: < operator not defined for interface. From the point of logic it is properly behaviour. Really, interface{} means any type, so golang doesn't know what's type will be there and how to compare two values of this type. Actually we can pass and int and string and MyCustomType instead interface{}. How to be with this? If we will look how another programming languages solve this problem, we will find something interesting. Let's look to Haskell for example: There is Ord type class which provides behaviour for <,> and other function for comparing data. Binary tree in Haskell looks like:


Practically it looks almost like golang implementation but with another syntax. We have current node and recursively defined left and right trees. We can see a here, it is like interface{} type in golang. We just can make instance of Ord for Tree and tell Haskell compiler how to use <, > and other operators for tree. There is method to do this in golang like in Haskell. Let's update our binary tree structure and add one function there:


You can see that we had add new field: lessFun which has functional type which gets two arguments with interface{} type and returns bool. How it help us? Before initialization of new binary tree, user will create function with two interface{} and implement comparing of this two arguments there. if first argument smaller than second it will returns true, and false in against way. Usually user knows what's type will be in binary tree and user must know how compare his types. After defining this function need to pass it to New function, so it will be like this:


And now we can write insert function:


Insert function compares node and new node with lessFun function, so it already knows how to compare data with certain types. For example we want to create binary tree for int, it will be:


Here is compare function which gets two arguments with interface{} type and compares it resulting to int type. Than we create new binary tree with New function and pass our compare function to it. And tries to insert some integer numbers to binary tree. It works perfectly, because current binary tree already knows how to compare nodes with lessFun.

If anybody knows method to do better something like this, please let me know in comments.

Full source code of binary tree you can find - here.