Samuel Jacob's Web Log
Here I log some of the technical things that I come across.
Search This Blog
Thursday, August 29, 2013
Saturday, May 11, 2013
Self Balancing Tree as Heap
In this post I will try to explain how to combine a Heap and AVL tree and get benefit of both them from a single structure.
A self balancing binary search tree such as AVL tree can do faster lookup for a item in the tree in O(log n). Heap s mainly used implement priority queues which needs to find min/max elements quickly. Heap achieve this in O(1) time complexity.
Lets revisit a basic property of binary search tree - In a binary search tree min element is at the left most leaf position. Similarly in a BST max element is at the right most leaf position. So finding min/max element in a BST is O(h) where h is the depth of the tree. If the BST is a balanced tree then O(h) is O(log n) in worst case(otherwise the worst case O(n), since we considering only balanced trees here lets ignore the unbalanced cases). The picture in the right illustrates this basic property of the BST.
The first and simplest optimization comes to mind is store a pointer to min/max elements. This caching will result in O(1) time complexity for finding min/max elements. However this would increase the cost of node insert/delete because the min/max pointer has to be updated during insert and deletion. The cost of finding and deleting a min node is O(log(n)) which is same as if we havent had the cache pointers. The picture in the right shows advantage of having cached pointer to find a min element. Obviously this method cant be used for priority queues where find min/delete operation is used together.
In the above method the problem was the complexity finding next smallest element in the tree from min element is O(log n). So if we have pointer to next smallest element from all nodes then find and delete opearation would be of complexity O(1).
Lets look at the BST from slightly different angle. Usual declaration of BST in C as follows:
When we(me and +Dilip Simha) were implementing ADT for AceOS we decided to experiment BST in a different way. We saw tree as recursive lists rather than a recursive pointers. In the following picture you could see 6 lists(not counting sub-lists):
Now consider this list is a doubly linked circular list. This is illustrated in the following figure. You may argue that this will make the BST to become acyclic directed graph. But for the sake of simplicity lets continue to call this as balanced BST. The picture I left out few arrows to keep it cleaner.
AVL tree in Ace OS is implemented in this way. You can see the data structure declarations below. Initially we did it for reusing the code. But after implementing this we figured out some interesting properties. This balanced tree/graph can find any node in O(log(n)) and also offers findmin operation in O(1) complexity. This also reduces the complexity of delete operation(since we can find right node's left most child in O(1) operation). But delete operation might result in balancing the tree.
A self balancing binary search tree such as AVL tree can do faster lookup for a item in the tree in O(log n). Heap s mainly used implement priority queues which needs to find min/max elements quickly. Heap achieve this in O(1) time complexity.
Min element of a BST is always at the left most |
Cached pointer to Min element at the root |
In the above method the problem was the complexity finding next smallest element in the tree from min element is O(log n). So if we have pointer to next smallest element from all nodes then find and delete opearation would be of complexity O(1).
Lets look at the BST from slightly different angle. Usual declaration of BST in C as follows:
When we(me and +Dilip Simha) were implementing ADT for AceOS we decided to experiment BST in a different way. We saw tree as recursive lists rather than a recursive pointers. In the following picture you could see 6 lists(not counting sub-lists):
- (300, 200, 100, 50)
- (300, 400, 500, 600)
- (200, 250, 275)
- (100, 150)
- (400, 350)
- (500, 450)
Now consider this list is a doubly linked circular list. This is illustrated in the following figure. You may argue that this will make the BST to become acyclic directed graph. But for the sake of simplicity lets continue to call this as balanced BST. The picture I left out few arrows to keep it cleaner.
AVL tree in Ace OS is implemented in this way. You can see the data structure declarations below. Initially we did it for reusing the code. But after implementing this we figured out some interesting properties. This balanced tree/graph can find any node in O(log(n)) and also offers findmin operation in O(1) complexity. This also reduces the complexity of delete operation(since we can find right node's left most child in O(1) operation). But delete operation might result in balancing the tree.
Saturday, March 31, 2012
GCOV internals overview
Sometime back I worked on a small project to extract code coverage information created by gcc from a kernel.
During that time I didn't find any good internal documentation about gcov. So here I discuss about internals of GCOV.
Before jumping to the internals of GCOV here is an example from the man page.
As shown above gcov can show what all code path executed and how many time executed.
Want to try? Here is the quick example.
You might wonder how your ./a.out would create .gcda while exiting. It is because of "-fprofile-arcs" automatically includes libgcov . Libgcov registers itself to be invoked during program exit by using atexit(). (Yes - it wont generate .gcda files if you exit abnormally). And during program exit it just dumps all the branch information to one or more gcda file.
The coverage information is "just" dumped into files by libgcov. So who collects the the coverage information at run time? Actually the program itself collects the coverage information. In-fact only it can collect because only it knew which all code path it takes. The code coverage information is collected at runtime on the fly. It is accomplished by having a counter for each branch. For example consider the following program.
The compiler keep tracks of all the counters in a single file. The data structure outlined in the below picture.
There is a single gcov_info structure for a C file. And multiple gcov_fn_info and gcov_ctr_info. During program exit() these structures are dumped into the .gcda file. For a project(with multiple C files) each C file will have a gcov_info structure. These gcov_info structures should be linked together so that during exit() the program can generate .gcda file for all the C files. This is done by using constructors and destructors.
Generic C constructor:
gcc generates constructors for all program. C constructors are accomplished by using ".ctors" section of ELF file. This section contains array of function pointers. This array is iterated and each function is invoked by _init()->__do_global_ctors_aux() during program start. _init() is placed ".init" section so it will be called during program initialization. A function can be declared as constructor by using function attribute.
"-ftest-coverage" creates a constructor per file. This constructor calls __gcov_init() and passes the gcov_info as argument.
$ gcov -b tmp.c 87.50% of 8 source lines executed in file tmp.c 80.00% of 5 branches executed in file tmp.c 80.00% of 5 branches taken at least once in file tmp.c 50.00% of 2 calls executed in file tmp.c Creating tmp.c.gcov. Here is a sample of a resulting tmp.c.gcov file: main() { 1 int i, total; 1 total = 0; 11 for (i = 0; i < 10; i++) branch 0 taken = 91% branch 1 taken = 100% branch 2 taken = 100% 10 total += i; 1 if (total != 45) branch 0 taken = 100% ###### printf ("Failure0); call 0 never executed branch 1 never executed else 1 printf ("Success0); call 0 returns = 100% 1 }Note - gcov has a cool graphical frontend in Linux - lcov.
As shown above gcov can show what all code path executed and how many time executed.
Want to try? Here is the quick example.
$ gcc -fprofile-arcs -ftest-coverage your_program.c $ ./a.out $ gcov your_program.cDuring compilation with -ftest-coverage option gcc generates a ".gcno" file. It contains information about each branches in your code. While finishing execution, ./a.out creates .gcda file(s) which actually contains which all branches taken. Using these there files .c(source), .gcno(block info) and .gcda(block execution count) gcov command prints the code coverage information in a human readable format.
You might wonder how your ./a.out would create .gcda while exiting. It is because of "-fprofile-arcs" automatically includes libgcov . Libgcov registers itself to be invoked during program exit by using atexit(). (Yes - it wont generate .gcda files if you exit abnormally). And during program exit it just dumps all the branch information to one or more gcda file.
The coverage information is "just" dumped into files by libgcov. So who collects the the coverage information at run time? Actually the program itself collects the coverage information. In-fact only it can collect because only it knew which all code path it takes. The code coverage information is collected at runtime on the fly. It is accomplished by having a counter for each branch. For example consider the following program.
int if_counter = 0, else_counter = 0; void dump_counters() { int fd; fd = open(strcat(filename, ".gcda"), "w"); write(fd, if_counter, sizeof(if_counter)); write(fd, else_counter, sizeof(else_counter)); } int main() { int a = 1; atexit(dump_counters); a = a++ + (a * --a) + a--; if(a) { if_counter++; printf("not zero\n"); } else { else_counter++; printf("zero\n"); } }In the above example if you replace the colored code with gcov equivalent then green colored code is implemented inside libgcov and red colored code is implanted inside your code by gcc. It is easy to understand how the increment operation can be implanted inside your code by gcc. It just inserts "inc x-counter" machine instruction before and after every branch. It should be noted that "inc" machine specific instruction might have side effect on some programs which uses asm inside C. For example in x86 the "inc" instruction affects carry flag. Some assembly code might depends on this and if gcc inserts "inc counter" instruction then it will result in error. I had hard time figuring this out when compiled with -fprofile-arcs a kernel was booting but not able to receive any network packets(it was discarding all packets because the network stack found the checksum was wrong).
1 int main() 2 { 3 int a = 1; 4 5 if (a) { 6 a++; 7 } else { 8 a--; 9 } 10 11 return a; 12 } 13Here is the disassembly (gcc -g3 test.c;objdump -S -d ./a.out)
int main() { 4004b4: 55 push %rbp 4004b5: 48 89 e5 mov %rsp,%rbp int a = 1; 4004b8: c7 45 fc 01 00 00 00 movl $0x1,-0x4(%rbp) if (a) { 4004bf: 83 7d fc 00 cmpl $0x0,-0x4(%rbp) 4004c3: 74 06 je 4004cb <main+0x17> a++; 4004c5: 83 45 fc 01 addl $0x1,-0x4(%rbp) 4004c9: eb 04 jmp 4004cf <main+0x1b> } else { a--; 4004cb: 83 6d fc 01 subl $0x1,-0x4(%rbp) } return a; 4004cf: 8b 45 fc mov -0x4(%rbp),%eax }After compiling with profile-arcs, the disassembly looks like
int main() { 400c34: 55 push %rbp 400c35: 48 89 e5 mov %rsp,%rbp 400c38: 48 83 ec 10 sub $0x10,%rsp int a = 1; 400c3c: c7 45 fc 01 00 00 00 movl $0x1,-0x4(%rbp) if (a) { 400c43: 83 7d fc 00 cmpl $0x0,-0x4(%rbp) 400c47: 74 18 je 400c61 <main+0x2d> a++; 400c49: 83 45 fc 01 addl $0x1,-0x4(%rbp) 400c4d: 48 8b 05 3c 25 20 00 mov 0x20253c(%rip),%rax # 603190 <dtor_idx.6460+0x8> 400c54: 48 83 c0 01 add $0x1,%rax 400c58: 48 89 05 31 25 20 00 mov %rax,0x202531(%rip) # 603190 <dtor_idx.6460+0x8> 400c5f: eb 16 jmp 400c77 <main+0x43> } else { a--; 400c61: 83 6d fc 01 subl $0x1,-0x4(%rbp) 400c65: 48 8b 05 2c 25 20 00 mov 0x20252c(%rip),%rax # 603198 <dtor_idx.6460+0x10> 400c6c: 48 83 c0 01 add $0x1,%rax 400c70: 48 89 05 21 25 20 00 mov %rax,0x202521(%rip) # 603198 <dtor_idx.6460+0x10> } return a; 400c77: 8b 45 fc mov -0x4(%rbp),%eax } 400c7a: c9 leaveq 400c7b: c3 retqFrom the above disassembly it might seem putting inc instruction while compiling is easy. But how/where storage for the counters(dtor_idx.6460 and dtor_idx.6460 in above example) are created. GCC uses statically allocated memory. Dynamically allocating space is one way but it would complicate the code(memory allocation operations during init) and might slow down execution of program(defer pointer). To avoid that gcc allocates storage as a loadable section.
The compiler keep tracks of all the counters in a single file. The data structure outlined in the below picture.
There is a single gcov_info structure for a C file. And multiple gcov_fn_info and gcov_ctr_info. During program exit() these structures are dumped into the .gcda file. For a project(with multiple C files) each C file will have a gcov_info structure. These gcov_info structures should be linked together so that during exit() the program can generate .gcda file for all the C files. This is done by using constructors and destructors.
Generic C constructor:
gcc generates constructors for all program. C constructors are accomplished by using ".ctors" section of ELF file. This section contains array of function pointers. This array is iterated and each function is invoked by _init()->__do_global_ctors_aux() during program start. _init() is placed ".init" section so it will be called during program initialization. A function can be declared as constructor by using function attribute.
"-ftest-coverage" creates a constructor per file. This constructor calls __gcov_init() and passes the gcov_info as argument.
samuel@ubuntu:~$objdump -t ./a.out | grep -i _GLOBAL__ 0000000000400c7c l F .text 0000000000000010 _GLOBAL__sub_I_65535_0_mainAnd disassembly of _GLOBAL__sub_I_65535_0_main
954 0000000000400c7c <_GLOBAL__sub_I_65535_0_main>: 955 400c7c: 55 push %rbp 956 400c7d: 48 89 e5 mov %rsp,%rbp 957 400c80: bf 00 31 60 00 mov $0x603100,%edi 958 400c85: e8 a6 12 00 00 callq 401f30 <__gcov_init> 959 400c8a: 5d pop %rbp 960 400c8b: c3 retq 961 400c8c: 90 nop 962 400c8d: 90 nop 963 400c8e: 90 nop 964 400c8f: 90 nopgcov_init() implemented in libgcov stores all the gcov_info() passed in a linked list. This linked list is used to walk through all the gcov_info during program termination.
Monday, February 27, 2012
Graph your data
System statistics are hard interpolate since usually they are collected in large quantities and sometimes represents large numbers. Recently I was doing a prototype and wanted to measure how much damage it would to the main project (in terms of performance); so used performance counter feature in the processor to measure some events(cache miss, memory read etc) with and without my code change. But after looking at the numbers I realized it is difficult to analyze such a data. Because each number is 8 digit and I had 16 columns(16 cpu) and 100 rows of data(100 seconds of run).
So I decided to use some graph so that it would be easy to understand the data.
Googled for a GNU graph tool and found gnu plot - this blog is to show how good it is and how easy it is to use. Consider using it if you have lot of numbers. For this post I took some sample data from my ubuntu machine while running stress command.(stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M --hdd 2 --timeout 200s)
This graph might not be impressive because it deals with only numbers ranging from 0-100 and the numbers are very steady. Consider a range 0-99999999 and the numbers are fluctuating too much then it will be to graph. The above graph was created by running "gnuplot" with following commands
The nice thing about gnuplot is it will skip the row(line) in the data file if it cant recognize the columns. And also it supports svg and pdf outputs. See what all gnuplot can do at the official demo page.
Googled for a GNU graph tool and found gnu plot - this blog is to show how good it is and how easy it is to use. Consider using it if you have lot of numbers. For this post I took some sample data from my ubuntu machine while running stress command.(stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M --hdd 2 --timeout 200s)
#sudo vmstat -n 1 200 > stat.txt #cat stat.txt procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 251124 1827388 4720 32704 6 34 19 38 42 74 1 0 98 1 0 0 251124 1827388 4720 32708 0 0 0 0 104 71 0 0 100 0 13 3 251108 1349912 4728 322540 4 0 4 20 683 1789 42 12 47 0 11 3 251008 1382620 4728 322520 180 0 180 0 1604 1233 89 12 0 0 11 3 251008 1432052 4728 322520 0 0 0 0 1361 1237 90 10 0 0 11 3 251008 1298352 4728 322668 0 0 0 0 1392 1275 90 10 0 0 2 3 251008 1512576 4728 323524 0 0 0 0 20077 14827 59 16 13 12 0 0 251008 1826388 4728 32756 0 0 0 0 45069 25566 0 4 25 71 0 0 251008 1826444 4728 32708 0 0 0 0 59 46 0 0 100 0 ...The following example shows how to create line graph for us,sy columns in the above against time(seconds).
This graph might not be impressive because it deals with only numbers ranging from 0-100 and the numbers are very steady. Consider a range 0-99999999 and the numbers are fluctuating too much then it will be to graph. The above graph was created by running "gnuplot" with following commands
set title 'CPU usage' #set terminal svg butt enhanced dynamic set terminal jpeg set output 'output.jpg' set xlabel 'seconds' #set logscale y set ylabel 'cpu' set key below plot \ "stat.txt" using :13 title 'Application' with lines lc rgb 'blue', \ "stat.txt" using :14 title 'Kernel' with lines lc rgb 'green'You can also intermix two or more datafiles. The following example shows how to graph two different samples collected during different time period.
set title 'CPU usage' #set terminal svg butt enhanced dynamic set terminal jpeg set output 'output.jpg' set xlabel 'seconds' #set logscale y set ylabel 'cpu' set key below plot \ "stat.txt" using :13 title 'Application' with lines lc rgb 'light-green', \ "stat.txt" using :14 title 'Kernel' with lines lc rgb 'light-red', \ "stat1.txt" using :13 title 'Application1' with lines lc rgb 'dark-green', \ "stat1.txt" using :14 title 'Kernel1' with lines lc rgb 'dark-red'The stat1.txt file is generated by running vmstat while the system was stressed by stress --cpu 4 --io 2 --vm 4 --vm-bytes 1M --hdd 2 --hdd-bytes 4096 --timeout 200s
The nice thing about gnuplot is it will skip the row(line) in the data file if it cant recognize the columns. And also it supports svg and pdf outputs. See what all gnuplot can do at the official demo page.
Sunday, May 22, 2011
OpenOCD and NGX USB ARM JTAG
This post describes the steps needed to make NGX’s USB ARM JTAG to work with OpenOCD in windows 7. This JTAG is compatible with colink JTAG and works with IAR Workbench and Keil uVision. To use with these IDEs there is a well defined methods/plug-ins available in the product page and in internet. However to use this JTAG with OpenOCD there is scarce resource in the internet.
OpenOCD can be used to low level debugging, source level debugging (through GDB) and can be used for flashing. OpenOCD exposes a command line interface which can be accessed through telnet. It also provides remote GDB server which also can be reached through TCP connection.
Steps needed for Windows:
1) Plug-In the JTAG to a available USB connector
2) Download libusb-win32 from http://sourceforge.net/projects/libusb-win32/files/
3) Extract libusb-win32 to a folder and run “inf-wizard.exe”
4) Select “USB Serial Converter A” and install driver
5) Download and install openocd from http://www.freddiechopin.info/index.php/en/download/category/4-openocd
6) Attach the JTAG probe to your target ARM board and poweron the target board
7) Create a openocd configurations file (see at the end)
8) Run openocd.exe –f
9) Run putty or telnet and connect to port localhost:4444
After launching arm-none-eabi-gdb.exe run "target remote localhost:3333" to start remote debugging. You can execute low level JTAG commands from GDB by using "monitor " command.
Flashing can be done using the following commands:
reset
halt
sleep 200
wait_halt
flash probe 0
flash info 0
flash write_image erase unlock
sleep 200
reset run
halt
sleep 200
wait_halt
flash probe 0
flash info 0
flash write_image erase unlock
sleep 200
reset run
Open OCD configuration file:
# openocd configurations
telnet_port 4444
# gdb configuration
gdb_port 3333
# cpu configuration
source [find target/lpc1768.cfg]
# interface configuration
interface ft2232
ft2232_vid_pid 0x0403 0x6010
ft2232_device_desc "NGX JTAG"
ft2232_layout "oocdlink"
ft2232_latency 2
telnet_port 4444
# gdb configuration
gdb_port 3333
# cpu configuration
source [find target/lpc1768.cfg]
# interface configuration
interface ft2232
ft2232_vid_pid 0x0403 0x6010
ft2232_device_desc "NGX JTAG"
ft2232_layout "oocdlink"
ft2232_latency 2
Subscribe to:
Posts (Atom)