Friday, November 06, 2009

Oprofile on ARM Linux

Introduction

This article applies to OProfile version 0.9.5. OProfile is a profiling system for Linux 2.2/2.4/2.6 systems on a number of architectures. It can profile all parts of a running system, from the kernel (including modules and interrupt handlers) to shared libraries and binaries. It runs transparently in the background and collects information at low overhead, which makes it well suited to profiling entire systems and finding bottlenecks under real-world conditions.

Many CPUs provide "performance counters", hardware registers that count "events" such as cache misses or CPU cycles. OProfile builds profiles of code from the number of these events that occur: every time a certain (configurable) number of events has occurred, the PC value is recorded. This information is aggregated into profiles for each binary image.

Some hardware setups do not allow OProfile to use performance counters. In these cases no events are available, and OProfile operates in timer/RTC mode as described below (RTC mode applies only up to Linux kernel 2.4).


Cross-Compiling oprofile for ARM arch

Unpack oprofile-0.9.5.tar.bz2 and go to the oprofile directory:

[test@localhost]# cd oprofile-0.9.5


Run the command below to configure OProfile for cross-compilation:

[test@localhost oprofile-0.9.5]# ./configure --host=arm-linux --with-kernel-support


Recommendation: use a cross-compilation environment such as CLFS or Scratchbox. My environment is Scratchbox. This resolves many library dependencies and saves a lot of time.


Once configuration succeeds, run make to compile OProfile.

[test@localhost oprofile-0.9.5]# make


Now install OProfile. During installation you can specify your rootfs as the install path, which copies the OProfile binaries and libraries directly into your root file system.


Run the command below to install OProfile, including the daemon:

[test@localhost oprofile-0.9.5]# make DESTDIR=<INSTALL PATH> install
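
The build steps above can be collected into one small script. This is only a minimal sketch: it assumes the arm-linux host triplet from the configure line above, and the names CROSS_HOST, STAGING and DRY_RUN are made up for this example (they are not part of OProfile). By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: configure, build and install OProfile 0.9.5 for an ARM target.
# CROSS_HOST, STAGING and DRY_RUN are illustrative names, not OProfile options.
CROSS_HOST="${CROSS_HOST:-arm-linux}"   # host triplet used in this article
STAGING="${STAGING:-/tmp/rootfs}"       # assumed path of the target root file system
DRY_RUN="${DRY_RUN:-1}"                 # 1 = only print the commands

run() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || exit 1
    fi
}

build_oprofile() {
    run ./configure --host="$CROSS_HOST" --with-kernel-support
    run make
    run make DESTDIR="$STAGING" install
}

build_oprofile
```

Run it from inside the oprofile-0.9.5 directory with DRY_RUN=0 once the printed commands look right for your setup.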


Running and using oprofile tools

OProfile can run in two modes: one uses the hardware performance monitor counters (PMNC on Cortex-A8), the other uses the timer interrupt.

Tool summary

This section gives a brief description of the available OProfile utilities and their purpose.

ophelp

This utility lists the available events and short descriptions.

opcontrol

Used for controlling OProfile data collection; discussed below in the section "Using Opcontrol".

opreport

This is the main tool for retrieving useful profile data, described below

opannotate

This utility can be used to produce annotated source, assembly or mixed source/assembly. Source level annotation is available only if the application was compiled with debugging symbols.

opgprof

This utility can output gprof-style data files for a binary, for use with gprof -p.

oparchive

This utility can be used to collect executables, debuginfo, and sample files and copy the files into an archive. The archive is self-contained and can be moved to another machine for further analysis.

opimport

This utility converts sample database files from a foreign binary format (abi) to the native format. This is useful only when moving sample files between hosts, for analysis on platforms other than the one used for collection.

agent libraries

Used by virtual machines (like the Java VM) to record information about JITed code being profiled.


Oprofile in Timer Interrupt Mode

This section applies to 2.6 kernels and above only. In 2.6 kernels on CPUs without OProfile support for the hardware performance counters, the driver falls back to using the timer interrupt for profiling.

You can force use of the timer interrupt by using the timer=1 module parameter (or oprofile.timer=1 on the boot command line if OProfile is built-in).
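
To see which mode the driver actually selected, you can read the cpu_type file that the driver exposes through oprofilefs. The sketch below assumes oprofilefs is mounted at /dev/oprofile (opcontrol --init normally takes care of this) and that the file reads "timer" in timer interrupt mode; the function name oprofile_mode and the variable CPU_TYPE_FILE are made up for this example.

```shell
#!/bin/sh
# Sketch: report which mode the OProfile driver selected.
# CPU_TYPE_FILE is an illustrative variable; the default assumes oprofilefs
# is mounted at /dev/oprofile.
CPU_TYPE_FILE="${CPU_TYPE_FILE:-/dev/oprofile/cpu_type}"

oprofile_mode() {
    # $1: path to the cpu_type file
    if [ ! -r "$1" ]; then
        echo "unknown"      # driver not loaded or oprofilefs not mounted
        return 1
    fi
    if [ "$(cat "$1")" = "timer" ]; then
        echo "timer"        # timer interrupt fallback
    else
        echo "hardware"     # a performance counter model was detected
    fi
}

# To force timer mode when the driver is built as a module (run as root):
#   modprobe oprofile timer=1
oprofile_mode "$CPU_TYPE_FILE" || true
```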


Oprofile in Hardware Performance monitor counter(PMNC) mode

In this case you have to add the PMNC IRQ number of your SoC to the ARMv7 OProfile driver, in the file arch/arm/oprofile/op_model_v7.c.

Add the lines marked with + below, completing CONFIG_ARCH_ with the config symbol of your platform and putting the IRQ number in front of the comma:

static int irqs[] = {
#ifdef CONFIG_ARCH_OMAP3
	INT_34XX_BENCH_MPU_EMUL,
#endif
+ #ifdef CONFIG_ARCH_
+ 	, /* IRQ number of the PMNC controller for the Cortex-A8 in your SoC */
+ #endif
};


Using Opcontrol

In this section we describe the configuration and control of the profiling system with opcontrol.

Download the zImage to the target board and boot it with a rootfs that contains the OProfile tools, then run the steps below.

First, you must be root to use OProfile, so either log in as root or switch to root with su. Next, set up OProfile. There are two options: profile the application together with the Linux kernel, or without it. To include the kernel, point opcontrol at the uncompressed kernel image (vmlinux), which in this example sits in the /root directory; to leave the kernel out, pass --no-vmlinux instead.

# Init OProfile

[root@localhost ~]# opcontrol --reset

[root@localhost ~]# opcontrol --init

[root@localhost ~]# opcontrol --vmlinux=/root/vmlinux

# Set up events

[root@localhost ~]# opcontrol -e CPU_CYCLES:100000:0:1:1
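
The event specification passed with -e has the form name:count:unitmask:kernel:user. The sketch below simply splits the specification used above into those fields; describe_event is a helper name made up for this example.

```shell
#!/bin/sh
# Sketch: split an opcontrol event specification into its fields.
# describe_event is a made-up helper, not an OProfile tool.
describe_event() {
    # Field order used by opcontrol: name:count:unitmask:kernel:user
    printf '%s\n' "$1" | {
        IFS=: read -r name count unitmask kernel user
        echo "event=$name count=$count unitmask=$unitmask kernel=$kernel user=$user"
    }
}

describe_event "CPU_CYCLES:100000:0:1:1"
# prints: event=CPU_CYCLES count=100000 unitmask=0 kernel=1 user=1
```

So the command above takes one sample every 100000 CPU cycles, with no unit mask, and profiles both kernel and user space.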

# Start

[root@localhost ~]# opcontrol --start-daemon

[root@localhost ~]# opcontrol --start

# Verify PMNC IRQs

[root@localhost ~]# cat /proc/interrupts

Note: when you are using PMNC mode, /proc/interrupts should contain one entry for the PMNC IRQ. In timer interrupt mode there is no such entry.

Run any application at this point.

# Stop profiling

[root@localhost ~]# opcontrol --dump

[root@localhost ~]# opcontrol --stop

# Deinit

[root@localhost ~]# opcontrol --shutdown

[root@localhost ~]# opcontrol --deinit

# Get the report

[root@localhost ~]# opreport
Overflow stats not available
CPU: ARM V7 PMNC, speed 0 MHz (estimated)
Counted CPU_CYCLES events (Number of CPU cycles) with a unit mask of 0x00 (No
unit mask) count 100000
CPU_CYCLES:100000|
  samples|      %|
------------------
      337 91.3279 hello
       21  5.6911 no-vmlinux
        6  1.6260 libc-2.5.so
        4  1.0840 ld-2.5.so
        1  0.2710 busybox
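
The % column is each image's share of all recorded samples (here 337 + 21 + 6 + 4 + 1 = 369). As a small check, the sketch below recomputes the percentages from the sample counts of the report above; the function name percentages is made up for this example.

```shell
#!/bin/sh
# Sketch: recompute the % column of the opreport summary from the sample counts.
# Input: one "samples image" pair per line, as in the report above.
percentages() {
    awk '{ n[NR] = $1; name[NR] = $2; total += $1 }
         END { for (i = 1; i <= NR; i++)
                   printf "%s %.4f\n", name[i], 100 * n[i] / total }'
}

percentages <<'EOF'
337 hello
21 no-vmlinux
6 libc-2.5.so
4 ld-2.5.so
1 busybox
EOF
# prints:
# hello 91.3279
# no-vmlinux 5.6911
# libc-2.5.so 1.6260
# ld-2.5.so 1.0840
# busybox 0.2710
```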


[root@localhost ~]# opreport --callgraph
CPU: ARM V7 PMNC, speed 0 MHz (estimated)
Counted CPU_CYCLES events (Number of CPU cycles) with a unit mask of 0x00 (No
unit mask) count 100000
samples % app name symbol name
-------------------------------------------------------------------------------
477 86.1011 hello1 main
477 100.000 hello1 main [self]
-------------------------------------------------------------------------------
15 2.7076 vmlinux _spin_unlock_irqrestore
15 100.000 vmlinux _spin_unlock_irqrestore [self]
-------------------------------------------------------------------------------
7 1.2635 vmlinux check_poison_obj
7 100.000 vmlinux check_poison_obj [self]
-------------------------------------------------------------------------------
4 0.7220 busybox /bin/busybox
4 100.000 busybox /bin/busybox [self]
-------------------------------------------------------------------------------
4 0.7220 vmlinux __do_softirq
4 100.000 vmlinux __do_softirq [self]
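
The whole session, from init to report, can be wrapped in one script. This is a sketch only: it uses the same opcontrol and opreport calls, in the same order, as the steps in this article. The names VMLINUX, EVENT, DRY_RUN, run and profile_session are made up for this example, and ./hello stands in for whatever application you want to profile. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: one OProfile session using the commands shown in this article.
# VMLINUX, EVENT and DRY_RUN are illustrative names, not OProfile options.
VMLINUX="${VMLINUX:-/root/vmlinux}"          # uncompressed kernel image
EVENT="${EVENT:-CPU_CYCLES:100000:0:1:1}"    # name:count:unitmask:kernel:user
DRY_RUN="${DRY_RUN:-1}"                      # 1 = only print the commands

run() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

profile_session() {
    # "$@" is the workload to profile, for example ./hello
    run opcontrol --reset
    run opcontrol --init
    run opcontrol --vmlinux="$VMLINUX"
    run opcontrol -e "$EVENT"
    run opcontrol --start-daemon
    run opcontrol --start
    run "$@"
    run opcontrol --dump
    run opcontrol --stop
    run opcontrol --shutdown
    run opcontrol --deinit
    run opreport
}

profile_session ./hello
```

Run it as root on the target with DRY_RUN=0 to actually collect a profile.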

Conclusion
Note that OProfile does not provide 100% instruction-accurate profiles, and it is not suitable when no disturbance to the system at all can be accepted.
OProfile is easy to install and run.
OProfile can give a full profile: kernel plus all processes.
Very detailed CPU information is available with advanced usage.
Useful links:

http://oprofile.sourceforge.net/
http://oprofile.sourceforge.net/doc/index.html
http://oprofile.sourceforge.net/doc/internals/index.html
and google for your questions.



My learning continues...............Alim




1 Comments:

At 10:05 PM , Blogger Unknown said...

I am getting "overflow stats not available"... what that exactly means ? is that a kind of error or ???

 
