Short: A bit more complex profiler code for MorphOS Author: morgoth6@box43.pl (Marcin 'Morgoth' Kurek) Uploader: morgoth6@box43.pl (Marcin 'Morgoth' Kurek) Type: dev/c Version: 0.15 Architecture: MorphOS Introduction: ============== There was never a working profiler program for MorphOS. Since we got used to it it wasn't considered to be a problem. But it's nice to know which part of your program takes much CPU time and which part can be optimized for biggest speed gain. This is a reason for which libprf was created. ANyway after some time it was clear that libprf is a bit to primitive for real use then another library was born. This library allows you to get a statistics for running program. It use a small trick from gcc, so can only be used with applications compiled with it (Works for both 2.95.3 and 2.95.4). To see some examples of libpofile.a usage take a look at examples/ directory. Features: ========== * Full program profiling with detailed statistics about each function (See Instrumental Profile Usage) - Shortest/Longest time spend in function - Summary time for function The profiling data is collected in Child/NoChild mode. Then in opposite to old library you can Check the time in both modes. + Child - Function time with all subfunctions + NoChild - Function time without any subfunctions * Profile not only a function but a part of a function (see Simple/Complex Profile usage) * Show statistics about amount of stack used by a program (See StackShow usage) * Call graph Instrumental Profile Usage: ======= First thing you need is to compile your project with -finstrument-functions option and link with libprofile.a of coz (Please note that libprofile.a uses kprintf/kvprintf then libdebug.a is required too) Also it's important to do not try to profile a stripped binaries in that way. And this is enough to get statistics. When program end's libprofile.a will save it's results in a .prf file: [VERSION] 5 [TYPE] 0 [TASK] Shell Process 32756/32768 {test.db/0x1/0x64c/1/722570/709378/722570/709378/722570/709378/722570/709378/722570/709378/722570/709378/3120/8/2976/8} {test.db/0x1/0x6d0/1/1419241/709378/1419241/709378/1419241/709378/1419241/709378/1419241/709378/1419241/709378/3120/8/2976/8} {test.db/0x1/0x754/1/7095647/709378/7095647/709378/7095647/709378/7095647/709378/7095647/709378/7095647/709378/3120/8/2976/8} {test.db/0x1/0x7d8/100/22051/709378/48967/709378/2851980/709378/22051/709378/48967/709378/2851980/709378/3120/8/2976/8} {test.db/0x1/0x85c/1/12090218/709378/12090218/709378/12090218/709378/780/709378/780/709378/780/709378/3088/8/2944/8} {test.db/0x1/0x958/1/12090245/709378/12090245/709378/12090245/709378/0/0/0/0/0/0/3008/8/2864/8} As you can see this file format is no easily readable by human. Previous version has parser created by Krzysztof 'Tarnak' Kretus (AmiChris@fucio.pl) but he seems to be extremaly busy these days then this time I create my own tool. You can found it in tools/ directory. When program end's ... I think this one need a few additional words. There is no perfect way for libprofile.a to detect if the program ended or it's still running, then currently there are three diffrent ways to handle that (Look into profile.h): PRFF_NONE - Libray won't try to detect anything. You need to generate output and cleanup everything yourelf by manualy call PRFCleanup() PRFF_ATEXIT - Library will use atexit() call to be sure to end when the program ends. This is a default mode but unfortunatly it's far from perfect (For example you are not using standard libraries and atexit() function is missing or you want to use atexit() in your own code) PRFF_AUTO - This method assume that program end's when first called function ends (main() in most cases) This works great but only if your code doesn't use exit() or similar function. Call Graph: ============ This function is controlled by PRFF_CALLLOG flag and it makes possible to produce full functions call history. The example call graph can looks like that: Call Log: Shell Process -------------------------------------------------------- Function Name | Offset | Type | Time -------------------------------------------------------- main | 0xa5c | Enter | - test | 0x928 | Enter | - test50 | 0x64c | Enter | - test50 | 0x64c | Exit | 1.01653 test100 | 0x6d0 | Enter | - test100 | 0x6d0 | Exit | 2.00033 test500 | 0x754 | Enter | - test500 | 0x754 | Exit | 10.00206 test1 | 0x7d8 | Enter | - test1 | 0x7d8 | Exit | 0.04001 test1 | 0x7d8 | Enter | - test1 | 0x7d8 | Exit | 0.03995 test1 | 0x7d8 | Enter | - test1 | 0x7d8 | Exit | 0.04000 test1 | 0x7d8 | Enter | - test1 | 0x7d8 | Exit | 0.04005 test1 | 0x7d8 | Enter | - test1 | 0x7d8 | Exit | 0.03995 test1 | 0x7d8 | Enter | - test1 | 0x7d8 | Exit | 0.04000 test1 | 0x7d8 | Enter | - test1 | 0x7d8 | Exit | 0.04005 test1 | 0x7d8 | Enter | - test1 | 0x7d8 | Exit | 0.03996 test1 | 0x7d8 | Enter | - test1 | 0x7d8 | Exit | 0.04000 test1 | 0x7d8 | Enter | - test1 | 0x7d8 | Exit | 0.04005 test | 0x928 | Exit | 13.41971 main | 0xa5c | Exit | 13.41976 Simple/Complex Profile usage: ============================== Unfortunatly the Instrumental profiling is not enough in some cases. Using Simple/Complex method you can get statistics for code block not for a function. A example use of this profiling method can look like this: PROFILE_HANDLE(handle); if(PROFILER_INIT(&handle, "Handle name")) { PROFILE_BLOCK_BEGIN(&handle); { ... a block of code ...} PROFILE_BLOCK_END(&handle); PROFILER_CLEANUP(&handle); } else Printf("Profiler init failed !\n"); There is no need to remove a created profile blocks from the code. If the PROFILE_ENABLED macro is not defined all Simple/Complex macros are ignored. Then to enable/disable it just define or undefine the PROFILE_ENABLED. This profiling method outputs data in to the system log then you need a serial terminal or LogTool utility to see it. A sample output can look like this: # ComplexProfile block (Handle name) start at test.c/main():34 # ComplexProfile block (Handle name) end at test.c/main():45 [17.010817s] As you can see you have a information where the profile block starts (In this case in function main() at \ line 34 in file tes.c) ends (test.c/main():45) and how much time it takes to complete (In this example ~17s) See StackShow usage: ===================== As a additional feature libprofile.a can also generate for you a simple stack usage statistics. This feature required the same steps as Instrumental profiling method, but a diffrent parse tool usage (Look at Using prf-analyser) prf-analyser: ========= This program can be found in tools/ directory and it will help you to parse .prf files to something more "user friendly" prf-analyser elf-file prf-file First argument is executable used to create .prf file and second. Options can be: -c --cpp_demangle This option demangle function names for C++ programs. It accepts two demangle methods: * old - used in gcc 2.9x * new - used in gcc 3.x and 4.x -c --nocalllog This option will force parser to skip displaying of call graph. -d --detailed In the default mode the prf-analyser displays only the summary informations for each function. If you want to get more detailed informations use this switch. A example of default output: ----------------------------------------------------------------------------- Process: Shell Process ----------------------------------------------------------------------------- Function Name | Counter | Offset | Summary [C/NC] ----------------------------------------------------------------------------- main | 1 | 0x958 | 17.04345 / - test | 1 | 0x85c | 17.04341 / 0.00110 test1 | 100 | 0x7d8 | 4.02040 / 4.02040 test500 | 1 | 0x754 | 10.00263 / 10.00263 test100 | 1 | 0x6d0 | 2.00068 / 2.00068 test50 | 1 | 0x64c | 1.01860 / 1.01860 And the same for detailed mode: --------------------------------------------------------------------------------------------------------------------------------- Process: Shell Process --------------------------------------------------------------------------------------------------------------------------------- Function Name | Counter | Offset | Slowest [C/NC] | Fastest [C/NC] | Summary [C/NC] --------------------------------------------------------------------------------------------------------------------------------- main | 1 | 0x958 | 17.04345 / - | 17.04345 / - | 17.04345 / - test | 1 | 0x85c | 17.04341 / 0.00110 | 17.04341 / 0.00110 | 17.04341 / 0.00110 test1 | 100 | 0x7d8 | 0.03108 / 0.03108 | 0.06903 / 0.06903 | 4.02040 / 4.02040 test500 | 1 | 0x754 | 10.00263 / 10.00263 | 10.00263 / 10.00263 | 10.00263 / 10.00263 test100 | 1 | 0x6d0 | 2.00068 / 2.00068 | 2.00068 / 2.00068 | 2.00068 / 2.00068 test50 | 1 | 0x64c | 1.01860 / 1.01860 | 1.01860 / 1.01860 | 1.01860 / 1.01860 -s --stackshow Displays the stack usage statistics for each function enter/exit (Both PPC/68k). A example output: --------------------------------------------------------------------------------------- Process: Shell Process --------------------------------------------------------------------------------------- Function Name | Counter | Offset | Enter (PPC/68k) | Exit (PPC/68k) --------------------------------------------------------------------------------------- main | 1 | 0x958 | 3008 / 8 | 2864 / 8 test | 1 | 0x85c | 3088 / 8 | 2944 / 8 test1 | 100 | 0x7d8 | 3120 / 8 | 2976 / 8 test500 | 1 | 0x754 | 3120 / 8 | 2976 / 8 test100 | 1 | 0x6d0 | 3120 / 8 | 2976 / 8 test50 | 1 | 0x64c | 3120 / 8 | 2976 / 8 -o --ordering Controls output sorting. Default is to sort list by it's address/offset * d - Default * a - Offset * s - Shortest (NC) * l - Longest (NC) * t - Total (NC) * c - Counter * u - Stack usage (PPC) * e - Stack usage (68k) -h --help Display available options. -v --verbose Be a more verbose when parsing both elf and prf file. Not realy usefull for normal user. Can be helpfull if you encounter a bug in parser tool. -V --version Displays program version. FAQ: ===== Q: What C/NC means in prf-analyser output ? A: It meand Child/NoChild. See Introduction. Q: Is there a way to skip profiling for some functions ? A: This can be required for example to avoid crashes in case of some system calls like interrupts, hooks, etc. If you have a function that nukes libprofile.a please try to prepare a small example code to reproduce this and send a e-mail to author. To disable profiling for a function you need to declare it with no_instrument_function attribute. For example this can look like that: ULONG foo(VOID) __attribute__ ((no_instrument_function)); There is macro in libprofile.h if you prefer to use something looking nicer. History: ========= 0.1 - First released version 0.2 - Avoid to break Forbid() 0.9 - Changed name to libprofile.a Restructured prf file format to handle Child/NoChild profiling Added StackShow feature Added Simple/Complex profiling method Now it require 68k Reg Store/Restore only when generation .prf file 0.10 - A couple of small fixes in parser Tiny fix in library 0.11 - Makes possible for parser to display correctly function names longer than 32 characters - Added extern "C" in to includes - Added correct C++ function names demangling - Implemented the call graph function - .prf file format changed to version 6. Nice one marcik :) - Added small utility to reorganize call graph send by marcik 0.12 - Fixed a potential problem with mutithread apps required RegStore/Restore operations - Optimized parsera bit when demangle option is used - Now parser will print function offsets if called without .prf file 0.13 - Implemented PRFF_LOGCL Small mod in includes 0.14 - Removed potential deadlock Modified prf-analyser output a bit 0.15 - Added 'ordering' option to sort prf-analyser output ToDo: ====== There are still some things that can be implemented to make libprf a real profiler. But i think it's current state is enough and if you want to see something new or have some suggestions just drop a e-mail to me. Bugs: ====== There is still some bugs :( Some of them are already located, some of them are still hidden. If you want to help send a bugreport to authors. - Currently libprofile.a count's child summary time for recursive functions incorrectly. Generaly child profiling usability is limited and I see no point to fix that until someone needs that. As a workaround simply ignore child summary statistics for such functions. Author: ========= Marcin 'Morgoth' Kurek (morgoth6@box43.pl) ThX: ===== Harry "Piru" Sintonen and Sigbjørn 'CISC' Skjêret for their great help MichaÅ‚ "Kiero" Woźniak as usual for help with readme :) Krzysztof 'Tarnak' Kretus (AmiChris@fucio.pl) - for previous prf parser utility Marcin "marcik" Kwiatkowski - For some nice suggestions and ideas