This topic arose from my investigation of SOF interrupts, which you can read about on the next page.
I have a routine "delay_ms ( count )" that you can call to delay for a certain number of milliseconds. It is implemented like so:

    static void delay_one_ms ( void )
    {
        volatile int count = 7273;

        while ( count-- )
            ;
    }

Actually I lied. I call this N times to get a N-millisecond delay, but you get the idea. I typically "tune" it using a stopwatch and trial and error and arrive at a number that gives close to a 1 ms delay. My investigation of USB frames revealed that this yields a delay of about 0.913 milliseconds. So I could grab my calculator and do this:

    N = 7273/0.913 = 7966

Using that value would no doubt be an improvement, but I am curious if we can work on this from the other direction and maybe learn something via the process.
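The calculator step above can be written as a small helper. This is only a sketch of the arithmetic: the name rescale_count and its parameters are made up for illustration and are not part of the project.

```c
/*
 * Scale a busy-loop count so that a delay which measured `measured_ms`
 * comes out at `target_ms` instead. Hypothetical helper, not from the
 * original source. Rounds to the nearest integer.
 */
static int rescale_count ( int old_count, double measured_ms, double target_ms )
{
    double scaled = old_count * target_ms / measured_ms;

    return (int) ( scaled + 0.5 );
}
```

Calling rescale_count ( 7273, 0.913, 1.0 ) reproduces the 7966 figure.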
My make command always (or often) yields a file xyz.dump for project "xyz". In this case I have papoon.dump, which contains disassembled code for the entire binary that I flash into the chip.

    08000158:
     8000158:  b168        cbz     r0, 8000176
     800015a:  1e41        subs    r1, r0, #1
     800015c:  f641 4069   movw    r0, #7273       @ 0x1c69
     8000160:  b082        sub     sp, #8
     8000162:  9001        str     r0, [sp, #4]
     8000164:  9b01        ldr     r3, [sp, #4]
     8000166:  1e5a        subs    r2, r3, #1
     8000168:  9201        str     r2, [sp, #4]
     800016a:  2b00        cmp     r3, #0
     800016c:  d1fa        bne.n   8000164
     800016e:  3901        subs    r1, #1
     8000170:  d2f7        bcs.n   8000162
     8000172:  b002        add     sp, #8
     8000174:  4770        bx      lr
     8000176:  4770        bx      lr

I search through papoon.dump and find the above. The optimizer has forced the "delay_one_ms" function inline and discarded the name. Note the useless second "bx lr" instruction that has resulted. We even see our magic value "7273". We have an outer loop (that doesn't particularly interest us right now) and an inner loop. The inner loop is:

     8000164:  9b01        ldr     r3, [sp, #4]
     8000166:  1e5a        subs    r2, r3, #1
     8000168:  9201        str     r2, [sp, #4]
     800016a:  2b00        cmp     r3, #0
     800016c:  d1fa        bne.n   8000164

So, we have 5 ARM instructions. Also the processor is running at 72 MHz. If this were truly a RISC processor (and it is!) each instruction would take a single clock cycle. But other issues may get involved. The branch instruction could flush the pipeline, and I know we have specified 2 wait states for accesses to flash memory.
Now consider our 72 MHz processor clock. In 1 ms we will have 72,000 cycles. We also run the above code 7966 times per millisecond. So calculate 72,000/7966 = 9.038. Let's call that 9 processor cycles to run the above 5 instructions. I can believe that, but am unsure how to partition those 9 cycles among the 5 instructions. Certainly each instruction uses at least 1 cycle. That leaves 4 left over. A true ARM guru would know the answer. Here is a wild guess -- the two instructions that access the stack use 3 cycles each and all the other instructions use 1 cycle. That would total to 9. But I lose interest at this point, although several interesting questions do arise.
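For anyone who wants to replay that arithmetic, here is a minimal sketch. The function names are hypothetical, and the 3-cycle figure for stack accesses is just the wild guess from the paragraph above, not a number taken from any ARM documentation.

```c
/* Processor cycles available in one millisecond at a given clock rate. */
static long cycles_per_ms ( long clock_hz )
{
    return clock_hz / 1000;
}

/* Average cycles per loop pass, given how many passes fit in 1 ms. */
static double cycles_per_iteration ( long clock_hz, double iterations_per_ms )
{
    return cycles_per_ms ( clock_hz ) / iterations_per_ms;
}

/*
 * The wild guess: the ldr and str (stack accesses) cost 3 cycles each,
 * and the two subs/cmp instructions plus the branch cost 1 cycle each.
 * These per-instruction costs are an assumption, not measured values.
 */
static int guessed_cycles ( void )
{
    int stack_accesses = 2;
    int other_instructions = 3;

    return stack_accesses * 3 + other_instructions * 1;
}
```

With a 72000000 Hz clock and 7966 passes per millisecond, cycles_per_iteration comes out just over 9, which matches the guessed total.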
If our 9 cycles per loop iteration is correct and our goal is to burn up 72,000 cycles to delay a millisecond, we can calculate the number of loop iterations simply enough. N = 72,000/9 = 8000. Let's put that value into our delay function and move on.
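One way to avoid a bare magic number is to let the compiler derive the loop count from the clock rate. This is a sketch of that idea, not the code in the project; the macro names CPU_HZ, CYCLES_PER_LOOP and LOOPS_PER_MS are invented here, and the 9-cycle figure is the estimate from above.

```c
#define CPU_HZ           72000000L  /* processor clock, 72 MHz */
#define CYCLES_PER_LOOP  9          /* estimated cost of one loop pass */

/* Loop passes needed to burn one millisecond: 72000 / 9 = 8000. */
#define LOOPS_PER_MS     ( CPU_HZ / 1000 / CYCLES_PER_LOOP )

/* Same shape as the original delay_one_ms, with the derived count. */
static void delay_one_ms ( void )
{
    volatile int count = LOOPS_PER_MS;

    while ( count-- )
        ;
}
```

If the clock rate or the cycle estimate changes, only the two constants at the top need editing.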
When I do this, I get a delay of 1.005 milliseconds. This is good enough for me and certainly better than 0.913. Why aren't we spot on? There are two possibilities.
One is that our 72 MHz clock is derived from an 8 MHz crystal of unknown precision. The other is that there is some overhead with setting up the loop, calling the delay_ms function, and in the case of multiple millisecond delays, the loop wrapped around the "delay_one_ms()" function. This leads to the following idea. Why not have a single loop and calculate N*8000 as the delay amount, i.e. the following.

    void delay_ms ( int ms )
    {
        volatile int count = ms * 8000;

        while ( count-- )
            ;
    }

I do this and still measure a 1.005 millisecond delay. I'll blame the crystal. If I was endlessly curious and had time to burn, I might try running this on different boards, but I'm not that curious.
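One thing to watch with the single-loop version is that ms * 8000 has to fit in an int. With a 32-bit int that limit is reached at about 268,435 ms (roughly four and a half minutes). Here is a sketch of a guarded variant; the helper delay_count and the macros are hypothetical names, and clamping is just one possible policy.

```c
#include <limits.h>

#define LOOPS_PER_MS  8000

/* Largest ms value whose product with LOOPS_PER_MS still fits in an int. */
#define MAX_DELAY_MS  ( INT_MAX / LOOPS_PER_MS )

/* Compute the loop count, clamping so the multiply cannot overflow. */
static int delay_count ( int ms )
{
    if ( ms <= 0 )
        return 0;
    if ( ms > MAX_DELAY_MS )
        ms = MAX_DELAY_MS;

    return ms * LOOPS_PER_MS;
}

void delay_ms ( int ms )
{
    volatile int count = delay_count ( ms );

    while ( count-- )
        ;
}
```

Zero or negative arguments return at once, and oversized ones are silently shortened to the longest delay that fits.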
Tom's Computer Info / [email protected]