The other problem turned out to be memcpy. There is actually nothing wrong with memcpy itself (the same code runs plenty fast on the BBB), but I am convinced that the D cache on the Orange Pi (Allwinner H3) is not really enabled or working.
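One quick way to test the "D cache is not enabled" theory is to read the ARMv7-A System Control Register (SCTLR) and look at its enable bits. What follows is only a minimal sketch, not code from Kyu: the helper names (`read_sctlr`, `dcache_enabled`, and so on) are made up for illustration. The bit positions are the architectural ones (M is bit 0, C is bit 2, I is bit 12), and the `mrc` read only works in a privileged mode on a 32-bit ARM build. Keep in mind that a set C bit is necessary but probably not sufficient; as far as I understand the architecture, data accesses are not cached while the MMU is off, so the M bit is worth checking too.

```c
#include <stdint.h>

/* ARMv7-A SCTLR enable bits (architectural bit positions). */
#define SCTLR_M (1u << 0)   /* MMU enable */
#define SCTLR_C (1u << 2)   /* data / unified cache enable */
#define SCTLR_I (1u << 12)  /* instruction cache enable */

/* Hypothetical helpers: decode an SCTLR value that was read elsewhere. */
static inline int mmu_enabled(uint32_t sctlr)
{
    return (sctlr & SCTLR_M) != 0;
}

static inline int dcache_enabled(uint32_t sctlr)
{
    return (sctlr & SCTLR_C) != 0;
}

static inline int icache_enabled(uint32_t sctlr)
{
    return (sctlr & SCTLR_I) != 0;
}

#if defined(__arm__)
/* Read SCTLR via CP15 (c1, c0, 0). Must run in a privileged mode. */
static inline uint32_t read_sctlr(void)
{
    uint32_t val;
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(val));
    return val;
}
#endif
```

On the board, something like `dcache_enabled(read_sctlr())` printed early in startup and again just before the slow memcpy would show whether the bit is set at all, and whether something clears it along the way.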
My attack on this problem is two-pronged. One prong is that I am taking time to read the ARM documents, which is no small chore. The other is to look at NetBSD startup and cache initialization. I have sometimes begun to wonder if my hardware may be broken, but then I remember that NetBSD can run on the Orange Pi, and when it does run, it does a transfer at a proper fast speed.
Kyu / [email protected]