The Icon Bar: Programming: Clearing the screen
 
  Clearing the screen
  sinns (19:23 1/3/2009)
  Phlamethrower (20:09 1/3/2009)
    adrianl (17:37 6/3/2009)
  tribbles (20:53 1/3/2009)
    sinns (22:27 1/3/2009)
      Phlamethrower (23:35 1/3/2009)
      tribbles (00:21 2/3/2009)
        sinns (05:38 2/3/2009)
          tribbles (20:24 6/3/2009)
 
Simon Inns Message #109439, posted by sinns at 19:23, 1/3/2009
Member
Posts: 73
Hi

I came up with the following ARM assembler to clear the screen as fast as possible. The question is: is there any way to do it even quicker?

LDR R11, screenStart ; put the screen start address in R11
LDR R10, screenLength ; put the screen length in R10
ADD R10, R10, R11 ; R10 now has the end address stored in it

; Clear the registers with the value to be written to the screen
MOV R0, #0
MOV R1, #0
MOV R2, #0
MOV R3, #0
MOV R4, #0
MOV R5, #0
MOV R6, #0
MOV R7, #0
MOV R8, #0
MOV R9, #0

.clsLoop
]
FOR loop% = 0 TO 24
[ OPT pass%
STMDB R10!, {R0-R9}
]
NEXT loop%
[ OPT pass%
STMDB R10!, {R0-R5}
CMP R10, R11
BNE clsLoop

Any suggestions would be greatly appreciated!

/Simon
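
As a cross-check on the loop above, here is a minimal C sketch of its arithmetic. It is a portable model, not the assembler itself, and every name in it is hypothetical. Each pass of clsLoop stores 25 x 10 + 6 = 256 words, i.e. 1024 bytes, and the loop exits only when R10 lands exactly on R11, so the sketch assumes screenLength is a non-zero multiple of 1024. The 640x480 size mentioned later in the thread would satisfy that at one byte per pixel (the colour depth is an assumption).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names; the figures come from the assembler above. */
enum {
    FULL_STORES    = 25,  /* FOR loop% = 0 TO 24 */
    REGS_FULL      = 10,  /* STMDB R10!, {R0-R9} */
    REGS_TAIL      = 6,   /* STMDB R10!, {R0-R5} */
    BYTES_PER_PASS = (FULL_STORES * REGS_FULL + REGS_TAIL) * 4
};

/* Number of passes round clsLoop, or -1 if R10 would never equal R11
   (length zero or not a multiple of BYTES_PER_PASS), in which case the
   BNE test would not stop the loop at the start of the screen. */
long cls_passes(size_t screen_length)
{
    if (screen_length == 0 || screen_length % BYTES_PER_PASS != 0)
        return -1;
    return (long)(screen_length / BYTES_PER_PASS);
}

/* Portable stand-in for the loop: clear downwards from the end address,
   one pass at a time, stopping when the pointer reaches the start. */
void cls_descending(uint8_t *screen_start, size_t screen_length)
{
    uint8_t *p = screen_start + screen_length;   /* ADD R10, R10, R11 */

    assert(cls_passes(screen_length) > 0);
    do {
        p -= BYTES_PER_PASS;                     /* the 26 STMDBs */
        memset(p, 0, BYTES_PER_PASS);
    } while (p != screen_start);                 /* CMP R10, R11 : BNE */
}
```

Calling cls_passes(640 * 480) gives 300 passes under the one-byte-per-pixel assumption; a length such as 1000 is rejected because the descending pointer would step past the start address without ever matching it.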
 
Jeffrey Lee Message #109441, posted by Phlamethrower at 20:09, 1/3/2009, in reply to message #109439
Phlamethrower
Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot stuff

Posts: 15100
There's no real way of making that loop any faster for modern hardware. You might be able to save a CPU cycle here and there, but with 25 consecutive STMs with 10 registers each you'll always be limited by bus speed.

The only way to clear the screen faster would be to not clear the entire screen (e.g. skip the bits you know will end up being the same as the previous frame, or skip the bits you know will be entirely overwritten), or to rely upon hardware specific features to perform the clear for you (e.g. the application accelerator in the Iyonix CPU).
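
The "don't clear the entire screen" idea can be sketched in C as a dirty-rectangle clear. This is only an illustration under stated assumptions: one byte per pixel, a linear frame buffer, and a caller that tracks the bounding box of what it drew last frame. The dirty_rect type and clear_dirty function are hypothetical names, not part of any RISC OS API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of the part of the frame that was drawn on. */
typedef struct {
    size_t x, y;          /* top-left corner, in pixels */
    size_t width, height; /* size in pixels; zero means nothing to clear */
} dirty_rect;

/* Clear only the dirty rectangle. Assumes one byte per pixel and that
   'stride' is the number of bytes from one screen row to the next. */
size_t clear_dirty(uint8_t *screen, size_t stride, size_t rows,
                   const dirty_rect *r)
{
    size_t cleared = 0;

    assert(r->x + r->width <= stride && r->y + r->height <= rows);
    for (size_t row = r->y; row < r->y + r->height; row++) {
        memset(screen + row * stride + r->x, 0, r->width);
        cleared += r->width;
    }
    return cleared;  /* bytes written, for comparing with a full clear */
}
```

The return value makes the saving easy to see: a wireframe object covering a quarter of the screen costs roughly a quarter of the bus traffic of a full clear.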
 
Jason Tribbeck Message #109442, posted by tribbles at 20:53, 1/3/2009, in reply to message #109439
tribbles
Captain Helix

Posts: 929
> Hi
>
> I came up with the following ARM assembler to clear the screen as fast as possible... Question is: is there any way to do it even quicker?
There are two basic things I can think of that you can do.

You're using 10 registers; however, you can also use R12 and R14 (push them onto the stack first and restore them afterwards).

Secondly, from what I can recall, it's better to use a multiple of 4 registers, so adding R12 and R14 would give you 12 registers (i.e. a multiple of 4).

In addition to these, you're effectively performing redundant maths.

I'd do something like this (I'm not writing out everything for you!)

LDR R11, screenStart
LDR R10, screenLength

SUB R10, R10, #48 * 32

MOV Rx, #0 ; repeat for each of R0-R9, R12 and R14

.clsLoop
FOR loop% = 0 TO 31
STMIA R11!, {R0-R9, R12, R14} ; 48 bytes
NEXT
SUBS R10, R10, #48 * 32
BGE clsLoop

; Comment here - see later

ADDS R10, R10, #48 * 32
SUB R10, R10, #48 ; Can't be done in 1 math op
.clsLoop2
STMIA R11!, {R0 - R9, R12, R14} ; 48 bytes
SUBS R10, R10, #48
BGE clsLoop2
TST R10, #32
STMNEIA R11!, {R0 - R7} ; 32 bytes
TST R10, #16
STMNEIA R11!, {R0 - R3} ; 16 bytes
TST R10, #8
STMNEIA R11!, {R0 - R1} ; 8 bytes
TST R10, #4
STMNEIA R11!, {R0}

If you know the size of the screen's going to be a constant, the bit after the "Comment here..." can be precalculated.

Note that this isn't tested, but I have written code like this before. Quite a few times.

But a while ago.
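
The size breakdown in the code above (1536-byte unrolled blocks, then 48-byte stores, then conditional 32/16/8/4-byte tails) can be modelled in C as below. This is a sketch with hypothetical names, and it deliberately differs from the posted assembler in one respect: it tests before each store rather than after. As far as I can tell, the posted loops store first and test afterwards, so they assume the length is at least 1536 bytes and leaves a remainder of at least 48 bytes. The sketch also folds in the counter correction mentioned later in the thread by keeping the remaining count non-negative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    CHUNK = 48,          /* one STMIA of 12 registers */
    BLOCK = CHUNK * 32   /* 32 unrolled STMIAs per pass of clsLoop */
};

/* Clear 'length' bytes (a multiple of 4) and return the bytes written. */
size_t clear_in_chunks(uint8_t *dst, size_t length)
{
    uint8_t *p = dst;           /* plays the role of R11 */
    size_t remaining = length;  /* plays the role of R10 */

    assert(length % 4 == 0);

    while (remaining >= BLOCK) {        /* clsLoop */
        memset(p, 0, BLOCK);
        p += BLOCK;
        remaining -= BLOCK;
    }
    while (remaining >= CHUNK) {        /* clsLoop2 */
        memset(p, 0, CHUNK);
        p += CHUNK;
        remaining -= CHUNK;
    }
    /* remaining is now below 48 and a multiple of 4, so its 32, 16, 8
       and 4 bits describe exactly what is left (the TST/STMNEIA pairs). */
    for (size_t bit = 32; bit >= 4; bit >>= 1) {
        if (remaining & bit) {
            memset(p, 0, bit);
            p += bit;
        }
    }
    return (size_t)(p - dst);
}
```

For a fixed 640x480 screen at an assumed one byte per pixel, the length is 307200 bytes, which is exactly 200 blocks of 1536, so only the first loop would do any work.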
 
Simon Inns Message #109444, posted by sinns at 22:27, 1/3/2009, in reply to message #109442
Member
Posts: 73
Thanks for the help :) I'm writing some 3D routines in C and assembler, so any cycles I can save are great. I will try out the suggestions as soon as possible.

The screen size is a constant: 640x480, since I am using a 'modern' monitor (on a RISC PC 600 with StrongARM). I also intend to get the code running on my A3000, which is why I am concerned with efficiency.

/Simon
 
Jeffrey Lee Message #109447, posted by Phlamethrower at 23:35, 1/3/2009, in reply to message #109444
Phlamethrower
Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot stuff

Posts: 15100
> (on a RISC PC 600 with StrongARM)
In that case, you'll certainly be limited by bus speed :) Some fairly unscientific tests I've done in the past have shown that on a RiscPC you've got about 32 MB/s of RAM bandwidth and 22 MB/s of VRAM bandwidth to play with. So the trick to getting the best performance is to make good use of the StrongARM's 16 KB data cache and, if you're writing large amounts of data, especially to somewhere uncached like VRAM, to try to find something the CPU can do in between writes to memory (since the bus can typically only sustain a maximum of 1 byte every 8 CPU cycles).
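
To put those bandwidth figures in terms of a full-screen clear, here is a small C sketch of the arithmetic. It assumes a 640x480 screen at one byte per pixel (the depth is not stated in the thread) and takes 1 MB as 10^6 bytes; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Milliseconds needed to write 'bytes' at a sustained 'mb_per_s'.
   Assumption: 1 MB = 1,000,000 bytes. */
double clear_time_ms(size_t bytes, double mb_per_s)
{
    assert(mb_per_s > 0.0);
    return (double)bytes / (mb_per_s * 1.0e6) * 1000.0;
}

/* Upper bound on full-screen clears per second at that bandwidth,
   ignoring everything else the frame has to do. */
double max_clears_per_second(size_t bytes, double mb_per_s)
{
    return 1000.0 / clear_time_ms(bytes, mb_per_s);
}
```

With these assumptions, clear_time_ms(640 * 480, 22.0) comes to roughly 14 ms for VRAM and clear_time_ms(640 * 480, 32.0) to 9.6 ms for main RAM, so the clear alone would use a large share of each frame no matter how the store loop is arranged.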
 
Jason Tribbeck Message #109448, posted by tribbles at 00:21, 2/3/2009, in reply to message #109444
tribbles
Captain Helix

Posts: 929
Note that I missed an ADD R10,R10,#48 after the end of the 48 byte transfers.

What kind of 3D are you doing? I've got a lot of experience with it.
 
Simon Inns Message #109451, posted by sinns at 05:38, 2/3/2009, in reply to message #109448
Member
Posts: 73
I'm really just experimenting with building wireframe routines at the moment, then on to hidden line removal and polygon fill. I've never really done any 3D, so I am plodding along and learning as I go. My intention was to learn in C/C++ and then rewrite in assembly and/or rewrite to use SDL (so I can port it 'back' to Linux) ;)

I have a whole ton of modern machines I could use instead, but I love the sleek simplicity of the RISC PC which also forces me to think 'can it be done faster' at each stage. Hence my 'CLS' question.
 
Adrian Lees Message #109497, posted by adrianl at 17:37, 6/3/2009, in reply to message #109441
Member
Posts: 1637
> There's no real way of making that loop any faster for modern hardware...
Not unrolling the loop that far would probably be beneficial, simply because such a long instruction sequence is liable to cause more Icache fetches from RAM, and force other code out of the Icache.

> ...or to rely upon hardware specific features to perform the clear for you (e.g. the application accelerator in the Iyonix CPU).
Or, of course, use the OS calls which stand a better chance of being optimised, if you don't know the exact machine/CPU being used. In this specific case, the graphics card itself can obviously do a far better job than the AAU.
 
Jason Tribbeck Message #109499, posted by tribbles at 20:24, 6/3/2009, in reply to message #109451
tribbles
Captain Helix

Posts: 929
> I'm really just experimenting building wireframe routines at the moment, then on to hidden line removal and polygon-fill. Never really done any 3D so I am plodding along and learning as I go. My intention was to learn in C/C++ and then rewrite in assembly and/or rewrite to use SDL (so I can port it 'back' to linux ;) )
>
> I have a whole ton of modern machines I could use instead, but I love the sleek simplicity of the RISC PC which also forces me to think 'can it be done faster' at each stage. Hence my 'CLS' question.
Good ideas - that's pretty much how I started off.

The final thing I did in this area was texture mapping, but I did that in C because assembler was becoming a bit too much like hard work!
 
