Topic: Some VFPU clocktick analysis
Raphael
« on: August 10, 2006, 01:43:06 PM »

Hi, I just wrote a small bench program that approximates the ticks a given operation takes, and ran all the VFPU ops listed at http://hitmen.c02.at/files/yapspd/psp_doc/chap4.html#sec4.9
through it. Since I found the results rather informative, I decided to share them here. Maybe some discussion will come from this, so if you have questions about the results, ask, and if you have other information, share it.
Everything was benched with the PSP at its default speed, i.e. 222 MHz, so the ops/µs will increase by 50% (333/222 = 1.5) when the PSP is set to 333 MHz. Tick counts won't change though (tested), so they are reliable. The results also include any latencies incurred, so interleaving costly ops with other, independent ops might decrease the real tick cost somewhat.
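To make the relation between the two table columns explicit, here is a minimal sketch in C. It only assumes that 1 MHz equals one tick per microsecond; the function names are made up for illustration and are not taken from the bench program.

```c
/* Throughput in ops per microsecond for an op that costs
   `ticks` cycles at `clock_mhz` (1 MHz = 1 tick per microsecond). */
static double ops_per_us(double clock_mhz, double ticks)
{
    return clock_mhz / ticks;
}

/* Inverse direction: estimate ticks per op from a measured
   throughput, rounded to the nearest whole tick. */
static long ticks_per_op(double clock_mhz, double measured_ops_per_us)
{
    return (long)(clock_mhz / measured_ops_per_us + 0.5);
}
```

For example, ticks_per_op(222.0, 56.0) gives 4, and ops_per_us(333.0, 4.0) is 1.5 times ops_per_us(222.0, 4.0), while the tick count itself stays the same.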

UPDATE: I added ulv.q and usv.q ops for unaligned loads/stores
Code:

OP             ops/µs   ticks/op
vadd.q         ~220     1
vsub.q         ~220     1
vdot.q         ~220     1
vmul.q         ~220     1
vhdp.q         ~220     1
vdiv.q         ~4       56
vmmul.q        ~14      16
vmin.q         ~220     1
vmax.q         ~220     1
vabs.q         ~220     1
vneg.q         ~220     1
vidt.q         ~77      3
vzero.q        ~77      3
vone.q         ~77      3
vrcp.q         ~56      4
vrsq.q         ~56      4
vsin.q         ~56      4
vcos.q         ~56      4
vexp2.q        ~56      4
vlog2.q        ~56      4
vsqrt.q        ~56      4
vasin.q        ~56      4
vnrcp.q        ~56      4
vnsin.q        ~56      4
vrexp2.q       ~56      4
vi2uc.q        ~220     1
vi2s.q         ~220     1
vsgn.q         ~220     1
vcst.q         ~220     1
vf2in.q        ~220     1
vi2f.q         ~220     1
vhtfm4.q       ~56      4
vtfm4.q        ~56      4
vmidt.q        ~19      12
vmzero.q       ~19      12
lv.q(cache)    ~219     1
lv.q(mem)      ~4       68
ulv.q(cache)   ~109     2
ulv.q(mem)     ~4       68
sv.q(cache)    ~32      7
sv.q(mem)      ~2       111
usv.q(cache)   ~16      14
usv.q(mem)     ~2       111


Well, what I can say after this is that vector division is the most costly op apart from memory reads/writes, so avoid it whenever possible. Loads and stores that hit the cache are also far cheaper than those that go to main memory (1 tick versus 68 for lv.q, 7 versus 111 for sv.q), so watch your data structures and access patterns.
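As a rough illustration of why avoiding vdiv.q pays off, the sketch below adds up tick costs from the table for two ways of dividing n quads by the same divisor: n vdiv.q ops, or one vrcp.q followed by n vmul.q ops. This is only an estimate under assumptions: it treats the table costs as simply additive, and it assumes the precision of vrcp.q is good enough for the use case, which I have not verified. The function names are hypothetical.

```c
/* Tick costs copied from the table above (quad ops). */
enum {
    TICKS_VDIV_Q = 56,
    TICKS_VRCP_Q = 4,
    TICKS_VMUL_Q = 1
};

/* Estimated ticks for n quad divisions done with vdiv.q. */
static unsigned cost_div(unsigned n)
{
    return n * TICKS_VDIV_Q;
}

/* Estimated ticks for n quads divided by one shared divisor:
   a single vrcp.q, then one vmul.q per quad.
   Assumption: costs add linearly, no overlap between ops. */
static unsigned cost_rcp_mul(unsigned n)
{
    return TICKS_VRCP_Q + n * TICKS_VMUL_Q;
}
```

Even for a single division the estimate is 5 ticks instead of 56, and for 100 quads sharing one divisor it is 104 ticks instead of 5600.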

If I find time, I may also bench the triple, pair and single ops for comparison. A comparison with the MIPS counterpart ops would probably be useful too (especially for vdiv and vmmul, where it's not clear whether the VFPU is really faster).

NOTE: If I missed something important, please let me know. I'm basing these results on my current knowledge of op tick costs and latencies, which might not be 100% correct, so no warranty on these results either :P

Don't push the river, it flows.
http://wordpress.fx-world.org - my devblog
http://wiki.fx-world.org - VFPU documentation wiki
http://www.homebrew-illuminati.co.uk - serious homebrew development for all platforms
Alexander Berl
"A good mod is a combination playground monitor, priest, big brother/sister, psychiatrist, professor and more."

