Author Topic: Random crash in (inline / vfpu) asm code  (Read 2316 times)
Noware
« on: April 06, 2009, 10:09:29 AM »

Hi,

I'm trying to port a C function to asm, but sometimes it crashes after I recompile,
so my guess is that I'm missing some syncing of registers, inline asm, etc.

Code:
//
// r = ( t * t * (3.0f - 2.0f * t) )
//
static inline float SmoothCurve(float T)
{
    float result;
    __asm__ volatile (
        "mtv        %[t], S000\n"           // S000 = t
        "viim.s     S001, 2\n"              // S001 = 2.0f
        "vmul.s     S001, S001, S000\n"     // S001 = 2.0f * t
        "viim.s     S002, 3\n"              // S002 = 3.0f
        "vsub.s     S002, S002, S001\n"     // S002 = 3.0f - 2.0f * t
        "vmul.s     S000, S000, S000\n"     // S000 = t * t
        "vmul.s     S000, S000, S002\n"     // S000 = t * t * (3.0f - 2.0f * t)
        "mfv        %[result], S000\n"
        : [result]"=r"(result)
        : [t]"r"(T)
    );
    return result;
}

Thx in advance,
 Noware

[EDIT]
Note: the code is slower than the C implementation. It's more or less practice for getting the calling function (Perlin noise) into asm. I already have dot3 and lerp working in (vfpu) asm without any problems.
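
For comparison, here is a minimal plain-C sketch of the formula in the comment above, r = t * t * (3 - 2 * t). The names smooth_curve_ref and smooth_curve_ref_selfcheck are hypothetical and not from the original post. The check values are chosen because they are exactly representable in float, so they can be compared with ==.

```c
#include <assert.h>

/* Plain-C reference for the curve: r = t * t * (3 - 2 * t).
 * Hypothetical helper, intended only as a baseline to compare
 * an asm version against. */
static inline float smooth_curve_ref(float t)
{
    return t * t * (3.0f - 2.0f * t);
}

/* Spot checks using inputs whose results are exact in float. */
static inline void smooth_curve_ref_selfcheck(void)
{
    assert(smooth_curve_ref(0.0f)  == 0.0f);
    assert(smooth_curve_ref(0.25f) == 0.15625f);
    assert(smooth_curve_ref(0.5f)  == 0.5f);
    assert(smooth_curve_ref(1.0f)  == 1.0f);
}
```

Calling smooth_curve_ref_selfcheck() once at startup, and feeding the same inputs to the asm version, is one way to see whether the asm at least returns the expected values before looking for other causes of a crash.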
« Last Edit: April 06, 2009, 12:46:18 PM by Noware » Logged

Reporter - What do you think of western civilization?
Gandhi - I think it would be a good idea!


Flatmush
« Reply #1 on: April 08, 2009, 09:41:33 AM »

I don't know why it would be crashing; my knowledge of inline asm is somewhat lacking, but I can see some obvious optimizations.

Firstly, loading the immediate 2 and then multiplying is a waste of time. To get 2.0f * t, just add t to itself.

And although I don't know exactly how the vfpu works (i.e. how deeply pipelined it is, etc.), I would re-order the instructions like so:

Code:
static inline float SmoothCurve(float T)
{
    float result;
    __asm__ volatile (
        "mtv        %[t], S000\n"           // S000 = t
        "vadd.s     S001, S000, S000\n"     // S001 = 2.0f * t
        "viim.s     S002, 3\n"              // S002 = 3.0f
        "vmul.s     S000, S000, S000\n"     // S000 = t * t
        "vsub.s     S002, S002, S001\n"     // S002 = 3.0f - (2.0f * t)
        "vmul.s     S000, S000, S002\n"     // S000 = t * t * (3.0f - (2.0f * t))
        "mfv        %[result], S000\n"
        : [result]"=r"(result)
        : [t]"r"(T)
    );
    return result;
}
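
The reordered sequence above can be mirrored step by step in plain C to confirm that replacing 2.0f * t with t + t does not change the result. This is only a sketch: smooth_curve_reordered, smooth_curve_direct and count_mismatches are hypothetical names, and the C compiler is free to schedule the operations differently from the VFPU.

```c
#include <assert.h>

/* Same steps as the reordered asm, one C statement per instruction. */
static inline float smooth_curve_reordered(float t)
{
    float two_t = t + t;         /* vadd.s: 2.0f * t via addition */
    float tt    = t * t;         /* vmul.s: t * t                 */
    float k     = 3.0f - two_t;  /* viim.s + vsub.s: 3.0f - 2t    */
    return tt * k;               /* vmul.s: final product         */
}

/* Direct form of the formula, used as the baseline. */
static inline float smooth_curve_direct(float t)
{
    return t * t * (3.0f - 2.0f * t);
}

/* Sweeps t over [0, 1] and counts results that differ by more than
 * a small tolerance. Returns 0 when both forms agree. */
static inline int count_mismatches(int steps)
{
    assert(steps > 0);
    int bad = 0;
    for (int i = 0; i <= steps; ++i) {
        float t = (float)i / (float)steps;
        float diff = smooth_curve_reordered(t) - smooth_curve_direct(t);
        if (diff < 0.0f)
            diff = -diff;
        if (diff > 1e-6f)
            ++bad;
    }
    return bad;
}
```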

The inline asm looks very similar to the inline asm I used reliably in funclib, are you certain that it's the asm which crashes?

Edit: Oh and if you are applying this to a large data set then you could use the .q vector instructions.
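
As a rough illustration of the large-data-set idea, here is a plain-C model that handles four values per step, which is the granularity a .q (quad) instruction works at. It is a scalar stand-in, not VFPU code, and smooth_curve_batch is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Applies t * t * (3 - 2t) to every element of in[], writing to out[].
 * The main loop walks the data in groups of four, the way a quad
 * register would be filled; the tail loop handles the remainder. */
static inline void smooth_curve_batch(const float *in, float *out, size_t count)
{
    assert(in != NULL && out != NULL);
    size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        for (size_t lane = 0; lane < 4; ++lane) {
            float t = in[i + lane];
            out[i + lane] = t * t * (3.0f - (t + t));
        }
    }
    for (; i < count; ++i) {
        float t = in[i];
        out[i] = t * t * (3.0f - (t + t));
    }
}
```

On real hardware the inner four-lane loop is the part a vector instruction sequence would replace; the tail loop would stay scalar.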
« Last Edit: April 08, 2009, 09:44:50 AM by Flatmush » Logged

Firmware History: 2.60 -> 2.71 -> 1.50 -> 3.03oe-c


Hehe I'm a "Hero Member" because I bought posts back when they were in the shop.

Creator of FlatEditPSP, funcLib and flAstro
Noware
« Reply #2 on: April 08, 2009, 10:01:20 AM »

Hi Flatmush,

Ah, you are right about the 2*, it's a waste of cycles, thx ;)

Maybe I should use lv.s and sv.s instead of mtv and mfv.

Do you have a link to your funclib source code?

Quote
Edit: Oh and if you are applying this to a large data set then you could use the .q vector instructions.
Yes, I know, and for some instructions you can use .t (triple) or, as far as I can see, set the ordering like c000[0, y, x, 1], etc., but that is currently a little bit too complicated for me!

Noware

[EDIT]
Yes, I guess this makes sense:
        "vmul.s     S000, S000, S000\n"     // S000 = t * t
        "vsub.s     S002, S002, S001\n"     // S002 = 3.0f - (2.0f * t)

or even move the viim.s after the first mul
        "vmul.s     S000, S000, S000\n"     // S000 = t * t
        "viim.s     S002, 3\n"              // S002 = 3.0f
        "vsub.s     S002, S002, S001\n"     // S002 = 3.0f - (2.0f * t)
« Last Edit: April 08, 2009, 10:10:20 AM by Noware » Logged

Flatmush
« Reply #3 on: April 08, 2009, 10:32:42 AM »

Yeah, putting the immediate after the mul would probably be better.

Attached is the source file with the vfpu stuff in it. It's stuff I did a long time ago, though, and probably even less advanced than what you're doing.

Using mtv is probably better than using lv, though, since I'm assuming register transfers are on average faster than memory transfers.

It may be even more optimal and accurate to do this function in fixed point on the MIPS itself, depending on the range your data lies within.
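
A minimal sketch of what the fixed-point variant could look like, assuming a Q16.16 format and an input already clamped to [0, 1]. The fix16 type, FIX16_ONE, fix16_mul and smooth_curve_fix16 are hypothetical names; the post does not say which format or range was meant.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fix16;                  /* hypothetical Q16.16 value */
#define FIX16_ONE ((fix16)1 << 16)      /* 1.0 in Q16.16             */

/* Multiplies two Q16.16 values, using a 64-bit intermediate so the
 * product cannot overflow before the shift back down. */
static inline fix16 fix16_mul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * (int64_t)b) >> 16);
}

/* r = t * t * (3 - 2t) in Q16.16, for t in [0, 1]. */
static inline fix16 smooth_curve_fix16(fix16 t)
{
    assert(t >= 0 && t <= FIX16_ONE);
    fix16 tt = fix16_mul(t, t);              /* t * t          */
    fix16 k  = 3 * FIX16_ONE - (t + t);      /* 3 - 2t         */
    return fix16_mul(tt, k);                 /* t * t * (3-2t) */
}
```

For example, smooth_curve_fix16(FIX16_ONE / 2) returns FIX16_ONE / 2, matching the float result of 0.5 for t = 0.5. Whether this is more accurate than float depends on the input range, as the post says.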
Logged

Noware
« Reply #4 on: April 08, 2009, 11:12:16 AM »

Hi Flatmush,

Thanks for the source. Although I have already seen most of it in libpspmath, together with some other examples it's a good reference for MIPS/VFPU asm.

Quote
It may be even more optimal and accurate to do this function in fixed point on the MIPS itself, depending on the range your data lies within.
Yes, I did this before (~8 years ago) in my Java engine, but personally I prefer using float over writing a fixed-point math library again.

[EDIT]
Quote
The inline asm looks very similar to the inline asm I used reliably in funclib, are you certain that it's the asm which crashes?
Well, not 100%, but if I use my C/C++ implementation it doesn't crash. If you know the Perlin noise code, you'll see it uses three defines:
#define s_curve(t) ( t * t * (3. - 2. * t) )
#define lerp(t, a, b) ( a + t * (b - a) )
#define at3(rx,ry,rz) ( rx * q[0] + ry * q[1] + rz * q[2] )

so I added/modified the code a little bit, something like

#ifndef _ASM_PERLIN
  #define s_curve(t) ( t * t * (3. - 2. * t) )
  #define lerp(t, a, b) ( a + t * (b - a) )
  #define at3(rx,ry,rz) ( rx * q[0] + ry * q[1] + rz * q[2] )
#else
  #define s_curve(t) SmoothCurve(t)
  #define lerp(t, a, b) LinearInterpolate(t,a,b)
  #define at3(q,rx,ry,rz) Dot3(q,rx,ry,rz)
#endif
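
One way to narrow down which replacement misbehaves is to keep plain-C functions with the same signatures as the asm ones and swap them in one at a time. The sketch below is an assumption about those signatures, based only on the macro names above: SmoothCurveRef, LinearInterpolateRef and Dot3Ref are hypothetical, and q is assumed to point at three floats. Note that, as posted, at3 takes three arguments in the first branch and four in the second, so call sites would have to match whichever branch is active.

```c
#include <assert.h>

/* Plain-C stand-ins with the same argument order as the asm versions.
 * Unlike the macros, each argument is evaluated exactly once. */
static inline float SmoothCurveRef(float t)
{
    return t * t * (3.0f - 2.0f * t);
}

static inline float LinearInterpolateRef(float t, float a, float b)
{
    return a + t * (b - a);
}

/* q is assumed to point at a 3-element gradient vector. */
static inline float Dot3Ref(const float *q, float rx, float ry, float rz)
{
    assert(q != 0);
    return rx * q[0] + ry * q[1] + rz * q[2];
}
```

Defining s_curve as SmoothCurve while leaving lerp and at3 on the reference functions (and then rotating) would show whether the crash follows one specific asm routine.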

[EDIT2]
I turned on my assert/debug handler and added some bounds checks in the code, and now it doesn't crash, so I really think my crash is a syncing/timing issue. (Note: in debug my optimization level is still set to -O3; I only add some internal type checks, bounds checks, asserts, logging, etc.)

[EDIT3]
I saw my sub-threads didn't use PSP_THREAD_ATTR_VFPU, so I fixed it but I still have the crash ;(

Noware
« Last Edit: April 08, 2009, 02:28:35 PM by Noware » Logged
