SWITCH_RATIO for Arm(R) Neoverse(TM) architecture

This seems like a good balance of values for reasonably sized matrices. With `SWITCH_RATIO=16`, DGEMM scales better to bigger sizes, but the better solution there would be some kind of
thread throttling, so I've gone with `SWITCH_RATIO=8` for double precision (`DOUBLE`/`XDOUBLE`) and kept `SWITCH_RATIO=16` otherwise.
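
To illustrate why the value matters, here is a minimal sketch of how a ratio like this can cap the number of threads used along the M dimension. It assumes `SWITCH_RATIO` acts as a minimum number of rows per thread; `threads_along_m` and its parameters are hypothetical names for illustration and are not OpenBLAS's actual partitioning code.

```c
#include <stddef.h>

/* Hypothetical helper, not OpenBLAS's actual partitioning code.
 * Assumption: switch_ratio is the minimum number of rows each thread
 * should receive before splitting M across more threads is worthwhile. */
static int threads_along_m(size_t m, int max_threads, int switch_ratio)
{
    int nthreads = max_threads;

    /* Fall back to a single thread on nonsensical inputs. */
    if (nthreads < 1 || switch_ratio < 1)
        return 1;

    /* Halve the thread count until every thread gets at least
     * switch_ratio rows, or only one thread is left. */
    while (nthreads > 1 && m < (size_t)nthreads * (size_t)switch_ratio)
        nthreads /= 2;

    return nthreads;
}
```

Under these assumptions, with 64 available threads and `m = 256`, a ratio of 8 keeps 32 threads busy while a ratio of 16 drops to 16, so the smaller ratio spreads mid-sized problems across more threads.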
Chris Sidebottom 2022-12-05 15:17:52 +00:00
parent a5e1fdd525
commit 5b165420b5
1 changed file with 18 additions and 2 deletions

param.h

@@ -1,5 +1,5 @@
 /*****************************************************************************
-Copyright (c) 2011-2014, The OpenBLAS Project
+Copyright (c) 2011-2023, The OpenBLAS Project
 All rights reserved.
 Redistribution and use in source and binary forms, with or without
@@ -3338,6 +3338,12 @@ is a big desktop or server with abundant cache rather than a phone or embedded d
 #elif defined(NEOVERSEN1)
+#if defined(XDOUBLE) || defined(DOUBLE)
+#define SWITCH_RATIO 8
+#else
+#define SWITCH_RATIO 16
+#endif
 #define SGEMM_DEFAULT_UNROLL_M 16
 #define SGEMM_DEFAULT_UNROLL_N 4
@@ -3367,7 +3373,11 @@ is a big desktop or server with abundant cache rather than a phone or embedded d
 #elif defined(NEOVERSEV1)
-#define SWITCH_RATIO 16
+#if defined(XDOUBLE) || defined(DOUBLE)
+#define SWITCH_RATIO 8
+#else
+#define SWITCH_RATIO 16
+#endif
 #define SGEMM_DEFAULT_UNROLL_M 16
 #define SGEMM_DEFAULT_UNROLL_N 4
@@ -3398,6 +3408,12 @@ is a big desktop or server with abundant cache rather than a phone or embedded d
 #elif defined(NEOVERSEN2)
+#if defined(XDOUBLE) || defined(DOUBLE)
+#define SWITCH_RATIO 8
+#else
+#define SWITCH_RATIO 16
+#endif
 #undef SBGEMM_ALIGN_K
 #define SBGEMM_ALIGN_K 4