Empathy List Archives

gem5-dev@gem5.org

The gem5 Developer List

[L] Change in gem5/gem5[develop]: arch-arm: Add support for Arm SVE fmmla instruction.

Bobby Bruce (Gerrit)

Thu, May 25, 2023 9:36 PM

Bobby Bruce has submitted this change. (
https://gem5-review.googlesource.com/c/public/gem5/+/70726?usp=email )

Change subject: arch-arm: Add support for Arm SVE fmmla instruction.
......................................................................

arch-arm: Add support for Arm SVE fmmla instruction.

Add support for the Arm SVE Floating Point Matrix Multiply-Accumulate
(FMMLA) instruction. Both 32-bit element (single precision) and 64-bit
element (double precision) encodings are implemented, but because the
associated required instructions (LD1RO*, etc) have not yet been
implemented, the SVE Feature ID register 0 (ID_AA64ZFR0_EL1) has only
been updated to indicate 32-bit element support at this time.

For more information please refer to the "ARM Architecture Reference
Manual Supplement - The Scalable Vector Extension (SVE), for ARMv8-A"
(https://developer.arm.com/architectures/cpu-architecture/a-profile/
docs/arm-architecture-reference-manual-supplement-armv8-a)

Additional Contributors: Giacomo Travaglini

Change-Id: If3547378ffa48527fe540767399bcc37a5dab524
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70726
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>

M src/arch/arm/ArmISA.py
M src/arch/arm/ArmSystem.py
M src/arch/arm/insts/sve.hh
A src/arch/arm/insts/vector_element_traits.hh
M src/arch/arm/isa/formats/sve_2nd_level.isa
M src/arch/arm/isa/formats/sve_top_level.isa
M src/arch/arm/isa/includes.isa
M src/arch/arm/isa/insts/sve.isa
M src/arch/arm/isa/operands.isa
M src/arch/arm/isa/templates/sve.isa
M src/arch/arm/process.cc
M src/arch/arm/regs/misc.cc
12 files changed, 291 insertions(+), 7 deletions(-)

Approvals:
Giacomo Travaglini: Looks good to me, approved; Looks good to me, approved
Andreas Sandberg: Looks good to me, approved; Looks good to me, approved
kokoro: Regressions pass

diff --git a/src/arch/arm/ArmISA.py b/src/arch/arm/ArmISA.py
index 37970dc..31ecbcb 100644
--- a/src/arch/arm/ArmISA.py
+++ b/src/arch/arm/ArmISA.py
@@ -53,6 +53,7 @@
         "FEAT_LSE",
         "FEAT_RDM",
         # Armv8.2
+        "FEAT_F32MM",
         "FEAT_SVE",
         # Armv8.3
         "FEAT_FCMA",

diff --git a/src/arch/arm/ArmSystem.py b/src/arch/arm/ArmSystem.py
index c1f5e9f..5a7ae79 100644
--- a/src/arch/arm/ArmSystem.py
+++ b/src/arch/arm/ArmSystem.py
@@ -78,6 +78,7 @@
         "FEAT_UAO",
         "FEAT_LVA",  # Optional in Armv8.2
         "FEAT_LPA",  # Optional in Armv8.2
+        "FEAT_F32MM",  # Optional in Armv8.2
         # Armv8.3
         "FEAT_FCMA",
         "FEAT_JSCVT",
@@ -163,6 +164,7 @@
         "FEAT_LVA",
         "FEAT_LPA",
         "FEAT_SVE",
+        "FEAT_F32MM",
         # Armv8.3
         "FEAT_FCMA",
         "FEAT_JSCVT",
@@ -196,6 +198,7 @@
         "FEAT_LVA",
         "FEAT_LPA",
         "FEAT_SVE",
+        "FEAT_F32MM",
     ]

diff --git a/src/arch/arm/insts/sve.hh b/src/arch/arm/insts/sve.hh
index de1163e..dc18ff3 100644
--- a/src/arch/arm/insts/sve.hh
+++ b/src/arch/arm/insts/sve.hh
@@ -498,7 +498,7 @@
             Addr pc, const loader::SymbolTable *symtab) const override;
 };

-///SVE2 Accumulate instructions
+/// Ternary, destructive, unpredicated SVE instruction.
 class SveTerUnpredOp : public ArmStaticInst
 {
   protected:
diff --git a/src/arch/arm/insts/vector_element_traits.hh b/src/arch/arm/insts/vector_element_traits.hh
new file mode 100644
index 0000000..3495bef
--- /dev/null
+++ b/src/arch/arm/insts/vector_element_traits.hh
@@ -0,0 +1,73 @@
+/*
+ * Copyright (c) 2020 ARM Limited
+ * All rights reserved
+ *
+ * The license below extends only to copyright in the software and shall
+ * not be construed as granting a license to any other intellectual
+ * property including but not limited to intellectual property relating
+ * to a hardware implementation of the functionality of the software
+ * licensed hereunder.  You may use the software subject to the license
+ * terms below provided that you ensure that this notice is replicated
+ * unmodified and in its entirety in all distributions of the software,
+ * modified or unmodified, in source code or in binary form.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met: redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer;
+ * redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution;
+ * neither the name of the copyright holders nor the names of its
+ * contributors may be used to endorse or promote products derived from
+ * this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __ARCH_ARM_VECTOR_ELEMENT_TRAITS_HH__
+#define __ARCH_ARM_VECTOR_ELEMENT_TRAITS_HH__
+
+#include <type_traits>
+
+namespace gem5 {
+namespace ArmISA {
+namespace vector_element_traits {
+
+
+// Make an integral type with the size of IntDestElemType but the
+// signed-ness of IntSrcElemType. The size of IntDestElemType must be
+// greater than or equal to the size of IntSrcElemType.
+template<typename IntDestElemType,
+         typename IntSrcElemType>
+class extend_element
+{
+  public:
+    static_assert(std::is_integral<IntDestElemType>::value
+                  && std::is_integral<IntSrcElemType>::value
+                  && sizeof(IntDestElemType) >= sizeof(IntSrcElemType),
+                  "Extended Element Dest and Src types must both be "
+                  "integer types, and Dest must be at least as large "
+                  "as Src.");
+    using type = typename std::conditional<
+        std::is_signed<IntSrcElemType>::value,
+        typename std::make_signed<IntDestElemType>::type,
+        typename std::make_unsigned<IntDestElemType>::type>::type;
+};
+
+
+} // namespace vector_element_traits
+} // namespace ArmISA
+} // namespace gem5
+
+#endif // __ARCH_ARM_VECTOR_ELEMENT_TRAITS_HH__
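For illustration only, here is a self-contained copy of the `extend_element` trait from the new header, together with a hypothetical `widen` helper that is not part of the patch. It shows which type the trait selects. FMMLA instantiates all three element types identically (`uint32_t` or `uint64_t`), so for this change the trait is the identity; the mixed-width cases are a sketch of how it would behave for narrower sources.

```cpp
#include <cstdint>
#include <type_traits>

// Local copy of the trait from the patch so this sketch compiles alone.
template <typename IntDestElemType, typename IntSrcElemType>
struct extend_element
{
    static_assert(std::is_integral<IntDestElemType>::value
                  && std::is_integral<IntSrcElemType>::value
                  && sizeof(IntDestElemType) >= sizeof(IntSrcElemType),
                  "Dest and Src must be integer types, Dest at least "
                  "as large as Src.");
    using type = typename std::conditional<
        std::is_signed<IntSrcElemType>::value,
        typename std::make_signed<IntDestElemType>::type,
        typename std::make_unsigned<IntDestElemType>::type>::type;
};

// Hypothetical helper (not in the patch): convert a source element to
// the extended type, i.e. Dest's width with Src's signedness.
template <typename Dest, typename Src>
constexpr typename extend_element<Dest, Src>::type
widen(Src v)
{
    return static_cast<typename extend_element<Dest, Src>::type>(v);
}

// Same-width case, as used by FMMLA: the trait is the identity.
static_assert(std::is_same<extend_element<uint32_t, uint32_t>::type,
                           uint32_t>::value, "identity expected");
// Signed source: the signed variant of the destination type is chosen.
static_assert(std::is_same<extend_element<uint32_t, int8_t>::type,
                           int32_t>::value, "signed expected");
// Unsigned source: the unsigned variant of the destination type is chosen.
static_assert(std::is_same<extend_element<int32_t, uint8_t>::type,
                           uint32_t>::value, "unsigned expected");
```

Because the signedness follows the source type, a signed 8-bit source is sign-extended and an unsigned one is zero-extended when converted to the wider type.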
diff --git a/src/arch/arm/isa/formats/sve_2nd_level.isa b/src/arch/arm/isa/formats/sve_2nd_level.isa
index 4281eeb..440722a 100644
--- a/src/arch/arm/isa/formats/sve_2nd_level.isa
+++ b/src/arch/arm/isa/formats/sve_2nd_level.isa
@@ -1,4 +1,4 @@
-// Copyright (c) 2017-2019 ARM Limited
+// Copyright (c) 2017-2020 ARM Limited
 // All rights reserved
 //
 // The license below extends only to copyright in the software and shall
@@ -2883,6 +2883,29 @@
     } // decodeSveFpFusedMulAdd

     StaticInstPtr
+    decodeSveFpFusedMatMulAdd(ExtMachInst machInst)
+    {
+        RegIndex zda = (RegIndex) (uint8_t) bits(machInst, 4, 0);
+        RegIndex zn = (RegIndex) (uint8_t) bits(machInst, 9, 5);
+        RegIndex zm = (RegIndex) (uint8_t) bits(machInst, 20, 16);
+
+        uint8_t size = bits(machInst, 23, 22);
+        switch (size) {
+          case 0x1:
+            // BFMMLA goes here when implemented.
+            return new Unknown64(machInst);
+          case 0x2:
+            return new SveFmmla<uint32_t,uint32_t,uint32_t>(
+                machInst, zda, zn, zm);
+          case 0x3:
+            return new SveFmmla<uint64_t,uint64_t,uint64_t>(
+                machInst, zda, zn, zm);
+          default:
+            return new Unknown64(machInst);
+        }
+    } // decodeSveFpFusedMatMulAdd
+
+    StaticInstPtr
     decodeSveFpCplxAdd(ExtMachInst machInst)
     {
         uint8_t size = bits(machInst, 23, 22);
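The field extraction done by `decodeSveFpFusedMatMulAdd` can be sketched outside gem5 as follows. This is a minimal stand-alone model, not the decoder itself: `bitField`, `MatMulKind`, `MatMulFields` and `decodeMatMulFields` are hypothetical names introduced here, and only the bit positions and the meaning of the `size` values are taken from the hunk above.

```cpp
#include <cstdint>

// Stand-in for gem5's bits(): returns bits [first:last] (inclusive) of
// val. Only valid for fields narrower than 32 bits.
constexpr uint32_t
bitField(uint32_t val, unsigned first, unsigned last)
{
    return (val >> last) & ((1u << (first - last + 1)) - 1u);
}

// Hypothetical result type; the real decoder returns a StaticInstPtr.
enum class MatMulKind { Unallocated, Fp32, Fp64 };

struct MatMulFields
{
    uint32_t zda;    // accumulator / destination register, bits [4:0]
    uint32_t zn;     // first source register, bits [9:5]
    uint32_t zm;     // second source register, bits [20:16]
    MatMulKind kind; // selected by the size field, bits [23:22]
};

inline MatMulFields
decodeMatMulFields(uint32_t inst)
{
    MatMulFields f{bitField(inst, 4, 0), bitField(inst, 9, 5),
                   bitField(inst, 20, 16), MatMulKind::Unallocated};
    switch (bitField(inst, 23, 22)) {
      case 0x2:
        f.kind = MatMulKind::Fp32; // SveFmmla<uint32_t, ...> in the patch
        break;
      case 0x3:
        f.kind = MatMulKind::Fp64; // SveFmmla<uint64_t, ...> in the patch
        break;
      default:
        break; // 0x1 is reserved for BFMMLA; 0x0 decodes as unknown
    }
    return f;
}
```

The test word used to exercise this sketch is assembled by hand from the field positions only; it is not claimed to be a complete FMMLA encoding.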
diff --git a/src/arch/arm/isa/formats/sve_top_level.isa b/src/arch/arm/isa/formats/sve_top_level.isa
index 41861a8..b0579fb 100644
--- a/src/arch/arm/isa/formats/sve_top_level.isa
+++ b/src/arch/arm/isa/formats/sve_top_level.isa
@@ -1,4 +1,4 @@
-// Copyright (c) 2017-2019 ARM Limited
+// Copyright (c) 2017-2020 ARM Limited
 // All rights reserved
 //
 // The license below extends only to copyright in the software and shall
@@ -83,6 +83,7 @@
     StaticInstPtr decodeSveFpUnaryPred(ExtMachInst machInst);
     StaticInstPtr decodeSveFpCmpVec(ExtMachInst machInst);
     StaticInstPtr decodeSveFpFusedMulAdd(ExtMachInst machInst);
+    StaticInstPtr decodeSveFpFusedMatMulAdd(ExtMachInst machInst);
     StaticInstPtr decodeSveFpCplxAdd(ExtMachInst machInst);
     StaticInstPtr decodeSveFpCplxMulAddVec(ExtMachInst machInst);
     StaticInstPtr decodeSveFpMulAddIndexed(ExtMachInst machInst);
@@ -269,9 +270,10 @@
               case 0:
                 return decodeSveFpMulAddIndexed(machInst);
               case 4:
-                if (!bits(machInst, 10))
+                if (bits(machInst, 10))
+                    return decodeSveFpFusedMatMulAdd(machInst);
+                else
                     return decodeSveFpMulIndexed(machInst);
-                [[fallthrough]];
               default:
                 return new Unknown64(machInst);
             }

diff --git a/src/arch/arm/isa/includes.isa b/src/arch/arm/isa/includes.isa
index e2534a6..cde035a 100644
--- a/src/arch/arm/isa/includes.isa
+++ b/src/arch/arm/isa/includes.isa
@@ -66,6 +66,7 @@
#include "arch/arm/insts/sve.hh"
#include "arch/arm/insts/sve_mem.hh"
#include "arch/arm/insts/tme64.hh"
+#include "arch/arm/insts/vector_element_traits.hh"
#include "arch/arm/insts/vfp.hh"
#include "enums/DecoderFlavor.hh"
#include "mem/packet.hh"
diff --git a/src/arch/arm/isa/insts/sve.isa b/src/arch/arm/isa/insts/sve.isa
index 4c10cd6..74eacb8 100644
--- a/src/arch/arm/isa/insts/sve.isa
+++ b/src/arch/arm/isa/insts/sve.isa
@@ -1,4 +1,4 @@
-// Copyright (c) 2017-2019 ARM Limited
+// Copyright (c) 2017-2020 ARM Limited
// All rights reserved
//
// The license below extends only to copyright in the software and shall
@@ -2188,6 +2188,111 @@
                      'class_name' : 'Sve' + Name}
         exec_output += SveOpExecDeclare.subst(substDict)

+    # Generates definitions for ternary destructive SVE Matrix
+    # Multiplication instructions (not predicated)
+    #
+    # `type_specs` can either be a sequence of types for cases where
+    # the dest and source matrices have the same element types, or a
+    # sequence of 3-tuples for case where the dest and source matrices
+    # have differnet element types.
+    #
+    # The calculation Z = Z + A x B is performed for full matrices
+    # Z (numDestRows x numDestCols), A (numDestRows x K), and
+    # B(K x numDestCols), and remaining elemnts of Z are set to zero.
+    # The vector length must be large enough for one full matrix or
+    # an UndefinedInstruction Fault is generated.
+    #
+    def sveMatMulInst(name, Name, opClass, type_specs,
+                      numDestRows, numDestCols, K,
+                      elt_mul_op):
+        global header_output, exec_output
+        code = sveEnabledCheckCode + '''
+        // Types of the extended versions of the source elements.
+        // Required to make sure the itermediate calculations don't overflow.
+        using ExtendedElementA = typename vector_element_traits::
+                                   extend_element<DestElement,
+                                                  SrcElementA>::type;
+        using ExtendedElementB = typename vector_element_traits::
+                                   extend_element<DestElement,
+                                                  SrcElementB>::type;
+
+        // Element count of destination vector
+        unsigned eCount = ArmStaticInst::getCurSveVecLen<DestElement>(
+                xc->tcBase());
+
+        // SVE Matrix operations require that there are at least 4
+        // elements (one full matrix). Further matrices may be partial,
+        // in which case the trailing dest elements are filled with zeros.
+        if (eCount < 4) {
+            return std::make_shared<UndefinedInstruction>(machInst, false,
+                                                          "%(mnemonic)s");
+        }
+
+        // Some properties of the source and dest matrix dimensions
+        //   ( numDestRows x numDestCols ) <- (numDestRows x K) .
+        //                                        (K x numDestCols)
+        constexpr unsigned numDestRows = %(numDestRows)d;
+        constexpr unsigned numDestCols = %(numDestCols)d;
+        constexpr unsigned K = %(K)d;
+
+        constexpr unsigned eltsPerDestMatrix = numDestRows * numDestCols;
+        constexpr unsigned eltsPerSrcAMatrix = numDestRows * K;
+        constexpr unsigned eltsPerSrcBMatrix = K * numDestCols;
+
+        // Number of full matrices - there may be some elements left over
+        const unsigned mCount = eCount / eltsPerDestMatrix;
+
+        // Calculate z_ij = Sum[k=1..K](a_ik * b_kj)
+
+        unsigned zEltIdx = 0; // Index of the result element being produced
+        unsigned aMatIdx = 0; // Index of the first element of the A matrix
+        unsigned bMatIdx = 0; // Index of the first element of the B matrix
+        for (unsigned matIdx = 0; matIdx < mCount; ++matIdx) {
+            for (unsigned rowIdx = 0; rowIdx < numDestRows; ++rowIdx) {
+                for (unsigned colIdx = 0; colIdx < numDestCols; ++colIdx) {
+                    DestElement destElem =
+                        static_cast<DestElement>(AA64FpDestMerge_x[zEltIdx]);
+                    for (unsigned k = 0; k < K; ++k) {
+                        const ExtendedElementA srcElemA =
+                            static_cast<ExtendedElementA>
+                                (AA64FpOp1_srcA[aMatIdx + K * rowIdx + k]);
+                        const ExtendedElementB srcElemB =
+                            static_cast<ExtendedElementB>
+                                (AA64FpOp2_srcB[bMatIdx + K * colIdx + k]);
+
+                        // Do the math operation. Should be of form:
+                        //   destElem += f(destElem, srcElemA, srcElemB);
+                        %(elt_mul_op)s;
+                    }
+                    AA64FpDest_x[zEltIdx++] = destElem;
+                }
+            }
+            aMatIdx += eltsPerSrcAMatrix;
+            bMatIdx += eltsPerSrcBMatrix;
+        }
+
+        // Zero-fill any trailing elements
+        for (unsigned i = mCount * eltsPerDestMatrix; i < eCount; ++i) {
+            AA64FpDest_x[i] = static_cast<DestElement>(0);
+        }
+        ''' % {'elt_mul_op': elt_mul_op, 'mnemonic': name,
+               'numDestRows': numDestRows, 'numDestCols': numDestCols,
+               'K': K}
+        iop = InstObjParams(name, 'Sve' + Name, 'SveTerUnpredOp',
+                            {'code': code, 'op_class': opClass}, [])
+        header_output += SveMatMulOpDeclare.subst(iop)
+        exec_output += SveMatMulOpExecute.subst(iop)
+        for type_spec in type_specs:
+            try:
+                destEltType, srcEltAType, srcEltBType = type_spec
+            except ValueError:
+                destEltType, srcEltAType, srcEltBType = (type_spec,) * 3
+            substDict = {'destEltType': destEltType,
+                         'srcEltAType': srcEltAType,
+                         'srcEltBType': srcEltBType,
+                         'class_name': 'Sve' + Name}
+            exec_output += SveMatMulOpExecDeclare.subst(substDict)
+
     # Generates definitions for PTRUE and PTRUES instructions.
     def svePtrueInst(name, Name, opClass, types, isFlagSetting=False,
                      decoder='Generic'):
@@ -3822,6 +3927,16 @@
     # FMLS (indexed)
     sveTerIdxInst('fmls', 'FmlsIdx', 'SimdFloatMultAccOp', floatTypes,
                   fmlsCode, PredType.MERGE)
+
+    fmmlaCode = fpOp % '''
+        fplibAdd<DestElement>(destElem,
+            fplibMul<DestElement>(srcElemA, srcElemB, fpscr), fpscr);
+    '''
+    # FMMLA (vectors)
+    sveMatMulInst('fmmla', 'Fmmla', 'SimdFloatMultAccOp', floatTypes,
+                  numDestRows=2, numDestCols=2, K=2,
+                  elt_mul_op=fmmlaCode)
+
     # FMLS (vectors)
     sveTerInst('fmls', 'Fmls', 'SimdFloatMultAccOp', floatTypes, fmlsCode,
                PredType.MERGE)
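As a rough model of what the generated `execute()` body computes for FMMLA (numDestRows = numDestCols = K = 2), the loop structure in the hunk above can be sketched in plain C++. Assumptions are labelled: `matMulAccumulate` is a hypothetical name, ordinary `+` and `*` replace the `fplibAdd`/`fplibMul` calls (so FPSCR rounding and exception behaviour is not modelled), and a `false` return stands in for the UndefinedInstruction fault.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-alone model of the matrix loop in sveMatMulInst.
// z, a and b each hold eCount elements; every group of four elements
// is one 2x2 matrix stored row by row.
template <typename T>
bool
matMulAccumulate(std::vector<T> &z, const std::vector<T> &a,
                 const std::vector<T> &b)
{
    constexpr std::size_t rows = 2, cols = 2, K = 2;
    constexpr std::size_t eltsPerMatrix = rows * cols;
    const std::size_t eCount = z.size();

    // Fewer than four elements cannot hold one full matrix; the patch
    // raises UndefinedInstruction in that case.
    if (eCount < 4 || a.size() < eCount || b.size() < eCount)
        return false;

    const std::size_t mCount = eCount / eltsPerMatrix; // full matrices
    std::size_t zIdx = 0;
    for (std::size_t m = 0; m < mCount; ++m) {
        const std::size_t aBase = m * rows * K;
        const std::size_t bBase = m * K * cols;
        for (std::size_t r = 0; r < rows; ++r) {
            for (std::size_t c = 0; c < cols; ++c) {
                T acc = z[zIdx];
                // Same index arithmetic as the patch: both sources are
                // walked along k within storage row r (A) and c (B).
                for (std::size_t k = 0; k < K; ++k)
                    acc += a[aBase + K * r + k] * b[bBase + K * c + k];
                z[zIdx++] = acc;
            }
        }
    }

    // Zero-fill elements that do not form a complete matrix.
    for (std::size_t i = mCount * eltsPerMatrix; i < eCount; ++i)
        z[i] = T(0);
    return true;
}
```

A six-element input illustrates the tail handling: the first four elements form one matrix and are accumulated, and the remaining two are set to zero. Note that the index `K * colIdx + k` reads the second operand along a storage row, so each destination column corresponds to one stored row of that operand.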

diff --git a/src/arch/arm/isa/operands.isa b/src/arch/arm/isa/operands.isa
index 24a0af9..5bba00f 100644
--- a/src/arch/arm/isa/operands.isa
+++ b/src/arch/arm/isa/operands.isa
@@ -60,6 +60,8 @@
     'xs1' : 'TPS1Elem',
     'xs2' : 'TPS2Elem',
     'xd' : 'TPDElem',
+    'srcA' : 'TPSrcAElem',
+    'srcB' : 'TPSrcBElem',
     'pc' : 'ArmISA::VecPredRegContainer',
     'pb' : 'uint8_t'
 }};
diff --git a/src/arch/arm/isa/templates/sve.isa b/src/arch/arm/isa/templates/sve.isa
index 886fd7a..65abb1b 100644
--- a/src/arch/arm/isa/templates/sve.isa
+++ b/src/arch/arm/isa/templates/sve.isa
@@ -1,4 +1,4 @@
-// Copyright (c) 2018-2019 ARM Limited
+// Copyright (c) 2018-2020 ARM Limited
// All rights reserved
//
// The license below extends only to copyright in the software and shall
@@ -515,6 +515,33 @@
 };
 }};

+def template SveMatMulOpDeclare {{
+template <typename DestElement,
+          typename SrcElementA,
+          typename SrcElementB>
+class %(class_name)s : public %(base_class)s
+{
+  private:
+    %(reg_idx_arr_decl)s;
+  protected:
+    typedef DestElement TPElem;
+    typedef SrcElementA TPSrcAElem;
+    typedef SrcElementB TPSrcBElem;
+  public:
+    // Constructor
+    %(class_name)s(ExtMachInst machInst,
+                   RegIndex _dest, RegIndex _op1, RegIndex _op2)
+        : %(base_class)s("%(mnemonic)s", machInst, %(op_class)s,
+                         _dest, _op1, _op2)
+    {
+        %(set_reg_idx_arr)s;
+        %(constructor)s;
+    }
+
+    Fault execute(ExecContext *, trace::InstRecord *) const override;
+};
+}};
+
 def template SveTerImmUnpredOpDeclare {{
 template <class _Element>
 class %(class_name)s : public %(base_class)s
@@ -1310,3 +1337,32 @@
     Fault %(class_name)s<%(targs)s>::execute(
             ExecContext *, trace::InstRecord *) const;
 }};
+
+def template SveMatMulOpExecute {{
+    template <typename DestElement,
+              typename SrcElementA,
+              typename SrcElementB>
+    Fault %(class_name)s<DestElement,SrcElementA,SrcElementB>::execute(
+            ExecContext *xc,
+            trace::InstRecord *traceData) const
+    {
+        Fault fault = NoFault;
+        %(op_decl)s;
+        %(op_rd)s;
+
+        %(code)s;
+        if (fault == NoFault)
+        {
+            %(op_wb)s;
+        }
+
+        return fault;
+    }
+}};
+
+def template SveMatMulOpExecDeclare {{
+    template
+    Fault
+    %(class_name)s<%(destEltType)s,%(srcEltAType)s,%(srcEltBType)s>
+    ::execute(ExecContext *, trace::InstRecord *) const;
+}};
diff --git a/src/arch/arm/process.cc b/src/arch/arm/process.cc
index fda9415..24e1250 100644
--- a/src/arch/arm/process.cc
+++ b/src/arch/arm/process.cc
@@ -320,6 +320,9 @@
     hwcap |= (isa_r0.ts >= 2) ? Arm_Flagm2 : Arm_None;
     hwcap |= (isa_r0.rndr >= 1) ? Arm_Rng : Arm_None;

+    const AA64ZFR0 zf_r0 = tc->readMiscReg(MISCREG_ID_AA64ZFR0_EL1);
+    hwcap |= (zf_r0.f32mm >= 1) ? Arm_Svef32mm : Arm_None;
+
     return hwcap;
 }
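The link between the feature register and the hwcap bit can be sketched as below. This is a hedged illustration, not gem5 code: the F32MM field position (bits [55:52] of ID_AA64ZFR0_EL1) follows the Arm architecture, while `kHwcapSvef32mm` is a hypothetical placeholder for gem5's `Arm_Svef32mm`, whose actual value does not appear in this diff. The helper names are likewise invented for the example.

```cpp
#include <cstdint>

// ID_AA64ZFR0_EL1.F32MM occupies bits [55:52] of the register.
constexpr unsigned kF32mmShift = 52;
constexpr uint64_t kFieldMask = 0xf;

// Hypothetical flag value standing in for gem5's Arm_Svef32mm; the
// real constant is not shown in this diff.
constexpr uint64_t kHwcapSvef32mm = 1ULL << 10;

// Extract the 4-bit F32MM field from a raw ID_AA64ZFR0_EL1 value.
constexpr uint64_t
f32mmField(uint64_t zfr0)
{
    return (zfr0 >> kF32mmShift) & kFieldMask;
}

// Mirror of the added line in the hunk above: advertise the capability
// only when the field is non-zero.
constexpr uint64_t
hwcapFromZfr0(uint64_t zfr0)
{
    return f32mmField(zfr0) >= 1 ? kHwcapSvef32mm : 0;
}
```

Since this change only sets F32MM in the feature register, a register value with just the neighbouring F64MM field (bits [59:56]) set would leave the flag clear in this model.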

diff --git a/src/arch/arm/regs/misc.cc b/src/arch/arm/regs/misc.cc
index 53e9268..8925bc0 100644
--- a/src/arch/arm/regs/misc.cc
+++ b/src/arch/arm/regs/misc.cc
@@ -5403,6 +5403,11 @@

     // SVE
     InitReg(MISCREG_ID_AA64ZFR0_EL1)
+      .reset([this](){
+        AA64ZFR0 zfr0_el1 = 0;
+        zfr0_el1.f32mm = release->has(ArmExtension::FEAT_F32MM) ? 1 : 0;
+        return zfr0_el1;
+      }())
       .faultRead(EL0, faultIdst)
       .faultRead(EL1, HCR_TRAP(tid3))
       .allPrivileges().exceptUserMode().writes(0);

--
To view, visit
https://gem5-review.googlesource.com/c/public/gem5/+/70726?usp=email
To unsubscribe, or for help writing mail filters, visit
https://gem5-review.googlesource.com/settings?usp=email

Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: If3547378ffa48527fe540767399bcc37a5dab524
Gerrit-Change-Number: 70726
Gerrit-PatchSet: 7
Gerrit-Owner: Giacomo Travaglini <giacomo.travaglini@arm.com>
Gerrit-Reviewer: Andreas Sandberg <andreas.sandberg@arm.com>
Gerrit-Reviewer: Bobby Bruce <bbruce@ucdavis.edu>
Gerrit-Reviewer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Gerrit-Reviewer: Jason Lowe-Power <power.jg@gmail.com>
Gerrit-Reviewer: kokoro <noreply+kokoro@google.com>
Gerrit-CC: Richard Cooper <richard.cooper@arm.com>

Bobby Bruce has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/70726?usp=email ) Change subject: arch-arm: Add support for Arm SVE fmmla instruction. ...................................................................... arch-arm: Add support for Arm SVE fmmla instruction. Add support for the Arm SVE Floating Point Matrix Multiply-Accumulate (FMMLA) instruction. Both 32-bit element (single precision) and 64-bit element (double precision) encodings are implemented, but because the associated required instructions (LD1RO*, etc) have not yet been implemented, the SVE Feature ID register 0 (ID_AA64ZFR0_EL1) has only been updated to indicate 32-bit element support at this time. For more information please refer to the "ARM Architecture Reference Manual Supplement - The Scalable Vector Extension (SVE), for ARMv8-A" (https://developer.arm.com/architectures/cpu-architecture/a-profile/ docs/arm-architecture-reference-manual-supplement-armv8-a) Additional Contributors: Giacomo Travaglini Change-Id: If3547378ffa48527fe540767399bcc37a5dab524 Reviewed-by: Richard Cooper <richard.cooper@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70726 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> --- M src/arch/arm/ArmISA.py M src/arch/arm/ArmSystem.py M src/arch/arm/insts/sve.hh A src/arch/arm/insts/vector_element_traits.hh M src/arch/arm/isa/formats/sve_2nd_level.isa M src/arch/arm/isa/formats/sve_top_level.isa M src/arch/arm/isa/includes.isa M src/arch/arm/isa/insts/sve.isa M src/arch/arm/isa/operands.isa M src/arch/arm/isa/templates/sve.isa M src/arch/arm/process.cc M src/arch/arm/regs/misc.cc 12 files changed, 291 insertions(+), 7 deletions(-) Approvals: Giacomo Travaglini: Looks good to me, 
approved; Looks good to me, approved Andreas Sandberg: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/arm/ArmISA.py b/src/arch/arm/ArmISA.py index 37970dc..31ecbcb 100644 --- a/src/arch/arm/ArmISA.py +++ b/src/arch/arm/ArmISA.py @@ -53,6 +53,7 @@ "FEAT_LSE", "FEAT_RDM", # Armv8.2 + "FEAT_F32MM", "FEAT_SVE", # Armv8.3 "FEAT_FCMA", diff --git a/src/arch/arm/ArmSystem.py b/src/arch/arm/ArmSystem.py index c1f5e9f..5a7ae79 100644 --- a/src/arch/arm/ArmSystem.py +++ b/src/arch/arm/ArmSystem.py @@ -78,6 +78,7 @@ "FEAT_UAO", "FEAT_LVA", # Optional in Armv8.2 "FEAT_LPA", # Optional in Armv8.2 + "FEAT_F32MM", # Optional in Armv8.2 # Armv8.3 "FEAT_FCMA", "FEAT_JSCVT", @@ -163,6 +164,7 @@ "FEAT_LVA", "FEAT_LPA", "FEAT_SVE", + "FEAT_F32MM", # Armv8.3 "FEAT_FCMA", "FEAT_JSCVT", @@ -196,6 +198,7 @@ "FEAT_LVA", "FEAT_LPA", "FEAT_SVE", + "FEAT_F32MM", ] diff --git a/src/arch/arm/insts/sve.hh b/src/arch/arm/insts/sve.hh index de1163e..dc18ff3 100644 --- a/src/arch/arm/insts/sve.hh +++ b/src/arch/arm/insts/sve.hh @@ -498,7 +498,7 @@ Addr pc, const loader::SymbolTable *symtab) const override; }; -///SVE2 Accumulate instructions +/// Ternary, destructive, unpredicated SVE instruction. class SveTerUnpredOp : public ArmStaticInst { protected: diff --git a/src/arch/arm/insts/vector_element_traits.hh b/src/arch/arm/insts/vector_element_traits.hh new file mode 100644 index 0000000..3495bef --- /dev/null +++ b/src/arch/arm/insts/vector_element_traits.hh @@ -0,0 +1,73 @@ +/* + * Copyright (c) 2020 ARM Limited + * All rights reserved + * + * The license below extends only to copyright in the software and shall + * not be construed as granting a license to any other intellectual + * property including but not limited to intellectual property relating + * to a hardware implementation of the functionality of the software + * licensed hereunder. 
You may use the software subject to the license + * terms below provided that you ensure that this notice is replicated + * unmodified and in its entirety in all distributions of the software, + * modified or unmodified, in source code or in binary form. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer; + * redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution; + * neither the name of the copyright holders nor the names of its + * contributors may be used to endorse or promote products derived from + * this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#ifndef __ARCH_ARM_VECTOR_ELEMENT_TRAITS_HH__ +#define __ARCH_ARM_VECTOR_ELEMENT_TRAITS_HH__ + +#include <type_traits> + +namespace gem5 { +namespace ArmISA { +namespace vector_element_traits { + + +// Make an integral type with the size of IntDestElemType but the +// signed-ness of IntSrcElemType. The size of IntDestElemType must be +// greater than or equal to the size of IntSrcElemType. +template<typename IntDestElemType, + typename IntSrcElemType> +class extend_element +{ + public: + static_assert(std::is_integral<IntDestElemType>::value + && std::is_integral<IntSrcElemType>::value + && sizeof(IntDestElemType) >= sizeof(IntSrcElemType), + "Extended Element Dest and Src types must both be " + "integer types, and Dest must be at least as large " + "as Src."); + using type = typename std::conditional< + std::is_signed<IntSrcElemType>::value, + typename std::make_signed<IntDestElemType>::type, + typename std::make_unsigned<IntDestElemType>::type>::type; +}; + + +} // namespace vector_element_traits +} // namespace ArmISA +} // namespace gem5 + +#endif // __ARCH_ARM_VECTOR_ELEMENT_TRAITS_HH__ diff --git a/src/arch/arm/isa/formats/sve_2nd_level.isa b/src/arch/arm/isa/formats/sve_2nd_level.isa index 4281eeb..440722a 100644 --- a/src/arch/arm/isa/formats/sve_2nd_level.isa +++ b/src/arch/arm/isa/formats/sve_2nd_level.isa @@ -1,4 +1,4 @@ -// Copyright (c) 2017-2019 ARM Limited +// Copyright (c) 2017-2020 ARM Limited // All rights reserved // // The license below extends only to copyright in the software and shall @@ -2883,6 +2883,29 @@ } // decodeSveFpFusedMulAdd StaticInstPtr + decodeSveFpFusedMatMulAdd(ExtMachInst machInst) + { + RegIndex zda = (RegIndex) (uint8_t) bits(machInst, 4, 0); + RegIndex zn = (RegIndex) (uint8_t) bits(machInst, 9, 5); + RegIndex zm = (RegIndex) (uint8_t) bits(machInst, 20, 16); + + uint8_t size = bits(machInst, 23, 22); + switch (size) { + case 0x1: + // BFMMLA goes here when implemented. 
+ return new Unknown64(machInst); + case 0x2: + return new SveFmmla<uint32_t,uint32_t,uint32_t>( + machInst, zda, zn, zm); + case 0x3: + return new SveFmmla<uint64_t,uint64_t,uint64_t>( + machInst, zda, zn, zm); + default: + return new Unknown64(machInst); + } + } // decodeSveFpFusedMatMulAdd + + StaticInstPtr decodeSveFpCplxAdd(ExtMachInst machInst) { uint8_t size = bits(machInst, 23, 22); diff --git a/src/arch/arm/isa/formats/sve_top_level.isa b/src/arch/arm/isa/formats/sve_top_level.isa index 41861a8..b0579fb 100644 --- a/src/arch/arm/isa/formats/sve_top_level.isa +++ b/src/arch/arm/isa/formats/sve_top_level.isa @@ -1,4 +1,4 @@ -// Copyright (c) 2017-2019 ARM Limited +// Copyright (c) 2017-2020 ARM Limited // All rights reserved // // The license below extends only to copyright in the software and shall @@ -83,6 +83,7 @@ StaticInstPtr decodeSveFpUnaryPred(ExtMachInst machInst); StaticInstPtr decodeSveFpCmpVec(ExtMachInst machInst); StaticInstPtr decodeSveFpFusedMulAdd(ExtMachInst machInst); + StaticInstPtr decodeSveFpFusedMatMulAdd(ExtMachInst machInst); StaticInstPtr decodeSveFpCplxAdd(ExtMachInst machInst); StaticInstPtr decodeSveFpCplxMulAddVec(ExtMachInst machInst); StaticInstPtr decodeSveFpMulAddIndexed(ExtMachInst machInst); @@ -269,9 +270,10 @@ case 0: return decodeSveFpMulAddIndexed(machInst); case 4: - if (!bits(machInst, 10)) + if (bits(machInst, 10)) + return decodeSveFpFusedMatMulAdd(machInst); + else return decodeSveFpMulIndexed(machInst); - [[fallthrough]]; default: return new Unknown64(machInst); } diff --git a/src/arch/arm/isa/includes.isa b/src/arch/arm/isa/includes.isa index e2534a6..cde035a 100644 --- a/src/arch/arm/isa/includes.isa +++ b/src/arch/arm/isa/includes.isa @@ -66,6 +66,7 @@ #include "arch/arm/insts/sve.hh" #include "arch/arm/insts/sve_mem.hh" #include "arch/arm/insts/tme64.hh" +#include "arch/arm/insts/vector_element_traits.hh" #include "arch/arm/insts/vfp.hh" #include "enums/DecoderFlavor.hh" #include "mem/packet.hh" diff --git 
a/src/arch/arm/isa/insts/sve.isa b/src/arch/arm/isa/insts/sve.isa index 4c10cd6..74eacb8 100644 --- a/src/arch/arm/isa/insts/sve.isa +++ b/src/arch/arm/isa/insts/sve.isa @@ -1,4 +1,4 @@ -// Copyright (c) 2017-2019 ARM Limited +// Copyright (c) 2017-2020 ARM Limited // All rights reserved // // The license below extends only to copyright in the software and shall @@ -2188,6 +2188,111 @@ 'class_name' : 'Sve' + Name} exec_output += SveOpExecDeclare.subst(substDict) + # Generates definitions for ternary destructive SVE Matrix + # Multiplication instructions (not predicated) + # + # `type_specs` can either be a sequence of types for cases where + # the dest and source matrices have the same element types, or a + # sequence of 3-tuples for case where the dest and source matrices + # have differnet element types. + # + # The calculation Z = Z + A x B is performed for full matrices + # Z (numDestRows x numDestCols), A (numDestRows x K), and + # B(K x numDestCols), and remaining elemnts of Z are set to zero. + # The vector length must be large enough for one full matrix or + # an UndefinedInstruction Fault is generated. + # + def sveMatMulInst(name, Name, opClass, type_specs, + numDestRows, numDestCols, K, + elt_mul_op): + global header_output, exec_output + code = sveEnabledCheckCode + ''' + // Types of the extended versions of the source elements. + // Required to make sure the itermediate calculations don't overflow. + using ExtendedElementA = typename vector_element_traits:: + extend_element<DestElement, + SrcElementA>::type; + using ExtendedElementB = typename vector_element_traits:: + extend_element<DestElement, + SrcElementB>::type; + + // Element count of destination vector + unsigned eCount = ArmStaticInst::getCurSveVecLen<DestElement>( + xc->tcBase()); + + // SVE Matrix operations require that there are at least 4 + // elements (one full matrix). Further matrices may be partial, + // in which case the trailing dest elements are filled with zeros. 
+        if (eCount < 4) {
+            return std::make_shared<UndefinedInstruction>(machInst, false,
+                                                          "%(mnemonic)s");
+        }
+
+        // Some properties of the source and dest matrix dimensions
+        // ( numDestRows x numDestCols ) <- (numDestRows x K) .
+        //                                  (K x numDestCols)
+        constexpr unsigned numDestRows = %(numDestRows)d;
+        constexpr unsigned numDestCols = %(numDestCols)d;
+        constexpr unsigned K = %(K)d;
+
+        constexpr unsigned eltsPerDestMatrix = numDestRows * numDestCols;
+        constexpr unsigned eltsPerSrcAMatrix = numDestRows * K;
+        constexpr unsigned eltsPerSrcBMatrix = K * numDestCols;
+
+        // Number of full matrices - there may be some elements left over
+        const unsigned mCount = eCount / eltsPerDestMatrix;
+
+        // Calculate z_ij = Sum[k=1..K](a_ik * b_kj)
+
+        unsigned zEltIdx = 0; // Index of the result element being produced
+        unsigned aMatIdx = 0; // Index of the first element of the A matrix
+        unsigned bMatIdx = 0; // Index of the first element of the B matrix
+        for (unsigned matIdx = 0; matIdx < mCount; ++matIdx) {
+            for (unsigned rowIdx = 0; rowIdx < numDestRows; ++rowIdx) {
+                for (unsigned colIdx = 0; colIdx < numDestCols; ++colIdx) {
+                    DestElement destElem =
+                        static_cast<DestElement>(AA64FpDestMerge_x[zEltIdx]);
+                    for (unsigned k = 0; k < K; ++k) {
+                        const ExtendedElementA srcElemA =
+                            static_cast<ExtendedElementA>
+                            (AA64FpOp1_srcA[aMatIdx + K * rowIdx + k]);
+                        const ExtendedElementB srcElemB =
+                            static_cast<ExtendedElementB>
+                            (AA64FpOp2_srcB[bMatIdx + K * colIdx + k]);
+
+                        // Do the math operation. Should be of form:
+                        // destElem += f(destElem, srcElemA, srcElemB);
+                        %(elt_mul_op)s;
+                    }
+                    AA64FpDest_x[zEltIdx++] = destElem;
+                }
+            }
+            aMatIdx += eltsPerSrcAMatrix;
+            bMatIdx += eltsPerSrcBMatrix;
+        }
+
+        // Zero-fill any trailing elements
+        for (unsigned i = mCount * eltsPerDestMatrix; i < eCount; ++i) {
+            AA64FpDest_x[i] = static_cast<DestElement>(0);
+        }
+        ''' % {'elt_mul_op': elt_mul_op, 'mnemonic': name,
+               'numDestRows': numDestRows, 'numDestCols': numDestCols,
+               'K': K}
+        iop = InstObjParams(name, 'Sve' + Name, 'SveTerUnpredOp',
+                            {'code': code, 'op_class': opClass}, [])
+        header_output += SveMatMulOpDeclare.subst(iop)
+        exec_output += SveMatMulOpExecute.subst(iop)
+        for type_spec in type_specs:
+            try:
+                destEltType, srcEltAType, srcEltBType = type_spec
+            except ValueError:
+                destEltType, srcEltAType, srcEltBType = (type_spec,) * 3
+            substDict = {'destEltType': destEltType,
+                         'srcEltAType': srcEltAType,
+                         'srcEltBType': srcEltBType,
+                         'class_name': 'Sve' + Name}
+            exec_output += SveMatMulOpExecDeclare.subst(substDict)
+
     # Generates definitions for PTRUE and PTRUES instructions.
     def svePtrueInst(name, Name, opClass, types, isFlagSetting=False,
                      decoder='Generic'):
@@ -3822,6 +3927,16 @@
     # FMLS (indexed)
     sveTerIdxInst('fmls', 'FmlsIdx', 'SimdFloatMultAccOp', floatTypes,
                   fmlsCode, PredType.MERGE)
+
+    fmmlaCode = fpOp % '''
+        fplibAdd<DestElement>(destElem,
+            fplibMul<DestElement>(srcElemA, srcElemB, fpscr), fpscr);
+    '''
+    # FMMLA (vectors)
+    sveMatMulInst('fmmla', 'Fmmla', 'SimdFloatMultAccOp', floatTypes,
+                  numDestRows=2, numDestCols=2, K=2,
+                  elt_mul_op=fmmlaCode)
+
     # FMLS (vectors)
     sveTerInst('fmls', 'Fmls', 'SimdFloatMultAccOp', floatTypes,
                fmlsCode, PredType.MERGE)
diff --git a/src/arch/arm/isa/operands.isa b/src/arch/arm/isa/operands.isa
index 24a0af9..5bba00f 100644
--- a/src/arch/arm/isa/operands.isa
+++ b/src/arch/arm/isa/operands.isa
@@ -60,6 +60,8 @@
     'xs1' : 'TPS1Elem',
     'xs2' : 'TPS2Elem',
     'xd' : 'TPDElem',
+    'srcA' : 'TPSrcAElem',
+    'srcB' : 'TPSrcBElem',
     'pc' : 'ArmISA::VecPredRegContainer',
     'pb' : 'uint8_t'
 }};
diff --git a/src/arch/arm/isa/templates/sve.isa b/src/arch/arm/isa/templates/sve.isa
index 886fd7a..65abb1b 100644
--- a/src/arch/arm/isa/templates/sve.isa
+++ b/src/arch/arm/isa/templates/sve.isa
@@ -1,4 +1,4 @@
-// Copyright (c) 2018-2019 ARM Limited
+// Copyright (c) 2018-2020 ARM Limited
 // All rights reserved
 //
 // The license below extends only to copyright in the software and shall
@@ -515,6 +515,33 @@
 };
 }};
 
+def template SveMatMulOpDeclare {{
+template <typename DestElement,
+          typename SrcElementA,
+          typename SrcElementB>
+class %(class_name)s : public %(base_class)s
+{
+  private:
+    %(reg_idx_arr_decl)s;
+  protected:
+    typedef DestElement TPElem;
+    typedef SrcElementA TPSrcAElem;
+    typedef SrcElementB TPSrcBElem;
+  public:
+    // Constructor
+    %(class_name)s(ExtMachInst machInst,
+                   RegIndex _dest, RegIndex _op1, RegIndex _op2)
+        : %(base_class)s("%(mnemonic)s", machInst, %(op_class)s,
+                         _dest, _op1, _op2)
+    {
+        %(set_reg_idx_arr)s;
+        %(constructor)s;
+    }
+
+    Fault execute(ExecContext *, trace::InstRecord *) const override;
+};
+}};
+
 def template SveTerImmUnpredOpDeclare {{
 template <class _Element>
 class %(class_name)s : public %(base_class)s
@@ -1310,3 +1337,32 @@
     Fault %(class_name)s<%(targs)s>::execute(
         ExecContext *, trace::InstRecord *) const;
 }};
+
+def template SveMatMulOpExecute {{
+    template <typename DestElement,
+              typename SrcElementA,
+              typename SrcElementB>
+    Fault %(class_name)s<DestElement,SrcElementA,SrcElementB>::execute(
+        ExecContext *xc,
+        trace::InstRecord *traceData) const
+    {
+        Fault fault = NoFault;
+        %(op_decl)s;
+        %(op_rd)s;
+
+        %(code)s;
+        if (fault == NoFault)
+        {
+            %(op_wb)s;
+        }
+
+        return fault;
+    }
+}};
+
+def template SveMatMulOpExecDeclare {{
+    template
+    Fault
+    %(class_name)s<%(destEltType)s,%(srcEltAType)s,%(srcEltBType)s>
+    ::execute(ExecContext *, trace::InstRecord *) const;
+}};
diff --git a/src/arch/arm/process.cc b/src/arch/arm/process.cc
index fda9415..24e1250 100644
--- a/src/arch/arm/process.cc
+++ b/src/arch/arm/process.cc
@@ -320,6 +320,9 @@
     hwcap |= (isa_r0.ts >= 2) ? Arm_Flagm2 : Arm_None;
     hwcap |= (isa_r0.rndr >= 1) ? Arm_Rng : Arm_None;
 
+    const AA64ZFR0 zf_r0 = tc->readMiscReg(MISCREG_ID_AA64ZFR0_EL1);
+    hwcap |= (zf_r0.f32mm >= 1) ? Arm_Svef32mm : Arm_None;
+
     return hwcap;
 }
 
diff --git a/src/arch/arm/regs/misc.cc b/src/arch/arm/regs/misc.cc
index 53e9268..8925bc0 100644
--- a/src/arch/arm/regs/misc.cc
+++ b/src/arch/arm/regs/misc.cc
@@ -5403,6 +5403,11 @@
 
     // SVE
     InitReg(MISCREG_ID_AA64ZFR0_EL1)
+        .reset([this](){
+            AA64ZFR0 zfr0_el1 = 0;
+            zfr0_el1.f32mm = release->has(ArmExtension::FEAT_F32MM) ? 1 : 0;
+            return zfr0_el1;
+        }())
         .faultRead(EL0, faultIdst)
         .faultRead(EL1, HCR_TRAP(tid3))
         .allPrivileges().exceptUserMode().writes(0);

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/70726?usp=email
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email

Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: If3547378ffa48527fe540767399bcc37a5dab524
Gerrit-Change-Number: 70726
Gerrit-PatchSet: 7
Gerrit-Owner: Giacomo Travaglini <giacomo.travaglini@arm.com>
Gerrit-Reviewer: Andreas Sandberg <andreas.sandberg@arm.com>
Gerrit-Reviewer: Bobby Bruce <bbruce@ucdavis.edu>
Gerrit-Reviewer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Gerrit-Reviewer: Jason Lowe-Power <power.jg@gmail.com>
Gerrit-Reviewer: kokoro <noreply+kokoro@google.com>
Gerrit-CC: Richard Cooper <richard.cooper@arm.com>
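
The loop nest generated by sveMatMulInst can be sketched in plain Python as a rough reference model. This is only an illustration under stated assumptions: ordinary Python arithmetic stands in for fplibAdd/fplibMul and the FPSCR handling, the registers are modelled as flat lists, and the name mat_mul_acc and its parameters are hypothetical, not part of gem5. The patch hard-codes the minimum element count as 4; the sketch uses rows * cols, which is the same value for the 2x2 FMMLA case.

```python
def mat_mul_acc(z, a, b, rows=2, cols=2, k_dim=2):
    """Illustrative model of Z = Z + A x B over packed matrices.

    z, a and b are flat lists standing in for the destination and the two
    source vector registers. Hypothetical helper, not gem5 code.
    """
    e_count = len(z)
    elts_per_dest = rows * cols
    if e_count < elts_per_dest:
        # The patch returns an UndefinedInstruction fault in this case.
        raise ValueError("vector too short for one full matrix")

    m_count = e_count // elts_per_dest  # number of full matrices
    out = [0] * e_count                 # trailing elements stay zero

    z_idx = 0
    for m in range(m_count):
        a_base = m * rows * k_dim       # first element of this A matrix
        b_base = m * k_dim * cols       # first element of this B matrix
        for r in range(rows):
            for c in range(cols):
                acc = z[z_idx]
                for k in range(k_dim):
                    # Same indexing as the patch: A is read by row,
                    # B is read as b[K * col + k].
                    acc += a[a_base + k_dim * r + k] * b[b_base + k_dim * c + k]
                out[z_idx] = acc
                z_idx += 1
    return out
```

For example, mat_mul_acc([1, 2, 3, 4, 9, 9], [1, 2, 3, 4, 0, 0], [5, 6, 7, 8, 0, 0]) gives [18, 25, 42, 57, 0, 0]: one full 2x2 matrix is accumulated and the two leftover elements are zero-filled. Note that, because B is indexed as b[K * col + k], each consecutive group of K elements in the second source supplies one column of B.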
