gem5-dev@gem5.org

The gem5 Developer List


[L] Change in gem5/gem5[develop]: arch-arm: Add support for Arm SVE fmmla instruction.

Bobby Bruce (Gerrit)
Thu, May 25, 2023 9:36 PM

Bobby Bruce has submitted this change. (
https://gem5-review.googlesource.com/c/public/gem5/+/70726?usp=email )

Change subject: arch-arm: Add support for Arm SVE fmmla instruction.
......................................................................

arch-arm: Add support for Arm SVE fmmla instruction.

Add support for the Arm SVE Floating Point Matrix Multiply-Accumulate
(FMMLA) instruction. Both 32-bit element (single precision) and 64-bit
element (double precision) encodings are implemented, but because the
associated required instructions (LD1RO*, etc) have not yet been
implemented, the SVE Feature ID register 0 (ID_AA64ZFR0_EL1) has only
been updated to indicate 32-bit element support at this time.

For more information please refer to the "ARM Architecture Reference
Manual Supplement - The Scalable Vector Extension (SVE), for ARMv8-A"
(https://developer.arm.com/architectures/cpu-architecture/a-profile/
docs/arm-architecture-reference-manual-supplement-armv8-a)

Additional Contributors: Giacomo Travaglini

Change-Id: If3547378ffa48527fe540767399bcc37a5dab524
Reviewed-by: Richard Cooper richard.cooper@arm.com
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70726
Reviewed-by: Andreas Sandberg andreas.sandberg@arm.com
Maintainer: Andreas Sandberg andreas.sandberg@arm.com
Maintainer: Giacomo Travaglini giacomo.travaglini@arm.com
Tested-by: kokoro noreply+kokoro@google.com
Reviewed-by: Giacomo Travaglini giacomo.travaglini@arm.com

M src/arch/arm/ArmISA.py
M src/arch/arm/ArmSystem.py
M src/arch/arm/insts/sve.hh
A src/arch/arm/insts/vector_element_traits.hh
M src/arch/arm/isa/formats/sve_2nd_level.isa
M src/arch/arm/isa/formats/sve_top_level.isa
M src/arch/arm/isa/includes.isa
M src/arch/arm/isa/insts/sve.isa
M src/arch/arm/isa/operands.isa
M src/arch/arm/isa/templates/sve.isa
M src/arch/arm/process.cc
M src/arch/arm/regs/misc.cc
12 files changed, 291 insertions(+), 7 deletions(-)

Approvals:
Giacomo Travaglini: Looks good to me, approved; Looks good to me, approved
Andreas Sandberg: Looks good to me, approved; Looks good to me, approved
kokoro: Regressions pass
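
As a rough illustration of the operation the commit message describes, here is a minimal Python reference model. It is a sketch, not gem5 code: `fmmla_reference` and `SEGMENT` are hypothetical names, and plain Python floats stand in for the fplib arithmetic and rounding that the patch uses. The indexing mirrors the loop in the sve.isa hunk, where result element (row, col) accumulates `zn[2*row + k] * zm[2*col + k]`.

```python
# Illustrative reference model only; names here are hypothetical.
SEGMENT = 4  # elements in one 2x2 matrix


def fmmla_reference(zda, zn, zm):
    """Return the new accumulator vector, one 2x2 matrix per 4 elements."""
    if not (len(zda) == len(zn) == len(zm)):
        raise ValueError("all three vectors must have the same length")
    if len(zda) < SEGMENT:
        # The patch returns an UndefinedInstruction fault in this case.
        raise ValueError("vector too short to hold one 2x2 matrix")

    result = [0.0] * len(zda)  # elements past the last full matrix stay zero
    full_matrices = len(zda) // SEGMENT
    for m in range(full_matrices):
        base = m * SEGMENT
        for row in range(2):
            for col in range(2):
                acc = zda[base + 2 * row + col]
                for k in range(2):
                    acc += zn[base + 2 * row + k] * zm[base + 2 * col + k]
                result[base + 2 * row + col] = acc
    return result
```

For example, `fmmla_reference([1, 1, 1, 1], [1, 2, 3, 4], [5, 6, 7, 8])` accumulates each row of the first source against each 2-element group of the second source.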

diff --git a/src/arch/arm/ArmISA.py b/src/arch/arm/ArmISA.py
index 37970dc..31ecbcb 100644
--- a/src/arch/arm/ArmISA.py
+++ b/src/arch/arm/ArmISA.py
@@ -53,6 +53,7 @@
         "FEAT_LSE",
         "FEAT_RDM",
         # Armv8.2
+        "FEAT_F32MM",
         "FEAT_SVE",
         # Armv8.3
         "FEAT_FCMA",

diff --git a/src/arch/arm/ArmSystem.py b/src/arch/arm/ArmSystem.py
index c1f5e9f..5a7ae79 100644
--- a/src/arch/arm/ArmSystem.py
+++ b/src/arch/arm/ArmSystem.py
@@ -78,6 +78,7 @@
         "FEAT_UAO",
         "FEAT_LVA",  # Optional in Armv8.2
         "FEAT_LPA",  # Optional in Armv8.2
+        "FEAT_F32MM",  # Optional in Armv8.2
         # Armv8.3
         "FEAT_FCMA",
         "FEAT_JSCVT",
@@ -163,6 +164,7 @@
         "FEAT_LVA",
         "FEAT_LPA",
         "FEAT_SVE",
+        "FEAT_F32MM",
         # Armv8.3
         "FEAT_FCMA",
         "FEAT_JSCVT",
@@ -196,6 +198,7 @@
         "FEAT_LVA",
         "FEAT_LPA",
         "FEAT_SVE",
+        "FEAT_F32MM",
     ]

diff --git a/src/arch/arm/insts/sve.hh b/src/arch/arm/insts/sve.hh
index de1163e..dc18ff3 100644
--- a/src/arch/arm/insts/sve.hh
+++ b/src/arch/arm/insts/sve.hh
@@ -498,7 +498,7 @@
Addr pc, const loader::SymbolTable *symtab) const override;
};

-///SVE2 Accumulate instructions
+/// Ternary, destructive, unpredicated SVE instruction.
class SveTerUnpredOp : public ArmStaticInst
{
protected:
diff --git a/src/arch/arm/insts/vector_element_traits.hh
b/src/arch/arm/insts/vector_element_traits.hh
new file mode 100644
index 0000000..3495bef
--- /dev/null
+++ b/src/arch/arm/insts/vector_element_traits.hh
@@ -0,0 +1,73 @@
+/*
+ * Copyright (c) 2020 ARM Limited
+ * All rights reserved
+ *
+ * The license below extends only to copyright in the software and shall
+ * not be construed as granting a license to any other intellectual
+ * property including but not limited to intellectual property relating
+ * to a hardware implementation of the functionality of the software
+ * licensed hereunder.  You may use the software subject to the license
+ * terms below provided that you ensure that this notice is replicated
+ * unmodified and in its entirety in all distributions of the software,
+ * modified or unmodified, in source code or in binary form.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met: redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer;
+ * redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution;
+ * neither the name of the copyright holders nor the names of its
+ * contributors may be used to endorse or promote products derived from
+ * this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __ARCH_ARM_VECTOR_ELEMENT_TRAITS_HH__
+#define __ARCH_ARM_VECTOR_ELEMENT_TRAITS_HH__
+
+#include <type_traits>
+
+namespace gem5 {
+namespace ArmISA {
+namespace vector_element_traits {
+
+
+// Make an integral type with the size of IntDestElemType but the
+// signed-ness of IntSrcElemType. The size of IntDestElemType must be
+// greater than or equal to the size of IntSrcElemType.
+template<typename IntDestElemType,
+         typename IntSrcElemType>
+class extend_element
+{
+  public:
+    static_assert(std::is_integral<IntDestElemType>::value
+                  && std::is_integral<IntSrcElemType>::value
+                  && sizeof(IntDestElemType) >= sizeof(IntSrcElemType),
+                  "Extended Element Dest and Src types must both be "
+                  "integer types, and Dest must be at least as large "
+                  "as Src.");
+    using type = typename std::conditional<
+        std::is_signed<IntSrcElemType>::value,
+        typename std::make_signed<IntDestElemType>::type,
+        typename std::make_unsigned<IntDestElemType>::type>::type;
+};
+
+
+} // namespace vector_element_traits
+} // namespace ArmISA
+} // namespace gem5
+
+#endif // __ARCH_ARM_VECTOR_ELEMENT_TRAITS_HH__
diff --git a/src/arch/arm/isa/formats/sve_2nd_level.isa
b/src/arch/arm/isa/formats/sve_2nd_level.isa
index 4281eeb..440722a 100644
--- a/src/arch/arm/isa/formats/sve_2nd_level.isa
+++ b/src/arch/arm/isa/formats/sve_2nd_level.isa
@@ -1,4 +1,4 @@
-// Copyright (c) 2017-2019 ARM Limited
+// Copyright (c) 2017-2020 ARM Limited
// All rights reserved
//
// The license below extends only to copyright in the software and shall
@@ -2883,6 +2883,29 @@
     }  // decodeSveFpFusedMulAdd
 
     StaticInstPtr
+    decodeSveFpFusedMatMulAdd(ExtMachInst machInst)
+    {
+        RegIndex zda = (RegIndex) (uint8_t) bits(machInst, 4, 0);
+        RegIndex zn = (RegIndex) (uint8_t) bits(machInst, 9, 5);
+        RegIndex zm = (RegIndex) (uint8_t) bits(machInst, 20, 16);
+
+        uint8_t size = bits(machInst, 23, 22);
+        switch (size) {
+          case 0x1:
+            // BFMMLA goes here when implemented.
+            return new Unknown64(machInst);
+          case 0x2:
+            return new SveFmmla<uint32_t,uint32_t,uint32_t>(
+                machInst, zda, zn, zm);
+          case 0x3:
+            return new SveFmmla<uint64_t,uint64_t,uint64_t>(
+                machInst, zda, zn, zm);
+          default:
+            return new Unknown64(machInst);
+        }
+    }  // decodeSveFpFusedMatMulAdd
+
+    StaticInstPtr
     decodeSveFpCplxAdd(ExtMachInst machInst)
     {
         uint8_t size = bits(machInst, 23, 22);
diff --git a/src/arch/arm/isa/formats/sve_top_level.isa
b/src/arch/arm/isa/formats/sve_top_level.isa
index 41861a8..b0579fb 100644
--- a/src/arch/arm/isa/formats/sve_top_level.isa
+++ b/src/arch/arm/isa/formats/sve_top_level.isa
@@ -1,4 +1,4 @@
-// Copyright (c) 2017-2019 ARM Limited
+// Copyright (c) 2017-2020 ARM Limited
 // All rights reserved
 //
 // The license below extends only to copyright in the software and shall
@@ -83,6 +83,7 @@
     StaticInstPtr decodeSveFpUnaryPred(ExtMachInst machInst);
     StaticInstPtr decodeSveFpCmpVec(ExtMachInst machInst);
     StaticInstPtr decodeSveFpFusedMulAdd(ExtMachInst machInst);
+    StaticInstPtr decodeSveFpFusedMatMulAdd(ExtMachInst machInst);
     StaticInstPtr decodeSveFpCplxAdd(ExtMachInst machInst);
     StaticInstPtr decodeSveFpCplxMulAddVec(ExtMachInst machInst);
     StaticInstPtr decodeSveFpMulAddIndexed(ExtMachInst machInst);
@@ -269,9 +270,10 @@
           case 0:
             return decodeSveFpMulAddIndexed(machInst);
           case 4:
-            if (!bits(machInst, 10))
+            if (bits(machInst, 10))
+                return decodeSveFpFusedMatMulAdd(machInst);
+            else
                 return decodeSveFpMulIndexed(machInst);
-            [[fallthrough]];
           default:
             return new Unknown64(machInst);
         }
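
The field extraction in decodeSveFpFusedMatMulAdd can be sketched in Python as below. This is only an illustration under stated assumptions: `bits`, `decode_mat_mul_add`, `make_word`, and the returned mnemonic strings are hypothetical, and `make_word` packs just the fields the decoder reads, so it does not produce a complete Arm encoding. The bit positions and the `size` cases are taken from the hunk above.

```python
def bits(word, hi, lo):
    """Extract the inclusive bit range [hi:lo] from an integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_mat_mul_add(word):
    """Return (mnemonic, zda, zn, zm), or None where the patch yields Unknown64."""
    zda = bits(word, 4, 0)
    zn = bits(word, 9, 5)
    zm = bits(word, 20, 16)
    size = bits(word, 23, 22)
    if size == 0x2:
        return ("fmmla.s", zda, zn, zm)  # 32-bit elements
    if size == 0x3:
        return ("fmmla.d", zda, zn, zm)  # 64-bit elements
    # size 0x1 is where BFMMLA would go; 0x0 hits the default case.
    return None


def make_word(size, zm, zn, zda):
    """Pack only the fields read above; every other bit is left as zero."""
    return (size << 22) | (zm << 16) | (zn << 5) | zda
```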
    

diff --git a/src/arch/arm/isa/includes.isa b/src/arch/arm/isa/includes.isa
index e2534a6..cde035a 100644
--- a/src/arch/arm/isa/includes.isa
+++ b/src/arch/arm/isa/includes.isa
@@ -66,6 +66,7 @@
#include "arch/arm/insts/sve.hh"
#include "arch/arm/insts/sve_mem.hh"
#include "arch/arm/insts/tme64.hh"
+#include "arch/arm/insts/vector_element_traits.hh"
#include "arch/arm/insts/vfp.hh"
#include "enums/DecoderFlavor.hh"
#include "mem/packet.hh"
diff --git a/src/arch/arm/isa/insts/sve.isa b/src/arch/arm/isa/insts/sve.isa
index 4c10cd6..74eacb8 100644
--- a/src/arch/arm/isa/insts/sve.isa
+++ b/src/arch/arm/isa/insts/sve.isa
@@ -1,4 +1,4 @@
-// Copyright (c) 2017-2019 ARM Limited
+// Copyright (c) 2017-2020 ARM Limited
// All rights reserved
//
// The license below extends only to copyright in the software and shall
@@ -2188,6 +2188,111 @@
                          'class_name' : 'Sve' + Name}
             exec_output += SveOpExecDeclare.subst(substDict)
 
+    # Generates definitions for ternary destructive SVE Matrix
+    # Multiplication instructions (not predicated)
+    #
+    # `type_specs` can either be a sequence of types for cases where
+    # the dest and source matrices have the same element types, or a
+    # sequence of 3-tuples for cases where the dest and source matrices
+    # have different element types.
+    #
+    # The calculation Z = Z + A x B is performed for full matrices
+    # Z (numDestRows x numDestCols), A (numDestRows x K), and
+    # B(K x numDestCols), and remaining elements of Z are set to zero.
+    # The vector length must be large enough for one full matrix or
+    # an UndefinedInstruction Fault is generated.
+    #
+    def sveMatMulInst(name, Name, opClass, type_specs,
+                      numDestRows, numDestCols, K,
+                      elt_mul_op):
+        global header_output, exec_output
+        code = sveEnabledCheckCode + '''
+        // Types of the extended versions of the source elements.
+        // Required to make sure the intermediate calculations don't overflow.
+        using ExtendedElementA = typename vector_element_traits::
+                                   extend_element<DestElement,
+                                                  SrcElementA>::type;
+        using ExtendedElementB = typename vector_element_traits::
+                                   extend_element<DestElement,
+                                                  SrcElementB>::type;
+
+        // Element count of destination vector
+        unsigned eCount = ArmStaticInst::getCurSveVecLen<DestElement>(
+                xc->tcBase());
+
+        // SVE Matrix operations require that there are at least 4
+        // elements (one full matrix). Further matrices may be partial,
+        // in which case the trailing dest elements are filled with zeros.
+        if (eCount < 4) {
+            return std::make_shared<UndefinedInstruction>(machInst, false,
+                                                          "%(mnemonic)s");
+        }
+
+        // Some properties of the source and dest matrix dimensions
+        //   ( numDestRows x numDestCols ) <- (numDestRows x K) .
+        //                                        (K x numDestCols)
+        constexpr unsigned numDestRows = %(numDestRows)d;
+        constexpr unsigned numDestCols = %(numDestCols)d;
+        constexpr unsigned K = %(K)d;
+
+        constexpr unsigned eltsPerDestMatrix = numDestRows * numDestCols;
+        constexpr unsigned eltsPerSrcAMatrix = numDestRows * K;
+        constexpr unsigned eltsPerSrcBMatrix = K * numDestCols;
+
+        // Number of full matrices - there may be some elements left over
+        const unsigned mCount = eCount / eltsPerDestMatrix;
+
+        // Calculate z_ij = Sum[k=1..K](a_ik * b_kj)
+
+        unsigned zEltIdx = 0; // Index of the result element being produced
+        unsigned aMatIdx = 0; // Index of the first element of the A matrix
+        unsigned bMatIdx = 0; // Index of the first element of the B matrix
+        for (unsigned matIdx = 0; matIdx < mCount; ++matIdx) {
+            for (unsigned rowIdx = 0; rowIdx < numDestRows; ++rowIdx) {
+                for (unsigned colIdx = 0; colIdx < numDestCols; ++colIdx) {
+                    DestElement destElem =
+                        static_cast<DestElement>(AA64FpDestMerge_x[zEltIdx]);
+                    for (unsigned k = 0; k < K; ++k) {
+                        const ExtendedElementA srcElemA =
+                            static_cast<ExtendedElementA>
+                                (AA64FpOp1_srcA[aMatIdx + K * rowIdx + k]);
+                        const ExtendedElementB srcElemB =
+                            static_cast<ExtendedElementB>
+                                (AA64FpOp2_srcB[bMatIdx + K * colIdx + k]);
+
+                        // Do the math operation. Should be of form:
+                        //   destElem += f(destElem, srcElemA, srcElemB);
+                        %(elt_mul_op)s;
+                    }
+                    AA64FpDest_x[zEltIdx++] = destElem;
+                }
+            }
+            aMatIdx += eltsPerSrcAMatrix;
+            bMatIdx += eltsPerSrcBMatrix;
+        }
+
+        // Zero-fill any trailing elements
+        for (unsigned i = mCount * eltsPerDestMatrix; i < eCount; ++i) {
+            AA64FpDest_x[i] = static_cast<DestElement>(0);
+        }
+        ''' % {'elt_mul_op': elt_mul_op, 'mnemonic': name,
+               'numDestRows': numDestRows, 'numDestCols': numDestCols,
+               'K': K}
+        iop = InstObjParams(name, 'Sve' + Name, 'SveTerUnpredOp',
+                            {'code': code, 'op_class': opClass}, [])
+        header_output += SveMatMulOpDeclare.subst(iop)
+        exec_output += SveMatMulOpExecute.subst(iop)
+        for type_spec in type_specs:
+            try:
+                destEltType, srcEltAType, srcEltBType = type_spec
+            except ValueError:
+                destEltType, srcEltAType, srcEltBType = (type_spec,) * 3
+            substDict = {'destEltType': destEltType,
+                         'srcEltAType': srcEltAType,
+                         'srcEltBType': srcEltBType,
+                         'class_name': 'Sve' + Name}
+            exec_output += SveMatMulOpExecDeclare.subst(substDict)
+
     # Generates definitions for PTRUE and PTRUES instructions.
     def svePtrueInst(name, Name, opClass, types, isFlagSetting=False,
                      decoder='Generic'):
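
The `type_specs` handling in sveMatMulInst accepts either a bare type name or a 3-tuple. The sketch below shows one way to normalise both forms into (dest, srcA, srcB) triples. It is an illustration, not the patch's code: `normalize_type_specs` is a hypothetical helper, and it uses an explicit `isinstance` check instead of the patch's try/except, which relies on unpacking a type-name string into three variables raising ValueError.

```python
def normalize_type_specs(type_specs):
    """Expand each entry to a (dest, srcA, srcB) triple of C++ type names."""
    triples = []
    for spec in type_specs:
        if isinstance(spec, str):
            # One name means dest and both sources share the element type.
            triples.append((spec, spec, spec))
        else:
            dest, src_a, src_b = spec  # raises ValueError unless length is 3
            triples.append((dest, src_a, src_b))
    return triples
```

The explicit string check avoids a corner case of the unpacking approach, where a three-character string would unpack into three single characters rather than raise.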
    

@@ -3822,6 +3927,16 @@
     # FMLS (indexed)
     sveTerIdxInst('fmls', 'FmlsIdx', 'SimdFloatMultAccOp', floatTypes,
                   fmlsCode, PredType.MERGE)
+
+    fmmlaCode = fpOp % '''
+        fplibAdd<DestElement>(destElem,
+            fplibMul<DestElement>(srcElemA, srcElemB, fpscr), fpscr);
+    '''
+    # FMMLA (vectors)
+    sveMatMulInst('fmmla', 'Fmmla', 'SimdFloatMultAccOp', floatTypes,
+                  numDestRows=2, numDestCols=2, K=2,
+                  elt_mul_op=fmmlaCode)
+
     # FMLS (vectors)
     sveTerInst('fmls', 'Fmls', 'SimdFloatMultAccOp', floatTypes, fmlsCode,
                PredType.MERGE)
    

diff --git a/src/arch/arm/isa/operands.isa b/src/arch/arm/isa/operands.isa
index 24a0af9..5bba00f 100644
--- a/src/arch/arm/isa/operands.isa
+++ b/src/arch/arm/isa/operands.isa
@@ -60,6 +60,8 @@
     'xs1' : 'TPS1Elem',
     'xs2' : 'TPS2Elem',
     'xd' : 'TPDElem',
+    'srcA' : 'TPSrcAElem',
+    'srcB' : 'TPSrcBElem',
     'pc' : 'ArmISA::VecPredRegContainer',
     'pb' : 'uint8_t'
 }};
diff --git a/src/arch/arm/isa/templates/sve.isa
b/src/arch/arm/isa/templates/sve.isa
index 886fd7a..65abb1b 100644
--- a/src/arch/arm/isa/templates/sve.isa
+++ b/src/arch/arm/isa/templates/sve.isa
@@ -1,4 +1,4 @@
-// Copyright (c) 2018-2019 ARM Limited
+// Copyright (c) 2018-2020 ARM Limited
 // All rights reserved
 //
 // The license below extends only to copyright in the software and shall
@@ -515,6 +515,33 @@
 };
 }};
 
+def template SveMatMulOpDeclare {{
+template <typename DestElement,
+          typename SrcElementA,
+          typename SrcElementB>
+class %(class_name)s : public %(base_class)s
+{
+  private:
+    %(reg_idx_arr_decl)s;
+  protected:
+    typedef DestElement TPElem;
+    typedef SrcElementA TPSrcAElem;
+    typedef SrcElementB TPSrcBElem;
+  public:
+    // Constructor
+    %(class_name)s(ExtMachInst machInst,
+                   RegIndex _dest, RegIndex _op1, RegIndex _op2)
+        : %(base_class)s("%(mnemonic)s", machInst, %(op_class)s,
+                         _dest, _op1, _op2)
+    {
+        %(set_reg_idx_arr)s;
+        %(constructor)s;
+    }
+
+    Fault execute(ExecContext *, trace::InstRecord *) const override;
+};
+}};
+
 def template SveTerImmUnpredOpDeclare {{
 template <class _Element>
 class %(class_name)s : public %(base_class)s
@@ -1310,3 +1337,32 @@
 Fault %(class_name)s<%(targs)s>::execute(
         ExecContext *, trace::InstRecord *) const;
 }};
+
+def template SveMatMulOpExecute {{
+    template <typename DestElement,
+              typename SrcElementA,
+              typename SrcElementB>
+    Fault %(class_name)s<DestElement,SrcElementA,SrcElementB>::execute(
+            ExecContext *xc,
+            trace::InstRecord *traceData) const
+    {
+        Fault fault = NoFault;
+        %(op_decl)s;
+        %(op_rd)s;
+
+        %(code)s;
+        if (fault == NoFault)
+        {
+            %(op_wb)s;
+        }
+
+        return fault;
+    }
+}};
+
+def template SveMatMulOpExecDeclare {{
+    template
+    Fault
+    %(class_name)s<%(destEltType)s,%(srcEltAType)s,%(srcEltBType)s>
+    ::execute(ExecContext *, trace::InstRecord *) const;
+}};
diff --git a/src/arch/arm/process.cc b/src/arch/arm/process.cc
index fda9415..24e1250 100644
--- a/src/arch/arm/process.cc
+++ b/src/arch/arm/process.cc
@@ -320,6 +320,9 @@
     hwcap |= (isa_r0.ts >= 2) ? Arm_Flagm2 : Arm_None;
     hwcap |= (isa_r0.rndr >= 1) ? Arm_Rng : Arm_None;
 
+    const AA64ZFR0 zf_r0 = tc->readMiscReg(MISCREG_ID_AA64ZFR0_EL1);
+    hwcap |= (zf_r0.f32mm >= 1) ? Arm_Svef32mm : Arm_None;
+
     return hwcap;
 }

diff --git a/src/arch/arm/regs/misc.cc b/src/arch/arm/regs/misc.cc
index 53e9268..8925bc0 100644
--- a/src/arch/arm/regs/misc.cc
+++ b/src/arch/arm/regs/misc.cc
@@ -5403,6 +5403,11 @@

     // SVE
     InitReg(MISCREG_ID_AA64ZFR0_EL1)
+      .reset([this](){
+          AA64ZFR0 zfr0_el1 = 0;
+          zfr0_el1.f32mm = release->has(ArmExtension::FEAT_F32MM) ? 1 : 0;
+          return zfr0_el1;
+      }())
       .faultRead(EL0, faultIdst)
       .faultRead(EL1, HCR_TRAP(tid3))
       .allPrivileges().exceptUserMode().writes(0);

--
To view, visit
https://gem5-review.googlesource.com/c/public/gem5/+/70726?usp=email
To unsubscribe, or for help writing mail filters, visit
https://gem5-review.googlesource.com/settings?usp=email

Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: If3547378ffa48527fe540767399bcc37a5dab524
Gerrit-Change-Number: 70726
Gerrit-PatchSet: 7
Gerrit-Owner: Giacomo Travaglini giacomo.travaglini@arm.com
Gerrit-Reviewer: Andreas Sandberg andreas.sandberg@arm.com
Gerrit-Reviewer: Bobby Bruce bbruce@ucdavis.edu
Gerrit-Reviewer: Giacomo Travaglini giacomo.travaglini@arm.com
Gerrit-Reviewer: Jason Lowe-Power power.jg@gmail.com
Gerrit-Reviewer: kokoro noreply+kokoro@google.com
Gerrit-CC: Richard Cooper richard.cooper@arm.com

Bobby Bruce has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/70726?usp=email ) Change subject: arch-arm: Add support for Arm SVE fmmla instruction. ...................................................................... arch-arm: Add support for Arm SVE fmmla instruction. Add support for the Arm SVE Floating Point Matrix Multiply-Accumulate (FMMLA) instruction. Both 32-bit element (single precision) and 64-bit element (double precision) encodings are implemented, but because the associated required instructions (LD1RO*, etc) have not yet been implemented, the SVE Feature ID register 0 (ID_AA64ZFR0_EL1) has only been updated to indicate 32-bit element support at this time. For more information please refer to the "ARM Architecture Reference Manual Supplement - The Scalable Vector Extension (SVE), for ARMv8-A" (https://developer.arm.com/architectures/cpu-architecture/a-profile/ docs/arm-architecture-reference-manual-supplement-armv8-a) Additional Contributors: Giacomo Travaglini Change-Id: If3547378ffa48527fe540767399bcc37a5dab524 Reviewed-by: Richard Cooper <richard.cooper@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70726 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> --- M src/arch/arm/ArmISA.py M src/arch/arm/ArmSystem.py M src/arch/arm/insts/sve.hh A src/arch/arm/insts/vector_element_traits.hh M src/arch/arm/isa/formats/sve_2nd_level.isa M src/arch/arm/isa/formats/sve_top_level.isa M src/arch/arm/isa/includes.isa M src/arch/arm/isa/insts/sve.isa M src/arch/arm/isa/operands.isa M src/arch/arm/isa/templates/sve.isa M src/arch/arm/process.cc M src/arch/arm/regs/misc.cc 12 files changed, 291 insertions(+), 7 deletions(-) Approvals: Giacomo Travaglini: Looks good to me, 
approved; Looks good to me, approved Andreas Sandberg: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/arm/ArmISA.py b/src/arch/arm/ArmISA.py index 37970dc..31ecbcb 100644 --- a/src/arch/arm/ArmISA.py +++ b/src/arch/arm/ArmISA.py @@ -53,6 +53,7 @@ "FEAT_LSE", "FEAT_RDM", # Armv8.2 + "FEAT_F32MM", "FEAT_SVE", # Armv8.3 "FEAT_FCMA", diff --git a/src/arch/arm/ArmSystem.py b/src/arch/arm/ArmSystem.py index c1f5e9f..5a7ae79 100644 --- a/src/arch/arm/ArmSystem.py +++ b/src/arch/arm/ArmSystem.py @@ -78,6 +78,7 @@ "FEAT_UAO", "FEAT_LVA", # Optional in Armv8.2 "FEAT_LPA", # Optional in Armv8.2 + "FEAT_F32MM", # Optional in Armv8.2 # Armv8.3 "FEAT_FCMA", "FEAT_JSCVT", @@ -163,6 +164,7 @@ "FEAT_LVA", "FEAT_LPA", "FEAT_SVE", + "FEAT_F32MM", # Armv8.3 "FEAT_FCMA", "FEAT_JSCVT", @@ -196,6 +198,7 @@ "FEAT_LVA", "FEAT_LPA", "FEAT_SVE", + "FEAT_F32MM", ] diff --git a/src/arch/arm/insts/sve.hh b/src/arch/arm/insts/sve.hh index de1163e..dc18ff3 100644 --- a/src/arch/arm/insts/sve.hh +++ b/src/arch/arm/insts/sve.hh @@ -498,7 +498,7 @@ Addr pc, const loader::SymbolTable *symtab) const override; }; -///SVE2 Accumulate instructions +/// Ternary, destructive, unpredicated SVE instruction. class SveTerUnpredOp : public ArmStaticInst { protected: diff --git a/src/arch/arm/insts/vector_element_traits.hh b/src/arch/arm/insts/vector_element_traits.hh new file mode 100644 index 0000000..3495bef --- /dev/null +++ b/src/arch/arm/insts/vector_element_traits.hh @@ -0,0 +1,73 @@ +/* + * Copyright (c) 2020 ARM Limited + * All rights reserved + * + * The license below extends only to copyright in the software and shall + * not be construed as granting a license to any other intellectual + * property including but not limited to intellectual property relating + * to a hardware implementation of the functionality of the software + * licensed hereunder. 
You may use the software subject to the license + * terms below provided that you ensure that this notice is replicated + * unmodified and in its entirety in all distributions of the software, + * modified or unmodified, in source code or in binary form. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer; + * redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution; + * neither the name of the copyright holders nor the names of its + * contributors may be used to endorse or promote products derived from + * this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#ifndef __ARCH_ARM_VECTOR_ELEMENT_TRAITS_HH__ +#define __ARCH_ARM_VECTOR_ELEMENT_TRAITS_HH__ + +#include <type_traits> + +namespace gem5 { +namespace ArmISA { +namespace vector_element_traits { + + +// Make an integral type with the size of IntDestElemType but the +// signed-ness of IntSrcElemType. The size of IntDestElemType must be +// greater than or equal to the size of IntSrcElemType. +template<typename IntDestElemType, + typename IntSrcElemType> +class extend_element +{ + public: + static_assert(std::is_integral<IntDestElemType>::value + && std::is_integral<IntSrcElemType>::value + && sizeof(IntDestElemType) >= sizeof(IntSrcElemType), + "Extended Element Dest and Src types must both be " + "integer types, and Dest must be at least as large " + "as Src."); + using type = typename std::conditional< + std::is_signed<IntSrcElemType>::value, + typename std::make_signed<IntDestElemType>::type, + typename std::make_unsigned<IntDestElemType>::type>::type; +}; + + +} // namespace vector_element_traits +} // namespace ArmISA +} // namespace gem5 + +#endif // __ARCH_ARM_VECTOR_ELEMENT_TRAITS_HH__ diff --git a/src/arch/arm/isa/formats/sve_2nd_level.isa b/src/arch/arm/isa/formats/sve_2nd_level.isa index 4281eeb..440722a 100644 --- a/src/arch/arm/isa/formats/sve_2nd_level.isa +++ b/src/arch/arm/isa/formats/sve_2nd_level.isa @@ -1,4 +1,4 @@ -// Copyright (c) 2017-2019 ARM Limited +// Copyright (c) 2017-2020 ARM Limited // All rights reserved // // The license below extends only to copyright in the software and shall @@ -2883,6 +2883,29 @@ } // decodeSveFpFusedMulAdd StaticInstPtr + decodeSveFpFusedMatMulAdd(ExtMachInst machInst) + { + RegIndex zda = (RegIndex) (uint8_t) bits(machInst, 4, 0); + RegIndex zn = (RegIndex) (uint8_t) bits(machInst, 9, 5); + RegIndex zm = (RegIndex) (uint8_t) bits(machInst, 20, 16); + + uint8_t size = bits(machInst, 23, 22); + switch (size) { + case 0x1: + // BFMMLA goes here when implemented. 
+ return new Unknown64(machInst); + case 0x2: + return new SveFmmla<uint32_t,uint32_t,uint32_t>( + machInst, zda, zn, zm); + case 0x3: + return new SveFmmla<uint64_t,uint64_t,uint64_t>( + machInst, zda, zn, zm); + default: + return new Unknown64(machInst); + } + } // decodeSveFpFusedMatMulAdd + + StaticInstPtr decodeSveFpCplxAdd(ExtMachInst machInst) { uint8_t size = bits(machInst, 23, 22); diff --git a/src/arch/arm/isa/formats/sve_top_level.isa b/src/arch/arm/isa/formats/sve_top_level.isa index 41861a8..b0579fb 100644 --- a/src/arch/arm/isa/formats/sve_top_level.isa +++ b/src/arch/arm/isa/formats/sve_top_level.isa @@ -1,4 +1,4 @@ -// Copyright (c) 2017-2019 ARM Limited +// Copyright (c) 2017-2020 ARM Limited // All rights reserved // // The license below extends only to copyright in the software and shall @@ -83,6 +83,7 @@ StaticInstPtr decodeSveFpUnaryPred(ExtMachInst machInst); StaticInstPtr decodeSveFpCmpVec(ExtMachInst machInst); StaticInstPtr decodeSveFpFusedMulAdd(ExtMachInst machInst); + StaticInstPtr decodeSveFpFusedMatMulAdd(ExtMachInst machInst); StaticInstPtr decodeSveFpCplxAdd(ExtMachInst machInst); StaticInstPtr decodeSveFpCplxMulAddVec(ExtMachInst machInst); StaticInstPtr decodeSveFpMulAddIndexed(ExtMachInst machInst); @@ -269,9 +270,10 @@ case 0: return decodeSveFpMulAddIndexed(machInst); case 4: - if (!bits(machInst, 10)) + if (bits(machInst, 10)) + return decodeSveFpFusedMatMulAdd(machInst); + else return decodeSveFpMulIndexed(machInst); - [[fallthrough]]; default: return new Unknown64(machInst); } diff --git a/src/arch/arm/isa/includes.isa b/src/arch/arm/isa/includes.isa index e2534a6..cde035a 100644 --- a/src/arch/arm/isa/includes.isa +++ b/src/arch/arm/isa/includes.isa @@ -66,6 +66,7 @@ #include "arch/arm/insts/sve.hh" #include "arch/arm/insts/sve_mem.hh" #include "arch/arm/insts/tme64.hh" +#include "arch/arm/insts/vector_element_traits.hh" #include "arch/arm/insts/vfp.hh" #include "enums/DecoderFlavor.hh" #include "mem/packet.hh" diff --git 
a/src/arch/arm/isa/insts/sve.isa b/src/arch/arm/isa/insts/sve.isa index 4c10cd6..74eacb8 100644 --- a/src/arch/arm/isa/insts/sve.isa +++ b/src/arch/arm/isa/insts/sve.isa @@ -1,4 +1,4 @@ -// Copyright (c) 2017-2019 ARM Limited +// Copyright (c) 2017-2020 ARM Limited // All rights reserved // // The license below extends only to copyright in the software and shall @@ -2188,6 +2188,111 @@ 'class_name' : 'Sve' + Name} exec_output += SveOpExecDeclare.subst(substDict) + # Generates definitions for ternary destructive SVE Matrix + # Multiplication instructions (not predicated) + # + # `type_specs` can either be a sequence of types for cases where + # the dest and source matrices have the same element types, or a + # sequence of 3-tuples for case where the dest and source matrices + # have differnet element types. + # + # The calculation Z = Z + A x B is performed for full matrices + # Z (numDestRows x numDestCols), A (numDestRows x K), and + # B(K x numDestCols), and remaining elemnts of Z are set to zero. + # The vector length must be large enough for one full matrix or + # an UndefinedInstruction Fault is generated. + # + def sveMatMulInst(name, Name, opClass, type_specs, + numDestRows, numDestCols, K, + elt_mul_op): + global header_output, exec_output + code = sveEnabledCheckCode + ''' + // Types of the extended versions of the source elements. + // Required to make sure the itermediate calculations don't overflow. + using ExtendedElementA = typename vector_element_traits:: + extend_element<DestElement, + SrcElementA>::type; + using ExtendedElementB = typename vector_element_traits:: + extend_element<DestElement, + SrcElementB>::type; + + // Element count of destination vector + unsigned eCount = ArmStaticInst::getCurSveVecLen<DestElement>( + xc->tcBase()); + + // SVE Matrix operations require that there are at least 4 + // elements (one full matrix). Further matrices may be partial, + // in which case the trailing dest elements are filled with zeros. 
+        if (eCount < 4) {
+            return std::make_shared<UndefinedInstruction>(machInst, false,
+                                                          "%(mnemonic)s");
+        }
+
+        // Some properties of the source and dest matrix dimensions
+        // ( numDestRows x numDestCols ) <- (numDestRows x K) .
+        //                                   (K x numDestCols)
+        constexpr unsigned numDestRows = %(numDestRows)d;
+        constexpr unsigned numDestCols = %(numDestCols)d;
+        constexpr unsigned K = %(K)d;
+
+        constexpr unsigned eltsPerDestMatrix = numDestRows * numDestCols;
+        constexpr unsigned eltsPerSrcAMatrix = numDestRows * K;
+        constexpr unsigned eltsPerSrcBMatrix = K * numDestCols;
+
+        // Number of full matrices - there may be some elements left over
+        const unsigned mCount = eCount / eltsPerDestMatrix;
+
+        // Calculate z_ij = Sum[k=1..K](a_ik * b_kj)
+
+        unsigned zEltIdx = 0; // Index of the result element being produced
+        unsigned aMatIdx = 0; // Index of the first element of the A matrix
+        unsigned bMatIdx = 0; // Index of the first element of the B matrix
+        for (unsigned matIdx = 0; matIdx < mCount; ++matIdx) {
+            for (unsigned rowIdx = 0; rowIdx < numDestRows; ++rowIdx) {
+                for (unsigned colIdx = 0; colIdx < numDestCols; ++colIdx) {
+                    DestElement destElem =
+                        static_cast<DestElement>(AA64FpDestMerge_x[zEltIdx]);
+                    for (unsigned k = 0; k < K; ++k) {
+                        const ExtendedElementA srcElemA =
+                            static_cast<ExtendedElementA>
+                                (AA64FpOp1_srcA[aMatIdx + K * rowIdx + k]);
+                        const ExtendedElementB srcElemB =
+                            static_cast<ExtendedElementB>
+                                (AA64FpOp2_srcB[bMatIdx + K * colIdx + k]);
+
+                        // Do the math operation.
Should be of form:
+                        // destElem += f(destElem, srcElemA, srcElemB);
+                        %(elt_mul_op)s;
+                    }
+                    AA64FpDest_x[zEltIdx++] = destElem;
+                }
+            }
+            aMatIdx += eltsPerSrcAMatrix;
+            bMatIdx += eltsPerSrcBMatrix;
+        }
+
+        // Zero-fill any trailing elements
+        for (unsigned i = mCount * eltsPerDestMatrix; i < eCount; ++i) {
+            AA64FpDest_x[i] = static_cast<DestElement>(0);
+        }
+        ''' % {'elt_mul_op': elt_mul_op, 'mnemonic': name,
+               'numDestRows': numDestRows, 'numDestCols': numDestCols,
+               'K': K}
+        iop = InstObjParams(name, 'Sve' + Name, 'SveTerUnpredOp',
+                            {'code': code, 'op_class': opClass}, [])
+        header_output += SveMatMulOpDeclare.subst(iop)
+        exec_output += SveMatMulOpExecute.subst(iop)
+        for type_spec in type_specs:
+            try:
+                destEltType, srcEltAType, srcEltBType = type_spec
+            except ValueError:
+                destEltType, srcEltAType, srcEltBType = (type_spec,) * 3
+            substDict = {'destEltType': destEltType,
+                         'srcEltAType': srcEltAType,
+                         'srcEltBType': srcEltBType,
+                         'class_name': 'Sve' + Name}
+            exec_output += SveMatMulOpExecDeclare.subst(substDict)
+
     # Generates definitions for PTRUE and PTRUES instructions.
     def svePtrueInst(name, Name, opClass, types, isFlagSetting=False,
                      decoder='Generic'):
@@ -3822,6 +3927,16 @@
     # FMLS (indexed)
     sveTerIdxInst('fmls', 'FmlsIdx', 'SimdFloatMultAccOp', floatTypes,
                   fmlsCode, PredType.MERGE)
+
+    fmmlaCode = fpOp % '''
+        fplibAdd<DestElement>(destElem,
+                    fplibMul<DestElement>(srcElemA, srcElemB, fpscr), fpscr);
+    '''
+    # FMMLA (vectors)
+    sveMatMulInst('fmmla', 'Fmmla', 'SimdFloatMultAccOp', floatTypes,
+                  numDestRows=2, numDestCols=2, K=2,
+                  elt_mul_op=fmmlaCode)
+
     # FMLS (vectors)
     sveTerInst('fmls', 'Fmls', 'SimdFloatMultAccOp', floatTypes, fmlsCode,
                PredType.MERGE)
diff --git a/src/arch/arm/isa/operands.isa b/src/arch/arm/isa/operands.isa
index 24a0af9..5bba00f 100644
--- a/src/arch/arm/isa/operands.isa
+++ b/src/arch/arm/isa/operands.isa
@@ -60,6 +60,8 @@
     'xs1' : 'TPS1Elem',
     'xs2' : 'TPS2Elem',
     'xd' : 'TPDElem',
+    'srcA' : 'TPSrcAElem',
+    'srcB' : 'TPSrcBElem',
     'pc' : 'ArmISA::VecPredRegContainer',
     'pb' : 'uint8_t'
 }};
diff --git a/src/arch/arm/isa/templates/sve.isa b/src/arch/arm/isa/templates/sve.isa
index 886fd7a..65abb1b 100644
--- a/src/arch/arm/isa/templates/sve.isa
+++ b/src/arch/arm/isa/templates/sve.isa
@@ -1,4 +1,4 @@
-// Copyright (c) 2018-2019 ARM Limited
+// Copyright (c) 2018-2020 ARM Limited
 // All rights reserved
 //
 // The license below extends only to copyright in the software and shall
@@ -515,6 +515,33 @@
 };
 }};

+def template SveMatMulOpDeclare {{
+template <typename DestElement,
+          typename SrcElementA,
+          typename SrcElementB>
+class %(class_name)s : public %(base_class)s
+{
+  private:
+    %(reg_idx_arr_decl)s;
+  protected:
+    typedef DestElement TPElem;
+    typedef SrcElementA TPSrcAElem;
+    typedef SrcElementB TPSrcBElem;
+  public:
+    // Constructor
+    %(class_name)s(ExtMachInst machInst,
+                   RegIndex _dest, RegIndex _op1, RegIndex _op2)
+        : %(base_class)s("%(mnemonic)s", machInst, %(op_class)s,
+                         _dest, _op1, _op2)
+    {
+        %(set_reg_idx_arr)s;
+        %(constructor)s;
+    }
+
+    Fault execute(ExecContext *, trace::InstRecord *) const
override;
+};
+}};
+
 def template SveTerImmUnpredOpDeclare {{
 template <class _Element>
 class %(class_name)s : public %(base_class)s
@@ -1310,3 +1337,32 @@
 Fault %(class_name)s<%(targs)s>::execute(
         ExecContext *, trace::InstRecord *) const;
 }};
+
+def template SveMatMulOpExecute {{
+    template <typename DestElement,
+              typename SrcElementA,
+              typename SrcElementB>
+    Fault %(class_name)s<DestElement,SrcElementA,SrcElementB>::execute(
+            ExecContext *xc,
+            trace::InstRecord *traceData) const
+    {
+        Fault fault = NoFault;
+        %(op_decl)s;
+        %(op_rd)s;
+
+        %(code)s;
+        if (fault == NoFault)
+        {
+            %(op_wb)s;
+        }
+
+        return fault;
+    }
+}};
+
+def template SveMatMulOpExecDeclare {{
+    template
+    Fault
+    %(class_name)s<%(destEltType)s,%(srcEltAType)s,%(srcEltBType)s>
+        ::execute(ExecContext *, trace::InstRecord *) const;
+}};
diff --git a/src/arch/arm/process.cc b/src/arch/arm/process.cc
index fda9415..24e1250 100644
--- a/src/arch/arm/process.cc
+++ b/src/arch/arm/process.cc
@@ -320,6 +320,9 @@
     hwcap |= (isa_r0.ts >= 2) ? Arm_Flagm2 : Arm_None;
     hwcap |= (isa_r0.rndr >= 1) ? Arm_Rng : Arm_None;

+    const AA64ZFR0 zf_r0 = tc->readMiscReg(MISCREG_ID_AA64ZFR0_EL1);
+    hwcap |= (zf_r0.f32mm >= 1) ? Arm_Svef32mm : Arm_None;
+
     return hwcap;
 }

diff --git a/src/arch/arm/regs/misc.cc b/src/arch/arm/regs/misc.cc
index 53e9268..8925bc0 100644
--- a/src/arch/arm/regs/misc.cc
+++ b/src/arch/arm/regs/misc.cc
@@ -5403,6 +5403,11 @@

     // SVE
     InitReg(MISCREG_ID_AA64ZFR0_EL1)
+        .reset([this](){
+            AA64ZFR0 zfr0_el1 = 0;
+            zfr0_el1.f32mm = release->has(ArmExtension::FEAT_F32MM) ?
1 : 0;
+            return zfr0_el1;
+        }())
       .faultRead(EL0, faultIdst)
       .faultRead(EL1, HCR_TRAP(tid3))
       .allPrivileges().exceptUserMode().writes(0);

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/70726?usp=email
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email

Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: If3547378ffa48527fe540767399bcc37a5dab524
Gerrit-Change-Number: 70726
Gerrit-PatchSet: 7
Gerrit-Owner: Giacomo Travaglini <giacomo.travaglini@arm.com>
Gerrit-Reviewer: Andreas Sandberg <andreas.sandberg@arm.com>
Gerrit-Reviewer: Bobby Bruce <bbruce@ucdavis.edu>
Gerrit-Reviewer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Gerrit-Reviewer: Jason Lowe-Power <power.jg@gmail.com>
Gerrit-Reviewer: kokoro <noreply+kokoro@google.com>
Gerrit-CC: Richard Cooper <richard.cooper@arm.com>
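
Editor's note on the sveMatMulInst loop nest in the patch above: a minimal Python sketch of the same index arithmetic may help when reading it. This is an illustrative model, not gem5 code. The function name mat_mul_acc and its parameters are assumptions introduced here, plain Python floats stand in for the fplibMul/fplibAdd calls (so FPSCR rounding and exception behaviour are not modelled), and the UndefinedInstruction fault is modelled as a ValueError. As far as I can tell from the `K * colIdx + k` index, each consecutive group of K elements in the second source is read as one column of B.

```python
# Illustrative model only; not part of the patch. All names are hypothetical.
def mat_mul_acc(zda, zn, zm, rows=2, cols=2, k_dim=2):
    """Return Z + A x B per matrix-sized segment, zero-filling any tail.

    zda, zn and zm are flat lists standing in for the destination/addend
    vector and the two source vectors.
    """
    e_count = len(zda)
    elts_per_dest = rows * cols
    # The patch compares against a literal 4, which equals rows * cols
    # for the 2x2 FMMLA case; a ValueError stands in for the fault.
    if e_count < elts_per_dest:
        raise ValueError("vector too short for one full matrix")

    m_count = e_count // elts_per_dest  # number of full matrices
    out = list(zda)
    z_idx = 0   # index of the result element being produced
    a_base = 0  # first element of the current A matrix
    b_base = 0  # first element of the current B matrix
    for _ in range(m_count):
        for r in range(rows):
            for c in range(cols):
                acc = zda[z_idx]
                for k in range(k_dim):
                    a = zn[a_base + k_dim * r + k]
                    b = zm[b_base + k_dim * c + k]
                    acc = acc + a * b  # separate multiply, then add
                out[z_idx] = acc
                z_idx += 1
        a_base += rows * k_dim
        b_base += k_dim * cols

    # Zero-fill trailing elements that do not form a full matrix.
    for i in range(m_count * elts_per_dest, e_count):
        out[i] = 0.0
    return out


if __name__ == "__main__":
    # A = [[1, 2], [3, 4]]; columns of B are (5, 6) and (7, 8).
    print(mat_mul_acc([0.0] * 4, [1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))
    # -> [17.0, 23.0, 39.0, 53.0]
```

The sketch keeps the three running indices (result element, A base, B base) separate, as the patch does, so the generalisation to other values of numDestRows, numDestCols and K stays visible.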