1 Introduction

Within the Standard Model (SM) of particle physics, direct production of lepton pairs with different flavours (\(\ell \ell ^{\prime }\)) is forbidden. However, lepton flavour violation (LFV) is allowed in many extensions of the SM. Decays to lepton-flavour-violating final states can arise in models with additional gauge symmetries, e.g. through production of a new heavy neutral gauge boson similar to a \(Z^\prime \) boson [1], of scalar neutrinos in R-parity-violating (RPV) [2, 3] supersymmetry (SUSY) [4–10], or of quantum black holes (QBH) in low-scale gravity models [11]. Processes leading to flavour-violating dilepton final states have a clear detector signature and a low background from SM processes. In particular, the Drell–Yan (DY) process (dilepton production in hadron–hadron collisions), an irreducible background in same-flavour dilepton searches, contributes here only through the production and decay of a ditau system, which enhances the sensitivity to a possible signal. This paper presents a search for final states with two leptons of different flavour in proton–proton (pp) collisions at \(\sqrt{s}=13\) TeV. The invariant mass of the two leptons (\(m_{\ell \ell ^{\prime }}\)) is used as the search variable.

A common extension of the SM is the addition of an extra U(1) gauge symmetry resulting in a massive vector boson known as a \(Z^\prime \) boson [1]. The search presented in this paper assumes a \(Z^\prime \) boson that has the same fermion couplings as the SM \(Z\) boson in the quark sector, but only leptonic decays that violate lepton flavour conservation (LFC) are allowed. The addition of lepton-flavour-violating processes, \(Z^\prime \) \(\rightarrow e\mu \), \(e\tau \), \(\mu \tau \), requires new couplings between leptons of different generations: \(Q^{\ell }_{\mathrm {12}}\), \(Q^{\ell }_{\mathrm {13}}\) and \(Q^{\ell }_{\mathrm {23}}\), where the subscripts denote lepton generations. For the model considered, this paper assumes that \(Q_{ij}^{\ell }\) is equal to the SM \(Z\) boson coupling to leptons and that only one LFV coupling is non-zero at a time. The ATLAS and CMS Collaborations have placed limits on the \(e\mu \), \(e\tau \) and \(\mu \tau \) couplings as a function of the \(Z^\prime \) boson mass up to 2.5 TeV, using the full \(\sqrt{s}=8\) TeV dataset [12, 13].

In RPV SUSY, the Lagrangian terms allowing LFV can be expressed as \(\frac{1}{2}\lambda _{ijk}L_i L_j\bar{e}_k + \lambda ^{\prime }_{ijk}L_i Q_j\bar{d}_k\), where L and Q are the SU(2) doublet superfields of leptons and quarks, e and d are the SU(2) singlet superfields of charged leptons and down-type quarks, \(\lambda \) and \(\lambda ^\prime \) are Yukawa couplings, and the indices i, j and k denote fermion generations. A \(\tau \) sneutrino (\(\tilde{\nu }_{\tau }\)) may be produced in pp collisions by \(d\bar{d}\) annihilation and subsequently decay to \(e\mu \), \(e\tau \), or \(\mu \tau \). Although only \(\tilde{\nu }_{\tau }\) is considered in this paper, the results apply to any sneutrino flavour. For the theoretical prediction of the cross-section times branching ratio, the \(\tilde{\nu }_{\tau }\) coupling to first-generation quarks (\(\lambda ^{\prime }_{311}\)) is assumed to be 0.11 for all channels. As for the \(Z^\prime \) model, only one decay to a lepton-flavour-violating final state is allowed at a time. As such, it is assumed that \(\lambda _{312}=\lambda _{321}=0.07\) for the \(e\mu \) final state, \(\lambda _{313}=\lambda _{331}=0.07\) for \(e\tau \), and \(\lambda _{323}=\lambda _{332}=0.07\) for \(\mu \tau \). These values are consistent with benchmark couplings used in previous ATLAS and CMS searches [12, 13]. The ATLAS Collaboration has placed limits up to 2.0 TeV on the mass of an RPV SUSY \(\tilde{\nu }_{\tau }\) [12].

Various models introduce extra dimensions in order to lower the value of the Planck mass (\(M_{\mathrm {P}}\)) and solve the hierarchy problem. The search presented in this paper focuses on the ADD model [14], assuming \(n=6\), where n is the number of extra dimensions, and the RS model [15], with one extra dimension. Due to the increased strength of gravity at short distances, pp collisions at the Large Hadron Collider (LHC) could produce states with masses beyond the threshold mass (\(M_{\mathrm {th}}\)) that satisfy the Hoop conjecture [16] and form black holes. For the model considered, \(M_{\mathrm {th}}\) is assumed to be equivalent to the extra-dimensional Planck scale. It is expected that, for masses beyond 3–5\(M_{\mathrm {th}}\), thermal black holes would be produced [17, 18], characterised by high-multiplicity final states. As such, the search presented in this paper focuses on the mass region below 3–5\(M_{\mathrm {th}}\), known as the quantum gravity regime, investigated in Refs. [19–21]. Non-thermal (or quantum) black holes would be formed in this region, and could decay to two-particle final states, producing the topology on which this analysis focuses. Such quantum black holes would form a continuum in mass from \(M_{\mathrm {th}}\) up to the beginning of the thermal regime. For the model considered in this paper, the thermal regime is assumed to start at 3\(M_{\mathrm {th}}\). The decay of quantum black holes would be governed by a yet unknown theory of quantum gravity. The two main assumptions of the extra-dimensions models considered [11] in this paper are:

  • gravity couples with equal strength to all SM particle degrees of freedom;

  • gravity conserves local symmetries (colour, electric charge) but can violate global symmetries such as LFC and baryon number conservation.

Following these assumptions, the branching ratio (BR) to each final state can be calculated. Two initial states could give rise to a quantum black hole decaying into a lepton-flavour-violating final state: \(q\bar{q}\) and gg. The branching ratio to \(\ell \ell ^{\prime }\) is 0.87 % (0.34 %) for a \(q\bar{q}\) (gg) initial state [11]. This model was used in previous ATLAS and CMS searches in dijet [22–24], lepton+jet [25], photon+jet [26], \(e\mu \) [13] and same-flavour dilepton [27] final states.

2 The ATLAS detector

The ATLAS detector [28] is a general-purpose particle detector with approximately forward-backward symmetric cylindrical geometry.Footnote 1 It is composed of four main components, each responsible for identifying and reconstructing different types of particles: the inner detector (ID), the electromagnetic and hadronic calorimeters, and the muon spectrometer (MS). Each of the sub-detectors is divided into two components, barrel and endcap, to provide coverage close to \(4\pi \) in solid angle. In addition, two magnet systems are in place to allow charge and momentum measurements: an axial magnetic field of 2.0 T provided by a solenoid surrounding the ID, and a toroidal magnetic field for the MS. The ID, the component of the ATLAS detector closest to the interaction point, reconstructs the trajectories of charged particles in the region \(|\eta |<2.5\) and measures their momenta. It is composed of three sub-systems:

  i) a silicon pixel detector, including the newly installed insertable B-layer [29, 30];

  ii) the semi-conductor tracker, used in conjunction with the silicon pixel detector to determine primary and secondary vertices with high precision thanks to their high granularity;

  iii) the transition radiation tracker, providing additional tracking in the region \(|\eta |<2.0\) and electron identification.

Surrounding the ID, lead/liquid-argon (LAr) sampling calorimeters provide electromagnetic (EM) energy measurements with high granularity. A steel/scintillator-tile hadronic calorimeter covers the central pseudorapidity range (\(|\eta |<1.7\)). The endcap and forward regions are instrumented with LAr calorimeters with copper or tungsten absorbers for both the EM and hadronic energy measurements up to \(|\eta |=4.9\). Built around the calorimeter system, the MS is the sub-detector furthest from the interaction point. It consists of three layers of precision tracking chambers and fast detectors for triggering on muons. Tracking coverage is provided for \(|\eta |<2.7\) through the use of monitored drift tubes and, in the innermost layer, cathode strip chambers for \(|\eta |>2.0\), while trigger coverage is provided by resistive plate and thin gap chambers for \(|\eta |<2.4\).

The trigger and data-acquisition system is based on two levels of online event selection [31]: the level-1 trigger and the high-level trigger. The level-1 trigger is hardware-based and uses a subset of detector information to provide quick trigger decisions and reduce the accepted rate to 100 kHz. The high-level trigger is software-based and exploits the full detector information to further reduce the accepted rate to about 1 kHz.

3 Data and Monte Carlo simulated samples

The data sample used for this analysis was collected with the ATLAS detector during the 2015 LHC run with pp collisions at a centre-of-mass energy of 13 TeV with a 25 ns minimum proton bunch spacing. After selecting periods with stable beams and requiring all detector systems to be fully functional, the total integrated luminosity for the analysis is 3.2 fb\(^{-1}\). The uncertainty in the integrated luminosity is 5.0 %. It is derived following a methodology similar to that detailed in Ref. [32], from a calibration of the luminosity scale using xy beam-separation scans performed in August 2015.

The \(pp\rightarrow Z^\prime \rightarrow \ell \ell ^{\prime }\) signal samples are generated at leading order (LO) using the Monte Carlo (MC) generator Pythia 8.186 [33] with the NNPDF23LO [34] parton distribution function (PDF) set and the A14 [35] set of tuned parameters (tune). Signal samples at 25 mass points, ranging from 0.5 TeV up to 5 TeV, are generated: in 0.1 TeV steps from 0.5 to 2.0 TeV, 0.2 TeV steps from 2.0 to 3.0 TeV, and 0.5 TeV steps from 3.0 to 5.0 TeV. The production cross-section is calculated with the same MC generator used for simulation. No mixing with the SM \(Z\) boson is included.
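For illustration, the mass grid described above can be reproduced with a short script (the function name and floating-point tolerances are this sketch's own, not part of the analysis software):

```python
def zprime_mass_points():
    """Return the 25 Z' signal mass points in TeV: 0.1 TeV steps from
    0.5 to 2.0 TeV, 0.2 TeV steps from 2.0 to 3.0 TeV and 0.5 TeV steps
    from 3.0 to 5.0 TeV, counting each range boundary once."""
    points, m = [], 0.5
    while m < 2.0 - 1e-9:                 # 0.5, 0.6, ..., 1.9
        points.append(round(m, 1)); m += 0.1
    while m < 3.0 - 1e-9:                 # 2.0, 2.2, ..., 2.8
        points.append(round(m, 1)); m += 0.2
    while m <= 5.0 + 1e-9:                # 3.0, 3.5, ..., 5.0
        points.append(round(m, 1)); m += 0.5
    return points

print(len(zprime_mass_points()))  # 25 mass points in total
```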

The \(d\bar{d}\rightarrow \tilde{\nu }_{\tau }\rightarrow \ell \ell ^{\prime }\) signal samples are generated at LO using the MC generator MG5_aMC@NLO v2.3.3 [36] interfaced to the Pythia 8.186 parton shower model with the NNPDF23LO PDF set and the A14 tune. The signal samples are generated at the same pole-masses as for the \(Z^\prime \) described above. The cross-section is calculated at LO with the same MC generator used for simulation. A next-to-leading order (NLO) correction factor (K-factor) is calculated for the cross-section based on Ref. [37] using LoopTools v2.2 [38].

The \(pp\rightarrow \)QBH\(\rightarrow \ell \ell ^{\prime }\) samples are generated with QBH 3.00 [39] using the CTEQ6L1 [40] PDF set and the A14 tune, for which Pythia 8.183 provides showering and hadronisation. For each extra-dimensional model, eleven \(M_{\mathrm {th}}\) points in 0.5 TeV steps were produced: from 3.0 to 8.0 TeV for the ADD \(n=6\) model, and from 1.0 to 6.0 TeV for the RS \(n=1\) model. The production cross-section is calculated with the same MC generator used for simulation. These two models have differences in the number and nature of the additional extra dimensions (large extra dimensions for ADD, one highly warped extra dimension for RS). In particular, the ADD model allows production of black holes with a larger gravitational radius and hence the parton–parton cross-section for this model is larger than for the RS model. Therefore, the \(M_{\mathrm {th}}\) range of the generated samples is different for the two models.

The SM background to the LFV dilepton search is composed of several processes which can produce a final state with two different-flavour leptons. The dominant background contributions originate from \(t\bar{t}\) and single-top production, with the subsequent decays of the top quark producing leptonically decaying \(W\) bosons. Other backgrounds originate from diboson (\(WW\), \(WZ\) and \(ZZ\)) production and the DY process, \(q\bar{q}\rightarrow Z/\gamma ^{*}\rightarrow \tau \tau \), which can produce different-flavour final states through the leptonic decay of the \(W\) and \(Z\) bosons and the \(\tau \) lepton. Multi-jet and \(W\)+jets processes contribute due to the misidentification of jets as leptons.

Backgrounds from top quark production include \(t\bar{t}\) and single-top with an associated \(W\) boson (\(tW\)). Both the \(t\bar{t}\) and single-top-quark backgrounds are generated at NLO using the Powheg-Box v2 [41] generator with the CT10 [42] PDF set in the matrix element (ME) calculations. Pythia 6.4.28 [43] and the corresponding Perugia 2012 tune [44] are used to simulate the parton shower, hadronisation, and the underlying event. Top quarks are decayed using MadSpin [45], preserving all spin correlations. The parameter which controls the \(p_{\mathrm {T}}\) of the first emission beyond the Born configuration in Powheg, called hdamp, is set to the mass of the top quark. The main effect of this is to regulate the high-\(p_{\mathrm {T}}\) emission against which the \(t\bar{t}\) system recoils. The mass of the top quark is set to 172.5 GeV. A value of 831\(^{+20}_{-29}\) (scale) \(^{+35}_{-35}\) (PDF+\(\alpha _{\mathrm {S}}\)) \(^{+23}_{-22}\) (mass uncertainty) pb is used for the \(t\bar{t}\) production cross-section, computed with Top++ 2.0 [46], incorporating next-to-next-to-leading order (NNLO) corrections in QCD, including resummation of next-to-next-to-leading logarithmic (NNLL) soft gluon terms. A \(tW\) production cross-section of \(71.7\pm 3.8\) pb is used, as computed in Ref. [47] to approximately NNLO (NNLL+NLO) accuracy.

Diboson processes with four charged leptons, three charged leptons and one neutrino, two charged leptons and two neutrinos, or one boson decaying to leptons and the other hadronically, are simulated using the Sherpa 2.1.1 generator [48]. The matrix elements contain all diagrams with four electroweak vertices. Fully-leptonic decays are calculated for up to one (four leptons, two leptons and two neutrinos) or zero partons (three leptons and one neutrino) at NLO and up to three partons at LO using the Comix [49] and OpenLoops [50] ME generators and merged with the Sherpa parton-shower [51] using the ME+PS@NLO prescription [52]. Semileptonic decays are calculated for up to one (\(ZZ\)) or zero (\(WW\), \(WZ\)) additional partons at NLO and up to three additional partons at LO using Comix and OpenLoops. The CT10 PDF set is used in conjunction with the default parton-shower tuning provided by the Sherpa authors in the release.

The Drell–Yan process is generated at LO using the Pythia8 MC generator with the NNPDF23LO PDF set. The same generator is used for showering and hadronisation. Dilepton mass-dependent K-factors are applied to account for higher-order QCD and electroweak corrections and to normalise the cross-section to NNLO, computed using FEWZ 3.1 [53] and the CT14NNLO PDF set [54].

SM processes such as \(W\)+jets and multi-jet production involving jets that fake leptons are evaluated through the use of data-driven methods detailed in Sect. 5. The \(W\)+jets contribution is estimated with the aid of Sherpa MC simulated samples. Matrix elements are calculated for up to two partons at NLO and four partons at LO using the same procedures, prescriptions and PDF set adopted for the diboson samples. The \(W\)+jets events are normalised to the NNLO cross-section [55].

For all samples used in this analysis, the effects of multiple interactions per bunch crossing (pile-up) are accounted for by overlaying minimum-bias events simulated with Pythia8 and re-weighting the MC events to reproduce the distribution of the average number of interactions per bunch crossing observed in the data. The MC generated events were processed with the ATLAS simulation infrastructure [56], based on Geant4 [57], and passed through the trigger simulation and the same reconstruction software used for the data.

4 Object and event selection

Candidate muon tracks are initially reconstructed independently in the ID and the MS. The two tracks are then used as input to a combined fit which takes into account the energy loss in the calorimeter and multiple scattering. Muon identification is based on information from both the ID and MS to ensure that muons are reconstructed with the optimal momentum resolution up to very high \(p_{\mathrm {T}}\) using the High-\(p_{\mathrm {T}}\) operating point [58]. Muon candidates with hits in regions of the MS with residual misalignments, such as the barrel–endcap overlap region (\(1.01<|\eta |<1.1\)), are vetoed. Muon tracks are required to be within the ID acceptance regionFootnote 2 of \(|\eta |<2.5\) and have at least three hits in each of the three traversed precision chambers in the MS. An exception is made in the region \(|\eta |<0.1\) due to the MS gap in that region, where tracks with at least three hits in a single precision chamber are allowed. In order to suppress hadrons misidentified as muons, the momentum measurements of the ID and the MS must agree within seven standard deviations. As well as the quality cuts, muon candidates must fulfil \(p_{\mathrm {T}}>65\) GeV and transverse impact parameter (\(d_{\mathrm {0}}\)) significance \(|d_{\mathrm {0}}/\sigma _{d_{\mathrm {0}}}|<3\) with respect to the beam line, where \(\sigma _{d_{\mathrm {0}}}\) is the uncertainty in the value of the transverse impact parameter. The distance between the z-position of the point of closest approach of the muon track in the ID to the beamline and the z-coordinate of the primary vertexFootnote 3 (\(\Delta z_{\mathrm {0}}\)) is required to satisfy \(|\Delta z_{\mathrm {0}}\sin \theta |<0.5\) mm. This requirement aims to reduce the background from cosmic rays and from muons originating from heavy-flavour decays. 
Moreover, candidates are required to fulfil track-based isolation criteria with a fixed efficiency of 99 % over the full range of muon momentum to further reduce contamination from non-prompt muons. The sum of the transverse momentum of tracks in an isolation cone of size \(\Delta R=0.2\) (excluding the muon itself) divided by the muon \(p_{\mathrm {T}}\) is used as a discrimination criterion for the track-based isolation.

Electron candidates are formed from the energy in clusters of cells in the electromagnetic calorimeter associated with a track in the ID [59]. A multivariate analysis approach, employing a likelihood (LH) discriminant, is built to suppress contributions from hadronic jets, photon conversions, Dalitz decays and semileptonic heavy-flavour hadron or kaon decays. The LH discriminant utilises lateral and longitudinal calorimeter shower shape, tracking and cluster–track matching quantities. The discriminant criterion is a function of the transverse momentum and \(|\eta |\) of the candidate electron. Two operating points are used in this analysis, as defined in Ref. [60]: Medium and Tight. The Tight working point (90 % efficient at \(p_{\mathrm {T}}=65\) GeV) is required for electron candidates, while the Medium working point (95 % efficient at \(p_{\mathrm {T}}=65\) GeV) is used to estimate the background contribution from jets misidentified as electrons (as discussed in Sect. 5). Electron candidates must fulfil \(p_{\mathrm {T}}>65\) GeV and \(|\eta |<2.47\), excluding the region \(1.37<|\eta |<1.52\), where the energy reconstruction performance is degraded due to the presence of extra inactive material. Further requirements are made on the impact parameter: \(|d_{\mathrm {0}}/\sigma _{d_{\mathrm {0}}}|<5\) and \(|\Delta z_{\mathrm {0}}\sin \theta |<0.5\) mm. To reject electrons faked by muons, electron candidates within a \(\Delta R=0.2\) cone around a muon candidate are removed. Moreover, candidates are required to fulfil relative track- (as defined above for muon candidates) and calorimeter-based isolation requirements with a fixed efficiency of 99 %, to suppress background from non-prompt leptons originating from heavy-flavour or kaon decays, charged hadrons and photon conversions from \(\pi ^{\mathrm {0}}\) decays.
The sum of the calorimeter transverse energy deposits in an isolation cone of size \(\Delta R=0.2\) (excluding the electron itself) divided by the electron \(p_{\mathrm {T}}\) is used as a discrimination criterion for the calorimeter-based isolation.

Jets, used in the reconstruction of hadronically-decaying \(\tau \) leptons, are reconstructed using the anti-\(k_{t}\) algorithm [61] with a radius parameter (R) of 0.4, using as input topological clusters [62] of calorimeter cells [63]. The three-dimensional topological clusters are built from topologically connected calorimeter cells that contain a significant signal above noise. The cluster energies are corrected for inactive material and out-of-cluster energy losses. Jet calibrations derived from \(\sqrt{s}=13\) TeV simulation, and collision data taken at \(\sqrt{s}=8\) and \(\sqrt{s}=13\) TeV, are used to correct the jet energies and directions to those of the particles from the hard-scatter interaction. This calibration procedure, described in Refs. [63–65], is improved by a data-derived correction to the relative calibration of jets in the central and the forward regions.

The reconstruction of \(\tau \) leptons and their visible hadronic decay products, referred to as \(\tau _{\mathrm {had}}^{\mathrm {vis}}\), starts with jets reconstructed from topological clusters as described above. Hadronic decays of \(\tau \) leptons (\(\tau _{\mathrm {had}}\)) are mainly characterised by the presence of one or three charged particles, accompanied by a neutrino and possibly other neutral particles [66]. The \(\tau _{\mathrm {had}}^{\mathrm {vis}}\) candidates must have energy deposits in the calorimeters in the range \(|\eta |<2.5\), with the transition region between the barrel and endcap calorimeters (\(1.37<|\eta |<1.52\)) excluded, a transverse momentum greater than 40 GeV, one or three associated tracks and an electric charge of ±1. Their identification is performed using a multivariate algorithm that employs boosted decision trees (BDTs) to discriminate against quark- and gluon-initiated jets using shower shape and tracking information. An additional dedicated likelihood-based veto is used to reduce the number of electrons misidentified as \(\tau _{\mathrm {had}}\). The \(\tau \) lepton candidates which overlap with electron or muon candidates within a cone of \(\Delta R=0.2\) are rejected.

The event selection requires a single-muon or single-electron trigger with a \(p_{\mathrm {T}}\) threshold of 50 GeV for muons, and 60 or 120 GeV for electrons. The single-electron trigger with higher \(p_{\mathrm {T}}\) threshold has a looser LH identification requirement, resulting in an increased trigger efficiency at high \(p_{\mathrm {T}}\). Selected events must have a reconstructed primary vertex and exactly two different-flavour lepton candidates meeting the above-mentioned criteria. Events with an additional lepton or extra “loose” leptonFootnote 4 are vetoed. Moreover, the lepton candidates have to be back-to-back in the \(\phi \) direction with \(\Delta \phi (\ell ,\ell ^{\prime })>2.7\). No requirement is made on the respective charges of the leptons, since such a requirement is found to reduce the signal efficiency by as much as 6 % for the highest-mass signals considered, due to charge mis-assignment, without a significant effect on the background rejection. For a \(Z^\prime \) boson with a mass of 1.5 TeV, the acceptance times efficiencyFootnote 5 (\(A\epsilon \)) of the selection requirements is approximately 50, 25 and 20 % for the \(e\mu \), \(e\tau \) and \(\mu \tau \) final states, respectively. To account for differences between data and simulation, corrections are applied to the lepton trigger, reconstruction, identification, and isolation efficiencies as well as the lepton energy/momentum resolution and scale [58, 59, 66].

The missing transverse momentum (\(E_{\mathrm {T}}^{\mathrm {miss}}\)) is defined as the negative vector sum of the transverse momenta of all identified physics objects (electrons, photons [67], muons, taus, jets) and an additional soft term. The soft term is constructed from all tracks that are associated with the primary vertex but not with any physics object. In this way, the missing transverse momentum is adjusted for the best calibration of the jets and the other identified physics objects above, while maintaining pile-up independence in the soft term [68].
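As a rough illustration (not the ATLAS implementation), the vector sum underlying this definition can be sketched as follows, with each object represented simply as a (\(p_{\mathrm {T}}\), \(\phi \)) pair:

```python
import math

def missing_et(objects, soft_tracks):
    """Illustrative E_T^miss: negative vector sum of the transverse momenta
    of all identified physics objects plus the track-based soft term.
    Each input is an iterable of (pt, phi) pairs; returns the magnitude
    and azimuthal direction of the missing transverse momentum."""
    px = py = 0.0
    for pt, phi in list(objects) + list(soft_tracks):
        px += pt * math.cos(phi)
        py += pt * math.sin(phi)
    # Magnitude of the negative vector sum and its direction.
    return math.hypot(px, py), math.atan2(-py, -px)
```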

An additional variable to estimate the contribution from reducible backgrounds is used: the transverse mass (\(m_{\mathrm {T}}\)) of a lepton and the \(E_{\mathrm {T}}^{\mathrm {miss}}\), defined as:

$$\begin{aligned} m_{\mathrm {T}} = \sqrt{2 p_{\mathrm {T}} E_{\mathrm {T}}^{\mathrm {miss}} \left(1-\cos \Delta \phi (\ell ,E_{\mathrm {T}}^{\mathrm {miss}})\right)}\;, \end{aligned}$$
(1)

where \(\Delta \phi (\ell ,E_{\mathrm {T}}^{\mathrm {miss}})\) is the azimuthal angle between the lepton momentum and the \(E_{\mathrm {T}}^{\mathrm {miss}}\) direction.
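Eq. (1) translates directly into code; a minimal sketch (function and argument names are ours):

```python
import math

def transverse_mass(lep_pt, lep_phi, met, met_phi):
    """Transverse mass of a lepton and E_T^miss, as in Eq. (1)."""
    dphi = lep_phi - met_phi  # cosine is periodic, so no angle wrapping needed
    return math.sqrt(2.0 * lep_pt * met * (1.0 - math.cos(dphi)))
```

The variable reaches its maximum, \(2\sqrt{p_{\mathrm {T}}E_{\mathrm {T}}^{\mathrm {miss}}}\), when the lepton and \(E_{\mathrm {T}}^{\mathrm {miss}}\) are back-to-back in azimuth.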

For events in the \(e\tau \) and \(\mu \tau \) channels, in order to reconstruct the dilepton invariant mass more accurately, the neutrino four-momentum is taken into account. The hadronic decay of a \(\tau \) lepton from a heavy resonance leads to the neutrino and the resultant jet being nearly collinear. The neutrino four-momentum is reconstructed from the magnitude of the missing transverse momentum, and is assumed to be collinear with the \(\tau _{\mathrm {had}}\) candidate. For the mentioned channels, the above technique significantly improves the mass resolution and search sensitivity.
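The collinear approximation described above can be sketched as follows. This is a simplified illustration under massless-particle assumptions, with our own function and variable names; the analysis itself reconstructs the full neutrino four-momentum:

```python
import math

def collinear_mass(lep_pt, lep_eta, lep_phi,
                   tau_pt, tau_eta, tau_phi, met, met_phi):
    """Dilepton mass with the neutrino assumed collinear with the visible
    hadronic tau decay products (massless approximation throughout)."""
    # Visible mass of the lepton + tau_had system.
    m_vis = math.sqrt(2.0 * lep_pt * tau_pt *
                      (math.cosh(lep_eta - tau_eta) -
                       math.cos(lep_phi - tau_phi)))
    # E_T^miss component along the tau direction gives the neutrino pT.
    nu_pt = max(met * math.cos(met_phi - tau_phi), 0.0)
    # Fraction of the tau momentum carried by the visible decay products.
    x_vis = tau_pt / (tau_pt + nu_pt)
    return m_vis / math.sqrt(x_vis)
```

Scaling the visible tau momentum by \(1/x_{\mathrm{vis}}\) restores the full tau momentum, which is why the visible mass is divided by \(\sqrt{x_{\mathrm{vis}}}\).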

5 Background estimation

The background processes for this search can be divided into two categories: irreducible and reducible backgrounds. The former is composed of processes which can produce two prompt leptons of different flavour in the final state, including the DY\(\rightarrow \tau \tau \) process, \(t\bar{t}\), single-top, and diboson production. These processes are modelled using MC simulated samples. Reducible backgrounds occur when jets are mis-reconstructed as leptons, and require the use of data-driven techniques.

The MC samples used to estimate single-top and \(t\bar{t}\) production are statistically limited for dilepton invariant masses above 1 TeV. Therefore, fits to the \(m_{\ell \ell ^{\prime }}\) distribution using monotonically decreasing functions are used to extrapolate those backgrounds to the region \(m_{\ell \ell ^{\prime }}>1\) TeV. Two functional forms are investigated, chosen for their stability when varying the fit range and for the quality of the fit:

$$\begin{aligned} \mathrm {e}^{-a} \cdot m_{\ell \ell ^{\prime }}^b \cdot m_{\ell \ell ^{\prime }}^{c\cdot \ln (m_{\ell \ell ^{\prime }})} \quad \mathrm {and} \quad \frac{a}{(m_{\ell \ell ^{\prime }}+b)^c}\;, \end{aligned}$$
(2)

where a, b and c are free parameters in the fit. The stability of the fit was studied by varying the lower limit of the fit range between 200 and 300 GeV and the upper limit between 1000 and 1200 GeV, both in 25 GeV steps. The stitching point between the MC estimation and the fit is chosen to be at 900 GeV for the top quark background. The nominal extrapolation is then taken to be the median of all the tested fit ranges using both functional forms. Good agreement is found between the fit prediction and the available MC events. The sum in quadrature of the fit parameter uncertainties and the RMS of all fit variations is assigned as a systematic uncertainty.
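A minimal sketch of the two functional forms of Eq. (2); the function names are ours and the parameter values below are arbitrary placeholders, not fit results:

```python
import math

def form_a(m, a, b, c):
    """First form of Eq. (2): exp(-a) * m^b * m^(c*ln m)."""
    return math.exp(-a) * m**b * m**(c * math.log(m))

def form_b(m, a, b, c):
    """Second form of Eq. (2): a / (m + b)^c."""
    return a / (m + b)**c
```

In practice each form would be fitted to the \(m_{\ell \ell ^{\prime }}\) distribution below the stitching point and evaluated above it; the stability study described in the text repeats such fits over the varied ranges.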

The contribution from reducible backgrounds originates mainly from \(W\)+jets and multi-jet processes. The background from muons originating from hadronic decays is found to be negligible compared to the contribution from fake electrons and taus. Therefore, in the \(e\mu \) channel, where the contribution of the reducible background is expected to be small, these non-prompt muons are neglected. The reducible background in that channel is then reduced to events with one prompt muon and a jet faking an electron. This background contribution is usually not well modelled by MC simulation.

For the \(e\mu \) channel, a technique known as the matrix method, described in Ref. [27], is employed. Exclusive samples are defined by loosening the selection criteria for electron candidates. Here the matrix method involves two parameters that need to be determined as a function of electron \(p_{\mathrm {T}}\): the probability of a loose electron to pass the full object selection, the so-called real electron efficiency (\(\epsilon _{\mathrm {R}}\)), and the probability of a jet fulfilling the loose electron selection criteria to pass the full selection, known as the electron fake rate (\(\epsilon _{\mathrm {F}}\)). The former is evaluated from MC simulation, while the latter is evaluated in a data sample dominated by multi-jet events. To construct this multi-jet control sample, it is required that \(E_{\mathrm {T}}^{\mathrm {miss}}<25\) GeV and \(m_{\mathrm {T}}<50\) GeV in order to suppress the \(W\)+jets contribution. Contamination from \(W\)+jets and other SM background processes (top, diboson, and \(Z\rightarrow \ell \ell \)) is subtracted using MC predictions.
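To make the technique concrete, here is a minimal sketch of the two-component matrix method in our own notation; the implementation of Ref. [27] is binned in electron \(p_{\mathrm {T}}\) and applied event by event:

```python
def fake_yield_in_tight(n_tight, n_loose_not_tight, eff_real, eff_fake):
    """Estimate the jet-fake contribution to the tight-electron sample.

    Solves the 2x2 system
        n_loose = n_real + n_fake
        n_tight = eff_real * n_real + eff_fake * n_fake
    and returns the fake component passing the tight selection."""
    n_loose = n_tight + n_loose_not_tight   # total loose sample
    n_fake = (eff_real * n_loose - n_tight) / (eff_real - eff_fake)
    return eff_fake * n_fake
```

For example, with \(\epsilon _{\mathrm {R}}=0.9\) and \(\epsilon _{\mathrm {F}}=0.1\), a sample of 85 tight and 45 loose-but-not-tight events corresponds to 40 fakes in the loose sample, of which 4 pass the tight selection.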

For the \(e\tau \) and \(\mu \tau \) channels, the \(\tau \) fake rate is measured in data in a \(W\) \(\rightarrow e\)/\(\mu \)+jets control region as a function of the \(\tau _{\mathrm {had}}^{\mathrm {vis}}\) \(p_{\mathrm {T}}\). The region is defined to be orthogonal to the signal selection by reversing the \(\Delta \phi (\ell ,\ell ^{\prime })\) requirement. Only events with exactly one electron or muon fulfilling all selection criteria (as defined in Sect. 4), as well as \(m_{\mathrm {T}}>60\) GeV, are used. The \(\tau _{\mathrm {had}}\) candidates present in those events are dominated by jets. The \(\tau \) fake rate is defined as the fraction of jets fulfilling all \(\tau \) object selection criteria, including the multivariate BDT-based identification. The derived fake rate is used to weight simulated \(W\)+jets events. After obtaining the fake-rate-weighted \(m_{\ell \ell ^{\prime }}\) distribution, a normalisation factor for the \(W\)+jets background is obtained in a \(W\)+jets enriched region to scale the overall normalisation of the MC simulation to that of the data. The \(W\)+jets enriched region is defined as a subset of the signal selection by further requiring \(E_{\mathrm {T}}^{\mathrm {miss}}>30\) GeV and lepton \(p_{\mathrm {T}}<150\) GeV to avoid possible signal contamination. The contribution from events with an electron/muon and a fake \(\tau _{\mathrm {had}}\) is found to make up around 55 % of the overall background in the \(e\tau \) and \(\mu \tau \) channels.

To evaluate the fake background from events with a real \(\tau _{\mathrm {had}}\) and a fake electron/muon in the \(e\tau \) and \(\mu \tau \) channels, a fake-electron/muon enriched sample is defined by requiring a non-isolated electron/muon and a \(\tau _{\mathrm {had}}\) candidate. Three regions are defined:

  • Region 1: pairs of a non-isolated electron/muon and a \(\tau _{\mathrm {had}}\) with the same electric charge;

  • Region 2: pairs of an isolated electron/muon and a \(\tau _{\mathrm {had}}\) with the same electric charge;

  • Region 3: all pairs of a non-isolated electron/muon and a \(\tau _{\mathrm {had}}\).

The \(m_{\ell \ell ^{\prime }}\) shape of this contribution is obtained from region 3 by subtracting the contribution from other background sources from the data, while the ratio of isolated to non-isolated leptons in regions 1 and 2 is used to normalise this background contribution appropriately. The contribution from events with a fake electron/muon and a real \(\tau \) lepton is found to be below 1 % in the \(\mu \tau \) channel, while in the \(e\tau \) channel its contribution to the overall SM background is close to 5 %.

A summary of the contribution from each SM background in each of the final states can be found in Sect. 8.

6 Systematic uncertainties

Sources of systematic uncertainty are divided into two categories: theoretical and experimental. Uncertainties in the predicted cross-section times branching ratio and the modelling of the \(m_{\ell \ell ^{\prime }}\) shape of the background processes considered are regarded as theoretical uncertainties, while uncertainties related to the simulation of the detector response are regarded as experimental uncertainties. Theoretical uncertainties (such as PDF-related uncertainties) in the signal cross-section are not considered in this paper.

The PDF uncertainties are the dominant theoretical systematic uncertainties, together with the uncertainty in the extrapolation used to estimate the background contribution at high mass (as described in Sect. 5). The contribution from PDF uncertainties is estimated using different PDF sets and eigenvector uncertainty sets within a particular PDF. The CT10 PDF uncertainty due to eigenvector variations is evaluated through the use of LHAPDF [69] following the prescriptions outlined in Ref. [70]. The uncertainty related to the choice of PDF is evaluated by comparing the results with those from the central values of other PDF sets such as MMHT2014 [71], NNPDF3.0 [72] and CT14 [54]. PDF-related uncertainties in the signal shape are not considered. The uncertainty in the \(m_{\ell \ell ^{\prime }}\) modelling of \(t\bar{t}\) events is obtained using separate MC samples generated with variations in the renormalisation and factorisation scales and the hdamp parameter (as defined in Sect. 3).
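A minimal sketch of the Hessian eigenvector combination, assuming the common asymmetric master formula for CT-style PDF sets (the exact prescription is the one in Ref. [70]); the function operates per bin on scalar predictions and all inputs are illustrative:

```python
import numpy as np

def hessian_pdf_uncertainty(central, plus_vars, minus_vars):
    """Asymmetric Hessian master formula: add in quadrature, over
    eigenvector pairs, the largest upward (downward) excursion of each
    pair relative to the central prediction."""
    up = np.sqrt(sum(max(p - central, m - central, 0.0) ** 2
                     for p, m in zip(plus_vars, minus_vars)))
    down = np.sqrt(sum(max(central - p, central - m, 0.0) ** 2
                       for p, m in zip(plus_vars, minus_vars)))
    return up, down
```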

The effect of experimental systematic uncertainties is assessed through the uncertainties associated with the corrections applied to simulated processes, including lepton momentum resolution and scale, and trigger, identification, reconstruction and isolation efficiencies [58, 59, 66]. The efficiencies are evaluated using events from the \(Z\rightarrow \ell \ell \) peak and then extrapolated to high energies.

Mismodelling of the muon momentum resolution at the TeV scale, for example due to residual misalignment of the muon precision chambers, can alter the signal and background shapes. An uncertainty related to this is obtained from studies performed in dedicated data-taking periods with no magnetic field in the MS. The muon reconstruction efficiency is affected at high \(p_{\mathrm {T}}\) by possible large energy losses in the calorimeter. The associated uncertainty is estimated by comparing studies with \(Z\rightarrow \mu \mu \) events in data extrapolated to high \(p_{\mathrm {T}}\) with the results predicted by MC simulation [73]. The effect on the muon reconstruction efficiency was found to be approximately 3 % per TeV of muon \(p_{\mathrm {T}}\).

The uncertainty in the electron identification efficiency extrapolation is based on the differences in the electron shower shapes in the EM calorimeters between data and MC simulation in the \(Z\rightarrow ee\) peak, which are propagated to the high-\(p_{\mathrm {T}}\) electron sample. The effect on the electron identification efficiency was found to be 2 % and is independent of \(p_{\mathrm {T}}\) for electrons with transverse momentum above 150 GeV [73].

The treatment of systematic uncertainties for \(\tau \) leptons with \(p_{\mathrm {T}}\) up to 100 GeV is detailed in Ref. [66]. An additional uncertainty of 20 % per TeV is assigned to the reconstruction efficiency of \(\tau \) leptons with \(p_{\mathrm {T}}>100\) GeV to account for the degradation of the modelling and reconstruction efficiency due to track merging, derived through studies in simulation and in dijet data events at 8 TeV [74].

The uncertainties associated with the matrix method used for the \(e\mu \) channel are evaluated by considering effects on the \(\epsilon _{\mathrm {F}}\) measurement, including the multi-jet control sample definition and the uncertainties in the overall normalisation. The former effect is evaluated by shifting the \(E_{\mathrm {T}}^{\mathrm {miss}}\) and \(m_{\mathrm {T}}\) requirements by ±10 GeV, while the latter is taken into account by varying the MC subtraction of other SM processes by the luminosity and experimental systematic uncertainties. For the \(e\tau \) and \(\mu \tau \) channels, the uncertainty in the \(\tau \) fake rate and the \(W\)+jets normalisation in the MC subtraction is considered. The \(\tau \) fake rate is re-evaluated when removing the \(m_\mathrm {T}\) requirement, requiring \(m_{\tau \ell }>110\) GeV to reduce the Drell–Yan background and vetoing events with a jet identified as originating from a b-quark [75] to reduce top-quark background contamination. The variations obtained for the \(\tau \) fake rates are assigned as systematic uncertainties. Given the limited data available for \(\tau \) lepton \(p_{\mathrm {T}}>500\) GeV, the statistical uncertainty from the last data bin is used together with an uncertainty of 20 % per TeV in \(\tau \) lepton \(p_{\mathrm {T}}\). The uncertainty in the \(W\)+jets normalisation is obtained by recalculating the normalisation factor after a variation for each of the experimental systematic uncertainties outlined in Table 1.
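Assigning the fake-rate variations as a systematic uncertainty can be sketched as below, assuming the per-bin envelope of the re-derived rates is taken (one plausible reading of "the variations obtained"; names and numbers are illustrative):

```python
import numpy as np

def fake_rate_systematic(nominal, variations):
    """Per-bin systematic on the tau fake rate: the largest absolute
    deviation among the re-derived rates (e.g. without the mT cut,
    with m_tau-l > 110 GeV, with a b-jet veto)."""
    devs = [np.abs(v - nominal) for v in variations]
    return np.max(devs, axis=0)
```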

The uncertainty in the reducible background estimate is found to be close to 50, 30 and 40 % for the \(e\mu \), \(e\tau \) and \(\mu \tau \) channels, respectively, at \(m_{\ell \ell ^{\prime }}=1.0\) TeV and it is of comparable size to the PDF uncertainty in the \(e\tau \) and \(\mu \tau \) channels. However, the contribution from reducible backgrounds in the \(e\mu \) channel is below 10 %, while for \(e\tau \) and \(\mu \tau \) final states it is the leading background together with the contribution from top quark production.

Experimental systematic uncertainties common to signal and background processes are assumed to be correlated. The effect of systematic uncertainties on the estimated SM background yields is summarised in Table 1.

For signal processes, only experimental systematic uncertainties are considered. The statistical uncertainty of the signal MC samples is 3 %.

Table 1 Quantitative summary of the systematic uncertainties taken into account for background processes. Values are provided for \(m_{\ell \ell ^{\prime }}\) values of 1, 2 and 3 TeV. The statistical error includes the extrapolation uncertainties of the top quark background in the high-\(m_{\ell \ell ^{\prime }}\) region together with the uncertainty related to the number of MC events. Uncertainties are quoted with respect to the total background. N/A means the systematic uncertainty is not applicable. The expected SM background in a mass window within \(\pm 0.1\cdot m_{\ell \ell ^{\prime }}\) is also reported

7 Statistical analysis

If no deviations from the SM prediction are observed, model-dependent exclusion limits are extracted using a Bayesian method with a template shape fit, implemented in the Bayesian Analysis Toolkit (BAT) software package [76]. A binned likelihood function (\(\mathcal {L}\)) is built as the product of the Poisson probabilities of observing \(n_{\mathrm {obs}_k}\) events when expecting \(\mu _{k}\) in each of the mass bins used for the search:

$$\mathcal {L}(n_{\mathrm {obs}}| \theta ,\hat{\Omega }) = \prod ^{N_{\mathrm {bins}}}_{k=1} \dfrac{\mu ^{n_{\mathrm {obs}_k}}_{k}\mathrm {e}^{-\mu _{k}}}{n_{\mathrm {obs}_{k}}!} \prod _{i=1}^{N_{\mathrm {Sys}}} G(\Omega _{i},0,1)\;,$$
(3)

where \(\mu _{k}\) is the expected number of background and signal events (\(\mu _{k}=N_{\mathrm {bkg}_k} + N_{\mathrm {sig}_k}(\theta )\)) as a function of the parameter of interest \(\theta \), \(\hat{\Omega }\) is the vector of nuisance parameters introduced to account for the effect of systematic uncertainties in the expected yields, \(N_{\mathrm {bins}}\) is the number of dilepton invariant mass bins, \(N_{\mathrm {Sys}}\) is the total number of nuisance parameters and \(G(\Omega _{i},0,1)\) is a Gaussian distribution with zero mean and unit standard deviation assumed to be the probability density function for the nuisance parameter \(\Omega _{i}\). The dependence on the vector of nuisance parameters is removed through the use of a Markov Chain Monte Carlo integration technique. Bayes' theorem is then applied to construct a posterior probability density function for the number of signal events assuming a uniform prior in the parameter of interest (\(P(\theta )\)). The number of signal events can be expressed in terms of the cross-section times branching ratio of the signal process (\(\sigma \cdot \mathrm {BR}(X\rightarrow \ell \ell ^{\prime })\)) as:

$$\begin{aligned} N_{\mathrm {sig}} = \sum ^{\mathrm {N_{\mathrm {bins}}}}_{k=1} N_{\mathrm {sig}_{k}} = \sigma \cdot \mathrm {BR}(X\rightarrow \ell \ell ^{\prime })\cdot L \cdot A\epsilon (X\rightarrow \ell \ell ^{\prime })\;, \end{aligned}$$
(4)

where L is the integrated luminosity of the dataset and \(A\epsilon (X\rightarrow \ell \ell ^{\prime })\) is the acceptance times efficiency of the physics model tested. As such, a posterior probability density function is obtained for the signal \(\sigma \cdot \mathrm {BR}\). A 95 % credibility level (CL) upper limit is obtained on the signal cross-section times branching ratio by finding the value of \(\theta ^{\mathrm {95}}\) satisfying:

$$\begin{aligned} 0.95 = \dfrac{\int ^{\theta ^{\mathrm {95}}}_{\mathrm {0}} \mathcal {L^{\prime }}(n_{\mathrm {obs}}|\theta )P(\theta )\mathrm {d}\theta }{\int ^{\infty }_0\mathcal {L^{\prime }}(n_{\mathrm {obs}}|\theta )P(\theta ) \mathrm {d}\theta }\;, \end{aligned}$$
(5)

where \(P(\theta )\) is the uniform prior probability mentioned above and \(\mathcal {L^{\prime }}\) is the marginalised likelihood, obtained after performing the Markov Chain Monte Carlo integration over \(\hat{\Omega }\). Expected exclusion limits are obtained by running 1000 pseudo-experiments (PE) for each of the signal mass points tested. The median value of the 95 % CL upper Bayesian limit PE distribution is taken as the expected limit. The one- and two-standard deviation intervals of the expected limit are obtained from the 1000 PE ensemble by finding the 68 and 95 % CL interval envelopes, respectively.
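A minimal numerical sketch of Eqs. (3) and (5), omitting the nuisance parameters and their MCMC marginalisation (which BAT performs in the actual analysis); all function names, grids and inputs are illustrative:

```python
import math
import numpy as np

def log_likelihood(theta, n_obs, n_bkg, sig_shape):
    """log of Eq. (3) without nuisance parameters: per-bin Poisson
    terms with mu_k = N_bkg_k + theta * sig_shape_k, theta being the
    parameter of interest (signal normalisation)."""
    mu = n_bkg + theta * sig_shape
    return sum(n * math.log(m) - m - math.lgamma(n + 1.0)
               for n, m in zip(n_obs, mu))

def upper_limit(n_obs, n_bkg, sig_shape, cl=0.95, theta_max=50.0, n=2000):
    """Numerical version of Eq. (5): with a uniform prior, the 95% CL
    upper limit is the theta at which the normalised cumulative
    integral of the likelihood reaches 0.95 (trapezoidal rule)."""
    thetas = np.linspace(0.0, theta_max, n + 1)
    like = np.array([math.exp(log_likelihood(t, n_obs, n_bkg, sig_shape))
                     for t in thetas])
    cdf = np.concatenate(([0.0], np.cumsum(
        0.5 * (like[1:] + like[:-1]) * np.diff(thetas))))
    cdf /= cdf[-1]
    return float(np.interp(cl, cdf, thetas))
```

For a single bin with zero observed events the posterior is a falling exponential in \(\theta \), and the limit reduces to the textbook value \(\ln 20 \approx 3\) expected signal events.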

The predicted width of the \(Z^\prime \) boson, 3 % for \(m_{Z^\prime }=2\) TeV, is lower than the detector resolution for the \(e\mu \) and the \(\mu \tau \) channels, which are approximately 8 % and 12 %, respectively, at the same \(Z^\prime \) boson mass. For the \(e\tau \) final state the detector resolution is 4 % at \(m_{Z^\prime }=2\) TeV, comparable to the \(Z^\prime \) boson width. The width of the \(\tilde{\nu }_{\tau }\) is below 1 % and hence the resolution of the detector is larger than the width for each of the final states investigated. For limit setting on the signal models investigated, a logarithmic \(m_{\ell \ell ^{\prime }}\) binning is used with 40 mass bins between 120 and 10,000 GeV. The bin width is around 10 % in dilepton mass throughout the whole range.
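The binning described above can be reproduced directly; the roughly 10 % bin width follows from the constant ratio between logarithmically spaced edges:

```python
import numpy as np

# 40 logarithmically spaced mass bins between 120 GeV and 10,000 GeV
# (41 edges); each bin is wider than the previous by a constant ratio.
edges = np.logspace(np.log10(120.0), np.log10(10000.0), 41)
ratio = edges[1] / edges[0]  # (10000/120)**(1/40), about 1.12
```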

8 Results

Table 2 summarises the expected and observed yields in the validation and search regions for each of the channels considered in this search. The region \(m_{\ell \ell ^{\prime }}<600\) GeV is defined as the validation region, where the data are used to check the SM background prediction, while the region \(m_{\ell \ell ^{\prime }}>600\) GeV is defined as the search region. Selected \(e\mu \) events are dominated by \(t\bar{t}\) events, while \(W\)+jets events are dominant for the \(e\tau \) and \(\mu \tau \) final states.

Table 2 Observed and expected numbers of (a) \(e\mu \) , (b) \(e\tau \), and (c) \(\mu \tau \) events in the validation (\(m_{\ell \ell ^{\prime }}<600\) GeV) and search regions (\(m_{\ell \ell ^{\prime }}>600\) GeV) for the SM backgrounds and the signal models considered. The quoted errors include statistical and systematic uncertainties. The uncertainties for the total background predictions account for the correlations between the uncertainties of the different background contributions

Figure 1 shows the \(e\mu \), \(e\tau \) and \(\mu \tau \) invariant mass distributions. The event with the largest dilepton invariant mass is found in the \(e\mu \) channel, with \(m_{e\mu }=2.1\) TeV. Since the SM expectation for \(m_{e\mu }>2\) TeV is 0.02±0.02 events, the probability of observing one or more events is 2.6 %. The observation of this high-mass candidate event is therefore compatible with a statistical fluctuation, and no significant excess over the expected background is found. The observed data are thus consistent with the SM prediction, and model-dependent exclusion limits are extracted using the techniques described in Sect. 7.
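The quoted probability can be checked with a short calculation; a naive Poisson gives \(1-\mathrm{e}^{-0.02}\approx 2\,\%\), and one plausible way to arrive at a slightly larger value is to marginalise over the ±0.02 uncertainty on the expectation (this reconstruction is an assumption, not the paper's stated procedure):

```python
import math

def prob_at_least_one(mu0=0.02, sigma=0.02, n=4000):
    """P(>= 1 event) for a Poisson mean mu0 +- sigma, marginalised
    over a Gaussian truncated at zero (simple Riemann sum)."""
    lo, hi = 0.0, mu0 + 8.0 * sigma
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    w = [math.exp(-0.5 * ((x - mu0) / sigma) ** 2) for x in xs]
    num = sum(wi * (1.0 - math.exp(-x)) for wi, x in zip(w, xs))
    return num / sum(w)
```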

Fig. 1
figure 1

The invariant mass distribution of the finally selected a \(e\mu \), b \(e\tau \) and c \(\mu \tau \) pairs for data and MC predictions. Three selected signals are overlaid: a \(Z^\prime \) with masses of 2.0 and 1.5 TeV, a \(\tau \) sneutrino (\(\tilde{\nu }_{\tau }\)) with masses of 2.0 and 1.5 TeV, and an RS quantum black hole (QBH) with threshold masses of 2.0 and 1.5 TeV. The signal mass points shown correspond to the highest acceptance times efficiency in each channel. The error bars show the statistical uncertainty of the observed yields corresponding to a 68 % interval in a Poisson distribution, while the band in the bottom plot includes all systematic uncertainties added in quadrature

Figures 2, 3 and 4 show the 95 % CL expected and observed upper limits on the production cross-section times branching ratio of the \(Z^\prime \), RPV SUSY \(\tilde{\nu }_{\tau }\) and QBH models for each of the final states considered. The extracted limits worsen for signal masses above 2.5 (1.5) TeV in the \(e\mu \) (\(e\tau \) and \(\mu \tau \)) channel due to a decrease in the lepton reconstruction efficiency at very high \(p_{\mathrm {T}}\). Results are summarised in Table 3. The \(A\epsilon \) values of the ADD and RS QBH models were found to agree within 1 % and therefore the same curve is used for the limit extraction.

Fig. 2
figure 2

The observed and expected 95 % credibility level upper limits on the a \(Z^\prime \), b \(\tau \) sneutrino (\(\tilde{\nu }_{\tau }\)) and c QBH ADD and RS production cross-section times branching ratio in decays to an \(e\mu \) final state. The signal theoretical cross-section times branching ratio lines for the \(Z'\) model, the QBH ADD model assuming six extra dimensions and the RS model with one extra dimension are obtained from the Monte Carlo generators simulating each process, while the RPV SUSY \(\tilde{\nu }_{\tau }\) includes the NLO K-factor calculated using LoopTools [38]. The expected limits are plotted with the \(\pm 1\) and \(\pm 2\) standard deviation uncertainty bands

Fig. 3
figure 3

The observed and expected 95 % credibility level upper limits on the (a) \(Z^\prime \), (b) \(\tau \) sneutrino (\(\tilde{\nu }_{\tau }\)) and (c) QBH ADD and RS production cross-section times branching ratio in decays to an \(e\tau \) final state. The signal theoretical cross-section times branching ratio lines for the \(Z'\) model, the QBH ADD model assuming six extra dimensions and the RS model with one extra dimension are obtained from the Monte Carlo generators simulating each process, while the RPV SUSY \(\tilde{\nu }_{\tau }\) includes the NLO K-factor calculated using LoopTools [38]. The expected limits are plotted with the \(\pm 1\) and \(\pm 2\) standard deviation uncertainty bands

Fig. 4
figure 4

The observed and expected 95 % credibility level upper limits on the a \(Z^\prime \), b \(\tau \) sneutrino (\(\tilde{\nu }_{\tau }\)) and c QBH ADD and RS production cross-section times branching ratio in decays to a \(\mu \tau \) final state. The signal theoretical cross-section times branching ratio lines for the \(Z'\) model, the QBH ADD model assuming six extra dimensions and the RS model with one extra dimension are obtained from the Monte Carlo generators simulating each process, while the RPV SUSY \(\tilde{\nu }_{\tau }\) includes the NLO K-factor calculated using LoopTools [38]. The expected limits are plotted with the \(\pm 1\) and \(\pm 2\) standard deviation uncertainty bands

Table 3 Expected and observed 95 % credibility level lower limits on the mass of a \(Z^\prime \) with lepton-flavour-violating couplings, a supersymmetric \(\tau \) sneutrino (\(\tilde{\nu }_{\tau }\)) with R-parity-violating couplings, and the threshold mass for quantum black hole production for the ADD \(n=6\) and RS \(n=1\) models. Limits for all channels are reported

9 Conclusions

A search for a heavy particle decaying into an \(e\mu \), \(e\tau \) or \(\mu \tau \) \((\ell \ell ^{\prime })\) final state is conducted, using 3.2 fb\(^{-1}\) of \(\sqrt{s}=13\) TeV proton–proton collision data recorded by the ATLAS detector at the Large Hadron Collider. The data are found to be consistent with the Standard Model prediction in both the validation region (\(m_{\ell \ell ^{\prime }}<600\) GeV) and search region (\(m_{\ell \ell ^{\prime }}>600\) GeV). With no evidence of new physics, Bayesian lower limits at 95 % credibility level are set on the mass of a \(Z^\prime \) vector boson with lepton-flavour-violating couplings at 3.0, 2.7 and 2.6 TeV separately for \(e\mu \), \(e\tau \) and \(\mu \tau \) pairs, and a supersymmetric \(\tau \) sneutrino (\(\tilde{\nu }_{\tau }\)) with R-parity-violating couplings at 2.3, 2.2 and 1.9 TeV. The results are also interpreted as limits on the threshold mass for quantum black hole production. The exclusion limits extracted on the mass of a \(Z^\prime \) and the supersymmetric \(\tau \) sneutrino extend by around 20 % those reported by ATLAS and CMS using the full dataset at \(\sqrt{s}\) = 8 TeV.