Re: [Tails-dev] Bug#914980: linux-image-4.18.0-3-amd64: linu…

Delete this message

Reply to this message
Autor: Cyril Brulebois
Data:  
Para: Russel Winder, 914980
CC: Julian Calaby, Amy Kos, Sebastian Andrzej Siewior, Chris Wilson
Assunto: Re: [Tails-dev] Bug#914980: linux-image-4.18.0-3-amd64: linux image installed via 4.18.0-3 won't reboot on T500 and X201
Hi Russel and others,

Russel Winder <russel@???> (2018-11-29):
> It is also perhaps worth noting that whilst this new kernel reboots
> fine on a Lenovo X1, on the T500 and X201 rebooting just locks the
> laptops up requiring a forced power button shutdown.


There's a pending patch upstream (at least in drm-intel) for this bug:
https://patchwork.freedesktop.org/patch/265653/

but it doesn't seem to apply cleanly on v4.18.20:
| Changes to be committed:
| 
|     modified:   drivers/gpu/drm/i915/i915_drv.h
|     modified:   drivers/gpu/drm/i915/i915_gpu_error.c
| 
| Unmerged paths:
|   (use "git add <file>..." to mark resolution)
| 
|     both modified:   drivers/gpu/drm/i915/i915_gem.c
|     both modified:   drivers/gpu/drm/i915/intel_engine_cs.c
|     both modified:   drivers/gpu/drm/i915/intel_lrc.c
|     both modified:   drivers/gpu/drm/i915/intel_ringbuffer.c
|     both modified:   drivers/gpu/drm/i915/intel_ringbuffer.h


with things like:
| diff --cc drivers/gpu/drm/i915/intel_ringbuffer.h
| index 010750e8ee44,72edaa7ff411..000000000000
| --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
| +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
| @@@ -415,7 -436,9 +415,13 @@@ struct intel_engine_cs 
|   
|         struct intel_hw_status_page status_page;
|         struct i915_ctx_workarounds wa_ctx;
| ++<<<<<<< HEAD
|  +      struct i915_vma *scratch;
| ++=======
| +       struct i915_wa_list ctx_wa_list;
| +       struct i915_wa_list wa_list;
| +       struct i915_wa_list whitelist;
| ++>>>>>>> 517974992593... drm/i915: Allocate a common scratch page
|   
|         u32             irq_keep_mask; /* always keep these interrupts */
|         u32             irq_enable_mask; /* bitmask to enable ring interrupt */


I might have missed it, but I wasn't able to spot this commit's
being advertised on stable@ anyway, at least by manually checking:
https://www.spinics.net/lists/stable/

(hence Chris Wilson in copy.)

Since that doesn't look like things I can fix on my own, I took
inspiration from a bug reporter[1], trying to simply revert the
offending patch (which is said to be fixed by the commit message
of 5179749925933575a67f9d8f16d0cc204f98a29f[2]).

1. https://bugs.freedesktop.org/show_bug.cgi?id=108984
2. https://cgit.freedesktop.org/drm-intel/commit/?h=drm-intel-next&id=5179749925933575a67f9d8f16d0cc204f98a29f

I'm attaching a patch against the sid branch of the debian packaging,
implementing this revert; and packages built from this updated sid
branch can be found in this repository (also available over https://):

deb http://apt.debamax.com/bug-914980/ sid main

This repository is signed with my work key (itself signed by my Debian
key), to make it clear it's an independent effort at trying to get this
bug worked around for the time being, rather than something official
from the debian-kernel team. In any case, this repository will only be
kept online while this issue isn't fixed in unstable.

Test reports welcome!


For context, Tails usually tracks the Linux kernel from sid to get
regular bugfixes; we seem to have missed this regression on gen4/gen5,
so I've started checking whether the upstream fix was being queued for
linux-4.18.y, and moved to trying to get a work around once I've noticed
that wasn't the case yet. We might end up temporarily downgrading the
kernel in Tails instead, though.

https://redmine.tails.boum.org/code/issues/16224


Cheers,
--
Cyril Brulebois -- Debian Consultant @ DEBAMAX -- https://debamax.com/
commit 6b1dd9475806fa55283f9e934b2bec5fbc6e60af
Author: Cyril Brulebois <cyril@???>
Date: Sat Dec 15 16:47:55 2018 +0100

    Fix gen4/gen5 regression from 4.18.19 by reverting this upstream commit (Closes: #914980):


    - drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5


diff --git a/debian/changelog b/debian/changelog
index b3b84dda0311..095551bac297 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,7 +1,13 @@
linux (4.18.20-3) UNRELEASED; urgency=medium

+ [ Vagrant Cascadian ]
* debian/config/config: Enable Z3FOLD as a module.

+  [ Cyril Brulebois ]
+  * Fix gen4/gen5 regression from 4.18.19 by reverting this upstream
+    commit (Closes: #914980):
+    - drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
+
  -- Vagrant Cascadian <vagrant@???>  Sun, 25 Nov 2018 20:31:40 -0800


 linux (4.18.20-2) unstable; urgency=medium
diff --git a/debian/patches/bugfix/all/Revert-drm-i915-ringbuffer-Delay-after-EMIT_INVALIDA.patch b/debian/patches/bugfix/all/Revert-drm-i915-ringbuffer-Delay-after-EMIT_INVALIDA.patch
new file mode 100644
index 000000000000..1c6e111381d8
--- /dev/null
+++ b/debian/patches/bugfix/all/Revert-drm-i915-ringbuffer-Delay-after-EMIT_INVALIDA.patch
@@ -0,0 +1,104 @@
+From 8d4bf1ce648ab1b16eca14baf778cc39756afbeb Mon Sep 17 00:00:00 2001
+From: Cyril Brulebois <cyril@???>
+Date: Sat, 15 Dec 2018 16:34:38 +0100
+Subject: [PATCH] Revert "drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for
+ gen4/gen5"
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+This reverts commit 06e562e7f515292ea7721475950f23554214adde.
+
+v4.18.20 regresses at least on gen4 as seen in these bug reports:
+  https://bugs.freedesktop.org/108850
+  https://bugs.freedesktop.org/108984
+  https://bugs.debian.org/914980
+  https://redmine.tails.boum.org/code/issues/16224
+
+This patch landed in various drm-intel branches but hasn't found its way
+to linux-4.18.y yet:
+  https://patchwork.freedesktop.org/patch/265653/
+
+Trying to apply it on top of v4.18.20 triggers several conflicts, so it
+seems safer to just revert what seems to be the culprit, as confirmed by
+a user reporting this revert fixes the problem for them, and by this
+part of the commit message for the actual fix in drm-intel:
+
+    commit 5179749925933575a67f9d8f16d0cc204f98a29f
+    Author: Chris Wilson <chris@???>
+    Date:   Tue Dec 4 14:15:16 2018 +0000
+
+        drm/i915: Allocate a common scratch page
+    […]
+        Fixes: 06e562e7f515 ("drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5") # v4.18.20
+        Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108850
+    […]
+
+Signed-off-by: Cyril Brulebois <cyril@???>
+---
+ drivers/gpu/drm/i915/intel_ringbuffer.c | 38 ++-------------------------------
+ 1 file changed, 2 insertions(+), 36 deletions(-)
+
+diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
+index 72007d634359..8f19349a6055 100644
+--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
++++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
+@@ -91,7 +91,6 @@ static int
+ gen4_render_ring_flush(struct i915_request *rq, u32 mode)
+ {
+     u32 cmd, *cs;
+-    int i;
+ 
+     /*
+      * read/write caches:
+@@ -128,45 +127,12 @@ gen4_render_ring_flush(struct i915_request *rq, u32 mode)
+             cmd |= MI_INVALIDATE_ISP;
+     }
+ 
+-    i = 2;
+-    if (mode & EMIT_INVALIDATE)
+-        i += 20;
+-
+-    cs = intel_ring_begin(rq, i);
++    cs = intel_ring_begin(rq, 2);
+     if (IS_ERR(cs))
+         return PTR_ERR(cs);
+ 
+     *cs++ = cmd;
+-
+-    /*
+-     * A random delay to let the CS invalidate take effect? Without this
+-     * delay, the GPU relocation path fails as the CS does not see
+-     * the updated contents. Just as important, if we apply the flushes
+-     * to the EMIT_FLUSH branch (i.e. immediately after the relocation
+-     * write and before the invalidate on the next batch), the relocations
+-     * still fail. This implies that is a delay following invalidation
+-     * that is required to reset the caches as opposed to a delay to
+-     * ensure the memory is written.
+-     */
+-    if (mode & EMIT_INVALIDATE) {
+-        *cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
+-        *cs++ = i915_ggtt_offset(rq->engine->scratch) |
+-            PIPE_CONTROL_GLOBAL_GTT;
+-        *cs++ = 0;
+-        *cs++ = 0;
+-
+-        for (i = 0; i < 12; i++)
+-            *cs++ = MI_FLUSH;
+-
+-        *cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
+-        *cs++ = i915_ggtt_offset(rq->engine->scratch) |
+-            PIPE_CONTROL_GLOBAL_GTT;
+-        *cs++ = 0;
+-        *cs++ = 0;
+-    }
+-
+-    *cs++ = cmd;
+-
++    *cs++ = MI_NOOP;
+     intel_ring_advance(rq, cs);
+ 
+     return 0;
+-- 
+2.11.0
+
diff --git a/debian/patches/series b/debian/patches/series
index bd216f4bd898..347d8fd16191 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -100,6 +100,7 @@ bugfix/all/partially-revert-usb-kconfig-using-select-for-usb_co.patch
 bugfix/all/kbuild-include-addtree-remove-quotes-before-matching-path.patch
 debian/revert-objtool-fix-config_stack_validation-y-warning.patch
 bugfix/all/netfilter-ipvs-Fix-invalid-bytes-in-IP_VS_MH_TAB_IND.patch
+bugfix/all/Revert-drm-i915-ringbuffer-Delay-after-EMIT_INVALIDA.patch


# Miscellaneous features
features/all/kbuild-add-build-salt-to-the-kernel-and-modules.patch