Discussion:
web-platform-tests that fail only in Firefox (from wpt.fyi data)
Philip Jägenstedt
2018-10-11 20:22:37 UTC
Hi all,

I sent the results of some investigation to webkit-dev [1] today and thought you might be interested in taking a look at the equivalent list for Firefox.

https://gist.github.com/foolip/a77c88e62aa3cfc461c2879f3e5d4855 is a
list of tests that fail in Firefox Nightly, but pass in stable
versions of Chrome, Edge and Safari. Although not all of them will be high-value or really impact web developers, these are probably more valuable to fix than a random WPT failure. Triage and prioritization are required, of course.
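
If you want to poke at the methodology, the core of my analysis boils down to something like the sketch below. To be clear, this is an illustration rather than my actual script: the file names are made up, it assumes one wptreport-style JSON file per browser run, and for brevity it only looks at each test's top-level status.

    import { readFileSync } from "fs";

    // Shape of a wptreport-style results file (one per browser run).
    type Report = { results: { test: string; status: string }[] };

    // Map each test path to its top-level status.
    function statuses(path: string): Map<string, string> {
      const report: Report = JSON.parse(readFileSync(path, "utf8"));
      const byTest = new Map<string, string>();
      for (const result of report.results) {
        byTest.set(result.test, result.status);
      }
      return byTest;
    }

    // Made-up file names, one run per browser.
    const firefox = statuses("firefox-nightly.json");
    const others = ["chrome-stable.json", "edge-stable.json", "safari-stable.json"]
      .map(statuses);

    // Keep only the clean splits: FAIL in Firefox, PASS everywhere else.
    for (const [test, status] of firefox) {
      if (status === "FAIL" && others.every((run) => run.get(test) === "PASS")) {
        console.log(test);
      }
    }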

Skimming the list, I'd guess that css-flexbox, css-grid, fetch and
streams might be the most worth digging into.
cors-cookies-redirect.any.html, for example, seems like something that
could matter in the real world.

Making this part of the wpt.fyi UI is a current priority [2], but I thought this one-off analysis might still be useful to y'all.

[1] https://lists.webkit.org/pipermail/webkit-dev/2018-October/030209.html
[2] https://github.com/web-platform-tests/wpt.fyi/issues/201
Boris Zbarsky
2018-10-11 20:34:19 UTC
Post by Philip Jägenstedt
https://gist.github.com/foolip/a77c88e62aa3cfc461c2879f3e5d4855 is a
list of tests that fail in Firefox Nightly, but pass in stable
versions of Chrome, Edge and Safari.
Or more precisely have some sub-test that has that property, right?

Thank you for putting this list together.

-Boris
Philip Jägenstedt
2018-10-13 07:17:01 UTC
Post by Boris Zbarsky
Post by Philip Jägenstedt
https://gist.github.com/foolip/a77c88e62aa3cfc461c2879f3e5d4855 is a
list of tests that fail in Firefox Nightly, but pass in stable
versions of Chrome, Edge and Safari.
Or more precisely have some sub-test that has that property, right?
Right. Since there's no way to link to a subtest, in those cases I've linked to the test, and it might take some work to spot which subtest it was. If this is a problem, I could improve the report.
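
In the meantime, if you have wptreport-style output for each browser, spotting the lone failing subtests mechanically would look something like this sketch (the data shape and function are illustrative, not my actual script):

    // test -> (subtest name -> status), as could be built from the
    // "subtests" arrays in wptreport output.
    type SubtestStatuses = Map<string, Map<string, string>>;

    function loneFailingSubtests(
      firefox: SubtestStatuses,
      others: SubtestStatuses[],
    ): string[] {
      const hits: string[] = [];
      for (const [test, subtests] of firefox) {
        for (const [name, status] of subtests) {
          // The same conservative split as for whole tests: FAIL in
          // Firefox, PASS in all three other browsers.
          const passesElsewhere = others.every(
            (run) => run.get(test)?.get(name) === "PASS",
          );
          if (status === "FAIL" && passesElsewhere) {
            hits.push(`${test}: ${name}`);
          }
        }
      }
      return hits;
    }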

Thanks for filing the tracking bug. I hope there are some failures in here that point to problems that really affect web developers and can be fixed.
Philip Jägenstedt
2018-10-13 07:27:09 UTC
Post by Philip Jägenstedt
Post by Boris Zbarsky
Post by Philip Jägenstedt
https://gist.github.com/foolip/a77c88e62aa3cfc461c2879f3e5d4855 is a
list of tests that fail in Firefox Nightly, but pass in stable
versions of Chrome, Edge and Safari.
Or more precisely have some sub-test that has that property, right?
Right. Since there's no way to link to a subtest, in those cases I've linked to the test, and it might take some work to spot which subtest it was. If this is a problem, I could improve the report.
Thanks for filing the tracking bug. I hope there are some failures in here that point to problems that really affect web developers and can be fixed.
There's another caveat worth mentioning. Tests can be definitely passing or definitely failing, but then there are various crash/error/timeout/etc. results where the validity of the test is uncertain, or it's quite likely to be a flake or infra issue. In my report I've been conservative and used 1 FAIL + 3 PASS as the criterion. Fiddling with these rules can reveal lots more potential issues, and if you like I could provide reports on that too.
Emilio Cobos Álvarez
2018-10-17 00:23:10 UTC
Hi Philip,

Do you know how the reftests are run in order to get that data?

I'm particularly curious about this Firefox-only failure:

css/selectors/selection-image-001.html

It passes both on our automation and locally. I'm curious because I was
the author of that test (whoops) and the Firefox fix (bug 1449010).

Does it use the same mechanism as our automation to wait for image decodes and such? Is there any way to see the test images?

IIRC one potential difference here is that Firefox blocks the load event for image loads but, unlike other browsers, doesn't decode images synchronously, so we may fire the load event but not paint the image. Our reftest harness uses internal APIs to ensure that the screenshot is taken with all the images decoded.
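
(For concreteness, an explicit wait in a wpt reftest would look something like the sketch below, using the standard reftest-wait convention; the selector is just illustrative. Whether the wpt.fyi runs do the equivalent internally is exactly what I'm unsure about.)

    // Assumes the test's root element starts out with
    // class="reftest-wait", so the screenshot is only taken once the
    // class has been removed.
    const img = document.querySelector("img") as HTMLImageElement;
    img.decode().then(() => {
      // decode() resolves only once the image data is decoded, so the
      // screenshot can't observe an unpainted image.
      document.documentElement.classList.remove("reftest-wait");
    });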

I suspect that can't be the cause of this test failure, since the image is really small and I would've expected it to get synchronously decoded anyway (we sync-decode if fast by default), but I'm no expert on how wpt.fyi is set up, hence the curiosity. I'd love to be able to see the screenshots of that test.

Thanks in advance,

-- Emilio
Post by Philip Jägenstedt
Post by Philip Jägenstedt
Post by Boris Zbarsky
Post by Philip Jägenstedt
https://gist.github.com/foolip/a77c88e62aa3cfc461c2879f3e5d4855 is a
list of tests that fail in Firefox Nightly, but pass in stable
versions of Chrome, Edge and Safari.
Or more precisely have some sub-test that has that property, right?
Right. Since there's no way to link to a subtest, in those cases I've linked to the test, and it might take some work to spot which subtest it was. If this is a problem, I could improve the report.
Thanks for filing the tracking bug. I hope there are some failures in here that point to problems that really affect web developers and can be fixed.
There's another caveat worth mentioning. Tests can be definitely passing or definitely failing, but then there are various crash/error/timeout/etc. results where the validity of the test is uncertain, or it's quite likely to be a flake or infra issue. In my report I've been conservative and used 1 FAIL + 3 PASS as the criterion. Fiddling with these rules can reveal lots more potential issues, and if you like I could provide reports on that too.
James Graham
2018-10-17 09:12:51 UTC
Post by Emilio Cobos Álvarez
Hi Philip,
Do you know how the reftests are run in order to get that data?
  css/selectors/selection-image-001.html
It passes both on our automation and locally. I'm curious because I was
the author of that test (whoops) and the Firefox fix (bug 1449010).
Does it use the same mechanism as our automation to wait for image decodes and such? Is there any way to see the test images?
It's using the same harness as we use in gecko, so it should be giving
the same results, but of course it's possible that there's some
difference in the configuration that could cause different results for
some tests.

Unfortunately there isn't yet a way to see the images; because of the
number of failures per run, and the number of runs, putting all the
screenshots in the logs would be prohibitively large, but there is a
plan to start uploading previously unseen screenshots to wpt.fyi [1]

Having said that, the infrastructure is all containerised, and it's possible to repeat the run locally with relatively little effort. I'm happy to help out with that if you like.

[1] https://github.com/web-platform-tests/wpt.fyi/issues/57
James Graham
2018-10-17 09:56:03 UTC
Post by James Graham
Post by Emilio Cobos Álvarez
Hi Philip,
Do you know how the reftests are run in order to get that data?
   css/selectors/selection-image-001.html
It passes both on our automation and locally. I'm curious because I
was the author of that test (whoops) and the Firefox fix (bug 1449010).
Does it use the same mechanism as our automation to wait for image decodes and such? Is there any way to see the test images?
It's using the same harness as we use in gecko, so it should be giving
the same results, but of course it's possible that there's some
difference in the configuration that could cause different results for
some tests.
Unfortunately there isn't yet a way to see the images; because of the
number of failures per run, and the number of runs, putting all the
screenshots in the logs would be prohibitively large, but there is a
plan to start uploading previously unseen screenshots to wpt.fyi [1]
OK, I investigated this and it turns out that we accidentally started
uploading tbpl-style logs with screenshots for full runs when we turned
on taskcluster for PRs. So the screenshot is available through

https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://taskcluster-artifacts.net/U6OIGr7ZTjurDYjy_KgyCg/0/public/results/log_tbpl.log
Emilio Cobos Álvarez
2018-10-17 12:03:15 UTC
Post by James Graham
Post by James Graham
Post by Emilio Cobos Álvarez
Hi Philip,
Do you know how the reftests are run in order to get that data?
   css/selectors/selection-image-001.html
It passes both on our automation and locally. I'm curious because I
was the author of that test (whoops) and the Firefox fix (bug 1449010).
Does it use the same mechanism as our automation to wait for image decodes and such? Is there any way to see the test images?
It's using the same harness as we use in gecko, so it should be giving
the same results, but of course it's possible that there's some
difference in the configuration that could cause different results for
some tests.
Unfortunately there isn't yet a way to see the images; because of the
number of failures per run, and the number of runs, putting all the
screenshots in the logs would be prohibitively large, but there is a
plan to start uploading previously unseen screenshots to wpt.fyi [1]
OK, I investigated this and it turns out that we accidentally started
uploading tbpl-style logs with screenshots for full runs when we turned
on taskcluster for PRs. So the screenshot is available through
https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://taskcluster-artifacts.net/U6OIGr7ZTjurDYjy_KgyCg/0/public/results/log_tbpl.log
Thanks! So it looks like the reftest screenshots are taken on inactive windows?

We don't respect ::selection for inactive windows, so the failure now
makes sense.

Still, I think there's something fishy there, but it may be related to the widget toolkit used on wpt's CI or something...

-- Emilio
Philip Jägenstedt
2018-10-17 13:10:23 UTC
Post by Emilio Cobos Álvarez
Post by James Graham
Post by James Graham
Post by Emilio Cobos Álvarez
Hi Philip,
Do you know how the reftests are run in order to get that data?
css/selectors/selection-image-001.html
It passes both on our automation and locally. I'm curious because I
was the author of that test (whoops) and the Firefox fix (bug 1449010).
Does it use the same mechanism as our automation to wait for image decodes and such? Is there any way to see the test images?
It's using the same harness as we use in gecko, so it should be giving
the same results, but of course it's possible that there's some
difference in the configuration that could cause different results for
some tests.
Unfortunately there isn't yet a way to see the images; because of the
number of failures per run, and the number of runs, putting all the
screenshots in the logs would be prohibitively large, but there is a
plan to start uploading previously unseen screenshots to wpt.fyi [1]
OK, I investigated this and it turns out that we accidentally started
uploading tbpl-style logs with screenshots for full runs when we turned
on taskcluster for PRs. So the screenshot is available through
https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://taskcluster-artifacts.net/U6OIGr7ZTjurDYjy_KgyCg/0/public/results/log_tbpl.log
Thanks! So it looks like the reftest screenshots are taken on inactive windows?
We don't respect ::selection for inactive windows, so the failure now
makes sense.
Still, I think there's something fishy there, but it may be related to the widget toolkit used on wpt's CI or something...
Thanks, James, for accidentally storing screenshots in Taskcluster logs and for figuring out how to use them with reftest-analyzer. That's great, and I'll pass this tip along to blink-dev as well :D
Boris Zbarsky
2018-10-17 21:53:31 UTC
Post by Philip Jägenstedt
Fiddling with these rules can reveal lots
more potential issues, and if you like I could provide reports on that too.
I would be pretty interested in that, yes. In particular, a report
where there is 1 "not PASS and not FAIL" and 3 "PASS" would be pretty
helpful, I suspect.

By the way, I recently found some tests that fail when run directly but pass in the harness. :( For example,
http://w3c-test.org/html/infrastructure/common-dom-interfaces/collections/htmlallcollection.html
fails various subtests in all browsers due to the <div id="log"> being in the DOM when running directly. Not really sure what we can do with that.

-Boris
Philip Jägenstedt
2018-10-19 12:42:10 UTC
Post by Boris Zbarsky
Post by Philip Jägenstedt
Fiddling with these rules can reveal lots
more potential issues, and if you like I could provide reports on that too.
I would be pretty interested in that, yes. In particular, a report
where there is 1 "not PASS and not FAIL" and 3 "PASS" would be pretty
helpful, I suspect.
Rerunning my script, it's apparent that unreliable Edge results [1] lead to the same tests being considered lone failures for the other browsers or not, depending on the run. So, I've used the same set of runs for this report of what you suggested:
https://gist.github.com/foolip/e6014c9bcc8ca405219bf18542eb5d69

It's not a long list, so I checked them all, and they are all timeouts. This is sometimes the failure mode for genuine problems, so looking over these might be valuable.
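
For reference, the rule change itself is tiny; a sketch of the predicate, reusing the same per-test statuses as in my earlier sketch:

    // The relaxed variant: Firefox reports something that is neither
    // PASS nor FAIL (crash/error/timeout/...) while the three other
    // browsers cleanly PASS.
    function isLoneAnomaly(firefoxStatus: string, otherStatuses: string[]): boolean {
      return (
        firefoxStatus !== "PASS" &&
        firefoxStatus !== "FAIL" &&
        otherStatuses.every((status) => status === "PASS")
      );
    }
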
Post by Boris Zbarsky
By the way, I recently found some tests that fail when run directly but pass in the harness. :( For example,
http://w3c-test.org/html/infrastructure/common-dom-interfaces/collections/htmlallcollection.html
fails various subtests in all browsers due to the <div id="log"> being in the DOM when running directly. Not really sure what we can do with that.
That's a bit odd; the <div id="log"> is in the markup and would be there both when running manually and under automation. Are you sure that explains the difference? If it does, then just removing it from the markup and adapting any affected tests would be the way to go. I updated the test pretty recently; if you're confident it's broken, can you file a wpt issue and assign it to me?

[1] https://github.com/web-platform-tests/results-collection/issues/563
Boris Zbarsky
2018-10-19 15:52:36 UTC
Post by Philip Jägenstedt
That's a bit odd; the <div id="log"> is in the markup and would be there both when running manually and under automation. Are you sure that explains the difference?
Yes. I filed https://github.com/web-platform-tests/wpt/issues/13625

-Boris

Boris Zbarsky
2018-10-11 20:36:54 UTC
I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1498357 to track
these failures.

-Boris