
URL: https://phabricator.wikimedia.org/T416574


[MEX] [M5] [SPIKE] Investigate enabling Cypress video recording for browser tests
Open, Needs Triage, Public

Description

NOTE: The task requirements have been implemented. The task remains open for further performance observation based on the data provided.

The build times (19m 44s and 22m 06s) are comparable to the build times without the recordings - it doesn't seem to be a massive performance hit. But we would need to have more data to be sure about exactly what the impact is.


In our January 2026 retrospective, we agreed that having video recordings of failed Cypress runs would be very helpful to investigate failures. We should investigate:

Documentation: Capture Screenshots and Videos. See also wdio-mediawiki configuration.

Timebox: 8 hrs

Related Objects

Status | Assigned | Task
Open | None | T394621 [MEX] Mobile Editing Experience of Items Project
Open | None | T415326 [MEX] M5 - additional functionality and clean up
Open | None | T398036 [MEX] M5 - Testing
Resolved | karapayneWMDE | T400154 [MEX] M2 - Create tests for view files
Resolved | karapayneWMDE | T400336 [MEX] M2 - Create unit tests for mainSnak
Resolved | karapayneWMDE | T400337 [MEX] M2 - Create unit tests for statementView
Resolved | karapayneWMDE | T400338 [MEX] M2 - Create unit tests for statementSections
Resolved | karapayneWMDE | T400339 [MEX] M2 - Create unit tests for statementDetailView
Resolved | karapayneWMDE | T400340 [MEX] M2 - Create unit tests for references
Resolved | karapayneWMDE | T400341 [MEX] M2 - Create unit tests for propertyName
Resolved | karapayneWMDE | T400342 [MEX] M2 - Create unit tests for qualifiers
Resolved | karapayneWMDE | T400678 [MEX] Spike - accessibility testing
Resolved | karapayneWMDE | T400471 [MEX] Create end2end testing framework
Declined | None | T401698 [MEX] M3.4 - Update test coverage for new edit functionality
Declined | None | T401829 [MEX] Create testing framework for performance and minimum supported devices
Open | None | T416574 [MEX] [M5] [SPIKE] Investigate enabling Cypress video recording for browser tests
Declined | None | T425542 [MEX] Enable Cypress video recording for browser tests on retry
Resolved | mahmoud.abdelsattar.wmde | T415170 🚧 [MEX] E2E tests refactoring for better performance (Experimental)
Resolved | mahmoud.abdelsattar.wmde | T417859 🚧 [MEX][SPIKE] Dividing the E2E specs into multiple parallel processes to reduce the execution time
Resolved | None | T415487 🚧[MEX][Score] Introduce integration testing for the musical notation statements
Declined | None | T412190 Flaky Cypress tests: wbui2025 string datatypes (tabular-data and geo-shape)
Declined | None | T416215 Flaky Cypress test: wbui2025 add qualifiers: mobile view: is possible to add and edit a qualifier
Resolved | mahmoud.abdelsattar.wmde | T416160 Flaky Cypress test: wbui2025 time datatypes: mobile view - time datatype: allows adding time statement to empty item, displays statement and supports full editing workflow
Resolved | mahmoud.abdelsattar.wmde | T419592 [MEX] M5 - clean up debounce workarounds in editableTimeSnakValue.spec.js and editableSnakValue.spec.js

Event Timeline

Comment Actions

Change #1243718 had a related patch set uploaded (by Arthur taylor; author: Arthur taylor):

[mediawiki/extensions/Wikibase@master] Enable video recordings for cypress tests

https://gerrit.wikimedia.org/r/1243718

Comment Actions

The attached patch results in videos being saved to the build artefacts for the Cypress tests.

The build times (19m 44s and 22m 06s) are comparable to the build times without the recordings - it doesn't seem to be a massive performance hit. But we would need to have more data to be sure about exactly what the impact is.
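To make comparisons like this repeatable, the durations can be converted to seconds before averaging. The following is only a sketch: `parseDuration` and `mean` are hypothetical helper names, and the only data used are the two build times quoted above.

```typescript
// Parse a duration written like "19m 44s" into a number of seconds.
function parseDuration(text: string): number {
  const match = /^(?:(\d+)m)?\s*(?:(\d+)s)?$/.exec(text.trim());
  if (!match || (match[1] === undefined && match[2] === undefined)) {
    throw new Error(`Unrecognised duration: ${text}`);
  }
  return Number(match[1] ?? 0) * 60 + Number(match[2] ?? 0);
}

// Arithmetic mean of a non-empty list of numbers.
function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// The two build times quoted above, both with recordings enabled.
const withVideo = ["19m 44s", "22m 06s"].map(parseDuration);
console.log(withVideo);       // 1184 and 1326 seconds
console.log(mean(withVideo)); // 1255 seconds, i.e. 20m 55s
```

Two builds are far too few to say anything about the spread, which is why more data is needed.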

Comment Actions

Change #1243718 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Enable video recordings for cypress tests

https://gerrit.wikimedia.org/r/1243718

Comment Actions

Shouldn’t the investigation remain open until we’ve assessed the performance impact?

Comment Actions

@mahmoud.abdelsattar.wmde Is this ticket now a ticket about gathering that data then? If so we should probably update the description or create a subtask and re-groom it. Or what is the plan to gather job performance data?

Comment Actions

@ArthurTaylor I don't think creating a subtask is necessary; we are just trying to observe the performance impact after implementing video recording in CI.
If this task turns out to have a significant impact on CI build times, we can create a subtask for the adjustment, but that should also be based on observations from other builds.
But you are right, the description should also be updated.

Comment Actions

How will we notice if the build times are impacted? Do we have data about what the average (mean) build time (and variance?) were before we made this change?

Comment Actions

The build times (19m 44s and 22m 06s) are comparable to the build times without the recordings - it doesn't seem to be a massive performance hit. But we would need to have more data to be sure about exactly what the impact is.

Yes, based on the data you provided earlier in one of the previous comments.
The impact will be noticeable from the build times of other patches.
I don't expect this would be systematic, just a general build observation will do.

Comment Actions

Then I don't understand. We already have a build observation - that's the observation I made in my earlier comment. If we are not going to do a rigorous analysis, what additional observation do we need?

Comment Actions

I think more CI build data would be useful to make sure it is stable.
Since we also introduced some additional logic for the video recording (deleting the videos of successful tests while keeping those of failed ones), I believe we should check more builds to validate it properly.
If you believe it is stable and it is good to go as the result of the investigation, we can resolve the task.
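For readers unfamiliar with that logic, here is a minimal sketch of the decision it involves. It is not the code from the merged patch, which is not shown in this thread. The interfaces are local stand-ins for the fields of the results object that Cypress passes to an `after:spec` handler, and `videoToDelete` is a hypothetical helper name.

```typescript
// Local stand-ins for the parts of Cypress's `after:spec` results used here.
// They are declared in this sketch so that it needs no imports.
interface SpecAttempt { state: string; }
interface SpecTest { attempts: SpecAttempt[]; }
interface SpecResults { video: string | null; tests: SpecTest[]; }

// True if any attempt of any test in the spec failed.
function hadFailedAttempt(results: SpecResults): boolean {
  return results.tests.some((test) =>
    test.attempts.some((attempt) => attempt.state === "failed"));
}

// Returns the video path that is safe to delete, or null when there is
// no video or the spec had at least one failed attempt.
function videoToDelete(results: SpecResults | undefined): string | null {
  if (!results || !results.video) {
    return null;
  }
  return hadFailedAttempt(results) ? null : results.video;
}

// Assumed wiring inside setupNodeEvents (not executed in this sketch):
//   on("after:spec", (spec, results) => {
//     const path = videoToDelete(results);
//     if (path) fs.unlinkSync(path);
//   });
```

Note that deleting a video after the spec finishes only saves artefact storage. The recording itself has already happened, so this does not reduce the runtime cost.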

Comment Actions

I don't think it's stable. I think the build time varies according to the load on the CI servers.

So what is the plan to collect more build data? Is that a task that we are going to assign to someone? How many observations should they collect? What will they compare their observations to?

Comment Actions

Sure, the plan is to collect build times (and to watch the failed-test videos to confirm that the functionality works for failures). This could be assigned to anyone, possibly as a follow-up task or within the current task, which I recommend since it is an investigation task.
If we observe a consistent shift beyond the normal spread compared to the baseline, we'll treat that as a real runtime impact and adjust accordingly; otherwise, we’ll consider it CI noise.
The recommended sample is at least 15-30 CI builds from the merge date onward.
The data should be compared against the CI build data already posted in the comments.
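One way to make "a consistent shift beyond the normal spread" concrete is sketched below. The rule itself is an assumption, not something agreed in this thread: it flags an impact when the post-merge mean exceeds the baseline mean by more than `k` baseline standard deviations. The names `summarise` and `exceedsNormalSpread` and the default `k = 1` are hypothetical, and inputs are build durations in seconds.

```typescript
interface Summary {
  mean: number;
  stdDev: number; // sample standard deviation (n - 1 in the denominator)
  n: number;
}

// Mean and sample standard deviation of a list of build durations.
function summarise(seconds: number[]): Summary {
  if (seconds.length < 2) {
    throw new Error("need at least two builds to estimate the spread");
  }
  const mean = seconds.reduce((sum, v) => sum + v, 0) / seconds.length;
  const variance =
    seconds.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (seconds.length - 1);
  return { mean, stdDev: Math.sqrt(variance), n: seconds.length };
}

// Assumed rule: report an impact when the post-merge mean is more than
// k baseline standard deviations above the baseline mean.
function exceedsNormalSpread(
  baseline: number[],
  postMerge: number[],
  k: number = 1,
): boolean {
  const before = summarise(baseline);
  const after = summarise(postMerge);
  return after.mean - before.mean > k * before.stdDev;
}
```

This is a crude threshold rather than a significance test. It also cannot separate the effect of video recording from other changes made to the test suite over the same period.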

Comment Actions

Okay. And what do we think the baseline is? How will we collect data about the baseline if the patch has already been merged?

Comment Actions

The baseline would be the historical runtimes of the same CI job(s) before the merge. Even though the patch is already merged, we can still retrieve (I think) pre-merge durations from Jenkins build history. I don't think it will be going below the baseline, but that would be the perfect scenario.

Comment Actions

Hi, in the Test Platform team we are working on T420590 to decrease the feedback time from CI for developers (specifically for the mediawiki/core jobs).

And I think adding videos for all tests increases the feedback time? I couldn't fully work that out from this task. Or are we saying it doesn't add overhead or make things slower? How does Cypress record a video: does it use FFMPEG, or does it use Chrome tracing, cut out screenshots and create a video from them?

I want to make sure that the new Wikibase job that runs for core is below 10 minutes in run time. The median for March is 08:59 so far.

If the change is already merged, you can use the link I added for the median in March to dig into the numbers and see if there's a change. Let me know if you need any help.

Is it possible to either record videos only for retries or make it easy to enable them when you actually have a failing test? Cypress tests are slow in CI by default since we do not run them in parallel, and adding extra overhead is something I would like us to avoid.

Comment Actions

Hi @Peter,

Thanks for your work speeding up the tests - it's very much appreciated!

When we enabled recordings for Cypress, we did understand that there would be a performance hit, and I have to admit we don't know exactly what the hit is. I think it does use FFMPEG - we deliberately disabled compression on the videos to minimise the performance hit.
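For reference, the two Cypress options involved are `video` and `videoCompression`. The fragment below is a sketch of how they could be set, written as a plain object so it stands alone. The actual configuration from the merged patch is not shown in this thread, and `videoOptions` is a hypothetical name.

```typescript
// Sketch of the relevant Cypress configuration options. In a real project
// these would sit in the Cypress configuration file alongside the other settings.
const videoOptions = {
  video: true,             // record a video for each spec file
  videoCompression: false, // skip the compression pass after each spec
};

console.log(videoOptions);
```

Skipping compression trades larger artefacts for less processing time after each spec.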

It's a bit difficult to measure the performance hit from the video change in isolation - we've been extending the test suite during that time, and also making other performance optimisations to reduce the runtime. If you have any other data about the performance hit for video recording we would be very interested to have it as that would inform our choices here.

It might be possible just to record videos for retries, but at least the documentation I've seen only describes techniques for deleting videos when specs pass - I didn't find anything there about only enabling recordings for retries.

Comment Actions

Noting that part of our motivation for turning on video recordings was painful experiences with flaky tests that seemed tough-to-impossible to debug from the final screenshot alone. So I don’t think “making it easy to enable [video recordings] when you actually have failing test[s]” would work very well, because it would mean we mostly don’t have recordings for the flaky tests. On the other hand, recording videos for retries would be fine, I think (either the retry still fails and we have a useful video, or it passes and then we don’t really care because it didn’t block CI).
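The "either the retry still fails, or it passes and we don't care" reasoning can be written as a retention rule: keep a spec's video only if some test was still failing on its final attempt. This is a sketch under assumptions. The interfaces are local stand-ins for the per-test attempt data Cypress reports, `keepVideoForSpec` is a hypothetical name, and the rule only decides which videos to keep. It does not restrict recording to retries, so it would not reduce recording overhead.

```typescript
// Local stand-ins for the per-test attempt data; no imports needed.
interface RetryAttempt { state: string; }
interface RetriedTest { attempts: RetryAttempt[]; }

// True if the test's last attempt failed, i.e. retries did not rescue it.
function failedAfterRetries(test: RetriedTest): boolean {
  const last = test.attempts[test.attempts.length - 1];
  return last !== undefined && last.state === "failed";
}

// Keep the spec's video only if at least one test ended in failure.
function keepVideoForSpec(tests: RetriedTest[]): boolean {
  return tests.some(failedAfterRetries);
}
```

Under this rule a test that fails once and then passes on retry leaves no video behind, which matches the view that such a run did not block CI.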

Arian_Bozorg renamed this task from [MEX] Investigate enabling Cypress video recording for browser tests to [MEX] [M5] [SPIKE] Investigate enabling Cypress video recording for browser tests. Apr 7 2026, 9:00 AM
Arian_Bozorg updated the task description.
Comment Actions

Given that the scope of this ticket was delivered on and that generally this ticket has a lot of conversation in it, I've broken out the next step to a new subtask T425542: [MEX] Enable Cypress video recording for browser tests on retry.
