Write fewer, larger tests, because these tests are going to be extremely expensive and fragile. Make assertions that are very permissive, e.g. assert that you see a success message, but not what the message says, because there are a million ways for these tests to fail. Someone changes some wording in a message and tests fail. Someone changes the DOM structure and the tests fail. Someone changes a classname and the tests fail. Some async operation takes longer than expected or resolves in some unexpected order, and the tests fail. But in all those situations, the thing you are testing is still correct; it's just that the way the test checks it is fragile, and the thing you are testing is volatile. So you want the tests to only assert the most important things, and the art of it is figuring out how permissive you want to be. You want to be strict enough that the tests don't accidentally pass when they should fail, but still permissive enough that they can tolerate irrelevant volatility.
Additionally, if the people writing the code aren't the ones writing these test suites, then they may write it in ways that make it difficult to test. So communicate the difficulties to them, and maybe send PRs to add things you need for the test suite. For example, it is often worthwhile to add test-specific endpoints to manipulate the state of the app. These endpoints would only exist in the dev and test environments, not in prod, and they might do things like log you in, give you information about your session, or put some data into a specific state that you want to test. Otherwise each test has to log in and create its data before it can even begin testing, so test setup alone can start taking 30 seconds. And if you then also make the mistake of writing unit-test-style tests, you might have 20 tests, each testing some small thing and each requiring 30 seconds of setup, and now you have a 10-minute test suite, which is incredibly painful. Test-specific endpoints let you reduce that setup cost, and batching your tests/assertions together lets you share it.
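A minimal sketch of the test-only-endpoints idea, using a hand-rolled route map so it stays framework-agnostic. The route paths, the `sessions` store, and the request shapes are all invented for illustration; the important part is the environment guard.

```javascript
const routes = new Map();   // stand-in for your framework's router
const sessions = new Map(); // stand-in for your app's session store

function registerTestRoutes() {
  // Guard: these endpoints must never exist in production builds.
  if (process.env.NODE_ENV === 'production') return;

  // Log a user in directly, skipping the whole UI login flow.
  routes.set('POST /__test__/login', (body) => {
    sessions.set(body.user, { user: body.user, loggedInAt: Date.now() });
    return { ok: true };
  });

  // Seed app data into a known state before a test runs.
  routes.set('POST /__test__/seed', (body) => ({ ok: true, seeded: body }));
}

registerTestRoutes();
```

A test can now hit `/__test__/login` once instead of driving the login form through the browser every time, which is where most of that 30-second setup goes.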
Test failures will be difficult to diagnose, so use your CI environment's "artifacts" (that's what most CI systems call them): a way for your test run to export relevant data. For example, you might record a video of the test and export that, or capture a screenshot and the state of the DOM on failure and export those. The goal of these artifacts is mostly to give you the context to understand why the tests fail.
Async will be your enemy. If you depend on some async thing, you may need to modify the code to expose promises you can .then on, or to let your tests provide callbacks. In general, modifications to code for the sake of tests make the code better, but keep security in mind: ask yourself whether the change you're making could be exploited. You might modify the code to emit events, and then your tests could listen for those events (I haven't tried this, but it's probably a reasonable way for the code and tests to collaborate without direct dependencies on each other). If you can't easily hook into events or promises, you can poll for changes, but this is often fragile. E.g. if a flash message pops up and then fades, your poll needs to run during that short window, otherwise it will look like the message never appeared. Polling is generally slow and annoying, but it can work where other things won't. Using sleeps/timeouts to try to bypass race conditions from multiple async things is similar to polling, but know that it is extremely fragile. You will almost certainly have to tweak the duration a whole bunch of times, and then you'll notice you're waiting 10 seconds before making your assertion when the page actually resolved after 200ms, so you'll reduce the duration, but then sometimes the page takes 8 seconds. You're stuck between asserting too early and failing where you should pass, and wasting time waiting when the page has already resolved. This is why polling is better than sleeping, and collaborating (letting the test hook into the code's promises or events) is better than polling. But don't be afraid to start with the easy things like sleeping and polling, and then move to something more reliable later. In general, avoid introducing abstractions and patterns until you have several places that can use them, otherwise you'll probably introduce the wrong one.
Definitely create abstractions for your tests. Repeating a very similar set of steps a lot? Toss them in a function. As the test suite grows in complexity, consider page objects: wrap a complex, volatile interface with a small, well-defined one that you can call. Then all your code depends on the small, well-defined interface, and whenever the volatile interface changes, you only have to update that one wrapper, and all the places that use it work again. This is much more maintainable than updating every place that touches the volatile interface directly. But still, don't introduce the abstractions until after you have a few places that want to use them. 3 is probably the magic number: 3 call sites let you see what you actually need, rather than speculating, and prevent you from baking one use-case's context into the abstraction.
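A sketch of a page object, under the assumption that `driver` is whatever browser-automation API you're using (here reduced to `type` and `click`). The selectors and class name are invented; the point is that volatile details live in exactly one place.

```javascript
// Volatile details (selectors) are confined to this one module.
const SELECTORS = {
  email: '#login-email',
  password: '#login-password',
  submit: 'button[data-test=login-submit]',
};

class LoginPage {
  constructor(driver) {
    this.driver = driver; // injected automation API: { type, click }
  }

  // The small, stable interface that tests depend on.
  async logIn(email, password) {
    await this.driver.type(SELECTORS.email, email);
    await this.driver.type(SELECTORS.password, password);
    await this.driver.click(SELECTORS.submit);
  }
}

// Tests call page.logIn(...) everywhere; when the markup changes, you edit
// SELECTORS once and every test keeps working.
const calls = [];
const stubDriver = {
  type: async (sel, val) => calls.push(['type', sel, val]),
  click: async (sel) => calls.push(['click', sel]),
};
```

Injecting a recording stub like `stubDriver` also makes the page object itself cheap to sanity-check without a browser.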
Okay, I think those are my thoughts for now. If you experience pain and issues that I haven't addressed here, hit me up and I'll think through them with you