It's quite difficult to cover every topic in frontend optimization in a single article, so I'm going to write a series of them:
The first 2 parts of the article are republished from my dev.to account: https://dev.to/xnimorz/hitchhiker-s-guide-to-frontend-performance-optimization-4607 However, I'm going to extend it.
I'm Nik and I'm a frontend developer. Besides writing code, I was a mentor at HeadHunter's developers school: https://school.hh.ru/
We recorded our lectures in 2018-2019. They are publicly available on our YouTube channel (in Russian). Here is the playlist: https://www.youtube.com/watch?v=eHWMtfqxjes&list=PLGn25JCaSSFQQOab_xMXI3vJ0tDUkFaCI
However, we didn't record the lectures for the 2019-2020 school year. I gave a talk dedicated to frontend performance optimization and afterwards decided to write an article based on the material. As the lecture was 3 hours long, I divided the article into 2 parts.
This longread could be useful as a handbook. We will cover:
The rest of the topics from my lecture will be in the second article, which will cover layout, reflow, repaint, composite, and their optimization.
0.1 seconds is the window within which we perceive a connection between our mouse click or key press and the resulting change in the application or interface.
Almost everybody has seen lag when typing text while the interface is still handling the previous word. A similar problem exists with button clicks. Good UX helps me here; it tells me: "Okay, just a moment and everything will be done". The latest example I had was when I tried to remove a huge number of emails through the web version of an email service (let it stay anonymous). When I selected the emails and clicked the "remove" button, nothing happened. At that moment I couldn't tell whether I had misclicked or the interface was lagging. It was the latter :) It is frustrating. I want a responsive interface.
Why 0.1 seconds? The key is that our consciousness links our actions to specific changes on the website, and 100 ms is about the right interval for that.
Let me show an example. Here is the video clip for "Hurricane" by 30 Seconds to Mars (be careful, it is explicit and has some NSFW parts; you can open the clip at 9:30 and catch the frames we are talking about during the next 30 seconds): https://www.youtube.com/watch?v=MjyvlD0TwiA This clip has several moments when a screen appears for only 1-2 frames. Our consciousness not only registers this screen but also (partly) recognizes its content.
1 second is a perfect time to load a site. Users perceive browsing as smooth in this case. If your service loads within 1 second, you are awesome! Unfortunately, the situation is usually different.
Let's count what we have to do when a user navigates to our site: network round trips, backend processing, microservice queries (usually), DB queries, templating, data processing on the client side (we are going to talk about it today), static resource loading, script initialization. Summing up: it's painful.
That's why 1 second is usually an ideal to aim for.
10 seconds. Many analytics reports tell us that people spend about 30 seconds on a website on average. A site that takes 5 seconds to load consumes 1/6 of the user's time; one that takes 10 seconds consumes a third.
The next numbers are 1 minute and 10 minutes. 1 minute is a perfect time to complete a small task on a site, such as reading product info or registering. Why only a minute? These days we don't spend much time concentrating on one thing. We shift our attention pretty often.
When a user has spent 10 minutes on a site, it means they at least tried to solve their problem. They compared plans, placed an order, etc.
Big companies have good analytics for performance metrics:
The last motivator is from Wikipedia:
https://twitter.com/wikipedia/status/585186967685619712
Let's go further:
Let's run a Lighthouse check on hh.ru. It looks pretty bad (note that this is the mobile configuration of Lighthouse):
Here we have 2 traditional questions:
Who's to blame for this? :) (though it's better to ask why we ended up here)
What do we do with it?
Spoiler: there won't be a picture of how good our metrics became at the end.
We have 3 common scenarios:
Talking about first-page loading, there are 2 most important stages of page readiness from the user's point of view: FMP (First Meaningful Paint) and TTI (Time to Interactive):
FMP tells users that the text is there and they can start consuming content (unless, of course, you are Instagram or YouTube).
TTI === the site is ready to work. Scripts are downloaded, initialized, all resources are ready.
The most important metric for HeadHunter (hh.ru) is FMP, because the typical applicant behavior is to open the vacancy search and then open each vacancy in a new tab, so that they can read them one by one and decide whether to apply.
With some nuances, FMP is one of the best metrics for measuring a website's critical render path. The critical render path is the set of actions and resources that the browser has to download and process before it can show the first result that is useful to the user. The minimal set of resources we have to download is HTML, CSS stylesheets, and blocking JS scripts.
TL;DR:
Make a navigate request (DNS resolution, TCP connection, etc.).
Receive the HTML document.
Parse the HTML.
Build the DOM (Document Object Model).
Send requests to download blocking resources (this runs in parallel with the previous step).
Receive the blocking resources, especially the CSS. If there is blocking JS code, execute it.
Rebuild the DOM if needed (especially if blocking JS mutates the DOM).
Build the CSSOM tree.
Build the Render tree.
Draw the page (Layout ⇒ Paint ⇒ Composite).
Note: reflow can also run during the earlier stages, because JS can force it. We will cover this in the second article.
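To make the idea of a critical render path a bit more tangible, here is a minimal sketch that picks the critical resources out of a page and estimates when rendering can start. Everything in it is an assumption for illustration: the `Resource` shape, the URLs, the timings, and the simplification that all blocking resources download in parallel once the HTML has arrived. Real browsers are more subtle than this.

```typescript
// Hypothetical model of a page resource; names and timings are illustrative.
interface Resource {
  url: string;
  kind: "html" | "css" | "js";
  downloadMs: number;
  // true for stylesheets and for scripts without `async` / `defer`
  blocking: boolean;
}

// Resources the browser needs before it can show the first result.
function criticalResources(resources: Resource[]): Resource[] {
  return resources.filter((r) => r.kind === "html" || r.blocking);
}

// Simplified estimate: the HTML arrives first, then all blocking resources
// download in parallel, so the slowest one decides when rendering can start.
function estimateRenderStartMs(resources: Resource[]): number {
  const critical = criticalResources(resources);
  const htmlMs = critical
    .filter((r) => r.kind === "html")
    .reduce((sum, r) => sum + r.downloadMs, 0);
  const blockingMs = critical
    .filter((r) => r.kind !== "html")
    .map((r) => r.downloadMs);
  return htmlMs + Math.max(0, ...blockingMs);
}

const page: Resource[] = [
  { url: "/index.html", kind: "html", downloadMs: 200, blocking: true },
  { url: "/main.css", kind: "css", downloadMs: 150, blocking: true },
  { url: "/vendor.js", kind: "js", downloadMs: 400, blocking: true },
  { url: "/analytics.js", kind: "js", downloadMs: 900, blocking: false },
];

const criticalCount = criticalResources(page).length;
const renderStartMs = estimateRenderStartMs(page);
console.log(criticalCount, renderStartMs);
```

In this model the non-blocking `/analytics.js` does not delay rendering at all, even though it is the slowest download on the page.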
Request
The browser resolves DNS, opens a TCP connection, and sends the request. Bytes travel through the sockets, and the server receives the request.
Response
The backend handles the request and writes bytes into the socket. We receive a response like this:
We receive a bunch of bytes and decode them into a string because of the `text/html` data type. An interesting detail: the browser marks the first request for a page as a "navigate" request. You can see this if you subscribe to the `fetch` event in a ServiceWorker. After receiving the data, the browser has to parse it and build the DOM.
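Here is a small sketch of how that check might look. The `RequestLike` type and the `isNavigationRequest` helper are hypothetical names I introduce so the snippet also runs outside a worker; the only thing taken from the platform is that a request exposes a `mode` property, which is `"navigate"` for page navigations.

```typescript
// Minimal structural type (an assumption) so the sketch runs outside a worker.
interface RequestLike {
  url: string;
  mode: string; // e.g. "navigate", "cors", "no-cors", "same-origin"
}

// Hypothetical helper: true when the browser is loading a page,
// false for subresources and API calls.
function isNavigationRequest(request: RequestLike): boolean {
  return request.mode === "navigate";
}

// In a real ServiceWorker the check would sit inside the fetch listener:
//
//   self.addEventListener("fetch", (event) => {
//     if (isNavigationRequest(event.request)) {
//       console.log("navigation to", event.request.url);
//     }
//   });

const pageLoad: RequestLike = { url: "https://example.com/", mode: "navigate" };
const apiCall: RequestLike = { url: "https://example.com/api", mode: "cors" };

console.log(isNavigationRequest(pageLoad), isNavigationRequest(apiCall));
```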
DOM
We receive a string or a stream. At this stage, the browser parses it and transforms the string into a special object (the DOM):
This is only a skeleton. At this point, the browser knows nothing about styles, so it doesn't know how to render the page.
Downloading of Blocking resources
Browsers process HTML synchronously. Each resource, whether CSS or JS, can be downloaded synchronously or asynchronously. When we download a resource synchronously, we block the rest of DOM processing until we receive it. That's why people recommend putting blocking JavaScript (without the `defer` and `async` attributes) right before the closing body tag.
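The effect of that recommendation can be illustrated with a toy model of the parser. This is only a sketch under strong assumptions: the `ParserToken` type, the timings, and the rule that `async` and `defer` scripts never pause parsing are all simplifications of mine (in a real browser an `async` script can still interrupt the parser when it executes).

```typescript
// Hypothetical, simplified view of what the HTML parser walks through.
type ParserToken =
  | { kind: "html"; parseMs: number }
  | {
      kind: "script";
      mode: "blocking" | "async" | "defer";
      downloadMs: number;
      execMs: number;
    };

// Returns the moment (ms) at which the last HTML chunk has been parsed.
// Only blocking scripts pause the parser in this model.
function contentParsedAtMs(tokens: ParserToken[]): number {
  let clock = 0;
  let lastHtmlDoneAt = 0;
  for (const token of tokens) {
    if (token.kind === "html") {
      clock += token.parseMs;
      lastHtmlDoneAt = clock;
    } else if (token.mode === "blocking") {
      // The parser waits for both the download and the execution.
      clock += token.downloadMs + token.execMs;
    }
  }
  return lastHtmlDoneAt;
}

const content: ParserToken = { kind: "html", parseMs: 100 };

const scriptInHead = contentParsedAtMs([
  { kind: "script", mode: "blocking", downloadMs: 300, execMs: 50 },
  content,
]);
const scriptBeforeBodyEnd = contentParsedAtMs([
  content,
  { kind: "script", mode: "blocking", downloadMs: 300, execMs: 50 },
]);
const deferredInHead = contentParsedAtMs([
  { kind: "script", mode: "defer", downloadMs: 300, execMs: 50 },
  content,
]);

console.log(scriptInHead, scriptBeforeBodyEnd, deferredInHead);
```

With the same script and the same markup, the content is parsed much later when the blocking script comes first than when it sits at the end of the body or is deferred.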
So each time the browser reaches a blocking resource, it makes a request, parses the response, and so on. There are some limitations here, such as the maximum number of simultaneous requests per domain.
After all blocking resources are received, we can build the CSSOM.
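To get a feeling for what a per-domain limit costs, here is a rough sketch that schedules downloads over a fixed number of connections. The `totalDownloadMs` function is a hypothetical helper, and the limit of 6 used in the example is an assumption: it is the value commonly cited for HTTP/1.1 connections per host in desktop browsers, and the actual number depends on the browser and the protocol.

```typescript
// Returns the time (ms) at which the last download finishes when at most
// `maxParallel` requests can be in flight at once. Downloads start in order,
// each on the connection that becomes free first.
function totalDownloadMs(durationsMs: number[], maxParallel: number): number {
  if (maxParallel < 1) {
    throw new Error("maxParallel must be at least 1");
  }
  // freeAt[i] is the time at which connection i becomes available.
  const freeAt: number[] = [];
  for (let i = 0; i < maxParallel; i++) {
    freeAt.push(0);
  }
  for (const duration of durationsMs) {
    const earliest = freeAt.indexOf(Math.min(...freeAt));
    freeAt[earliest] += duration;
  }
  return Math.max(...freeAt);
}

// Eight resources of 100 ms each from the same domain.
const eightResources = [100, 100, 100, 100, 100, 100, 100, 100];

const withLimitOfSix = totalDownloadMs(eightResources, 6);
const withoutQueueing = totalDownloadMs(eightResources, 8);

console.log(withLimitOfSix, withoutQueueing);
```

In this model the seventh and eighth resources have to wait for a free connection, so the whole batch takes twice as long as it would without queueing.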
CSSOM
Let's suppose that, besides the `meta` and `title` tags, we have a `style` or `link` tag. Now the browser merges the DOM and the CSS and builds an object model for the CSS:
The left part of the object (`head` and its children) isn't interesting for the CSSOM, as it won't be shown to the user. For the rest of the nodes, we define the styles that the browser will apply.
CSSOM is important, as it helps us to form RenderTree.
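As an illustration of "defining styles for nodes", here is a deliberately tiny sketch that attaches styles to a DOM-like tree. It is not how a browser engine implements the CSSOM: the `DomNode` and `StyledNode` types are hypothetical, selectors are limited to plain tag names, and only three inherited properties are modelled.

```typescript
type Styles = { [property: string]: string };

// Hypothetical, minimal stand-ins for DOM nodes and styled nodes.
interface DomNode {
  tag: string;
  children: DomNode[];
}

interface StyledNode {
  tag: string;
  styles: Styles;
  children: StyledNode[];
}

// A few properties that CSS passes from parent to child by default.
const INHERITED_PROPERTIES = ["color", "font-size", "font-family"];

// Combines inherited values from the parent with the rule for this tag.
function applyStyles(
  node: DomNode,
  rules: { [tag: string]: Styles },
  parentStyles: Styles = {}
): StyledNode {
  const inherited: Styles = {};
  for (const property of Object.keys(parentStyles)) {
    if (INHERITED_PROPERTIES.indexOf(property) !== -1) {
      inherited[property] = parentStyles[property];
    }
  }
  // Own rules win over inherited values.
  const styles: Styles = { ...inherited, ...(rules[node.tag] || {}) };
  return {
    tag: node.tag,
    styles,
    children: node.children.map((child) => applyStyles(child, rules, styles)),
  };
}

const body: DomNode = { tag: "body", children: [{ tag: "p", children: [] }] };
const rules = {
  body: { color: "black", margin: "0" },
  p: { "font-size": "16px" },
};

const styledBody = applyStyles(body, rules);
const paragraphStyles = styledBody.children[0].styles;
console.log(paragraphStyles);
```

The paragraph ends up with its own `font-size` plus the `color` inherited from `body`, while `margin` stays on `body` because it is not an inherited property.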
RenderTree
This is the last step between building the trees and rendering. At this stage, we form the tree that will be rendered. In our example, the left part won't be rendered, so we remove it:
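To make the pruning step concrete, here is a minimal sketch that turns a styled tree into a render tree. The node types and the list of non-visual tags are assumptions for illustration. Dropping `display: none` elements is my addition to the example; the text above only talks about removing `head`.

```typescript
type Styles = { [property: string]: string };

// Hypothetical node shapes; a real engine stores far more than this.
interface StyledNode {
  tag: string;
  styles: Styles;
  children: StyledNode[];
}

interface RenderNode {
  tag: string;
  styles: Styles;
  children: RenderNode[];
}

// Tags that never produce anything on screen (illustrative list).
const NON_VISUAL_TAGS = ["head", "meta", "title", "link", "style", "script"];

// Copies the tree, skipping nodes that will not be drawn.
function buildRenderTree(node: StyledNode): RenderNode | null {
  const isHidden =
    NON_VISUAL_TAGS.indexOf(node.tag) !== -1 || node.styles["display"] === "none";
  if (isHidden) {
    return null; // the whole subtree is dropped
  }
  const children: RenderNode[] = [];
  for (const child of node.children) {
    const rendered = buildRenderTree(child);
    if (rendered !== null) {
      children.push(rendered);
    }
  }
  return { tag: node.tag, styles: node.styles, children };
}

const documentTree: StyledNode = {
  tag: "html",
  styles: {},
  children: [
    { tag: "head", styles: {}, children: [{ tag: "title", styles: {}, children: [] }] },
    {
      tag: "body",
      styles: {},
      children: [
        { tag: "h1", styles: {}, children: [] },
        { tag: "div", styles: { display: "none" }, children: [] },
        { tag: "button", styles: {}, children: [] },
      ],
    },
  ],
};

const renderTree = buildRenderTree(documentTree);
const renderedBodyTags =
  renderTree === null ? [] : renderTree.children[0].children.map((n) => n.tag);
console.log(renderedBodyTags);
```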
This tree will be rendered. A question may arise: why do we render the RenderTree instead of the DOM? We can check it easily by opening DevTools. Even though DevTools shows all DOM elements, all computed styles are based on the RenderTree:
Here we selected a button in the Elements tab. We got all the computed data of the button: its size, position, styles (even inherited ones), etc. After building the RenderTree, the browser's next task is to execute Layout ⇒ Paint ⇒ Composite for our app. Once Composite finishes, the user sees the site. Layout ⇒ Paint ⇒ Composite can be a problem not only for the first render but also during user interaction with the website. That's why I moved this part to another article.
Read next part: What can we do to improve FMP and TTI?