Streaming Video for beginners — 101

Flaneer
5 min read · Feb 21, 2022

Below is a set of key terminology associated with streaming video. This article was written by a member of the Flaneer team (Tom Hoxey), while working on our own Streaming Protocol.

If you are interested in this kind of work, feel free to reach out or to check out Flaneer!

Quality Terminology

Generally, asking for “higher quality” in video streaming is unhelpful, since there are so many factors you could be referring to:

Resolution

Perhaps the best-known factor affecting quality is resolution. This refers to the number of pixels in a frame of the video. The most commonly used terms to describe resolution are HD (1280x720), Full HD (1920x1080), WQHD (2560x1440) and 4K UHD (3840x2160). The numbers after each acronym refer to the number of pixels in X and Y, i.e. HD has 1280 pixels across and 720 pixels down, for a total of 921,600 pixels.
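
As a rough illustration, here is a small Python sketch (using the resolution names and dimensions listed above) that computes the total pixel count for each standard:

```python
# Total pixel counts for common video resolutions (width x height).
RESOLUTIONS = {
    "HD": (1280, 720),
    "Full HD": (1920, 1080),
    "WQHD": (2560, 1440),
    "4K UHD": (3840, 2160),
}

for name, (width, height) in RESOLUTIONS.items():
    print(f"{name}: {width}x{height} = {width * height:,} pixels")
# HD: 1280x720 = 921,600 pixels
# Full HD: 1920x1080 = 2,073,600 pixels
# WQHD: 2560x1440 = 3,686,400 pixels
# 4K UHD: 3840x2160 = 8,294,400 pixels
```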

Latency

Latency is the time between an image being captured and that image being displayed. If I record a frame of my desktop, process it, and can view it 1 second later, the latency is 1 second. For the illusion of interactivity with a desktop PC, the latency needs to be as low as possible. If a user describes a “laggy” experience, they are more than likely referring to high latency.

(Research suggests that latencies below roughly 13 ms are imperceptible to most people.)
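
One rough way to think about measuring latency is to stamp each frame with the time it was captured and compare that to the time it is displayed. The sketch below is only a toy illustration; capture_frame and display_frame are hypothetical placeholders, not a real capture API:

```python
import time

def capture_frame():
    """Placeholder for grabbing a frame of the desktop (hypothetical)."""
    return {"captured_at": time.monotonic(), "pixels": b"..."}

def display_frame(frame):
    """Placeholder for showing the frame; returns the measured latency in seconds."""
    return time.monotonic() - frame["captured_at"]

frame = capture_frame()
# ... encoding, network transfer and decoding would happen here ...
latency_ms = display_frame(frame) * 1000
print(f"Capture-to-display latency: {latency_ms:.2f} ms")
```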

Bit-Depth — Colour

There are many factors that affect the final “colour” of an image; a key one is bit depth. The higher the colour bit depth, the better the colour fidelity of the video. In other words, the more bits we use to represent a colour, the closer that colour is to how it was recorded. It is best shown in the image below:

Notice that 8-bit colour can display far fewer shades of red, so the image is less smooth than the 24-bit version. Also shown is a process called dithering, which applies an artificial “smoothing” to make the most of a low bit depth.
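
As a rough sketch of the numbers involved, the snippet below (assuming the common convention that 24-bit colour means 8 bits per red/green/blue channel) counts how many distinct values each bit depth can represent:

```python
# How many distinct colours each bit depth can represent.
# 24-bit is the usual "true colour" layout: 8 bits each for red, green and blue.
for bits in (8, 16, 24):
    print(f"{bits}-bit colour: {2 ** bits:,} possible values")
# 8-bit colour: 256 possible values
# 16-bit colour: 65,536 possible values
# 24-bit colour: 16,777,216 possible values
```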

Lossy/Lossless

These two terms refer to the two broad types of compression algorithm. Lossy algorithms lose some data during encoding/decoding, whereas lossless algorithms preserve all of it. (Note: a losslessly compressed file may store the data differently, but the original image can be reconstructed exactly from the data present.) This is once again best seen in an image:
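
A minimal way to see the lossless guarantee in code is with Python's zlib module, a general-purpose lossless compressor (not a video codec):

```python
import zlib

# A run of identical blue pixels compresses extremely well.
original = b"A row of identical blue pixels: " + b"\x00\x00\xff" * 1000

compressed = zlib.compress(original)    # encode: smaller representation
restored = zlib.decompress(compressed)  # decode: back to the original bytes

print(len(original), "bytes ->", len(compressed), "bytes")
assert restored == original  # lossless: every byte is recovered exactly
```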

Bit Rate

At the most fundamental level in computing, everything is a bit: a 1 or a 0. Bit rate is therefore fairly self-explanatory: it tells you how many bits are processed per unit of time. Building on our understanding of colour depth, you can see that a pixel, a single point of colour, could be anywhere from 8 to 32 bits in a typical image stream.

Imagine an image stream is a car carrying pixels from A to B. If the car can carry one million 8-bit pixels from A to B in a one-hour trip, the bit rate would be 8,000,000 bph (bits per hour). However, since networks are much faster than that, we tend to talk about bits per second: bps. As computers have got better we have started to talk about Kbps (1,000 bps) or Mbps (1,000,000 bps). Our car then has an effective bit rate of about 2.2 Kbps.
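
For the curious, the arithmetic behind the car analogy works out like this:

```python
pixels_per_trip = 1_000_000
bits_per_pixel = 8
trip_seconds = 60 * 60  # one hour

bits_per_hour = pixels_per_trip * bits_per_pixel  # 8,000,000 bits per hour
bits_per_second = bits_per_hour / trip_seconds    # ~2,222 bits per second

print(f"{bits_per_hour:,} bph is about {bits_per_second / 1000:.1f} Kbps")
# 8,000,000 bph is about 2.2 Kbps
```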

In general terms, a higher bit rate means a higher quality image. However, one must also consider the specific streaming technology used, since better streaming technology can get more out of the same number of bits. Imagine if we could somehow pack our pixels into the car so that we could reuse the same pixels multiple times, or instead of several small blue pixels, we could send one big one that does the same job.

Streaming Terminology

This section addresses terminology on the data side of streaming. Think: what am I sending, and how does it get there?

Protocol

A streaming protocol is a set of rules that defines how data should be exchanged between multiple information systems. Protocols technically have nothing to do with images, but the creator of a protocol will often use specialist knowledge about the images to inform how they are exchanged. A good analogy is school grades: all teachers agree that an A is a high grade and an F is a low grade, and that way they can produce a cohesive report card. A streaming protocol works in the same way: there is an agreement that a certain signal means “stop the video”, and since all implementers of the protocol agree, there is a cohesive streaming experience.
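
As a toy sketch of the idea (the message names and codes below are invented for illustration and do not belong to any real protocol):

```python
from enum import Enum

# A toy control "protocol": both ends agree on what each message means.
class Control(Enum):
    PLAY = 1
    PAUSE = 2
    STOP = 3

def handle(message: Control) -> str:
    # Because every implementer agrees on these meanings, the viewer and
    # the server behave consistently when they exchange these signals.
    if message is Control.STOP:
        return "stop the video"
    if message is Control.PAUSE:
        return "hold the current frame"
    return "keep streaming frames"

print(handle(Control.STOP))  # both ends agree: code 3 means "stop the video"
```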

Codec

Short for “coder/decoder”, a codec defines how an image or audio stream is encoded (i.e. made smaller) and decoded (i.e. restored to its original size). Codecs all essentially work by looking for a way that the same (or similar) information can be stored more compactly. A good way to think about this is flatpack furniture. A flatpack desk is disassembled (encoded) and instructions are written on how to reassemble it. The disassembled desk is then easy to transport from A to B. Once the desk arrives, it must be reassembled by following the instructions (decoded) before it can be used. In this analogy, the codec is the instructions; different codecs could provide different instructions to make the same desk.
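
To make the encode/decode round trip concrete, here is a toy run-length codec in Python. A real video codec is far more sophisticated, but the principle of packing information into a smaller form and restoring it exactly is the same:

```python
def encode(pixels: str) -> list[tuple[str, int]]:
    """Run-length encode: 'BBBBW' -> [('B', 4), ('W', 1)]."""
    runs = []
    for pixel in pixels:
        if runs and runs[-1][0] == pixel:
            runs[-1] = (pixel, runs[-1][1] + 1)
        else:
            runs.append((pixel, 1))
    return runs

def decode(runs: list[tuple[str, int]]) -> str:
    """Rebuild the original pixel string from the runs."""
    return "".join(pixel * count for pixel, count in runs)

row = "BBBBBBBBWWBBBBBBBB"   # a row of mostly blue pixels
packed = encode(row)          # the "flatpacked desk"
assert decode(packed) == row  # reassembled exactly from the instructions
print(packed)                 # [('B', 8), ('W', 2), ('B', 8)]
```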

Container

The container is the file format of the stream. It is made of two parts: the header and the data. The header is like a cargo manifest and the data is like the cargo. Different formats will pack different data in different ways to achieve the same video. The header is not the instructions on how to pack/unpack the data (that is the codec); it is simply a description of what data is present. The image below shows what a video container might look like:
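
A rough sketch of the header/data split, with field names invented purely for illustration (real containers such as MP4 or MKV define their own layouts):

```python
from dataclasses import dataclass

@dataclass
class Header:
    # The "cargo manifest": describes what is inside, not how to decode it.
    video_codec: str
    audio_codec: str
    duration_s: float

@dataclass
class Container:
    header: Header
    data: bytes  # the "cargo": the encoded video and audio itself

clip = Container(
    header=Header(video_codec="H.264", audio_codec="AAC", duration_s=12.0),
    data=b"...encoded frames and audio samples...",
)
print(clip.header)
```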

Flaneer

Flaneer lets you run any software in the cloud and use it in a browser.