TCP

What it is

The Transmission Control Protocol, or TCP, is the networking protocol used on the internet (alongside IP, the internet protocol) to provide a reliable data stream between two machines.

The internet is an insanely complicated thing. How incredible is it that the bits sent out of the wire in the back of your computer can zip to the other side of the planet through such a vast and decentralized network and come back without getting garbled or lost? It boggles the mind.

The sheer complexity makes reckoning with it difficult. To make it easier, computer networking can be broken down into a stack of layers built one on top of the other. Each provides a higher-level abstraction than the one below it.

At the most basic, you need a way to turn a voltage on and off on a wire to send a binary signal from one end to the other. This is called the physical layer, of which Ethernet is the most common wired example (Ethernet also covers the data link layer just above it).

Trusting that Ethernet works how it is supposed to, you can stop worrying about electricity and grapple with the next set of problems, such as how to route data through a network of many computers, how to handle congestion, and how to detect errors. These layers progressively build one atop another until you have the internet. All told, there are seven layers in what is called the OSI model.

TCP is the 4th layer, called the transport layer. Below it is the network layer, in which the Internet Protocol (IP) carries chunks of data called packets and routes them from node to node across the internet until they reach one specific computer.

Imagine you needed to ship a book through the mail using only envelopes, each of which can hold only a few pages. You tear the book into chunks, put each chunk in an envelope (a packet), and mail them. IP is the postal service that delivers each envelope independently. On the other end, the receiver will get hundreds of letters arriving in no particular order, and a few may get lost along the way. They'll need to put them all back into the correct order so that the book is coherent.

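This is the role that TCP plays on the internet. TCP sits on top of IP: on the sending side it splits the data stream into segments, and on the receiving side it stitches them back together into the original message. To make that possible, it places a header of at least 20 bytes in front of the data carried in each IP packet. Among other fields, this TCP header holds a sequence number that tells the receiver where the segment belongs in the stream.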
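The reordering idea can be sketched in a few lines of Python. This is only an illustration of the envelope analogy, not how a real TCP stack is written: the function names are made up for this example, and the sequence number here is simply the byte offset of each chunk within the message.

```python
import random


def split_into_segments(message: bytes, size: int) -> list[tuple[int, bytes]]:
    """Cut the message into chunks, tagging each with its starting byte offset."""
    return [(offset, message[offset:offset + size])
            for offset in range(0, len(message), size)]


def reassemble(segments: list[tuple[int, bytes]]) -> bytes:
    """Sort segments by sequence number and join their payloads."""
    ordered = sorted(segments, key=lambda segment: segment[0])
    return b"".join(payload for _, payload in ordered)


message = b"It was the best of times, it was the worst of times."
segments = split_into_segments(message, size=8)
random.shuffle(segments)  # simulate envelopes arriving in random order
restored = reassemble(segments)
print(restored == message)  # True
```

Because every chunk carries its own position, the receiver does not care what order the chunks arrive in. A real implementation also has to notice gaps and duplicates, which this sketch ignores.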

Obviously it is more complex than that; just look at how long the TCP spec is. TCP has many other features, such as acknowledging received data and retransmitting anything that gets lost, a handshake for establishing a connection between two machines, and a concept of ports to allow multiple separate connections between the same two machines.
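To make ports and the handshake a little more concrete, here is a rough sketch that packs and unpacks the fixed 20-byte part of a TCP header with Python's struct module, then walks through the three segments of a handshake (SYN, SYN-ACK, ACK). The helper names, port numbers, and sequence numbers are invented for illustration, and the checksum is left as a zero placeholder. In practice the operating system builds these headers for you.

```python
import struct

# Fixed part of the TCP header, in network byte order: source port,
# destination port, sequence number, acknowledgment number, data offset,
# flags, window size, checksum, urgent pointer.
TCP_HEADER = struct.Struct("!HHIIBBHHH")
FLAG_SYN = 0x02
FLAG_ACK = 0x10


def build_header(src_port, dst_port, seq, ack, flags):
    """Pack a 20-byte header with no options (hypothetical helper)."""
    data_offset = 5 << 4  # header length of five 32-bit words, in the top 4 bits
    window = 65535
    checksum = 0  # placeholder: a real stack computes this over the segment
    urgent = 0
    return TCP_HEADER.pack(src_port, dst_port, seq, ack,
                           data_offset, flags, window, checksum, urgent)


def parse_header(raw):
    """Unpack the fields this example cares about."""
    src_port, dst_port, seq, ack, _, flags, _, _, _ = TCP_HEADER.unpack(
        raw[:TCP_HEADER.size])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "syn": bool(flags & FLAG_SYN),
        "ack_flag": bool(flags & FLAG_ACK),
    }


CLIENT_PORT, SERVER_PORT = 50000, 80  # made-up client port, usual HTTP port
handshake = [
    build_header(CLIENT_PORT, SERVER_PORT, seq=1000, ack=0, flags=FLAG_SYN),
    build_header(SERVER_PORT, CLIENT_PORT, seq=5000, ack=1001,
                 flags=FLAG_SYN | FLAG_ACK),
    build_header(CLIENT_PORT, SERVER_PORT, seq=1001, ack=5001, flags=FLAG_ACK),
]
for raw in handshake:
    print(parse_header(raw))
```

Each side acknowledges the other's starting sequence number plus one, which is how both machines agree on where the byte stream begins. The pair of ports, together with the two IP addresses, is what lets one machine keep many connections apart.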

TCP is not the only transport protocol out there. Notably, UDP is a popular alternative that forgoes ordering and delivery guarantees for applications where low latency matters more than exactly reconstructing the original data, such as online games or live video.

Why it matters

TCP has been around for a long time, and along with IP it is the backbone of the internet. The web, email, SSH, and more are all built as an additional (application) layer on top of TCP.

To build simple web apps, it may not be necessary to know how the protocol works inside and out, but it is important to know the basics of what TCP is and how it fits into the bigger picture. For example, it is common to use Unix commands like netstat from the shell to check whether a web server is running and listening on its port.
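The same kind of check can be done from a program by simply trying to open a TCP connection. The sketch below uses Python's standard socket module; is_port_open is a name invented for this example, and the temporary listener stands in for a web server so the snippet can run on its own. It is an alternative to netstat rather than a reimplementation of it.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0


# Stand-in for a web server: listen on a free port chosen by the OS.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()
port = listener.getsockname()[1]

print(is_port_open("127.0.0.1", port))  # checked while the listener is up
listener.close()
print(is_port_open("127.0.0.1", port))  # checked after it has shut down
```

A successful connection only shows that something completed the TCP handshake on that port. It does not prove that the program behind it is the web server you expected or that it is healthy.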

In general, you should strive to completely understand how your program works from top to bottom, all the way from electricity up, because typically when something goes wrong, the way you'll fix it is by moving down levels of abstraction. Eliminate magic.

How to learn it

Deeply understanding TCP will take some work. Computer networking is typically an entire undergraduate class, and many of those classes use Andrew Tanenbaum’s Computer Networks textbook. Consider adding it to the backlog of textbooks you’ll eventually read; that’s how I learned it.

Online, there are plenty of articles, tutorials, and classes that cover the same material: