How do Websites work?
Welcome to your first lesson! I hope this gives you a taste of what Developer Atlas is all about. First time here? Check out The Curriculum.
What it is
It took me a long time to connect the dots between the HTML and CSS I learned in tutorials and how that translated to making a real website. Why is HTML relevant? Let's take a step back: how do websites actually work?
Why it matters
We’ll talk about how to ask the right kinds of questions so you can transcend limited, tutorial-type knowledge and attain real understanding.
What you’ll learn
- The function of Servers and Web Browsers
- Abstractions and the principle of removing magic
As with any lesson, if you suspect you know this already and don't want to waste your time, scroll down and skim the 'Comprehension' section first.
What's a website?
A website is just a bunch of files on somebody else’s computer. That’s it. A web browser’s main function is to use the Internet to ask that other computer for those files.
To identify the right computer to ask, every computer on the Internet has a number associated with it, called an IP (Internet Protocol) address. ‘Google.com’ is just an easy-to-remember nickname (a DNS name) for that computer's IP address.
Go into your browser and type in Google.com’s IP address, 18.104.22.168.
Boom, there’s Google. Neat, eh?
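If you want to watch the nickname-to-number translation happen, here is a minimal Python sketch using the standard library's socket.getaddrinfo, which asks your operating system's resolver (and, through it, DNS) for the addresses behind a name. The function name lookup_ipv4 is just an illustrative choice, and it only asks for IPv4 addresses.

```python
import socket


def lookup_ipv4(hostname: str) -> list[str]:
    """Ask the system resolver (which may query DNS) for a host's IPv4 addresses."""
    # Each result is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) pair.
    results = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # The same address shows up once per socket type, so de-duplicate it.
    return sorted({info[4][0] for info in results})
```

With a network connection, lookup_ipv4("google.com") returns a list of address strings. Expect them to differ from the address quoted above and to change over time, since large sites hand out different addresses depending on where you are.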
A domain name like Google.com is the name that your web browser uses to know what computer to ask for the files, and a URL like https://www.google.com/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png identifies a specific file on that computer: in this case, the Google logo image from the homepage.
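Python's standard urllib.parse.urlsplit breaks a URL into the pieces just described, which makes the division of labor easy to see: the host name tells the browser which computer to ask, and the path tells that computer which file you want. This is only an illustration; browsers use their own URL parsers.

```python
from urllib.parse import urlsplit

url = "https://www.google.com/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png"
parts = urlsplit(url)

# The scheme says how to talk to the server (here, HTTP over TLS).
print(parts.scheme)  # https
# The host name is what gets looked up to find which computer to ask.
print(parts.netloc)  # www.google.com
# The path says which file on that computer you want.
print(parts.path)    # /images/branding/googlelogo/1x/googlelogo_color_272x92dp.png
```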
Your computer, the one asking for the files, is called a client. The computer you are asking for files from is a server. A web server’s job is to take files on its local machine and make them available through an Internet interface. It listens for clients requesting files, and it serves them. The simplest web servers, known as static servers, just read the files off of their hard drive and give them to you.
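Python ships with a basic static server in its http.server module, so a few lines are enough to play both roles: a server that reads a file off disk and sends it back, and a client that asks for it. This is a minimal sketch, assuming Python 3.7 or newer; make_static_server and demo_round_trip are hypothetical helper names, and the server only listens on 127.0.0.1, so nothing is exposed to the Internet.

```python
import http.client
import tempfile
import threading
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def make_static_server(directory: str, port: int = 8000) -> ThreadingHTTPServer:
    """Build a server that hands out files from `directory` exactly as stored."""
    # SimpleHTTPRequestHandler maps each request path to a file under `directory`
    # and sends that file's bytes back unchanged.
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


def demo_round_trip() -> bytes:
    """Serve a one-file site, then request that file the way a client would."""
    with tempfile.TemporaryDirectory() as site:
        Path(site, "index.html").write_text("<h1>Hello from a static server</h1>")
        server = make_static_server(site, port=0)  # port 0: let the OS pick a free port
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            # The client side: connect, ask for /index.html, read the reply.
            conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
            conn.request("GET", "/index.html")
            body = conn.getresponse().read()
            conn.close()
            return body
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    print(demo_round_trip())
```

To serve a real folder, call make_static_server("some/folder").serve_forever() and browse to http://127.0.0.1:8000/. The http.server documentation itself warns that it is not meant for production use.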
Go to this simple example website.
Right-click somewhere near the edge of the page and choose View Page Source.
This is the file that the server sent back to your browser. Even though you see a website, all that happened behind the scenes was that the browser asked for a text file and got one. This HTML file is a special format of text that the browser knows how to turn into a website. Besides downloading files from servers, a web browser’s other main job is to interpret these files and turn them into the things you see.
Most webpages are made of many different files, but initially the browser only got this single HTML file. See the links in the HTML source code? The browser reads through the HTML to figure out what additional files it needs to download from the server.
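Here is a rough sketch of that discovery step using Python's built-in html.parser. It is a big simplification, assuming only three tags matter: it collects stylesheet links and image and script sources, and deliberately skips <a> tags, because the browser only follows those when you click them. Real browsers also discover more files inside CSS and JavaScript. The ResourceFinder class and the sample page are made up for illustration.

```python
from html.parser import HTMLParser


class ResourceFinder(HTMLParser):
    """Collect the URLs of extra files an HTML page asks the browser to fetch."""

    def __init__(self) -> None:
        super().__init__()
        self.resources: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Stylesheets come in through <link href=...>; images and scripts use src=...
        if tag == "link" and attrs.get("href"):
            self.resources.append(attrs["href"])
        elif tag in ("img", "script") and attrs.get("src"):
            self.resources.append(attrs["src"])


page = """
<html>
  <head><link href="styles/style.css" rel="stylesheet"></head>
  <body>
    <img src="images/logo.png" alt="logo">
    <a href="about.html">About</a>
  </body>
</html>
"""

finder = ResourceFinder()
finder.feed(page)
print(finder.resources)  # ['styles/style.css', 'images/logo.png']
```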
Specifically, the HTML has
Even on huge sites with hundreds of images and other files, it all starts with a single HTML file, represented by a single URL. When you go to the root directory of a website (like Google.com/ with nothing after the slash), you are asking the server for its default file, usually called index.html. Try adding /index.html to the end of our example site:
You get the exact same thing.
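The default-file rule fits in a tiny function. This is a hypothetical simplification of what static servers do: url_path_to_file is an illustrative name, it assumes the default file is index.html (servers can be configured otherwise), and it skips the safety checks a real server needs, such as rejecting paths containing "..".

```python
def url_path_to_file(url_path: str, default: str = "index.html") -> str:
    """Map a URL path to a file relative to the site's root folder (simplified)."""
    # A path ending in "/" names a directory, so serve that directory's default file.
    if url_path.endswith("/"):
        url_path += default
    # Strip the leading "/" so the result is relative to the site's root folder.
    return url_path.lstrip("/")


print(url_path_to_file("/"))            # index.html
print(url_path_to_file("/index.html"))  # index.html
print(url_path_to_file("/blog/"))       # blog/index.html
```

Both "/" and "/index.html" land on the same file, which is why the two addresses show the same page.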
On the example site, choose File -> Save Page As… and save the website to your computer.
If you double-click the file you grabbed, My test page.htm, your browser opens it up and there’s the page again. This time, instead of getting the files from a server, your browser opens the files from your hard drive, just like Photoshop would open a .jpg.
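One visible difference when you open the saved page: the address bar typically shows a file:// URL instead of an http:// or https:// one. Python's pathlib can build the same kind of address for a file on disk, which shows how a local path becomes a URL. This sketch assumes the saved file sits in the current directory; the exact path you see depends on where you saved the page.

```python
from pathlib import Path

# Assumes the saved page is in the current directory; resolve() makes the path absolute.
saved_page = Path("My test page.htm").resolve()

# A page opened from disk gets a file:// address rather than http(s)://,
# and characters like spaces are percent-encoded (" " becomes "%20").
page_url = saved_page.as_uri()
print(page_url)  # something like file:///…/My%20test%20page.htm
```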
Is that it?
So what happens when you type Google.com into your browser's address box and press enter? Try explaining it out loud.
This is a famous interview question because there’s so much hidden complexity. Check out this GitHub repository where people collectively try to answer it without glossing over anything.
This level of detail is mind-killing: we’d never get anything done if we were constantly preoccupied with all of these extra details. The whole story is even more complex than that braggy document lets on. Remember, keep drilling down and at some point it’s all just electricity, but nobody actually thinks in ones and zeroes.
To deal with the high level of complexity, developers rely on one of the core tenets of software development, the concept of abstraction.
Abstraction means forgetting the details and only worrying about what is relevant. It means thinking on a ‘higher level’, assuming the lower levels just work like they’re supposed to, so your brain is free for higher-level reasoning.
When I said "use the Internet to ask that other computer for those files", I was working at an extremely high level of abstraction. I didn’t care how it ‘asked’ the server for the files, because I was more concerned with explaining the interaction. I can always go down a level and start thinking in terms of HTTP and TCP/IP, but doing so in this lesson would bog us down, so we abstract it away.
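To get a feel for the level just below "ask the server for the files", here is a sketch of an HTTP/1.1 request written out by hand and sent over a plain TCP socket with Python's socket module. It is deliberately simplified and assumes plain http:// on port 80: it skips the TLS encryption that https:// adds, and real browsers send many more headers. build_get_request and raw_http_get are hypothetical names.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Spell out, byte for byte, a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # method, which file, protocol version
        f"Host: {host}",         # one server can host many sites, so name the one you want
        "Connection: close",     # ask the server to hang up after replying
        "",                      # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def raw_http_get(host: str, path: str = "/", port: int = 80) -> bytes:
    """Open a TCP connection, send the request, and read until the server closes it."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)
```

With a network connection, raw_http_get("example.com") should return the server's raw reply: a status line such as HTTP/1.1 200 OK, some headers, a blank line, and then the HTML. Many sites answer plain HTTP with a redirect to their https:// address instead.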
Know your abstractions
Before the lesson, what was your mental model of how websites work? If you weren’t familiar with the role of browsers and servers already, perhaps it was simply a fog of uncertainty. It didn’t matter, because it just worked. Developers commonly refer to this sentiment as ‘magic’ and use it as a bad word. One might say ‘that library was easy to set up, but it was a little too magical for me,’ meaning it was hard to figure out how it actually works.
It’s not necessary to understand 100% of your tools all the time, but it pays to be wary. If you use something that you don’t understand, it will bite you when it stops working, and more than half of a developer’s time is spent fixing problems. Often, debugging involves moving down a level of abstraction to figure out what is going on under the hood.
Get rid of magic
For instance, I explained that a website corresponds to a single computer and a URL corresponds to a file on that computer. Once you picked up this mental model, perhaps your spidey sense started tingling. Google.com isn't powered by one computer, obviously, so what are you not getting?
When the web started out, all web content was static. Web apps didn’t exist, and websites were just like pages in a newspaper. That’s why they are called pages, after all. HTML takes many of its naming conventions from newspapers.
Things have gotten a lot more complicated since then. A web app with millions of users doesn’t keep a separate set of files on a big hard drive for each of them. Instead, the server figures out what content to return to the client and dynamically generates the appropriate files on each request. These are called dynamic webpages. Still, as far as your browser can tell, it asked for a file and got one.
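A dynamic server skips the hard-drive step and builds the reply in code on every request. Here is a minimal Python sketch using http.server's BaseHTTPRequestHandler; render_page, DynamicHandler, and serve are hypothetical names, and the "personalization" is just a name read from the URL's query string plus the current time.

```python
import html
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlsplit


def render_page(path: str) -> str:
    """Build an HTML page on the fly; no such file exists on disk."""
    query = parse_qs(urlsplit(path).query)
    name = query.get("name", ["stranger"])[0]
    now = datetime.now(timezone.utc).strftime("%H:%M:%S UTC")
    # html.escape keeps user input from being interpreted as markup.
    return f"<h1>Hello, {html.escape(name)}!</h1><p>Generated at {now}.</p>"


class DynamicHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = render_page(self.path).encode("utf-8")
        # To the browser, this looks just like a file that was read from disk.
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the dynamic server on 127.0.0.1 until interrupted (Ctrl+C)."""
    ThreadingHTTPServer(("127.0.0.1", port), DynamicHandler).serve_forever()
```

To try it, call serve() and visit http://127.0.0.1:8000/?name=Ada; refresh and the time changes even though no file on disk did.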
Similarly, every computer on the Internet has its own IP address, but the whole of Google.com isn’t powered by a single server. Google uses a virtual IP address, essentially a trick to make a big fleet of computers look like a single machine from the outside. Behind it sits a vast network of interconnected servers, but as far as your browser can tell, it asked a single computer for a file and got it.
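The virtual IP trick can be modeled as a toy: one public address that quietly hands each request to a different machine behind it. This is only an analogy, assuming a simple round-robin policy; real deployments rely on dedicated load balancers and network routing rather than a Python class. The VirtualAddress class and the server names are invented, and 203.0.113.10 comes from an address range reserved for documentation.

```python
from itertools import cycle


class VirtualAddress:
    """Toy model: one public address that quietly forwards to many machines."""

    def __init__(self, public_ip: str, backends: list[str]) -> None:
        self.public_ip = public_ip
        self._next_backend = cycle(backends)  # simple round-robin over the fleet

    def handle(self, request: str) -> str:
        backend = next(self._next_backend)
        # The client only ever sees public_ip; which backend answered stays hidden.
        return f"{self.public_ip} answered {request!r} (really {backend})"


vip = VirtualAddress("203.0.113.10", ["server-a", "server-b", "server-c"])
for _ in range(4):
    print(vip.handle("GET /"))
# 203.0.113.10 answered 'GET /' (really server-a)
# 203.0.113.10 answered 'GET /' (really server-b)
# 203.0.113.10 answered 'GET /' (really server-c)
# 203.0.113.10 answered 'GET /' (really server-a)
```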
There, perhaps that was the first new thing you’ve learned from Developer Atlas so far! All because we tested our abstraction.
Thanks to Google, answers are generally easy to come by if you know what you are looking for. Learning effectively is all about figuring out the right questions to ask. Honing your spidey sense is how you do that.
Do I really understand this?
One of the best habits to form when you are learning to code is to ask yourself, “Do I really understand what is going on here? What are my assumptions? What level of abstraction am I working at?” If you realize that something is magic, challenge it with directed questions.
For example, just ten minutes ago, as I was writing this, I noticed that the webpage’s main file was called My test page.htm, not My test page.html. Computers are extremely literal, and based on my mental model, if that ‘htm’ was a typo I’d expect it to simply not work at all.
I googled "difference between html and htm", skimmed this article for 10 seconds, and ascertained that ‘htm’ and ‘html’ files are the same thing and this is just trivia. Still, if I had been surprised I would have leveled up my understanding.
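One way to confirm that bit of trivia from the code side: many web servers choose the Content-Type header from the file extension, and browsers decide how to display a response largely from that header. Python's mimetypes module, used here as a stand-in for a server's extension table, maps both extensions to the same type.

```python
import mimetypes

# Look up the media type each file name would typically be served as.
for name in ["My test page.htm", "My test page.html"]:
    print(name, "->", mimetypes.guess_type(name)[0])
# My test page.htm -> text/html
# My test page.html -> text/html
```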
You shouldn't go down every rabbit hole, or you’ll never get around to having fun and making cool stuff. Be pragmatic: over time you’ll get a feel for when your lack of understanding is becoming a problem and figure out how to learn strategically.
Feed your curiosity. Know your abstractions and eliminate magic. If something doesn’t make sense, investigate! If you can’t figure it out, ask me and maybe we can together. Odds are you are not alone in your confusion.
Comprehension
- So, how do websites work?
- What is an IP address?
- What is a virtual IP address?
- What does a web browser do? What are its two main jobs?
- What does a web server do?
- What is the difference between a static and a dynamic web server?
- How does the browser figure out what files it needs to ask for to construct a webpage?
- HTML, CSS, and JS are the Skin, Bones and Brain of a website. Which is which?
- What is HTML and what does it do?
- What is a markup language?
- What is an HTML element?
- What is CSS and what does it do?
- What is abstraction?
- What do I mean by getting rid of magic?
When you downloaded the site, what did the browser do to the links to the other files in the HTML? Why?
Open the HTML file you just downloaded in a text editor (any will do for today), delete this line, and save the file:
<link href="./My test page_files/style.css" rel="stylesheet" type="text/css">
What do you expect to happen when you refresh the browser? Does this make sense?