Author’s Note: This was actually meant to be the first part of a series called Three C-Words of Web App Security, dealing with CORS, CSRF, and Clickjacking, each in its own post. But as I started writing the exposition necessary to provide context around these issues, I realized that I had so much background to cover that I needed a dedicated post just to detail it. That post is what follows.
To understand modern mechanisms, it’s helpful to look at what led to their creation. I would like to start by going back in time, not quite to the beginning of the Internet, but to the websites of the 90s – the Internet as it was when residential Internet access originally boomed in North America. Many websites at the time were simply interlinked HTML documents. Styling and positioning were done with attributes on the HTML tags, and the content was static. This is a somewhat different meaning of the word static than the one we often apply to web content today. Today, static content typically means content that is simply served as-is, without being affected by request variables or mutable databases or who is logged in. Static elements of today often still create interactive experiences of things that expand and collapse, hide and show, etc. The static sites I’m referring to from the 90s had some mix of text, images, and links (to positions on the same page or to other pages), and often nothing else. The only moving content was the animated GIFs showing that the page was under construction, or scrolling text in the IE-only marquee tag or perhaps flashing text from a blink tag in Netscape.
But not all sites were static, even then. Search engines, web-based email, chat rooms, and even browser-based games existed. And for those to work, there was a dependency on HTML forms – a means of accepting user input. Classic web forms usage had your form fields, and a submit button.
<form name="indexform" action="/cgi-bin/password.cgi" method="POST">
Much like today, the user would fill out the input and then submit the form. Unlike many modern applications, a full page refresh would occur. The form submission would generate an HTTP request containing the name and value of each field, the web server would delegate to server-side code (often standalone scripts, popularly written in Perl) to process the form, and new HTML would be generated and included in the response. The browser would then render this new HTML. In the case of a data validation error, this generally meant responding with the same form page. Because HTTP is stateless, the values that the user entered would otherwise be lost. A common solution for this was to populate that input into the value attributes of the form fields when constructing the response. So when the user loaded the form the first time, a given input might look like this:
<input name="email" type="text" value="" />
Which would render as an empty textbox. The user would fill out the form, perhaps including an email of mic@professionallyevil.com. They would hit submit, and if there was some sort of data validation error, the form would be recreated with the following:
<input name="email" type="text" value="mic@professionallyevil.com" />
Some common patterns were adopted along the way, such as the Post-Redirect-Get pattern, but the general approach was widespread for more than 10 years, including through the rise of web domain-specific languages like PHP and classic ASP.
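The round trip described above can be sketched in a few lines of JavaScript. This is only an illustration, not the Perl or PHP of the period: the handleFormPost and isValidEmail names, the validation rule, and the /thanks redirect target are all hypothetical. It returns a plain description of the response rather than HTML, so that the two outcomes (redisplay the form with the submitted value, or redirect as in Post-Redirect-Get) are easy to see.

```javascript
// Hypothetical validation rule, just strict enough for the illustration.
function isValidEmail(value) {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value);
}

// Handle the body of a form POST, e.g. "email=mic%40professionallyevil.com".
function handleFormPost(rawBody) {
  // Decode the name=value pairs that the browser sent for each field.
  const fields = new URLSearchParams(rawBody);
  const email = fields.get('email') || '';

  if (!isValidEmail(email)) {
    // Validation failed: respond with the same form page again, carrying the
    // submitted value so it can be placed back into the field's value attribute.
    return {
      status: 200,
      template: 'form',
      values: { email: email },
      error: 'Please enter a valid email address.',
    };
  }

  // Success: Post-Redirect-Get. The browser follows the redirect with a GET,
  // so refreshing the resulting page does not resubmit the form.
  // '/thanks' is an assumed URL.
  return { status: 303, headers: { Location: '/thanks' } };
}

// The browser encodes the fields before sending them.
const body = new URLSearchParams({ email: 'mic@professionallyevil.com' }).toString();
console.log(body);                          // email=mic%40professionallyevil.com
console.log(handleFormPost(body).status);   // 303
console.log(handleFormPost('email=mic'));   // status 200, values.email is 'mic'
```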
In some cases, the HTML responses were constructed with simple concatenation, but as technologies progressed, often they used inline code:
<input name="email" type="text" value="<?php echo $emailAddr; ?>" />
In the above example, the emailAddr server-side variable is echoed inline. One way the implementation might be built around this is to initialize emailAddr to an empty string. Then, if handling a request that has an email parameter, assign that parameter’s value to this variable before the above line is interpreted. The common security problem this presented was that an emailAddr of [mic" onclick="alert(1);] would of course be inlined as follows (square brackets are only included to delineate the user-supplied input):
<input name="email" type="text" value="[mic" onclick="alert(1);]" />
This could be made safe with input sanitization and/or output encoding, but generally it was up to the developer to remember to do that everywhere. This was one factor, but certainly not the only factor, that popularized the move toward server-side web-specific template engines like ASP.Net Webforms and JavaServer Pages (JSP). I note that JSPs and other engines like Apache Velocity had been around for a while, but to me they appeared to have a significant uptick in the early post-2000 years.
<asp:TextBox id="emailTextbox" Text="mic@secureideas.com" runat="server" />
The above ASP.Net Webforms example looks a lot like the HTML examples that preceded it. Much like the PHP echo example, this is not meant to be served to the client. Instead, the server would parse the template and render plain old HTML. But in this example, the application framework could default to treating input as unsafe – and automatically sanitize it without the developer explicitly instructing it to.
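To make the effect of output encoding concrete, here is a minimal JavaScript sketch applied to the onclick payload from the PHP example. The escapeHtml, renderEmailInputUnsafe, and renderEmailInput functions are hypothetical helpers written for this illustration; they are not the API of ASP.Net, JSP, or any other framework, and a real application would normally lean on its template engine’s built-in encoding.

```javascript
// Encode the characters that are significant in HTML so that user input
// stays inside the attribute value instead of being parsed as markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Naive concatenation, equivalent to echoing the variable inline.
function renderEmailInputUnsafe(email) {
  return '<input name="email" type="text" value="' + email + '" />';
}

// Same markup, but the value is encoded on the way out.
function renderEmailInput(email) {
  return '<input name="email" type="text" value="' + escapeHtml(email) + '" />';
}

const payload = 'mic" onclick="alert(1);';

console.log(renderEmailInputUnsafe(payload));
// <input name="email" type="text" value="mic" onclick="alert(1);" />

console.log(renderEmailInput(payload));
// <input name="email" type="text" value="mic&quot; onclick=&quot;alert(1);" />
```

In the second result, the browser shows the full string as the textbox value and never creates an onclick attribute.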
But from a user experience perspective, there was still the ugliness of a full page refresh anytime a form was submitted. Keep in mind that the average speed for consumer Internet access was slower than today as well, so these round trips to the server were quite pronounced.
Shortly after this, a young developer named John Resig created a clever little JavaScript library for DOM manipulation (dynamically changing the structure of a web page without reloading) that really started to pick up a large following of developers by 2007 or so. This library is still common today, and is called jQuery. One of the main issues it solved was that browsers of the time didn’t implement W3C standards nearly as consistently as they do today. JavaScript that worked in IE7 would completely fail in IE6. The feedback you would get was a single string of text: Error on page. Firefox was another big player in the browser space, and its interpretation of the standards was often different from (and popularly viewed as more correct than) Microsoft’s. jQuery smoothed out most of these cross-browser problems, making a statement work cross-browser in a way that was seamless to the developer. In my estimation, this easing of DOM manipulation was a catalyst to the pursuit of efficient ways of retrieving data without reloading the page. jQuery didn’t do anything that couldn’t be written in vanilla JavaScript, of course, but it made it more accessible to web developers.
And it did something that was critical to the modern web application as well: it made AJAX (Asynchronous JavaScript and XML – although the acronym is often used loosely to refer to all requests initiated by JavaScript in the page) requests easy to write in a cross-browser compatible way.
Web services were not at all new at this point. They were around for most or all of the time that we’ve discussed, but they had been mainly consumed by servers, thick clients, and heavier browser-based runtimes like ActiveX and Java (in the form of applets). But it quickly became popular for JavaScript running in the web page to call a web service directly. In some cases it received data (at this time, often XML formatted), and then destroyed and created new DOM elements to update the page without ever doing a full refresh. A simple example of this is when a user changed pages in a paginated list or table. Other implementations actually rendered a fragment of HTML on the server, and supplied it in the AJAX response, with the JavaScript replacing the contents of a container with the new HTML. In either case, this was the beginning of the end of full-page refreshes.
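The paginated-table case might look roughly like the following sketch. It deliberately uses today’s fetch API and JSON rather than the jQuery or XMLHttpRequest calls and XML payloads of the period, and everything named here (the /api/users endpoint, the shape of the returned data, the showPage and renderRows functions) is an assumption made for the illustration.

```javascript
// Hypothetical endpoint; the article does not name one.
const USERS_URL = '/api/users';

// Encode text so that data from the service cannot inject markup.
function escapeHtml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Turn the data returned by the service into table rows.
// Assumes each item has `name` and `email` properties.
function renderRows(users) {
  return users
    .map((u) => `<tr><td>${escapeHtml(u.name)}</td><td>${escapeHtml(u.email)}</td></tr>`)
    .join('');
}

// Request one page of results and swap it into the table body, with no page reload.
// `tbody` is any element exposing innerHTML; `fetchImpl` can be replaced in tests.
async function showPage(page, tbody, fetchImpl = fetch) {
  const response = await fetchImpl(`${USERS_URL}?page=${encodeURIComponent(page)}`);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  tbody.innerHTML = renderRows(await response.json());
}
```

In a browser, a "next page" click handler could call showPage(2, document.querySelector('#users tbody')), where the #users table is again hypothetical. The HTML-fragment variant is even shorter: the response text is assigned to the container directly, with the server responsible for encoding.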
Another application structure that was rapidly becoming the standard approach to web applications was the MVC (Model-View-Controller) architectural pattern. Ruby on Rails was around, though quite young, and doing just this. A few other examples were Jakarta Struts and ASP.Net MVC. Where most of the aforementioned server-side frameworks had a direct 1-to-1 relationship between a page and its server-side code, MVC reorganized it so that generally URLs corresponded to methods or functions defined in the controller classes. Typically, some sort of router object would parse the HTTP method and URL of an incoming request, then figure out which function in which controller should be handling it. The request object would be passed to that function, the server-side business logic would run, and a result would be returned. Initially, that result was usually a view – typically a template containing presentation logic, which consumed a model or viewmodel object containing the data to include in that template.
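As a rough sketch of that routing step, the following JavaScript maps an HTTP method and URL to a controller function and returns a view name plus a model. It is not modeled on the actual router of Rails, Struts, or ASP.Net MVC; the userController object, the route table, and the view names are hypothetical.

```javascript
// Hypothetical controller: each function handles one kind of request and
// returns the name of a view plus the model that view will consume.
const userController = {
  list(request) {
    return { view: 'users/list', model: { page: Number(request.query.page || 1) } };
  },
  show(request) {
    return { view: 'users/show', model: { userId: Number(request.params.id) } };
  },
};

// Route table: HTTP method + URL pattern -> controller function.
const routes = [
  { method: 'GET', pattern: /^\/users$/, params: [], handler: userController.list },
  { method: 'GET', pattern: /^\/users\/(\d+)$/, params: ['id'], handler: userController.show },
];

// The router: parse the method and URL, find the matching route, build a
// request object, and hand it to the controller function.
function dispatch(method, url) {
  const [path, queryString = ''] = url.split('?');
  const query = Object.fromEntries(new URLSearchParams(queryString));

  for (const route of routes) {
    const match = route.method === method ? route.pattern.exec(path) : null;
    if (!match) continue;

    // Copy captured URL segments into named parameters.
    const params = {};
    route.params.forEach((name, i) => { params[name] = match[i + 1]; });

    return route.handler({ method, path, query, params });
  }

  return { view: 'errors/not-found', model: {} };
}

console.log(dispatch('GET', '/users?page=2')); // view 'users/list', model.page is 2
console.log(dispatch('GET', '/users/7'));      // view 'users/show', model.userId is 7
console.log(dispatch('POST', '/users'));       // view 'errors/not-found'
```

A controller that returns the model alone, serialized as JSON or XML, instead of a view is the same idea with the last step removed.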
The growth of MVC, the move toward heavier use of AJAX, and the growing prevalence of web services consumed by browser-based JavaScript all helped catalyze the movement toward RESTful web services, which were not a new idea even then (REST was described in Roy Fielding’s doctoral dissertation in 2000). Controllers would often return data in either XML or JSON, essentially acting as a web service endpoint.
Client-side UI libraries like Backbone started to appear, which increased the movement toward shifting even complex presentation logic to the front-end of the application. There were a number of libraries that handled individual parts of the front-end structure (e.g. Knockout for two-way binding, Handlebars for templating, Sammy for routing), but Angular 1.x and Ember are noteworthy for being some of the most popular complete front-end frameworks of their time. In a way, they mirrored the server architecture. Server-side, you would have your Model (sort of your data object) as it related to persistence and server-side validation, and you would have the Controller, which essentially provided the server’s interface in the form of URL-accessible endpoints that returned data. On the client side was the Controller, which handled logic related to presentation, e.g. event handlers for user interactions such as button clicks. You also had a Viewmodel, the JavaScript object with a two-way binding to the View, which was generally a template or partial template that was rendered client-side. The two-way binding part means that if the viewmodel is a JavaScript object like this:
var user = {
  userId: 1,
  name: 'mic',
  email: 'mic@professionallyevil.com'
};
And that is bound to a form, something like this…:
<form>
<input type="hidden" name="id" value="{{userId}}" />
Username: <input type="text" name="username" value="{{name}}" />
Email: <input type="text" name="email" value="{{email}}" />
<button name="save">Save</button>
</form>
…then not only will I see the username and email textboxes populated with their respective values, but when I edit the value in the username textbox, the user JavaScript object (my viewmodel) will be changed to match. This means that the controller doesn’t have to directly interact with the HTML elements. When I click save, and the controller handles that event, it will simply use the viewmodel to do its thing (probably sending an HTTP POST to the server with that object JSON-serialized as the payload).
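For a feel of what two-way binding involves under the hood, here is a deliberately tiny sketch for a single property and a single input. It is not how Angular, Knockout, or Ember actually implement binding; the bindTwoWay function is hypothetical, and it only assumes that the input exposes a value property and addEventListener, as DOM input elements do.

```javascript
// Minimal two-way binding between one viewmodel property and one input element.
function bindTwoWay(viewmodel, key, input) {
  let current = viewmodel[key];

  // Model -> view: intercept writes to the property and mirror them into the input.
  Object.defineProperty(viewmodel, key, {
    get() { return current; },
    set(next) {
      current = next;
      input.value = String(next);
    },
    enumerable: true,   // keep the property visible to JSON.stringify
    configurable: true,
  });

  // View -> model: when the user edits the field, copy the new text back.
  input.addEventListener('input', () => {
    current = input.value;
  });

  // Initial render so the textbox starts out populated.
  input.value = String(current);
}

// In a browser this might be wired up as:
//   bindTwoWay(user, 'name', document.querySelector('input[name="username"]'));
// After that, typing in the textbox updates user.name, assigning to user.name
// updates the textbox, and JSON.stringify(user) yields the payload for a save.
```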
Now, the three most popular front-end frameworks are React, Vue, and Angular (Angular 2 was a major rewrite without a direct migration path, and Angular 1.x has been retroactively rebranded AngularJS, with its successors simply being called Angular). I’m not going to go into too much depth here, as it’s not relevant to the subsequent articles, but the general move is toward building UIs as self-contained components, often with hierarchical nesting. In many respects, each of these individual components behaves somewhat like the client-side MVC/MVVM applications described above. Application back-ends today are still most commonly the RESTful service APIs that are essentially the server-side MVC framework, minus the V.