The VCL Cookie Monster

Rogier Mulhuijzen

Senior Professional Services Engineer

This month's tip is more a theoretical exercise than anything else, just to show the power of VCL, and to explain a few regular expressions. I'm going to discuss VCL that deletes cookies.

Let's assume you have a Varnish server that only serves static content. Ideally you don't want the browser to send any Cookie headers, to keep the request as small as possible. This means you don't want your Varnish to send any Set-Cookie headers to the browser.

You may have started using a separate domain, xmplsttc.com, so that the cookies that were set with ; domain=.example.com don't show up on this server.

Unfortunately, one of your developers has deployed some code that set cookies on this domain. He has since seen the error of his ways, and gotten rid of the code, but there are still a lot of clients sending all these needless cookies, and you would love to get rid of them.

Look no further, because the code below does just that. Let's look at it first, and then I'll discuss each part in turn.

backend default {
  .host = "static.internal.example.com";
  .port = "80";
}

sub vcl_recv {
  unset req.http.Authorization;

  if (req.http.Cookie) {
    set req.http.X-First-CookieName = regsub(req.http.Cookie, "^([^=]*)=.*$", "\1");
    if (req.http.X-First-CookieName == req.http.Cookie) {
      unset req.http.X-First-CookieName;
    }
    unset req.http.Cookie;
  }
}

sub vcl_miss {
  unset bereq.http.X-First-CookieName;
}

sub vcl_pass {
  unset bereq.http.X-First-CookieName;
}

sub vcl_fetch {
  unset beresp.http.Set-Cookie;
}

sub vcl_deliver {
  if (req.http.X-First-CookieName) {
    set resp.http.Set-Cookie = req.http.X-First-CookieName
                               + "=; Expires=Thu, 01 Jan 1970 00:00:00 GMT";
  }
}

The VCL above is an example of a full VCL for your Varnish. Note the lack of return(); statements. One of the features of VCL is that a sub that ends without returning falls through to the default VCL that is built into Varnish. So after Varnish is done running the vcl_recv above, it will run the default vcl_recv, which adds X-Forwarded-For and determines whether to look the request up in the cache, pass it through, or pipe it.

The default VCL is usually placed in /etc/varnish/default.vcl, and you should give it a read-through to see what it does. That way you will know exactly what will happen if you don't return from your VCL subs.

One of the standard actions is to do return(pass); if there is either an Authorization or Cookie header. Since this server is only dealing with static assets, and has no authentication or authorization logic, the Authorization header is stripped from the request using unset req.http.Authorization. With the header removed, the default vcl_recv will not see it and will not do a pass-through because of it.

unset req.http.Authorization;

Instead of checking its existence before removing it, it's easier to just remove it. unset will not do anything if the header is not present.
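For reference, the pass decision described above sits at the end of the default vcl_recv. The excerpt below is abridged and reproduced from memory of the Varnish 3.0 default VCL (the syntax used throughout this post), so treat it as a sketch and check your own default.vcl for the authoritative version.

```vcl
sub vcl_recv {
  # Abridged: the X-Forwarded-For handling and the check for
  # unknown request methods that precede this are left out.

  if (req.request != "GET" && req.request != "HEAD") {
    /* We only deal with GET and HEAD by default */
    return (pass);
  }
  if (req.http.Authorization || req.http.Cookie) {
    /* Not cacheable by default */
    return (pass);
  }
  return (lookup);
}
```

Because your own vcl_recv runs first, a header you unset there is already gone by the time these checks are evaluated.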

Next up is the Cookie header. There is nothing to gain from running the code below when the request has no Cookie header, so its existence is checked first.

if (req.http.Cookie) {
  set req.http.X-First-CookieName = regsub(req.http.Cookie, "^([^=]*)=.*$", "\1");

VCL doesn't really have variables that are local to a request, so the usual approach is to store data in custom headers. In this case the data is stored in a request header called X-First-CookieName.

As a refresher, a Cookie header looks like this:

Cookie: name1=value1; name2=value2

regsub() is a function to do a regular expression substitution. In this case, the whole string is being replaced with just the name of the first cookie. To make sure it's the whole string, and not just part of it, the regular expression (regex) is anchored by ^ and $, which match the start and end of the string respectively.

Varnish filters out any whitespace at the start of the header value, so the first character of the Cookie header will be the first character of the first cookie name. [^=]* will match zero or more characters that are not =. Regular expressions are greedy by default, meaning an element followed by * will match as much as possible. As a result, [^=]* will match all characters up to the first =, which is exactly the part you're interested in. The = is a literal character, so it will match the first = in the line. And .* is there to make sure you match everything up to the end-of-line anchor ($).

The \1 in the replacement string is what is called a back-reference. Back-references refer to groups in the regex, which are defined using parentheses. \1 is the first group, \2 is the second, etc. In this case there is only one group, which is [^=]*.

My favorite regex dissecting tool is Regex101, and I've put the regex and the example Cookie header here for your viewing and experimentation pleasure. The upper right panel has the explanation, and the middle right panel shows the match groups.

If the Cookie header is malformed, and does not have any = in it, then the regex does not match. If you went to the example on Regex101, replace the test string with just name1 to see the effect. Since this is a substitution operation, not finding a match means the input string (the Cookie header) is left untouched, and in this case req.http.X-First-CookieName will just be a copy of req.http.Cookie.

if (req.http.X-First-CookieName == req.http.Cookie) {
  unset req.http.X-First-CookieName;
}

After this bit of code, the custom header should either contain the name of the first cookie, or it should not exist at all.

Since I'm being hypothetical here, if instead of the first cookie you wanted the second cookie (you probably won't want this, but I'm throwing it out there), you could use ^[^;]*;\s*([^=]*)=.*$ as the regex. I have also saved this one on Regex101, here.

^ anchors the regex to the start of the line. [^;]* skips as many characters as possible that aren't ;, and is followed by a literal ;. Then \s* matches any whitespace that might sit between the ; and the next cookie. RFC 6265 says there must be a single space at this point, but I'm just being a little extra forgiving here. The rest is just like the regex earlier.
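Dropped into vcl_recv, the second-cookie regex could be used as in the minimal sketch below. The header name X-Second-CookieName is made up for this illustration and is not part of the VCL above; the no-match check works the same way as for the first cookie.

```vcl
sub vcl_recv {
  if (req.http.Cookie) {
    # Hypothetical variant: capture the name of the second cookie.
    set req.http.X-Second-CookieName = regsub(req.http.Cookie, "^[^;]*;\s*([^=]*)=.*$", "\1");

    # If the regex did not match (only one cookie, or a malformed
    # header), regsub returns its input unchanged, so the two headers
    # are identical and the custom header is dropped.
    if (req.http.X-Second-CookieName == req.http.Cookie) {
      unset req.http.X-Second-CookieName;
    }
  }
}
```

With the example header from earlier (name1=value1; name2=value2), the custom header ends up holding name2.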

  unset req.http.Cookie;
}

To finish off vcl_recv, the Cookie header is removed. Just like with the Authorization header, this is to keep the default vcl_recv from deciding to do a pass-through.

sub vcl_miss {
  unset bereq.http.X-First-CookieName;
}

If the request could not be satisfied from the cache, all the headers are copied from req (the request) to bereq (the backend request). The snippet above removes the custom header from the request that will be sent to the backend, but does not touch req, leaving the original in place.

sub vcl_pass {
  unset bereq.http.X-First-CookieName;
}

If the default vcl_recv decided to do a pass-through (most probably because the request was a POST instead of a GET) Varnish will not execute vcl_miss, but vcl_pass instead.

Left out here is vcl_pipe, but that would look the same.
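For completeness, here is what that sub would look like. It is a sketch that mirrors the vcl_miss and vcl_pass subs and assumes Varnish 3, where bereq is available in vcl_pipe.

```vcl
sub vcl_pipe {
  # Keep the custom header out of the request that is piped to the backend.
  unset bereq.http.X-First-CookieName;
}
```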

sub vcl_fetch {
  unset beresp.http.Set-Cookie;
}

To prevent the origin from ever setting a cookie again, the Set-Cookie header is removed from any responses from the backend. It is done in vcl_fetch and not in vcl_deliver purely for efficiency reasons. If the response from the backend is cacheable, it will be stored in the cache after vcl_fetch has made its changes.

If there are 99 hits for every miss, removing it in vcl_fetch would mean it is removed once. Removing it in vcl_deliver would mean it is removed 100 times.

sub vcl_deliver {
  if (req.http.X-First-CookieName) {
    set resp.http.Set-Cookie = req.http.X-First-CookieName
                               + "=; Expires=Thu, 01 Jan 1970 00:00:00 GMT";
  }
}

And last but not least, if the custom header (X-First-CookieName) exists, a Set-Cookie header is added to the response. The name of the cookie is the name of the first cookie in the request, and the Expires is far in the past. Most browsers will delete the cookie after receiving this.

Caveats and possible improvements

The highly hypothetical case above does not fully take into account the domain parameter, and flat out ignores path. If you feel up to a challenge, try to come up with a way to cover either of them. You can use the comments below to post your code and/or have a conversation with me about it. Or you can post on our Community Forum where some colleagues and I usually answer VCL questions.
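As a starting point for the path part of that challenge, here is a minimal sketch. It assumes the stray cookies were set with Path=/ and without a Domain attribute, which is a guess about the hypothetical developer's code rather than something established above. A browser only replaces a stored cookie when name, domain, and path all match, and without an explicit Path the browser derives one from the request URL, so the deleting Set-Cookie has to state the path explicitly.

```vcl
sub vcl_deliver {
  if (req.http.X-First-CookieName) {
    # Assumption: the original cookies were set with Path=/ and no Domain.
    set resp.http.Set-Cookie = req.http.X-First-CookieName
                               + "=; Expires=Thu, 01 Jan 1970 00:00:00 GMT; Path=/";
  }
}
```

Cookies that were set with a different path or with a Domain attribute would need those same values in the deleting header, and this sketch does not try to discover them.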

Since this Varnish is only serving static assets, it doesn't need to support anything except GET and HEAD. So you could add the following to the end of vcl_recv:

if (req.request != "GET" && req.request != "HEAD") {
  error 403 "Forbidden";
}

This will prevent anything that is not a GET or HEAD from being forwarded to your backend. Instead, the 403 error status will be returned to the browser, with the body defined in the default vcl_error.

If you use this snippet, you can do away with the vcl_pass, of course.

Another possible improvement, to be combined with the 403 snippet, is to stop bothering with the removal of the Cookie and Authorization headers, copy the X-Forwarded-For code from the default VCL, and end your vcl_recv with return(lookup);. Personally, I prefer that these two headers not show up on my static assets backend, so I would still remove them, add the 403 snippet, and end with return(lookup);.
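Put together, that preferred version of vcl_recv might look like the sketch below. The X-Forwarded-For block is copied from memory of the Varnish 3.0 default VCL, so compare it against your own default.vcl before relying on it.

```vcl
sub vcl_recv {
  # From the default vcl_recv: maintain X-Forwarded-For, since the
  # default sub no longer runs once this sub returns.
  if (req.restarts == 0) {
    if (req.http.X-Forwarded-For) {
      set req.http.X-Forwarded-For = req.http.X-Forwarded-For + ", " + client.ip;
    } else {
      set req.http.X-Forwarded-For = client.ip;
    }
  }

  # Keep these headers away from the static assets backend.
  unset req.http.Authorization;

  if (req.http.Cookie) {
    set req.http.X-First-CookieName = regsub(req.http.Cookie, "^([^=]*)=.*$", "\1");
    if (req.http.X-First-CookieName == req.http.Cookie) {
      unset req.http.X-First-CookieName;
    }
    unset req.http.Cookie;
  }

  # Static assets only need GET and HEAD.
  if (req.request != "GET" && req.request != "HEAD") {
    error 403 "Forbidden";
  }

  # Skip the default vcl_recv entirely and go straight to the cache.
  return (lookup);
}
```

Since every request that gets past the 403 check ends in return(lookup);, neither vcl_pass nor vcl_pipe is reached from this vcl_recv.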

As you can tell there are lots of little details you can tweak and fiddle with in VCL.
