While a forward proxy is usually situated between the client application (such as a web browser) and the server(s) hosting the desired resources, a reverse proxy is usually situated closer to the server(s) and will only return a configured set of resources.
"Facebook uses Varnish to serve billions of requests every day to users around the world. We like its simple architecture, which is designed for modern operating systems and find that it does not consume much CPU while handling heavy loads. Varnish is our favored HTTP cache and we use it heavily; whenever you load photos and profile pictures of your friends on Facebook, there's a very good chance that Varnish is involved."
David Recordon, Head of open source initiatives, Facebook.
Google top 1000 list:
Place 16 Varnish running on www.twitter.com (search.twitter.com)
Place 49 Varnish running on www.weather.com (www.weather.com)
Place 53 Varnish running on www.answers.com (www.answers.com)
Place 80 Varnish running on www.globo.com (www.globo.com)
Place 116 Varnish running on www.nytimes.com (or so they tell themselves)
Place 160 Varnish running on www.slideshare.net (www.slideshare.net)
Place 186 Varnish running on www.livejournal.com (www.livejournal.com)
Place 208 Varnish running on www.mercadolivre.com.br (www.mercadolivre.com.br)
Place 215 Varnish running on www.y8.com (www.y8.com)
Place 313 Varnish running on www.robtex.com (www.robtex.com)
Place 368 Varnish running on www.manta.com (www.manta.com)
Place 370 Varnish running on www.squidoo.com (www.squidoo.com)
Place 403 Varnish running on www.taleo.net (www.taleo.net)
Place 488 Varnish running on www.gazeta.pl (www.gazeta.pl)
Place 564 Varnish running on www.wetter.com (www.wetter.com)
Place 579 Varnish running on www.spiegel.de (www.spiegel.de)
Place 596 Varnish running on www.merriam-webster.com (www.merriam-webster.com)
Place 603 Varnish running on www.wat.tv (www.wat.tv)
Place 614 Varnish running on www.rutube.ru (www.rutube.ru)
Place 619 Varnish running on www.mercadolibre.com.mx (www.mercadolibre.com.mx)
Place 652 Varnish running on www.zappos.com (www.zappos.com)
Place 693 Varnish running on www.xpg.com.br (www.xpg.com.br)
Place 700 Varnish running on www.espncricinfo.com (www.espncricinfo.com)
Place 722 Varnish running on www.stumbleupon.com (www.stumbleupon.com)
Place 739 Varnish running on www.wwe.com (www.wwe.com)
Place 766 Varnish running on www.kapook.com (www.kapook.com)
Place 795 Varnish running on www.mercadolibre.com.ar (www.mercadolibre.com.ar)
Place 870 Varnish running on www.xywy.com (www.xywy.com)
Place 885 Varnish running on www.break.com (www.break.com)
Place 969 Varnish running on www.mobifiesta.com (www.mobifiesta.com)
Place 987 Varnish running on www.mercadolibre.com (www.mercadolibre.com)
Place 991 Varnish running on www.popcap.com (www.popcap.com)
Global Alexa toolbar users:
(Note that Facebook (#2) is using Varnish, but this is not automatically discovered.)
Place 9 Varnish running on twitter.com (search.twitter.com)
Place 76 Varnish running on livejournal.com
Place 95 Varnish running on weather.com
Place 111 Varnish running on globo.com
Place 112 Varnish running on stumbleupon.com
Place 144 Varnish running on answers.com
Place 203 Varnish running on wikia.com
Place 220 Varnish running on squidoo.com
Place 241 Varnish running on espncricinfo.com
Place 251 Varnish running on slideshare.net
Place 373 Varnish running on soundcloud.com
Place 389 Varnish running on drupal.org
Place 392 Varnish running on mercadolivre.com.br
Place 403 Varnish running on businessinsider.com
Place 435 Varnish running on forbes.com
Place 441 Varnish running on gazeta.pl
Place 479 Varnish running on posterous.com
Use your standard package manager (Homebrew, apt-get, etc.). You'll need Cygwin for Windows. Sucks to be you :(
# Simple .VCL file
backend default {
.host = "varnish.local";
.port = "80";
}
The default varnish configuration is automatically appended to yours.
Default varnish installation serves cached content from port 6081 proxied from 127.0.0.1:80.
varnishd -f /usr/local/etc/varnish/default.vcl -s malloc,1G -a 0.0.0.0:8080
-T telnet listen address/port, -s storage, -a varnish address/port
<CFHEADER NAME="Cache-Control" VALUE="s-maxage=600">
<cfoutput>
<html>
<body>
<h1>Hello cruel world!</h1>
<strong>Generated by server:</strong>
#dateformat(now(),'ddd, mmm d yyyy')# #timeformat(now(),'HH:mm:ss')#<br>
<strong>Loaded by browser:</strong>
<script type="text/javascript">document.write(new Date());</script>
</body>
</html>
</cfoutput>
Shows the server time (affected by the cache) and the client time on the browser.
COOKIES!!! UMM-NUM-NUM-NUM-NUM!!!
Cookie Monster, CEO, No More Cookies Inc.
vcl_recv()
Called after a request is received from the browser, but before it is processed.
vcl_pipe()
Called when a request must be forwarded directly to the backend with minimal handling by Varnish (think HTTP CONNECT)
vcl_hash()
Called to determine the hash key used to look up a request in the cache.
vcl_hit()
Called after a cache lookup when the object requested has been found in the cache.
vcl_miss()
Called after a cache lookup when the object requested was not found in the cache.
vcl_pass()
Called when the request is to be passed to the backend without looking it up in the cache.
vcl_fetch()
Called when the request has been sent to the backend and a response has been received from the backend.
vcl_deliver()
Called before a response object (from the cache or the web server) is sent to the requesting client.
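As a rough illustration of how one of these hooks can be used, the sketch below marks each response as a cache hit or miss in vcl_deliver. It assumes Varnish 3 syntax; the X-Cache header name is an arbitrary choice for this example, not something Varnish sets on its own.

```vcl
sub vcl_deliver {
    # obj.hits counts how many times this object has been served from cache
    if (obj.hits > 0) {
        # X-Cache is a hypothetical debug header name chosen for this sketch
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
}
```

Checking the response headers in the browser then shows at a glance whether a page came from the cache or from the backend.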
vcl_recv() and vcl_fetch().
"The principal configuration mechanism is VCL (Varnish Configuration Language), a domain-specific language (DSL) used to write hooks which are called at critical points in the handling of each request. Most policy decisions are left to VCL code, making Varnish far more configurable and adaptable than most other HTTP accelerators. When a VCL script is loaded, it is translated to C, compiled to a shared object by the system compiler, and linked directly into the accelerator." Wikipedia: Varnish Cache
# simple.vcl
backend default {
.host = "varnish.local";
.port = "80";
}
sub vcl_recv {
# Remove user agent
if (req.http.User-Agent) {
set req.http.User-Agent = "";
}
# remove any cookies from client request
unset req.http.cookie;
# look up the cache
return(lookup);
}
sub vcl_fetch {
# cache period (ttl: time to live)
set beresp.ttl = 45s;
# cache grace period; serve dirty
set beresp.grace = 15s;
}
sub vcl_recv {
if (req.url ~ "\.(jpg|jpeg|gif|png|ico|css|zip|pdf|txt|tar|wav|bmp|rtf|js|flv|swf|html|htm)$") {
unset req.http.Accept-Encoding;
// Remove user agent
if (req.http.User-Agent) {
set req.http.User-Agent = "";
}
unset req.http.Cookie;
return(lookup);
}
}
~ is a regex match operator for strings. set and unset update and remove a property. req.http properties are request headers. return(lookup) returns control to Varnish, requesting a cache lookup if possible.
<VirtualHost *:80>
# Set up caching on media files for 1 year (forever?)
<FilesMatch "\.(jpg|jpeg|gif|png|ico|css|zip|pdf|txt|tar|wav|bmp|rtf|js|flv|swf|html|htm)$">
ExpiresDefault A29030400
Header append Cache-Control "public"
</FilesMatch>
</VirtualHost>
This configuration leaves control of cache timeouts with Apache, but you can force a cache timeout in varnish with a line like "set beresp.ttl = 48h;"
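A minimal sketch of forcing the timeout in Varnish rather than Apache, assuming Varnish 3 syntax. The shortened extension list and the decision to drop Set-Cookie on static files are assumptions made for this example.

```vcl
sub vcl_fetch {
    # Only touch static assets; everything else keeps the backend's headers
    if (req.url ~ "\.(jpg|jpeg|gif|png|ico|css|js)$") {
        # A Set-Cookie header would otherwise stop the object being cached
        unset beresp.http.Set-Cookie;
        # Override whatever Cache-Control / Expires the backend sent
        set beresp.ttl = 48h;
    }
}
```

Note that this only changes how long Varnish keeps the object; the Cache-Control header sent on to the browser is still whatever Apache produced.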
# IPs/domains that can access the purge url
acl purge {
"localhost";
"203.26.11.39";
}
sub vcl_recv {
# Purge everything url - this isn't the squid way, but works
if (req.url ~ "^/varnishpurge") {
if (!client.ip ~ purge) {
error 405 "Not allowed.";
}
if (req.url == "/varnishpurge") {
ban("req.http.host == " + req.http.host + " && req.url ~ ^/");
error 841 "Purged site.";
}
else {
ban("req.http.host == " + req.http.host + " && req.url ~ ^" + regsub( req.url, "^/varnishpurge(.*)$", "\1" ) + "$" );
error 842 "Purged page.";
}
}
}
Use an ACL to restrict access to the sensitive functionality like flushing. For ACLs, the ~ operator is an 'in' check.
http://www.fullasagoog.com/varnishpurge
http://www.fullasagoog.com/varnishpurge/blog/daemonite
Can be used as a hook to manage the cache from your upstream server. Allows administrators to flush problem pages on demand easily.
ban is for Varnish 3 what purge was for Varnish 2. Bans are stored in memory, and every page request is checked against every ban, so there are performance implications.
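One way to soften that cost, sketched here under Varnish 3 assumptions, is to write bans that only test obj.* properties, so the background ban lurker thread can work through them instead of leaving them all to request time. The x-url helper header, the PURGE request method, and the ACL contents are choices made for this example, not part of the configuration above.

```vcl
acl purge {
    "localhost";
}

sub vcl_fetch {
    # Copy the URL onto the cached object so bans can match on obj.*
    set beresp.http.x-url = req.url;
}

sub vcl_deliver {
    # Hide the helper header from clients
    unset resp.http.x-url;
}

sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purge) {
            error 405 "Not allowed.";
        }
        # req.url is treated as a regex here, which may be too simple for real use
        ban("obj.http.x-url ~ " + req.url);
        error 200 "Ban added.";
    }
}
```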
acl CTRLF5 {
"192.168.100.100";
}
sub vcl_hit {
if (client.ip ~ CTRLF5) {
if (req.http.pragma ~ "no-cache" || req.http.Cache-Control ~ "no-cache")
{
set obj.ttl = 0s;
return(pass);
}
else { return(deliver); }
}
else { return(deliver); }
}
# IPs/domains that bypass cache
acl bypass {
"1.2.3.4";
}
sub vcl_recv {
if (client.ip ~ bypass) {
return(pass);
}
}
return(pass) passes the request through to the backend, bypassing the cache.
sub vcl_recv {
if (req.http.Cookie ~ "LOGGED-IN=1") {
return(pass);
}
}
Bypasses cache if there is a LOGGED-IN cookie with value 1.
sub vcl_recv {
if (req.http.X-Requested-With == "XMLHttpRequest") {
return(pass);
}
}
AJAX requests bypass the cache. Keep everything static except personalised regions that are ajaxed into place.
Make sure your JavaScript implementation for AJAX actually sets the X-Requested-With header to XMLHttpRequest :)
Varnish can include the user-agent and accept-encoding headers in cache hash, so normalising them is a good idea:
sub vcl_recv {
if (req.http.Accept-Encoding) {
if (req.http.Accept-Encoding ~ "gzip") {
# if the browser supports it, we'll use gzip
set req.http.Accept-Encoding = "gzip";
} elsif (req.http.Accept-Encoding ~ "deflate") {
# next, try deflate if it is supported
set req.http.Accept-Encoding = "deflate";
} else {
# unknown algorithm. Probably junk, remove it
unset req.http.Accept-Encoding;
}
}
# Remove user agent
if (req.http.User-Agent) {
set req.http.User-Agent = "";
}
}
if (req.http.User-Agent ~ "iP(hone|od)" || req.http.User-Agent ~ "Android" ||
req.http.User-Agent ~ "Symbian" || req.http.User-Agent ~ "^BlackBerry" ||
req.http.User-Agent ~ "^SonyEricsson" || req.http.User-Agent ~ "^Nokia" ||
req.http.User-Agent ~ "^SAMSUNG" || req.http.User-Agent ~ "^LG" ||
req.http.User-Agent ~ " webOS") {
set req.http.User-Agent = "mobile";
}
else {
set req.http.User-Agent = "desktop";
}
sub vcl_hash {
# these 2 entries are the default ones used for vcl. Below we add our own.
# Varnish 3 syntax: hash_data() replaces "set req.hash +=", and strings are joined with +
hash_data(req.http.host);
hash_data(req.url);
set req.http.X-Varnish-Hashed-By = req.http.host + req.url;
if (req.http.Cookie ~ "ROLES") {
hash_data(regsub( req.http.Cookie, "^.*?ROLES=([^;]*);*.*$", "\1" ));
set req.http.X-Varnish-Hashed-By = req.http.X-Varnish-Hashed-By + regsub( req.http.Cookie, "^.*?ROLES=([^;]*);*.*$", "\1" );
}
return(hash);
}
Adds the value of the ROLES cookie to the cache key. req.http.X-Varnish-Hashed-By adds a helpful header for debugging.
Pass along original IP address as a custom header
sub vcl_recv {
// Pass along the IP for uncached pages
if (req.http.x-forwarded-for) {
set req.http.X-Forwarded-For = req.http.X-Forwarded-For + ", " + client.ip;
} else {
set req.http.X-Forwarded-For = client.ip;
}
}
// vcl_fetch()
// Let any non "GET" / "HEAD" right through
if (req.request != "GET" && req.request != "HEAD"){
set beresp.http.X-Cacheable = "NO:Not GET or HEAD";
// Varnish 3: "pass" from vcl_fetch is spelled hit_for_pass
return(hit_for_pass);
}
// strip cookies for everything except specific pages
elsif ( req.url ~ "profile" || req.url ~ "Login" || req.url ~ "login" || req.url ~ "logout" ) {
set beresp.http.X-Cacheable = "NO:Login or logout page";
return(hit_for_pass);
}
else {
unset beresp.http.set-cookie;
}
Dealing with comment forms, secure content & newsletter driven rushes on the site
sub vcl_recv {
set req.grace = 300s;
}
sub vcl_fetch {
if (beresp.status == 500) {
set beresp.saintmode = 20s;
if (req.request != "POST") {
return(restart);
} else {
error 500 "Failed";
}
set beresp.ttl = 5s;
}
set beresp.grace = 300s;
}
req.grace defines how old a cached object can be if there is a backend problem
beresp.saintmode flags the current page on the current backend as "bad"
return(restart) to retry the request; if there is no "good" backend for the page, the grace will kick in
POST information can't really be restarted
beresp.grace is effectively the maximum grace the request can have
beresp.ttl is the cache timeout - setting it overrides the value in the Cache-Control header
When several clients are requesting the same page, Varnish will send one request to the backend and place the others on hold while fetching one copy from the backend. In some products this is called request coalescing, and Varnish does this automatically.
Sometimes servers get flaky. They start throwing out random errors. You can instruct Varnish to try to handle this in a more-than-graceful way - enter Saint mode. Saint mode enables you to discard a certain page from one backend server and either try another server or serve stale content from cache.
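A common variation on grace, sketched here assuming Varnish 3 and a backend that has a health probe configured: accept only slightly stale objects while the backend is healthy, and much older ones once it is marked sick. The 30s and 1h values are arbitrary choices for the example.

```vcl
sub vcl_recv {
    if (req.backend.healthy) {
        # Backend is fine: tolerate only briefly stale content
        set req.grace = 30s;
    } else {
        # Backend is sick: serve stale content for up to an hour
        set req.grace = 1h;
    }
}

sub vcl_fetch {
    # Keep objects around long enough for the longer grace to be usable
    set beresp.grace = 1h;
}
```

Without a probe on the backend, req.backend.healthy has nothing to go on, so the longer grace would never be selected.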
IDDQD
#show a histogram of hits (|) and misses (#) by the request time
varnishhist
#show the misses by most frequent
varnishtop -b -i txurl
#show general stats (e.g. proportion of hits to misses over time)
varnishstat
#live varnish logs (this is VERY verbose)
varnishlog
#varnish logs, grouped by request, for the daemon office IP
varnishlog -c -o SessionOpen 203.26.11.39
#live varnish logs in apache/NCSA "combined" log format
varnishncsa
8.89 TxURL /registered-user/displayTypeLoginPod/ajaxmode/1
7.99 TxURL /post/rss
5.62 TxURL /favicon.ico
4.95 TxURL /xml/ColdFusionMX.xml
4.17 TxURL /googblogpost/rss?format=xml
3.94 TxURL /googblogpost/rss
3.74 TxURL /xml/fullasagoog50.xml
3.71 TxURL /
3.57 TxURL /F15D6000-0A77-11E1-ACA5005056A301A8/redirect
3.53 TxURL /xml/export.xml
3.47 TxURL /8D5BD510-0A76-11E1-ACA5005056A301A8/redirect
3.43 TxURL /xml/FlashMX.xml
3.35 TxURL /index.cfm?objectID=08579C9F-C085-7808-85EB3D1CD6488792&flushcache=1
3.11 TxURL /cache/webskinAjaxLoader--0-D41D8CD98F00B204E9800998ECF8427E-D41D8CD98F00B204E9800998ECF8427E-313CA0907DADE6EA6215AA16A0685740.js
3.11 TxURL /cache/fcga--1311305428809-2FC28720EE6EF54DA2F5353760DAEA80-D41D8CD98F00B204E9800998ECF8427E-D41D8CD98F00B204E9800998ECF8427E.js
3.11 TxURL /cache/jquery--1305519744181-AA09F36EDAB33BAEF6A415F165FE5657-D41D8CD98F00B204E9800998ECF8427E-4CD1C3C5B197FB174E1BC7C9641A5874.js
3.09 TxURL /lib/tnw_pr
https://www.varnish-cache.org/docs/trunk/tutorial/advanced_backend_servers.html
Map different server content into a single URL. Introduce a PHP application into your CFML web site. Imagine we hack the thing onto port 8000. Now, Varnish magic:
backend default {
.host = "127.0.0.1";
.port = "80";
}
backend php {
.host = "127.0.0.1";
.port = "8000";
}
sub vcl_recv {
if (req.url ~ "^/forums/") {
set req.backend = php;
} else {
set req.backend = default;
}
}
Mobile devices from a different backend? No problem.
if (req.http.User-Agent ~ "mobile")
Totally arbitrary data -- it's awesome.
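Spelled out in full, the mobile case might look something like the sketch below (Varnish 3 syntax). The mobile backend and its port 8001 are hypothetical, and the case-insensitive "mobile" match is a deliberately crude stand-in for real device detection.

```vcl
backend default {
    .host = "127.0.0.1";
    .port = "80";
}

# Hypothetical backend serving the mobile version of the site
backend mobile {
    .host = "127.0.0.1";
    .port = "8001";
}

sub vcl_recv {
    # (?i) makes the regex case-insensitive
    if (req.http.User-Agent ~ "(?i)mobile") {
        set req.backend = mobile;
    } else {
        set req.backend = default;
    }
}
```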
https://www.varnish-cache.org/docs/trunk/tutorial/advanced_backend_servers.html
Group several backends into a group of backends, aka a director. You can define several backends and group them together in a director.
backend server1 {
.host = "192.168.0.10";
}
backend server2{
.host = "192.168.0.10";
}
Now we create the director:
director example_director round-robin {
{ .backend = server1; }
{ .backend = server2; }
}
round-robin director: distributes the incoming requests on a round-robin basis.
random director: distributes requests randomly.
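For comparison, a random director could be sketched roughly as follows (Varnish 3 syntax). The host addresses, the director name, and the 3:1 weights are assumptions for illustration only.

```vcl
backend server1 {
    .host = "192.168.0.10";   # hypothetical address
}

backend server2 {
    .host = "192.168.0.11";   # hypothetical address
}

# server1 should receive roughly three requests for every one sent to server2
director example_random random {
    { .backend = server1; .weight = 3; }
    { .backend = server2; .weight = 1; }
}

sub vcl_recv {
    # A director is used exactly like a single backend
    set req.backend = example_random;
}
```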
But what if one of your servers goes down? Can Varnish direct all the requests to the healthy server? Sure it can. This is where the Health Checks come into play.
backend server1 {
.host = "server1.example.com";
.probe = {
.url = "/";
.interval = 5s;
.timeout = 1s;
.window = 5;
.threshold = 3;
}
}
backend server2 {
.host = "server2.example.com";
.probe = {
.url = "/";
.interval = 5s;
.timeout = 1s;
.window = 5;
.threshold = 3;
}
}
url
What URL should varnish request.
interval
How often should we poll
timeout
What is the timeout of the probe
window
Varnish will maintain a sliding window of the results. Here the window has five checks.
threshold
How many of the .window last polls must be good for the backend to be declared healthy.
initial
How many of the probes are assumed good when Varnish starts - defaults to the same amount as the threshold.
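Since both backends use identical probe settings, the probe can probably be declared once and shared. This sketch assumes Varnish 3, which allows named probes; the name site_check and the explicit .initial value are choices made for the example.

```vcl
# Declared once, referenced by name from each backend
probe site_check {
    .url = "/";
    .interval = 5s;
    .timeout = 1s;
    .window = 5;
    .threshold = 3;
    .initial = 3;    # assume 3 good probes at startup (same as the threshold)
}

backend server1 {
    .host = "server1.example.com";
    .probe = site_check;
}

backend server2 {
    .host = "server2.example.com";
    .probe = site_check;
}
```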
Roughly speaking, ESI consists of three parts, of which Varnish so far implements only one:
<esi:include>, <esi:remove> and <!--esi ... --> (implemented)
<esi:include> is to web content what <cfinclude> is to CFML
Nested <esi:include> constructs are supported, but their depth is limited by the parameter max_esi_includes, which defaults to 5.
Multiple <esi:include> instructions can appear in one document.
ESI Standard (Varnish only supports a sub-set)
ESI element tags are inserted into HTML or other text based content during creation. Instead of being displayed to viewers these ESI tags are directives that instruct an ESI processor to take some action. The XML based ESI tags indicate to the edge-side processing agent the action that needs to be taken to complete the page's assembly. One simple example of an ESI element is the include tag which is used to include content external to the page. An ESI include tag placed in-line within an HTML document would look like this:
In this case the ESI processor would retrieve the src URL, or failing that the alt URL, or if that failed do nothing. The ESI system is usually a caching proxy server so it may have a local copy of these files which it can insert without going back to the server. Alternatively the whole page with the ESI tags may be cached, and only the ESI requests may be made to the origin server. This allows different caching times for different parts of the page, or different degrees of personalisation.
<HTML>
<BODY>
Latest news: <esi:include src="/hotnews.cfm"/>
</BODY>
</HTML>
sub vcl_fetch {
if (req.url == "/test.html") {
set beresp.do_esi = true; /* Do ESI processing */
set beresp.ttl = 24 h; /* Sets the TTL on the HTML above */
} elseif (req.url == "/hotnews.cfm") {
set beresp.ttl = 1m; /* Sets a one minute TTL on */
/* the included object */
}
}
Example: <esi:remove>. The remove keyword allows you to remove output. You can use this to make a fallback of sorts, when ESI is not available, like this:
Example: <!--esi ... -->. This is a special construct to allow HTML marked up with ESI to render without processing. ESI processors will remove the start ("<!--esi") and end ("-->") when the page is processed, while still processing the contents. If the page is not processed, it will remain, becoming an HTML/XML comment tag. For example:
This assures that the ESI markup will not interfere with the rendering of the final HTML if not processed.
http://www.fullasagoog.com/ public domain
http://webtop.fullasagoog.com/ direct access bypassing cache
http://proxy.fullasagoog.com/ always through the proxy