Varnish

Share on FriendFeed
Posted by marco
Wed, 28 Oct 2009 11:31:00 GMT

Varnish is a reverse proxy, and is a very good one. You can reduce effectively the load on your web servers, or deploy a load-balancer, and it has a very sophisticated configuration.

However, with the default configuration you could have a small cache hits/misses ratio. With a few modifications, you can improve the cache hit. I think that these modifications are general enough, so I share them here. Mostly are based on what you find in this blog post and in the Varnish FAQ

The only functions I modified with respect to the default configuration are vclrecv, vclpipe and vcl_fetch:

backend lighttpd {
.host = "::1";
.port = "80";
}
sub vcl_recv {
    set req.backend = lighttpd;
    if (req.http.Accept-Encoding) {
        if (req.url ~ ".(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|pdf)$") {
                    # No compression fo these
                remove req.http.Accept-Encoding;
            } elsif (req.http.Accept-Encoding ~ "gzip") {
                set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
                    set req.http.Accept-Encoding = "deflate";
        } else {
                    # unkown
                    remove req.http.Accept-Encoding;
        }
    }
    set req.grace = 30s;
    # Allow purging of cache using shift + reload
    if (req.http.Cache-Control ~ "no-cache") {
        purge_url(req.url);
    }
    # Unset any cookies and autorization data for static links and icons, and fetch from cache
    if (req.request == "GET" && req.url ~ "^/files/") {
        unset req.http.cookie;
        unset req.http.Authorization;
        lookup;
     }

     # Look for images in the cache
     if (req.url ~ ".(svg|png|gif|jpg|ico|jpeg|swf|css|js)$") {
        unset req.http.cookie;
        lookup;
     }

}

sub vcl_pipe {
    # Warning, default is disabled, it may break some web applications 
    set req.http.connection = "close";
}

sub vcl_fetch {
    set obj.grace = 30s;
    if (obj.ttl < 180s) {
        set obj.ttl = 180s;
    }

    if (req.http.Authorization && !obj.http.Cache-Control ~ "public") {
        # don't allow caching pages that are protected by basic authentication
        # unless when they explicitly set the cache-control to public.
        pass;
    }
}

freedesktop-sound-theme

Share on FriendFeed
Posted by marco
Tue, 05 May 2009 15:14:00 GMT

During your journey in the unstable debian land of gnome 2.24/2.26, the bug #516080 can manifest itself. You will have the the sound preferences tab all greyed out (system -> preferences -> sound, sounds tab), the system sounds will not work and anything using libcanberra will not play notification sounds, like e.g. Empathy.

It seems tha libcanberra will not play notification sounds if it has not installed the default sound theme, which is not packaged (yet? See #486559) in Debian due to licensing issues, I suppose.

Here is my little workaround:

  1. download the control files needed to build the debian package into the freedesktop dir:

      git clone git://git.debian.org/git/collab-maint/freedesktop-sound-theme.git \ freedesktop
    
  2. download and extract the libcanberra sound files:

      wget http://0pointer.de/public/sound-theme-freedesktop.tar.gz && tar xvzf \ 
      sound-theme-freedesktop.tar.gz
    
  3. build the debian package (you may need to install dpkg-dev before)

     cd freedesktop && dpkg-buildpackage
    
  4. install the debian packages you got at the end of the process:

     cd .. && dpkg -i freedesktop-sound-theme_0.2-1_amd64.deb
    

Et voila, now you should be able to enable the system sounds, and the Empathy sounds will work

It's official (2)

Share on FriendFeed
Posted by marco
Mon, 09 Mar 2009 00:10:00 GMT

I had an answer. As someone recently told me, I’m a Vulcan.

It's official

Share on FriendFeed
Posted by marco
Wed, 28 Jan 2009 00:56:00 GMT

I’m not human.

More often than not, I happen to fail the captcha when signing in web pages.

It is just me? And, if I’m not human, what am I?

Days of Autumn

Share on FriendFeed
Posted by marco
Wed, 15 Oct 2008 21:00:00 GMT

My favourite blog right now is The Boston Globe Big Picture.

Days of Autumn is awesome.

Wireless Italian law

Share on FriendFeed
Posted by marco
Wed, 08 Oct 2008 12:36:00 GMT

I copy here the last update of our page about the laws regulating wireless networks in Italy, “Wireless: la normativa italiana” (italian only, sorry). Take this as an experiment to test the compliance amongst italian blogs (and company sites!) of the original text CC License.

Lenovo Thinkpad Laptop Rocks.

Share on FriendFeed
Posted by marco
Tue, 02 Sep 2008 00:17:00 GMT

My previous laptop collapsed: in the last few days both the wireless interface and the touchpad stopped working. The disk showed a few bad blocks and the bios had some quirks, while the cdrom stopped working a few months ago. At the end, this laptop it was not worth the money I paid for it, for the overall quality was in the low/medium end, and the price was in the very high end. I don’t think I will ever buy again a Vaio laptop.

Now I’m the happy owner of a Lenovo T61; its quality is an order of magnitude better than the Vaio one. The keyboard has a very good feeling, and it’s by far the best laptop keyboard I’ve ever used. The construction is very solid, everything works with Gnu/Linux with little effort, and last but non least I’ve just paid for it now less € than I paid for the Vaio one year and a half ago. I recommend Lenovo if you want a business laptop.

Non-negative Matrix Factorization

Share on FriendFeed
Posted by marco
Thu, 03 Jul 2008 00:13:00 GMT

Here it is a small ruby script that calculate the non-negative matrix factorization of a given matrix. I tried to mantain the notation consistent with the one used in the paper of Lee and Seung

The script requires the GSL library.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112

#!/usr/bin/ruby


require("gsl")



def nmf(v, col, thresh)
   r, c = v.shape

   # w * h = v
   w = GSL::Matrix.alloc(r, col)
   h = GSL::Matrix.alloc(col, c)
   initrand(w, w.max)
   initrand(h, h.max)
   dist = thresh
   iter = 0
   while(dist >= thresh && iter < 1000)
      # multiplicative update rule
      dist, w, h = update(v, w, h)
      puts "#{iter}: #{dist}" #if (iter%10 == 0)
      iter += 1
   end   
   puts "Ended in #{iter} iteration(s) with dist=#{dist}"
   return w, h
end


def initrand(m, max)
   r, c = m.shape
   0.upto(r-1) { |i|
       0.upto(c-1) { |j|
          m[i,j] = rand(max)
       }
   }
end

def update(v, w, h)
   # choose an update rule
   # v, w, h are GSL::Matrix objects
   dist, w, h = update_eu(v, w, h)
   return dist, w, h
end

def update_eu(v, w, h)
   # multiplicative update rule
   # for minimizing euclidean distance
   # v, w, h are GSL::Matrix objects
   dist = dist_eu(v, w*h)
   wt = w.transpose
   ht =  h.transpose
   h = h.mul_elements(wt * v).div_elements(wt * w * h)
   w = w.mul_elements(v * ht).div_elements(w * h * ht)
   return dist, w, h
end


def update_kl(v, w, h)
   # multiplicative update rule
   # for minimizing divergence
   # v, w, h are GSL::Matrix objects
   dist = dist_kl(v, w*h)
   wt = w.transpose
   ht =  h.transpose
   num = v.div_elements(w * h)
   rh, ch = h.shape
   rw, cw = w.shape
   0.upto(rh-1) { |i|
       0.upto(ch-1) { |j|
          h[i, j] *= w.col(i).mul(num.col(j)).sum / w.col(i).sum
       }
   }
   0.upto(rw-1) { |i|
       0.upto(cw-1) { |j|
          w[i, j] *= h.row(j).mul(num.row(i)).sum / h.row(j).sum
       }
   }
   return dist, w, h
end



# Euclidean distance
def dist_eu(a, b)
   # a,b are GSL::Matrix objects
   dist = 0
   (a-b).collect { |d| dist += d * d}
   dist
end

# Kullback-Leibler divergence
def dist_kl(a, b)
   # a, b are GSL::Matrix objects
   dist = 0
   r, c = a.shape
   0.upto(r-1) { |i|
       0.upto(c-1) { |j|
          if (b[i,j] == 0) 
             puts "error"
             dist = 2**32
             break
          end
          dist += a[i,j] * Math::log(a[i,j]/b[i,j]) - a[i,j] + b[i,j]
       }
   }
   dist
end




Tanimoto Coefficient

Share on FriendFeed
Posted by marco
Wed, 02 Jul 2008 00:10:00 GMT

A small Ruby snippet to calculate the Tanimoto coefficient (aka extended Jaccard coefficient) of two real vectors.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28

#!/usr/bin/ruby

class Array 
   def sum
      inject( 0 ) { |sum,x| sum+x }
   end
   def sum_square
      inject( 0 ) { |sum,x| sum+x*x }
   end
   def *(other) # dot_product
      ret = []
      return nil if !other.is_a? Array || size != other.size
      self.each_with_index {|x, i| ret << x * other[i]}
      ret.sum
   end
end

def tanimoto(a, b)
   dot = (a * b)
   den = a.sum_square + b.sum_square - dot
   dot.to_f/den.to_f
end

# puts tanimoto([1,2,2],[3,3,1])



Pearson Correlation Coefficient

Share on FriendFeed
Posted by marco
Tue, 01 Jul 2008 00:13:00 GMT

A small Ruby snippet to calculate the Pearson produt-moment correlation coefficient:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37

#!/usr/bin/ruby

class Array 
   def sum
      inject( 0 ) { |sum,x| sum+x }
   end
   def sum_square
      inject( 0 ) { |sum,x| sum+x*x }
   end
   def mean
      sum / size
   end
   def *(other) # dot product
      ret = []
      return nil if !other.is_a? Array || size != other.size
      self.each_with_index {|x, i| ret << x * other[i]}
      ret.sum
   end
end


x = Array.new(100) {|i| i }
y = Array.new(100) {|i| -i*2 }


sumx = x.sum
sumy  = y.sum
num = (x.size * (x*y)) - (sumx * sumy)
den = Math.sqrt((x.size * x.sum_square) - sumx * sumx) * Math.sqrt((y.size * y.sum_square) - sumy * sumy)
r = num /den

puts "X= [#{x.join(', ')}]"
puts "Y= [#{y.join(', ')}]"
puts "r= #{r}"


ζ(s)

Share on FriendFeed
Posted by marco
Wed, 18 Jun 2008 23:56:00 GMT

Such a terse and strong abstract, and a really short paper, for an hypothesis that lasts unproved for a century. We shall see if the proof is correct.

Ruby Fiber Ring Benchmark

Share on FriendFeed
Posted by marco
Thu, 22 May 2008 17:51:00 GMT

I stole an exercise from the Erlang world: the ring benchmark

I used the exercise as a way to get acquainted with a new class in Ruby, Fiber (starting from the version 1.9). Fibers are a way to implement asymmetric coroutines in Ruby,

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85

#!/usr/bin/ruby1.9

require 'fiber'
require 'benchmark'

class Ring
   attr_reader :id
   attr_accessor :attach 

   def initialize(id)
      @id = id
      @fiber = Fiber.new do
         pass_message
      end
   end

   def |(other)
      other.attach = self if !other.nil?
      other
   end

   def resume
      @fiber.resume
    end

   def pass_message
      while message = message_in
         message_out(message)      
      end
   end

   def message_in
      @attach.resume if !@attach.nil?
   end

   def message_out(message)
      Fiber.yield(message)
   end

end

class RingStart < Ring
   attr_accessor :message
   def initialize(n, m, message)
      @m = m
      @message = message
      super(n)
   end
   
   def pass_message 
      loop { message_out(@message) }
   end

end


def create_chain_r(i, chain)
   # recursive version
   return chain if i<=0
   r = chain.nil? ? Ring.new(i) :  chain | Ring.new(i)
   create_chain(i-1, r)
end

def create_chain(n, chain)
   # loop version
   # needed to avoid stack overflow for high n
   n.downto(0) {
      chain = chain | Ring.new(n)
   }
   chain
end



n=ARGV[0].to_i
m=ARGV[1].to_i
mess=:hello
tm  = Benchmark.measure {
   ringu = RingStart.new(0, m, mess)
   chain = create_chain(n, ringu)
   m.times { ringu.message = chain.resume }
}.format("%10.6r\n").gsub!(/\(|\)/, "")

puts "#{n}, #{m}, #{tm}"
The comparison with the equivalent Erlang program is of course unfair, for one of the Erlang specialty is the spawning of cooperative processes. Anyway, here are the results:    
NMExecution Time
Ruby [ms]Erlang [ms]
10 10 1 0
10 20 2 0
10 50 4 10
10 100 10 0
10 200 18 0
10 500 42 0
10 1000 82 10
100 10 15 0
100 20 23 10
100 50 60 0
100 100 99 10
100 200 186 20
100 500 433 50
100 1000 805 90
1000 10 320 10
1000 20 423 20
1000 50 809 60
1000 100 1479 130
1000 200 2803 250
1000 500 6914 600
1000 1000 15410 1190
10000 10 16168 190
10000 20 17491 320
10000 50 21348 710
10000 100 28314 1480
10000 200 41547 2860
10000 500 81166 7200
10000 1000 147570 14190

Erlang ring benchmark

Share on FriendFeed
Posted by enrico
Thu, 22 May 2008 17:30:00 GMT

Trying to be a cool kid, I’ve recently spent some time playing with Erlang. So I’ve found in Joe Armstrong’s Programming Erlang the following exercise:

”Write a ring benchmark. Create N processes in a ring. Send a message round the ring M times so that a total of N * M messages get sent. Time how long this takes for different values of N and M.

Write a similar program in some other programming language you are familiar with. Compare the results. Write a blog, and publish the results on the Internet! ”

(Already a classic now …).

I dutifully complied: here’s my solution.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
-module(ringu).
-export([start/2]).

% ----------------------------------------------------------------------

start(N, M) -> 
    Pid_0 = self(),
    statistics(runtime),
    statistics(wall_clock),
    Pid = spawn(fun() -> ringu(N, M, self(), Pid_0) end),
    Pid ! x,
    receive 
        _ -> true
    end,
    {_, TCPU} = statistics(runtime),
    {_, Time} = statistics(wall_clock),
    io:format("~nElapsed time: ~w ms~nCPU time:     ~w ms ~n" ,
          [Time, TCPU]).

% ----------------------------------------------------------------------

ringu(1, M, Pid_1, Pid_0) -> ringu_loop(M, Pid_1), Pid_0 ! itsover;
ringu(N, M, Pid_1, Pid_0) -> 
    ringu_loop(M, spawn(fun() -> ringu(N-1, M, Pid_1, Pid_0) end)).

ringu_loop(1, Pid) -> ringu_fwd_msg(Pid);   
ringu_loop(M, Pid) -> ringu_fwd_msg(Pid), ringu_loop(M-1, Pid).

ringu_fwd_msg(Pid) ->    
    receive 
        X -> Pid ! X %, io:format(".", [])
    end.

Benchmark results, and an (uneven :) ) comparison with an equivalent script in Ruby here.

Oil price

Share on FriendFeed
Posted by marco
Thu, 22 May 2008 00:13:00 GMT

I’d like to reiterate a past meme. This time I restrict the meme to one single prediction: please tell me wether you think the crude oil price will pass the 150$ per barrell in the very near future, and, if you think that of it’s only matter of weeks, dare to say the exact day. I venture myself to bespeak the 30th of June, 2008.

Carlo, Alx, what do you think?

Caching Mephisto with lighttpd and mod_magnet

Share on FriendFeed
Posted by marco
Thu, 15 May 2008 00:15:00 GMT

This blog uses Mephisto, a rails application, and Thin as backend webserver. The frontend webserver is Lighttpd.

While Mephisto has a good caching system, producing a static html file for every visited page, the file is still served via Thin, reducing the speed and the scalability of the system and introducing an unnecessary step.

Lighttpd is very good at serving static files, thus I searched a way to serve a file from the cache directory of Mephisto, if present; I found a hack using mod_magnet.

Hereafter I wrote the relevant snippets of configuration files and how to put every piece in the right position.

Note: the same procedure should apply if you use Mongrel instead of Thin, for the relevant point is to bypass the backend rails webserver and serve all files (whether they exist) from the Mephisto cache.

First I configure Lighttpd to proxy to Thin for the rails application. I assume Thin (or Mongrel) is listening on port 3000 on two hosts.

  HTTP[''host"] == "virtualhost.example.org" {
        magnet.attract-physical-path-to = ( "/var/www/cache_mephisto.lua" ) 
        proxy.balance = "hash"
        proxy.server  = ( "" => ( ( "host" => "10.0.0.1", "port" =>  3000 ),
                                ( "host" => "10.0.0.2", "port" =>  3000 ) ) )
  }

The magnet.attract-physical-path-to = ( "/var/www/cache_mephisto.lua" ) is the magic trick. cache_mephisto.lua is a script written in lua programming language. The whole script is:

  local dir =  "/path/to/mephisto/public/cache/" .. lighty.request["Host"]

  if ( lighty.env["request.uri"] == nil or lighty.env["request.uri"] == "/") then
    file = dir .. "/index.html"
  else
    file = dir .. lighty.env["request.uri"] ..  ".html"
  end

  if lighty.stat(file) then
  lighty.header["Content-Type"] = "text/html"
  lighty.content = { { filename = file } }
  return 200
  end

 return lighty.RESTART_REQUEST

Inside the script you should adapt the variable dir to your Mephisto installation path. I need to append the virtual host name lighty.request["Host"] to the directory, for I run a multisite Mephisto installation; if you use a single site mephisto installation, probably you need to remove this append.

Now for some benchmarks:

  • serving the file via Thin:

       Requests per second:    1.40 [#/sec] (mean)
       Time per request:       716.007 [ms] (mean)
    
  • serving the file from the cache

       Requests per second:    11.46 [#/sec] (mean)
       Time per request:       87.247 [ms] (mean)
    

Almost a tenfold improvement!

Future works: serve from the cache the image files.