My favourite blog right now is The Boston Globe Big Picture.
Days of Autumn is awesome.
My favourite blog right now is The Boston Globe Big Picture.
Days of Autumn is awesome.
I copy here the last update of our page about the laws regulating wireless networks in Italy, “Wireless: la normativa italiana” (italian only, sorry). Take this as an experiment to test the compliance amongst italian blogs (and company sites!) of the original text CC License.
My previous laptop collapsed: in the last few days both the wireless interface and the touchpad stopped working. The disk showed a few bad blocks and the bios had some quirks, while the cdrom stopped working a few months ago. At the end, this laptop it was not worth the money I paid for it, for the overall quality was in the low/medium end, and the price was in the very high end. I don’t think I will ever buy again a Vaio laptop.
Now I’m the happy owner of a Lenovo T61; its quality is an order of magnitude better than the Vaio one. The keyboard has a very good feeling, and it’s by far the best laptop keyboard I’ve ever used. The construction is very solid, everything works with Gnu/Linux with little effort, and last but non least I’ve just paid for it now less € than I paid for the Vaio one year and a half ago. I recommend Lenovo if you want a business laptop.
Here it is a small ruby script that calculate the non-negative matrix factorization of a given matrix. I tried to mantain the notation consistent with the one used in the paper of Lee and Seung
The script requires the GSL library.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 |
#!/usr/bin/ruby require("gsl") def nmf(v, col, thresh) r, c = v.shape # w * h = v w = GSL::Matrix.alloc(r, col) h = GSL::Matrix.alloc(col, c) initrand(w, w.max) initrand(h, h.max) dist = thresh iter = 0 while(dist >= thresh && iter < 1000) # multiplicative update rule dist, w, h = update(v, w, h) puts "#{iter}: #{dist}" #if (iter%10 == 0) iter += 1 end puts "Ended in #{iter} iteration(s) with dist=#{dist}" return w, h end def initrand(m, max) r, c = m.shape 0.upto(r-1) { |i| 0.upto(c-1) { |j| m[i,j] = rand(max) } } end def update(v, w, h) # choose an update rule # v, w, h are GSL::Matrix objects dist, w, h = update_eu(v, w, h) return dist, w, h end def update_eu(v, w, h) # multiplicative update rule # for minimizing euclidean distance # v, w, h are GSL::Matrix objects dist = dist_eu(v, w*h) wt = w.transpose ht = h.transpose h = h.mul_elements(wt * v).div_elements(wt * w * h) w = w.mul_elements(v * ht).div_elements(w * h * ht) return dist, w, h end def update_kl(v, w, h) # multiplicative update rule # for minimizing divergence # v, w, h are GSL::Matrix objects dist = dist_kl(v, w*h) wt = w.transpose ht = h.transpose num = v.div_elements(w * h) rh, ch = h.shape rw, cw = w.shape 0.upto(rh-1) { |i| 0.upto(ch-1) { |j| h[i, j] *= w.col(i).mul(num.col(j)).sum / w.col(i).sum } } 0.upto(rw-1) { |i| 0.upto(cw-1) { |j| w[i, j] *= h.row(j).mul(num.row(i)).sum / h.row(j).sum } } return dist, w, h end # Euclidean distance def dist_eu(a, b) # a,b are GSL::Matrix objects dist = 0 (a-b).collect { |d| dist += d * d} dist end # Kullback-Leibler divergence def dist_kl(a, b) # a, b are GSL::Matrix objects dist = 0 r, c = a.shape 0.upto(r-1) { |i| 0.upto(c-1) { |j| if (b[i,j] == 0) puts "error" dist = 2**32 break end dist += a[i,j] * Math::log(a[i,j]/b[i,j]) - a[i,j] + b[i,j] } } dist end |
A small Ruby snippet to calculate the Tanimoto coefficient (aka extended Jaccard coefficient) of two real vectors.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
#!/usr/bin/ruby class Array def sum inject( 0 ) { |sum,x| sum+x } end def sum_square inject( 0 ) { |sum,x| sum+x*x } end def *(other) # dot_product ret = [] return nil if !other.is_a? Array || size != other.size self.each_with_index {|x, i| ret << x * other[i]} ret.sum end end def tanimoto(a, b) dot = (a * b) den = a.sum_square + b.sum_square - dot dot.to_f/den.to_f end # puts tanimoto([1,2,2],[3,3,1]) |
A small Ruby snippet to calculate the Pearson produt-moment correlation coefficient:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 |
#!/usr/bin/ruby class Array def sum inject( 0 ) { |sum,x| sum+x } end def sum_square inject( 0 ) { |sum,x| sum+x*x } end def mean sum / size end def *(other) # dot product ret = [] return nil if !other.is_a? Array || size != other.size self.each_with_index {|x, i| ret << x * other[i]} ret.sum end end x = Array.new(100) {|i| i } y = Array.new(100) {|i| -i*2 } sumx = x.sum sumy = y.sum num = (x.size * (x*y)) - (sumx * sumy) den = Math.sqrt((x.size * x.sum_square) - sumx * sumx) * Math.sqrt((y.size * y.sum_square) - sumy * sumy) r = num /den puts "X= [#{x.join(', ')}]" puts "Y= [#{y.join(', ')}]" puts "r= #{r}" |
Such a terse and strong abstract, and a really short paper, for an hypothesis that lasts unproved for a century. We shall see if the proof is correct.
I stole an exercise from the Erlang world: the ring benchmark
I used the exercise as a way to get acquainted with a new class in Ruby, Fiber (starting from the version 1.9). Fibers are a way to implement asymmetric coroutines in Ruby,
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 |
#!/usr/bin/ruby1.9 require 'fiber' require 'benchmark' class Ring attr_reader :id attr_accessor :attach def initialize(id) @id = id @fiber = Fiber.new do pass_message end end def |(other) other.attach = self if !other.nil? other end def resume @fiber.resume end def pass_message while message = message_in message_out(message) end end def message_in @attach.resume if !@attach.nil? end def message_out(message) Fiber.yield(message) end end class RingStart < Ring attr_accessor :message def initialize(n, m, message) @m = m @message = message super(n) end def pass_message loop { message_out(@message) } end end def create_chain_r(i, chain) # recursive version return chain if i<=0 r = chain.nil? ? Ring.new(i) : chain | Ring.new(i) create_chain(i-1, r) end def create_chain(n, chain) # loop version # needed to avoid stack overflow for high n n.downto(0) { chain = chain | Ring.new(n) } chain end n=ARGV[0].to_i m=ARGV[1].to_i mess=:hello tm = Benchmark.measure { ringu = RingStart.new(0, m, mess) chain = create_chain(n, ringu) m.times { ringu.message = chain.resume } }.format("%10.6r\n").gsub!(/\(|\)/, "") puts "#{n}, #{m}, #{tm}" |
| N | M | Execution Time | |
|---|---|---|---|
| Ruby [ms] | Erlang [ms] | ||
| 10 | 10 | 1 | 0 |
| 10 | 20 | 2 | 0 |
| 10 | 50 | 4 | 10 |
| 10 | 100 | 10 | 0 |
| 10 | 200 | 18 | 0 |
| 10 | 500 | 42 | 0 |
| 10 | 1000 | 82 | 10 |
| 100 | 10 | 15 | 0 |
| 100 | 20 | 23 | 10 |
| 100 | 50 | 60 | 0 |
| 100 | 100 | 99 | 10 |
| 100 | 200 | 186 | 20 |
| 100 | 500 | 433 | 50 |
| 100 | 1000 | 805 | 90 |
| 1000 | 10 | 320 | 10 |
| 1000 | 20 | 423 | 20 |
| 1000 | 50 | 809 | 60 |
| 1000 | 100 | 1479 | 130 |
| 1000 | 200 | 2803 | 250 |
| 1000 | 500 | 6914 | 600 |
| 1000 | 1000 | 15410 | 1190 |
| 10000 | 10 | 16168 | 190 |
| 10000 | 20 | 17491 | 320 |
| 10000 | 50 | 21348 | 710 |
| 10000 | 100 | 28314 | 1480 |
| 10000 | 200 | 41547 | 2860 |
| 10000 | 500 | 81166 | 7200 |
| 10000 | 1000 | 147570 | 14190 |
Trying to be a cool kid, I’ve recently spent some time playing with Erlang. So I’ve found in Joe Armstrong’s Programming Erlang the following exercise:
”Write a ring benchmark. Create N processes in a ring. Send a message round the ring M times so that a total of N * M messages get sent. Time how long this takes for different values of N and M.
Write a similar program in some other programming language you are familiar with. Compare the results. Write a blog, and publish the results on the Internet! ”
(Already a classic now …).
I dutifully complied: here’s my solution.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
-module(ringu). -export([start/2]). % ---------------------------------------------------------------------- start(N, M) -> Pid_0 = self(), statistics(runtime), statistics(wall_clock), Pid = spawn(fun() -> ringu(N, M, self(), Pid_0) end), Pid ! x, receive _ -> true end, {_, TCPU} = statistics(runtime), {_, Time} = statistics(wall_clock), io:format("~nElapsed time: ~w ms~nCPU time: ~w ms ~n" , [Time, TCPU]). % ---------------------------------------------------------------------- ringu(1, M, Pid_1, Pid_0) -> ringu_loop(M, Pid_1), Pid_0 ! itsover; ringu(N, M, Pid_1, Pid_0) -> ringu_loop(M, spawn(fun() -> ringu(N-1, M, Pid_1, Pid_0) end)). ringu_loop(1, Pid) -> ringu_fwd_msg(Pid); ringu_loop(M, Pid) -> ringu_fwd_msg(Pid), ringu_loop(M-1, Pid). ringu_fwd_msg(Pid) -> receive X -> Pid ! X %, io:format(".", []) end. |
Benchmark results, and an (uneven :) ) comparison with an equivalent script in Ruby here.
I’d like to reiterate a past meme. This time I restrict the meme to one single prediction: please tell me wether you think the crude oil price will pass the 150$ per barrell in the very near future, and, if you think that of it’s only matter of weeks, dare to say the exact day. I venture myself to bespeak the 30th of June, 2008.
This blog uses Mephisto, a rails application, and Thin as backend webserver. The frontend webserver is Lighttpd.
While Mephisto has a good caching system, producing a static html file for every visited page, the file is still served via Thin, reducing the speed and the scalability of the system and introducing an unnecessary step.
Lighttpd is very good at serving static files, thus I searched a way to serve a file from the cache directory of Mephisto, if present; I found a hack using mod_magnet.
Hereafter I wrote the relevant snippets of configuration files and how to put every piece in the right position.
Note: the same procedure should apply if you use Mongrel instead of Thin, for the relevant point is to bypass the backend rails webserver and serve all files (whether they exist) from the Mephisto cache.
First I configure Lighttpd to proxy to Thin for the rails application. I assume Thin (or Mongrel) is listening on port 3000 on two hosts.
HTTP[''host"] == "virtualhost.example.org" {
magnet.attract-physical-path-to = ( "/var/www/cache_mephisto.lua" )
proxy.balance = "hash"
proxy.server = ( "" => ( ( "host" => "10.0.0.1", "port" => 3000 ),
( "host" => "10.0.0.2", "port" => 3000 ) ) )
}
The magnet.attract-physical-path-to = ( "/var/www/cache_mephisto.lua" ) is the magic trick.
cache_mephisto.lua is a script written in lua programming language.
The whole script is:
local dir = "/path/to/mephisto/public/cache/" .. lighty.request["Host"]
if ( lighty.env["request.uri"] == nil or lighty.env["request.uri"] == "/") then
file = dir .. "/index.html"
else
file = dir .. lighty.env["request.uri"] .. ".html"
end
if lighty.stat(file) then
lighty.header["Content-Type"] = "text/html"
lighty.content = { { filename = file } }
return 200
end
return lighty.RESTART_REQUEST
Inside the script you should adapt the variable dir to your Mephisto installation path. I need to append the virtual host name lighty.request["Host"] to the directory, for I run a multisite Mephisto installation; if you use a single site mephisto installation, probably you need to remove this append.
Now for some benchmarks:
serving the file via Thin:
Requests per second: 1.40 [#/sec] (mean)
Time per request: 716.007 [ms] (mean)
serving the file from the cache
Requests per second: 11.46 [#/sec] (mean)
Time per request: 87.247 [ms] (mean)
Almost a tenfold improvement!
Future works: serve from the cache the image files.
A thoughtful article: why Us economy is collapsing
A really interesting movie: money as debt
I wrote a small tool, postgresqlgraph, to monitor and plot the statistics of a postgresql database, using rrdtool.
The tool is able to monitor every database created inside a postgresql installation.
The parameters plotted for each database in the graph are:
For each parameter the graph shows the maximum and average values, and plots the derivative (rate change).
The software is published under GPLv3 .
A demo is available here.
Please feel free to submit me bugs, critics, improvements, or tell me if you find the tool of any worthwhileness.
I spent a frustrating week of problem solving (or at least trying to…) of obscure integration issues among different programs, like trying to get zabbix to talk with a jabber server (the solution was found in IPV6), reading unuseful logs to debug problems (what the hell do you mean with “connection failed”? To what?) , recompiling software with arcane options, rebooting servers with the swap space full (apache, what are you doing?), or trying to stop a server that continues to reboot itself at boot (Xen, please tell me why…), or fighting with buggy PHP frontend, and, last but not least, investigate why a monitoring software just stopped working.
So I come to wonder if it exists a software that “just works”. If you ever happen to read this rambling and know such a software, please, please tell me. In the meanwhile I coined a law:
Dela’s Law of software: there is no software that “just works”
Corollary to the Dela’s Law of software: if a software seems to “just work”, after some times it will stop to work for an obscure, unknown and very difficult to find reason.
So, it seems that the simplest universal turing machine has been demonstrated.
The simplest universal turing machine (T. M.) is, well, the simplest: 2 states and 3 symbols (or colors, as they are called by Stephen Wolfram), since a 2-states 2-symbols T.M. can not be universal.
This particular machine has however a subtle issue, for it requires an infinite memory (or ”tape” in the original definition of Turing), not satisfying perfectly the definition of universality given by Davis and Minsky.
The answer to the issue is that the pattern of bits in the memory can be set up without an universal T.M., so the universality is independent from the encoding of the pattern. I wonder if a (2,3) universal T.M. can be built with finite memory.