Ruby Pattern Matching

Posted: December 10th, 2012 | Filed under: ti

Don’t you hate when and elsif??

Everyone needs change. I code in Ruby most of the day, so I like to code in Scala just for kicks.

One of the features I miss in Ruby after coding in Scala is pattern matching.

Wouldn’t it be nice to be able to do something like that in Ruby instead of using ugly ifs and whens?

Do you like this code?

class Dummy
    # Expose the values so respond_to?(:x) and friends are true
    attr_reader :x, :y, :z

    def initialize(x, y, z)
        @x, @y, @z = x, y, z
    end

    def sum
        @x + @y + @z
    end
end

obj = Dummy.new(1, 2, 3)

# Later on... after the sun sets and the moon rises

if obj.respond_to?(:x) && obj.respond_to?(:y)
    puts "Do something awesome with dummy.x and dummy.y"
elsif obj.kind_of?(Dummy)
    puts "This is a dummy"
end

Well, maybe it’s ok, but it can get messy.

So I wrote a small Ruby gem to match objects against patterns. With it, the above example could be written as:

Matchmaker.match(obj) do
    pattern :x, :y do puts "Do something awesome with #{x} and #{y}" end
    pattern Dummy do puts "This is a dummy" end
end

Looks good, huh?

It’s even better when you are matching an enumerable against a set of conditions:

array_obj = [1, 2, 3, 4]

if array_obj.size == 1
    puts "The single element is #{array_obj[0]}"
elsif array_obj.size == 2
    first, second = array_obj.first(2)
    puts "first: #{first}, second: #{second}"
else
    first_element, others = array_obj.first, array_obj[1, array_obj.size]
    puts "head is #{first_element}, tail is #{others}"
end

This can be written with Matchmaker like this:

Matchmaker.match(array_obj) do
    enum :x do puts "The single element is #{x}" end
    enum :x, :y do puts "first: #{x}, second: #{y}" end
    enum_cons :x, :xs do puts "head is #{x}, tail is #{xs}" end
end

I think it’s neat :)

There are more examples in the GitHub repo.
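
Newer Ruby versions (3.0 and later) also ship pattern matching in the language itself, through case/in. The sketch below is not Matchmaker, just the array example from this post written with that built-in syntax. The describe method name is made up for illustration.

```ruby
# Built-in pattern matching (Ruby 3.0+), not the Matchmaker gem.
# `describe` is a hypothetical helper name used only for this example.
def describe(list)
    case list
    in [x]
        "The single element is #{x}"
    in [x, y]
        "first: #{x}, second: #{y}"
    in [x, *xs]
        "head is #{x}, tail is #{xs}"
    end
end

puts describe([1, 2, 3, 4])
```

Note that an empty array matches none of the patterns, so case/in raises an error for it unless you add an else branch.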


Simulating a Poisson process with Ruby

Posted: August 7th, 2012 | Filed under: ti

If you have ever worked with a big enough distributed system, you know that at some point you need to test how the system behaves under a large amount of traffic before deploying it to production.

One of the methods you can use to get more confident with your system is to simulate traffic on one end of the system and gather as much information as you can in order to understand what is happening.

Recently, when developing one of these systems, we needed to build a simulation that could test the system under a load of 1000 requests/minute.

The problem is that in order to do this, we shouldn’t just generate 1000 events, sleep for a minute, and then generate 1000 more events. This behavior doesn’t emulate the behavior of users in production at all and the system would probably perform in a very different way. What we need is a way to distribute those 1000 events over a minute, preferably in a way that resembles the behavior of production users.

One simple way to do this is to model user behavior as a Poisson process. This way, the events will not be evenly spaced over the minute; instead, the number of events in each second will follow a Poisson distribution.

We were developing this system in Ruby, so I will explain how you can generate events that simulate a Poisson process using Ruby and then observe the results to confirm they are close to what you would expect.

Generating events

I tried two methods to generate events modeled by a Poisson process.

Method 1: Sleeping between events

If a process can be modeled as a Poisson process, the waiting time until the next occurrence of an event follows an exponential distribution.

This means that one way to model this system is to generate an event, sleep until the next event should be generated and then start over.

We can determine the time our process needs to sleep until the next event using the following code:

def sleepFor(rateParameter)
    -Math.log(1.0 - Random.rand) / rateParameter
end

Here the rate parameter is the number of events that should happen in each unit of time. In our case, 1000 events per minute is about 16.7 events per second, which I round up to 17. The rate parameter is the λ of our Poisson process.
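
As a quick sanity check of this formula, the sketch below draws many intervals and compares their average with the expected 1/λ. The mean_interval helper and the sample size are assumptions made for illustration; they are not part of the scripts used in this post.

```ruby
# Inverse-transform sampling: if u is uniform on [0, 1), then
# -ln(1 - u) / rate is exponentially distributed with mean 1 / rate.
def sleepFor(rateParameter)
    -Math.log(1.0 - Random.rand) / rateParameter
end

# Hypothetical helper: average of `samples` simulated intervals.
def mean_interval(rate, samples)
    total = 0.0
    samples.times { total += sleepFor(rate) }
    total / samples
end

rate = 17.0
puts "expected mean interval: #{1.0 / rate}"
puts "observed mean interval: #{mean_interval(rate, 100_000)}"
```

The two numbers should agree to within a few percent; if they don’t, something is wrong with the sampling.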

Jeff Preshing wrote a very good blog post that explains this rationale in more detail.

So with our sleepFor function in place, to generate events in your system, we just need the following code:

require 'socket'

def sleepFor(rateParameter)
    -Math.log(1.0 - Random.rand) / rateParameter
end

def generate_event
    c = UDPSocket.new
    c.send('hello', 0, 'localhost', 6767)
    c.close
end

20000.times do
    generate_event

    sleep_for = sleepFor(17) # lambda = 17
    sleep(sleep_for)
end

Problem: Does sleep have enough granularity?

For a small rateParameter, this works as expected (see below to learn how you can confirm this). For a large number of events per second (a large λ), the sleep function does not have sufficient granularity. This means that sleep does not reliably sleep for the very short times we ask for, so we need a different approach.
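
One way to see this on your own machine is to measure how long sleep really takes for a few requested intervals. This is only a sketch: the helper names and the chosen intervals are assumptions, and the numbers you get depend on your operating system and Ruby version.

```ruby
# Measure how long sleep really takes for a requested interval,
# using the monotonic clock so wall-clock adjustments don't interfere.
def measured_sleep(seconds)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    sleep(seconds)
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Hypothetical helper: average actual sleep time over a few runs.
def average_sleep(seconds, runs)
    total = 0.0
    runs.times { total += measured_sleep(seconds) }
    total / runs
end

[0.05, 0.001, 0.0001].each do |requested|
    actual = average_sleep(requested, 20)
    puts "requested #{requested}s, got #{actual.round(6)}s on average"
end
```

If the measured time for the shortest intervals is far from the requested one, method 1 will not hit the target rate for a large λ.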

Method 2: Constantly check if an event should be generated

Another approach would be to not use sleep at all. Start an infinite loop and constantly check if we should generate an event in the current iteration. For example:

require 'socket'

LAMBDA = 17

def f_next_time(lambda)
    -Math.log(1 - Random.rand) / lambda
end

def generate_event
    c = UDPSocket.new
    c.send('hello', 0, 'localhost', 6767)
    c.close
end

next_time = 0
events = 0

# Busy loop: keep checking the clock until 2000 events were generated
while events < 2000
    time = Time.now.to_f

    if time >= next_time
        generate_event
        events += 1
        next_time = time + f_next_time(LAMBDA)
    end
end

This solution is more CPU intensive, since it spins in a busy loop checking whether it should generate an event, but the results should be more precise, especially for a bigger lambda value.

Caveats

Get as much information about your system as you can

Your system is probably more complex than this, so you should measure and get as much information as possible in order to ensure that you are generating enough events and that those events are properly distributed in each second.

Time required to generate an event

In my example, the time to generate an event is actually negligible, but if your events take more time to generate than the inter-arrival time of the events, this will not work! Make sure your event generation is fast enough or try to find another solution.
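
A rough way to check this is to time the event generation and compare it with the mean inter-arrival time, 1/λ. The sketch below reuses generate_event from the scripts above; the average_duration helper and the number of runs are assumptions for illustration.

```ruby
require 'socket'

LAMBDA = 17 # events per second (the value used in this post)

def generate_event
    c = UDPSocket.new
    c.send('hello', 0, 'localhost', 6767)
    c.close
end

# Hypothetical helper: average seconds spent in the given block.
def average_duration(runs)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    runs.times { yield }
    (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) / runs
end

cost = average_duration(200) { generate_event }
budget = 1.0 / LAMBDA # mean time between events

puts "average time per event: #{cost}s, mean inter-arrival time: #{budget}s"
puts "event generation is too slow for this rate" if cost >= budget
```

Keep in mind that 1/λ is only the mean: individual intervals are often much shorter, so you want the cost per event to be well below it.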

Observing events and saving data

To check if the generated events are following our Poisson process, we need to collect those events and group them by the second in which they happened, then we can analyze the results.

This can be done with the following code.

require 'json'
require 'socket'

def start_server port
  s = UDPSocket.new
  s.bind(nil, port)
  data = Hash.new(0)
  start = nil

  Signal.trap('SIGINT') do
    File.open('poisson.json','w') do |f|
      f.write(data.to_json)
    end

    puts data

    exit
  end

  while true
    msg, sender = s.recvfrom(256)
    start = Time.now.to_f if start.nil?
    # floor (not round) so every bucket covers exactly one full second
    bucket = (Time.now.to_f - start).floor

    data[bucket] += 1
  end
end

start_server 6767

With this code, when the server is interrupted with SIGINT (CTRL-C), the results are saved to a JSON file that can then be analyzed.

Analyzing results

With the results collected using the above scripts, we can easily analyze the number of events generated in each second and check if they are distributed in time as we expect them to.

I used Python with matplotlib to process and visualize the results.

λ = 17 Events/Second

Since the results are already grouped by arrival second, we can plot the number of events that happened in each second.

We can then generate enough random samples using a Poisson distribution with λ = 17 and plot this data against our results.

If the plots are similar enough, then the events were generated following a Poisson process.

There are better and more exact ways to check this, but for my problem this is good enough. You should use whatever method you feel comfortable with.
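
If you would rather stay in Ruby for a first numeric check, here is a minimal sketch. For a Poisson distribution both the mean and the variance equal λ, so the standard deviation should be close to the square root of λ. The helper names are assumptions, and the counts are simulated here so the example runs on its own instead of reading poisson.json.

```ruby
# Mean and standard deviation of a list of per-second event counts.
def summarize(counts)
    mean = counts.sum.to_f / counts.size
    variance = counts.sum { |c| (c - mean)**2 } / counts.size
    [mean, Math.sqrt(variance)]
end

# Stand-in for a real run: simulate `seconds` seconds of a Poisson
# process with the given rate and count the events in each second.
def simulated_counts(rate, seconds)
    counts = Array.new(seconds, 0)
    time = -Math.log(1.0 - Random.rand) / rate
    while time < seconds
        counts[time.floor] += 1
        time += -Math.log(1.0 - Random.rand) / rate
    end
    counts
end

# With real data (after require 'json'):
#   counts = JSON.parse(File.read('poisson.json')).values
counts = simulated_counts(17.0, 600)
mean, std = summarize(counts)

puts "mean: #{mean.round(2)} (expected 17)"
puts "std:  #{std.round(2)} (expected #{Math.sqrt(17).round(2)})"
```

With real data, remember that the last bucket is usually incomplete and that seconds without any event are missing from the file.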

With λ = 17 and using method 2, I got the following results:

As you can see, the standard deviation and mean are very close to the values you would expect (in parentheses) and the plotted data also seems to match our expectations.

Conclusion

If you are trying to simulate traffic into your system, don’t just throw traffic sequentially at it. Try to find a way to simulate production traffic. Even if you end up not using a perfect model of the expected traffic, using some kind of model is always better than just generating evenly spaced events.

If you want to model your production traffic using a Poisson process, use method 2 described above if you don’t mind the CPU-intensive algorithm, or method 1 if your λ is small enough.

Don’t forget to always check your results against your expectations.


MDUnify – Produce beautiful, unified markdown files

Posted: August 5th, 2012 | Filed under: ti

MDUnify produces a single unified and beautiful HTML document from a Markdown file.

I like using Markdown to organize my text files, but when I need to send them to someone they are usually not very beautiful. The Markdown compiler does not include any CSS when converting Markdown to HTML, so you’ll need to add it yourself.

I built a small tool that simply compiles your markdown files to HTML and includes beautiful default stylesheets to produce a nice HTML file. Currently, mdunify uses the stylesheets from the Blueprint CSS project, including the typography CSS styles, so the resulting output is a nicely formatted, typographically beautiful and easy to read document.

If your markdown document also includes images, MDUnify will download the images and include them inside your HTML file, so if you need to send the file to someone, you can send a single file instead of having to send multiple files.

You can download mdunify from my GitHub repository.


Use Backbone JS without REST

Posted: September 12th, 2011 | Filed under: ti

Backbone.persistence is a simple adapter to use Backbone.js without using the REST persistence layer.

Backbone is great, and not only for applications that use a remote persistence mechanism. For example, the demo TODO app uses HTML5 localStorage instead of the usual REST layer.

Sometimes you don’t even need localStorage, for example when you don’t need to keep data between two different sessions of the same user. And if you want to run your unit tests with Rhino on Jenkins, you can’t use things like localStorage at all.

That’s where Backbone.persistance comes in: it overrides Backbone.sync to keep all the data in memory instead of using a persistence mechanism.

You can download Backbone.persistance from my GitHub repository.


What I learned from Whitfield Diffie and Martin Hellman

Posted: January 13th, 2011 | Filed under: blog, informação

A few years ago, while attending a “Computer Systems and Networks Security” class, I was talking with a friend of mine. We said to each other: “There’s no way in hell that two principals can exchange a key without a pre-shared key of some sort”.

A few minutes later, our professor introduced us to the Diffie-Hellman method for key exchange.

As you may know, the Diffie-Hellman method allows just that: two parties can agree on a secret key without any prior knowledge of each other, even over a channel anyone can listen to. That secret key can then be used to encrypt data or whatever anyone wants to do with a secret key.
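
To make the idea concrete, here is a toy sketch of the exchange with tiny numbers. The values (p = 23, g = 5 and the two private keys) and the helper names are picked only for illustration; real implementations use very large primes and vetted libraries, so don’t use this for anything serious.

```ruby
# Toy Diffie-Hellman exchange with tiny numbers (NOT secure).
# p_mod and g are public; each party keeps its private value secret.
p_mod = 23 # public prime modulus
g = 5      # public generator

# Value each party sends in the clear: g^private_key mod p
def public_value(g, private_key, p_mod)
    g.pow(private_key, p_mod)
end

# Secret each party derives: their_public^private_key mod p
def shared_secret(their_public, private_key, p_mod)
    their_public.pow(private_key, p_mod)
end

alice_private = 6
bob_private = 15

alice_public = public_value(g, alice_private, p_mod)
bob_public = public_value(g, bob_private, p_mod)

alice_secret = shared_secret(bob_public, alice_private, p_mod)
bob_secret = shared_secret(alice_public, bob_private, p_mod)

puts "Alice computed #{alice_secret}, Bob computed #{bob_secret}"
```

Both sides end up with the same number, although only the two public values ever travelled between them.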

This made me think. There we were thinking a problem was plain impossible and a few minutes later we saw a solution that seemed way too simple for such a hard problem. But it works.

Whitfield Diffie and Martin Hellman found an “impossible” problem and turned it into a problem with a solution. They didn’t say “That’s just impossible, let’s just solve another problem”.

From that day on, when I find an “impossible” problem I remember the day I heard about the D-H method and I think that maybe, just maybe, there’s a solution hidden somewhere. It motivates me to dig deeper, try harder and never dismiss a problem as “impossible”.


AppEngineLogTz – A User Script to convert timestamps in the Google App Engine logs to the browser’s current timezone

Posted: October 4th, 2010 | Filed under: geek

I noticed that the timestamps Google App Engine uses in the application logs are all in Pacific Standard Time. That’s just not useful for anyone who doesn’t live on the Pacific coast.

So I wrote a user script you can use to show the timestamps in your browser’s timezone.

I tested my script on Firefox with Greasemonkey and on Google Chrome and it seems to work.

You can get it here: http://github.com/simao/mycode/raw/master/UserScripts/appenginelogtz.user.js

You can also fork this script on GitHub and change it. If you do, please push your changes back to the repository. :)


LXBus – Obtaining Lisbon bus waiting time information

Posted: October 1st, 2010 | Filed under: geek, informação, portugal, ti

A few months ago I started using a new service provided by Carris to check how long I have to wait for the bus.

Problem is, people at Carris have never heard of web services and REST APIs, so I had to send an e-mail and wait for an automated reply with information about waiting times for the stop I was at.

So I wrote a Google App Engine app to wrap this operation and manage the process for me. All I have to do is provide the stop code and wait for the results.

You can check out the app here: http://lxbusinfo.appspot.com

The app is NOT beautiful when seen on mobile devices, but it’s at least usable. I’ll get around to that as soon as I have some free time, but that is somewhat related to the second part of this post:

Since I developed a REST API to do this, you can also build your own app to check bus waiting times. Quite easily, in fact.

The process of obtaining bus information is of course asynchronous, since my app has to wait for an e-mail from Carris.

All your app has to do is send a POST request with the stopcode parameter to http://lxbusinfo.appspot.com/api/newBusRequest. You’ll receive a reply like this one:

newBusRequest Form data: stopcode:10503

[
    {
        "requestid": "d71175fc592e4557b76d04f836dae30ef7afcb1d",
        "statuscode": 1
    }
]

This way, you’ll receive a requestid you can use to poll the server at http://lxbusinfo.appspot.com/api/updateBusRequest using a GET request. This API call receives a requestid as parameter. Please wait at least 5 seconds between update requests. Here’s an example of a GET request on updateBusRequest:

updateBusRequest?requestid=d71175fc592e4557b76d04f836dae30ef7afcb1d

[
    {
        "message": "",
        "payload": [
            {
                "dest": "PORTAS BENFICA",
                "eta_minutes": 4,
                "pt_timestamp": "21:09",
                "last_modified": "2010-10-01T20:04:41.481256",
                "busnr": 758
            },
            {
                "dest": "PR. REAL",
                "eta_minutes": 12,
                "pt_timestamp": "21:17",
                "last_modified": "2010-10-01T20:04:41.556630",
                "busnr": 790
            }
        ],
        "statuscode": 0
    }
]

As you can see, all replies are JSON encoded so you can parse them easily with JavaScript.
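
The same flow works from any language with an HTTP client and a JSON parser. Below is a minimal sketch in Ruby using only the two endpoints and the fields shown above. The method names are made up for illustration, and since the meaning of each statuscode is not described here, the sketch does not try to interpret it. The two network methods are defined but not called, so the example runs offline on a reply shaped like the one above.

```ruby
require 'json'
require 'net/http'
require 'uri'

API_BASE = 'http://lxbusinfo.appspot.com/api'

# POST the stop code; returns the requestid from the first reply object.
def new_bus_request(stopcode)
    uri = URI("#{API_BASE}/newBusRequest")
    response = Net::HTTP.post_form(uri, 'stopcode' => stopcode.to_s)
    JSON.parse(response.body).first['requestid']
end

# GET the current state of a request; returns the parsed reply object.
# Wait at least 5 seconds between calls, as asked above.
def update_bus_request(requestid)
    uri = URI("#{API_BASE}/updateBusRequest")
    uri.query = URI.encode_www_form('requestid' => requestid)
    JSON.parse(Net::HTTP.get(uri)).first
end

# Turn one reply object into readable lines, one per bus.
def describe_buses(reply)
    reply['payload'].map do |bus|
        "#{bus['busnr']} to #{bus['dest']}: #{bus['eta_minutes']} min"
    end
end

# Offline demonstration with a trimmed copy of the example reply.
sample_json = '[{"message": "", "payload": [{"dest": "PORTAS BENFICA", ' \
              '"eta_minutes": 4, "pt_timestamp": "21:09", "busnr": 758}], ' \
              '"statuscode": 0}]'
puts describe_buses(JSON.parse(sample_json).first)
```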

There are other replies and error codes. Leave a comment if you want to implement something and I’ll send you a nicer specification.

I’ll open the source and set up a GitHub repository as soon as I can get around to shaping things up a bit.

I want to make this open source not only because I want to share the source but also because Carris will probably do something to block my app; if the source is open, anyone can install the app elsewhere and set up a new bus information server. Problem is I don’t want to share the source with Carris. :) Any thoughts on this? Let’s hear them in the comments.


Browser fingerprinting

Posted: May 31st, 2010 | Filed under: blog, informação

I learned about the Electronic Frontier Foundation study titled “Web Browsers Leave ‘Fingerprints’ Behind as You Surf the Net” through Miguel Almeida, but I have since seen several opinions about it in several different places.

Although I think the results of the study are indeed worrying, I think people are exaggerating quite a bit.

Running the test available here, I got the following results:

Your browser fingerprint appears to be unique among the 1,763 tested so far. Currently, we estimate that your browser has a fingerprint that conveys at least 10.78 bits of identifying information.
Browser Characteristic | Bits of identifying information | One in x browsers have this value | Value
User Agent | -0.84 | 0.56 | Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3
HTTP_ACCEPT Headers | -6.98 | 0.01 | text/html, */* ISO-8859-1,utf-8;q=0.7,*;q=0.7 gzip,deflate en-us,en;q=0.5
Browser Plugin Details | 10.78+ | 1763 | A bunch of them
Time Zone | -7.13 | 0.01 | -60
Screen Size and Color Depth | -5.38 | 0.02 | 1280x800x24
System Fonts | 6.26 | 76.65 | A bunch of them
Are Cookies Enabled? | -8.89 | 0 | Yes
Limited supercookie test | -8.03 | 0 | DOM localStorage: Yes, DOM sessionStorage: Yes, IE userData: No

Analyzing the results line by line, we can see that the only parameter that is truly unique is “Browser Plugin Details”, followed by “System Fonts”, where 1 in every 76 browsers tested has the same fonts as mine. In other words, my browser is unique among the 1,763 browsers tested, but only because of my list of plugins. With only 1,763 browsers tested, I don’t think the problem is as serious as it is made out to be, all the more so since it is vulnerable to the use of plausible deniability.
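
Judging from the numbers in the results, the “bits of identifying information” column appears to be the base-2 logarithm of the “one in x browsers” column. That relation is inferred from the table, not stated by the test itself. A small sketch to check it against two rows (identifying_bits is a name made up for this example):

```ruby
# Bits of identifying information for a value shared by
# one in `one_in_x` browsers: log2(one_in_x).
def identifying_bits(one_in_x)
    Math.log2(one_in_x)
end

puts identifying_bits(1763).round(2)  # Browser Plugin Details row
puts identifying_bits(76.65).round(2) # System Fonts row
```

This prints 10.78 and 6.26, the same values the test reported for those two rows.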

As you can see, there is a certain exaggeration in the opinions I have seen about the study. In practice, although it is shown that our browser is indeed far from being common, and consequently far from guaranteeing any kind of anonymity, it is also not as distinctive as it is made out to be.

Just my 2 cents.


PrinScreen RSS File – Long live Prt.sc… sort of

Posted: April 12th, 2010 | Filed under: blog, Uncategorized

Recently, one of my favorite blog aggregators died, but left us with a nice OPML file containing the RSS feeds of all the aggregator’s authors.

Yahoo! Pipes couldn’t parse the OPML file, so I wrote a Python script to do it and hosted it on my server.

The RSS file is updated every hour and contains all RSS entries from all Prt.Sc authors that are not older than seven days.

You can get the RSS file at http://simaom.com/prtsc.xml.

So prt.sc lives… well… sort of… I won’t be adding any new authors to the OPML file, so we only get RSS entries from the authors included in the last version of Prt.sc.

Let me know what you think.


opml2rss.py – An opml to rss converter

Posted: April 12th, 2010 | Filed under: blog

I just uploaded another script to my GitHub repository.

It’s a Python script that parses an OPML file and generates an RSS file with the entries from all the RSS feeds listed in the OPML file that are not older than a certain number of days.

You can get the script here: http://github.com/…/opml2rss.py

The script has a few configuration parameters that are pretty self-explanatory, so it should be easy to set up. You also need to install a few Python modules: Feedparser, OPML and PYRSS2Gen.