Posted: December 10th, 2012 | Author: simaom | Filed under: ti | Tags: fp, functional-programming, it, programming, ruby | No Comments »
Don’t you hate when and elsif??
Everyone needs a change. I code in Ruby most of the day, so I like to
code in Scala just for kicks.
One of the features I miss in Ruby after coding in Scala is
pattern matching.
Wouldn’t it be nice to be able to do something like that in Ruby
instead of using ugly ifs and whens?
Do you like this code?
class Dummy
  # Readers so that the object actually responds to :x, :y and :z
  attr_reader :x, :y, :z
  def initialize(x, y, z)
    @x, @y, @z = x, y, z
  end
  def sum
    @x + @y + @z
  end
end
obj = Dummy.new(1, 2, 3)
# Later on... after the sun sets and the moon rises
if obj.respond_to?(:x) && obj.respond_to?(:y)
  puts "Do something awesome with dummy.x and dummy.y"
elsif obj.kind_of?(Dummy)
  puts "This is a dummy"
end
Well, maybe it’s ok, but it can get messy.
So I wrote a small ruby gem to be able to match objects against
patterns. So the above example could be written as:
Matchmaker.match(obj) do
  pattern :x, :y do puts "Do something awesome with #{x} and #{y}" end
  pattern Dummy do puts "This is a dummy" end
end
Looks good, huh?
It’s even better when you are trying to match enums over a set of
conditions:
array_obj = [1, 2, 3, 4]
if array_obj.size == 1
  puts "The single element is #{array_obj[0]}"
elsif array_obj.size == 2
  first, second = array_obj.first(2)
  puts "first: #{first}, second: #{second}"
else
  first_element, others = array_obj.first, array_obj[1, array_obj.size]
  puts "head is #{first_element}, tail is #{others}"
end
This can be written with Matchmaker like this:
Matchmaker.match(array_obj) do
  enum :x do puts "The single element is #{x}" end
  enum :x, :y do puts "first: #{x}, second: #{y}" end
  enum_cons :x, :xs do puts "head is #{x}, tail is #{xs}" end
end
I think it’s neat :)
There are more examples at the GitHub repo.
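For comparison, Ruby versions from 2.7 onward ship a built-in `case`/`in` pattern matching syntax. The sketch below redoes the array example with it. It is independent of Matchmaker, and `describe` is just a name made up for this illustration.

```ruby
# Describe an array using Ruby's built-in pattern matching (Ruby 2.7+).
# An empty array matches none of the patterns and raises NoMatchingPatternError.
def describe(list)
  case list
  in [x]
    "The single element is #{x}"
  in [x, y]
    "first: #{x}, second: #{y}"
  in [x, *xs]
    # x binds the head, xs the remaining elements
    "head is #{x}, tail is #{xs}"
  end
end

puts describe([1, 2, 3, 4])
```

The array patterns are tried top to bottom, so the splat pattern only applies to arrays with three or more elements.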
Posted: August 7th, 2012 | Author: simaom | Filed under: ti | Tags: distributed systems, events, poisson, python, ruby, statistics, tutorial | No Comments »
If you ever worked with a big enough distributed system, you know that
at some point you need to test how the system works with a large
amount of traffic before deploying it to production.
One of the methods you can use to get more confident with your system
is to simulate traffic on one end of the system and gather as much
information as you can in order to understand what is happening.
Recently, when developing one of these systems, we needed to build a
simulation that could test the system under a load of 1000
requests/minute.
The problem is that in order to do this, we shouldn’t just generate
1000 events, sleep for a minute, and then generate 1000 more
events. This doesn’t emulate the behavior of users in
production at all, and the system would probably perform in a very
different way. What we need is a way to distribute
those 1000 events over a minute, preferably in a way that resembles
the behavior of production users.
One simple way to do this is to model user behavior as a Poisson
process. This way, the events will not be evenly spaced over a
minute; instead, the number of events in each second will follow a
Poisson distribution.
We were developing this system in Ruby, so I will explain how you can
generate events that simulate a Poisson process using Ruby and then
check the observed results to confirm they are close to what you
would expect.
Generating events
I tried two methods to generate events modeled by a Poisson process.
Method 1: Sleeping between events
If a process can be modeled using a Poisson Process, the
probability distribution of the waiting time until the next occurrence
of an event is an exponential distribution 1.
This means that one way to model this system is to generate an event,
sleep until the next event should be generated and then start over.
We can determine the time our process needs to sleep until the next
event using the following code:
def sleepFor(rateParameter)
  -Math.log(1.0 - Random.rand) / rateParameter
end
Here the rate parameter is the number of events that should happen in
each unit of time. In our case, 1000 events per minute is about
17 events per second (1000 / 60 ≈ 16.7). The rate parameter is then the λ of our
Poisson process.
Jeff Preshing wrote a very good blog post that explains this
rationale in more detail.
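As a quick sanity check of the formula, the mean of many generated intervals should be close to 1/λ. The sketch below assumes a seeded `Random` instance so the run is repeatable. `sleep_for` is a snake_case variant of the function above that takes the generator as a parameter, a change made only for this illustration.

```ruby
# Exponentially distributed waiting time (in seconds) for a given rate,
# obtained by inverting the exponential CDF on a uniform random number.
def sleep_for(rate_parameter, rng = Random)
  -Math.log(1.0 - rng.rand) / rate_parameter
end

rng = Random.new(42)   # fixed seed, only to make the run repeatable
rate = 1000 / 60.0     # 1000 events per minute, in events per second

samples = Array.new(100_000) { sleep_for(rate, rng) }
mean = samples.sum / samples.size

puts "expected mean interval: #{(1 / rate).round(4)} s"
puts "observed mean interval: #{mean.round(4)} s"
```

The two printed values should agree to about three decimal places. A large gap would point to a mistake in the rate or its units.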
So with our sleepFor function in place, to generate events in your
system, we just need the following code:
require 'socket'
def sleepFor(rateParameter)
  -Math.log(1.0 - Random.rand) / rateParameter
end
def generate_event
  c = UDPSocket.new
  c.send('hello', 0, 'localhost', 6767)
  c.close
end
20000.times do
  generate_event
  sleep_for = sleepFor(17) # lambda = 17
  sleep(sleep_for)
end
Problem: Does sleep have enough granularity?
For a small rateParameter, this works as expected (see below to
learn how you can confirm this). For a large number of events per
second (a big λ), the sleep function does not have sufficient
granularity: the requested intervals become too short for sleep to honor accurately, so
we need to use a different approach.
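One way to see the granularity problem on your own machine is to compare the requested sleep time with the time that actually passes. This is a minimal sketch. `measure_sleep` is a name made up for this example, and the numbers it prints depend on the operating system and load.

```ruby
# Average real duration (in seconds) of sleep(requested), over several runs,
# measured with the monotonic clock.
def measure_sleep(requested, runs = 50)
  total = 0.0
  runs.times do
    t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    sleep(requested)
    total += Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
  end
  total / runs
end

# Intervals typical for a small, a medium and a very large lambda.
[0.05, 0.001, 0.0001].each do |requested|
  actual = measure_sleep(requested)
  puts format('requested %.4f s, measured %.4f s', requested, actual)
end
```

If the measured value is far from the requested one for the short intervals, the sleep-based method will distort the event rate at that λ.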
Method 2: Constantly check if an event should be generated
Another approach would be to not use sleep at all. Start an infinite
loop and constantly check if we should generate an event in the
current iteration. For example:
require 'socket'
LAMBDA = 17 # events per second
# Time in seconds until the next event (exponentially distributed)
def f_next_time(lambda)
  -Math.log(1 - Random.rand) / lambda
end
def generate_event
  c = UDPSocket.new
  c.send('hello', 0, 'localhost', 6767)
  c.close
end
next_time = 0
# Busy loop: runs until the script is interrupted (CTRL-C)
loop do
  time = Time.now.to_f
  if time >= next_time
    generate_event
    next_time = time + f_next_time(LAMBDA)
  end
end
This solution is more CPU intensive, since it’s executing an infinite
loop checking if it should generate an event, but the results should
be more precise, especially for a bigger lambda value.
Caveats
Get as much information about your system as you can
Your system is probably more complex than this, so you should measure
and get as much information as possible in order to ensure that you
are generating enough events and that those events are properly
distributed in each second.
Time required to generate an event
In my example, the time to generate an event is actually negligible,
but if your events take more time to generate than the inter-arrival
time of the events, this will not work! Make sure your event generation is
fast enough or try to find another solution.
Observing events and saving data
To check if the generated events are following our Poisson process, we
need to collect those events and group them by the second in which
they happened, then we can analyze the results.
This can be done with the following code.
require 'json'
require 'socket'
def start_server port
  s = UDPSocket.new
  s.bind(nil, port)
  data = Hash.new(0)
  start = nil
  Signal.trap('SIGINT') do
    File.open('poisson.json','w') do |f|
      f.write(data.to_json)
    end
    puts data
    exit
  end
  while true
    msg, sender = s.recvfrom(256)
    start = Time.now.to_f if start.nil?
    bucket = (Time.now.to_f - start).round.to_i
    data[bucket] += 1
  end
end
start_server 6767
Using this code, when the server is interrupted with SIGINT
(CTRL-C), the results will be saved to a JSON file that can then be
analyzed.
Analyzing results
With the results collected using the above scripts, we can easily analyze the number of events
generated in each second and check if they are distributed in time as we expect them to.
I used Python with matplotlib to process and visualize the results.
λ = 17 Events/Second
Since the results are already grouped by arrival second, we can plot
the number of events that happened in each second.
We can then generate enough random samples using a Poisson
distribution with λ = 17 and plot this data against our
results.
If the plots are similar enough, then the events were generated
following a Poisson process.
There are better and more exact ways to check this, but for my problem
this is good enough. You should use whatever method you feel
comfortable with.
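A simple numeric version of this comparison uses the fact that a Poisson distribution with parameter λ has mean λ and standard deviation √λ. The sketch below stays in Ruby rather than Python and produces no plots. `simulated_counts` and `count_stats` are names made up for this example. The counts are simulated in memory so the script runs on its own, and the commented line shows how the counts could be read from the JSON file instead.

```ruby
# Per-second event counts of a simulated Poisson process (no sockets involved).
def simulated_counts(rate, seconds, rng)
  counts = Array.new(seconds, 0)
  t = -Math.log(1.0 - rng.rand) / rate
  while t < seconds
    counts[t.floor] += 1                    # bucket the event by its second
    t += -Math.log(1.0 - rng.rand) / rate   # advance to the next arrival
  end
  counts
end

# Mean and (population) standard deviation of a list of counts.
def count_stats(counts)
  mean = counts.sum.to_f / counts.size
  variance = counts.sum { |c| (c - mean)**2 } / counts.size
  [mean, Math.sqrt(variance)]
end

# With real data: counts = JSON.parse(File.read('poisson.json')).values
counts = simulated_counts(17, 2000, Random.new(1))
mean, std = count_stats(counts)
puts format('mean %.2f (expected %.2f)', mean, 17)
puts format('std  %.2f (expected %.2f)', std, Math.sqrt(17))
```

Seconds in which no event arrived never get a key in the saved hash, so for a small λ the missing buckets would have to be filled with zeros before computing the statistics.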
With λ = 17 and using method 2, I got the following results:

As you can see, the standard deviation and mean are very close to the
values you would expect (in parentheses) and the plotted data also seems
to match our expectations.
Conclusion
If you are trying to simulate traffic into your system, don’t just
throw traffic sequentially into it. Try to find a way to simulate
production traffic. Even if you end up not using a perfect model of
expected traffic, using some kind of model is always better than
just generating evenly spaced events.
If you want to try to model your production traffic using a Poisson
process, use method 2 described above if you don’t mind the CPU
intensive algorithm, or use method 1 if your λ is small enough.
Don’t forget to always check your results against your expectations.
Posted: August 5th, 2012 | Author: simaom | Filed under: ti | Tags: css, markdown, python, tools | No Comments »
MDUnify produces a single unified and beautiful HTML document from a Markdown
file.
I like using Markdown to organize my text files, but when I need to send them to someone they are usually not very beautiful. The Markdown compiler does not include any CSS when converting Markdown to HTML, so you’ll need to add it yourself.
I built a small tool that simply compiles your markdown files to HTML and includes beautiful default stylesheets to produce a nice HTML file. Currently, mdunify uses the stylesheets from the Blueprint CSS project, including the typography CSS styles, so the resulting output is a nicely formatted, typographically beautiful and easy to read document.
If your markdown document also includes images, MDUnify will download the images and include them inside your HTML file, so if you need to send the file to someone, you can send a single file instead of having to send multiple files.
You can download mdunify at my github repository.
Posted: September 12th, 2011 | Author: simaom | Filed under: ti | Tags: backbone, javascript, js, persistence, rest | No Comments »
Backbone.persistence is a simple adapter to use Backbone.js without using the REST persistence layer.
Backbone is great, and not only for applications that use a remote persistence mechanism. For example, the demo TODO app uses HTML5 localStorage instead of the usual REST layer.
Sometimes you don’t even need localStorage, for example when you don’t need to save data between two different sessions for the same user. And if you want to run your unit tests with Rhino on Jenkins, you can’t use things like localStorage.
That’s where Backbone.persistence comes in: it overrides Backbone.sync to persist all the data in memory instead of using a persistence mechanism.
You can download Backbone.persistence at my GitHub repository.
Posted: October 1st, 2010 | Author: simaom | Filed under: geek, informação, portugal, ti | 4 Comments »
A few months ago I started using a new service provided by Carris to check how long I have to wait for the bus.
Problem is, people at Carris have never heard of web services and REST APIs, so I had to send an e-mail and wait for an automated reply with information about waiting times for the stop I was at.
So I wrote a Google App Engine app to wrap this operation and manage the process for me. All I have to do is provide the stop code and wait for the results.
You can check out the app here: http://lxbusinfo.appspot.com
The app is NOT beautiful when seen on mobile devices, but it’s at least usable. I’ll get around to that as soon as I have some free time, but that is somewhat related to the second part of this post:
Since I developed a REST API to do this, you can also build your own app to check bus waiting times. Quite easily, in fact.
The process of obtaining bus information is of course asynchronous, since my app has to wait for an e-mail from Carris.
All your app has to do is send a POST request with the stopcode parameter to http://lxbusinfo.appspot.com/api/newBusRequest and you’ll receive a reply like this one:
newBusRequest
Form data: stopcode:10503
[
  {
    "requestid": "d71175fc592e4557b76d04f836dae30ef7afcb1d",
    "statuscode": 1
  }
]
This way, you’ll receive a requestid you can use to poll the server at http://lxbusinfo.appspot.com/api/updateBusRequest using a GET request. This API call receives a requestid as parameter. Please wait at least 5 seconds between update requests. Here’s an example of a GET request on updateBusRequest:
updateBusRequest?requestid=d71175fc592e4557b76d04f836dae30ef7afcb1d
[
  {
    "message": "",
    "payload": [
      {
        "dest": "PORTAS BENFICA",
        "eta_minutes": 4,
        "pt_timestamp": "21:09",
        "last_modified": "2010-10-01T20:04:41.481256",
        "busnr": 758
      },
      {
        "dest": "PR. REAL",
        "eta_minutes": 12,
        "pt_timestamp": "21:17",
        "last_modified": "2010-10-01T20:04:41.556630",
        "busnr": 790
      }
    ],
    "statuscode": 0
  }
]
As you can see, all replies are JSON-encoded, so you can parse them easily with JavaScript.
There are other replies and error codes; leave a comment if you want to implement something and I’ll send you a nicer specification.
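The request-then-poll flow described above could be sketched in Ruby roughly as follows. The two URLs and the `stopcode` and `requestid` parameters come from the examples above. The function names are made up for this illustration, and reading `statuscode` 0 as "results ready" and anything else as "not ready yet" is an assumption based only on the two sample replies.

```ruby
require 'json'
require 'net/http'
require 'uri'

API_BASE = 'http://lxbusinfo.appspot.com/api'

# Extracts the request id from a newBusRequest reply body.
def parse_request_id(body)
  JSON.parse(body).first.fetch('requestid')
end

# Returns the list of arrivals, or nil while the reply is not ready.
# Assumption: statuscode 0 means the payload is available.
def parse_arrivals(body)
  reply = JSON.parse(body).first
  return nil unless reply['statuscode'] == 0
  reply.fetch('payload', [])
end

# Asks for a stop and polls until results arrive or max_polls is reached.
def bus_arrivals(stopcode, max_polls: 12, wait: 5)
  response = Net::HTTP.post_form(URI("#{API_BASE}/newBusRequest"),
                                 'stopcode' => stopcode.to_s)
  uri = URI("#{API_BASE}/updateBusRequest")
  uri.query = URI.encode_www_form(requestid: parse_request_id(response.body))
  max_polls.times do
    sleep(wait) # wait at least 5 seconds between update requests
    arrivals = parse_arrivals(Net::HTTP.get(uri))
    return arrivals if arrivals
  end
  nil
end

# Usage (performs real HTTP requests):
#   bus_arrivals(10503)&.each { |bus| puts "#{bus['busnr']}: #{bus['eta_minutes']} min" }
```

Keeping the parsing in separate functions makes it possible to test it against the sample replies without touching the network.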
I’ll open the source and set up a GitHub repository as soon as I can get around to shaping things up a bit.
I want to make this open source not only because I want to share the source, but also because Carris will probably do something to block my app. If the source is open, anyone can install the app elsewhere and set up a new bus information server. Problem is I don’t want to share the source with Carris. :) Any thoughts on this? Let’s hear it in the comments.
Posted: February 15th, 2010 | Author: simaom | Filed under: ti | Tags: git, github, python, scripts, ti | No Comments »
I just set up a github repository to hold my code.
Here’s the link http://github.com/simao/mycode
Currently, the repository contains only the code of my latest Python script, rssTorrents.py.
Posted: January 25th, 2010 | Author: simaom | Filed under: geek, ti | Tags: download, feeds, python, rss, torrentz, transmission | 5 Comments »
I was looking for a way to parse an RSS feed I built using Yahoo Pipes and add new torrents to Transmission to download them automatically.
I couldn’t find anything useful, so I wrote a Python script to do just that.
If you want to use it, you’ll need to configure the first lines of this file to suit your needs.
The script is pretty self-explanatory.
You can always find the latest version of the script at my GitHub repo.
Disclaimer:
I use this script to download legal torrents ;)
Posted: June 25th, 2009 | Author: simaom | Filed under: apple, geek, ti | Tags: apple, backup, jungle disk, leopard, network, tcp | 2 Comments »
I recently signed up for an account at Jungle Disk, http://www.jungledisk.com.
I’m paranoid about backups. I use Time Machine to do a full weekly backup and Jungle Disk as an off-site backup solution. It seemed the cheapest option since you only pay for what you upload, and although I have a full 160 GB hard drive, my sensitive files only total about 10 GB. At 0.18$ per GB per month, that’s 1.8$/month + bandwidth.
Jungle Disk uploads files to an Amazon S3 disk, in my case located in Europe. I chose to pay 0.03$ per GB per month + bandwidth extra for that location because I thought latency would influence the speed of my backups; that’s why I went with Amazon over Rackspace. I’ll probably migrate anyway when Jungle Disk offers a migration tool for this.
I have access to an internet connection with a symmetric 100 Mbit link, so I was surprised when I noticed Jungle Disk was only using about 40 KB/s. After thinking about it, it actually makes sense.
The backup is slow for several reasons.
Jungle Disk Uploads Individual files, not a big compressed file containing all files to be backed up.
In practice, this means JD does not have a continuous stream of bytes to upload like with a big file. Instead, it has to stop sending information while preparing to send the next file (including encrypting it, see next item). Besides, it needs to set up the S3 file system to receive the new file. During this time, the TCP connection is almost stopped. When JD starts uploading the next file, the TCP connection has already lost all its speed. That’s why the speed is only high when JD is uploading big files: the TCP connection has enough time to adjust and recognize the speed of the link.
Jungle Disk Encrypts the files using AES before sending it over the internet through a SSL connection. (UPDATE: This is wrong, see comments)
This means JD has to stop sending files until it finishes encrypting an entire file. This could be solved if JD encrypted the next file while it sends the previous one, which would reduce the time the TCP connection is stopped.
Uploading only differences is not really an improvement.
While it’s true that uploading only new files or changed files is a big improvement, uploading only the differences within the files themselves is not that big of an improvement. When you think about it, what files have you edited lately with small differences? Text files? So you only send 10 KB instead of 50 KB? You gain 40 KB? That’s nothing at today’s bandwidth speeds. When you edit a big file, like a big image, you most likely edited a big part of the file and you still have to upload most bytes of the file.
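The effect of the per-file pause described in the first item can be put into rough numbers. The sketch below assumes a fixed one-second setup cost per file, which is a made-up figure and not a measurement, and it ignores the TCP ramp-up entirely. `effective_kb_per_s` is a name invented for this example.

```ruby
# Effective upload speed in KB/s when every file pays a fixed setup cost
# (overhead_s seconds) before its bytes go out at full link speed.
def effective_kb_per_s(file_kb, link_kb_per_s, overhead_s)
  file_kb / (overhead_s + file_kb.to_f / link_kb_per_s)
end

LINK_KB_PER_S = 100_000 / 8.0 # a 100 Mbit/s link is about 12,500 KB/s

# A small document, a photo-sized file and a very large file.
[50, 5_000, 500_000].each do |size_kb|
  speed = effective_kb_per_s(size_kb, LINK_KB_PER_S, 1.0)
  puts format('%7d KB file: %8.1f KB/s', size_kb, speed)
end
```

Under these assumptions a 50 KB file goes out at only about 50 KB/s, the same order of magnitude as the 40 KB/s I saw, while only very large files get close to the link speed.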
Jungle Disk is a really nice service, and I think it’s the best you can get. There probably isn’t an off-site backup solution that isn’t slow. Besides, it’s only slow the first time, when you have to upload 10 GB at once. After that you only have to upload new or changed files, which is about 1 GB per backup in my case.
If you don’t have an off-site backup solution yet, Jungle Disk is the way to go. And you DO need an off-site backup solution, right?