Posted: December 10th, 2012 | Author: simaom | Filed under: ti | Tags: fp, functional-programming, it, programming, ruby | No Comments »
Don’t you hate when and elsif??
Everyone needs change. I code in Ruby most of the day, so I like to
code in Scala just for kicks.
One of the features I miss in Ruby after coding in Scala is
pattern matching.
Wouldn’t it be nice to be able to do something like that in Ruby
instead of using ugly ifs and whens?
Do you like this code?
class Dummy
# Readers added so the respond_to? checks below can succeed
attr_reader :x, :y, :z
def initialize(x, y, z)
@x, @y, @z = x, y, z
end
def sum
@x + @y + @z
end
end
obj = Dummy.new(1, 2, 3)
# Later on... after the sun sets and the moon rises
if obj.respond_to?(:x) && obj.respond_to?(:y)
puts "Do something awesome with dummy.x and dummy.y"
elsif obj.kind_of?(Dummy)
puts "This is a dummy"
end
Well, maybe it’s ok, but it can get messy.
So I wrote a small ruby gem to be able to match objects against
patterns. So the above example could be written as:
Matchmaker.match(obj) do
pattern :x, :y do puts "Do something awesome with #{x} and #{y}" end
pattern Dummy do puts "This is a dummy" end
end
Looks good, huh?
It’s even better when you are matching enumerables against a set of
conditions:
array_obj = [1, 2, 3, 4]
if array_obj.size == 1
puts "The single element is #{array_obj[0]}"
elsif array_obj.size == 2
first, second = array_obj.first(2)
puts "first: #{first}, second: #{second}"
else
first_element, others = array_obj.first, array_obj[1, array_obj.size]
puts "head is #{first_element}, tail is #{others}"
end
This can be written with Matchmaker like this:
Matchmaker.match(array_obj) do
enum :x do puts "The single element is #{x}" end
enum :x, :y do puts "first: #{x}, second: #{y}" end
enum_cons :x, :xs do puts "head is #{x}, tail is #{xs}" end
end
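For comparison only: Ruby 2.7 and later ship built-in pattern matching through case/in, which covers the array example without a gem. This sketch is not Matchmaker's API; `describe` is a hypothetical helper name chosen for illustration, and it returns strings instead of printing so the result is easy to check.

```ruby
# Native pattern matching (Ruby 2.7+), shown for comparison with Matchmaker.
# `describe` is a hypothetical helper, not part of any gem.
def describe(list)
  case list
  in []
    "empty"
  in [x]
    "The single element is #{x}"
  in [x, y]
    "first: #{x}, second: #{y}"
  in [x, *xs]
    "head is #{x}, tail is #{xs}"
  end
end

puts describe([1, 2, 3, 4])
```

The patterns are tried top to bottom, so the more specific one-element and two-element cases have to come before the head/tail case.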
I think it’s neat :)
There are more examples at the GitHub repo.
Posted: August 7th, 2012 | Author: simaom | Filed under: ti | Tags: distributed systems, events, poisson, python, ruby, statistics, tutorial | No Comments »
If you have ever worked with a big enough distributed system, you know that
at some point you need to test how the system works with a large
amount of traffic before deploying it to production.
One of the methods you can use to get more confident with your system
is to simulate traffic on one end of the system and gather as much
information as you can in order to understand what is happening.
Recently, when developing one of these systems, we needed to build a
simulation that could test the system under a load of 1000
requests/minute.
The problem is that, in order to do this, we shouldn’t just generate
1000 events, sleep for a minute, and then generate 1000 more
events. This doesn’t emulate the behavior of users in
production at all, and the system would probably perform in a very
different way. What we need is a way to distribute
those 1000 events over a minute, preferably in a way that resembles
the behavior of production users.
One simple way to do this is to model the users’ behavior as a Poisson
process. This way, the events will not be evenly spaced over a
minute; instead, the number of events in each interval will follow a
Poisson distribution.
We were developing this system in Ruby, so I will explain how you can
generate events that simulate a Poisson process using Ruby and then
check the observed results to confirm they are close to what you
would expect.
Generating events
I tried two methods to generate events modeled by a Poisson process.
Method 1: Sleeping between events
If a process can be modeled using a Poisson process, the
probability distribution of the waiting time until the next occurrence
of an event is an exponential distribution.
This means that one way to model this system is to generate an event,
sleep until the next event should be generated and then start over.
We can determine the time our process needs to sleep until the next
event using the following code:
def sleepFor(rateParameter)
-Math.log(1.0 - Random.rand) / rateParameter
end
Here the rate parameter is the number of events that should happen in
each unit of time. In our case that is 1000 events per minute, or about
17 events per second. The rate parameter is the λ of our
Poisson process.
Jeff Preshing wrote a very good blog post that explains this
rationale in more detail.
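As a short sketch of where the formula in sleepFor comes from: it is an instance of inverse transform sampling. The symbols here are my own notation, with U standing for the uniform value returned by Random.rand and T for the waiting time until the next event.

```latex
\begin{align*}
F(t) &= P(T \le t) = 1 - e^{-\lambda t}, \qquad t \ge 0 \\
U &= 1 - e^{-\lambda T}, \qquad U \sim \mathrm{Uniform}[0, 1) \\
T &= -\frac{\ln(1 - U)}{\lambda}
\end{align*}
```

The mean waiting time is 1/λ, so with λ = 17 events per second the process sleeps for about 0.059 seconds on average.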
So with our sleepFor function in place, to generate events in your
system, we just need the following code:
require 'socket'
def sleepFor(rateParameter)
-Math.log(1.0 - Random.rand) / rateParameter
end
def generate_event
c = UDPSocket.new
c.send('hello', 0, 'localhost', 6767)
c.close
end
20000.times do
generate_event
sleep_for = sleepFor(17) # lambda = 17
sleep(sleep_for)
end
Problem: Does sleep have enough granularity?
For a small rateParameter, this works as expected (see below to
learn how you can confirm this). For a large number of events per
second (a big λ), the sleep function does not have sufficient
granularity: the intervals become too short for sleep to honor them
accurately, so we need to use a different approach.
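One way to see the granularity problem on your own machine is to compare the requested sleep time with the time that actually elapsed. This is a minimal sketch, not part of the original scripts; `average_sleep` is a hypothetical helper name, and the numbers it prints depend on the operating system and load.

```ruby
# Measure how long sleep(requested) really takes, averaged over `runs` calls.
# `average_sleep` is a hypothetical helper used only for this illustration.
def average_sleep(requested, runs = 20)
  total = 0.0
  runs.times do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    sleep(requested)
    total += Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
  total / runs
end

[0.05, 0.005, 0.0005].each do |requested|
  actual = average_sleep(requested)
  puts format("requested %.4fs, measured %.4fs", requested, actual)
end
```

If the measured value drifts away from the requested one as the interval shrinks, sleep is no longer precise enough for that λ.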
Method 2: Constantly check if an event should be generated
Another approach would be to not use sleep at all. Start an infinite
loop and constantly check if we should generate an event in the
current iteration. For example:
require 'socket'
LAMBDA = 17
def f_next_time(lambda)
-Math.log(1 - Random.rand) / lambda
end
def generate_event
c = UDPSocket.new
c.send('hello', 0, 'localhost', 6767)
c.close
end
next_time = 0
# Busy-wait forever (stop with CTRL-C); fire an event whenever one is due
loop do
time = Time.now.to_f
if time >= next_time
generate_event
next_time = time + f_next_time(LAMBDA)
end
end
This solution is more CPU intensive, since it’s executing an infinite
loop checking if it should generate an event, but the results should
be more precise, especially for a bigger lambda value.
Caveats
Get as much information about your system as you can
Your system is probably more complex than this, so you should measure
and get as much information as possible in order to ensure that you
are generating enough events and that those events are properly
distributed in each second.
Time required to generate an event
In my example, the time to generate an event is negligible,
but if your events take more time to generate than the inter-arrival
time of the events, this will not work! Make sure your event generation is
fast enough, or try to find another solution.
Observing events and saving data
To check if the generated events are following our Poisson process, we
need to collect those events and group them by the second in which
they happened, then we can analyze the results.
This can be done with the following code.
require 'json'
require 'socket'
def start_server port
s = UDPSocket.new
s.bind(nil, port)
data = Hash.new(0)
start = nil
Signal.trap('SIGINT') do
File.open('poisson.json','w') do |f|
f.write(data.to_json)
end
puts data
exit
end
while true
msg, sender = s.recvfrom(256)
start = Time.now.to_f if start.nil?
bucket = (Time.now.to_f - start).floor # whole seconds elapsed, so every bucket spans one full second
data[bucket] += 1
end
end
start_server 6767
Using this code, when the server is interrupted with SIGINT
(CTRL-C), the results are saved to a JSON file that can then be
analyzed.
Analyzing results
With the results collected using the above scripts, we can easily analyze the number of events
generated in each second and check if they are distributed in time as we expect them to.
I used Python with matplotlib to process and visualize the results.
λ = 17 Events/Second
Since the results are already grouped by arrival second, we can plot
the number of events that happened in each second.
We can then generate enough random samples using a Poisson
distribution with λ = 17 and plot this data against our
results.
If the plots are similar enough, then the events were generated
following a Poisson process.
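A numeric check can complement the plots: for a Poisson distribution with parameter λ, the mean is λ and the standard deviation is √λ. The sketch below is not the analysis script used for this post (that one was written in Python with matplotlib). It simulates the process in memory, with no sockets and no real waiting, and the helper names are assumptions made for the example.

```ruby
# Draw one exponential inter-arrival time for the given rate (events/second).
def next_interval(rate, rng)
  -Math.log(1.0 - rng.rand) / rate
end

# Simulate `seconds` seconds of a Poisson process in memory and
# return the number of events that fell into each one-second bucket.
def simulate_counts(rate, seconds, rng = Random.new)
  counts = Array.new(seconds, 0)
  t = next_interval(rate, rng)
  while t < seconds
    counts[t.floor] += 1
    t += next_interval(rate, rng)
  end
  counts
end

def mean(values)
  values.sum.to_f / values.size
end

def std_dev(values)
  m = mean(values)
  Math.sqrt(values.sum { |v| (v - m)**2 } / values.size)
end

counts = simulate_counts(17, 2000, Random.new(42))
puts "mean: #{mean(counts).round(2)} (expected 17)"
puts "std dev: #{std_dev(counts).round(2)} (expected #{Math.sqrt(17).round(2)})"
```

The same mean and std_dev helpers could be applied to the bucket counts saved in poisson.json to compare the real run against these expected values.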
There are better and more exact ways to check this, but for my problem
this is good enough. You should use whatever method you feel
comfortable with.
With λ = 17 and using method 2, I got the following results:

As you can see, the standard deviation and mean are very close to the
values you would expect (in parentheses), and the plotted data also seems
to match our expectations.
Conclusion
If you are trying to simulate traffic into your system, don’t just
throw traffic sequentially into it. Try to find a way to simulate
production traffic. Even if you end up not using a perfect model of
expected traffic, using some kind of model is always better than
just generating evenly spaced events.
If you want to try to model your production traffic using a Poisson
process, use method 2 described above if you don’t mind the CPU
intensive algorithm, or use method 1 if your λ is small enough.
Don’t forget to always check your results against our expectations.
Posted: August 5th, 2012 | Author: simaom | Filed under: ti | Tags: css, markdown, python, tools | No Comments »
MDUnify produces a single unified and beautiful HTML document from a Markdown
file.
I like using Markdown to organize my text files, but when I need to send them to someone they are usually not very beautiful. The Markdown compiler does not include any CSS when converting Markdown to HTML, so you need to add it yourself.
I built a small tool that simply compiles your markdown files to HTML and includes beautiful default stylesheets to produce a nice HTML file. Currently, mdunify uses the stylesheets from the Blueprint CSS project, including the typography CSS styles, so the resulting output is a nicely formatted, typographically beautiful and easy to read document.
If your markdown document also includes images, MDUnify will download the images and include them inside your HTML file, so if you need to send the file to someone, you can send a single file instead of having to send multiple files.
You can download mdunify at my GitHub repository.
Posted: September 12th, 2011 | Author: simaom | Filed under: ti | Tags: backbone, javascript, js, persistence, rest | No Comments »
Backbone.persistence is a simple adapter for using Backbone.js without the REST persistence layer.
Backbone is great, and not only for applications that use a remote persistence mechanism. For example, the demo TODO app uses HTML5 localStorage instead of the usual REST layer.
Sometimes you don’t even need localStorage, for example when you don’t need to save data between two different sessions for the same user. And if you want to run your unit tests with Rhino on Jenkins, you can’t use things like localStorage.
That’s where Backbone.persistence comes in: it overrides Backbone.sync to keep all the data in memory instead of using a persistence mechanism.
You can download Backbone.persistence at my GitHub repository.
Posted: January 13th, 2011 | Author: simaom | Filed under: blog, informação | Tags: problems, solutions, thoughts | No Comments »
A few years ago, while attending a “Computer Systems and Networks Security” class, I was talking with a friend of mine. We said to each other: “There’s no way in hell that two principals can exchange a key without a pre-shared key of some sort”.
A few minutes later, our professor introduced us to the Diffie-Hellman method for key exchange.
As you may know, the Diffie-Hellman method allows just that: two parties can agree on a secret key without any prior knowledge of each other. That secret key can then be used to encrypt data, or for whatever else anyone wants to do with a secret key.
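To make the idea concrete, here is a toy sketch of the exchange with deliberately tiny numbers. The prime 23, the generator 5 and the private numbers 4 and 3 are illustrative values only; real deployments use far larger parameters and vetted cryptographic libraries, and the helper names are assumptions made for this example.

```ruby
# Public parameters, known to everyone (toy sizes; real ones are far larger).
PRIME = 23
GENERATOR = 5

# Each party combines its private number with the public parameters.
def public_value(private_number)
  GENERATOR.pow(private_number, PRIME)
end

# Each party combines the other's public value with its own private number.
def shared_secret(their_public_value, my_private_number)
  their_public_value.pow(my_private_number, PRIME)
end

alice_private = 4
bob_private = 3

alice_public = public_value(alice_private) # sent to Bob in the clear
bob_public = public_value(bob_private)     # sent to Alice in the clear

alice_secret = shared_secret(bob_public, alice_private)
bob_secret = shared_secret(alice_public, bob_private)
puts alice_secret == bob_secret # both sides derive the same key
```

Only the two public values cross the wire; the private numbers never leave their owners, yet both sides end up with the same secret.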
This made me think. There we were thinking a problem was plain impossible and a few minutes later we saw a solution that seemed way too simple for such a hard problem. But it works.
Whitfield Diffie and Martin Hellman found an “impossible” problem and turned it into a problem with a solution. They didn’t say “That’s just impossible, let’s just solve another problem”.
From that day on, when I find an “impossible” problem I remember the day I heard about the D-H method and I think that maybe, just maybe, there’s a solution hidden somewhere. It motivates me to dig deeper, try harder and never dismiss a problem as “impossible”.
Posted: October 4th, 2010 | Author: simaom | Filed under: geek | Tags: app engine, change timezone, google, javascript js | 2 Comments »
I noticed that the time stamps Google App Engine uses in the application logs are all in the Pacific Standard Time zone. That’s just not useful for anyone not living on the Pacific coast.
So I wrote a user script you can use to show the time stamps in your browser’s timezone.
I tested my script on Firefox with Greasemonkey and on Google Chrome, and it seems to work.
You can get it here: http://github.com/simao/mycode/raw/master/UserScripts/appenginelogtz.user.js
You can also fork this script on GitHub and change it. If you do, please push your changes back to the repository. :)
Posted: October 1st, 2010 | Author: simaom | Filed under: geek, informação, portugal, ti | 4 Comments »
A few months ago I started using a new service provided by Carris to check how long I have to wait for the bus.
Problem is, people at Carris have never heard of web services and REST APIs, so I had to send an e-mail and wait for an automated reply with the waiting times for the stop I was at.
So I wrote a Google App Engine app to wrap this operation and manage the process for me. All I have to do is provide the stop code and wait for the results.
You can check out the app here: http://lxbusinfo.appspot.com
The app is NOT beautiful when seen on mobile devices, but it’s at least usable. I’ll get around to that as soon as I have some free time, but that is somewhat related to the second part of this post:
Since I developed a REST API to do this, you can also build your own app to check bus waiting times. Quite easily, in fact.
The process of obtaining bus information is of course asynchronous, since my app has to wait for an e-mail from Carris.
All your app has to do is send a POST request with the stopcode parameter to http://lxbusinfo.appspot.com/api/newBusRequest. You’ll receive a reply like this one:
newBusRequest
Form data: stopcode:10503
[
{
"requestid": "d71175fc592e4557b76d04f836dae30ef7afcb1d",
"statuscode": 1
}
]
This way, you’ll receive a requestid you can use to poll the server at http://lxbusinfo.appspot.com/api/updateBusRequest using a GET request. This API call receives a requestid as parameter. Please wait at least 5 seconds between update requests. Here’s an example of a GET request on updateBusRequest:
updateBusRequest?requestid=d71175fc592e4557b76d04f836dae30ef7afcb1d
[
{
"message": "",
"payload": [
{
"dest": "PORTAS BENFICA",
"eta_minutes": 4,
"pt_timestamp": "21:09",
"last_modified": "2010-10-01T20:04:41.481256",
"busnr": 758
},
{
"dest": "PR. REAL",
"eta_minutes": 12,
"pt_timestamp": "21:17",
"last_modified": "2010-10-01T20:04:41.556630",
"busnr": 790
}
],
"statuscode": 0
}
]
As you can see, all replies are JSON encoded, so you can parse them easily with JavaScript.
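The replies are just as easy to handle outside the browser. Here is a minimal Ruby sketch of a client for the two calls described above. The helper names are hypothetical, and it assumes that statuscode 0 means the payload is ready (as in the sample reply); the other status codes are not documented in this post, so anything else is treated as "not ready yet".

```ruby
require 'json'
require 'net/http'
require 'uri'

API_BASE = 'http://lxbusinfo.appspot.com/api'

# Extract the requestid from a newBusRequest reply body.
def request_id_from(body)
  JSON.parse(body).first.fetch('requestid')
end

# Turn an updateBusRequest reply body into printable lines.
# Assumption: statuscode 0 means the payload is ready, as in the sample reply.
def arrivals_from(body)
  reply = JSON.parse(body).first
  return nil unless reply['statuscode'] == 0
  reply.fetch('payload').map do |bus|
    "#{bus['busnr']} to #{bus['dest']}: #{bus['eta_minutes']} min"
  end
end

# Hypothetical polling helper; not called here because it needs the live service.
def waiting_times(stopcode, attempts: 6)
  reply = Net::HTTP.post_form(URI("#{API_BASE}/newBusRequest"), 'stopcode' => stopcode.to_s)
  id = request_id_from(reply.body)
  attempts.times do
    sleep 5 # the API asks for at least 5 seconds between update requests
    uri = URI("#{API_BASE}/updateBusRequest")
    uri.query = URI.encode_www_form(requestid: id)
    lines = arrivals_from(Net::HTTP.get(uri))
    return lines if lines
  end
  nil
end
```

Calling waiting_times(10503) would submit the stop code from the example and poll until the payload arrives or the attempts run out.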
There are other replies and error codes. Leave a comment if you want to implement something and I’ll send you a nicer specification.
I’ll open the source and set up a GitHub repository as soon as I can get around to shaping things up a bit.
I want to make this open source not only because I want to share the code, but also because Carris will probably do something to block my app. If the source is open, anyone can install the app elsewhere and set up a new bus information server. Problem is, I don’t want to share the source with Carris. :) Any thoughts on this? Let’s hear them in the comments.
Posted: May 31st, 2010 | Author: simaom | Filed under: blog, informação | Tags: browsers, firefox, privacy, security, ti | No Comments »
I learned about the Electronic Frontier Foundation study titled “Web Browsers Leave ‘Fingerprints’ Behind as You Surf the Net” through Miguel Almeida, but I have since seen several opinions about it in several different places.
Although I think the results of the study are indeed worrying, I think people are exaggerating quite a bit.
Running the test available here, I got the following results:
Your browser fingerprint appears to be unique among the 1,763 tested so far.
Currently, we estimate that your browser has a fingerprint that conveys at least 10.78 bits of identifying information.
| Browser Characteristic | bits of identifying information | one in x browsers have this value | value |
| User Agent | -0.84 | 0.56 | Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 |
| HTTP_ACCEPT Headers | -6.98 | 0.01 | text/html, */* ISO-8859-1,utf-8;q=0.7,*;q=0.7 gzip,deflate en-us,en;q=0.5 |
| Browser Plugin Details | 10.78+ | 1763 | A bunch of them |
| Time Zone | -7.13 | 0.01 | -60 |
| Screen Size and Color Depth | -5.38 | 0.02 | 1280x800x24 |
| System Fonts | 6.26 | 76.65 | A bunch of them |
| Are Cookies Enabled? | -8.89 | 0 | Yes |
| Limited supercookie test | -8.03 | 0 | DOM localStorage: Yes, DOM sessionStorage: Yes, IE userData: No |
Analyzing the results line by line, we can see that the only parameter that is truly unique is “Browser Plugin Details”, followed by “System Fonts”, where 1 in every 76 browsers tested has the same fonts as I do. In other words, my browser is unique among the 1,763 browsers tested, but only because of my plugin list. With only 1,763 browsers tested, I don’t think the problem is as serious as it is made out to be, all the more so since it is vulnerable to plausible deniability.
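The two middle columns of the table appear to be related by bits = log2(one in x). That is my reading of the numbers, not something the test output states, and `identifying_bits` is a hypothetical helper name. A quick sketch reproduces the 10.78 and 6.26 values reported above:

```ruby
# Bits of identifying information for a value shared by one in `one_in_x` browsers.
# Assumption: bits = log2(one in x), inferred from the table's numbers.
def identifying_bits(one_in_x)
  Math.log2(one_in_x)
end

puts identifying_bits(1763).round(2)  # plugin list, unique among 1,763 browsers
puts identifying_bits(76.65).round(2) # system fonts
```

Read this way, the plugin figure is capped by the sample size: a value seen only once among 1,763 browsers cannot show more than log2(1763) bits.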
As you can see, there is a certain exaggeration in the opinions I have seen about the study. In practice, although it shows that our browser is indeed far from common, and consequently far from guaranteeing any kind of anonymity, it is also not as distinctive as it is made out to be.
Just my 2 cents.
Posted: April 12th, 2010 | Author: simaom | Filed under: blog, Uncategorized | Tags: blogs, opml, prt.sc, pytho, rss | 1 Comment »
Recently, one of my favorite blog aggregators died, but left us with a nice OPML file containing the RSS feeds of all the aggregator’s authors.
Yahoo! My pipes couldn’t parse the OPML file, so I built a Python script to do it and hosted it on my server.
The RSS file is updated every hour and contains all RSS entries from all Prt.Sc authors not older than seven days.
You can get the RSS file at http://simaom.com/prtsc.xml.
So prt.sc lives… well… sort of… I won’t update the OPML file with any new authors, so we only get RSS entries from the authors included in the last version of Prt.sc.
Let me know what you think.
Posted: April 12th, 2010 | Author: simaom | Filed under: blog | Tags: opml, python, rss, syndication, xml | 1 Comment »
I just uploaded another script to my GitHub repository.
It’s a Python script that parses an OPML file and generates an RSS file with the entries from all RSS feeds listed in the OPML file that are not older than a certain number of days.
You can get the script here: http://github.com/…/opml2rss.py
The script has a few configuration parameters that are pretty self-explanatory, so it should be easy to set up.
You also need to install a few Python modules: Feedparser, OPML and PyRSS2Gen.