Thursday, May 2, 2019

Moved

For anyone who stumbles across this blog, I've been publishing tech blog posts to a new personal space instead of here.

See it here: ltriant.github.io

I may continue to put personal and non-tech stuff here, but I'm not committing to anything.

Friday, May 4, 2018

tree

Cherie and I recently moved into our first home, and outside of the excitement of buying furniture, fixing broken stuff, unpacking boxes, and the rest of the mountain of work that goes into moving houses, I found myself on a Monday afternoon with a spare couple of hours to work on my NES emulator. But I ran into a small problem: on my personal MacBook I don't have the tree command installed, and I didn't have an internet connection at the new house yet. So I decided to write my own.

https://github.com/ltriant/tree

It was a fun way to burn a couple of hours: I got to write some C for the first time in a while, and I was reintroduced to writing makefiles (I'd forgotten about all the handy implicit rules). It's incredibly simple code for an incredibly simple problem, and it's not really portable at all, but it works on my MacBook.

Friday, April 27, 2018

Perl for DevOps: Mojo::UserAgent

Nobody "doing devops" can work without interacting with products and systems remotely, and these days pretty much everything offers an API, which is more than likely available over HTTP. The Mojolicious distribution provides a ton of useful modules for doing almost anything HTTP-related, both client- and server-side.

This post is going to mostly focus on writing client code with Mojo::UserAgent, and some utility modules that Mojolicious ships with. I will also touch on using Mojolicious::Lite for writing web interfaces and services, but deployment will be deferred until the next post, which will focus on Plack/PSGI, the various Plack servers available, and some of the web frameworks that support PSGI.

Mojo::UserAgent vs LWP::UserAgent

For the longest time, LWP::UserAgent has been the de facto standard HTTP client library in the Perl community. But I've written before about why I prefer Mojo::UserAgent over LWP::UserAgent: it has both blocking and non-blocking interfaces, so you can write both kinds of applications while only needing to know one API, should your devops journey lead you into non-blocking code.
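As a sketch of what that looks like in practice, here's the same GET request in both styles (the URL is just an example):

use v5.10;
use Mojo::IOLoop;
use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new;

# Blocking: the call returns once the response has arrived
my $tx = $ua->get('https://www.perl.org');
say $tx->res->code;

# Non-blocking: same method, just pass a callback and run the event loop
$ua->get('https://www.perl.org' => sub {
    my ($ua, $tx) = @_;
    say $tx->res->code;
    Mojo::IOLoop->stop;
});
Mojo::IOLoop->start;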

But there's more.

Prototype with mojo

The mojo command is provided by the Mojolicious package, and is a great tool to query web services and pages. I wouldn't call it a curl or wget replacement (that's not its purpose), but it's a great tool for extracting data that is embedded in a response, for use in shell scripts, and for rapid prototyping before converting to Mojo::UserAgent in a Perl script.

As an example, if you wanted to use the MetaCPAN API to get the latest version of a package:


$ mojo get http://fastapi.metacpan.org/v1/package/Mojolicious
{
   "version" : "7.74",
   "module_name" : "Mojolicious",
   "dist_version" : "7.74",
   "file" : "S/SR/SRI/Mojolicious-7.74.tar.gz",
   "distribution" : "Mojolicious",
   "author" : "SRI"
}

This shows the JSON output of the API request. Mojolicious comes bundled with its own JSON decoding/encoding module, Mojo::JSON, which can be used directly in any application you want - perhaps as a replacement for the other JSON modules, if you so desired - but it's also integrated into Mojo::UserAgent, for decoding and easily extracting data from JSON responses.
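As a quick sketch of using Mojo::JSON directly (the data here just mirrors the response above):

use v5.10;
use Mojo::JSON qw(decode_json encode_json);

# Decode JSON text into a Perl data structure, and back again
my $pkg = decode_json('{"module_name":"Mojolicious","version":"7.74"}');
say $pkg->{version};

my $bytes = encode_json({ module_name => 'Mojolicious', version => '7.74' });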

The version is what I'm after. We can grab just that field with an extra argument: a JSON Pointer that selects the value from the response.


$ mojo get http://fastapi.metacpan.org/v1/package/Mojolicious /version
7.74

But what if there is no nice JSON API, and we have to extract the same data from a web page? Well, we can do that too:


$ mojo get https://metacpan.org/pod/Mojolicious 'span[itemprop="softwareVersion"]' text
7.74

This fetches the Mojolicious documentation on MetaCPAN and looks for a span tag with an itemprop attribute value of softwareVersion and displays the text in the tag. In this case, we're pretty lucky that the MetaCPAN page gives us a friendly way to locate this data, but more complex queries can be used for less mojo-friendly websites.

The beauty of the mojo tool is that once you've prototyped how to extract the information that you want, you can either leave it in a bash script, or you can port the code to use the Mojo::UserAgent module and use it as part of a larger application.


#!/usr/bin/env perl

use v5.10;
use warnings;
use strict;

use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new;
my $tx = $ua->get('http://fastapi.metacpan.org/v1/package/Mojolicious');

if ($tx->res->is_success) {
    say $tx->res->json->{version};
}
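And porting the HTML-scraping variant is much the same: CSS selectors are available on the response DOM via Mojo::DOM (this sketch assumes the MetaCPAN page structure hasn't changed):

#!/usr/bin/env perl

use v5.10;
use warnings;
use strict;

use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new;
my $tx = $ua->get('https://metacpan.org/pod/Mojolicious');

if ($tx->res->is_success) {
    # The same selector used with the mojo tool earlier
    my $span = $tx->res->dom->at('span[itemprop="softwareVersion"]');
    say $span->text if $span;
}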

This is just a part of what the Mojolicious distribution has to offer; there's also an event loop and a promises implementation for writing non-blocking client and server code, and a whole lot more. Mojolicious aims to be a self-contained installation with as few external dependencies as possible, which makes it stable and resilient to issues in the greater CPAN package ecosystem. Check out the other packages it provides.

Next in the series (whenever I get to it) I'll go through Mojolicious::Lite and Plack/PSGI, for when the time comes to write and deploy web sites and services.

Friday, March 9, 2018

NES Emulator, Part 1: I have no idea what I'm doing

For a very long time I've wanted to have a go at writing an emulator, and for one reason or another I never did it, but a few weeks ago I decided to pull the trigger and start writing a NES emulator. And I've chosen to write it in Rust, as I have a goal this year to achieve a moderate-to-high level of competency in the language.

This is the first of a few blog posts to briefly document my journey as someone who has never written an emulator before.

I have very limited technical knowledge in this space, so hopefully documenting my journey as a beginner will be useful to someone in the future.

The Beginning

The NesDev wiki is full of useful information, and is easily the most useful resource I've found so far.

On the wiki there are a ton of links and pages describing the basic architecture of the NES; the CPU is a 6502 processor, and there's a PPU for rendering the graphics, and an APU for handling the audio. As far as I can tell (and I'll correct this in future posts if I need to), the CPU simply writes to certain memory addresses that the PPU and APU then read from to do stuff. And in a single emulator cycle, based on the clock rates of each component, X number of CPU cycles are executed, Y number of PPU cycles are executed, and Z number of APU cycles are executed.

The first place I decided to dive in, after reading various threads on the EmuDev subreddit, was with the CPU implementation. I have zero experience with the 6502 beyond reading about the early days of Apple and Steve Wozniak's stories from the Homebrew Computer Club, but it's a thoroughly documented processor and the NesDev wiki has plenty of resources for it. It's a pretty basic CPU to understand; there are three standard registers, a few status flags, a program counter, and a stack pointer. Nothing new if you've ever written any assembly, or had to debug stuff in gdb before.
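As a sketch, that programmer-visible state amounts to little more than this (field names are mine, not necessarily what my emulator ended up using):

// The 6502's registers: accumulator, two index registers, status flags,
// a 16-bit program counter, and an 8-bit stack pointer.
struct Cpu {
    a:  u8,  // accumulator
    x:  u8,  // index register X
    y:  u8,  // index register Y
    p:  u8,  // status flags (NV-BDIZC)
    pc: u16, // program counter
    sp: u8,  // stack pointer
}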

Initially, I started from the bottom up, modelling the memory map and writing the CPU's interactions with the memory. However, because of the different things that the memory maps to (controllers, joysticks, the game ROM data, etc.), I realised that I'd have to write a mountain of code before I'd even know if the design I was rolling with would work, and that can be fairly unforgiving to redo in a Rust project because of how strict the compiler is. So I changed tack a little by going top down instead, and started writing a very simple and incomplete parser for the iNES file format, which is the format that most ROMs are available in. There's a wiki page for that too.

I then grabbed the nestest ROM from the emulator test page, and started implementing new instructions and addressing modes every time I hit something my emulator didn't know how to do.

The disassembly output that my emulator prints isn't exactly the same as the nestest log output, and I'm not sure how worried I should be about that yet. Most posts that I find on the NesDev forums suggest that being mostly correct is good enough at the start, and to just use it as a guide. But it makes me feel all kinds of uncomfortable.

At this point in time, I'm still implementing instructions for the CPU.

It's alive! (sort of)

Addressing Modes

Addressing modes (which dictate how an opcode/instruction gets its operands/arguments) confused the hell out of me at first, but I understand them a lot better after following through the addressing modes page on the NES Hacker wiki.

Learn from Others

The last thing that has been super useful is reading the source of other emulators, and any blog posts people may have written.

Michael Fogleman's article about writing a NES emulator was a great source of lower-level information to start with, and the code is super easy to follow, even if you're not overly familiar with Go.

This Rust emulator has also been useful when I'm trying to figure out how best to model certain things in Rust.

Friday, February 9, 2018

Project Euler problem 67

I've used Project Euler a number of times in the past as a learning platform; whether it's to keep some skills sharp or to learn something new. A few months ago, I revisited some old problems when I was learning the basics of Rust.

Problem 67 is simple at its core: in a pyramid of numbers, find the maximum path sum from the top to the bottom. The example pyramid is:

      3
    7   4
  2   4   6
8   5   9   3

The earlier problem (problem 18) is a much simpler version, where the input is small enough to allow for a very naive recursive solution:

fn max_sum_recursion_dumb(v: &Vec<Vec<u32>>, i: usize, j: usize) -> u32 {
    if i >= v.len() {
        return 0;
    }

    let left = max_sum_recursion_dumb(&v, i+1, j);
    let right = max_sum_recursion_dumb(&v, i+1, j+1);

    v[i][j] + left.max(right)
}

The basic idea: the maximum sum through the current node is the current node's value plus the larger of the maximum sums through its left and right child nodes.

The function can be called easily, assuming the pyramid is represented as a two-dimensional vector and we start at the top (zero-based indices):

println!("{}", max_sum_recursion_dumb(&input_data, 0, 0));

The problem with this approach is there is a ton of re-computation happening, which only gets worse as the input gets larger (as in problem 67).

If sticking with this solution, the technique to get around recomputing for the same input over and over again is to memoize (or cache) results of previous calls. This results in a top-down dynamic programming solution:

fn max_sum_recursion(v: &Vec<Vec<u32>>,
                     i: usize,
                     j: usize,
                     mut cache: &mut HashMap<(usize, usize), u32>) -> u32 {

    if i >= v.len() {
        return 0;
    }

    if let Some(rv) = cache.get(&(i,j)) {
        return *rv;
    }

    let left = max_sum_recursion(&v, i+1, j, &mut cache);
    let right = max_sum_recursion(&v, i+1, j+1, &mut cache);

    let rv = v[i][j] + left.max(right);
    cache.insert((i,j), rv);
    return rv;
}

This can then be called with an extra parameter for the cache:

let mut cache = HashMap::new();
println!("{}", max_sum_recursion(&input_data, 0, 0, &mut cache));

But to be honest, although the solution worked well, I don't like the look of the code at all. In the original Perl version of this solution that I'd written years ago, I just used the Memoize module, so the code remained very much like the first function, but with an extra line outside of the function to enable memoization.
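That Perl version would have looked something like this (a reconstruction from memory, with hypothetical names):

use Memoize;

sub max_sum {
    my ($v, $i, $j) = @_;
    return 0 if $i >= @$v;

    my $left  = max_sum($v, $i + 1, $j);
    my $right = max_sum($v, $i + 1, $j + 1);

    return $v->[$i][$j] + ($left > $right ? $left : $right);
}

# the one extra line: results are now cached by their arguments
memoize('max_sum');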

But in this Rust version of the function, I have to pass around mutable references, which I don't want to do unless absolutely necessary, and I didn't feel like it was absolutely necessary.

Looking at the core of how the function finds the maximum sum - once you unravel the recursion - it actually starts at the bottom of the pyramid, compares pairs of numbers for the larger number, adds it to the parent number on the row above, and repeats, until we get to the top. So what if, instead of starting at the top, the code just started from the bottom and bubbled up to the top?

Using the example in the problem description, the pyramid starts like:

      3
    7   4
  2   4   6
8   5   9   3

Then we take the bottom row (8, 5, 9, 3), take the larger of each adjacent pair, and add it to the parent node in the row above. So, after this step, we end up with the following pyramid:

      3
    7   4
  10  13  15
8   5   9   3

And then we repeat for the row above:

      3
    20  19
  10  13  15
8   5   9   3

And then once more for the top row:

      23
    20  19
  10  13  15
8   5   9   3

Finally, the top of the pyramid contains the maximum path sum, 23.

fn max_sum_norecursion(v: &Vec<Vec<u32>>) -> u32 {
    let mut sums = v[v.len() - 1].clone();

    for i in (1 .. v.len()).rev() {
        for j in 1 .. v[i].len() {
            let max = sums[j].max(sums[j-1]);
            sums[j-1] = max + v[i-1][j-1];
        }
        sums.pop();
    }

    sums[0]
}

This bottom-up dynamic programming approach has resulted in - I think - better looking Rust code, without the need for recursion, and without the need to pass around a mutable reference for caching.

It was fun to come back and solve this problem in a different way, even if the original reason I even revisited it was just to learn some basic Rust.

Friday, January 19, 2018

Perl for DevOps: IO::All

A stupidly common task to perform is file and directory IO. In the Perl world, the IO::All module has wrapped up nearly every common IO task I can think of into a very expressive interface. With the many options available to perform the same task, it can fit into scripts in many different ways.

For example - as a sort of "hello world" of file IO - if I wanted to read the contents of a file, do some basic processing and then output to a new file, here is a very simple solution:

use IO::All;

my $contents < io("foo.txt");
$contents =~ s{foo}{bar}g;
$contents > io("bar.txt");

Or, if you're not a fan of operator overloading, that's cool too! Here's the same script, with a more explicit usage:

use IO::All;

my $contents = io("foo.txt")->slurp;
$contents =~ s{foo}{bar}g;
io("bar.txt")->print($contents);

And there are a bunch more options to do similar things in the documentation.

What about reading a file backwards? This is sometimes useful to look for the last instance of an event in a log file:

use v5.10;
use IO::All;

my $io = io("/var/log/maillog");
$io->backwards;
while (my $line = $io->getline) {
  if ($line =~ m{ dsn = 4\.5\.0 }xms) {
    say "last success: $line";
    last;
  }
}

What About Directories?

Perhaps we wanted to traverse /var/log recursively and list out anything that's gzip compressed:

use v5.10;
use IO::All;

my @files = io('/var/log')->deep->all_files;
foreach my $file (grep { $_->ext eq 'gz' } @files) {
  say $file->name;
}

Something I've had to do on more than one occasion - when bringing up and initialising a new VM - is create a directory structure and all of the parent directories with it:

use IO::All;

foreach my $a ('0' .. '9', 'a' .. 'f') {
  foreach my $b ('0' .. '9', 'a' .. 'f') {
    io->dir("/var/my-application/tmp/$a/$b")->mkpath;
  }
}

So What?

The tendency is to just use bash scripts for a lot of these tasks. But bash scripts become unwieldy when the scope of a tiny script creeps, and it now needs to compress files, encrypt data, maybe upload stuff to S3, logging everything it does along the way to a centralised location, perhaps logging all errors to a Slack channel, or maybe just sending a notification to a Slack channel when the job is done. Perl is more than ready to handle those tasks.

I'll tackle some of these modules and tasks in more detail in future posts.

Worthy Mention: Path::Tiny

Although more can be accomplished with IO::All, the Path::Tiny module is also worth knowing about. There have been times when I've needed more specific control than IO::All provides, and in those cases Path::Tiny usually does what I want, so it's a handy backup tool.
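A quick sketch of the Path::Tiny flavour of the earlier examples (paths are just examples):

use v5.10;
use Path::Tiny;

# slurp a file, transform it, and write it back out elsewhere
my $contents = path('foo.txt')->slurp_utf8;
$contents =~ s{foo}{bar}g;
path('bar.txt')->spew_utf8($contents);

# list the immediate children of a directory
say $_ for path('/var/log')->children;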

Between these two modules, pretty much all filesystem IO needs should be taken care of.

So Much More

I'd encourage anyone to look through the docs. There are tons of examples for all kinds of tasks that I haven't touched on at all in this post, even as far as being able to send emails via a plugin.

Unless - or until - you need to use the low-level IO functions for fine-grained control and/or better performance for a critical piece of functionality, the IO::All module (and Path::Tiny as its companion) should be more than enough almost all of the time.

Friday, January 12, 2018

Handy Skills: Start a Campfire

It's been a while since I've done one of these! I thought this was a good one to post, having just returned from a camping trip last week and already dying to go on the next one.

Being able to start a fire is a super handy skill to have, more so when camping, but even when at someone's house with a fire pit of some sort. Most people who've never really started one think it's dead simple, only to fail miserably when they have to actually do it on their own. I know I did, at least.

Even if you've got fire starters and a deep fire pit to protect from any wind, if you don't do it right, it can take a lot more effort to get the fire going than is necessary. Ideally, I wanted to get to a point where I could start a fire with just a lighter (or matches) and not have to rely on anything else that I may either forget to take camping, or that I may run out of. When you go camping with friends who smoke, you're never short on lighters :)

The video that gave me all of the info I needed was The Bird Nest and Tinder Bundle from Dave Canterbury, a great video about creating a bird nest to start a fire. You can skip the use of the char cloth to ignite the bundle and just use a Bic lighter, and I don't necessarily make my bird nest as big as in the video. But the core practice (gathering or creating a bunch of fine, dry material for the bird nest that'll catch fire quickly and easily, plus small sticks for kindling to then build on top of with larger logs) is probably 95% of what I needed to reliably and consistently get a fire going. Another quick example can be seen in the Basic Camp Overnighter series too.

The Upside Down Fire is another good idea for getting a campfire going without needing much attention to maintain (while you go do other things), and I've started these a couple of times with great success. The biggest benefit to using this style of fire in winter is that the fire starts on a dry piece of wood, rather than on the ground, which may be wet or damp, and the above method of starting a fire with a bird nest is still relevant.

Thursday, November 30, 2017

Perl for DevOps: perlbrew and carton

I'm sick of seeing the same, old, and very dated articles/books/whatever relating to Perl and systems administration. There are a ton of Perl modules and tools available to make life easy for developers, testers and operations staff in a DevOps environment, but unless you're already deep in the Perl world, many remain fairly hidden from the public eye and hard to come by.

I'm hoping that the next few posts will show off what's currently on offer in the Perl world. Whether it's a brand new startup launching a product from scratch, or an established organisation with an already mature product, I'm not trying to convince anyone to change their main product's stack, but for all of the glue required to support the application in a production environment, Perl is an excellent choice.

Perl's "There's More That One Way To Do It" attitude has inspired a variety of modules with expressive APIs and tools that make working with Perl in a production environment easy.

But before getting into specific modules and tools, the first thing to discuss, even if it's less exciting, is the management of multiple perl versions and the management of CPAN dependencies.

What I'm going to discuss here isn't new material; an article from 2016, A Perl toolchain for building micro-services at scale, summed up a great set of relevant tools for using Perl which can be extended from building microservices to doing almost any other development work with Perl. This first post will focus on the two tools that I think are the most important.

Perlbrew

Unfortunately, most Linux distributions still ship with perl 5.8, despite it having reached end-of-life years ago. This often leads to people sticking with perl 5.8 and installing modules from CPAN into the system-level perl, sometimes even using their distribution's package manager instead of a CPAN client to do it. This is a terrible idea. Depending on which OS and distribution you're running, the system-level perl is often used by internal tools, and breaking the system-level perl starts to break other important things.

This is where perlbrew is a no-brainer.

Perlbrew is just like pyenv for Python, or rbenv for Ruby; it's a tool for managing and using various perl versions without interfering with the system-level perl.
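A minimal session looks something like this (the version number is just an example):

$ perlbrew install perl-5.26.1
$ perlbrew switch perl-5.26.1
$ perl -v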

The added bonus of running a more recent perl out of perlbrew is the availability of modules that left 5.8 behind long ago and require perl 5.10 or later, e.g. Mojolicious.

Alternatively, plenv is another tool for managing Perl versions, although it's not a tool I have a lot of experience with.

Carton

There are a few options for managing Perl dependencies. I'm only going to describe Carton, but there are also distribution- or OS-specific options as well that cover all or some of Carton's functionality, e.g. Red Hat Software Collections.

While not strictly necessary, carton - comparable to using a combination of virtualenv + pip for Python or Bundler for Ruby - is an excellent tool to manage dependencies.

The cpanfile (used by carton) specifies the direct dependencies of the script or system and, along with the generated cpanfile.snapshot file containing the full dependency tree, can be checked into source control alongside the code it supports. The carton utility can then use this cpanfile to create a local repository containing only the modules and versions specified in the snapshot.

Multiple cpanfiles may be used to track the dependencies of multiple different systems or subsystems.
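As a sketch, a minimal cpanfile just lists the direct dependencies, optionally with version constraints (the modules here are only examples):

requires 'Mojolicious', '>= 7.0';
requires 'IO::All';

Running carton install against this resolves the full dependency tree into cpanfile.snapshot and installs everything under a local/ directory; carton exec -- perl some-script.pl then runs against that bundle.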

An example setup might be to only use a carton bundle for critical customer-facing services, as you would want that environment to be as static as possible and not be prone to failure just because someone updated a dependency for a utility script. Or perhaps use one carton bundle for critical production stuff, and another one for the less critical stuff. Or perhaps a more granular setup, depending on the situation.

The caveat with carton is that for any dependency on third-party libraries (e.g. IO::Socket::SSL requiring openssl, or EV requiring libev), the third-party library will not be bundled into your carton repository.

Kinda Boring But Important

I feel like this was a pretty boring introduction to using Perl as a language for your devops needs, but it's an important topic that - unfortunately, in my own personal experience - can be a real pain in the ass to deal with if it's not considered early on in the piece.

Perlbrew and Carton are powerful tools and both worth knowing; used in tandem, they keep any development as isolated as possible, interfering with as little as possible on the system.

Friday, October 27, 2017

Perl Hack: perlbrew libs

The libs feature of perlbrew is one I don't see used very often. At least, not by the developers I currently work with and have worked with in the past.

Sometimes I want to run a piece of code against the core libraries and only the core libraries. Sometimes I wrap a script up with Carton and want to verify that a base install + Carton can run my script. And sometimes I just want a place to install anything and everything from CPAN, play with new versions' features, etc...

This is where the libs feature comes in handy.

I have a base perl 5.20.3 installation and two libs on top of it:

$ perlbrew list
  perl-5.20.3
  perl-5.20.3@carton
* perl-5.20.3@dev

99% of my time is spent in the "dev" lib, where I install anything I want. "perl-5.20.3" is just the base installation of 5.20.3, and the "carton" lib is a base 5.20.3 with only Carton installed. If I ever break the "carton" or "dev" libs, they're easily recreated from the base installation.
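Recreating one of these libs is only a few commands, e.g. for the "carton" lib:

$ perlbrew lib create perl-5.20.3@carton
$ perlbrew use perl-5.20.3@carton
$ cpanm Carton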

Thursday, September 28, 2017

Rehab

In December 2016, on my last heavy deadlift session before competing in Victoria's Strongest Man in January, I injured my back. Although the damage wasn't too serious, it was enough that I had to pull out of the competition.

At first, the rehab protocol was simply to do 50-100 reps of hyperextensions and reverse hyperextensions as part of my warm-up. During this time I didn't squat or deadlift anything over about 60% of my max. This felt like it was working for a couple of months, and then I hurt my back in the same way again, but this time while doing a speed squat session.

This was frustrating, considering that I thought I was doing the "right thing" this time around.

I changed my rehab protocol to instead focus on a lot of direct ab work - something I've neglected a lot in the past - and regular stretching of the hip region and other areas in close proximity. The ab work was just the ab wheel to begin with, and then I started doing 100 reps of light pulldown abs as part of my warm-up protocol along with the ab wheel work.

This time, things have started to get better; I've been able to squat up to ~180kg again and deadlift in the 200-240kg range again, although not for the higher volume I was used to before the injury. There has been some slight discomfort and tightness, but some stretching and mobility work has sorted it out.

I then, more recently, added the lower-back rehab work from earlier in the year back in: 100 reps of hyperextensions in my warm-up protocol on top of the ab work I'm already doing.

This has been working well and I'm back on track to squat and pull some heavier numbers by the end of the year. Just in time for the next Victoria's Strongest Man competition in January 2018.

Tuesday, August 29, 2017

Serving the Current Directory over SSL

Recently at work, we needed to set up a dummy HTTPS server as an endpoint that needed to do... something. Nothing specific. Just something that did SSL/TLS and returned a 200 response. Immediately I thought of Python's built-in SimpleHTTPServer, which can be used to serve the current directory:

$ python -m SimpleHTTPServer
Serving HTTP on 0.0.0.0 port 8000 ...

And away it goes. But putting SSL into it requires more code, although there are examples around.

I wondered how easily (or not) I could do it with Perl and a Plack server.

First, I needed the following dependencies installed:

  1. Plack::App::Directory. This comes with the standard Plack distribution, and it's used to serve a directory listing.
  2. Starman. This is currently the only Plack server that supports SSL without requiring something like nginx in front of it. A little disappointing, but not a big deal.
  3. IO::Socket::SSL. To do the SSL stuff. Requires OpenSSL.

These can either be managed by Carton, or you can just install them with cpanm.

$ cpanm Plack Starman IO::Socket::SSL

Next, I need to generate a dummy SSL certificate.

$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout server.key -out server.crt

Now I can run the server:

$ plackup -s Starman --listen :8000 \
  --enable-ssl --ssl-cert server.crt --ssl-key server.key \
  -MPlack::App::Directory -e'Plack::App::Directory->new(root=>".")'
2017/08/29-12:27:01 Starman::Server (type Net::Server::PreFork) starting! pid(38015)
Resolved [*]:8000 to [0.0.0.0]:8000, IPv4
Binding to SSL port 8000 on host 0.0.0.0 with IPv4
Setting gid to "20 20 20 504 401 12 61 79 80 81 98 33 100 204 395 398 399"
Starman: Accepting connections at https://*:8000/

... and the directory listing is available via https://localhost:8000/

It's not as simple as Python's SimpleHTTPServer to get going, but it works!

Friday, July 28, 2017

Concurrency, Perl and Web Services, Oh My!

The majority of the systems I develop - both in my spare time and at work - are heavily IO-based; connecting various web services, databases and files on disk together in some fashion. In those particular systems, I'd be stupid not to promote a concurrency-based solution.

Lately I've spent a lot of time developing small web services (or microservices or whatever it's called this year), both as brand new systems and also as a way of shaving off parts of large monoliths into smaller chunks to auto-scale.

There are many ways to design the architecture for this kind of system and its subsystems, and often there are instances where pre-forking a bunch of worker processes to handle requests is either too resource-hungry or just not appropriate for the work it's doing.

Let's Write Something

I want to write an example web service, but I'm sick of seeing the same "Hello World"-esque web services that can be written in a dozen lines of code and in no way represent any web service anyone has ever written. I want an application that can actually benefit from a concurrent solution and semi-resembles a real-world thing. So I've got an example:

  1. HTTP-based web service
  2. Runs in a single process
  3. Accepts a domain name, and returns the geographic locations of the domain's mail servers, in JSON format

To satisfy the first two criteria, a Plack/PSGI application running out of either Twiggy or Feersum should do just fine.

In order to satisfy the last point (i.e. the actual functionality), the app needs to perform a few steps:

  1. Retrieve the mail servers of the domain via a DNS lookup. AnyEvent::DNS can do this.
  2. For each of the mail servers, resolve the IP addresses via another DNS lookup. AnyEvent::DNS to the rescue again.
  3. For each of the IP addresses, I'm going to use the IP Vigilante API to retrieve the geographic location data. There are no modules on CPAN for the IP Vigilante service, so I'll need to write something. AnyEvent::HTTP would work just fine here, but lately I prefer to use Mojo::UserAgent where possible, because it's much more versatile, e.g. providing a proper request/response object for us, and handling JSON responses.

There's a fairly straightforward sequence of operations to perform, so I'm going to use a Promises-based approach. I've found that this makes concurrent Perl code much easier to follow, especially when other developers need to jump in to understand and maintain it. There's no real reason why I've settled on Promises over Futures (or any other implementation of these two patterns); either will do just fine.

Firstly, I need a function that can look up the MX records for a single domain and return an arrayref of addresses (via a promise).


sub lookup_mx {
    my ($domain) = @_;
    AE::log trace => "lookup_mx($domain)";

    my $d = Promises::deferred;

    AnyEvent::DNS::mx $domain, sub {
        my (@addrs) = @_;

        if (@addrs) {
            $d->resolve(\@addrs);
            return;
        }

        $d->reject("unable to perform MX lookup of $domain");
    };

    return $d->promise;
}

This is actually a pretty boring function to look at.

Next, I need a function that can resolve a domain name to an IP address (or addresses).


sub resolve_addr {
    my ($domain) = @_;
    AE::log trace => "resolve_addr($domain)";

    my $d = Promises::deferred;

    AnyEvent::DNS::a $domain, sub {
        my (@addrs) = @_;

        if (@addrs) {
            $d->resolve(\@addrs);
            return;
        }

        $d->reject("unable to resolve $domain");
    };

    return $d->promise;
}

This is also a pretty boring function.

Now I need a function that can perform a lookup to the IP Vigilante service for a single IP address and return an arrayref containing the continent, country and city in which it resides.


my $ua = Mojo::UserAgent->new->max_redirects(5);

sub ipvigilante {
    my ($address) = @_;
    AE::log trace => "ipvigilante($address)";

    my $d = Promises::deferred;
    my $url = sprintf "https://ipvigilante.com/json/%s", $address;

    $ua->get($url, sub {
        my ($ua, $tx) = @_;
        if ($tx->res->is_success) {
            my $json = $tx->res->json;
            my $rv = [
                $json->{data}->{continent_name},
                $json->{data}->{country_name},
                $json->{data}->{city_name},
            ];
            $d->resolve($rv);
            return;
        }
        $d->reject( $tx->res->error );
    } );

    return $d->promise;
}

This function is slightly more interesting - it receives a JSON response from IP Vigilante - but, in the end, is still fairly boring, since Mojo::UserAgent handles all of it for us.

The next function will need to take an arrayref of IP addresses, and collate the IP Vigilante data into a hashref, for which the keys will be the IP addresses and the values will be the IP Vigilante information from the previous function.


sub get_ip_informations {
    my ($ips) = @_;

    my $d = Promises::deferred;

    my %rv;
    Promises::collect( map {
            my $ip = $_;
            ipvigilante($ip)
                ->then( sub {
                    my ($ip_info) = @_;
                    $rv{$ip} = $ip_info;
                } )
            } @$ips )
        ->then( sub { $d->resolve(\%rv) } )
        ->catch( sub { $d->reject(@_) } );

    return $d->promise;
}

This is the first noteworthy function, and it's still not that big of a function. The ipvigilante()->then() chain returns a new promise, and we have used map and the Promises::collect() function to collate the results of multiple promises. This means that if we are trying to get the IP information for 10 addresses, the map will return 10 promises, and for this function to return a result, we need the response from all 10 promises. The entire batch executes concurrently and only runs as slow as the slowest IP Vigilante lookup. Yay concurrency!

Lastly, I need a function that will take the arrayref of mail server names, resolve each one to its IP address(es), get the IP Vigilante information for each IP address (via the previous function) and return it all as a hashref.


sub get_mx_informations {
    my ($addrs) = @_;

    my $d = Promises::deferred;

    my %rv;
    Promises::collect( map {
                my $mx = $_;
                resolve_addr($mx)
                    ->then( sub { get_ip_informations($_[0]) } )
                    ->then( sub { $rv{$mx} = $_[0] } );
            } @$addrs )
        ->then( sub { $d->resolve(\%rv) } )
        ->catch( sub { $d->reject(@_) } );


    return $d->promise;
}

This function is basically the bulk of the application.

I feel like these last two functions shouldn't be necessary and they rub me the wrong way a little, as they're essentially just for-loops where the inside of the loop has already been put into another function, but for the purposes of maintainability and testability, I kept them.

The beauty of all of the code written so far is that because Promises, AnyEvent and Mojo all integrate with the lower-level EV event loop, and in some cases with each other, everything works together. This makes it simple to mix and match your favorite libraries, even when they were originally written for different frameworks.

The whole thing just needs to be wrapped in a Plack/PSGI application.


my $app = sub {
    my ($env) = @_;
    my $request = Plack::Request->new($env);

    if ($request->method ne 'GET') {
        return [ 400, [], [] ];
    }

    (my $domain = $request->path_info) =~ s{^/}{};

    if (not $domain) {
        return [
            400,
            [ 'Content-Type' => 'application/json' ],
            [ Mojo::JSON::encode_json( { error => 'domain required' } ) ]
        ];
    }

    return sub {
        my ($responder) = @_;
        my $response = Plack::Response->new;

        lookup_mx($domain)
            ->then( sub { get_mx_informations($_[0]) } )
            ->then( sub {
                    my ($mx_informations) = @_;
                    $response->status(200);
                    return { $domain => $mx_informations };
                } )
            ->catch( sub {
                    my ($error) = @_;
                    $response->status(400);
                    return { error => $error };
                } )
            ->finally( sub {
                    my ($json) = @_;
                    $response->headers( [
                        'Content-Type' => 'application/json'
                    ] );
                    $response->body( Mojo::JSON::encode_json($json) );
                    $responder->( $response->finalize )
                } );
    }
};

I'm going to use Carton to handle and bundle the dependencies. This step isn't absolutely necessary, but when deploying Perl applications across many machines in a production environment, it's a solid tool for keeping things consistent across the board. Not having a solution for this is a massive headache once many different pieces of code have been deployed again and again for a few years. The Carton FAQ has a good rundown of its use cases. I now need to declare my immediate dependencies in a new file for Carton to consume: cpanfile.


requires 'Plack';
requires 'Feersum';
requires 'AnyEvent';
requires 'IO::Socket::SSL';
requires 'Mojolicious';
requires 'Promises';

I'm not tied down to specific versions of any of these modules.

The last step is to install the dependencies with the help of carton - which also generates a snapshot file with the full tree of my dependencies' dependencies - and then run the server.

$ carton install
$ carton exec -- feersum --listen :5000 -a mx.psgi

... and in another shell ...

$ curl -s http://localhost:5000/nab.com.au
{
   "nab.com.au" : {
      "cust23659-2-in.mailcontrol.com" : {
         "116.50.58.190" : [
            "Oceania",
            "Australia",
            null
         ]
      },
      "cust23659-1-in.mailcontrol.com" : {
         "116.50.59.190" : [
            "Asia",
            "India",
            null
         ]
      }
   }
}

The full code is available on github.

Just Release It Now, Right?

This is beyond the original scope of this post, but there's still a lot more to do. The application is just barely in an acceptable state. There are a number of extra steps before this can/should be deployed to production, for which I may write follow-up posts:

  1. Unit tests. The application and functions should be moved into their own package in order to have unit tests written against them. I've had great success using Plack::Test to test Plack applications and Test::MockObject::Extends to mock functions that would perform network calls, so that I don't require a working internet connection to run unit tests.
  2. Logging. Self-explanatory (I hope).
  3. Rate limiting the ipvigilante.com API requests. I don't want the service to inundate IP Vigilante with tons of connections/requests at the same time.
  4. Dealing with ipvigilante.com failures. The circuit breaker pattern will help the service remain stable and not constantly hit a remote service that is having an outage.
  5. Caching. IP addresses aren't likely to move geographic locations very often (if at all), so caching the IP Vigilante responses will be of great benefit. Either a simple local cache with Cache::FastMmap, or perhaps with a remote cache in Cache::Memcached, if I end up with a cluster of servers - which are an auto-scaling group - and I want a centralised cache for all hosts to use.
  6. Monitoring. How long do DNS lookups take? How long do ipvigilante.com API requests take? How often do they fail? When they fail, do they fail fast or do they timeout after 5 minutes of waiting?
  7. There's probably more...