Friday, March 2, 2012

Getting Some Quick Experience with a New Programming Language

One of the biggest problems I have when I mess around with a new programming language is deciding what to do with it after I've been through the examples and official tutorials. What do you do when you just want to get more comfortable with the language but can't think of anything to build, or don't have any ideas that the language would suit?

My solution? Use it for odds and ends. If the implementation has one, fire up a REPL and keep it running in the background while you work on other things. Need to do some kind of simple calculation? Need to transform some data? Jump into the REPL and figure out how to do it with your new, shiny toy. It may take a little longer than if you used something you're more fluent in, but the reward is often worth it.

Tuesday, February 28, 2012

Future Blog Posts

In my spare time I've put down a lot of the programming-related stuff I was doing in favor of other things. I started playing guitar a couple of years ago, I started lifting weights (to get better at basketball at first, but now with a focus on competing in strongman) and I started having more of a social life with my girlfriend. So I'm going to broaden my range when it comes to blog posts. No longer will this blog be solely for tech-related topics. I will start writing about various things in my life, which may include lessons I learned on the basketball court, reviews of cafes and restaurants around Melbourne and little tidbits about weight lifting, guitar and music theory.

I always wanted to blog about some of these topics, and I always thought about keeping it to separate blogs. But I've decided that this blog isn't busy enough and doesn't have many (if any) readers, so I'll mix it up with the tech talk.

Tuesday, February 7, 2012

Dock Stopped Working in OS X?

It doesn't happen often, but when it does, I don't want to have to restart my computer to fix it (or anything, really).

Every now and then, on OS X, my Dock disappears and I can't Cmd+Tab between applications anymore. After a bit of searching around on forums, I found the solution: kill the Dock process from Terminal.

killall -KILL Dock

After killing the process, a new Dock will automatically start up.

Problem solved!

Monday, January 2, 2012

How I Deal with Lots of Data

Just for some context, by "lots of data", I mean a couple hundred million rows in a table joined to other tables of similar size. By no means is it a lot compared to what other people deal with, but it's certainly enough to warrant some forethought instead of just diving right in.

Late last year, I was tasked with a few one-off reports that required me to summarise data that was spread over a couple hundred million rows in a database. Knowing that it was going to be a long process to retrieve the data for these reports, I had to come up with a game plan to do this as quickly and as efficiently as possible. In the process, as always when handling relatively large quantities of data, I learned a few things. None of these lessons/ideas are new or original, but I wanted to write them down somewhere, and here feels like as good a place as any.

Extracting the data that I needed (which, in one instance, was a subset of a couple of ~300 million row tables) into a local database meant I was able to modify the data (like cleaning up dirty, inconsistent spellings of suburbs, states and countries) and modify the schema (like adding new indexes which, because of the odd nature of the reports, the production databases didn't have or ever need previously).

Assuming that you don't require something like schema changes, that local database doesn't even need to be a relational database. CSV files work perfectly fine a lot of the time. For a couple of reports I wrote a handful of scripts, the first of which was to pull the data out of the MySQL database, perform some simple operations on the data and output it into a CSV file. The other scripts that needed to operate on the same data could then easily (thanks to Text::CSV_XS) read the CSV data, which was a lot quicker than reading it from a relational database.
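The reading side of that workflow might look something like the sketch below. The column names (suburb, state, amount) and the sample rows are made up for illustration, and an in-memory filehandle stands in for the extract file so the example runs on its own. Only the Text::CSV_XS calls are real.

```perl
#!/usr/bin/env perl

use warnings;
use strict;

use Text::CSV_XS;

# Hypothetical extract: in practice this would be the CSV file written
# by the first script, not a string.
my $extract = <<'CSV';
suburb,state,amount
Richmond,VIC,10
Carlton,VIC,5
Newtown,NSW,7
CSV

# Sum the (made-up) amount column per state, one row at a time.
sub totals_by_state {
    my ($fh) = @_;

    my $csv = Text::CSV_XS->new( { binary => 1, auto_diag => 1 } );

    # Treat the first line as the header so rows come back as hashrefs.
    $csv->column_names( $csv->getline($fh) );

    my %total;
    while ( my $row = $csv->getline_hr($fh) ) {
        $total{ $row->{state} } += $row->{amount};
    }
    return \%total;
}

open my $fh, '<', \$extract or die "Cannot open extract: $!";
my $total = totals_by_state($fh);
close $fh;

print "$_: $total->{$_}\n" for sort keys %$total;
```

Swapping the in-memory handle for a real file is a one-line change to the open call, and the summarising sub stays the same.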

Why do CSVs lend themselves nicely to this kind of task? Because with reports like these, in my experience, you very rarely perform complicated operations on the data after extracting it from the original data source(s); you just want to suck the data up, summarise the data, output the summary, and then output the nitty gritty details on subsequent pages or into a separate file.

An obvious advantage to storing the data like this is the speed at which you can retrieve and process the data. That improvement made a huge difference for me because I like to run my scripts very often throughout the development process, no matter how small the change.

Of course, depending on the size of the data (in bytes), extracting the data into a local database of some sort may not always be possible.

The last big win I had was not using object-relational mappers (Class::DBI in this case). They're great a lot of the time and save on code and development time, but when dealing with millions of rows, they just add bloat and everything runs much slower than it should.
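As a rough sketch of the plain DBI alternative, the loop below binds columns to scalars and never builds an object per row. An in-memory SQLite database (via DBD::SQLite) and a made-up orders table stand in for the real MySQL data, purely so the example runs on its own. The table, columns and rows are all assumptions.

```perl
#!/usr/bin/env perl

use warnings;
use strict;

use DBI;

# Assumption: DBD::SQLite is installed. The real reports ran against MySQL.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, PrintError => 0 } );

# Hypothetical table and rows, just enough to have something to summarise.
$dbh->do('CREATE TABLE orders (state TEXT, amount INTEGER)');
my $insert = $dbh->prepare('INSERT INTO orders (state, amount) VALUES (?, ?)');
$insert->execute(@$_) for [ VIC => 10 ], [ VIC => 5 ], [ NSW => 7 ];

sub totals_by_state {
    my ($dbh) = @_;

    my $sth = $dbh->prepare('SELECT state, amount FROM orders');
    $sth->execute;

    # Bind each column to a scalar; fetch() refreshes them in place, so
    # no per-row object (or even a per-row hash) is created.
    $sth->bind_columns( \my ( $state, $amount ) );

    my %total;
    $total{$state} += $amount while $sth->fetch;
    return \%total;
}

my $total = totals_by_state($dbh);
print "$_: $total->{$_}\n" for sort keys %$total;
```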

That's all I can think of now, a few months later.

Tuesday, June 28, 2011

Crappy IRC and Unicode

I use MacIrssi as my IRC client at work and at home. It's mostly great. By mostly, I mean, I wish when people sent smart-ass messages on IRC filled with unicode characters, I could actually appreciate how much of a smart-ass they are being. Instead, all I see is this:

< ganeshanator> gonna go to the \u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588
    and \u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588
    getting those \u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588
    and the \u2588\u2588\u2588\u2588\u2588
    on \u2588\u2588\u2588\u2588\u2588

I always forget how to convert it, so I wrote this, and now I never have to remember again:

#!/usr/bin/env perl

use warnings;
use strict;

use Encode;

sub unicode_plz { encode( 'UTF-8', pack( 'U', hex shift ) ) }

( my $message = shift ) =~ s{\\u([a-fA-F0-9]+)}{unicode_plz($1)}ge;
print "$message\n";

And, BAM!

Crappy was a harsh word. MacIrssi is pretty great, except for this.

Thursday, March 31, 2011

DateTime Woes

This morning, I hate DateTime for this.

$ perl -MDateTime -wle'$a = DateTime->today( time_zone => "Australia/Melbourne" ); $b = DateTime->today; $b->set_time_zone("Australia/Melbourne"); print $a->iso8601; print $b->iso8601'
2011-03-31T00:00:00
2011-03-30T11:00:00

I understand why it happens, but it's still an annoying, pissy, little bug to find. sigh...
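For anyone who hits the same thing, here is roughly what is going on. DateTime's documentation describes today() as now() truncated to the day, and now() defaults to UTC unless a time_zone is passed. The sketch pins "now" to 8am Melbourne time on the day in question (an assumed hour; anything before 11am behaves the same) so the result is repeatable.

```perl
#!/usr/bin/env perl

use warnings;
use strict;

use DateTime;

my $tz = 'Australia/Melbourne';

# A fixed stand-in for "now": 8am on 31 March 2011 in Melbourne (UTC+11),
# which is still 9pm on 30 March in UTC.
my $now = DateTime->new(
    year      => 2011,
    month     => 3,
    day       => 31,
    hour      => 8,
    time_zone => $tz,
);

# What today( time_zone => $tz ) does: truncate in Melbourne time.
my $local_first = $now->clone->truncate( to => 'day' );

# What today() followed by set_time_zone() does: truncate in UTC, then
# shift the result into Melbourne time.
my $utc_first = $now->clone->set_time_zone('UTC')->truncate( to => 'day' );
$utc_first->set_time_zone($tz);

print $local_first->iso8601, "\n";    # 2011-03-31T00:00:00
print $utc_first->iso8601,   "\n";    # 2011-03-30T11:00:00
```

The second value is midnight UTC on the 30th, just displayed in Melbourne time, which is why passing time_zone to today() directly is the safer habit.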

Thursday, February 3, 2011

Update to an Old Post

A past post about tweeting from the command line with the Net::Twitter Perl module is one of my most visited entries. Anytime I check my traffic, it is always the second most visited entry for the day/week/whatever. However, it has a problem. The code does not work anymore for a simple, yet annoying, reason.

Last year, Twitter turned off its basic authentication support and now requires application developers to use OAuth. More detail about how that affected Net::Twitter can be read here. This kinda sucks because it's no longer as simple as providing a username/password combination. You need consumer keys/secrets and access tokens/secrets and that makes things a tad messier.

The original code I posted uses basic authentication and, obviously, will not work anymore. However, if anyone is interested, the code for that small side-project, Twitsh.pm, is on GitHub, and I recently updated it to use OAuth so that it works again. The configuration file requires you to add your own consumer/access keys/tokens/secrets. Enjoy.

Sunday, January 23, 2011

Goal for the Year: IPv6

First blog post for the year! Yay? And I'm only setting myself one goal for the year: Learn everything about IPv6 and set it up at home.

And by everything, I mean everything; from IPv6 packet headers to the implementation complexities to the business issues to the security issues to the political issues.

Why? Because, potentially, in the next decade, migrating from IPv4 to IPv6 will become important, and considering that it's a topic that I know next-to-nothing about, other than what I've read in passing on IRC, it's a knowledge gap that needs filling.

It's certainly possible that this will all take me less than a month, assuming enough free time. When that time comes, perhaps I'll pick a non-geek goal to spend my time on this year.

Wednesday, December 1, 2010

Host Discovery via SPF Records?

So... my once-a-month posts OCD issue went unsatisfied for the past 2 months. Oh well. Back to business...

Sender Policy Framework (SPF) records are often used as one method of combatting spam; specifically, spam that wants to look like it has been sent by your company. The general idea is that your SPF records specify which hosts and IP addresses are allowed to send email using your domain(s) in the envelope 'from' address. Any other host, which is not contained within the SPF policies, that attempts to send mail from your domain will fail the SPF check and, hopefully, get picked up by some kind of spam detection software further down the track. It is highly recommended that you set up SPF records for any domains that you own.

How are SPF records requested? Via DNS. They are usually contained within the TXT record; however, they can also, occasionally, be found in the SPF record as well. The format is very easy to understand. A typical example to look at would be optus.com.au:

IN TXT "v=spf1 mx/24 include:opt01._spf.optin2.com.au ip4:180.92.216.0/21 include:rightnowtech.com include:rnmk.com include:custhelp.com ~all"

Do the hosts (and the range of IP addresses) in the policies belong to optus.com.au? Maybe. Maybe not. We cannot discern that just by looking at the host names, nor can we, with any real degree of certainty, determine it programmatically. However, the hosts in the SPF record have been trusted to send mail on behalf of optus.com.au, so they remain hosts of interest for penetration testers, and they may not pop up in other DNS requests (A, MX, AXFR, etc.).
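Pulling those hosts of interest out of a record is simple enough to sketch. The script below works on the optus.com.au record quoted above, pasted in as a string rather than fetched over DNS, and the spf_targets helper is a name made up for this example. It only looks at mechanisms that carry a host or network after a colon, and it ignores modifiers such as redirect=, so treat it as a starting point rather than a full SPF parser.

```perl
#!/usr/bin/env perl

use warnings;
use strict;

# The optus.com.au TXT record quoted above, pasted in rather than looked up.
my $record = 'v=spf1 mx/24 include:opt01._spf.optin2.com.au '
    . 'ip4:180.92.216.0/21 include:rightnowtech.com include:rnmk.com '
    . 'include:custhelp.com ~all';

# Return [ mechanism, target ] pairs for every term that names a host or
# network. Bare terms such as "mx/24" and "~all" name nothing new and are
# skipped, as is the "v=spf1" version tag.
sub spf_targets {
    my ($txt) = @_;

    my @targets;
    for my $term ( split ' ', $txt ) {
        $term =~ s/^[-+~?]//;    # drop the optional qualifier
        next unless $term =~ /^(include|a|mx|exists|ip4|ip6):(.+)$/i;
        push @targets, [ lc $1, $2 ];
    }
    return @targets;
}

printf "%-8s %s\n", @$_ for spf_targets($record);
```

Each include: target has SPF records of its own, so the same sub could be run again on those to widen the list.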

What is a common example of a host that would be in a domain's SPF policies but not actually part of the company that owns the policy? Any mail filtering company that filters outbound mail.

Just a thought that has been floating around in my head...

Thursday, September 30, 2010

End of the Month...

Well, it's the end of the month, which seems to have become the day that I write a new blog entry (I feel somewhat empty if there's a month missing from the sidebar over there), but I don't have anything very interesting to post, so I'll post something a little less interesting: a GitHub project I use for testing libraries, Mazer.

Mazer is a rewrite of an old assignment I did years ago in C++. Originally it just had to automatically solve a maze step-by-step, as if simulating a human walking through the maze. I often write this same program when I start learning a new programming language (which I did for Haskell and Common Lisp), or, as was the case this time around, to play around with libraries. It was the first bit of code I wrote using Moose, Test::Class (the tests actually pass!) and, most recently, Curses::UI.

There's nothing special about it. It has basically become my sandbox for Perl libraries. Perhaps others would find it useful, though.