GFX::Monk Home

Posts tagged python:

 

Ruby's split() function makes me feel special (in a bad way)

Quick hand count: who knows what String.split() does?

Most developers probably do. Python? easy. Javascript? probably. But if you’re a ruby developer, chances are close to nil. I’m not trying to imply anything about the intelligence or skill of ruby developers, it’s just that the odds are stacked against you.


So, what does String.split() do?

In the simple case, it takes a separator string. It returns an array of substrings, split on the given string. Like so:

py> "one|two|three".split("|")
["one", "two", "three"]

Simple enough. As an extension, some languages allow you to pass in a num_splits option. In python, it splits only this many times, like so:

py> "one|two|three".split("|", 1)
["one", "two|three"]

Ruby is similar, although you have to add one to the second argument (it talks about number of returned components, rather than number of splits performed).

Javascript is a bit odd, in that it will ignore the rest of the string if you limit it:

js> "one|two|three".split("|", 2)
["one", "two"]

I don’t like the javascript way, but these are all valid interpretations of split. So far. And that’s pretty much all you have to know for python and javascript. But ruby? Pull up a seat.

Stereoscoper and the Depth of Awesomeness

A few days ago I got a shiny new toy: a 3d camera from thinkgeek (I’d link to it, but it seems to have disappeared from their catalogue). I’m a massive fan of 3d photos / video, so it’s pretty cool to have a device that allows me to take stereoscopic photo pairs simultaneously (you can do it manually with a static scene, but those get boring).

Sadly (although not surprisingly), the quality is not great. The limited resolution is not really an issue given how you’re likely to view them, but the pictures come out awkwardly stretched to half the expected horizontal resolution. They are also pre-combined in a single JPEG, they are the way around for cross-eyed viewing, and the colour balance is frequently off between the two sensors (which can be really jarring).

Seeing a lot of manual photo fixing in my future, I set out to automate it. And thus stereoscoper was born, as a way to bulk-convert stereo images to other formats. Aside from the obvious geometry changes (the horizontal resolution and image placement), I also learnt all about histogram matching in order to make the colour balance consistent across stereo pairs. And in order to make animated gifs that match up nicely, there’s even an interactive mode where you can fine-tune the alignment of the image pairs.

In the wiggly-animated spirit of 3ERD (note: some images there are NSFW), here’s some fun we had in the park with my new toy:

(click to toggle each animation. It’s off by default to save your brain from having a fit ;)

(stereo)

(stereo)

(stereo)

(stereo)

Update: Click the (stereo) link under each image for a cross-eyed viewing version.

Experiment: Using Zero-Install as a Plugin Manager

So for a little while now I’ve been wanting to try an experiment, using Zero Install as a Plugin Manager. Some background:

  • Zero install is designed to be used for distributing applications and libraries.
  • One of its greatest strengths is that it has no global state, and doesn’t require elevated privileges (i.e root) to run a program (and as the name implies, there is no installation to speak of).

Since there’s no global state, it seems possible to use this as a dependency manager within a program - also known as a plugin system.

Happily, it turns out it’s not so difficult. There are a few pieces required to set it up, but for the most part they are reusable - and much simpler to implement (and more flexible) than most home-rolled plugin systems.

Ruby's unicode treatment

I recently came across this enlightening post on the changes to strings and encodings in ruby 1.9. As a python lover who has only used ruby 1.8 so far, it’s interesting to see the different approaches to very similar problems in python 3 and ruby 1.9.

I may be biased, but ruby’s implementation sounds like it will lead to a lot of pain and bugs, while python’s implementation will lead to a little more pain as you are forced to learn about encodings, and a lot less bugs (as you are forced to learn about encodings). Let me explain why:

Pea

…three days after this post, pea, the tiniest BDD framework is born.

The Best-Named Python Library

I hereby declare to be monocle. To see why, check out the second example.

How I Replaced Cucumber With 65 Lines of Python

Update:

I’ve since cleaned up the code here and published it as a tiny library: pea on github

Aside: why cucumber doesn’t work as well as everyone thinks it should

I’ve used cucumber at work for a reasonably large project, and I wasn’t impressed. Having one canonical language for stories sounds great, until you have enough arguments about how things should be phrased that you eventually come to the realisation that BAs don’t want to write their specifications as tests, and you don’t write your tests as specifications.

This is a test style assertion step:

Then the total of the items should be 42

..and this is the same step in a requirements-style of language:

Then the total of the items should equal the sum of the number of items in each category

To a BA, the first example is a lie. The sum shouldn’t be 42, it should be the correct number! And to an automated program, the second statement is nigh on useless. Saying what something is supposed to be made from is just doing the same calculation twice - there’s nothing stopping you from doing it wrong both times! If you want to check that it’s getting the right answer, you need to tell it what the right answer is, not just tell it (again) how to make it.

So I’m not a huge fan of having cucumber scenarios be the single source of truth for requirements. If the programmers have their way it’s just a series of examples (also known as “tests”), and if the BAs have their way it’s just a series of feeble assertions that don’t necessarily check what they say they’re checking.

But it’s not all bad…

But on programmer-oriented projects, I can see them working quite well. For example, I’ve recently upgraded a large suite of specs to rspec 2, and made heavy use of the browsable cucumber scenarios on relishapp.com as actual, useful documentation.

So I decided to try cucumber on one of my own projects. Since I am obviously a python fiend outside of work, I wasn’t going to use cucumber. So out came the (very young) python port of cucumber, called lettuce (where did this salad theme even come from? o_O). I gave it a go, and of course it’s naturally a bit more awkward than ruby because python doesn’t have blocks. It’s also more than a little buggy, and lacking some useful features that cucumber has (which is to be expected of such a young project).

I started hacking on it to add or improve features, and then got sick of it. It really does seem a little ridiculous. We’re actually inventing a (trivial) language, and parsing it, and using little regex parsers in each of our steps, and mapping each of those regexes to little chunks of code. And all this makes it hard to find usages, hard to track duplication and dead code, and generally just awful to navigate and manage.

The punchline

So, you know what? I just transformed all my steps into valid python code instead. Each regex replaced with a function name, and each matching group an argument (python’s keyword-arguments help here). 65 lines of code later, I have a very similar result using plain-old python.

Here is a comparison. The old feature:

Feature: running indicate-task
	Running a basic, blocking process that
	consumes and produces output.

Scenario: running and cancelling a program
	When I run indicate-task -- cat
	And I enter "input"
	And I press ctrl-c
	And I wait for the task to complete

	Then there should be a "cat" indicator
	And it should have a menu description of "cat: running..."
	And the output should be: input
	And the error output should be empty
	And the return code should not be 0
	And it should display the task's output to the user
	And it should notify the user of the task's completion

And the new, normal, actual-python-code-that-works-just-fine-with-ctags-and-isn’t-built-with-dirty-regexes version:

from makeshift_cucumber import *
from base_test import BaseTest

class TestRunning(BaseTest):
	"""
	Feature: Running a basic, blocking process that
	consumes and produces output.
	"""

	def test_running_and_cancelling_a_program(self):
		When.I_run_indicate_task('--', 'cat')
		And.I_enter("input")
		And.I_press_ctrl_c()
		And.I_wait_for_the_task_to_complete()
		Then.there_should_be_an_indicator_named("cat")
		And.it_should_have_a_menu_description_of("cat: running...")
		And.the_output_should_be('input')
		And.the_error_output_should_be_empty()
		And.the_return_code_should_not_be(0)
		And.it_should_display_the_tasks_output_to_the_user()
		And.it_should_notify_the_user_of_the_tasks_completion()

No, you probably wouldn’t be able to get a businessman to write a scenario. But has that ever actually worked with cucumber either? I find it doubtful. The results are just as readable, and insanely simpler in terms of the complexity of the testing infrastructure. Plus, it’s just a normal test, the functions are just normal functions, and the arguments are just normal arguments.

And if you don’t want to give that to a BA, just show them the test output instead:

terminal output

I’ll try to clean this up some time into a proper library & formatter sometime, because I think the mess of code you end up with cucumber is just too ridiculous for the benefits you get, and this sort of thing is much more developer-friendly while maintaining most of the readability benefits.

mocktest 0.5

Over the holidays, I’ve finally had the time to rewrite my mocking library for python, mocktest.

The original version had what turned out to be a confusing distinction between mock anchdors, mock wrappers and raw mocks. You should no longer need to know about that distinction when using mocktest 0.5, as it takes a more traditional approach using global methods like when() and expect() to differentiate between setting up the mock object and actually using it.

Please check out the brand-new documentation if you’re looking into a mocking library for python - it works with the standard unittest infrastructure, so it’ll work just fine with your favourite test runner (nosetests, surely….)

Speaking of documentation, this is my first time using sphinx. I am very impressed, and really quite keen to properly document a lot of my other code, when I get the time.

A new edit-server for TextAid and friends

ItsAllText has long been one of my most useful firefox extensions. It allows you to edit the contents of a <textarea> in an external editor (i.e. vim, emacs, etc) and insert the results back into the web page.

I’ve been having trouble with itsalltext, so I scoped out other alternatives. One such extension is TextAid for chrome (Edit with Emacs is another, which thankfully you can use with vim despite the name ;)).

The funny thing about chrome extensions is that they’re not allowed to spawn new processes, which injects a large portion of awkwardness into an extension whose main goal is to spawn your text editor. The workaround is to run a server (locally) that receives a POST request with some text content. The server then spawns your favourite text editor and waits for you to edit its contents. When you’re done, the new text is send back as the response body. It seems a rather roundabout mechanism, but it’s nonetheless kind of neat. And entirely necessary to fit in with chrome’s security model - the chrome extension is just making a long-running ajax call.

So anyway. I took the python server from the emacs_chrome project, cleaned it up, added multithreading so you can edit multiple files at once, and packaged it all up as a 0install package (yep, I still love 0install). You can get it here, if you are ever in the need for such an outrageous piece of software.

group_sequential

So the other day I had a list of (html) elements, and I wanted to get an array representing lines of text. The only problem being that some of the elements are displayed inline - so I needed to join those together. But only when they appeared next to each other.

I would call the generic way of doing so group_sequential, where an array is chunked into sub-arrays, and sequential elements satisfying some predicate are included in the same sub-array. That way, my predicate could be :inline?, and I could join the text of each grouped element together to get the lines out.

For example, using even numbers for simplicity:

[1,2,3,4,6,8,5,4,4].group_sequential(&:even?)
=> [[1],[2],[3],[4,6,8],[5],[4,4]]

Here’s the ruby code I came up with:

	class Array
		def group_sequential
			result = []
			group = []
			finish_group = lambda do
				unless group.empty?
					result << group
					group = []
				end
			end

			self.each do |elem|
				if yield elem
					group << elem
				else
					finish_group.call
					result << [elem]
				end
			end
			finish_group.call
			result
		end
	end
	

Things are slightly less noisy, but assignment is subtly awkward in python without using nonlocal scope keyword (only available in python3):

	def group_sequential(predicate, sequence):
		result = []
		group = []
		def finish_group():
			if group:
				result.append(group)
			return []

		for item in sequence:
			if predicate(item):
				group.append(item)
			else:
				group = finish_group()
				result.append([item])
		group = finish_group()
		return result
	

This feels like something that should be doable in a much more concise way than I came up with above. Any ideas? (In either ruby or python)


Update:

My friend Iain has posted a number of interesting solutions over yonder, which got me thinking differently about it (specifically, reminding me of dropwhile and takewhile). I applied python’s itertools to the problem to get this rather satisfactory result in python:

	from itertools import takewhile, tee, dropwhile
	def group_sequential(pred, sequence):
		taker, dropper = tee(iter(sequence))
		while True:
			group = list(takewhile(pred, taker))
			if group: yield group
			yield [dropwhile(pred, dropper).next()]
	

(note: this is a generator which is fine for my purposes - you can always wrap it in a call to list() to force it into an actual list).

The same approach is acceptable when done in ruby, but a bit more verbose because of the need to explicitly check for the end of the sequence, and to collect the results array:

	class Array
		def group_sequential(&pred)
			sequence = self
			results = []
			while true
				group = sequence.take_while &pred
				results << group if group.size > 0
				sequence = sequence.drop_while &pred
				return results if sequence.empty?
				results << [sequence.shift]
			end
		end
	end
	

Update (the second):

While poking around itertools, I managed to miss groupby. I assumed it did the same thing as ruby’s Enumerable#group_by, which is to say not at all what I want (though it’s surely useful at other times). So here is presumably the most concise version I’ll find, for the sake of closure:

	from itertools import groupby
	def group_sequential(pred, sequence):
		return [list(group) for key, group in groupby(sequence, pred)]
	

Recursively Default Dictionaries

Today I was asked if I knew how to make a recursively default dictionary (although not in so many words). What that means is that it’s a dictionary (or hash) which is defaulted to an empty version of itself for every item access. That way, you can throw data into a multi-dimensional dictionary without regard for whether keys already exist, like so:

h["a"]["b"]["c"] = 5

Without having to first initialise h[“a”] and h[“a”][“b”].

A dictionary with a default value of an empty hash sprang to mind, but after trying it out I realised that this only works for one level. Recursion was evidently required.

So, here’s the python solution:

from collections import defaultdict
new_dict = lambda: defaultdict(new_dict)
h = defaultdict(new_dict)

And the ruby, which seems overly noisy:

new_hash = lambda { |hash, key| hash[key] = Hash.new &new_hash }
h = Hash.new(&new_hash)

Autonose: continuous test runner for python's nosetests

Today I’ve put up the first “releaseable” version of autonose. Basically, it analyses your code’s imports, and determines exactly which tests rely on which code. So whenever you change a file, it’ll automatically run the tests that depend on the changed file (be it directly or transitively). Give it a go, and please let me know your feedback.

All you need do is:

$ easy_install autonose
$ cd [project_with_some_tests_in_it]
$ autonose

See the github project page for further information (and a screenshot).

The joys of functional programming with side-effects

Quick quiz: what will the following pieces of ostensibly identical python and ruby programs print out?

If you answered “1 2 3 4” or something similar, you’re half right. That’s what you’ll get from ruby. Surprisingly, python will give you “4 4 4 4”. I’m not the first to discover this, but that doesn’t make it any less startling.

The “fix” in python is to replace the lambda line with q.append(lambda i=i: puts(i)), because default function paramaters are evaluated at definition-time, not evaluation-time (another difference from ruby, potentially even related?). It makes sense once it’s explained what’s going on, but it’s hardly obvious.

I don’t know enough about how closures are done to say that python is wrong, but it’s certainly less desirable and more surprising…

(for those keeping score, this makes ruby: 1, python: still heaps ;P)

P.S. I couldn’t resist mentioning the title of a proposed solution to this problem: For-loop variable scope: simultaneous possession and ingestion of cake

update: As matt just pointed out, the ruby port is different in that it uses i as an argument to a block. A more faithful translation would be:

for i in (0..9)
  q << (lambda {puts i})
end

Which has exactly the same outcome as python.

So there you go. It turns out closures are confusing. Who knew? ;P

rednose: coloured output formatting plugin for nosetests

I recently wrote a plugin for nosetests which greatly (imho) improves the output for failed and errored tests. The screenshot explains it best.

To install, just run easy_install rednose. Then you can run nosetests with the –rednose option.

See github.com/gfxmonk/rednose for code and more information.

ruby - longing for some discipline

More and more, I am wishing that there was some sort of strict mode I could enable in ruby to say “you know what? I’m careful with my code. Please don’t assume things behind my back”. And to be honest, this mode would pretty much be synonymous with “just do what python would do”

By default, python is strict. If you index a dict (hash) with a nonexistant key, you get a keyError. If you don’t want to have to deal with that, you can use the get method and provide a default for if the key doesn’t exist. In ruby, if you want to be strict about anything, you generally have to write your own checks to guard against the core library’s forgivingness. Forgivingness sounds nice at first, but goes completely against the idea of failing fast, and frequently delays the manifestation of bugs, making them that much harder to actually track down.

Two examples that I came across within minutes of each other the other day:

Struct.new(:a,:b,:c).new('a','b')

that should NOT go without an exception

"a|b||c".split("|")
=> ["a", "b", "", "c"]

good…. so now:

"a|b||".split("|")
=> ["a", "b"]

argh! what have you done to my third field?

Pretty Decorators

Python decorators are cool, but they can become very messy if you sart taking arguments:

def decorate_with_args(arg):
    def the_decorator(func):
        def run_it():
            func(arg)
        return run_it
    return the_decorator

@decorate_with_args('some string')
def messy(s):
    print s

ugh. Three levels of function definitions for a single decorator. And heaven forbid you want the decorator to be useable without supplying any arguments (not even empty brackets).

So then, I present a much cleaner decorator helper class:

class ParamDecorator(object):
    def __init__(self, decorator_function):
        self.func = decorator_function

    def __call__(self, *args, **kwargs):
        if len(args) == 1 and len(kwargs) == 0 and callable(args[0]):
            # we're being called without paramaters
            # (just the decorated function)
            return self.func(args[0])
        return self.decorate_with(args, kwargs)

    def decorate_with(self, args, kwargs):
        def decorator(callable_):
            return self.func(callable_, *args, **kwargs)
        return decorator

All that’s required of you is to take the decorated function as your first argument, and then any additional (optional) arguments. For example, here’s how you might implement a “pending” and “trace” decorator:

@ParamDecorator
def pending(decorated, reason='no reason given'):
    def run(*args, **kwargs):
        print "function '%s' is pending (%s)" % (decorated.__name__, reason)
    return run

@ParamDecorator
def trace(decorated, label=None):
    if label is None:
        label = decorated.__name__
    else:
        label = "%s (%s)" % (decorated.__name__, label)
    def run(*args, **kwargs):
        print "%s: started (args=%s, kwargs=%s)" % (label, args, kwargs)
        ret = decorated(*args, **kwargs)
        print "%s: returning: %s" % (label, ret)
        return ret
    return run

Which can then be used as either standard or paramaterised decorators:

@pending
def a():
    pass

@pending("I haven't done it yet!")
def b():
    pass

@trace
def foo():
    return "blah"

@trace("important function")
def bar():
    return "blech!"

And just to show what this all amounts to:

if __name__ == '__main__':
    a()
    b()
    foo()
    bar()

reveals the following output:

function 'a' is pending (no reason given)
function 'b' is pending (I haven't done it yet!)
foo: started (args=(), kwargs={})
foo: returning: blah
bar (important function): started (args=(), kwargs={})
bar (important function): returning: blech!

Fairly simple stuff, but hopefully useful for anyone who finds themselves tripped up by decorators - particularly when trying to allow for both naked and paramaterised decorators.

P.S: I’ve made a pastie of the code in this post, because my weblog engine is not cool enough to colour-code python ;)

mocktest

I’ve been working on this a little while now. At work, I use ruby. It has its good points, but my language of choice is still python. I am, however, blown away by rspec. So I’ve tried to bring some of its features into python (namely and should_receive matchers).

Thus was born mocktest. The readme has all the examples you should need to start using it in your own testing code.

It’s also available from the cheese shop, which means you can just run easy_install mocktest to install it on your system.

I should note that this code builds upon the excellent Mock library by Michael Foord.

HTML to PDF converter

Given the integration of PDF into Mac OS X, I was surprised to find that there didn’t seem to be any tool to convert HTML files into PDFs. So, like any frustrated coder I wrote my own little script to do it: html2pdf.py

Requires python and OSX 10.5 (Leopard). It uses a WebKit view for the rendering, so in theory it should work with any URL that safari can handle - but I haven’t exactly tested it thoroughly…