An Academic Inconvenience of Python



Sometimes Python’s roots in academia bug me. Lots of functions have a computer science feel instead of a software development feel. Here’s an example I just ran into: I wanted to fit as many sentences as possible from a long text into 255 characters. So I wrote:

s = s[:255][:max(s.rindex('.'), s.rindex('!'), s.rindex('?')) + 1]

This snippet chops it down to the 255 max, finds the ., !, or ? marking the end of the last sentence, and chops there. Great, right? Except it doesn’t work.

Instead of returning None when it can’t match the substring, rindex raises ValueError. So unless the string contains a ., a !, and a ?, it’ll raise an exception. OK, let’s try:

rightmost = -1
try:
    rightmost = s.rindex('.')
except ValueError:
    pass
try:
    rightmost = max(rightmost, s.rindex('!'))
except ValueError:
    pass
try:
    rightmost = max(rightmost, s.rindex('?'))
except ValueError:
    pass
s = s[:255][:rightmost + 1]

Eww. OK, let’s encapsulate that redundancy:

def no_exception_rindex(s, substring):
    try:
        return s.rindex(substring)
    except ValueError:
        return None
 
s = s[:255][:max(no_exception_rindex(s, '.'), no_exception_rindex(s, '!'), no_exception_rindex(s, '?')) + 1]

That’s… well, it’s at least a little better. Lucky that max doesn’t mind seeing None; I could imagine it throwing its own ValueError. But I wouldn’t call this code good: we’ve been forced to switch out of object-oriented code because we can’t add our no_exception_rindex to the string objects.
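
A side note for anyone on Python 3: there, max refuses to compare None with an int and raises TypeError, and in either version None + 1 fails when no mark is found at all. Here is a minimal sketch of the same idea that returns -1 instead of None. The names safe_rindex and truncate_to_sentence are made up for illustration, and the sketch slices to the limit before searching, which is an assumption about the intent rather than what the one-liner above does.

```python
def safe_rindex(s, substring):
    """Like str.rindex, but return -1 instead of raising ValueError."""
    try:
        return s.rindex(substring)
    except ValueError:
        return -1


def truncate_to_sentence(text, limit=255):
    """Keep as many whole sentences as fit in the first `limit` characters."""
    # Slice first so the search only sees the first `limit` characters.
    head = text[:limit]
    # Rightmost sentence-ending mark; -1 if none of them occurs in head.
    cut = max(safe_rindex(head, mark) for mark in ".!?")
    # With cut == -1 this is head[:0], i.e. an empty string.
    return head[:cut + 1]
```

Returning -1 keeps every value an int, so max and the + 1 behave the same way whether or not a mark was found.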

Here’s another approach:

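def rightmost_punctuation(s):
    # Scan backwards from the last character for a sentence-ending mark.
    index = len(s) - 1
    # Use >= 0 so index 0 is checked and -1 comes back when nothing matches.
    while index >= 0 and s[index] not in ['.', '!', '?']:
        index -= 1
    return index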
 
s = s[:255][:rightmost_punctuation(s) + 1]

I’d actually call this one worse, as it’s not immediately obvious what it’s doing. And anytime I create a variable and then tinker with it inside a loop I feel like I want to rewrite that code to use map and/or reduce.

Tomorrow I’ll redo this example in Ruby to talk about open classes, but for today does anyone have a better approach in Python?


Comments

  1. A long, long time ago, when I came to Python from the PHP world, I appreciated Python for exactly this type of behaviour: don’t return stupid “default” values, and instead raise errors when something is not according to the programmer’s wishes.

    I, myself, prefer somewhat longer code, but code that is more easily understood at first glance.

  2. Phil: Wow, I completely missed rfind. That does indeed work exactly the way I wanted.

    Tiberiu: Yeah, I go back and forth over how paranoid I want the language to be about these things by default. I also learned Python after PHP and I saw a lot of PHP bugs come from it being especially “helpful” by coercing types and not complaining about discrepancies.

  3. Wouldn’t using rfind instead of rindex work? It returns -1 when not found rather than raising an exception, and it is even one character less to type!

  4. Can you really use s like that? Consider these two variants: which would Python execute internally?

    a = s[:255]
    s = a[:max(a.rfind('.'), a.rfind('!'), a.rfind('x')) + 1]

    a = s[:255]
    s = a[:max(s.rfind('.'), s.rfind('!'), s.rfind('x')) + 1]

    If ‘s’ somehow mutates halfway through the statement I’d consider it somewhat broken, but in order for your one-liner to work... it would have to?!?

  5. This is why the regexp module (re) exists (and rfind as others have pointed out). If you had to use rindex, you could do the hack approach and prepend the string with ‘!?.’ :)

  6. John: Yeah, you’re right. I just tested and found I had to write:

    s[:max(s[:255].rfind("."), s[:255].rfind("!"), s[:255].rfind("?")) + 1]

    The code was slightly different in the script I wrote (two lines, had a temp var) for unrelated actions and I ran it together for the post.

  7. Academic roots of Python the problem? Nah. First of all, your example is missing a right paren to close the max function. Next, I just don’t see how your logic is going to work; maybe I’m missing something, but it looks to me like the value of s in the right index is going to refer to the original longer string. rfind seems to be a better choice. This or a variant seems to be closer to what you’re looking for:
    s[:max(s[:255].rfind('.'), s[:255].rfind('!'), s[:255].rfind('?'))+1]
    A little clunkier, but it seems to work.

  8. A shorter way (that works):

    s[:max([s.rfind(c,0,255) for c in ".!?"])+1]

  9. If you want to use regular expressions, this should do the trick:

    ^(.{,255}[.!?])

    I’m not the fan of clever one-liners that I used to be. They seem out of place in Python.

    Cheers.
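
The rfind and regular expression suggestions from the comments can be sketched side by side as below. This is only a sketch: the function names and the limit parameter are made up for illustration, and the regex version uses limit - 1 in the quantifier so that the match, including its closing punctuation mark, stays within limit characters (a quantifier of 255 followed by one mark can match up to 256). It also passes re.DOTALL so that newlines count as characters, which is an assumption about the input text.

```python
import re


def truncate_rfind(text, limit=255):
    """rfind approach: search only positions 0..limit-1 for each mark."""
    # str.rfind(sub, start, end) returns -1 when sub is not found.
    cut = max(text.rfind(mark, 0, limit) for mark in ".!?")
    return text[:cut + 1]


def truncate_regex(text, limit=255):
    """Regex approach; assumes limit >= 1."""
    # Greedy match of up to limit-1 characters followed by one mark,
    # so the whole match is at most `limit` characters long.
    pattern = r".{0,%d}[.!?]" % (limit - 1)
    match = re.match(pattern, text, re.DOTALL)
    return match.group(0) if match else ""
```

Both return an empty string when the first limit characters contain no sentence-ending mark.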
