Strings are a Domain-Specific Language

Question: Isn’t a domain-specific language just the same thing as a library?

Source: Pretty much everyone the first time they hear of DSLs.

Answer: No, a DSL is much more than a library, and I have an example that won’t make you say, “Well, sure, if you’re doing something that esoteric...”

My example of a domain-specific language is strings. No, seriously. Let’s figure out the length of a string in x86 assembly:

```asm
strlen:
    push  edi
    sub   ecx, ecx
    mov   edi, [esp+8]
    not   ecx
    sub   al, al
    cld
    repne scasb
    not   ecx
    pop   edi
    lea   eax, [ecx-1]
    ret
```

Computer memory is one big linear stream of bytes we can scan through, looking for the null that terminates our string. Boy, is that some fast code -- we might even call it efficient, if we ignore the fact that we’ll reach the eventual heat death of the universe before we finish our web 2.0 app.

So we move up in abstraction to C, which has arrays. And you can pretend a string is an array and walk it looking for that null terminator:

```c
static int my_strlen(const char *c)
{
    int l = 0;
    /* Walk the bytes until we hit the null terminator. */
    while (*c++)
        l++;
    return l;
}
```

This code is basically the same as in assembly, but it must be nicer to read because it uses all that stylish punctuation. Well, it’s not really nicer; maybe a string isn’t really just an array. So let’s look at Python:

```python
len("Hello, world!")
```

Now that’s downright human-readable. And I’ll admit I’m fudging here by just calling the built-in len() instead of writing one, but it just works and there’s none of this messing around with null bytes.

Well, maybe there’s messing around with null bytes. I don’t have to know how Python implements len() and, more importantly, I don’t have to pretend a string is only an array or a small bit of sequentially-addressed memory.
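To make that concrete, here’s a small sketch, assuming Python 3 (where a `str` holds Unicode characters rather than raw bytes). The variable names are just for illustration:

```python
# Assumes Python 3, where str holds Unicode code points rather than raw bytes.
with_null = "pirate\0ship"  # a null in the middle does not end the string
accented = "naïve"          # 5 characters, but "ï" takes 2 bytes in UTF-8

print(len(with_null))                 # 11
print(len(accented))                  # 5
print(len(accented.encode("utf-8")))  # 6
```

A null-scanning strlen would stop at 6 for the first string and report 6 for the second. `len()` counts what I actually mean by “characters.”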

To continue the example let’s look at regular expressions, a powerful way to search strings. We’ll write a pirate detector in Python:

```python
import re

# Look for a pirate's exclamation anywhere in the text.
matches = re.search(r"ar+g+h", "Oim a poirate, arrrgh!")
if matches:
    print("There must be a pirate, I heard someone say '%s'." % matches.group())
else:
    print("No pirates detected.")
```

This is important code, as pirates hide all over the web. But it’s pretty clunky: we have to import a library, call functions, evaluate responses, and save objects... It’d sure be handy if regular expressions were part of the language, like in Ruby:

```ruby
if "Oim a poirate, arrrgh!" =~ /ar+g+h/ then
  puts "There must be a pirate, I heard someone say '#{$&}'."
else
  puts "No pirates detected."
end
```

This code is even nicer: our regexp is a first-class type, tightly integrated into the language. The increase in stylish punctuation might make for a steeper learning curve, but we can express ourselves much more naturally.

A DSL is all about moving up in abstraction until your code directly reflects the high-level concepts you’re working in. You don’t have to peer through a haze of bits and pointers; your actions become synonymous with your intentions.
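You can take a small step in that direction even without built-in regexp syntax. Here’s a minimal sketch, assuming Python 3; `PIRATE_CRY` and `pirate_cry` are names made up for illustration, not part of any library:

```python
import re

# Hypothetical names for illustration only.
PIRATE_CRY = re.compile(r"ar+g+h")


def pirate_cry(text):
    """Return the pirate exclamation found in text, or None if there is none."""
    match = PIRATE_CRY.search(text)
    return match.group() if match else None


cry = pirate_cry("Oim a poirate, arrrgh!")
if cry:
    print("There must be a pirate, I heard someone say '%s'." % cry)
else:
    print("No pirates detected.")
```

The calling code now talks about pirates, and the regexp machinery stays out of sight.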