HTML Is Your Markup Language Anyway

I hate wiki markup. I hate BBCode. I hate Markdown. I hate the million other custom markup languages that have infested the web.

When you’re using one of these markup languages, you’re using HTML anyway. They all have to be translated to HTML to be served as web pages, so at best they’re poor translations of it. A lot of them are “parsed” with regular expressions, which leads to all kinds of interesting bugs with nested tags. MediaWiki (the engine behind Wikipedia) uses two apostrophes for ''italic'' and three for '''bold''', then simply lets you fall back to HTML for all the cases where those break. Most markup languages have some passthrough like this to deal with ambiguity or complexity. So while some of them (not BBCode, which is basically HTML in square brackets) can claim to be easier to start with, users always end up learning HTML.
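To make the nested-tag problem concrete, here is a hypothetical regex-based converter for MediaWiki-style apostrophe markup. It is an illustrative sketch, not MediaWiki’s actual parser. It handles simple input fine, but given five apostrophes (the syntax MediaWiki uses for bold italic) it emits mis-nested tags.

```python
import re


def naive_wiki_to_html(text: str) -> str:
    """Hypothetical regex-based converter for MediaWiki-style apostrophe markup."""
    # Bold ('''...''') goes first; otherwise the italic rule would consume
    # two of the three apostrophes in every bold run.
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)
    return text


print(naive_wiki_to_html("'''bold''' and ''italic''"))
# <b>bold</b> and <i>italic</i>
print(naive_wiki_to_html("'''''bold italic'''''"))
# <b><i>bold italic</b></i>   (mis-nested: </b> closes before </i>)
```

Each rule only sees flat text, so nothing keeps the tags it produces properly nested. Fixing that means tracking open formatting as real state, which is exactly the work an HTML parser already does.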

MediaWiki does have a decent argument for custom markup: its templates, which let editors fill in particular fields with data. HTML has no equivalent tags, but nothing stops a site from adding its own tags (which may or may not look much like HTML) and compiling them down to HTML in exactly the same way.
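As a sketch of that idea, the code below expands a hypothetical site-specific `<user name="...">` tag into an ordinary profile link while copying other tags, text, and entities through. The tag name and the `/users/<name>` URL scheme are invented for illustration. It uses Python’s standard `html.parser` rather than regular expressions, and it does no security filtering, so its output would still need to go through a sanitizer.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import quote


class CustomTagExpander(HTMLParser):
    """Copies tags, text, and entities through, expanding a hypothetical <user> tag."""

    def __init__(self) -> None:
        # convert_charrefs=False keeps entities like &amp; as-is instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out: list[str] = []

    def _expand_user(self, attrs) -> str:
        name = dict(attrs).get("name") or ""
        href = "/users/" + quote(name, safe="")  # hypothetical URL scheme
        return f'<a href="{escape(href)}">@{escape(name)}</a>'

    def handle_starttag(self, tag, attrs):
        if tag == "user":
            self.out.append(self._expand_user(attrs))
        else:
            self.out.append(self.get_starttag_text())  # original text of the tag

    def handle_startendtag(self, tag, attrs):
        # Self-closing form, e.g. <user name="bob"/> or <br/>.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag != "user":  # </user> has nothing left to close after expansion
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def expand_custom_tags(source: str) -> str:
    parser = CustomTagExpander()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


print(expand_custom_tags('Thanks <user name="alice"></user>, see <b>this</b>!'))
```

The same pass could fill in template fields from a database instead of building a link; the point is that the custom tag is compiled away before the page is served, just as wiki markup is.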

Some will argue security: that allowing raw HTML would let people perform cross-site scripting and other attacks, include obscene images, break the page layout, and so on. I’m not arguing for letting users post HTML without filtering; filtering is absolutely necessary. Protecting against the wide variety of attacks is difficult, but being difficult doesn’t mean it shouldn’t be done. And there are excellent whitelisting libraries available, like Ruby’s Sanitize and PHP’s HTML Purifier, so there’s no excuse for rolling your own.
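To show what whitelisting means in practice, here is a minimal sketch built on Python’s standard `html.parser`. The allowed tags, attributes, and URL schemes are assumptions chosen for illustration. Anything not on the lists is dropped, `<script>` and `<style>` lose their content as well, text is re-escaped, and unclosed or badly nested tags are closed so they can’t swallow the surrounding layout. It illustrates the approach; it is not a substitute for a maintained library like the ones above.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical policy for illustration; a real site would tune these lists.
ALLOWED_TAGS = {"a", "b", "i", "em", "strong", "p", "br", "ul", "ol", "li",
                "blockquote", "code", "pre"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
ALLOWED_SCHEMES = {"", "http", "https", "mailto"}  # "" means a relative URL
VOID_TAGS = {"br"}
DROP_WITH_CONTENT = {"script", "style"}


def is_safe_url(url: str) -> bool:
    try:
        scheme = urlsplit(url.strip()).scheme.lower()
    except ValueError:  # malformed URL, e.g. an unclosed IPv6 bracket
        return False
    return scheme in ALLOWED_SCHEMES


class WhitelistSanitizer(HTMLParser):
    def __init__(self) -> None:
        # convert_charrefs=True hands us decoded text and attribute values,
        # so everything we emit is re-escaped exactly once.
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.open_tags: list[str] = []  # allowed tags still open
        self.skip_depth = 0             # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # drop the tag itself; text inside it still comes through
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href" and not is_safe_url(value):
                continue  # e.g. javascript: URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return  # stray or disallowed closing tag
        # Close any tags opened after this one, repairing bad nesting.
        while True:
            open_tag = self.open_tags.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(user_html: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(user_html)
    parser.close()
    # Close anything the user left open so it can't leak into the page.
    for tag in reversed(parser.open_tags):
        parser.out.append(f"</{tag}>")
    return "".join(parser.out)


print(sanitize('<p onclick="evil()">Hi <script>alert(1)</script><b>there</b></p>'))
```

Comments, `<img>`, event-handler attributes, and unknown tags are all dropped by default because they are simply not on the lists, which is the main advantage of a whitelist over a blacklist.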

If you’re building a site, let your users write HTML (and, of course, replace newlines with <br> for them). Many of them will already know it, there are a million free tutorials out there for those who don’t, and after using your site they’ll have picked up a useful skill.
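The newline conversion can be as small as this sketch (the function name is hypothetical):

```python
def newlines_to_br(text: str) -> str:
    """Turn the user's line breaks into <br> tags, keeping the newline for readable source."""
    # Normalize Windows (\r\n) and old Mac (\r) line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.replace("\n", "<br>\n")


print(newlines_to_br("one\r\ntwo\nthree"))
```

This naive version also converts newlines inside `<pre>` blocks and between block-level tags such as `<li>`, adding extra breaks there; a fuller version would skip those contexts.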