Client Logging to AWS

Over on the BBGameZone forums helderic asked how to deal with exploits:

Lets say player A finds a exploit to duplicate a item and decides to exploit it. He continues to exploit it for a few weeks then player B finds the bug and reports it.
How would you catch player A? And what are some systems to watch/catch exploits/bugs/cheaters.

I replied:

Logging, logging, logging.

When a player advances a level, log it. When a player buys something substantial, log it. When a player transfers items or money to another player, log it. When a player makes any significant achievement, log it. Then do things like count how many items the average player gets in a day or the average time it takes to go from level 9 to 10. Look at whoever’s doing things way faster or stronger.

On top of that, do a daily snapshot of a player’s stats (money, level, items, etc.) in case your other logs fail to notice where the resources are coming from. Then you do the same sort of analysis.
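Here's a minimal sketch of that snapshot comparison, assuming you track a single resource (gold) and can total each player's logged net change for the day. The function name and the dictionary shapes are my own illustration, not anything a particular game engine provides.

```python
def unexplained_gains(yesterday, today, logged_net):
    """Return {player: amount} where gold grew by more than the logs explain.

    yesterday, today: hypothetical daily snapshots mapping player -> gold.
    logged_net: player -> net gold change recorded in the event logs that day.
    """
    suspects = {}
    for player, gold_now in today.items():
        delta = gold_now - yesterday.get(player, 0)
        explained = logged_net.get(player, 0)
        if delta > explained:
            suspects[player] = delta - explained
    return suspects


yesterday = {"alice": 100, "bob": 250}
today = {"alice": 160, "bob": 1_000_250}
logged_net = {"alice": 60, "bob": 0}

print(unexplained_gains(yesterday, today, logged_net))
```

Anyone who shows up here got resources from somewhere your event logs didn't see, which is exactly the gap the snapshot is meant to cover.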

If you don’t have any logs and you want to find who abused an exploit you’ve only just discovered, you’re going to have a hard time. If duping an item requires visiting a particular page, or visiting two pages in quick succession, your web server logs will have *something* you can look at: compare player IPs against any suspicious entries in your access.log. If only a particular item can be duped, look in the db for who has lots of them -- actually, you can do this for *any* item.

What you always have in your favor is that cheaters quickly get greedy when they realize what they’ve got and think they’re getting away with it. They won’t just go up a level now and then, they’ll go up ten today. They won’t produce a handful of extra gold for the occasional purchase, they’ll produce a million. Look for the outliers.
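One way to look for outliers, sketched under my own assumptions: take a per-player daily count (levels gained, gold earned, items acquired) and flag anyone far above the median, measured in units of the median absolute deviation. The threshold of 5 is an arbitrary starting point, not a recommendation from any real dataset.

```python
from statistics import median


def find_outliers(per_player, threshold=5.0):
    """Return the sorted names of players whose value is far above the median.

    per_player: hypothetical mapping of player -> count for one day.
    Distance is measured in median absolute deviations (MAD), which a few
    extreme cheaters can't drag upward the way they would drag a mean.
    """
    values = list(per_player.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    scale = mad or 1.0  # avoid dividing by zero when most players are identical
    return sorted(p for p, v in per_player.items() if (v - med) / scale > threshold)


levels_gained_today = {"ann": 1, "ben": 0, "cat": 2, "dan": 1, "eve": 10}
print(find_outliers(levels_gained_today))
```

I'd use the median rather than the average here because a cheater producing a million gold shifts the average enough to hide themselves.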

Logging isn’t just useful for catching cheaters; it tells you a lot of important things about your game. Are players doing well? Do players use the newly-deployed feature? What signs are there before someone abandons their account?

There are a lot of great things you can learn from logging and analytics and I plan to get into them in the coming weeks. This post is about a smaller topic, though:

How do you deal with all those logs? Aside from storing them all, processing them could be a lot of work.

One common answer is Amazon S3 (with EC2 for processing), which stores ridiculous amounts of data if you don’t mind some network latency. That’s nicely suited for logs, where you rarely actually want to see the damn things; you just want to see reports and compilations and other whatnot.

This is such a nice obvious idea that Amazon has fairly recently created Elastic MapReduce, which makes it fall-down easy to run Hadoop on your data in S3. Hadoop is, outside of the Google Mothership, quickly becoming the standard way to sift through terabytes of data for the useful bits. Elastic MapReduce looks like such a win that I’m not really looking hard at other options for my game anymore. (After looking at all the sysadmin work Amazon Elastic MapReduce saves, my only surprise is that they charge 15% more for it than for regular EC2. Now that they’ve built EMR, their cost is down to ongoing maintenance, and it’s a strong driver of S3/EC2 use. Why give the big customers any more incentive to decide they’d save money moving it in-house?)

So I was kicking around the question of how you get this data up to S3 with another developer. It seems like, after you import any existing data, you’d probably want to minimize use of the precious outgoing bandwidth of your phenomenally successful game’s data center. It occurred to me that you could parallelize much of the logging of user actions.

The basic idea is to create logs on both the server and client, have the client use their bandwidth sending logs to S3, use checksums from the server to make sure clients aren’t doing anything funny, and fall back to uploading the server’s authoritative log if they are. There are two basic strategies:

First, clients could log directly to S3 using Browser-Based POST. Your app gives the client a token that allows it to upload a specific amount of data to a specific place in S3, and the client does an HTTP POST of the data. You’d probably want to do this on a regular basis (in case the client suddenly drops offline) and have a process compile the uploads into larger logs.
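The "token" for a Browser-Based POST is built around a policy document: a small JSON blob listing when the permission expires and what the upload must look like. Here's a rough sketch of building one, assuming a hypothetical bucket name and key prefix. It deliberately stops before signing: the app still has to sign the encoded policy with its AWS credentials, and in practice I'd expect an SDK call such as boto3's `generate_presigned_post` to handle that part.

```python
import base64
import json
from datetime import datetime, timedelta, timezone


def build_post_policy(bucket, key_prefix, max_bytes, lifetime_minutes=15, now=None):
    """Return a base64-encoded S3 POST policy limiting where and how much
    a client may upload. Signing is NOT done here."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(minutes=lifetime_minutes)
    policy = {
        "expiration": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "conditions": [
            {"bucket": bucket},                       # only this bucket
            ["starts-with", "$key", key_prefix],       # only under this prefix
            ["content-length-range", 0, max_bytes],    # cap the upload size
        ],
    }
    return base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")


# Hypothetical names: one prefix per player keeps clients out of each other's logs.
encoded = build_post_policy("game-logs", "client/player-42/", 64 * 1024)
print(encoded)
```

A short lifetime and a per-player prefix mean a leaked token only lets someone write a small amount of junk into one player's corner of the bucket.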

Alternately, you host a web server or other daemon on EC2 (probably using a reserved instance to lower the cost) to process and store the logs as they’re delivered. I like this slightly less than the previous store-and-process model, which deals with failure and fits Hadoop’s model a little better.
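Whichever route the logs take, the checksum step from the basic idea could look something like this. It's a sketch under one big assumption: the client and server write byte-identical log lines for the same events, so a single hash over the lines is comparable. SHA-256 and the function names are my choices for illustration.

```python
import hashlib


def log_checksum(lines):
    """Hash log lines in order; any changed, missing, or extra line alters it."""
    digest = hashlib.sha256()
    for line in lines:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def needs_server_upload(server_lines, client_checksum):
    """True when the client's upload doesn't match the server's own log,
    meaning the server should fall back to uploading its authoritative copy."""
    return log_checksum(server_lines) != client_checksum


server_log = ["1001 level_up 9->10", "1001 buy sword 500"]
honest_upload = list(server_log)
tampered_upload = ["1001 level_up 9->10"]  # the purchase was quietly dropped

print(needs_server_upload(server_log, log_checksum(honest_upload)))
print(needs_server_upload(server_log, log_checksum(tampered_upload)))
```

The server only ever sends a short checksum in the common case, so it spends its own bandwidth on the full log just for the clients that misbehave or drop offline.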

This seems like a handy way to cut down on one expense, though it’s overly complex for my initial needs. I’m nearly certain I’ll use Elastic MapReduce for log processing, though.