Saturday, January 12, 2008

Data Loss Prevention Tips


That's the sound of the second most common type of data loss! We've all done it at one point or another. No matter what industry you are in, whether it be software development, information technology, automotive repair, or homemaking, you've most likely experienced some form of data loss caused by some form of human error.

Most common types of data loss

According to several resources, this type of data loss accounts for 32% of all data losses and trails behind hardware failures, which account for 44% of all instances of data loss. (I'm not going to bother to cite my sources; just simply search for data loss statistics and you'll see numbers very close to these.)

The sad thing is, data loss can be prevented. In fact, many people have made a lot of money developing backup solutions to help protect consumers and industries in the event that a data loss does occur.

But I'm not promoting backup software. Yes, backup software is important, but I think that there are other factors that are important to consider. I've come up with a list of suggestions which can be utilized to help protect you from data loss. I work in software development, so many of these suggestions will apply to my field. However, the concept is the same. Set yourself up for success, and protect you from yourself!

Data Loss Prevention Tips

Data Loss Prevention Tips for writing SQL Statements

SQL is not designed for human consumption. In fact, the Linux command line is more user friendly than SQL. Most of the tools that we use on computers are designed with functionality to help protect us from ourselves, but not SQL.

SQL, or Structured Query Language, is a syntax used by developers and database administrators to modify the structure and data contained within a database. While SELECT statements are relatively harmless, most other types of statements can be extremely hazardous to the health of your data.

Consider this statement that may be used to update a balance in a bank account table:

update accounts.checking set balance='100.00' where accountnumber='12345678';

The above statement will change the balance in account 123456789 to $100.00. Let's assume that there are 1,000,000 accounts in this table. With this statement, we updated a single customer's account.

Next, consider the following statement:

update accounts.checking set balance='100.00';

The above statement is missing a very important piece. Without a where clause, the entire list of account balances is changed to $100.00! This is a major, catastrophic error!

No single customer has the same banking habits or account balances as another customer. Some people are constantly overdrawn while others have savings in excess of thousands of dollars. Some have balances that remain fairly constant while those with debit cards tend to have more dynamic account balances. Not only is a restore going to be necessary, but it must be coordinated with the fact that banks are a 24 hour a day 7 day per week business, and the restore will need to account for these changes as well.

In my experience, the single most common cause of these types of SQL errors is that the query is executed before its author finished writing it. A colleague of mine offers a very brilliant suggestion: Purposely embed a syntax error into your query until you are sure that all of the components are in place:

pdate accounts.checking set balance='100.00';

The above statement -- when executed -- will throw a syntax error. Syntax errors are good. They let the user know that some required component is missing in order to complete an operation. When a syntax error occurs, nothing happens! This occurs with a majority of tools and software that's currently in use, but not with SQL. With SQL, a developer or DBA has an executable statement at a very critical point in the process of writing the statement. I've never written an update SQL statement that didn't involve a where clause. They just aren't very common, but for a brief second when writing an update statement we have a fully executable statement capable of wreaking havoc.

If I were to redesign the SQL language, I would put the where clause first. It's perhaps the single most important piece of the statement, and if it's written first, accidents should happen much less. Here is my version of SQL:

update where accountnumber='12345678' set balance='100.00';

In my example above, if I execute before finishing the query, nothing happens! But the bottom line is that SQL isn't going to change anytime soon. There are people out there right now that are probably laughing at the fact that I've even considered rewriting SQL. "Just be careful", they'll say! "Just don't screw it up". Well, yeah! But sometimes things aren't that simple. Some people are built differently, and for some of us who are accident prone, we have learned techniques to adapt.

This is why I followed my colleagues advice and purposely write syntax errors into the beginning of my SQL statements. That SQL statement can't be executed until I'm ready to execute it and have thoroughly examined it. I keep all of my SQL update and insert statements saved in this format. Only SQL statements that I want to run will be executed. Yes, I am still very careful, but I also rely on this safety mechanism, just in case.

Data Loss Prevention Tips in Software Development and IT

We run a three-tier development system at work. Each application has its own development, staging, and live server. The development and staging servers are connected to a staging database that mirrors the live environment, and the live server is of course connected to the live database.

In any IT-related industry, the live server is the bread and butter. If something happens to it, expect to see a negative sign followed be several zeros on the balance sheet. The staging database isn't important. Sure, it can be corrupted. If something bad does happen to it, your friendly IT department probably won't be inviting you out to lunch for a few days because they'll have to add "restore Staging database" to their to-do list, but to external customers and the bottom line, it just doesn't matter.

The staging environment is a developer's sandbox to do whatever he or she pleases. When I'm working in a development environment, that part of my brain that gets real paranoid and makes me do crazy things (like write syntax errors into my SQL statements) shuts down. There's really nothing to break. Play, have fun, and if something breaks, find out why and move on.

However, this is an area where human error can occur. In small companies, developers sometimes configure their own development environments. In my case, I configured my environment to use the staging database. I set all of my environment variables to reflect development mode.

Data Loss Analogy

Data Loss Prevention TipsHave you ever walked out of a store into a packed parking lot, approached what you thought was your car with the key and tried to unlock the door? Dodge Caravans are good examples. They're everywhere, and they all look exactly the same. Here are some factors that differentiate them: Color, tires, hub caps, interior, window tinting, and many more.

Now, if you made the mistake of approaching the wrong minivan, you wouldn't walk up to a blue minivan if you owned a red one. That's easy to identify the difference. But another red minivan, with the same hub caps, the same window tinting... well, maybe this is enough to fool you.

However, no matter how close the other minivan is in comparison to yours, the bottom line is it's NOT yours, and the key just won't fit.

This analogy can be applied to configurations as well. IP addresses all look alike. They're numbers. They're red minivans with window tinting. Hostnames are different. Now you have a blue minivan and a red one.

Other criteria that can be used to help an individual quickly spot a configuration problem are passwords. Don't use the same passwords for your staging and live environments. You shouldn't anyway for security, but someone who tries password A on server B when they meant to login to server A is going to realize their mistake before any damage can be done. The IT specialist who looks at the development server configuration for the new developer and sees an obfuscated password instead of the hello world password that is used on the other development servers is going to realize something is amiss and correct the problem before the new mad scientist developer conducts science experiments on the live server.

It's like documenting code. The easiest way to document source code is to use variable names that convey information to other humans about what the code does. This way, you kill two birds with one stone by telling not only the compiler, but also people.

Example 1:
//Send the account balance to the user via email

Example 2:


Documenting code in this manner works extremely well, and if it works here, it should work in other areas as well.

Data Loss Prevention by Setting Read-only Flag

A read only file can still be deleted, but if you mark it as "read only", perhaps it will make you think before going through with actually deleting the file!

These suggestions are no different than things like setting that bill that has to be paid next to your car keys so that you won't leave the house without it. The only difference is that these suggestions are for situations that are way more technical.