Saturday, January 12, 2008

Data Loss Prevention Tips

Oops!

That's the sound of the second most common type of data loss! We've all done it at one point or another. No matter what industry you are in, whether it be software development, information technology, automotive repair, or homemaking, you've most likely experienced some form of data loss caused by human error.

Most common types of data loss

According to several sources, this type of data loss accounts for 32% of all data losses, trailing only hardware failures, which account for 44%. (I'm not going to bother citing my sources; just search for data loss statistics and you'll see numbers very close to these.)

The sad thing is, data loss can be prevented. In fact, many people have made a lot of money developing backup solutions to help protect consumers and industries in the event that a data loss does occur.

But I'm not promoting backup software. Yes, backup software is important, but there are other factors worth considering. I've come up with a list of suggestions that can help protect you from data loss. I work in software development, so many of these suggestions apply to my field, but the underlying concept is the same everywhere: set yourself up for success, and protect yourself from yourself!


Data Loss Prevention Tips for writing SQL Statements



SQL is not designed for human consumption. In fact, the Linux command line is more user friendly than SQL. Most of the tools that we use on computers are designed with functionality to help protect us from ourselves, but not SQL.

SQL, or Structured Query Language, is the language that developers and database administrators use to query and modify the structure and data contained within a database. While SELECT statements are relatively harmless, most other types of statements can be extremely hazardous to the health of your data.

Consider this statement that may be used to update a balance in a bank account table:

update accounts.checking set balance='100.00' where accountnumber='12345678';

The above statement will change the balance in account 12345678 to $100.00. Let's assume that there are 1,000,000 accounts in this table. With this statement, we updated a single customer's account.

Next, consider the following statement:

update accounts.checking set balance='100.00';

The above statement is missing a very important piece. Without a where clause, every one of those 1,000,000 account balances is changed to $100.00! This is a major, catastrophic error!

No single customer has the same banking habits or account balances as another. Some people are constantly overdrawn, while others have savings in excess of thousands of dollars. Some balances remain fairly constant, while customers with debit cards tend to have more dynamic balances. Not only is a restore necessary, but banks operate 24 hours a day, 7 days a week, so the restore must also account for all of the legitimate activity that keeps happening in the meantime.
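
It's worth adding a quick sanity check here (a general habit, not something specific to this bank example): run the matching select before the update and look at the row count.

select count(*) from accounts.checking where accountnumber='12345678';

If the count comes back as 1, the where clause targets exactly the account you intended. If it comes back as 1,000,000, the where clause is missing or wrong, and the update should not be run as written.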

In my experience, the single most common cause of these types of SQL errors is that the query is executed before its author has finished writing it. A colleague of mine offers a brilliant suggestion: purposely embed a syntax error into your query until you are sure that all of the components are in place:

pdate accounts.checking set balance='100.00';

The above statement, when executed, will throw a syntax error. Syntax errors are good: they tell the user that some required component is missing, and when a syntax error occurs, nothing happens! Most of the tools and software we use today protect us this way, but SQL does not. With SQL, a developer or DBA has an executable statement at a very critical point in the process of writing it. I've never written an update statement that didn't need a where clause; updates without one just aren't very common. Yet for a brief second while writing an update statement, we have a fully executable statement capable of wreaking havoc.

If I were to redesign the SQL language, I would put the where clause first. It's perhaps the single most important piece of the statement, and if it were written first, accidents would happen far less often. Here is my version of SQL:

update where accountnumber='12345678' set balance='100.00';

In my example above, if I execute before finishing the query, nothing happens! But the bottom line is that SQL isn't going to change anytime soon. There are people out there right now who are probably laughing at the fact that I've even considered rewriting SQL. "Just be careful," they'll say. "Just don't screw it up." Well, yeah! But sometimes things aren't that simple. Some of us are more accident-prone than others, and we've learned techniques to adapt.

This is why I followed my colleague's advice and purposely write syntax errors into the beginning of my SQL statements. A statement can't be executed until I'm ready to execute it and have thoroughly examined it. I keep all of my SQL update and insert statements saved in this format, so only the statements that I actually want to run will ever execute. Yes, I am still very careful, but I also rely on this safety mechanism, just in case.
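
Another layer worth adding on top of the deliberate syntax error, assuming your database supports transactions (most modern ones do), is to wrap risky statements in a transaction so there is a chance to back out before the change becomes permanent:

begin;
update accounts.checking set balance='100.00' where accountnumber='12345678';
-- the database reports how many rows were affected; anything other than 1 means trouble
commit;   -- or run rollback; instead if the row count looks wrong

It isn't a substitute for being careful, but like the syntax-error trick, it gives a sloppy statement one more chance to fail safely before any data is lost.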


Data Loss Prevention Tips in Software Development and IT



We run a three-tier development system at work. Each application has its own development, staging, and live server. The development and staging servers are connected to a staging database that mirrors the live environment, and the live server is of course connected to the live database.

In any IT-related industry, the live server is the bread and butter. If something happens to it, expect to see a negative sign followed by several zeros on the balance sheet. The staging database isn't important. Sure, it can be corrupted. If something bad does happen to it, your friendly IT department probably won't be inviting you out to lunch for a few days because they'll have to add "restore staging database" to their to-do list, but to external customers and the bottom line, it just doesn't matter.

The staging environment is a developer's sandbox to do whatever he or she pleases. When I'm working in a development environment, that part of my brain that gets real paranoid and makes me do crazy things (like write syntax errors into my SQL statements) shuts down. There's really nothing to break. Play, have fun, and if something breaks, find out why and move on.

However, this is also an area where human error can occur. In small companies, developers sometimes configure their own development environments. In my case, I configured my environment to use the staging database and set all of my environment variables to reflect development mode. The danger is obvious: point one of those settings at the live environment by mistake, and your sandbox isn't a sandbox anymore.

Data Loss Analogy



Have you ever walked out of a store into a packed parking lot, walked up to what you thought was your car, and tried to unlock the door with your key? Dodge Caravans are good examples. They're everywhere, and they all look exactly the same. Here are some of the factors that differentiate them: color, tires, hub caps, interior, window tinting, and many more.

Now, if you were going to approach the wrong minivan by mistake, it wouldn't be a blue one when you own a red one; that difference is easy to spot. But another red minivan, with the same hub caps and the same window tinting... well, maybe that's enough to fool you.

However, no matter how close the other minivan is in comparison to yours, the bottom line is it's NOT yours, and the key just won't fit.

This analogy applies to configurations as well. IP addresses all look alike. They're numbers. They're red minivans with window tinting. Hostnames are different: use hostnames in your configurations, and now you have a blue minivan and a red one.

Passwords are another way to spot a configuration problem quickly. Don't use the same passwords for your staging and live environments. You shouldn't anyway, for security reasons, but someone who tries password A on server B when they meant to log in to server A is going to realize their mistake before any damage can be done. The IT specialist who looks at the new developer's configuration and sees an obfuscated password instead of the "hello world" password used on the other development servers is going to realize something is amiss and correct the problem before the new mad-scientist developer conducts science experiments on the live server.

It's like documenting code. The easiest way to document source code is to use variable names that convey information to other humans about what the code does. This way, you kill two birds with one stone by telling not only the compiler, but also people.

Example 1:
//Send the account balance to the user via email
s.s(a,u);


Example 2:

CustomerEmailManager.sendAccountBalanceToUserViaEmail(accountNumber,userId);


Documenting code in this manner works extremely well, and if it works here, it should work in other areas as well.


Data Loss Prevention by Setting Read-only Flag


A read-only file can still be deleted, but if you mark it as read-only, perhaps it will make you stop and think before going through with actually deleting the file!

These suggestions are no different from putting the bill that has to be paid next to your car keys so that you won't leave the house without it. The only difference is that the situations here are far more technical.

Saturday, December 8, 2007

Corporate Intranet Blogging

One of my coworkers recently came up with what I thought was a really good idea. He suggested that the company allocate time for us to blog. From a marketing standpoint, this could be good public relations in terms of finding both customers and employees, but from a business standpoint, there is a danger of compromising corporate intellectual property.

A lot of other companies have been successful with this idea, but our management is against it because company secrets could inadvertently find their way into the hands of a competitor.

Therefore, my solution is corporate intranet blogging. We take the idea and we push forward with it, but only internally. This won't buy us any PR points, but it will provide everyone in the company with a platform where they can keep everyone up to date on any cool technologies, advice, bugs, or projects that they've worked on.

This would be a great extension of documentation. If I spend a few days fixing a nasty bug that I think someone else in the company may encounter, the fix will be documented in my intranet blog. If I know that coworker A has really good suggestions for user interface design, I can periodically review the blog for new advice. Sure, this may not be necessary when everyone works in the same office during the same hours, but suppose most employees work in another office? This is a great way to find out what everyone is working on without having to necessarily ask.

Tuesday, November 27, 2007

A Quarterback's Operating System

"Perhaps this is what needs to happen. Every Linux seller from Novell to Dell (via Canonical) buys air time during the Super Bowl and co-ops a website and a call center phone number for people interested in getting more information."

- Matt Hartley at Mad Penguin dot org


At the beginning of 2007, I was saying the same thing! I was extremely disappointed during the last Super Bowl! Well, not really. The Colts defeated the Bears in an exciting game where the combined efforts of Peyton Manning, Adam Vinatieri, and the Colts' offense and defense put enough points on the scoreboard to win. As one of only two undefeated teams in the NFL this year, the Colts could end up playing in the Super Bowl once again.

Although I wrote this article before the Colts' loss to the Patriots, the Colts are still in the running and could still defeat the Patriots if they play as aggressively as the Eagles did!

This time, I'd like to see Peyton Manning take his acting career a step further. Last season, Peyton was the spokesman for Verizon. This year, instead of a Verizon sponsorship, I'd like to see Peyton Manning in a Linux commercial. Linux is touted as a geek's operating system, but Linux distributions such as Ubuntu are clearly targeting the average home user. So what better way to advertise than to do exactly what Matt Hartley has suggested?

As I mentioned, I didn't see a single Linux commercial during the last Super Bowl, but I fully expected to see one! I was so sure that Linux would make its debut in mainstream media. With Vista looming on the horizon, now is the perfect opportunity for Ubuntu, SUSE, and other Linux distributions to spread the word during the largest televised event of the year.

Novell, you are a corporation. Throw some of those corporate dollars to Peyton Manning! Dell, you already advertise. Help out the open source movement and promote Ubuntu on your laptops! This is what makes you stand out from the crowd! Capitalize on it!

No one ever said it was illegal to make sales from open source software!

Saturday, November 10, 2007

Google Releases New Version of Gmail

Google has released a new version of Gmail. The ui parameter that appeared in the URL recently, the one that broke the HTML Reply Signatures for Gmail Firefox extension and Greasemonkey script, was part of the plan for letting users switch between the newer and older versions.

How to Go Back to Gmail's Older Version explains more of the details about the new version's features, as well as how to switch back to the older version.

I just bought a 1968 Ford Thunderbird! I bought it yesterday actually. It hasn't even been 24 hours! It has a 429 cubic inch big block V8, suicide doors, a nice, well-preserved leather interior, and driving this well-engineered machine is like stepping back in time! It also has enough power to propel you into tomorrow!

And just how does this relate to the Gmail Reply Signatures Extension, you ask? Well, it will be a while before I'm able to fix this newly created Gmail bug, so for now, I recommend that you use the older version of Gmail so you can keep using your HTML signatures. Fortunately, Google is good about letting us enjoy backwards compatibility for a while.

I'll post pictures of the car soon!

Thursday, November 1, 2007

Rebooting Linux

Rebooting Linux. These are two words that are seldom said in the same sentence. It's something that most Linux users just never think about; but recently, rebooting Linux was the topic of a conversation between me and a coworker with a Windows background.

"Why are we using Linux as a server?", she asked. This is a very good question that is likely to evoke a myriad of different responses.

"I think that Linux is used because of it's modular nature. Updates, configuration changes, and other maintenance can be done to the server without needing to reboot it.", I stated. Restart or reboot the computer

"Well, when does it need to be rebooted?", asked my coworker. Wow! I never really thought about it like that before. When does it need to be rebooted? Of course, I knew from experience that Linux could run for months without requiring a restart ever since Professor James Caldwell of the University of Wyoming stated that his office computer reported over 100 days of uptime since the last reboot. But I never really thought about why Linux would require a reboot.

With Windows, it's fairly obvious. Rebooting a Windows box is done in all of the following situations, yet none of these situations require a Linux reboot:

1 - When updating or patching the operating system.
2 - When installing new software of any kind.
3 - When it locks up or slows down.
4 - As the first step in any troubleshooting procedure.
5 - As a secondary, tertiary, intermediary, or final step in any troubleshooting procedure.
6 - Automatically, when you're not looking.
7 - As a troubleshooting step.
8 - When an application conflicts with the system.
9 - As a troubleshooting step.
10 - As a trou...

Anyway, I've had the opportunity to install some software on the servers at work, and not once have I had to reboot the server. Not once has the server locked up, acted sluggish, or thrown the blue screen of death. I've heard our IT department speak of worms on Windows Servers, but laughter ensues when someone asks what kind of virus software is installed on the Linux servers.

The conversation between my coworker and me continued: "When do we need to reboot a server?", she asked.

"That's a good question. I'm not sure. I don't think it matters. But you've made me curious... Let's see how long it's been since svrXX was rebooted...". So at the SSH terminal, I type 'uptime' at the prompt, "162 days!", I exclaim.

"It has to be rebooted in 162 days?", she asks. Having primarily a Windows background, the concept of not having to reboot is a bit perplexing to her.

"No, that is how long it has been since it was last rebooted!", I explain. I try running 'uptime' on another server, "Check out this one, 315 days! This server was last rebooted almost a year ago! Now, there's no uptime command for Windows --", probably because hours are easier to remember than months, "but if you could type uptime at the prompt it would probably tell you that it had been running for only a few days."

"Oh! Can I try uptime?", she asks.

Another coworker overhears our conversation. "Actually," he begins, "there is indeed a command you can use to view statistics from the DOS prompt. Type 'net statistics server'."

Upon discovering this information, my curious coworker types the command in her DOS window. "My computer has been running for eight (8) hours!", she says. "I just turned it on this morning."

I looked at my screen. "Mine shows a whopping six (6) days!", and it definitely showed. The PC was running slow. Clicking on an icon resulted in a delayed reaction, kind of like in cartoons where the character gets his foot stepped on and it takes a few minutes for the signal to reach his brain. There were 46 Firefox tabs open, 2 VMWare images running, a few Notepad windows, collaboration software, a database manager, a couple of folders, and several Putty windows, along with a mess of other open programs.

Like an old muscle car, I was pushing this thing to its limit and it was still driving forward. It was definitely being taxed, so it's no wonder I have problems with my computer. But I wonder: would I have the same problems with Ubuntu or SUSE installed on my workstation PC? With SUSE or Ubuntu, I could "rip the process out of the wall" using 'kill -9', as another coworker would say; but with Windows, you eventually come to a point where you just have to kill the motor and restart.

But why risk any downtime if you don't need to?


P.S. In all fairness to Windows, I did take note of the fact that Ubuntu 6.06 required a restart after some updates. I was unpleasantly surprised by this and am not quite sure what to make of it. Of course, I think Ubuntu was designed more for home desktop users than for use as a server. More research will be needed to determine whether Ubuntu Server has the same requirement.

Saturday, October 27, 2007

Using VMWare as a Cross-Platform Virtualization Platform

As part of my Microsoft Windows XP Quit Date strategy, I have been looking into using VMWare as a replacement technology for my Microsoft Virtual PC images.

I started using Virtual PC as a replacement for my dual-boot setup. I still have a dual-boot (actually a tri-boot) setup, but I rarely, if ever, use it. The problem is that everything I need is on my NTFS-formatted Windows XP C: drive. NTFS-formatted drives are not writable from Linux, so when I boot into SUSE or Ubuntu, I feel like I can't update the files I need to update or use the software I need to use.

As a result, I set up a SUSE 10.1 Virtual PC image. I don't use the GUI. Although GNOME looks awesome when I boot directly into SUSE or Ubuntu instead of Windows, the GUI is somewhat restrictive in Virtual PC. Instead of 1280x1024 screen resolutions, I'm limited to 1024x768.

Since starting my new job back in February, I've become extremely comfortable working from a Linux command prompt. In fact, I'm so comfortable at the command prompt that I feel lost when I try to do something using the GUI. Come to think of it, I don't even think we have the GUIs installed on our servers at work!

The main advantage that this gives me is a powerful development environment. I run Apache on the Virtual PC image, and I use this configuration to locally host development environments for websites that I am working on.


The NavCalendar Application



For example, I'm working on a calendar application for Edúcate Ya. I'm building it in PHP. I chose PHP because their website is already built using PHP, and by using this technology I can get the application to market quicker than I would using some other technology. I host the development environment at http://dev.educateya.org locally on my Virtual PC image, and I created an entry in my Windows hosts file that maps that host name to the local IP address of the Virtual PC image. (The hosts file is located at C:\WINDOWS\system32\drivers\etc\hosts, and if this is the first time you've opened the file, one entry will be present by default: it maps the host name "localhost" to the loopback IP address 127.0.0.1.)

I configured Apache using virtual hosting: I created a configuration file mapping the IP address of the VM image and port 80 to the hostname dev.educateya.org. After restarting Apache in Linux and flushing the Windows DNS cache by running "ipconfig /flushdns", I was able to view the development version of the website on my local network.

In addition, this configuration gives me the ability to view the website from any computer on my network, as long as that computer's hosts file maps the host name to the IP address. This means the Virtual PC image could run on a server instead of my local PC, which would free up the memory the image currently uses on my machine!


Migrating to another Operating System



I've strayed from my point. I have, however, provided some insight into how I use virtualization technology and why losing the ability to boot my SUSE image would interfere with my productivity. Virtual PC only runs on Windows, but it can run many guest operating systems. I've successfully run Windows 3.1, Windows XP Home, Windows XP Pro, Windows Server 2003, Ubuntu 6.06, SUSE 10.1, several Knoppix versions, and FreeDOS. Currently, I only depend on the SUSE image, although I have used the other operating systems as test platforms on several occasions.

As I mentioned, Virtual PC can run several operating systems, but it can only run on Windows 2000, XP, and Vista. Since I want to move toward software that runs on multiple platforms, I've been looking at VMWare. I use VMWare Player at work to run a Linux environment on a Windows XP machine in much the same manner as I do at home, and it works great! One of my coworkers also ran a Linux VMWare image on his Ubuntu desktop. Therefore, I know firsthand that it's cross-compatible with both Windows and Linux, and I know how the software works.

But what I discovered this morning was something that is quite common in many open source software packages: VMWare offers users the ability to convert Virtual PC images to VMWare images! This is awesome! I don't have to go through the time-consuming process of reinstalling SUSE 10.1 on a new VMImage and configuring it the way I want! Using the VMWare Converter, Virtual PC users can convert an existing image to VMware format!

Once I get more hard drive space, this will be the next step in preparing to quit using Microsoft Windows XP.

Running dual-boot setups concurrently



While writing this piece, I stumbled upon an article that describes how to Run an Existing Windows Installation on Ubuntu with VMware Player. The author does not explain why these steps work, but he does outline the steps required to configure VMWare Player and the Windows installation. Personally, I wouldn't try this on a production machine, as having a third-party tool control Windows is not supported by Microsoft. Of course, neither are dual-boot setups, but I've never had an issue with those, and in a dual-boot setup the operating system is not controlled by a third-party process. By following these instructions, however, you'll be allowing a third-party tool to control the environment that Windows runs in.

The bottom line is that VMWare Player can run a non-virtual installation of Windows, according to the article. I'm not going to try this myself as I don't have a spare PC available, but if anyone has tried this I would love to hear your experiences!

In summary, I've included a list of cross-compatible open source software that I currently use:


Soon, I'll add VMWare to the list.

UPDATE - 7/5/2008: I have configured VMware Player on Ubuntu 8.04 to share a Windows XP VMWare image with Windows XP Pro as an enhancement to a dual-boot setup. Read more about it! It's easy to set up and configure, and you'll be glad you did!

Sunday, October 21, 2007

PHP Navigational Calendar

I'm working on a project for Edúcate Ya that involves placing a calendar on the home page. The idea is that the calendar will be used to display any public events, classes, guided trips, fundraiser events, and anything else of importance that an organization may want to advertise on a web page.


NavCalendar Application - Navigational Calendar



Each event listed on the calendar would be a hyperlink to another page on the website that provides more details about the event. I ran a few Google searches to see if I could find something that would fit. There are, of course, plenty of HTML, JavaScript, and many other types of calendars to choose from. However, everything I found either was far more than what I was looking for, cost money, or lacked the navigational feature I wanted.

Many of the calendars were similar to a web-based Microsoft Outlook, which an organization could use to help employees manage group schedules. Not only did these calendars have far too many features, but the features they did have weren't the ones we were looking for.

So, I decided to build a Calendar using PHP. I chose PHP because the Edúcate Ya website was built with PHP, so the server is already configured to use PHP. I started working on this towards the end of August. Yesterday, I was able to generate -- using PHP -- a calendar with a list of events that were pulled from an XML data file.

I chose XML because the list of events for Edúcate Ya is small. However, my plan is to retrofit the application with an abstract class that will allow developers to add different data sources, such as a database. I also designed the system so that a front-end controller loads a "view" based on a parameter passed when calling the controller. Anyone who wishes to use this application could simply replace the view with their own by using the API.

The back-end functionality is somewhat in place. The implementation doesn't yet fit the architecture, but I wanted to get this to market for Edúcate Ya as soon as possible. So, as with many software development projects, I took some shortcuts. The abstract class representing the data sources still exists only on paper, and the APIs required to build the view are somewhat tightly coupled. It's not perfect, but it works for me right now.

The actual HTML and CSS need some major overhauling. That is the next step in this process. However, my hope is that the Navigational Calendar may turn out to be useful to others who want a navigational calendar on their website. To realize this goal, the final step will involve making it easier to build a view. Currently, a developer would have to first build a static HTML calendar and then integrate it with the HTML-generating PHP code so that the correct month is displayed with the correct data. This also means that I will need to thoroughly document everything, which is something I haven't done much of outside of my full-time employment.

I plan to use the open source cross-platform software Gimp to apply the "chrome" for the calendar.