Episode 12, Databases

Here are the show notes for episode 12.

Make sure to send us feedback so we can make the show even better.
PodCast Feed



Links:

Databases
Flat File Database
Hierarchical Database
Relational Database
Edgar F. Codd
A Relational Model of Data for Large Shared Data Banks
Turbogears Screencasts
Object Databases
MySQL
PostgreSQL
Oracle
Microsoft SQL Server
Berkeley DB
SQLite
Turbogears
SQL Injection
MyPHPMyAdmin
NaviCat


RYOS, Episode 12 - Databases



The Run Your Own Server podcast for December the sixth, 2006.

[music]
In this episode: databases, flat file, relational, object databases, and gerbil grinding.

This episode's reverse sponsor is PHP myadmin. PHP myadmin is a tool written in PHP intended to handle the administration of MySQL over the web. For more information, please visit
PHPmyadmin.net.

[music]

Announcer:
If you would like to send us feedback or have a question you'd like us to answer on the show please send an email to podcast att runyourownserver.org.

Thud:
So for databases, let's give some background to the different basic types of databases. Gek, what can you tell us about flat-file databases?

Gek:
The basic idea is the same as a spreadsheet. It's usually just a plain text file. Sometimes it's also referred to as just plain text database. The information is arranged in rows and columns, not necessarily visually, they could be broken up by commas or pipes.

Seg:
One example of a flat-file database would be a host file on any kind of Linux UNIX system or any file that has that kind of configuration data. In essence that is a flat file. If you look in a host file, usually you are going to see an IP address, and then one or more tabs, and then one or more host names. In essence that is a flat file database; it's a really good example of it.

Gek:
Absolutely, and its easy to work with, its human readable, that's one of the reasons why they're commonly used. A spreadsheet, something that isn't necessarily human-readable can also be a flat-file database. It's just where you have rows and columns and it's assumed that everything that's in the same column in the different rows is the same type of data.

Thud:
Gek, I know I'm going to mispronounce this and I'm sure a bunch of geeks out there are going to get mad at me for doing this, but this is the best I can do. What can you tell me about hierarchical databases?

Gek:
Basically the idea is, you think of a tree. You got something that starts at the root and then branches out. The pieces that are on the branches are related to each other, but not necessarily related to things on other levels, or on the same level of other branches. A really good example for this is, "XML." You can stick anything inside an XML container that you want, and it's assumed that the relationship is the thing that contains all of those items, not the fact that they're all on the same level as things in other containers.

Thud:
I guess what you mean is, if you have two different branches, and they both run 10 levels deep, than data on one branch at the 10th level isn't related to data on the other branch at the 10th level. But, they are related to themselves within their own branch. Is that the proper way to explain it?

Gek:
Right. Each branch what's on the 10th level is assumed that it's related, but it wouldn't be the same as the other branch, correct.

Thud:
Gotcha, OK. What's a relational database?

Seq:
A relational database is the same basic idea as a flat file, although it's not stored the same. It's a bunch of rows and columns in tables, and the main difference is that in a relational database there are relations between different fields, and different tables. So, you can have a User ID in one table, that would be like the user table, with all the user information, password, and things like that, and than another table that has a relation to that User ID. A very common example of this is, just about any sequel database is considered a relational database.

Thud:
I'd like to throw something in here. Relational database sounds like a whole lot of flat file databases, but in non-readable human format. The relationships you were talking about - let's say a User ID in one table, corresponding to some other piece of information in another table. The relationship isn't within the actual SQL database it's stored logically or theoretically in the application. The application says, pull this information from this table about this user ID and I know it means something else on this other table. But because it holds more than one table it's not a flat-file database.

Seq:
Exactly. You can kind of think of each table in a relational database as a separate flat-file database and they're kind of all put together in one big database.

Gek:
One thing also that I found out while researching for the show is that the term relational database was actually developed by a guy named Codd, I believe he worked for IBM and wrote this paper defining relational databases.

The paper that he wrote about relational databases actually defined some very specific rules and he has raised a stink about people using that term because what's currently accepted as relational databases don't match his criteria -- things like MySQL and Postgres. They match most of the rules, but not all of his rules.

Thud:
Ok, Gek, can you give us a description of what an object-database is?

Gek:
Well an object-database isn't really so much a database or doesn't have to be specifically a database as it is a way of interacting with databases.

The one I'm most familiar with, that I've played around with is TurboGears. TurboGears uses SQL objects to bring the database in as part of the programming language. You can access elements in a row just by writing a function that calls those elements and you can sit there and execute methods on it and work with it like its part of the program instead of just making queries and pulling in the data. It's a pretty neat concept but I'm not sure if there's a future for it becoming its own thing or if it's just going to stay as an interface.

If you haven't looked at it, there are some videos on the TurboGears web site that kind of display what it is we're talking about.

Seg:
I'd like to talk a little about common databases versus recommended databases. Some common databases that you find out on the Web or Net nowadays are MySQL, PostgreSQL, Oracle and Microsoft SQL. Now of those, I personally can recommend MySQL. I cannot recommend the rest of them although I'm sure PostgreSQL is a nice database, I just don't have any experience with it.

It seems to me, at least in my experience, most big websites boast big, big, big applications with millions and millions of daily users getting information. People either go with Microsoft's MySQL or Oracle, I'm not really sure why because I don't know that they're any better or worse when you get up to that kind of traffic. I've heard people talk about them and say "Well, Oracle can handle a lot more traffic than MySQL can". I don't know if they can or not. Maybe it can, it's a monster of a database program. But I've never had a problem with traffic on MySQL.

I almost want to think that big-money database programs like Oracle and Microsoft SQL get used so much because they are big-money. People who are not necessarily very tech-savvy are making the calls on that. They run into this a lot; they say well it's really expensive so it must be good. But as far as recommendations on either side of the fence, what do you think Gek?

Gek:
I personally only have experience with MySQL and I have to agree with you, for everything I've ever done MySQL is plenty. I'm also not sure what the performance specs are in Oracle, but, yeah, you do see a lot where people read about something in a magazine and say, "Oh, I have to have that."

Seg:
Yeah, yeah. What about you Thud?

Thud:
I pretty much agree with that. MySQL is what I use for all of my projects. I use it everywhere I can. I do have a little bit of experience dealing with Oracle and from what I can tell Oracle isn't so much better at doing queries per second as MySQL or any of the other databases. Oracle is really good with large datasets. If you have a 500-terabyte database, Oracle is definitely the way do go. But most people don't have that. Most people are lucky if they have 100 megs they have to deal with. And I have seen 1-terabyte MySQL databases run just fine.

Seg:
One important thing to remember when working with databases, and I can only speak to this as it pertains to MySQL, but database programs keep a lot of their database in RAM, and in fact, if it can keep the entire thing in RAM, it will. So if you're building a database box, RAM is very important. You're going to need a lot of hard drive storage if you've got a large database, but RAM is very important. RAM is more important on a database box, than it is on any other kind of box, be it an application box, webhead, anything else, NFS server. RAM is most important on a SQL box.

Thud:
Yeah, I have to agree with that. Whenever I'm doing a large scale of MySQL I always make sure I have at least as much RAM as I need to hold my entire database. So, if I have a 4GB database, I try to have 5-6 Gigs of RAM, because anytime I hit the swap file when I start running short of RAM, I'm just slowing down the database, and I try to avoid that.

OK, so we've been talking about some of the common Internet databases. There are also two common local databases - MySQL, Microsoft SQL, Oracle, PostgresSQL, they are all kind of used for web applications and things like that, where you're connecting to them over the Internet.

There are two other kinds of databases that are really for local file systems. There's an old one called Berkeley-DB that is used by a lot of local admin applications to store configurations in a database. It's kind of being phased out and replaced with SQL-lite which is the same basic idea. Its stored locally, there's not a network component to it at all, but it does talk SQL so its very easy to get an application working with that and then moving on to some other form of SQL, whether its MySQL or PostgreSQL or whatever. Have either one of you ever played with either one of those at all?

Gek:
The TurboGear stuff that I've worked with, like you said I started on SQL-lite. Eventually I do plan on figuring out how to make the transition from SQL-lite to MySQL. I don't know if TurboGear has some mechanism built in for that. SQL-lite is nice because it's fast since it's local. You're not fighting with network traffic and you can play with the development database with "dummy" data on your local SQL-lite instance, and then you know presumably copy it over to MySQL once you're ready to make it go live.

Thud:
Now, even though we're not covering TurboGears, I will throw this in. I happen to know that the transition from SQL-lite to MySQL is one configuration change. You just tell it using MySQL and TurboGears takes care of the rest.

Gek:
Well, you still have to do like the SQL create thing. I just don't know if you can migrate data over is what...

Thud:
Well you - there's actually dump scripts that do that separately. But, when you're going from like development on SQL-lite to production on MySQL, you still have to do the SQL create thing, but you don't have to change any of your code other than to tell it you're now using MySQL.

Gek:
That's pretty cool.

Thud:
Yeah, that's one of the coolest things about TurboGears.

OK, so for this episodes moment of Sec, we're going to talk about database security. What are some of the things we need to worry about with database security?

Seg:
One of the things you have to worry about with database security is - making sure that regular users, out on the Internet, cannot connect to your database directly. The only things that should be connecting to your database are your application. And what I mean by your application is - your website, is what usually people use for. But, there can be other things connected to databases that aren't websites.

You should make sure that the database you're using, is only listening on the IP's that it must listen on. And, under the best case scenario, you're listening on either the local host IP, which means that your application is running on the same box and connecting to it directly. Or, if you're doing a separation configuration, where you're separating the application database boxes, than you should set up a private network, with a private IP address at 10.172, or 196 and have the database only listened on that IP address, and so only the application can get to it directly. If for some reason you're in an environment where it has to listen on all IP addresses, or it has to listen on an IP address that's public - firewall it. Use a hardware firewall, use a software firewall like a packet filter, or PF, IP forward, IP tables, IP chains you know, anything, to block people from directly accessing - accessing it from the internet.

Thud:
OK, Gek, what are your suggestions for security for a database.

Gek:
I like what Seg was saying. I've seen in Postgres a configuration file where you can change to only allow specific users, from specific IP addresses. I'm not sure if MySQL has something similar, but that kind of granularity is great because than you can limit the access for different users. If you were using one database and three web heads you could use a different user for each web head. That way you could tell if somebody was connecting and some user did something funky you would know which web head had been popped. But there are also lots of other things you can do with that. Restrict which developers can actually touch the database, which ones can only touch the web heads. It just depends on what you are trying to protect. But I like having that sort of security in place.

Seg:
That kind of stuff does exist in MySQL where you can specify; let's say, user www @ IP address whatever. That kind of granularity is available in mySQL, I use that all the time. It is a great way to add security. I don't want to say it should be your online security, but it a great way to add security to your solution.

Thud:
I'd just like to add that you also have to think about encryption. If for some reason, your application does have to connect to your database server over a public network, just about every database I know of now supports some version of encryption, whether it's SSL or something built in. So you should definitely use that. The other thing you should do is make sure your application, if you can, should be encrypting any important data. You shouldn't be storing passwords in the database in plain text - they should be encrypted. Any other information - if it's a shopping application and its got credit card information in it, or personal information, mailing addresses, phone numbers, things like that - all of those fields in the database should be encrypted if your application supports it.

Now for the closing thoughts. Gek, what are your closing thoughts?

Gek:
Databases are a very complicated subject, but I think its something that every Sys Admin really does need to delve into a little bit. You're not going to learn everything, unless you just focus on one database. You're really not going to learn enough to fill the place of a DBA. But, if you know good general skills like; how to add users, how to change permissions, that sort of thing, that can really take you pretty far, and when you need to bring in a DBA, you'll know when that time comes. So I just, I think the databases - you just have to know how they work as a SysAdmin to support them. You really do need to understand what the application is doing.

Seg:
I just wanted to also add that it's important for SysAdmins to understand what SQL injections are. I know that we haven't gone over that at all in this episode, but if you're interested in securing your database, and your entire application from SQL injections, just do a Google Search on, "SQL injections." Those kinds of injection attacks can happen on a lot of different types of applications, not just databases. So, you definitely need to be aware of that when designing your application and writing the code, it's a very, very important thing to be mindful of.

Thud:
I would also like to point out that even though a lot of the database admin stuff that you read about on the internet for setting up databases, adding tables, doing users, is all from the command line, there are plenty of applications out there like phpMyAdmin or NaviCat or a variety of other programs that you can use to connect to your database to manage it and not just a particular database, but the whole database server, so you can do user adds and things like that, all with a pretty easy to understand and follow GUI. I found they've really helped me out in doing large-scale data conversion or bringing an application online, and they also have, at least all the ones I've used, also have built-in backup and restore functionalities that makes managing that kind of stuff a lot easier.

[music]

Announcer:
For show notes or other details, please visit our website at RunYourOwnServer.org. If you would like to send us feedback or have a question you'd like us to answer on the show, please send an email to podcast att runyourownserver.org. The intro music, I Like Caffeine, is by Tom Cody. This song, Down the Road, is by Rob Costlow. Please visit our website for links to their websites. This podcast is covered under a creative commons license. Please visit our website for more details.


Transcription by CastingWords