Episode 12, Databases
Make sure to send us feedback so we can make the show even better.
Links:
Databases
Flat File Database
Hierarchical Database
Relational Database
Edgar F. Codd
A Relational Model of Data for Large Shared Data Banks
Turbogears Screencasts
Object Databases
MySQL
PostgreSQL
Oracle
Microsoft SQL Server
Berkeley DB
SQLite
Turbogears
SQL Injection
MyPHPMyAdmin
NaviCat
RYOS, Episode 12 - Databases
The Run Your Own Server podcast for December the
sixth, 2006.
[music]
In
this episode: databases, flat file,
relational, object databases, and gerbil grinding.
This episode's reverse sponsor is PHP myadmin. PHP
myadmin is a tool written in PHP intended to handle
the administration of MySQL over the web. For more
information, please visit
PHPmyadmin.net.
[music]
Announcer: If you would like to
send us feedback or have a question you'd like us
to answer on the show please send an email
to podcast
att runyourownserver.org.
Thud: So for databases, let's
give some background to the different basic types
of databases. Gek, what can you tell us about
flat-file databases?
Gek: The basic idea is the
same as a spreadsheet. It's usually just a plain
text file. Sometimes it's also referred to as just
plain text database. The information is arranged in
rows and columns, not necessarily visually, they
could be broken up by commas or pipes.
Seg: One example of a
flat-file database would be a host file on any kind
of Linux UNIX system or any file that has that kind
of configuration data. In essence that is a flat
file. If you look in a host file, usually you are
going to see an IP address, and then one or more
tabs, and then one or more host names. In essence
that is a flat file database; it's a really good
example of it.
Gek: Absolutely, and its
easy to work with, its human readable, that's one
of the reasons why they're commonly used. A
spreadsheet, something that isn't necessarily
human-readable can also be a flat-file database.
It's just where you have rows and columns and it's
assumed that everything that's in the same column
in the different rows is the same type of data.
Thud: Gek, I know I'm going
to mispronounce this and I'm sure a bunch of geeks
out there are going to get mad at me for doing
this, but this is the best I can do. What can you
tell me about hierarchical databases?
Gek: Basically the idea is,
you think of a tree. You got something that starts
at the root and then branches out. The pieces that
are on the branches are related to each other, but
not necessarily related to things on other levels,
or on the same level of other branches. A really
good example for this is, "XML." You can stick
anything inside an XML container that you want, and
it's assumed that the relationship is the thing
that contains all of those items, not the fact that
they're all on the same level as things in other
containers.
Thud: I guess what you mean
is, if you have two different branches, and they
both run 10 levels deep, than data on one branch at
the 10th level isn't related to data on the other
branch at the 10th level. But, they are related to
themselves within their own branch. Is that the
proper way to explain it?
Gek: Right. Each branch
what's on the 10th level is assumed that it's
related, but it wouldn't be the same as the other
branch, correct.
Thud: Gotcha, OK. What's a
relational database?
Seq: A relational database
is the same basic idea as a flat file, although
it's not stored the same. It's a bunch of rows and
columns in tables, and the main difference is that
in a relational database there are relations
between different fields, and different tables. So,
you can have a User ID in one table, that would be
like the user table, with all the user information,
password, and things like that, and than another
table that has a relation to that User ID. A very
common example of this is, just about any sequel
database is considered a relational database.
Thud: I'd like to throw
something in here. Relational database sounds like
a whole lot of flat file databases, but in
non-readable human format. The relationships you
were talking about - let's say a User ID in one
table, corresponding to some other piece of
information in another table. The relationship
isn't within the actual SQL database it's stored
logically or theoretically in the application. The
application says, pull this information from this
table about this user ID and I know it means
something else on this other table. But because it
holds more than one table it's not a flat-file
database.
Seq: Exactly. You can kind
of think of each table in a relational database as
a separate flat-file database and they're kind of
all put together in one big database.
Gek: One thing also that I
found out while researching for the show is that
the term relational database was actually developed
by a guy named Codd, I believe he worked for IBM
and wrote this paper defining relational databases.
The paper that he wrote about relational databases
actually defined some very specific rules and he
has raised a stink about people using that term
because what's currently accepted as relational
databases don't match his criteria -- things like
MySQL and Postgres. They match most of the rules,
but not all of his rules.
Thud: Ok, Gek, can you give
us a description of what an object-database is?
Gek: Well an object-database
isn't really so much a database or doesn't have to
be specifically a database as it is a way of
interacting with databases.
The one I'm most familiar with, that I've played
around with is TurboGears. TurboGears uses SQL
objects to bring the database in as part of the
programming language. You can access elements in a
row just by writing a function that calls those
elements and you can sit there and execute methods
on it and work with it like its part of the program
instead of just making queries and pulling in the
data. It's a pretty neat concept but I'm not sure
if there's a future for it becoming its own thing
or if it's just going to stay as an interface.
If you haven't looked at it, there are some videos
on the TurboGears web site that kind of display
what it is we're talking about.
Seg: I'd like to talk a
little about common databases versus recommended
databases. Some common databases that you find out
on the Web or Net nowadays are MySQL, PostgreSQL,
Oracle and Microsoft SQL. Now of those, I
personally can recommend MySQL. I cannot recommend
the rest of them although I'm sure PostgreSQL is a
nice database, I just don't have any experience
with it.
It seems to me, at least in my experience, most big
websites boast big, big, big applications with
millions and millions of daily users getting
information. People either go with Microsoft's
MySQL or Oracle, I'm not really sure why because I
don't know that they're any better or worse when
you get up to that kind of traffic. I've heard
people talk about them and say "Well, Oracle can
handle a lot more traffic than MySQL can". I don't
know if they can or not. Maybe it can, it's a
monster of a database program. But I've never had a
problem with traffic on MySQL.
I almost want to think that big-money database
programs like Oracle and Microsoft SQL get used so
much because they are big-money. People who are not
necessarily very tech-savvy are making the calls on
that. They run into this a lot; they say well it's
really expensive so it must be good. But as far as
recommendations on either side of the fence, what
do you think Gek?
Gek: I personally only have
experience with MySQL and I have to agree with you,
for everything I've ever done MySQL is plenty. I'm
also not sure what the performance specs are in
Oracle, but, yeah, you do see a lot where people
read about something in a magazine and say, "Oh, I
have to have that."
Seg: Yeah, yeah. What about
you Thud?
Thud: I pretty much agree
with that. MySQL is what I use for all of my
projects. I use it everywhere I can. I do have a
little bit of experience dealing with Oracle and
from what I can tell Oracle isn't so much better at
doing queries per second as MySQL or any of the
other databases. Oracle is really good with large
datasets. If you have a 500-terabyte database,
Oracle is definitely the way do go. But most people
don't have that. Most people are lucky if they have
100 megs they have to deal with. And I have seen
1-terabyte MySQL databases run just fine.
Seg: One important thing to
remember when working with databases, and I can
only speak to this as it pertains to MySQL, but
database programs keep a lot of their database in
RAM, and in fact, if it can keep the entire thing
in RAM, it will. So if you're building a database
box, RAM is very important. You're going to need a
lot of hard drive storage if you've got a large
database, but RAM is very important. RAM is more
important on a database box, than it is on any
other kind of box, be it an application box,
webhead, anything else, NFS server. RAM is most
important on a SQL box.
Thud: Yeah, I have to agree
with that. Whenever I'm doing a large scale of
MySQL I always make sure I have at least as much
RAM as I need to hold my entire database. So, if I
have a 4GB database, I try to have 5-6 Gigs of RAM,
because anytime I hit the swap file when I start
running short of RAM, I'm just slowing down the
database, and I try to avoid that.
OK, so we've been talking about some of the common
Internet databases. There are also two common local
databases - MySQL, Microsoft SQL, Oracle,
PostgresSQL, they are all kind of used for web
applications and things like that, where you're
connecting to them over the Internet.
There are two other kinds of databases that are
really for local file systems. There's an old one
called Berkeley-DB that is used by a lot of local
admin applications to store configurations in a
database. It's kind of being phased out and
replaced with SQL-lite which is the same basic
idea. Its stored locally, there's not a network
component to it at all, but it does talk SQL so its
very easy to get an application working with that
and then moving on to some other form of SQL,
whether its MySQL or PostgreSQL or whatever. Have
either one of you ever played with either one of
those at all?
Gek: The TurboGear stuff
that I've worked with, like you said I started on
SQL-lite. Eventually I do plan on figuring out how
to make the transition from SQL-lite to MySQL. I
don't know if TurboGear has some mechanism built in
for that. SQL-lite is nice because it's fast since
it's local. You're not fighting with network
traffic and you can play with the development
database with "dummy" data on your local SQL-lite
instance, and then you know presumably copy it over
to MySQL once you're ready to make it go live.
Thud: Now, even though we're
not covering TurboGears, I will throw this in. I
happen to know that the transition from SQL-lite to
MySQL is one configuration change. You just tell it
using MySQL and TurboGears takes care of the rest.
Gek: Well, you still have to
do like the SQL create thing. I just don't know if
you can migrate data over is what...
Thud: Well you - there's
actually dump scripts that do that separately. But,
when you're going from like development on SQL-lite
to production on MySQL, you still have to do the
SQL create thing, but you don't have to change any
of your code other than to tell it you're now using
MySQL.
Gek: That's pretty cool.
Thud: Yeah, that's one of the
coolest things about TurboGears.
OK, so for this episodes moment of Sec, we're going
to talk about database security. What are some of
the things we need to worry about with database
security?
Seg: One of the things you
have to worry about with database security is -
making sure that regular users, out on the
Internet, cannot connect to your database directly.
The only things that should be connecting to your
database are your application. And what I mean by
your application is - your website, is what usually
people use for. But, there can be other things
connected to databases that aren't websites.
You should make sure that the database you're
using, is only listening on the IP's that it must
listen on. And, under the best case scenario,
you're listening on either the local host IP, which
means that your application is running on the same
box and connecting to it directly. Or, if you're
doing a separation configuration, where you're
separating the application database boxes, than you
should set up a private network, with a private IP
address at 10.172, or 196 and have the database
only listened on that IP address, and so only the
application can get to it directly. If for some
reason you're in an environment where it has to
listen on all IP addresses, or it has to listen on
an IP address that's public - firewall it. Use a
hardware firewall, use a software firewall like a
packet filter, or PF, IP forward, IP tables, IP
chains you know, anything, to block people from
directly accessing - accessing it from the
internet.
Thud: OK, Gek, what are your
suggestions for security for a database.
Gek: I like what Seg was
saying. I've seen in Postgres a configuration file
where you can change to only allow specific users,
from specific IP addresses. I'm not sure if MySQL
has something similar, but that kind of granularity
is great because than you can limit the access for
different users. If you were using one database and
three web heads you could use a different user for
each web head. That way you could tell if somebody
was connecting and some user did something funky
you would know which web head had been popped. But
there are also lots of other things you can do with
that. Restrict which developers can actually touch
the database, which ones can only touch the web
heads. It just depends on what you are trying to
protect. But I like having that sort of security in
place.
Seg: That kind of stuff does
exist in MySQL where you can specify; let's say,
user www @ IP address whatever. That kind of
granularity is available in mySQL, I use that all
the time. It is a great way to add security. I
don't want to say it should be your online
security, but it a great way to add security to
your solution.
Thud: I'd just like to add
that you also have to think about encryption. If
for some reason, your application does have to
connect to your database server over a public
network, just about every database I know of now
supports some version of encryption, whether it's
SSL or something built in. So you should definitely
use that. The other thing you should do is make
sure your application, if you can, should be
encrypting any important data. You shouldn't be
storing passwords in the database in plain text -
they should be encrypted. Any other information -
if it's a shopping application and its got credit
card information in it, or personal information,
mailing addresses, phone numbers, things like that
- all of those fields in the database should be
encrypted if your application supports it.
Now for the closing thoughts. Gek, what are your
closing thoughts?
Gek: Databases are a very
complicated subject, but I think its something that
every Sys Admin really does need to delve into a
little bit. You're not going to learn everything,
unless you just focus on one database. You're
really not going to learn enough to fill the place
of a DBA. But, if you know good general skills
like; how to add users, how to change permissions,
that sort of thing, that can really take you pretty
far, and when you need to bring in a DBA, you'll
know when that time comes. So I just, I think the
databases - you just have to know how they work as
a SysAdmin to support them. You really do need to
understand what the application is doing.
Seg: I just wanted to also
add that it's important for SysAdmins to understand
what SQL injections are. I know that we haven't
gone over that at all in this episode, but if
you're interested in securing your database, and
your entire application from SQL injections, just
do a Google Search on, "SQL injections." Those
kinds of injection attacks can happen on a lot of
different types of applications, not just
databases. So, you definitely need to be aware of
that when designing your application and writing
the code, it's a very, very important thing to be
mindful of.
Thud: I would also like to
point out that even though a lot of the database
admin stuff that you read about on the internet for
setting up databases, adding tables, doing users,
is all from the command line, there are plenty of
applications out there like phpMyAdmin or NaviCat
or a variety of other programs that you can use to
connect to your database to manage it and not just
a particular database, but the whole database
server, so you can do user adds and things like
that, all with a pretty easy to understand and
follow GUI. I found they've really helped me out in
doing large-scale data conversion or bringing an
application online, and they also have, at least
all the ones I've used, also have built-in backup
and restore functionalities that makes managing
that kind of stuff a lot easier.
[music]
Announcer: For show notes or other
details, please visit our website at
RunYourOwnServer.org. If you would like to
send us feedback or have a question you'd like us
to answer on the show, please send an email
to podcast
att runyourownserver.org. The intro music, I
Like Caffeine, is by Tom Cody. This song, Down the
Road, is by Rob Costlow. Please visit our website
for links to their websites. This podcast is
covered under a creative commons license. Please
visit our website for more details.
Transcription
by
CastingWords

