The identification of a few simple principles will take you a long way toward creating data bases for just about any application you want to do. Not 100%, but in the mid 90% along the lines of "everything you need to know to create data bases."
Consider the following tables:
CUSTOMERS SALES INVENTORY STATES
-------------- -------- ------------- ----------
cust# store# item# state
name pdate description statename
street :1 year quan_on_hand
street2 :1 mo unit_cost
city :1 dy reorder_point
state cust# order_quantity
zip item# mbcode
status quan_bot category
discount price :
date_of_birth
current_balance
:
These illustrate the three most common table types. They are:
Identification
Events
Translate Tables
IDENTIFICATION TABLES:
These are generally your "master files." They identify or define something. CUSTOMERS and INVENTORY are examples of "Identification" tables. Take Customers, for example: the data kept for customers is everything you know about the customer, not what they do, just what identifies them.
Identification tables are always characterized by the existence of a primary key. A primary key is a key that is unique in the table in which it occurs. CUST# is a primary key. No two records in the CUSTOMERS table have the same value for CUST#, as no two customers have the same customer number. In the INVENTORY table, ITEM# is the primary key. And again, the data that is maintained here describes what you know about a product. Not about what happens to products, just the important information to know and maintain about the product itself.
EVENTS:
Events are generally the things that happen that bring your Identification elements together. In our example case, the event is when a customer makes a purchase, so the event is telling you what customers do, and what happens to products. The data that is captured is the data that describes the event, and that data is always available to you all at once at the point of the event. The first clue that you may not have the best data base definition is if the above statement is not true for you. So events provide a "natural" table definition.
Generally, indentification records relate to event records on a one-to-many basis. In our case, for each Customer record there can be many Sales records that "belong" to it (they share the same CUST#). Also, for each Inventory record there can be many Sales records that "belong" to it (they share the same ITEM#).
Note that there is no primary key in the event table (SALES). This is quite common, even desirable. The event is often the thing you want to replicate as often as possible, and you don't want to discourage that by requiring that something must be unique about it. Some relational implementations require that there must be a primary key in every table. Our implementation is also relational, but we disagree with this requirement. If it is naturally there, fine. But to require a primary key in event tables may cause users to have to introduce some data abstraction, or in extreme cases, they must modify the way they do business in order to meet the needs of their computer system (such as having pre-printed sales tickets, which can be expensive). This is like the tail wagging the dog.
TRANSLATE TABLES:
Translate tables are generally used as a storage saving tactic. In our example, STATES is a translate table. Suppose you have hundreds of thousands or even millions of Customer records. In each record there is the requirement to store the state in which the customer lives. To store full state names would take a 13 character field.
Rather than having millions of records with a 13 character field, it makes more sense to store just the two character state code in the CUSTOMERS records, and then have a STATES table with just 51 records in it that has the full state names. The two character state code associated with each state name serves as the primary key in the STATES table.
Whenever you need the translation of state code to full state name, it is simple to get. At the point in the RELATE command where a reference to CUSTOMERS occurrs, you can immediately append the translation to it:
CUSTOMERS(STATES THRU STATE)
Translations are always a parenthetical insert following the table name that contains the code that you desire to translate. You can have more than one translate, it would look like:
tablename(translation)(translation)...
Or, for cases where the translate table has coded information in it, you can have translates within translates:
tablename(translation(translation))
Of course, it is perfectly legitimate to start with an Event table and form the relationship with Identification tables via translates. Sometimes this is the relationship you want for whatever process you have in mind, but most of the time it is not the best hierarchical relationship
for typical reporting.
Summary When we are asked for the best approach on how to create data bases, our advice is "Let the data base define itself." Look for the data the way it naturally exists, or naturally comes, as in the Event table type discussed above.
With this as the starting point, as you begin to develop applications, if a more fortunate way of storing the data exists, it will jump out at you. At that point, using standard data base maintenance techniques, it will be a simple matter to re-cast the data into this "better way."