Data Base Definition

Data Base Definition Data Base Definition

The Data Base Definition (DBD) is where you give Thunderbolt the "picture" of the data base.

The DBD is entered with a Text Editor and is saved as a pure ASCII text file, not a word processing file. You can use any text editor to enter the DBD. We recommend that you use what we provide, as the DBD is a tab-delimited file, and the assists that we provide when entereing a DBD honors those tab settings. The tabs are not required, they are just recommended.

Once the DBD is entered and saved, it is "compiled." The compile process analyzes the DBD text and creates the files that house the data base.

The Data Base Definition consists of:

1. The name of the data base.
2. Optional clauses, if any.
3. The name and size of each Table in the data base.
4. The description of the fields in each Table.
5. UNIQUE clauses, if any.

The DBD Syntax Skeleton:

DATA BASE NAME IS data-base-name

[SUPPRESS GI]
[DBOPTION option-letters]
[FILE TREE ON full-path-folder-name]
[FILE LIST ON full-path-folder-name]

TD tablename nnnnnn RECORDS [Temporary]
data-definition-statement
: (as many Table defns as required)
data-definition-statement

[UNIQUE fieldname count] (as many Unique stmnts as required)

The following hyperlinks take you to the indicated discussions, the syntax rules and discussion of each clause of the DBD occurs below.

An example: The DBD for the Example Data Base
Creating the Data Base Definition.
Compiling the Data Base Definition.

Syntax Rules

When entering text for the DBD, the following rules apply:

* At least one space between words (we use the tab)
* One statement per line
* No periods at the end of any line
* Blank lines, indentations, and extra spacing are allowed
* Upper or lower case allowed, that is, DBDs are case insensitive
* The @ sign indicates that everything after it is a comment, all
text after an @ sign is ignored by the DBD compile process.

The DATA BASE NAME IS statement

Syntax:
DATA BASE NAME IS data-base-name

This statement must come firstin the Data Base Definition.

This establishes the internal name of the data base. All Application Interfaces (AIs) refer to this name in order to establish their connection. When you compile the DBD, the output files that are created employ the first 8 characters of this name plus a 3 character extension.

The Optional Clauses

The following optional clauses, SUPPRESS, DBOPTION, and FILE, are placed after the DATA BASE NAME IS statement, and before the first Table Definition.

SUPPRESS GI clause

Syntax:
SUPPRESS GI

Without this clause the DBD compile process generates a General Interface (GI) that provides access to the entire data base. The name of this GI is the same as the first 8 characters of the data base name (with no extension).

When you are generating your own Application Interfaces (AIs) you have no particular need for a GI. The use of the SUPPRESS GI clause prevents the generation of a GI at compile time, and thus restricts access only to those who know the whereabouts of an AI.

DBOPTION clause

Syntax:
DBOPTION option-letter(s)

The purpose for the DBOPTION clause is to allow the data base developer to specify tuning or construction options. The only option defined so far is the U option which affects the size of an internal structure that we call the "update strand." Basically, if your data base is larger than a half million records, it is to your advantage to specify DBOPTION U as one of your optional clauses.

FILE clause

Syntax:
FILE TREE ON full-path-folder-name
FILE LIST ON full-path-folder-name

If you have more than one disk storage drive you can use the FILE clause to get drive separation of the three most often accessed files of a data base.
Drive separation reduces wear and tear on disk drives, and can improve overall efficiency. When we say drive separation we really mean head separation, if you have two drive letters mapped to the same physical drive it does you no good to use the FILE clause.

The three largest and most often accessed files of a Thunderbolt data base are the data space, the tree space, and the list space. The data space is going to be on the same drive with the DBD, but you can use the FILE clause to specify different drives for the tree and/or list space, or both if you have three drives.

When you specify the drive and folder for the tree or list (or both) files, you can use the UNC (Universal Naming Convention). Thus you avoid problems if your installation is the type that has different drive letter mappings for different workstations.

TABLE DEFINITIONS

Table Definitions are where you provide the "data picture" of each of the data record types in your data base.

Table definitions begin with the TD line, where you declare the name of the table, the number of data records that this table contains, and whether this table is permanent or temporary. Temporary tables are discussed further below.

Syntax:
TD tablename nnnn RECORDS [TEMPORARY]
data-definition-statement
:
data-definition-statement

And a data-definition-statement contains an optional level number, the name of the field, the field type, the size of the field in characters, and whether the field is keyed. And if the field is keyed, whether or not the key is to be case sensitive. The syntax is:

[level] fieldname field-description [key [sensitive]]

Example:

TD PEOPLE 200000 Records
name x(30) key
street x(30)
city x(15) key
state x(2) key
zip x(9) key
age n(3) key
gender x(1) key
hair_color x(10) key

TD is the syntactical element that says "what follows is a Table Definition."

PEOPLE is the name of the table. So from now on you will think of the records in this table as your "people" records.

200000 RECORDS tells how many records the PEOPLE table can contain.

NAME, CITY, AGE, and so on, are field names. Data records consist of one or more fields of data that are named and described.

X(30) specifies that the data in the field is alphanumeric (letters, numbers, and special characters), and that the maximum number of characters that can be entered into this field is 30.

N(3) specifies that the data in the field is numeric (only numbers, sign, and decimal point allowed), and that the maximum number of digits allowed for this field is 3.

KEY tells us that this field will be used in queries such as done with the FIND command.

Note the underscore character in HAIR_COLOR. Since field names cannot have blanks in them, underscore is used in place of a blank. Note that it is the underscore and not a dash.

Now for a little More Fomality

NAMES

Tablenames and Fieldnames
Tablenames and fieldnames can be up to 30 characters in length, but only the first 12 characters of names are kept internally, so make sure all names are unique from one another by the twelfth character.

Two tables in the same data base cannot have the same name. Within a table, two fields cannot be named the same. But two different tables can share common field names, these are global fields.

Global Fields
In the Example Data Base, note that CUST# is in both the CUSTOMERS and SALES tables. Any field name that exists in more than one table is known as a "global field." If it is a key, then it is called a "global key."

The natural relationships that exist between tables are determined by global keys. For example, all of the records in the data base associated with a particular customer share the same value for CUST#. You expect that exactly one of those records is in the CUSTOMERS table, the rest are the sales records in the SALES table.

Common fields supply the linking mechanism between naturally allied data.

The following examples of this can be found in the Example Data Base:

CUSTOMERS and SALES share the common key CUST#
CUSTOMERS and STATES share the common key STATE
SALES and INVENTORY share the common key ITEM#
INVENTORY and CATEGORIES share the common key TYPE
INVENTORY and ORDERS share the common key ITEM#
ORDERS and VENDORS share the common key VEND#
etc.

CUST#, STATE, ITEM#, TYPE, and VEND# are global keys.

Common fields must have a common description as well as the same name.

LEVEL NUMBERS

Levels are a way to break a field down into its component parts.

NAME x(31) KEY
:1 LAST x(15) KEY
:1 FIRST x(15)
:1 INITIAL x(1)
ADDRESS x(30)

Here we supply a breakdown for NAME into its three component parts LAST, FIRST, and INITIAL.

Any reference to the field called NAME refers to a 31 character field, as it is the sum of the parts (15) + (15) + (1).

The scope of any field in a data definition is delimited when the next field at the same level is encountered.

For example NAME is at the zero level. ADDRESS is the next field encountered that is at the same level as NAME. All of the fields described between them are the component parts of NAME. Thus, NAME is a 31 character field.

Grouping and Elementary Fields

In the example above, NAME is a "grouping" field, that is, a field that is subsequently broken down into its component parts. The breakdown is done with level numbers.

The fields LAST, FIRST, and INITIAL are elementary fields since they have no further breakdown under them.

Another example of the use of level numbers is

CLIENT# x(15) key
:1 SLSMN x(3) key
:1 CID x(4) key
:1 CDATE date key
:2 YEAR x(4) key
:4 CN x(2)
:4 YR x(2) key
:2 MO x(2) key
:2 DY x(2)
ADDRESS ...

In the above, CLIENT#, CDATE, and YEAR are grouping fields and the others are elementary fields. In this example, the CLIENT# is composed of the salesmans number who brought in the client's business, the client's ID number, and the date when the client was first put on the books.

A CLIENT# might be 404A23619980615 and breaks down as

SLSMN 404
CID A236
YEAR 1998
MO 06
DY 15

When you prepare bills you might use the CLIENT#. When you prepare commission statements you might use the SLSMN field. You might use the YEAR field to figure total income to date from a client, and so on.

Notice in the above that you do not have to use level numbers absolutely sequentially. Notice that the breakdown for YEAR (century and two-chr year) was done with :4 instead of :3. It is OK to skip level numbers as long as you are consistent. You can use numbers up to 99 for level numbers.

The use of "Alias" field names

An "alias" is simply an alternate name for a field. An alias is created by using level numbers. Consider:

CUST# x(6) KEY
:1 ACCOUNT x(6)
NAME ...

In this example the field named ACCOUNT does not really break CUST# into component parts, it is accounting for the same six characters as CUST# is, so what we have accomplished is two names for the same six characters.

It is handy to have a "local" alias for global fields. In this way, when you are doing reports, you can better control the appearance of the output.

For example, lets take a simple print statement involving data from CUSTOMERS and SALES which share the global field CUST#

PRINT A CUST# NAME DESCR

and you would see:

CUST# NAME DESCR
--------- -------------- ---------------
A05072 SMITH, BOB CLOCK
A05072 AUTOMOBILE
A05072 JET PLANE
A05072 CAMPER TRUCK

Below is the same report that makes use of the local alias for CUST# (the alias is "ACCOUNT") in the CUSTOMERS table. That is, this alias for CUST# is only known in the one table, so it makes it easy to make an unambiguous reference for the data to be retrieved only when it comes from the CUSTOMERS table. The fact that the alias is only known to one table is why we call it a "local" alias.

PRINT A ACCOUNT NAME DESCR

and you see:

ACCOUNT NAME DESCR
----------- -------------- ---------------
A05072 SMITH, BOB CLOCK
AUTOMOBILE
JET PLANE
CAMPER TRUCK

The redundant occurrences of the CUST# are eliminated from the output.

A good habit to develop is to always define a local alias for primary keys because sooner or later you will need it. For more information about primary keys see "Tips on Defining Data Bases ."

We probably should never say never, but we will say that we can't think of very many reasons why you would want to key an alias name when the original name is already declared as a key. When the original name is a key, and then you key the alias, you will be paying the overhead for two key index structures for the same data field.

KEYS

You should only key fields that you know will be used in queries, that is, in the FIND and MATCH commands. The SORT and RELATE commands do not make use of key index structure, so do not key a field just because you are going to be sorting by it. Only key fields that are used in FIND and MATCH .

Do not key MEMO fields. They are automatically queryable without having to declare them as keys.

Field types X, N, DATE, I, ID, F, FD can be declared as keys. There is no limit to the number of fields you can declare as keys. Also it is legitimate to have a table with no keys declared.

The maximum length for a field declared as a key is 255 characters.

Unless you specify otherwise, keys are case-insensitive. That means it finds answers regardless of whether the data is upper case, lower case, or a mixture of the two, and regardless of whether the query was expressed in upper case, lower case or a mixture of the two.

It is very rare that you would want to declare a key to be case sensitive. One example of data content that you would want to declare as case sensitive is chemical symbols. To declare a key as case sensitive just add the word SENSITIVE after the word KEY in the data definition statement.

DATA TYPES

Data types are:

X -Alphanumeric. Letters, numbers, all special characters.
N -Numeric. Only numbers, sign, and decimal point allowed.
Date -Special data type for dates. See DATEs for more info about this.
Doc -Special data type for documents. See DOCs for more info.
Memo -Special data type for memos. See MEMOs for more info.
Image-Ignore this for now. See IMAGEs for more info about this.
I -Binary short integer (16 bits).
ID -Binary long integer (32 bits).
F -Single precision floating point.
FD -Double precision floating point.

Examples:

x(1) -A one character alphanumeric field.
x(30) -A 30 character alphanumeric field.
x(500) -A 500 character alphanumeric field.
The maximum length for Type X fields is 4088 characters.

N(1) -A single digit numeric field.
N(12) -A 12 digit numeric field.
N(4.2) -6 digit numeric field composed of 4 interger and 2 fractional positions
N(9.2) -11 digit number field composed if 9 integer and 2 fractional positions.
N(3.7) -10 digit number field composed of 3 integer and 7 fractional positions.
The maximum length for Type N fields is 15 digits.

I -Can contain numbers up to plus or minus 32,767.
ID -Can contain numbers up to plus or minus 2,147,483,647.
F -Accommodates 7 significant digits, maximum exponent is +/- 38
FD -Accommodates 15 significat digits, maximum exponent is +/- 308
No parenthesis are used around binary data types as their sizes are known.

MEMO -Declares a memo field.
DOC -Declares a document field.
IMAGE -Do not use this yet.
No parenthesis are used for these data types.

When to Choose Type X or N for Numeric Data

In general, type N fields should be fields that you plan to use in calculations (such as quantities, prices, etc.) or fields that you want treated numerically (such as counts, age, etc.)

But there are many times when fields contain only numeric data but you should declare then as type X. Some examples are:

Zip codes. Phone numbers. Identifying fields such as account numbers, social security numbers, SIC codes, and so forth.

There are two good reasons to declare these kinds of fields as type X rather than type N:

-Type X fields lend themselves to the use of the SV and AV operators
in queries. Type N fields do not. This is a powerful capability that you
will want to use on fields that are used for identification and/or
classification.

-Thunderbolt automatically edits type N fields when they are presented
to you. This can be disconcerting when you are trying to display Zip
code 00102 and all you see is 102. High-order zeros are automatically
suppressed.

Of course, you can employ format overrides in every report that displays such a field and get the output you want. But that is a pain in the neck.

Even though a field is designated as type X, you can still enforce "numeric
only" at data entry time (via the Field Services).

The UNIQUE Clause

The UNIQUE clause is used to "tune" tree structure space, preventing wasted space. The placement for UNIQUE clauses in the Data Base Definition is last, after all Tables have been defined. See the Example Data Base's DBD for an example of UNIQUE clauses.

Syntax:
UNIQUE keyname number -or- SZ(tablename)[*factor]

To best see the action of the UNIQUE clause, lets look a a couple of examples. First, consider the following table definition:

TD CUSTOMERS 1000000 records
cust# x(8) key
name x(30) key
street x(30)
city x(15) key
state x(2) key
status x(6) key
gender x(7) key
:

The question is: How many unique values are going to occur for each of these keys?

Since customer numbers are unique, that is, no two customers have the same customers number, so in one million records we expect one million different values for CUST#.

Names are nearly unique, but they will not be completely unique since there will be some customers who share the same name. So lets say in a million records we expect 970,000 unique values for NAME.

How about CITY? Well, since we know that we only do business in about 5000 cities, probably 30,000 will cover the uniqueness quality of the CITY field.

And STATE? There is only 51 states. Throw in some territories and you still have a number less than 100.

STATUS? Lets say we know how we grade our customers, and there are only six (counting blank) possible values for STATUS.

And the possible values for GENDER are MALE, FEMALE, UNKNOWN, and blank. Four altogether.

Two things that you need to know: One is that trees keep track of the uniqueness quality of the data, and the other is that the Data Base Definition processor is really dumb.

When compiling, its logic is: "Here is a key, I know there are one million records in the table, so I better put up a tree for this key that can hold one million nodes." So, in the absence of further information, all six of the keys shown above will get a tree capable of holding one million nodes. Thats OK for the first two keys, but it is a terrible waste of space for the last four.

The purpose of the UNIQUE clause is to supply that "further information" so that the DBD processor will know what size tree to create for keys that have a low uniqueness quality (these are referred to as "low cardinality" keys). The following UNIQUE clauses would be appropriate to use for our example:

UNIQUE CITY 30000