GEOG 5223 Project 4 | |
Database Design | |
by Tom Wells - November 2003 | |
Fig.1 - Input Data |
Lesson 4 introduces the uninitiated student to the basics of relational
database design. The lesson describes a significant database design
example and then asks the student to apply the methodology to a simpler
case. The student must examine a sample hard-copy voter registration
list and then design an efficient ArcView table structure for its possible
replacement that is functional for data entry and meets the following database design criteria.
Project 4 deliverables include a "Personal Geodatabase" capable of holding the sample data plus this project page that describes the database and justifies the design decisions. |
The following scanned image of a hand copied (and fictional) voter registration list represents the kind of data that the required database must be capable of holding. | |||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||
Figure 1: | Example hand-written voter registration list. - This hard to read example demonstrates a reason for computerization of data lists; legibility. | ||||||||||||||||||||||||||||||||||||||||||
ArcCatalog and ArcMap (part of ArcView 8.3) make it possible to design a single table that can hold the above data with a minimum of redundant information (via "Attribute Domains" as discussed below). In a relational database, a table consists of a fixed number of data columns combined with a nearly endless number of potential data rows forming what is sometimes called a data-grid. (A database table is often presented in a spreadsheet like data-grid format but it can also be tied to a "form" which offers many presentation possibilities.) Each column (often called a database field) is designed to hold a certain type of data. The data definition can be very general allowing a wide range of data types (binary blobs) or very restrictive (True / False) to limit the data to very specific codes. For this replacement project, a table was designed with the specifications listed in Figure 2. The specifications are followed with a discussion of the design rational. Note: Relational databases are called relational because of the permanent or temporary relationships used to link multiple tables together into larger "logical" tables. In this case, it would be possible to break the example table into three tables; the main data table and two lists of valid values. However, making use of the ArcView Attribute Domain feature is considerably simpler and more efficient since only a single table is needed. |
|||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||||||||||||
Figure 2: | Relational Database Table "VOTERS" Specifications - Note: Field # 1 was created by ArcCatalog, "NA" = Not Applicable, and * indicates specified but NOT effective. | ||||||||||||||||||||||||||||||||||||||||||
The data list shown in Figure 1 has four data columns whereas the specifications shown in Figure 2 describes 5 data columns / fields. The extra field, field # 1, was created by ArcCatalog when it was asked to create a new table within the Project # 4 personal geodatabase (Lesson4.mdb). As the data field # 1 name "OBJECTID" implies, this field is used by ArcView to keep track of data objects. By having ArcCatalog create the personal geodatabase mdb, instead of Microsoft Access, an extended database structure is created with ArcView specific tables. Twenty six ArcView specific tables are included in the personal geodatabase mdb file as indicated by their common "GDB_" table name prefixes. Note: mdb = Microsoft DataBase. Microsoft Access also "hides" some system tables from view (by default). The following four fields were included in the VOTERS table to correspond to the four data columns shown in Figure 1. The field specifications are listed in Figure 2. All four fields do NOT have a Default value specified. Each Voter ID must be unique so a default value is not useful. Similarly. a default PIN value does not make any sense because there are no common values. A default value could be specified for the Ethnicity and Party fields because there are relatively few possibilities and some values are common (i.e. "White" for Ethnicity). However, default values were not specified for these fields either to insure that the date entry personnel make a conscious choice.
Note: Restricting the text field size to less than 255 does NOT save space in a Microsoft Access database because "Microsoft Access does not reserve space for unused portions of a text field". Text field lengths have been specified to restrict the length of the input strings character fields to the required maximum length and thereby possibly prevent excessive data entry. Unique Index Too Restrictive The VoterID should be unique to help enforce the fundamental rule of one voter versus one vote per election. Uniqueness can be enforced in most relational databases by adding a unique index on the field of interest. Indeed ArcCatalog lets the operator design a new index and specify that it is unique. With a unique index, the database should block the data entry of any value already defined in the unique index. A significant limit with ArcMap is that it refuses to allow data entry with a user defined unique index in place. According to the class instructor (10/4/2003), this index feature is only available to speed up queries once a table is populated. It was interesting to note that the unique index, "FDO_OBJECTID", created by ArcCatalog for table OBJECTID is not a primary key index (as indicated in Access 2000) even thought it apparently behaves like one. Since the OBJECTID is an automatically incrementing long integer "AutoNumber" field, its unique index does NOT restrict duplicate VoterID usage. Domain Specifications ArcView 8.3 supports two classes of Domain specifications; Range Domains and Coded Value Domains. These domain classes can be applied to one or more fields to define an allowable numerical range or a list of valid coded values. Consistent with the Figure 2 specifications, a domain range was applied to the VoterID numeric field to theoretically prevent the entry of a negative number and to allow no more than 5 digits. This was accomplished by specifying a minimum value of zero and a maximum value of "99999" to a "VoterID" Range Domain and then assigning this Range Domain to the VoterID field. However, the range domain is NOT effective in ArcView for standalone tables (as stated by the instructor on the message board 11/4/2003) and verified by testing. Testing also showed that it possible to enter NULL field values via ArcMap even though No NULLs were specified during table construction and the specification is effective within Microsoft Access 2000. The Coded Value Domains are effective with standalone tables and they do more than just restrict the data input to a list of values. A coded value domain also acts as an automatic lookup table. The single character field appears to the user as the complete description as listed below. Two coded value Domains were defined and assigned to their like named table fields; Ethnicity and Party. |
|||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||||||||||||
The following figure shows the Lesson4.mdb Personal Geodatabase properties form with the Domains tab selected within ArcCatalog. The three domain names are listed with Domain Properties and Coded Values also shown for the selected Domain. The "Ethnicity" coded domain properties are shown with a few of its single character codes and their descriptions listed below.
|
|||||||||||||||||||||||||||||||||||||||||||
Figure 3: |
Personal Geodatabase (Lesson4.mdb) "Database Properties" dialog form with some Domains' information shown via ArcCatalog |
||||||||||||||||||||||||||||||||||||||||||
The first 11 rows of the sample data was entered into ArcMap. How the data looked while it was being entered is shown in Figure 5. The following figure shows, in an Access datasheet view, how the data was actually stored. Note: When a data record is deleted in MS Access, the generated "AutoNumber" is NOT reused. In this case, OBJECTID # 11 was deleted. |
|||||||||||||||||||||||||||||||||||||||||||
Figure 4: |
How the sample data looks in Microsoft Access 2000 as actually entered and stored. |
||||||||||||||||||||||||||||||||||||||||||
The first 11 rows of the sample data are shown below in ArcMap. This is how the data looked while it was being entered even though only single characters were entered to indicate the voter's Ethnicity and Political Party affiliation. Since the data is in a single table, the data is very easy to query. |
|||||||||||||||||||||||||||||||||||||||||||
Figure 5: |
How the sample data looks in ArcMap with the Coded Value Domains in effect. |
||||||||||||||||||||||||||||||||||||||||||
ArcView 8.3 is a very impressive GIS system with a wide range of capabilities. For example, GEOG 5222 projects have demonstrated significant "desktop map publishing" capabilities that may be all most user need. (I.E. no need to purchase Adobe PageMaker and/or Corel Draw for example.) In contrast, this project has revealed multiple deficiencies regarding ArcView's use for data entry purposes. In my opinion, most users with significant data entry tasks or data management needs should also purchase Microsoft Access. The ArcView Personal Geodatabase technology utilizes the Microsoft Access native (mdb) file format so these tools are inherently compatible. Note: I have not studied ArcEdit which, as the name implies, is a "powerful data creation and editing environment". The ESRI product contains "all the functionality of ArcView and adds the power to edit features in a multi-user geodatabase." This product may have much better attribute table data entry capabilities then ArcView but I imagine that it costs much more to license than Microsoft Access. Furthermore, I have used multiple desktop databases including INFO, KMAN, Paradox for DOS, Paradox for Windows, FoxPro and Lotus Approach and much prefer Microsoft Access. (My biggest problem with Access are its differences relative to its big brother; Microsoft's own SQL/Server database technology. For example, Access uses double quotes around character strings in queries whereas SQL/Server uses single quotes.) Database limitations found in ArcView while performing this project include:
Microsoft Access does not have these limitations and can do much more. | |||||||||||||||||||||||||||||||||||||||||||
Sources | |||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||||||||||||
Go To the Top of Page |