GEOG 5223 Project 4

Database Design

by Tom Wells -  November 2003

Fig.1 - Input Data

Fig.2 - Table Specs

Fig.3 - Domains

Fig.4 - Access View

Fig.5 - ArcMap View 

Comments 

Sources

Lesson 4 introduces the uninitiated student to the basics of relational database design.  The lesson describes a significant database design example and then asks the student to apply the methodology to a simpler case.  The student must examine a sample hard-copy voter registration list and then design an efficient ArcView table structure for its possible replacement that is functional for data entry and meets the following database design criteria.
  • Minimize disk storage space.
  • Minimize data entry effort.
  • Decrease the chance of data entry mistakes.
  • Allow for easy and efficient usage within ArcMap.

Project 4 deliverables include a "Personal Geodatabase" capable of holding the sample data plus this project page that describes the database and justifies the design decisions.


Sample Input Data

The following scanned image of a hand copied (and fictional) voter registration list represents the kind of data that the required database must be capable of  holding.  


Figure 1: Example hand-written voter registration list. - This hard to read example demonstrates a reason for computerization of data lists; legibility.  


Database Design

ArcCatalog and ArcMap (part of ArcView 8.3) make it possible to design a single table that can hold the above data with a minimum of redundant information (via "Attribute Domains" as discussed below).  In a relational database, a table consists of  a fixed number of data columns combined with a nearly endless number of potential data rows forming what is sometimes called a data-grid.  (A database table is often presented in a spreadsheet like data-grid format but it can also be tied to a "form" which offers many presentation possibilities.)

Each column (often called a database field) is designed to hold a certain type of data.  The data definition can be very general allowing a wide range of data types (binary blobs) or very restrictive (True / False) to limit the data to very specific codes.  For this replacement project, a table was designed with the specifications listed in Figure 2.  The specifications are followed with a discussion of the design rational. 

Note: Relational databases are called relational because of the permanent or temporary relationships used to link multiple tables together into larger "logical" tables.  In this case, it would be possible to break the example table into three tables; the main data table and two lists of valid values.  However, making use of the ArcView Attribute Domain feature is considerably simpler and more efficient since only a single table is needed.



Field Order

Field Name

Field Type

Nulls Allowed

Allow Zero Length

Domain

Indexed

1

OBJECTID

Long Integer (Increment)

No

NA

 

Yes (no duplicates)

2

VoterID

Long Integer

No

NA

Range*

Not Supported

3

PIN

Text(11)

No

No

 

 

4

Ethnicity

Text(1)

No

No

Coded

 

5

Party

Text(1)

No

No

Coded

 

 

Figure 2: Relational Database Table "VOTERS" Specifications - Note: Field # 1 was created by ArcCatalog, "NA" = Not Applicable, and * indicates specified but NOT effective.


Database Table Design

The data list shown in Figure 1 has four data columns whereas the specifications shown in Figure 2 describes 5 data columns / fields.  The extra field, field # 1, was created by ArcCatalog when it was asked to create a new table within the Project # 4 personal geodatabase (Lesson4.mdb).  As the data field # 1 name "OBJECTID" implies, this field is used by ArcView  to keep track of data objects.  By having ArcCatalog create the personal geodatabase mdb, instead of Microsoft Access, an extended database structure is created with ArcView specific tables. Twenty six ArcView specific tables are included in the personal geodatabase mdb file as indicated by their common "GDB_" table name prefixes.  Note: mdb = Microsoft DataBase.  Microsoft Access also "hides" some system tables from view (by default).

The following four fields were included in the VOTERS table to correspond to the four data columns shown in Figure 1.  The field specifications are listed in Figure 2.  All four fields do NOT have a Default value specified.  Each Voter ID must be unique so a default value is not useful. Similarly. a default PIN value does not make any sense because there are no common values.  A default value could be specified for the Ethnicity and Party fields because there are relatively few possibilities and some values are common (i.e. "White" for Ethnicity).  However, default values were not specified for these fields either to insure that the date entry personnel make a conscious choice.

VOTERS Data Fields

 VoterID The lesson specifically states that the VoterID is "just an arbitrary number".   An examination of the data supports the statement and shows that the numbers are integers with as many as 5 digits.  The example data would fit in a Short Integer (2 bytes) field type but a Long Integer (4 bytes) is considered safer since 5 digits can exceed the capacity of a (signed) Short Integer (-32,768 to 32,767).  Each voter must have a VoterID to uniquely identify each voter (people can share the same legal name).  Therefore, "Null Allowed" is specified as "No" to insure that data is entered. (Unfortunately, it does NOT work in ArcMap, see the Comments below.)
 PIN The Property Identification Number (PIN) example data always has two dashes separating three sets of digits/characters.  The character position of the dashes varies because there may be 1, 2 or 3 characters grouped together on either side of the dashes.  Therefore, a 11 character text field was specified to contain a maximum of three sets of 3 digit numbers separated by two dashes ( 3*3 + 2 = 11) .  No Nulls or zero length strings are allowed to ensure that the property tax parcel ID number of the voter's residence is entered.  (The tax parcel ID can theoretically be tied to the voting district.)
 Ethnicity Ethnicity classifications are limited to a supplied list of only 6 possibilities.  Therefore it is possible to specify a voter's ethnicity with a single character via a "coded value domain" as discussed below.  No Nulls or zero length strings are allowed to ensure that the voter's ethnicity is entered (there is an "Other" category).
 Party Likewise, political party affiliations are limited to only 6 possibilities.  A single character can be used to specify 26 possibilities even if restricted to the regular alphabet.  No Nulls or zero length strings are allowed to ensure that the voter's political party is entered.  Again, there is an "Other" category for fringe political parties.

Note: Restricting the text field size to less than 255 does NOT save space in a Microsoft Access database because "Microsoft Access does not reserve space for unused portions of a text field".  Text field lengths have been specified to restrict the length of the input strings character fields to the required maximum length and thereby possibly prevent excessive data entry.

Unique Index Too Restrictive

The VoterID should be unique to help enforce the fundamental rule of one voter versus one vote per election.  Uniqueness can be enforced in most relational databases by adding a unique index on the field of interest.  Indeed ArcCatalog lets the operator design a new index and specify that it is unique.  With a unique index, the database should block the data entry of any value already defined in the unique index.  A significant limit with ArcMap is that it refuses to allow data entry with a user defined unique index in place.  According to the class instructor (10/4/2003), this index feature is only available to speed up queries once a table is populated.

It was interesting to note that the unique index, "FDO_OBJECTID", created by ArcCatalog for table OBJECTID is not a primary key index (as indicated in Access 2000) even thought it apparently behaves like one.  Since the OBJECTID is an automatically incrementing long integer "AutoNumber" field, its unique index does NOT restrict duplicate VoterID usage. 

Domain Specifications

ArcView 8.3 supports two classes of Domain specifications; Range Domains and Coded Value Domains.  These domain classes can be applied to one or more fields to define an allowable numerical range or a list of valid coded values.  

Consistent with the Figure 2 specifications, a domain range was applied to the VoterID numeric field to theoretically prevent the entry of a negative number and to allow no more than 5 digits.  This was accomplished by specifying a minimum value of zero and a maximum value of "99999" to a "VoterID" Range Domain and then assigning this Range Domain to the VoterID field.  However, the range domain is NOT effective in ArcView for standalone tables (as stated by the instructor on the message board 11/4/2003) and verified by testing.  Testing also showed that it possible to enter NULL field values via ArcMap even though No NULLs were specified during table construction and the specification is effective within Microsoft Access 2000.

The Coded Value Domains are effective with standalone tables and they do more than just restrict the data input to a list of values.  A coded value domain also acts as an automatic lookup table.  The single character field appears to the user as the complete description as listed below.  Two coded value Domains were defined and assigned to their like named table fields; Ethnicity and Party.

 

Ethnic Groups (Domain "Ethnicity")

Political Parties (Domain "Party")

  • American Indian

  • Asian

  • Black

  • Hispanic

  • White

  • Other

  • Extremist

  • Federalist

  • Mugwumps

  • Purple

  • Whigs

  • Other

 
Both domain lists are unique with respect to the first character with one exception.  The Ethnicity domain has two entries that begin with the letter "A".  Since one of the these two entries is also the only two word description, "American Indian", the second word was chosen to make the list unique and thereby make data entry a simple one character entry.  I.E., in both Coded Value Domains, the character code entered to represent the category matches the first character of the description with that one (relatively rare) exception.

The following figure shows the Lesson4.mdb Personal Geodatabase properties form with the Domains tab selected within ArcCatalog.  The three domain names are listed with Domain Properties and Coded Values also shown for the selected Domain.  The "Ethnicity" coded domain properties are shown with a few of its single character codes and their descriptions listed below. 

 


Figure 3:

Personal Geodatabase (Lesson4.mdb) "Database Properties" dialog form with some Domains' information shown via ArcCatalog


Microsoft Access Data View

The first 11 rows of the sample data was entered into ArcMap.  How the data looked while it was being entered is shown in Figure 5.  The following figure shows, in an Access datasheet view, how the data was actually stored.

Note: When a data record  is deleted in MS Access, the generated "AutoNumber" is NOT reused.  In this case, OBJECTID # 11 was deleted.


Figure 4:

How the sample data looks in Microsoft Access 2000 as actually entered and stored.


Microsoft ArcMap Data View

The first 11 rows of the sample data are shown below in ArcMap.  This is how the data looked while it was being entered even though only single characters were entered to indicate the voter's Ethnicity and Political Party affiliation.  Since the data is in a single table, the data is very easy to query.


Figure 5:

How the sample data looks in ArcMap with the Coded Value Domains in effect.


Comments

ArcView 8.3 is a very impressive GIS system with a wide range of capabilities.  For example, GEOG 5222 projects have demonstrated significant "desktop map publishing" capabilities that may be all most user need.  (I.E. no need to purchase Adobe PageMaker and/or Corel Draw for example.)    In contrast, this project has revealed multiple deficiencies regarding ArcView's use for data entry purposes.  In my opinion, most users with significant data entry tasks or data management needs should also purchase Microsoft Access.  The ArcView Personal Geodatabase technology utilizes the Microsoft Access native (mdb) file format so these tools are inherently compatible. 

Note: I have not studied ArcEdit which, as the name implies, is a "powerful data creation and editing environment".  The ESRI product contains "all the functionality of ArcView and adds the power to edit features in a multi-user geodatabase."  This product may have much better attribute table data entry capabilities then ArcView but I imagine that it costs much more to license than Microsoft Access.

Furthermore, I have used multiple desktop databases including INFO, KMAN, Paradox for DOS, Paradox for Windows, FoxPro and Lotus Approach and much prefer Microsoft Access.  (My biggest problem with Access are its differences relative to its big brother; Microsoft's own SQL/Server database technology.  For example, Access uses double quotes around character strings in queries whereas SQL/Server uses single quotes.)

Database limitations found in ArcView while performing this project include:

  • Unable to specify an Attribute Domain Range that is functional with a standalone data table.  Furthermore, Attribute Domain Ranges are apparently NOT linked to the table's Microsoft Access "Validation Rule" settings. 

  • Unable to specify a unique index that supports data entry in a data table. 

  • It is possible to specify field level "Allow Null Values" in ArcCatalog that are saved as "Required" properties in MS Access (i.e. these properties are linked) but ArcMap does not obey the settings.

  • ArcCatalog does not offer a field level "Allow Zero Length" string setting like Microsoft access does (it is always set to "no" by ArcCatalog). NULL values can indicate that a value is unknown whereas a zero length string can indicate that there is no value.

Microsoft Access does not have these limitations and can do much more.



Sources


ESRI web site, http://www.esri.com/software/arcgis/arceditor, Accessed November 5, 2003 

GEOG 5223 Lesson 4, https://cms.psu.edu, Accessed November 2003 

GEOG 5223 Lesson 4 Message Board, Accessed November 4-5, 2003 (asked questions to verify ArcView limitations)

Microsoft Access 2000 Help file, ACMAIN9.CHM, dated March 18, 1999

Go To the Top of Page