ADO.NET: DataTables, In-depth
DataTables are designed to store data. Where the data comes from is not a concern of the DataTable, because it is completely autonomous, being independent of, and disconnected from it’s data source. So, whether the data is coming from MS Access, SQL Server, Oracle, Csv Files, or some other data source, it doesn’t matter: A DataTable is going to perform the same.
DataTables are very similar to tables in any other standard database, such as MS Access, SQL Server, Oracle, etc., because they all have Columns, and store rows of data.
Before we look at the commonly used properties of a DataTable, let’s look at two of the most important ones:
Columns: This is a collection of DataColumns that are added to the DataTable.
Rows: This is a collection of the DataRows that are added to the DataTable.
Columns are not designed to store data, but to define the data that is stored in them. For example, is the data going to a string? If so, how long can the string be? Is the data going to be a number? If so, how big will the number be? Will the number contain decimals? Is the data going to be a date? Is the data required? What if the user doesn’t enter a value, what will be the default value?
Here’s a look at the commonly used properties that we can configure when we create a DataColumn for a DataTable:
Allow DB Null: This is a True or False property that indicates whether a column is allowed to store a null value in it. If a column is not allowed to store a null value, then a value must be provided by the user, the DefaultValue property, or programmatically.
AutoIncrement: This is a True of False property that indicates whether the numeric value of the field will increment automatically for each new record that is added. This is especially useful for an ID column, or a column that will serve as they key value for new rows. If this property is set to True, the DataType will automatically be set to Integer (Int32).
AutoIncrementSeed: This is a numeric property that indicates the number to be used to begin automatic incrementing at. For example, if the value is “0”, then the auto increment pattern will be “0, 1, 2, 3…” If the value is set to “100”, then the auto increment pattern will be “100, 101, 102, 103…”
AutoIncrementStep: This is a numeric property that indicates the numeric pattern to use when automatically incrementing values. For example, if the value is “1”, then incrementation will be by 1: “0, 1, 2, 3…” If the value is “2”, then incrementation will be by 2: “0, 2, 4, 6…” If the value is “-1”, then the incrementation will be by –1: “0, –1, –2, –3…”
Caption: This is a String property that stores the default text that will be displayed by user interface controls that display column header information.
DataType: This property defines what type of data can be stored in this column, and can be any valid data type, such as String, Integer, DateTime, Boolean, etc. The default data type is String.
Surprisingly, although the DataSet designer limits the data types that can be entered, any data type can be specified programmatically.
DateTimeMode: This property controls the serialization DateTime format.
DefaultValue: This property indicates the default value that will be used to populate this column in a new row. This value can be anything as long as it corresponds to the selected DataType. DBNull (null) is the standard default value.
Expression: This property allows you to enter an expression that can be used to filter rows, calculate column values, or aggregate related data. For example, if we had a table with 2 integer columns, “A” and “B”, we could add a new column “C” and set the Expression to insert the Sum of columns “A” and “B” into column “C”:
More on Expressions to come later…
ExtendedProperties: This property is a collection (HashTable) that can be used to store custom information of any data type. However, if the table is going to be written to XML, any data that is not of the String data type will not be persisted.
MaxLength: This is an Integer property that allows you to define the maximum length of the data that can be stored in this column. A value of “-1” indicates there is no maximum, while any positive number limits the length of the data. An attempt to add data to the column that is longer than the MaxLength property results in an Error.
Name: This is a String property that sets the name of the DataColumn. This property is used to retrieve the DataColumn from the DataTable’s Columns collection.
NullValue: This property indicates what value will be returned if the column is null: Empty, Nothing and Throw Exception.
ReadOnly: This is a True of False property that indicates whether changes can be made to the data in the column after it’s row has been added to the DataTable.
Unique: This is a True of False property that indicates whether the column should allow duplicates (False) or not (True).
So now that we’ve looked at the standard properties of a DataColumn, let’s look at some examples of how to use them:
Now that we have an example of working with DataColumns in the Employee DataTable, let’s look at working with DataRows.
Rows are designed to actually store the data that gets entered into a DataTable. Rows can be added to a DataTable, retrieved from a DataTable, be evaluated, updated and even removed from a DataTable.
The structure of a row is defined by the DataTable that it belongs. For example, a row that belongs to the Employee table, that we created in the DataColumn section above, will have an EmployeeId column, and FirstName, LastName, FullName and HireDate columns as well.
To start with, let’s look at the commonly used properties of a DataRow.
HasErrors: This is a True or False property that indicates whether there are any errors in the row.
A DataRow does not automatically identify errors, but rows can be programmatically evaluated, and errors can be set and cleared when necessary.
Item: This property provides access to the content of each of the DataColumns in the row. The values in the columns can be both retrieved and set using this property.
ItemArray: This property provides access to the array of content of each of the DataColumns in the row. The values of all of the columns can be retrieved as an array, and set as an array.
RowError: This String property is used to set or clear an error for a row.
A RowError applies to the whole row itself, which means that it is not column specific. The SetColumnError method can be used to set an error that is specific to a column in the row, while the GetColumnError method can be used to retrieve errors that are specific to a column in a row. The GetColumnsInError method can be used to retrieve all of the DataColumns in the row that have errors.
RowState: This property is read-only, and indicates the current state of the Row. A DataRow can have one of the following DataRowStates: Detached, which is the state of a new row that has been created, but has not actually been added to the DataTable; New, which is the state of a new row that has been created and added to the DataTable; Unchanged, which is the state of the row after any pending changes have been committed; Modified, which is the state of a row that has been modified; and Deleted, which is the state of a row that has been deleted (because rows are not actually removed from the DataTable when they are deleted).
If thinking about the RowState confuses you, don’t despair. As we take a deeper look at how DataTables handle rows, the RowState will begin to make a lot more sense.
Table: This property returns a reference to the DataTable that the DataRow belongs to.
Now that we have looked at the standard properties of a DataRow, let’s take a look at how to work with them.
These examples will use the Employees table that we created in the DataColumn section above.
Add a new row to the table, set the column values using the Item property, while examining the auto-calculated columns, default values, and the RowState.
Add a new row to the table, set the column values using the ItemArray property.
Add a new row to the table, commit changes to the new row, retrieve the row from the table’s Rows collection, and modify the row, while analyzing the different states of the RowState property.
Now that we have learned about the properties of a DataRow, let’s take a look at the methods of a DataRow.
AcceptChanges: Commits all pending changes stored in the DataRow and sets the RowState to “Unchanged”.
BeginEdit: Begins the editing process for a DataRow. Once the editing process begins, it can be commited by calling EndEdit, or cancelled by calling CancelEdit.
CancelEdit: Cancels all changes that were made since the BeginEdit method was called, restoring the row to its original state.
ClearErrors: Clears the row of all errors. This includes errors set with the DataRow’s RowError property and the SetColumnError method.
Delete: Sets the RowState property to “Deleted”.
The Delete method does not actually remove the row from the table, it only sets the RowState to “Deleted”. This is necessary, because when a database (such as MS Access, SQL, etc.) needs to be updated with the changes, it needs to know which rows to remove from its table.
EndEdit: Commits all changes that were made since the BeginEdit method was call.
GetChildRows: Retrieves an array of all of the child rows, for the current row, from a related table using the respective DataRelation.
GetColumnError: Retrieves all of the error information for the specified column.
GetColumnsInError: Retrieves an array of all of the columns that have errors.
GetParentRow: Retrieves the parent row, for the current row, from a related table using the respective DataRelation.
GetParentRows: Retrieves the parent rows, for the current row, from a related table using the respective DataRelation.
HasVersion: A True or False value that indicates whether the row is able to return a version of data. The available DataRowVersions are: Current, Default, Original and Proposed.
IsNull: A True or False value that indicates whether the value of a column in a row is null (DBNull) or not.
RejectChanges: Rejects all pending changes stored in the DataRow and restores the RowState to its former state. RejectChanges can be executed after changes have been made, but before AcceptChanges is called.
SetAdded: Changes the DataRow’s RowState to “Added”.
SetColumnError: Sets error text for a particular column.
SetModified: Changes the DataRow’s RowState to “Modified”.
SetParentRow: Changes the parent row, for the curent row, from a related table using the respective DataRelation.
Ok, so now we’ve had a chance to see various methods for the DataRow. Let’s actually take a look at how to use some of the methods (we’re not going to examine working with parent or child rows in this article, because those include working with DataRelations, which we haven’t covered).
Let’s begin by looking at the HasVersion method. This is an excellent place to begin, because in order to fully understand what the other methods are doing, we need to fully understand how row versioning works.
A DataRow can have up to 4 copies of it existing at the same time. Each copy of the row is a complete copy of the entire row, including all columns and their respective data. A DataRow does not always have 4 copies, that’s just the maximum number of copies that can exist at one time. Each copy of the row is not necessarily exactly the same as the other copies of the row, but rather is a copy of the row respective to the types of actions that have been performed on the row.
As mentioned above, the four possible row versions are Current, Default, Original and Proposed. Let’s look at them, but in a different order than listed:
Original: This version of the row contains the original values of each column. This version is created or updated after AcceptChanges is called. This version does not exist when a new row has been created, and added to the table, but AcceptChanges has not been called.
Current: This version of the row contains the current values of each column, and always exists.
Proposed: This version of the row contains proposed values of each column, and only exists between the time that BeginEdit and either EndEdit or CancelEdit are called.
Default: This version of the row depends on the RowState of the row: if the RowState is Added, Modified, Deleted or Unchanged, then the default version is the same as the Current version; however, if the RowState is detached, then the default version is the same as the Proposed version.
Here’s an example of a row with 4 different versions:
Although you could think of each version of the row as a separate row, I hesitate to actually present it that way, because it could become confusing.
Now let’s take a look at a programming example that demonstrates row versions:
Here is the ShowRowVersions method that is being called from the code above:
Looking at the example above, you will notice that there is no Proposed row version. This is because the row has been added, AcceptChanges has been called on the row, but there have been no modifications to the row. To create a Proposed row version, all we need to do is modify the contents of an existing row.
This is a great point to briefly talk about BeginEdit, EndEdit, and CancelEdit. Although these methods are not required when modifying data to a row, they do provide an ability to “roll back” changes that have been made to a DataRow, which discards all pending changes, or the ability to commit changes once they have been finished.
The BeginEdit method begins the editing process, creating the Proposed version of the row, where all changes to the row are temporarily stored in a pending state (awaiting to be committed or discarded). The CancelEdit method discards those pending changes, while the EndEdit method commits them.
Now that we understand that, let’s look at an example of editing a row:
Since we have a table with a row in it that contains changes, and is now in a “Modified” RowState, it’s a perfect time to talk about AcceptChanges and RejectChanges:
Calling AcceptChanges on a DataRow completely commits the changes, copying the values from the Current row version to the Original row version, and then changing the RowState of the Current row version from “Modified” to “Unchanged”.
Calling RejectChanges on a DataRow completely discards the changes, rolling them back by copying the values from the Original row version to the Current row version, and then changing the RowState of the Current row version from “Modified” to “Unchanged”.
Rows can be added, then edited without calling BeginEdit or EndEdit, all we have to do is add the row, then change the row’s Item property. When the value of a column in a row is changed using the Item property, the changes are committed, and the Current row version is automatically updated (although they can still be discarded if RejectChanges is called).
A DataRow also has a Delete method that can be called. The Delete method does not actually remove the row from the table, but rather changes the RowState to “Deleted”, and and removes all versions of the row.
The reason the row is not removed from the table, is because, if the row came from a database, then the database needs to be notified that the row has been deleted. If the row had been completely removed from the table when the delete method was called, the database would have no way of knowing the row was deleted, and should be removed from the database.
** We haven’t discussed how to work with databases at all, that will be covered in a separate blog later, when DataAdapters are discussed. **
Here’s an example:
DataRow Column Errors
A DataRow not only has the ability to store row specific error information in the DataRow’s RowError property, but it also has the ability to store information about errors that occur in particular columns. The error information can be set, retrieved and even cleared using the SetColumnError, GetColumnError, and ClearErrors methods, and the actual array of the columns containing errors can be retrieved using the GetColumnsInError method.
Here’s an example of working with column errors, and working with the IsNull method of the DataRow:
After a DataRow has beed added to a table, and AcceptChanges has been called, setting the RowState to “Unmodified”, the RowState can be programmatically changed to either Added or Modified.
A simple example of why this would be useful is, if you are moving data from one database to another: As soon as you fill the DataTable using a DataAdapter, you can loop through each row in the table and change its RowState to “Added”, then change the DataAdapter’s connection string, and call it’s Update method to insert the records into the destination database.
We haven’t covered DataAdapters in this article, so I’m being very brief in this example. But hopefully you’ll get an idea of why it would be useful to be able to programmatically change the RowState.
Having said, that, here’s an example:
As of this point, we have pretty thoroughly covered the DataColumn and the DataRow, and are now ready to dive into the DataTable.
There are 2 types of DataTables: The “Typed” DataTable that gets added to a “Typed” DataSet that you add to your project and configure in a Designer (in an open project: Project Menu | Add New Item… | DataSet), and the “UnTyped” DataTable that you create and configure programmatically. So far, we have only been working with “UnTyped” DataTables.
Whether we work with a “Typed” or “UnTyped” DataTable, the properties, methods and events remain the same. Let’s start with looking at the properties.
CaseSensitive: A True or False value that indicates whether string comparisons within the table are case sensitive.
An example of this is the DataTable has a Find method (we’ll cover later) that searches for a specified key. If CaseSensitive is True and you are searching for “GARY”, and you enter “gary”, then it will not be found.
ChildRelations: Retrieves a collection of the child relations for the table.
Columns: Retrieves a collection of the table’s DataColumns.
Constraints: Retrieves a collection of the constraints for the table.
DataSet: Returns the DataSet that the table belongs to. If the table doesn’t belong to a DataSet, then Nothing is returned.
DefaultView: Returns the Default DataView for the table.
A DataView is simply a “view” of the data that can be sorted, filtered, and even bound to the controls of a user interface.
DesignMode: True or False value that indicates whether the DataTable is in design more or not. (This will always be False for a “UnTyped” DataTable).
ExtendedProperties: This property is a collection (HashTable) that can be used to store custom information of any data type. However, if the table is going to be written to XML, any data that is not of the String data type will not be persisted.
The ExtendedProperties property of the DataTable works the exact same as the ExtendedProperties property of the DataColumn, except it applies to the table and not to columns. See the ExtendedProperties property of the DataColumn for an example.
HasErrors: True or False value that indicates whether there are errors in any of the rows that are in the table.
IsInitialized: True or False value that indicates whether the DataTable is initialized.
Locale: This property returns the CultureInfo information that is used to compare strings within the table.
MinimumCapacity: An Integer value that indicates how many rows the table will allocate memory for before it is filled with data. The default value is 50.
This property can increase the performance of a DataTable being filled with large amounts of rows from a database, if it is set to the approximate number of rows before being filled. If the data filling the table has more rows than the MinimumCapacity, then ADO.NET will automatically request more memory for the table.
Namespace: This property contains the namespace that will be used when the contents of the DataTable are written to XML (as with the WriteXml method), or loaded from XML into the DataTable.
ParentRelations: Retrieves a collection of the parent relations for the table.
Prefix: This property contains the prefix for the namespace that will be used when the contents of the DataTable are written to XML, or loaded from XML into the DataTable. The DataTable Prefix applies to the DataTable only, while the DataColumn Prefix applies to the DataColumn only.
PrimaryKey: An array of DataColumns that are used as primary keys for the DataTable.
RemotingFormat: The serialization format that indicates whether the DataTable is remoted to Binary or Xml format. The default is XML.
Rows: Retrieves a collection of the table’s DataRows.
TableName: The name of the table.
Now that we are familiar with the common properties of the DataTable, we can begin to look at how to use them.
Since we’ve already looked at the DataColumn and the DataRow above, we’ll demonstrate adding Columns and Rows, then looping through them to access information.
A PrimaryKey is a special “key” for a DataTable that is used to identify rows in a table. When a table has a PrimaryKey, each row in the table has a unique key. Because this key is unique, no other row can have the same exact key.
Many times developers choose to use AutoIncrementing numbers as the PrimaryKey for a table, but anything can be used, as long as it’s unique. For example, you may choose to use First Names, Last Names, Phone Numbers, Social Security Numbers, etc. in a table containing employee information.
However, sometimes First Names are not unique enough, because you may have several people in your table that have the same name. Therefore, you may consider using a combination of First Names and Last Names.
PrimaryKey Combinations of First Names and Last Names are possible because the PrimaryKey can be set to one or more DataColumns within the table.
Here’s an example of working with PrimaryKeys:
The CaseSensitive property is False by default, which is the best of the two possibilities for standard string comparisons that are performed within the DataTable, because it lends itself to more flexibility, and more expected results. It is possible, however, to set this property to True, if you desire string comparisons to perform an exact match.
If CaseSensitive is False, then a search for a FirstName of “gary” will be a direct match for a row containing a FirstName of “GARY”. Which is pretty much what you would expect.
If CaseSensitive is True, then a search for a FirstName of “gary” will return no results, even if there is a row containing a FirstName of “GARY”.
This example will use the Find method of the DataTable to demonstrate CaseSensitive String Comparisons within the DataTable. In order to use the Find method of the DataTable, the table must have a PrimaryKey configured.
The Locale property of the DataTable provides access to the culture info that is used when performing string comparisons within the DataTable. By default, the culture info will be the same as your system’s. However, you may desire to change this.
Although we have not covered DataSets in this article, the DataTable has a DataSet property that returns a reference to the DataSet that the DataTable belongs to, if any, or it returns Nothing if it does not belong to a DataSet.
In brief, a DataSet is conceptually similar to a database, in that it is the “container” that contains DataTables and DataRelations, etc. We’ll cover DataSets in a separate article.
For now, we’ll give a brief example of the DataSet property of the DataTable, and the TableName property of the DataTable:
The DefaultView property of the DataTable returns a DataView, which is a manipulative “view” of the data that can be searched, sorted, and filtered by expression or RowState. We will not embark on a thorough analysis of the DataView here, but will briefly demonstrate accessing it an using some of it’s features.
The Namespace and Prefix properties can be used to set an XML namespace and prefix for your DataTable. This namespace and prefix will be used when the contents of the DataTable are written to XML.
As a note, Namespaces are often URI’s. These URI’s look like they point to a location on the web, but they do not. The purpose of the Namespace is not to point to a location on the web, but to give the namespace a unique name. The URI’s are not even used to look up information. Organizations may use their website url as a namespace, but essentially, it can be anything.
The Prefix is kind of like a variable that points to the Namespace, and allows the Namespace to be easily referenced in the XML document. However, the DataTable has it’s own Prefix, and each DataColumn in the table has it’s own Prefix.
Also notice how the “v” prefixes the table name: “<v:Employee>”
And how the “v” prefixes the column name: “<v:FirstName>”
The example above included in it’s demonstration an example of writing the contents of the DataTable out to an XML file. Not only can the contents of a DataTable be written out in XML format, they can also be written out in a compact binary format. To make this happen, just change the RemotingFormat property to binary, then use a FileStream and BinaryFormatter to write the data out to a file.
The binary output file “data.bin” looks like this:
Here is a list of the commonly used DataTable methods:
AcceptChanges: Commits all pending changes withing a DataTable.
BeginLoadData: Turns off error notifications, index maintenance and constraints while loading.
Clear: Removes all rows from the DataTable, while leaving all columns in place.
Clone: Copies the structure of the DataTable, including schema info and constraints, but not the data.
Compute: Calculates the provided expression on the current rows that meet the filter criteria.
Copy: Copies the structure and data of the DataTable.
CreateDataReader: Creates a read-only, forward-only DataTableReader for the DataTable, which contains the current version all of the data in the table, except for deleted rows.
EndLoadData: Turns on error notifications, index maintenance and constraints while loading.
GetChanges: Retrieves a copy of the DataTable containing all changes made to it since AcceptChanges was called.
GetErrors: Retrieves an array of DataRows that have errors.
ImportRow: Copies a DataRow into the DataTable, including original and current values, while preserving any property settings.
Load: Fills the DataTable with data from a DataReader.
LoadDataRow: Finds and updates a specific row, or creates a new row if now row was found.
NewRow: Creates a new DataRow that has the same schema (columns) as the DataTable.
ReadXml: Reads XML schema and data from an XML stream into the DataTable.
ReadXmlSchema: Reads XML schema only (no data) from an XML stream into the DataTable.
RejectChanges: Rolls back all pending changes that have been made to the data in the table since AcceptChanges was called.
Reset: Resets the DataTable to it’s original state. If you programmatically created the DataTable, then it will remove all columns and rows from the table.
Select: Retrieves an array of DataRows that match the filter criteria.
WriteXml: Writes the contents of the DataTable to an XML stream.
WriteXmlSchema: Write the DataTable’s structure as an XML schema to an XML stream.
So now that we’ve had a chance to learn about the methods of the DataTable, let’s take a look at how to use them.
To begin with, the DataTable has a BeginLoadData and a EndLoadData methods. As indicated above, they turn Off and On (respectively) error notifications, index maintenance and constraints.
Note: this table will be used for all examples below, unless specified.
The Clone method of the DataTable copies the structure, schema info and constraints, but not the data.
Here’s an example of using the Clone method, the ImportRow method, the Select method, and the Clear method:
The Copy method can be used to create a copy of the DataTable, including the structure of the table, and all rows of data.
Also, the Compute method can be used to perform an aggregate calculation for a single column on rows of data based off of the provided expression.
This example demonstrates the Copy method, Compute method and the Reset method:
A DataTable also has the ability to create a DataReader, which is a read-only, forward-only copy of the data. Many times DataReaders provide better performance than a DataTable, because DataReaders move quickly through the data, discarding each row after it has passed it. A DataTable can also use a DataReader to load all of the data.
Here’s an example of using the CreateDataReader method and the Load method:
The DataTable’s GetErrors method can be used to retrieve all of the rows in the table that have errors (including row errors and column errors), then each error can be analyzed and processed as desired.
When changes occur in a DataTable, the GetChanges method can be used to retrieve an array of rows containing those changes. These changes can be looped through, analyzed and processed, or they can be submitted to a database for updates, such as when there is a hierarchical structure in place and updates need to be made in a particular order.
Here’s an example of how to use the GetChanges method to retrieve all of the rows in the DataTable that have been modified, plus the AcceptChanges and RejectChanges methods:
The final methods to look at are methods that support the DataTable’s abilities to read and write from XML stream. Those methods are: ReadXml, ReadXmlSchema, WriteXml, and WriteXmlSchema.
These are some of my favorite aspects of a DataTable! If you have a DataTable, and you want to output its contents to a file to save to hard-disk, or to send to someone else, it can be accomplished with 1 line of code!