Examine DataTable data

User avatar
agibsonsw
SilverLounger
Posts: 2403
Joined: 05 Feb 2010, 22:21
Location: London ENGLAND

Examine DataTable data

Post by agibsonsw »

Hello. Visual Basic 2010.

I've obtained a DataTable which contains the data from a csv file, and I can see that it's populated from its Count property. But I'm unable to get at/display the data in, for example, the Console. I can see it's there (by debugging and spotting it in an ItemList), but I've been going around in circles for ages now :groan:

1) How can I display data from any particular row and/or column? I've been trying to bung it into some sort of array, but this isn't working (as yet).
2) How can I next transfer all the data into an array? And can I do this in one go?

Hoping someone can push me in the right direction :cheers:

Code: Select all

        Dim strConn As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:" _
            & "\Users\Andrew\Documents\;Extended Properties='text;HDR=Yes;FMT=Delimited'"

        Dim objConn As New System.Data.OleDb.OleDbConnection(strConn)
        objConn.Open()
        Dim dt As New DataTable()
        Dim cmd As New OleDb.OleDbCommand("SELECT * FROM Employees.csv", objConn)
        Dim dba As New OleDb.OleDbDataAdapter
        dba.SelectCommand = cmd
        dba.Fill(dt)
        Dim arrData As New ArrayList(dt.Rows.Count)
        For Each aRow As DataRow In dt.Rows
            arrData.Add(aRow)
        Next
        'Dim arr = arrData.ToArray() ??
        For Each item In arrData
            Console.Write(CType(item, DataRow).ToString)    'no workee
        Next 'item

        objConn.Close()
"I'm here to save your life. But if I'm going to do that, I'll need total uninanonynymity." Me Myself & Irene.

User avatar
agibsonsw
SilverLounger
Posts: 2403
Joined: 05 Feb 2010, 22:21
Location: London ENGLAND

Re: Examine DataTable data

Post by agibsonsw »

Hi. After many failed attempts, blind alleys (and a few hours!) I've got this to work :fanfare:

I push the data into a DataReader, then Load it into a DataTable which can be read like an array, or iterated:
The DataReader should be fast, as it's forward- and read-only;
A DataTable is an in-memory representation of the data, so the Connection can quickly be closed;
DataTables also preserve the data types for the field values;
I no longer need to create and populate a separate array structure.

But I need to make this as fast as possible and would welcome any advice, particularly as I'm quite new to VB.Net. (There is a v.large set of data - 480,000 rows - and a large number of calculations are performed against it.)

1) Is VB.Net constantly performing type conversions when reading a DataTable? If so, might it be beneficial to copy the data into some other structure?
2) It just occurred to me that I could specify the data types for the DataTable (I'm guessing). Presumably this is advisable?
3) Do I need to explicitly Clear and Dispose the DataTable, or will the GC take care of this?
4) Is there a better way?

I know the best advice would be to modify the SQL and not retrieve all the records :laugh:. Andy.

Code: Select all

 Dim strConn As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:" _
            & "\Users\Andrew\Documents\;Extended Properties='text;HDR=Yes;FMT=Delimited'"
        Dim objConn As New System.Data.OleDb.OleDbConnection(strConn)
        objConn.Open()
        Dim cmd As New OleDb.OleDbCommand("SELECT * FROM Employees.csv", objConn)
        Dim dr As OleDb.OleDbDataReader = cmd.ExecuteReader()
        Dim dTbl As New DataTable
        dTbl.Load(dr)
        dr.Close()
        objConn.Close() 'Close the connection as soon as we can

        Dim intRows As Integer = dTbl.Rows.Count - 1
        Dim intCols As Integer = dTbl.Columns.Count - 1

        For i As Integer = 0 To intRows     'or dTbl.Rows.Count - 1
            For j As Integer = 0 To intCols     'or dTbl.Columns.Count - 1
                Console.Write(dTbl.Rows(i)(j).ToString & ",")
            Next 'j
            Console.WriteLine()
        Next 'i
        'or could use For..Each, if we want to iterate the whole table:
        For Each row As DataRow In dTbl.Rows
            For Each col As DataColumn In dTbl.Columns
                Console.Write(row(col).ToString & ", ")
            Next 'col
            Console.WriteLine()
        Next 'row
        Console.WriteLine(dTbl.Rows(0)(0).GetType)  'preserves type;
        'that is, a DataTable is a "special" array.
        dTbl.Clear()    'necessary?
        dTbl.Dispose()
"I'm here to save your life. But if I'm going to do that, I'll need total uninanonynymity." Me Myself & Irene.

User avatar
agibsonsw
SilverLounger
Posts: 2403
Joined: 05 Feb 2010, 22:21
Location: London ENGLAND

Re: Examine DataTable data

Post by agibsonsw »

Well for anyone who might be interested.. :grin: I suspect that the code below is the best way to grab, and manipulate, a very large .csv file in VB.Net.

I (suspect) the only way this might be done faster would be to use older technology such as ADO, or possibly ODBC. However, although this might retrieve the data faster, the advantage would (probably) be lost because the data would have to be fed manually into an array. The ADO.Net/DataTable's Load method does this in one fell swoop, and also allows us to immediately then close the database connection.

But I wouldn't be offended if someone knew of a faster method :cheers:

Code: Select all

 Dim strConn As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:" _
            & "\Users\Andrew\Documents\;Extended Properties='text;HDR=Yes;FMT=Delimited'"
        Dim objConn As New System.Data.OleDb.OleDbConnection(strConn)
        objConn.Open()
        'Dim cmd As New OleDb.OleDbCommand("SELECT * FROM Employees.csv", objConn)   '* returns all fields
        'It's more efficient to return specific fields, and only those that are necessary:
        Dim cmd As New OleDb.OleDbCommand("SELECT EmployeeID, FirstName, LastName, Address, " & _
                                          "City, StateOrProvince, PostalCode, Country, PhoneNumber, " & _
                                          "EmailAddress, Salary, DepartmentID FROM Employees.csv", objConn)
        'If possible, add a criteria to reduce the number of rows returned:
        ' ".. FROM Employees.csv WHERE Salary > 20000"
        Dim dr As OleDb.OleDbDataReader = cmd.ExecuteReader(CommandBehavior.CloseConnection)
        'CloseConnection - the Connection will close automatically when the reader is closed.
        If dr.HasRows = False Then
            Console.WriteLine("No rows to process.")
            Exit Sub
        End If

        Dim dTbl As New DataTable("Emp")
        With dTbl.Columns
            .Add("EmployeeID", GetType(Integer))
            .Add("FirstName", GetType(String))
            .Add("LastName", GetType(String))
            .Add("Address", GetType(String))
            .Add("City", GetType(String))
            .Add("StateOrProvince", GetType(String))
            .Add("PostalCode", GetType(String))
            .Add("Country", GetType(String))
            .Add("PhoneNumber", GetType(String))
            .Add("EmailAddress", GetType(String))
            .Add("Salary", GetType(Double))
            .Add("DepartmentID", GetType(String))
        End With
        dTbl.Load(dr)
        dr.Close()
        'objConn.Close() 'Close the connection as soon as we can - the CloseConnection 
        'argument, specified for the reader, achieves this automatically.

        Dim intRows As Integer = dTbl.Rows.Count - 1
        Dim intCols As Integer = dTbl.Columns.Count - 1 'but we know there are 12 columns, 
        'so just set this to 12.

        For i As Integer = 0 To intRows     'or dTbl.Rows.Count - 1
            For j As Integer = 0 To intCols     'or dTbl.Columns.Count - 1
                Console.Write(dTbl.Rows(i)(j).ToString & ",")
            Next 'j
            Console.WriteLine()
        Next 'i
        Console.WriteLine("First Salary is " & dTbl.Rows(0)("Salary").ToString)

        'or could use For..Each, if we want to iterate the whole table:
        For Each row As DataRow In dTbl.Rows
            For Each col As DataColumn In dTbl.Columns
                Console.Write(row(col).ToString & ", ")
            Next 'col
            Console.WriteLine()
        Next 'row

        'we can filter/slice the data using the Select() method:
        Dim expRows As DataRow() = dTbl.Select("Salary > 40000", "FirstName DESC")
        If expRows.Length <= 0 Then
            Console.WriteLine("None over 40k.")
            Exit Sub
        End If
        For Each row As DataRow In expRows
            For Each col As DataColumn In row.Table.Columns
                Console.WriteLine(row(col).ToString)
            Next 'col
        Next 'row

        Console.WriteLine(dTbl.Rows(0)(0).GetType)  'preserves type;
        'that is, a DataTable is a special/typed array.
        'dTbl.Clear()    'not necessary if we use ..
        dTbl.Dispose()  'but this is not necessary as there are no unmanaged 
        'resources with a DataTable. But for a large DataTable we can call 
        'Dispose() when we've finished with it.
"I'm here to save your life. But if I'm going to do that, I'll need total uninanonynymity." Me Myself & Irene.