Identifying Corrupt Workbooks

jstevens
GoldLounger
Posts: 2628
Joined: 26 Jan 2010, 16:31
Location: Southern California

Identifying Corrupt Workbooks

Post by jstevens »

I'm having a challenge identifying corrupt workbooks with this bit of code. I know that the files on my Desktop are corrupt and if I try to open them I receive one of these error messages. The second error message is not being picked up in the VBA code.
EL_110.png

Code: Select all

Sub CheckCorruptFiles2()
    Dim folderPath As String
    Dim fileName As String
    Dim wb As Workbook
    
    ' Set the folder path where your Excel files are located
    folderPath = "C:\Users\Admin\Desktop\"
    
    ' Disable screen updating for faster execution
    Application.ScreenUpdating = False
    
    ' Loop through each file in the folder
    fileName = Dir(folderPath & "*.xls*")
    Do While fileName <> ""
        ' Open the file in read-only mode
        Set wb = Workbooks.Open(folderPath & fileName, ReadOnly:=True)
        
        ' Check if the file generates an error upon opening
        If Err.Number <> 0 Then
            Debug.Print "Corrupt File: " & fileName
            Err.Clear
        End If
        
        ' Close the file without saving changes
        wb.Close SaveChanges:=False
        
        ' Move to the next file
        fileName = Dir
    Loop
    
    ' Enable screen updating
    Application.ScreenUpdating = True
    
    ' Inform that the process is complete
    MsgBox "Corrupt file check completed.", vbInformation
End Sub
Your thoughts are appreciated.
You do not have the required permissions to view the files attached to this post.
Regards,
John

User avatar
ChrisGreaves
PlutoniumLounger
Posts: 15621
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky

Re: Identifying Corrupt Workbooks

Post by ChrisGreaves »

jstevens wrote:
25 Jun 2023, 16:59
I'm having a challenge identifying corrupt workbooks with this bit of code. I know that the files on my Desktop are corrupt ...
Hello jstevens.
(1) How many workbook files do you have on your system?
(2) How many do you suspect (but do not yet know) are corrupt.
A quick look at your code suggests that you are trying to open each workbook in turn, and of course you need to sit by the machine to confirm/close the pop-up box that appears for a corrupt workbook.
Am I right?
A better strategy might be to loop on each file, attempt to open it, and report if you were able to open it.

If you start with a list of all your files and produce a list of all the files you were able to open, subtract the second list from the first list, you have a list of corrupt files.

To deal with the pop-up box, I suggest that you initiate a MSWord job before the loop of opening workbooks in Excel. The MSWord job merely checks every three seconds to see if an Excel log file has been updated.
If the log file has been updated, then the excel job is running fine.
If the log file has NOT been updated, then the excel job has run into a undefinable (to MSWord) problem, and MSWord can record that file name.

I did this back in the days of WinXP very easily.
The job was restartable, so the monitor (in your case Word monitoring Excel; in my case Excel monitoring Word) just caused a reboot. The startup invoked the appropriate Excel/Word job, which picked up from the file AFTER the problem.

This strategy applies to a great many situations where we need to test files, but would rather not sit up all night waiting for a pop-up box.

Cheers, Chris
Last edited by ChrisGreaves on 25 Jun 2023, 18:37, edited 1 time in total.
There's nothing heavier than an empty water bottle

jstevens
GoldLounger
Posts: 2628
Joined: 26 Jan 2010, 16:31
Location: Southern California

Re: Identifying Corrupt Workbooks

Post by jstevens »

Hi Chris,

I have a DMG file containing a good number of corrupt files: Excel, Word, PDF and etc.. I'm not sure which ones are bad but want to start with the Excel files. I do have backups of the DMG file and its contents but it appears they are corrupt as well. I utilize backup software but was experimenting at one time with DMG.

My strategy here is to purge the corrupt files via an automated process. The VBA code I provided was my first attempt in identifying the Excel files that are corrupt.

The MSWord job is an interesting approach. Would you kindly provide the steps involved and I'll give it a try.
Regards,
John

User avatar
ChrisGreaves
PlutoniumLounger
Posts: 15621
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky

Re: Identifying Corrupt Workbooks

Post by ChrisGreaves »

jstevens wrote:
25 Jun 2023, 18:05
The MSWord job is an interesting approach. Would you kindly provide the steps involved and I'll give it a try.
Hi jstevens.
Give me a day and I'll make up a simple general-purpose utility to test a variety of files - you mentioned Excel, Word, PDF ...
If you can use it to test your Excel workbooks, where you know some of your workbooks are corrupt, then you should be able to adapt it to other types of file.

The basic premise is that you try to open a file under VBA program control, and if the file doesn't open, then you have a problem-file, correct?
Cheers, Chris
There's nothing heavier than an empty water bottle

jstevens
GoldLounger
Posts: 2628
Joined: 26 Jan 2010, 16:31
Location: Southern California

Re: Identifying Corrupt Workbooks

Post by jstevens »

Chris,

You are correct..if the file doesn't open then it most likely is corrupt.
Regards,
John

User avatar
ChrisGreaves
PlutoniumLounger
Posts: 15621
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky

Re: Identifying Corrupt Workbooks

Post by ChrisGreaves »

jstevens wrote:
25 Jun 2023, 18:50
You are correct..if the file doesn't open then it most likely is corrupt.
I define "corrupt" in a loose form as "can't be opened".
This is defined as "anything that causes a pop-up to appear, and in doing so suspends execution of the Word program".

Now I do have several self-starting processes, so a Word document that auto-starts a process that runs for more than five seconds would be detected by an Excel monitor (with the default interval set to five seconds) to classify the word document as "corrupt".


"2023-06-25 CG 11116: First version, works its way through a list of files and then halts normally 001".
Now to write the Excel Monitor.xls. More tomorrow.
Cheers, Chris
There's nothing heavier than an empty water bottle

User avatar
Jay Freedman
Microsoft MVP
Posts: 1318
Joined: 24 May 2013, 15:33
Location: Warminster, PA

Re: Identifying Corrupt Workbooks

Post by Jay Freedman »

The second error message can occur if a workbook is saved as .xlsx and the extension is later changed to .xls. That doesn't mean that the content of the file is necessarily damaged; changing the extension back to .xlsx may enable Excel to open it.

User avatar
ChrisGreaves
PlutoniumLounger
Posts: 15621
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky

Re: Identifying Corrupt Workbooks

Post by ChrisGreaves »

Jay Freedman wrote:
25 Jun 2023, 23:29
The second error message can occur if a workbook is saved as .xlsx and the extension is later changed to .xls. That doesn't mean that the content of the file is necessarily damaged; changing the extension back to .xlsx may enable Excel to open it.
Hello Jay.
I agree that the second message identifies a discrepancy in "the naming of the file"(?) rather than in "the corruption of the file", and furthremore we have not defined "corrupt".
Since both popup messages indicated a hurdle in opening a file without user intervention (to the extent of choosing "OK" to dismiss the message) I thought that both messages indicated to the user (jstevens) a form of corruption to the user.

Back in my document-conversion days, anything that inhibited a run of 10,000 documents (such as happened when the WP converters of Win97 could not locate an image file) counted - to my mind - as a corruption, since at 6am the next morning I had NOT managed to convert 10,000 documents.

That too is why I asked jstevens
(1) How many workbook files do you have on your system?
(2) How many do you suspect (but do not yet know) are corrupt.

in order to get some idea of the magnitude of the problem.

If only one file in a folder-tree of 20 files is corrupt, then automation is a waste of time, but if (say) one in every ten files is corrupt, then a list of those files goes a long way to resolving the problem.

In the document conversion task, a client with a collection of WP51 documents with "failed images" would be asked if they were serious about "flawless conversion (and cleansing)", and if they were I might ask them to first clean up their image problem. (In a manner of speaking!)

Cheers, Chris
There's nothing heavier than an empty water bottle

jstevens
GoldLounger
Posts: 2628
Joined: 26 Jan 2010, 16:31
Location: Southern California

Re: Identifying Corrupt Workbooks

Post by jstevens »

ChrisGreaves wrote:
26 Jun 2023, 06:30

That too is why I asked jstevens
(1) How many workbook files do you have on your system?
(2) How many do you suspect (but do not yet know) are corrupt.

in order to get some idea of the magnitude of the problem.
Chris,

(1) Workbook count is greater than 30,000
(2) Based on my sampling those that I opened manually failed to open encountering one of the two error messages. I was in the process of determining which files are corrupt based on the VBA code I provided.

None of the file extensions have been changed.
Regards,
John

User avatar
ChrisGreaves
PlutoniumLounger
Posts: 15621
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky

Re: Identifying Corrupt Workbooks

Post by ChrisGreaves »

jstevens wrote:
26 Jun 2023, 14:02
(2) Based on my sampling those that I opened manually failed to open encountering one of the two error messages. I was in the process of determining which files are corrupt based on the VBA code I provided.
Thank you John,
I understand that you are looking for a mechanical method of counting corrupt files, but based on your experience to this minute, what would be your best estimate of the proportion?
For example, if you have tried to open manually 100 workbooks and had problems with 10 of them, you would estimate (but of course, not guarantee) a failure rate of ten percent.
I am putting together a package for the first part of the system.
Cheers, Chris
There's nothing heavier than an empty water bottle

jstevens
GoldLounger
Posts: 2628
Joined: 26 Jan 2010, 16:31
Location: Southern California

Re: Identifying Corrupt Workbooks

Post by jstevens »

Chris,

I have opened 100 random files over various folder-tree structures and my corrupt run rate is 100%.
Regards,
John

User avatar
ChrisGreaves
PlutoniumLounger
Posts: 15621
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky

Re: Identifying Corrupt Workbooks

Post by ChrisGreaves »

jstevens wrote:
26 Jun 2023, 18:42
(1) Workbook count is greater than 30,000
I have opened 100 random files over various folder-tree structures and my corrupt run rate is 100%.
Hello John, so right now your best estimate (and it is only an estimate) would be that 100% of your 30,000 workbooks are corrupt?
If so then we don't need a program to identify corrupt workbooks, do we?

(I will keep on working, though)
Cheers, Chris
There's nothing heavier than an empty water bottle

jstevens
GoldLounger
Posts: 2628
Joined: 26 Jan 2010, 16:31
Location: Southern California

Re: Identifying Corrupt Workbooks

Post by jstevens »

Chris,

I have two other copies of the DMG file which may or may not contain the exact corrupt file structure. I will continue with my code and run it on all three DMG files. It will be automated so I don't plan on watching the process work its way through the folder-tree structure.

I"ll let you decide on your general-purpose utility. You have a better idea of the time it would take to put the finishing touches on the utility and an approximate processing duration since you did something like this before.
Regards,
John

User avatar
ChrisGreaves
PlutoniumLounger
Posts: 15621
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky

Re: Identifying Corrupt Workbooks

Post by ChrisGreaves »

Hi John.
A day of interruptions here!
The file at http://www.chrisgreaves.com/Downloads/Deploy.zip holds the MSWord side of the two-program system.

(1) Build yourself a file list OF MSWORD DOCUMENTS (as shown in "TesterFiles.txt", and store it on a memory key in drive D: or wherever.
(2) Open the Testr002.dot and change the value of Public Const strcFileList
(3) Still in Testr002.dot, use Tools, References and hook up the supplied library UW.dot.
(4) Save your installed version under a new name and run it, rather than the unmodified original.

"Sub Tester" should run until it hits a "corrupt" file (one that stalls on a pop-up message box of some kind)
In the full system, Excel will cut in, Kill MSWord and then restart the MSWord job.
But for today you will use Task Manager to kill MSWord, then reload your modified version of Testr.dot, and rerun "Sub Tester".

This second time, "Sub Tester" ought to pick up where it left off, skipping over the corrupt file until it stalls again.
and so on ...

Once the Excel application is plugged in, that Excel application will deliver to you a list of the corrupt documents.

Finally we will make the two applications open Excel files and report on those.

I discarded my original code ten years ago; that actually rebooted a dedicated machine; this rewrite allows me to continue using the laptop, browser, email etc, while the two MS applications slog it out.

You will find logging files in a folder similar to "C:\Users\Chris077\AppData\Roaming\Greaves\Testr"

More tomorrow.
Cheers, Chris
There's nothing heavier than an empty water bottle

jstevens
GoldLounger
Posts: 2628
Joined: 26 Jan 2010, 16:31
Location: Southern California

Re: Identifying Corrupt Workbooks

Post by jstevens »

Chris,

I was able to run "Sub Tester" during the first pass and it encountered a corrupt Excel file (the first file in the list). I then used the Task Manager to kill MSWord, opened my copy of the file again and reran "Sub Tester". This time I encountered an error message. I'm using Office 2021.
EL_112.png

Your thoughts are appreciated.
You do not have the required permissions to view the files attached to this post.
Regards,
John

snb
4StarLounger
Posts: 575
Joined: 14 Nov 2012, 16:06

Re: Identifying Corrupt Workbooks

Post by snb »

I'd use:

Code: Select all

Sub M_snb()
   On Error Resume Next

   For Each it In CreateObject("scripting.filesystemobject").getfolder(Application.DefaultFilePath).Files
      x3 = IsError(GetObject(it).Name)
      If x3 Or Err.Number <> 0 Then c00 = c00 & vbLf & it
      If Not x3 Then GetObject(it).Close 0
      Err.Clear
   Next
   
   Msgbox c00, ,"Faulty Towers"
End Sub
Last edited by snb on 27 Jun 2023, 13:20, edited 1 time in total.

User avatar
ChrisGreaves
PlutoniumLounger
Posts: 15621
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky

Re: Identifying Corrupt Workbooks

Post by ChrisGreaves »

jstevens wrote:
27 Jun 2023, 01:17
Your thoughts are appreciated.
Hi John, and thanks for your appreciation!

(1) Try snb's suggestion before going any further. My MSOffice programming "thinking and design" is based on Word2.0 Basic, then Word6.0 Basic, and then Word97/VBA. My solution for corrupt documents was based on work I began in 1996, and that I was using WinXP gives you a clue. My "reboot and continue" method possibly pre-dates the extensive knowledge and (mental) libraries available through the internet!

(2) When you get an Compile-time error or a Run-time error, choosing "OK" on the pop-up box will most times cause the VBE to display the offending line/statement, highlighting it in yellow. If you provide a screen-shot of that line as well as the error we can probably help you more.
The example below captures the error popup and the offending line (highlighted by VBE in blue),
Untitled.png
Cheers, Chris
You do not have the required permissions to view the files attached to this post.
There's nothing heavier than an empty water bottle

jstevens
GoldLounger
Posts: 2628
Joined: 26 Jan 2010, 16:31
Location: Southern California

Re: Identifying Corrupt Workbooks

Post by jstevens »

Chris/snb,

I was able to utilize snb's code and tweaked it somewhat to meet my needs. I still receive the two error messages and have tried including a "Application.SendKeys "{ENTER}" bit of code but that didn't work.

Any suggestions?

Code: Select all

Sub M_jstevens()

Dim filePath As String

filePath = "C:\Users\Admin\Desktop"

    On Error Resume Next

    For Each it In CreateObject("scripting.filesystemobject").getfolder(filePath).Files

        x3 = IsError(GetObject(it).Name)
      
        If x3 Or Err.Number <> 0 And it.Name Like "*.xls*" Then    'Applied filter to .xls* files
            c00 = c00 & vbLf & it
        Else
            GetObject(it).Close 0
        End If
        
        Err.Clear
        
    Next
   
    MsgBox c00, , "Faulty Towers"

End Sub

I ran it on my Desktop where I had placed a few corrupt files and it worked.

I appreciate your time and efforts.
Regards,
John

User avatar
ChrisGreaves
PlutoniumLounger
Posts: 15621
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky

Re: Identifying Corrupt Workbooks

Post by ChrisGreaves »

jstevens wrote:
27 Jun 2023, 19:17
I was able to utilize snb's code and tweaked it somewhat to meet my needs. I still receive the two error messages and have tried including a "Application.SendKeys "{ENTER}" bit of code but that didn't work. ... I appreciate your time and efforts.
John, I am glad that snb's code works for you. My brain was originally programmed in FORTRAN II, so my mind runs along "For I = 1,N" structures, little removed from machine language. Modern programmers, more recently trained than I, can adopt modern structures and produce better code.

Still and all I am puzzled. I thought that we had an estimate that 100% of your files are corrupt so that identification of corrupt workbooks is now completed?

Are you still looking for a way to deal with the different forms of corruption, that is, how to automatically classify the pop-up boxes that suspend operations waiting for user-intervention?

The Excel module is parked here, so let me know if you want to explore that further once this original problem is completed.

Cheers, Chris
There's nothing heavier than an empty water bottle

jstevens
GoldLounger
Posts: 2628
Joined: 26 Jan 2010, 16:31
Location: Southern California

Re: Identifying Corrupt Workbooks

Post by jstevens »

ChrisGreaves wrote:
28 Jun 2023, 11:07
Still and all I am puzzled. I thought that we had an estimate that 100% of your files are corrupt so that identification of corrupt workbooks is now completed?
Chris, I sampled 100 random excel files of which 100% of them are corrupt. Could one extrapolate that > 30,000 Excel files are corrupt? Sure but won't know for sure unless all the files are tested.

My question revolves around the suspended operation waiting for user-intervention. Is it possible to automate the clicking on the "Enter" key? Regardless of which two messages pop-up theyrequires a response by pressing "Enter".
Regards,
John