Calculate letters weight
-
- 5StarLounger
- Posts: 742
- Joined: 27 Jun 2021, 10:46
Re: Calculate letters weight
That's a different issue, ands is down to the fact that the function prototypes effectively differ between Application.aFunction and WorksheetFunction.aFunction. In summary, the WorksheetFunction functions are generally more strongly typed., but this isn't properly shown by Intellisense (which doesn't even work for Application.aFunctions), nor is it documented in the Excel VBA documentation (but then, neither is the ability to process arrays the way this particular example is leveraging).
Specifically, as far as I am aware, for WorksheetFunction.aFunction any parameter described as a _value (e.g. Lookup_Value) needs to be a single value of a type that can be held in an Excel cell. It cannot be an array.
Specifically, as far as I am aware, for WorksheetFunction.aFunction any parameter described as a _value (e.g. Lookup_Value) needs to be a single value of a type that can be held in an Excel cell. It cannot be an array.
-
- 5StarLounger
- Posts: 773
- Joined: 18 Jan 2022, 15:59
- Location: An Englishman, illegally re-routing rivers, in Hof, Beautiful Bavaria. Rule, Britannia!
Appliction.x v Application.WorksheetFunction.x
Hi
I had another Google around on this, and as usual got flooded with information telling me words to the effect of … …its all to do with how errors are handled… but I did also find a few other bits of info I probably missed previously.
Based on that, and what I have picked up here, I think the following might be coming closer to the full story.
Once upon a time, not quite sure when , but before 1997, Excel got made. At that time the priority was on the table like organisation of lists and calculations on them, the so called “spreadsheet”.
( I personally took no interests, I was a real physicist and only used a computer if I had to, when trying to do clever stuff with Mathcad, on my colleagues computer that cost more than my House did back then. )
Things like Match and Index were quite important stuff for Excel spreadsheets, and some low level or background coding had these things in some form or other maybe not working too different from what we now know as VBA
Two important things to mention here:
_1 It was useful at the time for the people developing Excel if the coding behind things like Match and Index gave some info about why, if something did not work properly as intended. Hence they somehow return an Error thing, giving information, rather than actually errorong.
_ 2 Whether by design or accident, these things worked in the way I like to use them, (with arrays as arguments where more usually single values are used). This may have helped with the development of array type calculations in the spreadsheet. Once again, I don’t know if they were planned from the outset, or whether it was noticed that it could be done, possibly requiring a bit of later extra stuff to be added, what we know as the CSE stuff.
In the early days, the emphasis was on Excel Spreadsheet, but as time went on the coding we now know as VBA got made available to us and got more popular. For us to use the Excel things like Match and Index from within VBA, we called them as properties from the Application object, hence
Application.x
Around 1997, it was decided to tidy things up a bit in VBA. The WorksheetFunction object was introduced. What it has available was added to intellisense, (Application.x, where x is a worksheet function is not in intellisense.)
Taking as an example, the Application.WorksheetFunction.Match , it was given some coding behind it looking like this pseudo coding
Application.WorksheetFunction.Match =
<{[
_ check the arguments. Raise an appropriate error if wrong type are given. ( Hence the array stuff I like don’t work anymore)
_ Use Application.Match to do it
_ Have a look at what comes back and if it is the error thing, then raise an appropriate error
]}>
I expect at the time, and still now, the policy from Microsoft is that a “normal” or average user should be spared more advanced stuff, and so the WorksheetFunction version fits that policy better.
Application.x was kept for backward compatibility, it was likely intended to be made obsolete at some time. ( That may contribute to why its not documented. ) Some people, including some prominent clever people , rather liked and preferred the Application.x way it deals with errors. That may and might influence it staying around a bit longer.
The second advantage of the array stuff I like working with, was maybe just coincidentally not noticed so much. Personally I think it’s at least as important. But just a personal opinion, that’s all. Please don’t hate me for it.
I think the usual explanation when asked about Appliction.x v Application.WorksheetFunction.x , of …its all to do with how errors are handled… cropped in because someone clever regarded as a God by other smart people said it once and the other smart people took it as Holy
(Thankfully I am not smart, I don’t worship the same Gods, so I occasionally get the better or fuller answer, (mostly with a lot of help from the smart people), which is all I am interested in).
I think my explanation is at least closer than any other I have seen so far…
Alan
Ref
https://www.pcreview.co.uk/threads/work ... st-2861101
https://www.pcreview.co.uk/threads/prob ... st-2734072
https://www.pcreview.co.uk/threads/appl ... st-9344487
Thanks, this is all helping me to fill the puzzle....SpeakEasy wrote: ↑27 Jul 2022, 14:51.... the WorksheetFunction functions are generally more strongly typed., but this isn't properly shown by Intellisense (which doesn't even work for Application.aFunctions), nor is it documented in the Excel VBA documentation (but then, neither is the ability to process arrays the way this particular example is leveraging)....as far as I am aware, for WorksheetFunction.aFunction any parameter described as a _value (e.g. Lookup_Value) needs to be a single value of a type that can be held in an Excel cell. It cannot be an array.
I had another Google around on this, and as usual got flooded with information telling me words to the effect of … …its all to do with how errors are handled… but I did also find a few other bits of info I probably missed previously.
Based on that, and what I have picked up here, I think the following might be coming closer to the full story.
Appliction.x v Application.WorksheetFunction.x
Once upon a time, not quite sure when , but before 1997, Excel got made. At that time the priority was on the table like organisation of lists and calculations on them, the so called “spreadsheet”.
( I personally took no interests, I was a real physicist and only used a computer if I had to, when trying to do clever stuff with Mathcad, on my colleagues computer that cost more than my House did back then. )
Things like Match and Index were quite important stuff for Excel spreadsheets, and some low level or background coding had these things in some form or other maybe not working too different from what we now know as VBA
Two important things to mention here:
_1 It was useful at the time for the people developing Excel if the coding behind things like Match and Index gave some info about why, if something did not work properly as intended. Hence they somehow return an Error thing, giving information, rather than actually errorong.
_ 2 Whether by design or accident, these things worked in the way I like to use them, (with arrays as arguments where more usually single values are used). This may have helped with the development of array type calculations in the spreadsheet. Once again, I don’t know if they were planned from the outset, or whether it was noticed that it could be done, possibly requiring a bit of later extra stuff to be added, what we know as the CSE stuff.
In the early days, the emphasis was on Excel Spreadsheet, but as time went on the coding we now know as VBA got made available to us and got more popular. For us to use the Excel things like Match and Index from within VBA, we called them as properties from the Application object, hence
Application.x
Around 1997, it was decided to tidy things up a bit in VBA. The WorksheetFunction object was introduced. What it has available was added to intellisense, (Application.x, where x is a worksheet function is not in intellisense.)
Taking as an example, the Application.WorksheetFunction.Match , it was given some coding behind it looking like this pseudo coding
Application.WorksheetFunction.Match =
<{[
_ check the arguments. Raise an appropriate error if wrong type are given. ( Hence the array stuff I like don’t work anymore)
_ Use Application.Match to do it
_ Have a look at what comes back and if it is the error thing, then raise an appropriate error
]}>
I expect at the time, and still now, the policy from Microsoft is that a “normal” or average user should be spared more advanced stuff, and so the WorksheetFunction version fits that policy better.
Application.x was kept for backward compatibility, it was likely intended to be made obsolete at some time. ( That may contribute to why its not documented. ) Some people, including some prominent clever people , rather liked and preferred the Application.x way it deals with errors. That may and might influence it staying around a bit longer.
The second advantage of the array stuff I like working with, was maybe just coincidentally not noticed so much. Personally I think it’s at least as important. But just a personal opinion, that’s all. Please don’t hate me for it.
I think the usual explanation when asked about Appliction.x v Application.WorksheetFunction.x , of …its all to do with how errors are handled… cropped in because someone clever regarded as a God by other smart people said it once and the other smart people took it as Holy
(Thankfully I am not smart, I don’t worship the same Gods, so I occasionally get the better or fuller answer, (mostly with a lot of help from the smart people), which is all I am interested in).
I think my explanation is at least closer than any other I have seen so far…
Alan
Ref
https://www.pcreview.co.uk/threads/work ... st-2861101
https://www.pcreview.co.uk/threads/prob ... st-2734072
https://www.pcreview.co.uk/threads/appl ... st-9344487
Last edited by DocAElstein on 01 Aug 2022, 09:55, edited 1 time in total.
Regards , Ālan , DocÆlstein
, 


-
- Administrator
- Posts: 7338
- Joined: 15 Jan 2010, 22:52
- Location: Middle of England
Re: Appliction.x v Application.WorksheetFunction.x
To be honest, I haven't the time to read through your post fully, but do you really mean "Appliction.x"?DocAElstein wrote: ↑30 Jul 2022, 09:01Appliction.x v Application.WorksheetFunction.x
...
I think the usual explanation when asked about Appliction.x v Application.WorksheetFunction.x , of …
Leif
-
- 5StarLounger
- Posts: 773
- Joined: 18 Jan 2022, 15:59
- Location: An Englishman, illegally re-routing rivers, in Hof, Beautiful Bavaria. Rule, Britannia!
Re: Appliction.x v Application.WorksheetFunction.x
ah, typo I missed , thanks
should be
Application.x v Application.WorksheetFunction.x
(the problem with spelling checker - it don't work if you add something like .x)
should be
Application.x v Application.WorksheetFunction.x
(the problem with spelling checker - it don't work if you add something like .x)
Regards , Ālan , DocÆlstein
, 


-
- 5StarLounger
- Posts: 742
- Joined: 27 Jun 2021, 10:46
Re: Calculate letters weight
>So maybe that is telling me that Alan is represents internally by such an array
Yep, pretty much. Internally VB represent strings as Unicode 16 - which for all the ANSI characters is two bytes: the 8-bit ANSI code and a 0. But that Unicode string is just a memory buffer so we can hack it. Which is what StrConv does! e.g
Dim arrAlan() As Byte: Let arrAlan() = StrConv("Alan", vbFromUnicode)
converts the underlying buffer from Unicode 16 to ANSI (which it does by simply stripping out the 0 bytes. VB of course then no longer knows what to do with that string, as it is expecting Unicode 16
So if you do
Dim arry() As Byte
Let arry() = StrConv("Alan", vbFromUnicode) ' coerce string to array of (ANSI) bytes
MsgBox MsgBox arry
you'll get VB's "No idea how to display these characters as I don't even know what they are" (sometimes you might get a character, if the unicode byte pattern coincidentally matches a character in the current code page)
Yep, pretty much. Internally VB represent strings as Unicode 16 - which for all the ANSI characters is two bytes: the 8-bit ANSI code and a 0. But that Unicode string is just a memory buffer so we can hack it. Which is what StrConv does! e.g
Dim arrAlan() As Byte: Let arrAlan() = StrConv("Alan", vbFromUnicode)
converts the underlying buffer from Unicode 16 to ANSI (which it does by simply stripping out the 0 bytes. VB of course then no longer knows what to do with that string, as it is expecting Unicode 16
So if you do
Dim arry() As Byte
Let arry() = StrConv("Alan", vbFromUnicode) ' coerce string to array of (ANSI) bytes
MsgBox MsgBox arry
you'll get VB's "No idea how to display these characters as I don't even know what they are" (sometimes you might get a character, if the unicode byte pattern coincidentally matches a character in the current code page)
-
- 5StarLounger
- Posts: 773
- Joined: 18 Jan 2022, 15:59
- Location: An Englishman, illegally re-routing rivers, in Hof, Beautiful Bavaria. Rule, Britannia!
Re: Calculate letters weight
This is all making, ( a bit), more sense,.
But I wonder if this
Dim arrAlan() As Byte: Let arrAlan() = StrConv("Alan", vbFromUnicode)
Is not really coercing. … or maybe it is but just a bit ****…
So StrConv("Alan", vbFromUnicode) has changed what is internally representing Alan to what internally represents a field of things of Byte type. Or what we would call a array of Byte elements. That is assigned as syntaxly correctly to a Byte type array, so all is well, …and also I was getting somethings mixed up here:
Dim Splitalan() As String: Let Splitalan() = Split("alan", "a")
However, its apparently not quite that simple. In the case of the split, I could also do this, and I will get the same array
Dim VSplitalan As Variant: Let VSplitalan = Split("alan", "a")
But if I try that with this
Dim VarrAlan As Variant: Let VarrAlan = StrConv("Alan", vbFromUnicode)
then I get returned ?? , which is also incidentally what I get from
MsgBox prompt:=arrAlan()
**** So maybe there is some coercing going on with Dim arrAlan() As Byte: Let arrAlan() = StrConv("Alan", vbFromUnicode) -
_._____________________________________________________________________________________________________________________
I am still not sure what to make of this….
The array Harry() looks like this
65 0 108 0 97 0 110 0 38 32
I am not sure what that means, although I did not expect it to look like this
65 0 108 0 97 0 110 0 8230 0
because 8230 is too big for a Byte element. I am not sure what the significance or insignificance is of all that.
_.____________________________________
Edit Just testing here later - ignore this
WUNICODE AASI (Win CP 1252)
Œ 338 152 140
œ 339 153 156
Š 352 160 138
š 353 161 154
Ÿ 376 178 199
Ž 381 17D 142
ž 382 17E 158
ƒ 402 192 131
ˆ 710 2C6 136
˜ 732 2DC 152
… 8230 2026 133
But I wonder if this
Dim arrAlan() As Byte: Let arrAlan() = StrConv("Alan", vbFromUnicode)
Is not really coercing. … or maybe it is but just a bit ****…
So StrConv("Alan", vbFromUnicode) has changed what is internally representing Alan to what internally represents a field of things of Byte type. Or what we would call a array of Byte elements. That is assigned as syntaxly correctly to a Byte type array, so all is well, …and also I was getting somethings mixed up here:
- Of course it only works if we declare the array as Byte type, because as I just said, StrConv("Alan", vbFromUnicode) is chucking out a field of elements of Byte type. Just like Split(“alan”,”a”) returns me an array of string types so I must do thisDocAElstein wrote: ↑27 Jul 2022, 11:12....
_.... This only works if the array elements type is Byte
Declare the type as Long or Double, for example, and it won’t work.....
Dim Splitalan() As String: Let Splitalan() = Split("alan", "a")
However, its apparently not quite that simple. In the case of the split, I could also do this, and I will get the same array
Dim VSplitalan As Variant: Let VSplitalan = Split("alan", "a")
But if I try that with this
Dim VarrAlan As Variant: Let VarrAlan = StrConv("Alan", vbFromUnicode)
then I get returned ?? , which is also incidentally what I get from
MsgBox prompt:=arrAlan()
**** So maybe there is some coercing going on with Dim arrAlan() As Byte: Let arrAlan() = StrConv("Alan", vbFromUnicode) -
– well it does in the situation of you trying to assign it to a byte array, - it has maybe been written to try and put those Bytes in a byte array
Code: Select all
Sub Stuff()
Dim Splitalan() As String: Let Splitalan() = Split("alan", "a") ' Watch : + : Splitalan() : : String(0 to 2) : Module1.Stuff {"", "l", "n"}
Dim VSplitalan As Variant: Let VSplitalan = Split("alan", "a") ' Watch : + : VSplitalan : : Variant/String(0 to 2) : Module1.Stuff {"", "l", "n"}
Dim arrAlan() As Byte: Let arrAlan() = StrConv("Alan", vbFromUnicode) ' Watch : - : arrAlan() : : Byte(0 to 3) : Module1.Stuff {65, 108, 97, 110}
Dim VarrAlan As Variant: Let VarrAlan = StrConv("Alan", vbFromUnicode) ' ??
MsgBox prompt:=arrAlan() ' ??
Stop
End Sub
_._____________________________________________________________________________________________________________________
I am still not sure what to make of this….
I am not sure what is going on here, either…DocAElstein wrote: ↑27 Jul 2022, 11:12.......
This limitation to a Byte array might be inconvenient to someone wanting to play around with the AscW stuff?
Or I could be missing something more fundamental
Code: Select all
Sub ChrWAscW()
Dim Harry() As Byte
Let Harry() = "Alan" & ChrW(8230) ' … ChrW(8230) is a single character that looks like 3 small dots close together ( You can do some naughty tricks with it in some forums and mess things up a bit... )
Stop
End Sub
65 0 108 0 97 0 110 0 38 32
I am not sure what that means, although I did not expect it to look like this
65 0 108 0 97 0 110 0 8230 0
because 8230 is too big for a Byte element. I am not sure what the significance or insignificance is of all that.
_.____________________________________
Edit Just testing here later - ignore this
WUNICODE AASI (Win CP 1252)
Œ 338 152 140
œ 339 153 156
Š 352 160 138
š 353 161 154
Ÿ 376 178 199
Ž 381 17D 142
ž 382 17E 158
ƒ 402 192 131
ˆ 710 2C6 136
˜ 732 2DC 152
… 8230 2026 133
Code: Select all
WUNICODE AASI (Win CP 1252)
Œ 338 152 140
œ 339 153 156
Š 352 160 138
š 353 161 154
Ÿ 376 178 199
Ž 381 17D 142
ž 382 17E 158
ƒ 402 192 131
ˆ 710 2C6 136
˜ 732 2DC 152
… 8230 2026 133
Last edited by DocAElstein on 10 Feb 2025, 11:03, edited 3 times in total.
Regards , Ālan , DocÆlstein
, 


-
- Administrator
- Posts: 80088
- Joined: 16 Jan 2010, 00:14
- Status: Microsoft MVP
- Location: Wageningen, The Netherlands
Re: Calculate letters weight
Unicode characters take up 2 bytes: the so-called low-end byte and high-end byte. The total value is 256 * high-end + low-end.
Windows is low-endian: it writes the low-end byte first, then the high-end one.
The letter A has code 65. This can be written as 256 * 0 + 65, so the high-end byte is 0 and the low-end one is 65. Windows writes this as
65 0
The Unicode character with code 8230 can be written as 256 * 32 + 38
The high-end bye is 8230 \ 256 = 32
The low-end byte is 8230 Mod 256 = 38
So Windows writes it as
38 32
Windows is low-endian: it writes the low-end byte first, then the high-end one.
The letter A has code 65. This can be written as 256 * 0 + 65, so the high-end byte is 0 and the low-end one is 65. Windows writes this as
65 0
The Unicode character with code 8230 can be written as 256 * 32 + 38
The high-end bye is 8230 \ 256 = 32
The low-end byte is 8230 Mod 256 = 38
So Windows writes it as
38 32
Best wishes,
Hans
Hans
-
- 5StarLounger
- Posts: 773
- Joined: 18 Jan 2022, 15:59
- Location: An Englishman, illegally re-routing rivers, in Hof, Beautiful Bavaria. Rule, Britannia!
Re: Calculate letters weight
Ah, great, thanks Hans. That’s low level stuff out of my depths but it is good to know, as I was otherwise thinking that the numbers 32 and 38 were just something erroneous chucked out when it got confused.
( That information could be useful in the future, and is relevant to this Thread and the original requirement since we are talking about manipulating the characters in a string, and it would not be unheard of for the OP to want to extend the solution to a much larger character set outside the simple Standard 0-255 ones. ( For my solution it probably will mean that the first code line I had, ( the extra one I had to use because it would not coerce inside the main long one liner code line ) , would itself mutate into a mega beautiful one liner code line. Maybe when I am in the mood and have the time I might add that…. )
_.________________________________________________________________
Just something quick, while I am here, which I missed out, when we were on the other tangent we went into .. - VBA strings looking like numbers, or not…
One example I know about where a bit of care is needed when , if like me, you like to store your numbers sometimes in string variables is. ….._
_.... You need to be careful if wanting to reference a worksheet by its item number rather than its name, because the code line can be the same for either referring to item number or string tab name, and VBA only knows whether you want to refer to its item number or its tab name by virtue of the type of thing or variable type you give it.
( That information could be useful in the future, and is relevant to this Thread and the original requirement since we are talking about manipulating the characters in a string, and it would not be unheard of for the OP to want to extend the solution to a much larger character set outside the simple Standard 0-255 ones. ( For my solution it probably will mean that the first code line I had, ( the extra one I had to use because it would not coerce inside the main long one liner code line ) , would itself mutate into a mega beautiful one liner code line. Maybe when I am in the mood and have the time I might add that…. )
_.________________________________________________________________
Just something quick, while I am here, which I missed out, when we were on the other tangent we went into .. - VBA strings looking like numbers, or not…
One example I know about where a bit of care is needed when , if like me, you like to store your numbers sometimes in string variables is. ….._
_.... You need to be careful if wanting to reference a worksheet by its item number rather than its name, because the code line can be the same for either referring to item number or string tab name, and VBA only knows whether you want to refer to its item number or its tab name by virtue of the type of thing or variable type you give it.
Code: Select all
Sub WatchOutYaWorksheetNames() ' https://eileenslounge.com/viewtopic.php?p=297502#p297502
Let ThisWorkbook.Worksheets.Item(1).Name = "Sheet1" ' Just for demo purposes - any typical string name will do for this demo
On Error GoTo TellMeAboutIt ' Tell me about any error, rather than erroring
' Here we go, referring to worksheets in different ways .......
ThisWorkbook.Worksheets.Item(1).Activate ' All is well - refering to first tab counting from the left
ThisWorkbook.Worksheets(1).Activate ' Same again - (Item is often the default property)
Dim Wun As Long: Let Wun = "1" ' VBA is happy to coerce the string "1" into a number, so Wun is 1 which is a number
ThisWorkbook.Worksheets.Item(Wun).Activate ' All is well - still refering to first tab counting from the left
Dim strWun As String: Let strWun = Wun ' VBA is happy to coerce the number 1 which is in Long variable Wun, into a string of value "1" So strWun is a string of "1"
ThisWorkbook.Worksheets.Item(strWun).Activate ' oops!, unless you have your first tab name of "1" , this will error
Let ThisWorkbook.Worksheets.Item(1).Name = 1 ' VBA is happy to coerce the number 1 into a the string tab name of "1" So now you will see your first tab name in your workbook shown as 1 rather than Sheet1 ( But it is still a text - a string of "1" , because the thing shown on the tab is always the string name, not the item nunber )
ThisWorkbook.Worksheets.Item(strWun).Activate ' this will not error anymore because the previous line has made the first tab string name "1"
Exit Sub ' Main coding end
' ________________________________________________________________________________
' Error handling code section
TellMeAboutIt:
MsgBox prompt:="Error number: " & Err.Number & vbCr & vbLf & Err.Description
Resume Next
End Sub ' ( You should never come here )
Regards , Ālan , DocÆlstein
, 


-
- Administrator
- Posts: 12949
- Joined: 16 Jan 2010, 15:49
- Location: London, Europe
Re: Calculate letters weight
This is NOT coercion, it is converting. The difference is that the programmer has explicitly identified the data types and the conversion that is needed.DocAElstein wrote: ↑30 Jul 2022, 22:49I wonder if this
Dim arrAlan() As Byte: Let arrAlan() = StrConv("Alan", vbFromUnicode)
Is not really coercing. … or maybe it is but just a bit ****…
StuartR
-
- 5StarLounger
- Posts: 773
- Joined: 18 Jan 2022, 15:59
- Location: An Englishman, illegally re-routing rivers, in Hof, Beautiful Bavaria. Rule, Britannia!
Re: Calculate letters weight
Good to have that extra input. I have been going backward and forward in my head with whether I think it’s coercing or not. (Might depend a bit I suppose on what coercing is, perhaps it’s not clearly defined).
I expect my brain will oscillate a bit more on this one.
There are some good enlightenments coming out here, at least for me.
All useful interesting stuff
( I once thought coercing is just doing something that puts us on a different dependency route in the sequential chain of events. Like diverting a train to a route where things are done a bit differently. I have no idea what that means. I just made it up.
Sometimes crazy thoughts steer me in the right direction. Mostly they don’t. )
( I think I mentioned in the scuttlebutt a few times, I recently discovered an old forgotten railway siding still connected with a working junction to a main line. I am seriously trying to organise diverting a slow moving Goods train along it one evening. A crazy idea, but it might reveal some interesting thoughts****. Didn’t Einstein think of something important when sleeping on a train? )
**** Edit: although could be on the wrong track there
I expect my brain will oscillate a bit more on this one.
There are some good enlightenments coming out here, at least for me.
All useful interesting stuff
( I once thought coercing is just doing something that puts us on a different dependency route in the sequential chain of events. Like diverting a train to a route where things are done a bit differently. I have no idea what that means. I just made it up.
Sometimes crazy thoughts steer me in the right direction. Mostly they don’t. )
( I think I mentioned in the scuttlebutt a few times, I recently discovered an old forgotten railway siding still connected with a working junction to a main line. I am seriously trying to organise diverting a slow moving Goods train along it one evening. A crazy idea, but it might reveal some interesting thoughts****. Didn’t Einstein think of something important when sleeping on a train? )
**** Edit: although could be on the wrong track there

Last edited by DocAElstein on 31 Jul 2022, 14:23, edited 1 time in total.
Regards , Ālan , DocÆlstein
, 


-
- 5StarLounger
- Posts: 742
- Joined: 27 Jun 2021, 10:46
Re: Calculate letters weight
>Unicode characters take up 2 bytes
Not so. The Unicode 16 used by Windows (and in which VBA strings are stored behind the scenes) uses 2 bytes only for code points U+0000 to U+D7FF and U+E000 to U+FFFF, and stores them as direct correlating values - i.e codepoint 0041 - "A" - is stored as 0041 or, more accurately in VB which is Unicode 16 LE (little endian) as 4100
Codepoints U+010000 to U+10FFFF however have a mathematical transform applied to them and are represented by two 16bit words with no direct numerical correlation between codepoint and the stored values, e.g U+10000 is not stored as 00010000 (or in LE-speak as 00000100) but as D800DC00
However VB, albeit having Unicode 16 behind the scenes, only understands ANSI characters, so (incorrectly) interprets D800DC00 as ØÜ rather than the correct LINEAR B SYLLABLE B008 A character https://decodeunicode.org/en/u+10000, treating anything it finds in the underlying buffer as two-byte characters.
Given that we can 'hack' the underlying buffer directly with ChrW, ChrB , havoc can ensue. But in general use and for most VBA programmers, we don't have to worry about it;' VBA does the heavy lifting for us. But you deserve everything you get if you try an lift VBA's skirts to see what is underneath
Not so. The Unicode 16 used by Windows (and in which VBA strings are stored behind the scenes) uses 2 bytes only for code points U+0000 to U+D7FF and U+E000 to U+FFFF, and stores them as direct correlating values - i.e codepoint 0041 - "A" - is stored as 0041 or, more accurately in VB which is Unicode 16 LE (little endian) as 4100
Codepoints U+010000 to U+10FFFF however have a mathematical transform applied to them and are represented by two 16bit words with no direct numerical correlation between codepoint and the stored values, e.g U+10000 is not stored as 00010000 (or in LE-speak as 00000100) but as D800DC00
However VB, albeit having Unicode 16 behind the scenes, only understands ANSI characters, so (incorrectly) interprets D800DC00 as ØÜ rather than the correct LINEAR B SYLLABLE B008 A character https://decodeunicode.org/en/u+10000, treating anything it finds in the underlying buffer as two-byte characters.
Given that we can 'hack' the underlying buffer directly with ChrW, ChrB , havoc can ensue. But in general use and for most VBA programmers, we don't have to worry about it;' VBA does the heavy lifting for us. But you deserve everything you get if you try an lift VBA's skirts to see what is underneath
-
- Administrator
- Posts: 80088
- Joined: 16 Jan 2010, 00:14
- Status: Microsoft MVP
- Location: Wageningen, The Netherlands
-
- 5StarLounger
- Posts: 742
- Joined: 27 Jun 2021, 10:46
Re: Calculate letters weight
>This is NOT coercion, it is converting. The difference is that the programmer has explicitly identified the data types and the conversion that is needed.
My computer science course taught different. It taught that if the programmer hasn't explicitly told the program (or compiler) to do the conversion then it is coercion. Identifying the data type elsewhere doesn't, for me, meet that requirement. Mind you my degree was over 35 years ago, so maybe I'm out of date!
So, for me:
Dim i as integer
dim l as long
l=500
i = clng(i) ' explicit type conversion, I've told VBA what to do so: not coercion
l = i ' implicit type conversion, I've not told VBA what to do, so: coercion
But in reality the point is moot. Coercion is really just a jargon term for one specific ... uh ... type of type conversion
My computer science course taught different. It taught that if the programmer hasn't explicitly told the program (or compiler) to do the conversion then it is coercion. Identifying the data type elsewhere doesn't, for me, meet that requirement. Mind you my degree was over 35 years ago, so maybe I'm out of date!
So, for me:
Dim i as integer
dim l as long
l=500
i = clng(i) ' explicit type conversion, I've told VBA what to do so: not coercion
l = i ' implicit type conversion, I've not told VBA what to do, so: coercion
But in reality the point is moot. Coercion is really just a jargon term for one specific ... uh ... type of type conversion
-
- Administrator
- Posts: 12949
- Joined: 16 Jan 2010, 15:49
- Location: London, Europe
Re: Calculate letters weight
My studies are about as old as yours SpeakEasy, so I'm certainly not going to argue the point
StuartR
-
- 5StarLounger
- Posts: 826
- Joined: 24 Jan 2010, 15:56
Re: Calculate letters weight
8209 means you have an array of type Byte. 8192 for the vbArray plus 17 for the Byte.DocAElstein wrote: ↑27 Jul 2022, 11:11
_ also, some documentation mislead me a bit: I have not used VarType very much for anything other than the most common simple variable, and then when I used that once on an array, it gave me the number 8209 which means you have an array. VarType does not tell you what variable types are in the array;
Regards,
Rory
Rory
-
- 5StarLounger
- Posts: 742
- Joined: 27 Jun 2021, 10:46
Re: Calculate letters weight
> VarType does not tell you what variable types are in the array
It sure does. As rory says.
All documented here:
https://docs.microsoft.com/en-us/office ... e-function
It sure does. As rory says.
All documented here:
https://docs.microsoft.com/en-us/office ... e-function
-
- 5StarLounger
- Posts: 773
- Joined: 18 Jan 2022, 15:59
- Location: An Englishman, illegally re-routing rivers, in Hof, Beautiful Bavaria. Rule, Britannia!
VarType array is 8192+value of elements type
Thanks Rory, thanks SpeakEasy.
Indeed it is documented at the link from SpeakEasy
( https://docs.microsoft.com/en-us/office/vba/language/reference/user-interface-help/vartype-function )
vbArray 8192 Array (always added to another constant when returned by this function)
_...... The VarType function never returns the value for vbArray by itself. It's always added to some other value to indicate an array of a particular type. For example, the value returned for an array of integers is calculated as vbInteger + vbArray, or 8194.
I always hit this link
( https://docs.microsoft.com/en-us/office/vba/language/concepts/getting-started/vartype-constants )
, which is totally lacking all that information and simply says
vbArray 8192 Array
Perhaps a more thorough search, or more careful with a better search string, may have found that info, but I doubt it: The internet is getting so swamped you need to know the answer to have half a chance of finding it. :(
Thanks again for setting me straight.
Indeed it is documented at the link from SpeakEasy
( https://docs.microsoft.com/en-us/office/vba/language/reference/user-interface-help/vartype-function )
vbArray 8192 Array (always added to another constant when returned by this function)
_...... The VarType function never returns the value for vbArray by itself. It's always added to some other value to indicate an array of a particular type. For example, the value returned for an array of integers is calculated as vbInteger + vbArray, or 8194.
I always hit this link
( https://docs.microsoft.com/en-us/office/vba/language/concepts/getting-started/vartype-constants )
, which is totally lacking all that information and simply says
vbArray 8192 Array
Perhaps a more thorough search, or more careful with a better search string, may have found that info, but I doubt it: The internet is getting so swamped you need to know the answer to have half a chance of finding it. :(
Thanks again for setting me straight.
Regards , Ālan , DocÆlstein
, 


-
- 5StarLounger
- Posts: 773
- Joined: 18 Jan 2022, 15:59
- Location: An Englishman, illegally re-routing rivers, in Hof, Beautiful Bavaria. Rule, Britannia!
(Calculate letters weight) ANSI/Ascii vs Unicode Issues
Hi, ....... :xsmile: ... maybe later...
I am trying to get my head around something in a more recent thread, ( https://eileenslounge.com/viewtopic.php?f=30&t=41659 ) , I am getting there…. slowly
I have been back to this Thread here, from July/Aug 2022, many times over the last few days, as there is a lot of useful and relevant stuff concerning VB / VBA string stuff and Unicode etc.. which I am tryibg to understand…
_._______
I am unsure of a few things, and would like to ask for some clarity, if I may, :)
For examples,
_1 related to this discussion point….
I was a bit nervous of perusing things, but as things are looking, it seems to be my destiny to suffer the consequences of crawling down rabbit holes or lifting skirts and other things I maybe shouldn’t…
…Now, following a lot of research, I am thinking that in the common ways of doing Unicode, (talking whole world Unicode and not just Microsoft ) (en)coding, ( As in Unicode encoding = UTF-8, UTF-16, and UTF-32 ) the main distinguishing characteristic between them is the minimum number of Bytes that would be used for a character, ( where a Byte is a group of 8 of some bits inside a computer’s innards that can be switched between two states ).
Briefly summarised:-
UTF-8 is variable 1 to 4 bytes. – This can be efficient to use for more simple text, (and this has had a bit of a resurgence in recent years, due to some simple text stuff associated with the SmartPhone short message & co.)
UTF-16 is variable 2 or 4 bytes, - but mostly 2, and mostly what Microsoft use***.
UTF-32 is fixed 4 bytes.
(Microsoft tend to use Wide when they talk about the Unicode (en)coding they use, to distinguish it from ASCII/ANSI stuff. Perhaps it is a bit silly and imprecise really, as they sometimes use the single Byte version of UTF-8.)
The question _1
***Is it perhaps that Microsoft, in Windows, or in Excel, or in VB / VBA, or in something, limit themselves to 2 Bytes in UTF-16. I notice, for example, that Chr(x) won’t take x > 65535
If Microsoft ( , in Windows, or in Excel, or in VB / VBA, or in something ) want Unicode character numbers higher than that, they use a pair of UTF-16 in some way? Is that like a pair of 2 Bytes, rather than 4 Bytes which, (maybe?) is not the same thing? Is that the bit Mike (SpeakEasy) said, that I have not quite understood yet….< U+010000 to U+10FFFF however have a mathematical transform applied to them and are represented by two 16bit words with no direct numerical correlation between codepoint and the stored values >
_.________________________
_2a
Should that not be … Unicode A is hexadecimal(65) = 41, written in the Unicode 2 byte low-endian as 41 00
( I am very confused as Steven Roman says it should be 41 00 , but he never considers examples outside the ANSI/Ascii number range when talking Unicode stuff. That is frustrating.
( I do realise that unicode also encompases the ANSI/Ascii number range, but it would have been helpfull if he had included some much higher placed characters in his examples ) )
_2b
Now, the hexadecimal of 8230 is 2028. So would the Unicode … in the 2 byte low-endian not be 28 20 ?
_.______________
_3 ChrW(x) only works for x up to 65535. Is this
_(i) somehow related to the stuff I don’t yet understand yet, – ( I note that the last 4 digit Hex is FFFF for dec 65535 )
or
_(ii) is it one of those things related to things having been made originally to work on the older .xls files with max things/rows 65,536 ( 0, 1, 2 …… 65535 is = 65536 things)
or
_(iii) is it perhaps some combination of (i) and (ii) ? – Did Microsoft mess with their Unicode encoding innards around 2007 to allow them to go above the 2 Byte hex maximum of FF FF, and have the insane number of rows in Excel 2007 +
_._______________
_4 Max ANSI( and maybe some sort of extended Ascii sometimes ) is decimal 255, which is the last 2 digit hexadecimal FF
Does hexadecimal come about, or was invented, because it somehow in computing makes things nicely ordered – For example, 2 Byte UTF-16 is 2 lots of 8 Bits, making 2 Bytes and that look pretty if its limited to a fixed 4 columns?
Thanks
Alan
Ref (examples)
_a https://stackoverflow.com/questions/496 ... and-utf-32
_b web archive "Win32 API Programming with Visual Basic", Chap 6 VB Strings, Steven Roman
_c Translation of _b to English (not finished yet …… )
I am trying to get my head around something in a more recent thread, ( https://eileenslounge.com/viewtopic.php?f=30&t=41659 ) , I am getting there…. slowly
I have been back to this Thread here, from July/Aug 2022, many times over the last few days, as there is a lot of useful and relevant stuff concerning VB / VBA string stuff and Unicode etc.. which I am tryibg to understand…
_._______
I am unsure of a few things, and would like to ask for some clarity, if I may, :)
For examples,
_1 related to this discussion point….
………………SpeakEasy wrote: ↑31 Jul 2022, 14:03>Unicode characters ......
....... < and lots of other stuff I am still trying to understand >
......
we can 'hack' the underlying buffer directly with ChrW, ChrB , havoc can ensue........ you deserve everything you get if you try an lift VBA's skirts to see what is underneath
I was a bit nervous of perusing things, but as things are looking, it seems to be my destiny to suffer the consequences of crawling down rabbit holes or lifting skirts and other things I maybe shouldn’t…
…Now, following a lot of research, I am thinking that in the common ways of doing Unicode, (talking whole world Unicode and not just Microsoft ) (en)coding, ( As in Unicode encoding = UTF-8, UTF-16, and UTF-32 ) the main distinguishing characteristic between them is the minimum number of Bytes that would be used for a character, ( where a Byte is a group of 8 of some bits inside a computer’s innards that can be switched between two states ).
Briefly summarised:-
UTF-8 is variable 1 to 4 bytes. – This can be efficient to use for more simple text, (and this has had a bit of a resurgence in recent years, due to some simple text stuff associated with the SmartPhone short message & co.)
UTF-16 is variable 2 or 4 bytes, - but mostly 2, and mostly what Microsoft use***.
UTF-32 is fixed 4 bytes.
(Microsoft tend to use Wide when they talk about the Unicode (en)coding they use, to distinguish it from ASCII/ANSI stuff. Perhaps it is a bit silly and imprecise really, as they sometimes use the single Byte version of UTF-8.)
The question _1
***Is it perhaps that Microsoft, in Windows, or in Excel, or in VB / VBA, or in something, limit themselves to 2 Bytes in UTF-16. I notice, for example, that Chr(x) won’t take x > 65535
If Microsoft ( , in Windows, or in Excel, or in VB / VBA, or in something ) want Unicode character numbers higher than that, they use a pair of UTF-16 in some way? Is that like a pair of 2 Bytes, rather than 4 Bytes which, (maybe?) is not the same thing? Is that the bit Mike (SpeakEasy) said, that I have not quite understood yet….< U+010000 to U+10FFFF however have a mathematical transform applied to them and are represented by two 16bit words with no direct numerical correlation between codepoint and the stored values >
_.________________________
_2a
I am getting mixed up with decimals and hexadecimals, but I think I am almost there…HansV wrote: ↑31 Jul 2022, 05:49Unicode characters …… 2 bytes: the so-called low-end byte and high-end byte. The total value is 256 * high-end + low-end.
Windows is low-endian: it writes the low-end byte first, then the high-end one.
The letter A has code 65. This can be written as 256 * 0 + 65, so the high-end byte is 0 and the low-end one is 65. Windows writes this as
65 0
Should that not be … Unicode A is hexadecimal(65) = 41, written in the Unicode 2 byte low-endian as 41 00
( I am very confused as Steven Roman says it should be 41 00 , but he never considers examples outside the ANSI/Ascii number range when talking Unicode stuff. That is frustrating.

_2b
I may not have been clear that 8230 is the decimal used in ChrW( ) ( to get a single character that looks like 3 dots, but much closer together and smaller "…" )
Now, the hexadecimal of 8230 is 2028. So would the Unicode … in the 2 byte low-endian not be 28 20 ?
_.______________
_3 ChrW(x) only works for x up to 65535. Is this
_(i) somehow related to the stuff I don’t yet understand yet, – ( I note that the last 4 digit Hex is FFFF for dec 65535 )
or
_(ii) is it one of those things related to things having been made originally to work on the older .xls files with max things/rows 65,536 ( 0, 1, 2 …… 65535 is = 65536 things)
or
_(iii) is it perhaps some combination of (i) and (ii) ? – Did Microsoft mess with their Unicode encoding innards around 2007 to allow them to go above the 2 Byte hex maximum of FF FF, and have the insane number of rows in Excel 2007 +
_._______________
_4 Max ANSI( and maybe some sort of extended Ascii sometimes ) is decimal 255, which is the last 2 digit hexadecimal FF
Does hexadecimal come about, or was invented, because it somehow in computing makes things nicely ordered – For example, 2 Byte UTF-16 is 2 lots of 8 Bits, making 2 Bytes and that look pretty if its limited to a fixed 4 columns?
Thanks
Alan
Ref (examples)
_a https://stackoverflow.com/questions/496 ... and-utf-32
_b web archive "Win32 API Programming with Visual Basic", Chap 6 VB Strings, Steven Roman
_c Translation of _b to English (not finished yet …… )
Regards , Ālan , DocÆlstein
, 


-
- 5StarLounger
- Posts: 742
- Joined: 27 Jun 2021, 10:46
Re: Calculate letters weight
>the hexadecimal of 8230 is 2028
It isn't, you know.
>Unicode encoding innards around 2007 to allow them to go above the 2 Byte hex maximum of FF FF, and have the insane number of rows in Excel 2007
Er ... Unicode is nothing to do with the number of rows Excel has
>Does hexadecimal come about, or was invented, because it somehow in computing makes things nicely ordered
No
It isn't, you know.
>Unicode encoding innards around 2007 to allow them to go above the 2 Byte hex maximum of FF FF, and have the insane number of rows in Excel 2007
Er ... Unicode is nothing to do with the number of rows Excel has
>Does hexadecimal come about, or was invented, because it somehow in computing makes things nicely ordered
No
-
- 5StarLounger
- Posts: 773
- Joined: 18 Jan 2022, 15:59
- Location: An Englishman, illegally re-routing rivers, in Hof, Beautiful Bavaria. Rule, Britannia!
(Calculate letters weight) ANSI/Ascii vs Unicode Issues
Thx for the reply, (Sorry, I missed it,….. even though I was here a few posts back for most of yesterday – probably I missed your reply because I was spending some time re reading things in this thread, and clicking around posts or refreshing probably caused the red notification thing up top ( which is reliable ) to vanish, ( and Email notifications are erratic and unreliable )
_._________________-
But that does not affect the question much. It still is nothing like 38 32 ?
_._______________________
I will try again…( question 3(iii) )
I am seeing that one of the forms of Hexadecimal representation used in computer expert language for the decimal number 65535 is
FF FF
That representation, is often shown, by some people, when talking about UTF-16 2 Byte
I am wondering if this is somehow , in some way related to, something like this
Go above 65535 to 65536 and, in some peoples conventions, you go to 10000, when explaining unicode things (sometimes as part of some other notation I don’t understand) I have not yet seen more than 4 characters in peoples explanation of UTF-16 2 Byte, when given in tabular form like that above. (But I would not be surprised if someone can give me a few, and then chuckle to themselves knowing its going to confuse me even more, lol)
So the lateral thought of mine, just in some vain attempt to get some understanding, was that perhaps Microsoft needed UTF-16 4 Byte to aid them in stuff above that 65535 / 65536 number
An acceptable answer could be:
The use of UTF-16 2 Byte or UTF-16 4 Byte by Microsoft in any back end low level stuff is ( or alternatively is not ) related to issues to do with being above or below that 65535 / 65536 number in some workings, for example workings involved with the row number
But of course, possibly no one can know the answer to that, as it would involve perhaps knowing Microsoft Office trade secrets.
But there will likely be many people that believe they do know, and I suspect that number will be higher than those that can know.
As Physicist by trade, I am happy to use a working theory, but am open minded enough to think it can be wrong even though I am totally convinced I am right. Einstein and many top Physicist are quoted as saying that a lot of Einstein’s stuff is absurd rubbish and likely wrong. Never the less it is very useful to know and understand the Theories.
I confess I am not quite sure what I am asking. It is difficult when one person says Unicode UTF-16 2 Byte A is 65 00 and the other says its 41 00
But often, as in the case of a religion, they cannot give simple proof. I have to accept their belief, as I am happy in a democracy to accept when two people say their God is the only one
I guess it's an inevitable consequence sometimes, when we are left to guess with some of these computer things, especially with Microsoft, and two of the top people say something different, so you have to choose your God and hope it does not lead to war, as it usually does unfortunately.
Perhaps when I know the answer I can ask the question better. That is not as absurd as it sounds. Many smart people have said to me that often to find the answer in Microsoft stuff, you need to know the answer
_._________________-
Ahh, OK, yes – sorry, small typo 2026
But that does not affect the question much. It still is nothing like 38 32 ?
_._______________________
I was not intending to imply that Unicode was to do with the number of rows Excel has. I perhaps worded it unclearly, partly as I am so totally confused with all the different number formats people sometimes use when explaining the same thing. (This makes it extra difficult to even word the question clearly, as I speak English and German fluently, but not the computer expert language too well yet…. Apologies for my ignorant ignorance… its difficult when you are so ignorant that you can’t even word the question correctly… It's like a double level pointer of ignorance making everything 10x , or maybe 8x, harder )
I will try again…( question 3(iii) )
I am seeing that one of the forms of Hexadecimal representation used in computer expert language for the decimal number 65535 is
FF FF
That representation, is often shown, by some people, when talking about UTF-16 2 Byte
I am wondering if this is somehow , in some way related to, something like this
Code: Select all
Byte Byte
FF FF
Go above 65535 to 65536 and, in some peoples conventions, you go to 10000, when explaining unicode things (sometimes as part of some other notation I don’t understand) I have not yet seen more than 4 characters in peoples explanation of UTF-16 2 Byte, when given in tabular form like that above. (But I would not be surprised if someone can give me a few, and then chuckle to themselves knowing its going to confuse me even more, lol)
So the lateral thought of mine, just in some vain attempt to get some understanding, was that perhaps Microsoft needed UTF-16 4 Byte to aid them in stuff above that 65535 / 65536 number
An acceptable answer could be:
The use of UTF-16 2 Byte or UTF-16 4 Byte by Microsoft in any back end low level stuff is ( or alternatively is not ) related to issues to do with being above or below that 65535 / 65536 number in some workings, for example workings involved with the row number
But of course, possibly no one can know the answer to that, as it would involve perhaps knowing Microsoft Office trade secrets.
But there will likely be many people that believe they do know, and I suspect that number will be higher than those that can know.
As Physicist by trade, I am happy to use a working theory, but am open minded enough to think it can be wrong even though I am totally convinced I am right. Einstein and many top Physicist are quoted as saying that a lot of Einstein’s stuff is absurd rubbish and likely wrong. Never the less it is very useful to know and understand the Theories.
I confess I am not quite sure what I am asking. It is difficult when one person says Unicode UTF-16 2 Byte A is 65 00 and the other says its 41 00
But often, as in the case of a religion, they cannot give simple proof. I have to accept their belief, as I am happy in a democracy to accept when two people say their God is the only one
I guess it's an inevitable consequence sometimes, when we are left to guess with some of these computer things, especially with Microsoft, and two of the top people say something different, so you have to choose your God and hope it does not lead to war, as it usually does unfortunately.
Perhaps when I know the answer I can ask the question better. That is not as absurd as it sounds. Many smart people have said to me that often to find the answer in Microsoft stuff, you need to know the answer
Regards , Ālan , DocÆlstein
, 

