Just depends on the base the numbers are being given in.
So a byte, which is 8 binary bits, can represent (in decimal) any integer number from 0 to 255 inclusive.
So let's pretend that we have one byte that currently has the bit pattern that represents decimal 38
And let's pretend that we have another byte that currently has the bit pattern that represents decimal 32
But in hexadecimal (which is nothing more magic than a name for base 16, in the same way that decimal is no more than a name for base 10) those two bytes are (drum roll): 26 and 20
Some code may help. Or maybe not ...
Code:
Option Explicit

' A UDT holding a single 16-bit (2-byte) Integer
Public Type mike
    mychar As Integer
End Type

' A UDT of the same size, viewed as two individual bytes
Public Type wombat
    byteHi As Byte   ' first byte in memory (the low-order byte on little-endian hardware such as x86)
    byteLo As Byte   ' second byte in memory (the high-order byte on little-endian hardware)
End Type

Public Sub Example()
    Dim m As mike
    Dim w As wombat
    m.mychar = 8230 ' hex 2026
    LSet w = m ' use a hack to copy the underlying storage of the integer into two bytes
    Debug.Print "The hex of the number: " & Hex(m.mychar)
    Debug.Print "How it is stored: " & Right("00" & Hex(w.byteHi), 2) & " " & Right("00" & Hex(w.byteLo), 2)
End Sub
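On a typical little-endian (x86/x64) machine that should print something like:

The hex of the number: 2026
How it is stored: 26 20

i.e. the two bytes of 8230 (hex 2026) hold hex 26 and hex 20 - decimal 38 and 32 - with the low-order byte sitting first in memory.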
Both are right - but one is decimal, the other is hexadecimal. But yes, it can be confusing if you are not aware of which base is in use.
>Go above 65535 to 65536 and, in some peoples conventions, you go to 10000
Where 'some peoples conventions' = hexadecimal
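A quick way to see that convention from VBA itself (just a sketch for the Immediate window):

Code:
Debug.Print Hex(65535)   ' FFFF  - the largest value two bytes can hold
Debug.Print Hex(65536)   ' 10000 - one more, and a third byte is needed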
>I have not yet seen more than 4 characters in peoples explanation of UTF-16 2 Byte
So we perhaps get to the core of some of your misunderstanding. A Unicode value (whether encoded as UTF-8, UTF-16 or UTF-32) represents a single character, no matter how many bytes are used for that representation. Unlike ASCII/ANSI, where we can safely say that 1 byte = 1 char, we cannot say the same for Unicode (except for the first 128 characters of UTF-8, which - by design - overlap ASCII).
UTF-16 2 byte (not really a thing, to be honest) is a 16-bit "code unit", and a single code unit represents a single Unicode code point in the Basic Multilingual Plane (although some code points may represent what we humans might consider more than 1 character, for example ligatures). And do not make the mistake of assuming the hexadecimal digits that represent the bytes are themselves characters (i.e. FFFF is not 4 characters).
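To make that last point concrete, a minimal VBA sketch (Len returns the length of a string, ChrW builds a single character from its 16-bit code value):

Code:
Debug.Print Len("FFFF")          ' 4 - a string of four hex-digit characters
Debug.Print Len(ChrW(&HFFFF&))   ' 1 - a single character whose code point is hex FFFF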
But to get to what I guess is your point - why do you never see more than FFFF when discussing UTF-16 2 byte? Well, because hex FFFF is the largest number that 2 bytes can directly represent. If you started talking about hex 10000, then you are talking about 3 (or in reality 4) bytes. See below for a quick primer on binary numbers.
I should point out that above code point FFFF, UTF-16 gets somewhat messy to deal with (Unicode code point hex 10000 is not internally represented by hex 10000, for example - it is encoded as a pair of 16-bit "surrogate" code units), and is mostly irrelevant to most people, so most of the time MS seem to like to restrict access to higher code points (especially in VB(A)).
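A rough, hypothetical sketch of that "messy" bit (VBA strings are UTF-16 internally, and Len counts 16-bit code units, not what a human would call characters):

Code:
Public Sub SurrogateDemo()
    Dim euro As String, grin As String
    euro = ChrW(&H20AC)                    ' U+20AC, the euro sign - one 16-bit code unit
    grin = ChrW(&HD83D&) & ChrW(&HDE00&)   ' U+1F600, a grinning-face emoji, built as a surrogate pair
    Debug.Print Len(euro)                  ' 1
    Debug.Print Len(grin)                  ' 2 - one "character" to a human, but two 16-bit code units
End Sub

The VBE's Immediate window probably won't display the emoji itself, but the lengths make the point.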
>perhaps Microsoft needed UTF-16 4 Byte to aid them in stuff above that 65535 / 65536
>for example workings involved with the row number
No, you still seem to be mixing Unicode up with some fundamental features of binary computers.
You know all this, but:
Modern binary computers are based on 8-bit bytes (and yes, there were indeed computers that used different-sized bytes). By setting the bits on and off, a byte can represent 256 different states (2^8) - and we can choose to interpret those states as the numbers 0 to 255 (hex 0 to hex FF).

If we want to represent the number 256 we need to start using a second byte. And once we start using that second byte, we can represent 65536 states - or the numbers 0 to 65535 (hex 0 to hex FFFF), which from the 16-bit days is the range of an unsigned integer (what Microsoft now refer to as a ushort).

If we want to represent the next number, 65536, then we need yet another byte (but three bytes is a pain for a binary computer to handle, so we actually use another 2 bytes). And that allows us to represent 4294967296 states - or the numbers 0 to 4294967295 (hex 0 to hex FFFFFFFF) - which, again not coincidentally, is the unsigned range of the 4 bytes that make up a Long (or as MS now call it a uint, since the size of a Long has changed ...).
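And the arithmetic itself, as a quick sketch for the Immediate window:

Code:
Debug.Print 2 ^ 8, 2 ^ 16, 2 ^ 32   ' 256   65536   4294967296 - the states 1, 2 and 4 bytes can hold
Debug.Print Hex(255), Hex(65535)    ' FF    FFFF - the largest values 1 and 2 bytes can represent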
Now, as you can see, none of this relies on Unicode either directly or indirectly.
So it isn't rocket science - older versions of Excel had a limit of 65536 rows because internally they used an Integer/short to represent the row number.
Oddly, the new limit (where the value is now stored in a long/uint) is 2^20 (1048576 rows), not 2^32 - but I suspect the view was that sheets larger than that might be limited by memory and/or performance.
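If you want to confirm that from VBA in a current (post-2003) version of Excel, something like this should do it:

Code:
Debug.Print Rows.Count            ' 1048576, i.e. 2^20
Debug.Print Hex(Rows.Count - 1)   ' FFFFF - the highest zero-based row index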