Digits int() doesn’t dig
Some time ago I wrote in More Than Just ASCII Digits about all the Unicode characters that Python’s int() function converts to numbers.
Recently I discovered a discrepancy between those characters and the isdigit() method: for some characters isdigit() returns True, but int() does not convert them.
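A minimal way to see the discrepancy for a single character, using the superscript two that also shows up in the result table. The helper name `int_accepts` is made up for this sketch:

```python
def int_accepts(character):
    """Return True if int() can convert the single character."""
    try:
        int(character)
    except ValueError:
        return False
    return True


# Compare what isdigit() says with what int() actually accepts.
for character in ['7', '\u00b2']:  # '\u00b2' is SUPERSCRIPT TWO
    print(repr(character), character.isdigit(), int_accepts(character))
```

For the ASCII digit both answers are True; for the superscript two isdigit() is True while int() raises a ValueError.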
To list those characters, I first ported the program from the other blog entry to Python 3 and then changed it to print every character that is a digit according to the isdigit() method but raises a ValueError when passed to int():
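#!/usr/bin/env python3
import sys
import unicodedata


def main():
    # Walk over every code point Python knows about.
    for i in range(sys.maxunicode + 1):
        character = chr(i)
        try:
            int(character)
        except ValueError:
            # int() refused the character; report it if isdigit() disagrees.
            if character.isdigit():
                name = unicodedata.name(character).lower()
                print(f'{character},{i:x},{name}')
        else:
            # Everything int() accepts should also count as a digit.
            assert character.isdigit()


if __name__ == '__main__':
    main()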
The assert in there isn’t triggered, which tells us that all characters that can be converted actually have the “is digit” property.
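A likely explanation, which the program itself doesn’t show: according to the Python documentation, isdigit() is true for characters whose Unicode property is Numeric_Type=Digit or Numeric_Type=Decimal, whereas isdecimal() only covers the decimal ones (general category Nd), the characters that can be used to form base 10 numbers. The following sketch compares the three string predicates for a few hand-picked characters; the selection in `SAMPLES` and the `describe` helper are my own choices for illustration.

```python
import unicodedata

# Hand-picked samples: ASCII digit, ARABIC-INDIC DIGIT THREE, SUPERSCRIPT TWO,
# CIRCLED DIGIT TWO, and VULGAR FRACTION ONE HALF.
SAMPLES = ['7', '\u0663', '\u00b2', '\u2461', '\u00bd']


def describe(character):
    """Return name, general category and the three numeric predicates."""
    return (
        unicodedata.name(character).lower(),
        unicodedata.category(character),
        character.isdecimal(),
        character.isdigit(),
        character.isnumeric(),
    )


for character in SAMPLES:
    print(character, *describe(character), sep=',')
```

The superscript and circled digits are digits but not decimals, which matches the kind of characters the program reports.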
Here is the result:
| Character | Codepoint (hex) | Name |
|---|---|---|
| ² | b2 | superscript two |
| ³ | b3 | superscript three |
| ¹ | b9 | superscript one |
| ፩ | 1369 | ethiopic digit one |
| ፪ | 136a | ethiopic digit two |
| ፫ | 136b | ethiopic digit three |
| ፬ | 136c | ethiopic digit four |
| ፭ | 136d | ethiopic digit five |
| ፮ | 136e | ethiopic digit six |
| ፯ | 136f | ethiopic digit seven |
| ፰ | 1370 | ethiopic digit eight |
| ፱ | 1371 | ethiopic digit nine |
| ᧚ | 19da | new tai lue tham digit one |
| ⁰ | 2070 | superscript zero |
| ⁴ | 2074 | superscript four |
| ⁵ | 2075 | superscript five |
| ⁶ | 2076 | superscript six |
| ⁷ | 2077 | superscript seven |
| ⁸ | 2078 | superscript eight |
| ⁹ | 2079 | superscript nine |
| ₀ | 2080 | subscript zero |
| ₁ | 2081 | subscript one |
| ₂ | 2082 | subscript two |
| ₃ | 2083 | subscript three |
| ₄ | 2084 | subscript four |
| ₅ | 2085 | subscript five |
| ₆ | 2086 | subscript six |
| ₇ | 2087 | subscript seven |
| ₈ | 2088 | subscript eight |
| ₉ | 2089 | subscript nine |
| ① | 2460 | circled digit one |
| ② | 2461 | circled digit two |
| ③ | 2462 | circled digit three |
| ④ | 2463 | circled digit four |
| ⑤ | 2464 | circled digit five |
| ⑥ | 2465 | circled digit six |
| ⑦ | 2466 | circled digit seven |
| ⑧ | 2467 | circled digit eight |
| ⑨ | 2468 | circled digit nine |
| ⑴ | 2474 | parenthesized digit one |
| ⑵ | 2475 | parenthesized digit two |
| ⑶ | 2476 | parenthesized digit three |
| ⑷ | 2477 | parenthesized digit four |
| ⑸ | 2478 | parenthesized digit five |
| ⑹ | 2479 | parenthesized digit six |
| ⑺ | 247a | parenthesized digit seven |
| ⑻ | 247b | parenthesized digit eight |
| ⑼ | 247c | parenthesized digit nine |
| ⒈ | 2488 | digit one full stop |
| ⒉ | 2489 | digit two full stop |
| ⒊ | 248a | digit three full stop |
| ⒋ | 248b | digit four full stop |
| ⒌ | 248c | digit five full stop |
| ⒍ | 248d | digit six full stop |
| ⒎ | 248e | digit seven full stop |
| ⒏ | 248f | digit eight full stop |
| ⒐ | 2490 | digit nine full stop |
| ⓪ | 24ea | circled digit zero |
| ⓵ | 24f5 | double circled digit one |
| ⓶ | 24f6 | double circled digit two |
| ⓷ | 24f7 | double circled digit three |
| ⓸ | 24f8 | double circled digit four |
| ⓹ | 24f9 | double circled digit five |
| ⓺ | 24fa | double circled digit six |
| ⓻ | 24fb | double circled digit seven |
| ⓼ | 24fc | double circled digit eight |
| ⓽ | 24fd | double circled digit nine |
| ⓿ | 24ff | negative circled digit zero |
| ❶ | 2776 | dingbat negative circled digit one |
| ❷ | 2777 | dingbat negative circled digit two |
| ❸ | 2778 | dingbat negative circled digit three |
| ❹ | 2779 | dingbat negative circled digit four |
| ❺ | 277a | dingbat negative circled digit five |
| ❻ | 277b | dingbat negative circled digit six |
| ❼ | 277c | dingbat negative circled digit seven |
| ❽ | 277d | dingbat negative circled digit eight |
| ❾ | 277e | dingbat negative circled digit nine |
| ➀ | 2780 | dingbat circled sans-serif digit one |
| ➁ | 2781 | dingbat circled sans-serif digit two |
| ➂ | 2782 | dingbat circled sans-serif digit three |
| ➃ | 2783 | dingbat circled sans-serif digit four |
| ➄ | 2784 | dingbat circled sans-serif digit five |
| ➅ | 2785 | dingbat circled sans-serif digit six |
| ➆ | 2786 | dingbat circled sans-serif digit seven |
| ➇ | 2787 | dingbat circled sans-serif digit eight |
| ➈ | 2788 | dingbat circled sans-serif digit nine |
| ➊ | 278a | dingbat negative circled sans-serif digit one |
| ➋ | 278b | dingbat negative circled sans-serif digit two |
| ➌ | 278c | dingbat negative circled sans-serif digit three |
| ➍ | 278d | dingbat negative circled sans-serif digit four |
| ➎ | 278e | dingbat negative circled sans-serif digit five |
| ➏ | 278f | dingbat negative circled sans-serif digit six |
| ➐ | 2790 | dingbat negative circled sans-serif digit seven |
| ➑ | 2791 | dingbat negative circled sans-serif digit eight |
| ➒ | 2792 | dingbat negative circled sans-serif digit nine |
| 𐩀 | 10a40 | kharoshthi digit one |
| 𐩁 | 10a41 | kharoshthi digit two |
| 𐩂 | 10a42 | kharoshthi digit three |
| 𐩃 | 10a43 | kharoshthi digit four |
| 𐹠 | 10e60 | rumi digit one |
| 𐹡 | 10e61 | rumi digit two |
| 𐹢 | 10e62 | rumi digit three |
| 𐹣 | 10e63 | rumi digit four |
| 𐹤 | 10e64 | rumi digit five |
| 𐹥 | 10e65 | rumi digit six |
| 𐹦 | 10e66 | rumi digit seven |
| 𐹧 | 10e67 | rumi digit eight |
| 𐹨 | 10e68 | rumi digit nine |
| 𑁒 | 11052 | brahmi number one |
| 𑁓 | 11053 | brahmi number two |
| 𑁔 | 11054 | brahmi number three |
| 𑁕 | 11055 | brahmi number four |
| 𑁖 | 11056 | brahmi number five |
| 𑁗 | 11057 | brahmi number six |
| 𑁘 | 11058 | brahmi number seven |
| 𑁙 | 11059 | brahmi number eight |
| 𑁚 | 1105a | brahmi number nine |
| 🄀 | 1f100 | digit zero full stop |
| 🄁 | 1f101 | digit zero comma |
| 🄂 | 1f102 | digit one comma |
| 🄃 | 1f103 | digit two comma |
| 🄄 | 1f104 | digit three comma |
| 🄅 | 1f105 | digit four comma |
| 🄆 | 1f106 | digit five comma |
| 🄇 | 1f107 | digit six comma |
| 🄈 | 1f108 | digit seven comma |
| 🄉 | 1f109 | digit eight comma |
| 🄊 | 1f10a | digit nine comma |
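If the value of such a character is needed anyway, the standard library’s unicodedata.digit() can return it where int() refuses. A small sketch of that idea; the `digit_value` helper is hypothetical and not part of the program above:

```python
import unicodedata


def digit_value(character):
    """Return the digit value of a single character, or None if it has none."""
    # The second argument is the default returned instead of raising ValueError.
    return unicodedata.digit(character, None)


# CIRCLED DIGIT TWO, SUPERSCRIPT TWO, an ASCII digit, and a letter.
for character in ['\u2461', '\u00b2', '7', 'x']:
    print(repr(character), digit_value(character))
```

Returning None instead of raising keeps the helper easy to use in a loop over arbitrary text; whether a circled or superscript digit should count as a number at all is an application decision.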