Digits int() doesn’t dig
Some time ago I wrote in More Than Just ASCII Digits about all the Unicode characters that Python’s int() function converts to numbers.
Recently I’ve discovered a discrepancy between those characters and the isdigit() method: there are characters for which isdigit() returns True but which int() refuses to convert.
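A minimal sketch of the discrepancy on three hand-picked characters. The helper name try_int is made up for this illustration; it only wraps int() so the failing case can be printed next to the working ones.

```python
def try_int(character):
    """Return int(character), or None if int() raises ValueError."""
    try:
        return int(character)
    except ValueError:
        return None


# ASCII digit, ARABIC-INDIC DIGIT THREE, SUPERSCRIPT TWO
for character in ('3', '\u0663', '\u00b2'):
    print(repr(character), character.isdigit(), try_int(character))
```

All three characters report True for isdigit(), but only the first two are converted; for the superscript two the helper returns None.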
To make a list of those characters, I first ported the program from the other blog entry to Python 3 and then changed it to list every character that is a digit according to the isdigit() method but raises a ValueError with int():
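#!/usr/bin/env python3
import sys
import unicodedata


def main():
    # Walk over every code point Python can represent.
    for i in range(sys.maxunicode + 1):
        character = chr(i)
        try:
            int(character)
        except ValueError:
            # int() refused the character: report it if isdigit() disagrees.
            if character.isdigit():
                name = unicodedata.name(character).lower()
                print(f'{character},{i:x},{name}')
        else:
            # int() accepted the character, so isdigit() should agree.
            assert character.isdigit()


if __name__ == '__main__':
    main()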
The assert in there is never triggered, which tells us that every character int() can convert actually has the "is digit" property.
Here is the result:
Character | Codepoint (hex) | Name |
---|---|---|
² | b2 | superscript two |
³ | b3 | superscript three |
¹ | b9 | superscript one |
፩ | 1369 | ethiopic digit one |
፪ | 136a | ethiopic digit two |
፫ | 136b | ethiopic digit three |
፬ | 136c | ethiopic digit four |
፭ | 136d | ethiopic digit five |
፮ | 136e | ethiopic digit six |
፯ | 136f | ethiopic digit seven |
፰ | 1370 | ethiopic digit eight |
፱ | 1371 | ethiopic digit nine |
᧚ | 19da | new tai lue tham digit one |
⁰ | 2070 | superscript zero |
⁴ | 2074 | superscript four |
⁵ | 2075 | superscript five |
⁶ | 2076 | superscript six |
⁷ | 2077 | superscript seven |
⁸ | 2078 | superscript eight |
⁹ | 2079 | superscript nine |
₀ | 2080 | subscript zero |
₁ | 2081 | subscript one |
₂ | 2082 | subscript two |
₃ | 2083 | subscript three |
₄ | 2084 | subscript four |
₅ | 2085 | subscript five |
₆ | 2086 | subscript six |
₇ | 2087 | subscript seven |
₈ | 2088 | subscript eight |
₉ | 2089 | subscript nine |
① | 2460 | circled digit one |
② | 2461 | circled digit two |
③ | 2462 | circled digit three |
④ | 2463 | circled digit four |
⑤ | 2464 | circled digit five |
⑥ | 2465 | circled digit six |
⑦ | 2466 | circled digit seven |
⑧ | 2467 | circled digit eight |
⑨ | 2468 | circled digit nine |
⑴ | 2474 | parenthesized digit one |
⑵ | 2475 | parenthesized digit two |
⑶ | 2476 | parenthesized digit three |
⑷ | 2477 | parenthesized digit four |
⑸ | 2478 | parenthesized digit five |
⑹ | 2479 | parenthesized digit six |
⑺ | 247a | parenthesized digit seven |
⑻ | 247b | parenthesized digit eight |
⑼ | 247c | parenthesized digit nine |
⒈ | 2488 | digit one full stop |
⒉ | 2489 | digit two full stop |
⒊ | 248a | digit three full stop |
⒋ | 248b | digit four full stop |
⒌ | 248c | digit five full stop |
⒍ | 248d | digit six full stop |
⒎ | 248e | digit seven full stop |
⒏ | 248f | digit eight full stop |
⒐ | 2490 | digit nine full stop |
⓪ | 24ea | circled digit zero |
⓵ | 24f5 | double circled digit one |
⓶ | 24f6 | double circled digit two |
⓷ | 24f7 | double circled digit three |
⓸ | 24f8 | double circled digit four |
⓹ | 24f9 | double circled digit five |
⓺ | 24fa | double circled digit six |
⓻ | 24fb | double circled digit seven |
⓼ | 24fc | double circled digit eight |
⓽ | 24fd | double circled digit nine |
⓿ | 24ff | negative circled digit zero |
❶ | 2776 | dingbat negative circled digit one |
❷ | 2777 | dingbat negative circled digit two |
❸ | 2778 | dingbat negative circled digit three |
❹ | 2779 | dingbat negative circled digit four |
❺ | 277a | dingbat negative circled digit five |
❻ | 277b | dingbat negative circled digit six |
❼ | 277c | dingbat negative circled digit seven |
❽ | 277d | dingbat negative circled digit eight |
❾ | 277e | dingbat negative circled digit nine |
➀ | 2780 | dingbat circled sans-serif digit one |
➁ | 2781 | dingbat circled sans-serif digit two |
➂ | 2782 | dingbat circled sans-serif digit three |
➃ | 2783 | dingbat circled sans-serif digit four |
➄ | 2784 | dingbat circled sans-serif digit five |
➅ | 2785 | dingbat circled sans-serif digit six |
➆ | 2786 | dingbat circled sans-serif digit seven |
➇ | 2787 | dingbat circled sans-serif digit eight |
➈ | 2788 | dingbat circled sans-serif digit nine |
➊ | 278a | dingbat negative circled sans-serif digit one |
➋ | 278b | dingbat negative circled sans-serif digit two |
➌ | 278c | dingbat negative circled sans-serif digit three |
➍ | 278d | dingbat negative circled sans-serif digit four |
➎ | 278e | dingbat negative circled sans-serif digit five |
➏ | 278f | dingbat negative circled sans-serif digit six |
➐ | 2790 | dingbat negative circled sans-serif digit seven |
➑ | 2791 | dingbat negative circled sans-serif digit eight |
➒ | 2792 | dingbat negative circled sans-serif digit nine |
𐩀 | 10a40 | kharoshthi digit one |
𐩁 | 10a41 | kharoshthi digit two |
𐩂 | 10a42 | kharoshthi digit three |
𐩃 | 10a43 | kharoshthi digit four |
𐹠 | 10e60 | rumi digit one |
𐹡 | 10e61 | rumi digit two |
𐹢 | 10e62 | rumi digit three |
𐹣 | 10e63 | rumi digit four |
𐹤 | 10e64 | rumi digit five |
𐹥 | 10e65 | rumi digit six |
𐹦 | 10e66 | rumi digit seven |
𐹧 | 10e67 | rumi digit eight |
𐹨 | 10e68 | rumi digit nine |
𑁒 | 11052 | brahmi number one |
𑁓 | 11053 | brahmi number two |
𑁔 | 11054 | brahmi number three |
𑁕 | 11055 | brahmi number four |
𑁖 | 11056 | brahmi number five |
𑁗 | 11057 | brahmi number six |
𑁘 | 11058 | brahmi number seven |
𑁙 | 11059 | brahmi number eight |
𑁚 | 1105a | brahmi number nine |
🄀 | 1f100 | digit zero full stop |
🄁 | 1f101 | digit zero comma |
🄂 | 1f102 | digit one comma |
🄃 | 1f103 | digit two comma |
🄄 | 1f104 | digit three comma |
🄅 | 1f105 | digit four comma |
🄆 | 1f106 | digit five comma |
🄇 | 1f107 | digit six comma |
🄈 | 1f108 | digit seven comma |
🄉 | 1f109 | digit eight comma |
🄊 | 1f10a | digit nine comma |
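A possible explanation, going by the Python documentation rather than by anything measured here: isdigit() is documented as accepting characters whose Unicode Numeric_Type is either Digit or Decimal, while isdecimal() covers only the decimal digits (general category Nd), and those appear to be the ones int() converts. The sketch below compares both methods with unicodedata.digit() and unicodedata.decimal(); the function name digit_properties is hypothetical and only used for this illustration.

```python
import unicodedata


def digit_properties(character):
    """Return (isdigit, isdecimal, digit value, decimal value) for one character.

    The second argument to unicodedata.digit()/decimal() is the default
    returned when the character lacks that property.
    """
    return (
        character.isdigit(),
        character.isdecimal(),
        unicodedata.digit(character, None),
        unicodedata.decimal(character, None),
    )


# ASCII seven, SUPERSCRIPT TWO, ETHIOPIC DIGIT ONE, CIRCLED DIGIT ONE
for character in ('7', '\u00b2', '\u1369', '\u2460'):
    print(character, digit_properties(character))
```

For the three characters taken from the table, isdecimal() is False and unicodedata.decimal() has no value, yet unicodedata.digit() still yields the number they stand for. If that reading is right, character.isdecimal() is the safer guard before calling int(), and unicodedata.digit() is one way to get a value out of the characters listed above.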