Digits int() doesn’t dig

Some time ago, in More Than Just ASCII Digits, I wrote about all the Unicode characters that Python’s int() function converts to numbers.

Recently I discovered a discrepancy between those characters and the isdigit() method: for some characters isdigit() returns True, but int() refuses to convert them.
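The discrepancy is easy to reproduce in a few lines. This is only a small illustration, using SUPERSCRIPT TWO as the example character; the helper name `try_int` is made up for this sketch.

```python
def try_int(text):
    """Return int(text), or None if int() rejects the text (hypothetical helper)."""
    try:
        return int(text)
    except ValueError:
        return None


superscript_two = '\u00b2'  # SUPERSCRIPT TWO
print(superscript_two.isdigit())  # True: isdigit() accepts it
print(try_int(superscript_two))   # None: int() raised a ValueError
print(try_int('\u0662'))          # 2: ARABIC-INDIC DIGIT TWO converts fine
```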

To make a list of those characters, I first ported the program from the other blog entry to Python 3 and then changed it to list all characters that are digits according to isdigit() but raise a ValueError when passed to int():

#!/usr/bin/env python3
import sys
import unicodedata


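def main():
    # Check every code point Python can represent.
    for i in range(sys.maxunicode + 1):
        character = chr(i)
        try:
            int(character)
        except ValueError:
            # Not convertible: report it if isdigit() still says "digit".
            if character.isdigit():
                name = unicodedata.name(character).lower()
                print(f'{character},{i:x},{name}')
        else:
            # Convertible: such a character must also count as a digit.
            assert character.isdigit()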


if __name__ == '__main__':
    main()

The assert is never triggered, which tells us that every character int() can convert is also a digit according to isdigit().
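The check can also be turned around. The following is a minimal sketch, assuming (this is not verified in the program above) that `str.isdecimal()` is the method that matches `int()` for single characters. It collects every code point where the two disagree; the function name `find_mismatches` is invented for this example.

```python
import sys


def find_mismatches():
    """Collect code points where isdecimal() and int() disagree (illustrative helper)."""
    mismatches = []
    for i in range(sys.maxunicode + 1):
        character = chr(i)
        try:
            int(character)
            convertible = True
        except ValueError:
            convertible = False
        # A mismatch is a character that converts without being decimal,
        # or one that is decimal without converting.
        if convertible != character.isdecimal():
            mismatches.append(i)
    return mismatches


print(len(find_mismatches()))
```

If the assumption holds, the list is empty, and isdecimal() rather than isdigit() would be the test to use before calling int() on a single character.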

Here is the result:

All characters that are digits but can’t be converted by int().
Character Codepoint (hex) Name
² b2 superscript two
³ b3 superscript three
¹ b9 superscript one
፩ 1369 ethiopic digit one
፪ 136a ethiopic digit two
፫ 136b ethiopic digit three
፬ 136c ethiopic digit four
፭ 136d ethiopic digit five
፮ 136e ethiopic digit six
፯ 136f ethiopic digit seven
፰ 1370 ethiopic digit eight
፱ 1371 ethiopic digit nine
᧚ 19da new tai lue tham digit one
⁰ 2070 superscript zero
⁴ 2074 superscript four
⁵ 2075 superscript five
⁶ 2076 superscript six
⁷ 2077 superscript seven
⁸ 2078 superscript eight
⁹ 2079 superscript nine
₀ 2080 subscript zero
₁ 2081 subscript one
₂ 2082 subscript two
₃ 2083 subscript three
₄ 2084 subscript four
₅ 2085 subscript five
₆ 2086 subscript six
₇ 2087 subscript seven
₈ 2088 subscript eight
₉ 2089 subscript nine
① 2460 circled digit one
② 2461 circled digit two
③ 2462 circled digit three
④ 2463 circled digit four
⑤ 2464 circled digit five
⑥ 2465 circled digit six
⑦ 2466 circled digit seven
⑧ 2467 circled digit eight
⑨ 2468 circled digit nine
⑴ 2474 parenthesized digit one
⑵ 2475 parenthesized digit two
⑶ 2476 parenthesized digit three
⑷ 2477 parenthesized digit four
⑸ 2478 parenthesized digit five
⑹ 2479 parenthesized digit six
⑺ 247a parenthesized digit seven
⑻ 247b parenthesized digit eight
⑼ 247c parenthesized digit nine
⒈ 2488 digit one full stop
⒉ 2489 digit two full stop
⒊ 248a digit three full stop
⒋ 248b digit four full stop
⒌ 248c digit five full stop
⒍ 248d digit six full stop
⒎ 248e digit seven full stop
⒏ 248f digit eight full stop
⒐ 2490 digit nine full stop
⓪ 24ea circled digit zero
⓵ 24f5 double circled digit one
⓶ 24f6 double circled digit two
⓷ 24f7 double circled digit three
⓸ 24f8 double circled digit four
⓹ 24f9 double circled digit five
⓺ 24fa double circled digit six
⓻ 24fb double circled digit seven
⓼ 24fc double circled digit eight
⓽ 24fd double circled digit nine
⓿ 24ff negative circled digit zero
❶ 2776 dingbat negative circled digit one
❷ 2777 dingbat negative circled digit two
❸ 2778 dingbat negative circled digit three
❹ 2779 dingbat negative circled digit four
❺ 277a dingbat negative circled digit five
❻ 277b dingbat negative circled digit six
❼ 277c dingbat negative circled digit seven
❽ 277d dingbat negative circled digit eight
❾ 277e dingbat negative circled digit nine
➀ 2780 dingbat circled sans-serif digit one
➁ 2781 dingbat circled sans-serif digit two
➂ 2782 dingbat circled sans-serif digit three
➃ 2783 dingbat circled sans-serif digit four
➄ 2784 dingbat circled sans-serif digit five
➅ 2785 dingbat circled sans-serif digit six
➆ 2786 dingbat circled sans-serif digit seven
➇ 2787 dingbat circled sans-serif digit eight
➈ 2788 dingbat circled sans-serif digit nine
➊ 278a dingbat negative circled sans-serif digit one
➋ 278b dingbat negative circled sans-serif digit two
➌ 278c dingbat negative circled sans-serif digit three
➍ 278d dingbat negative circled sans-serif digit four
➎ 278e dingbat negative circled sans-serif digit five
➏ 278f dingbat negative circled sans-serif digit six
➐ 2790 dingbat negative circled sans-serif digit seven
➑ 2791 dingbat negative circled sans-serif digit eight
➒ 2792 dingbat negative circled sans-serif digit nine
𐩀 10a40 kharoshthi digit one
𐩁 10a41 kharoshthi digit two
𐩂 10a42 kharoshthi digit three
𐩃 10a43 kharoshthi digit four
𐹠 10e60 rumi digit one
𐹡 10e61 rumi digit two
𐹢 10e62 rumi digit three
𐹣 10e63 rumi digit four
𐹤 10e64 rumi digit five
𐹥 10e65 rumi digit six
𐹦 10e66 rumi digit seven
𐹧 10e67 rumi digit eight
𐹨 10e68 rumi digit nine
𑁒 11052 brahmi number one
𑁓 11053 brahmi number two
𑁔 11054 brahmi number three
𑁕 11055 brahmi number four
𑁖 11056 brahmi number five
𑁗 11057 brahmi number six
𑁘 11058 brahmi number seven
𑁙 11059 brahmi number eight
𑁚 1105a brahmi number nine
🄀 1f100 digit zero full stop
🄁 1f101 digit zero comma
🄂 1f102 digit one comma
🄃 1f103 digit two comma
🄄 1f104 digit three comma
🄅 1f105 digit four comma
🄆 1f106 digit five comma
🄇 1f107 digit six comma
🄈 1f108 digit seven comma
🄉 1f109 digit eight comma
🄊 1f10a digit nine comma
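
If the numeric value of one of these characters is needed anyway, one option (a suggestion, not something the program above does) is `unicodedata.digit()`, which looks up a character's digit value directly. The helper name `digit_value` is hypothetical.

```python
import unicodedata


def digit_value(character):
    """Return the digit value of one character, or None if it has none (hypothetical helper)."""
    # The second argument is the default returned for non-digit characters;
    # without it unicodedata.digit() raises a ValueError instead.
    return unicodedata.digit(character, None)


print(digit_value('\u00b2'))      # 2: SUPERSCRIPT TWO
print(digit_value('\u2468'))      # 9: CIRCLED DIGIT NINE
print(digit_value('\U0001f101'))  # 0: DIGIT ZERO COMMA
print(digit_value('x'))           # None: not a digit
```

Note that this only handles one character at a time; it does not turn a string such as '¹²' into the number 12.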