Digits int() doesn’t dig

Some time ago, in “More Than Just ASCII Digits”, I wrote about all the Unicode characters that Python’s int() function converts to numbers.
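As a quick reminder of what that covered, here is a minimal sketch (CPython 3, standard library only) of int() accepting decimal digits from scripts other than ASCII. The sample strings are my own picks, not taken from that post:

```python
# int() accepts characters that Unicode classifies as decimal digits,
# not just the ASCII digits 0-9.
samples = {
    '\u0663': 3,         # ARABIC-INDIC DIGIT THREE
    '\u0967\u0968': 12,  # DEVANAGARI DIGIT ONE, DEVANAGARI DIGIT TWO
    '\uff14\uff12': 42,  # FULLWIDTH DIGIT FOUR, FULLWIDTH DIGIT TWO
}

for text, expected in samples.items():
    value = int(text)
    # ascii() keeps the output printable on any terminal encoding.
    print(f'{ascii(text)} -> {value}')
    assert value == expected
```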

Recently I discovered a discrepancy between those characters and the isdigit() method: some characters make isdigit() return True, yet int() cannot convert them.
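A minimal illustration with SUPERSCRIPT TWO (U+00B2), one of the characters in the list further down. My reading of the Python documentation is that isdigit() is true for characters with Numeric_Type Digit or Decimal. int(), by contrast, only accepts decimal digits (general category Nd), which is what isdecimal() tests. The unicodedata calls below show the two properties separately:

```python
import unicodedata

ch = '\u00b2'  # SUPERSCRIPT TWO

print(ch.isdigit())           # True: it has a Unicode digit value
print(ch.isdecimal())         # False: it is not a decimal digit (category Nd)
print(unicodedata.digit(ch))  # 2

try:
    int(ch)
except ValueError:
    print('int() rejects it')

try:
    unicodedata.decimal(ch)
except ValueError:
    print('and it has no decimal value')
```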

To list those characters, I first ported the program from the other blog entry to Python 3. I then changed it to print every character that is a digit according to isdigit() but makes int() raise a ValueError:

#!/usr/bin/env python3
import sys
import unicodedata


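def main():
    # Walk over every code point this Python build can represent.
    for i in range(sys.maxunicode + 1):
        character = chr(i)
        try:
            int(character)
        except ValueError:
            # int() rejects the character; report it if isdigit()
            # nevertheless claims it is a digit.
            if character.isdigit():
                name = unicodedata.name(character).lower()
                print(f'{character},{i:x},{name}')
        else:
            # int() accepted the character, so isdigit() should agree.
            assert character.isdigit()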


if __name__ == '__main__':
    main()

The assert never fires. So every character that int() can convert does have the “is digit” property, i.e. isdigit() returns True for it.
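The reverse question is which predicate matches int() exactly. The Python documentation defines isdecimal() as true for characters in general category Nd, while isdigit() also covers Numeric_Type=Digit. Going by those definitions, I would expect str.isdecimal() to be that predicate. Below is a sketch that checks this expectation over every code point, assuming CPython 3; find_mismatches is just an illustrative name:

```python
import sys


def find_mismatches():
    """Return code points where int() and str.isdecimal() disagree."""
    mismatches = []
    for i in range(sys.maxunicode + 1):
        character = chr(i)
        try:
            int(character)
            converts = True
        except ValueError:
            converts = False
        if converts != character.isdecimal():
            mismatches.append(i)
    return mismatches


if __name__ == '__main__':
    print(f'{len(find_mismatches())} mismatches')
```

An empty result would mean that, for single characters, isdecimal() is the test to use before handing a character to int().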

Here is the result:

All characters that are digits but can’t be converted by int().

Character  Codepoint (hex)  Name
²          b2               superscript two
³          b3               superscript three
¹          b9               superscript one
፩          1369             ethiopic digit one
፪          136a             ethiopic digit two
፫          136b             ethiopic digit three
፬          136c             ethiopic digit four
፭          136d             ethiopic digit five
፮          136e             ethiopic digit six
፯          136f             ethiopic digit seven
፰          1370             ethiopic digit eight
፱          1371             ethiopic digit nine
᧚          19da             new tai lue tham digit one
⁰          2070             superscript zero
⁴          2074             superscript four
⁵          2075             superscript five
⁶          2076             superscript six
⁷          2077             superscript seven
⁸          2078             superscript eight
⁹          2079             superscript nine
₀          2080             subscript zero
₁          2081             subscript one
₂          2082             subscript two
₃          2083             subscript three
₄          2084             subscript four
₅          2085             subscript five
₆          2086             subscript six
₇          2087             subscript seven
₈          2088             subscript eight
₉          2089             subscript nine
①          2460             circled digit one
②          2461             circled digit two
③          2462             circled digit three
④          2463             circled digit four
⑤          2464             circled digit five
⑥          2465             circled digit six
⑦          2466             circled digit seven
⑧          2467             circled digit eight
⑨          2468             circled digit nine
⑴          2474             parenthesized digit one
⑵          2475             parenthesized digit two
⑶          2476             parenthesized digit three
⑷          2477             parenthesized digit four
⑸          2478             parenthesized digit five
⑹          2479             parenthesized digit six
⑺          247a             parenthesized digit seven
⑻          247b             parenthesized digit eight
⑼          247c             parenthesized digit nine
⒈          2488             digit one full stop
⒉          2489             digit two full stop
⒊          248a             digit three full stop
⒋          248b             digit four full stop
⒌          248c             digit five full stop
⒍          248d             digit six full stop
⒎          248e             digit seven full stop
⒏          248f             digit eight full stop
⒐          2490             digit nine full stop
⓪          24ea             circled digit zero
⓵          24f5             double circled digit one
⓶          24f6             double circled digit two
⓷          24f7             double circled digit three
⓸          24f8             double circled digit four
⓹          24f9             double circled digit five
⓺          24fa             double circled digit six
⓻          24fb             double circled digit seven
⓼          24fc             double circled digit eight
⓽          24fd             double circled digit nine
⓿          24ff             negative circled digit zero
❶          2776             dingbat negative circled digit one
❷          2777             dingbat negative circled digit two
❸          2778             dingbat negative circled digit three
❹          2779             dingbat negative circled digit four
❺          277a             dingbat negative circled digit five
❻          277b             dingbat negative circled digit six
❼          277c             dingbat negative circled digit seven
❽          277d             dingbat negative circled digit eight
❾          277e             dingbat negative circled digit nine
➀          2780             dingbat circled sans-serif digit one
➁          2781             dingbat circled sans-serif digit two
➂          2782             dingbat circled sans-serif digit three
➃          2783             dingbat circled sans-serif digit four
➄          2784             dingbat circled sans-serif digit five
➅          2785             dingbat circled sans-serif digit six
➆          2786             dingbat circled sans-serif digit seven
➇          2787             dingbat circled sans-serif digit eight
➈          2788             dingbat circled sans-serif digit nine
➊          278a             dingbat negative circled sans-serif digit one
➋          278b             dingbat negative circled sans-serif digit two
➌          278c             dingbat negative circled sans-serif digit three
➍          278d             dingbat negative circled sans-serif digit four
➎          278e             dingbat negative circled sans-serif digit five
➏          278f             dingbat negative circled sans-serif digit six
➐          2790             dingbat negative circled sans-serif digit seven
➑          2791             dingbat negative circled sans-serif digit eight
➒          2792             dingbat negative circled sans-serif digit nine
𐩀          10a40            kharoshthi digit one
𐩁          10a41            kharoshthi digit two
𐩂          10a42            kharoshthi digit three
𐩃          10a43            kharoshthi digit four
𐹠          10e60            rumi digit one
𐹡          10e61            rumi digit two
𐹢          10e62            rumi digit three
𐹣          10e63            rumi digit four
𐹤          10e64            rumi digit five
𐹥          10e65            rumi digit six
𐹦          10e66            rumi digit seven
𐹧          10e67            rumi digit eight
𐹨          10e68            rumi digit nine
𑁒          11052            brahmi number one
𑁓          11053            brahmi number two
𑁔          11054            brahmi number three
𑁕          11055            brahmi number four
𑁖          11056            brahmi number five
𑁗          11057            brahmi number six
𑁘          11058            brahmi number seven
𑁙          11059            brahmi number eight
𑁚          1105a            brahmi number nine
🄀          1f100            digit zero full stop
🄁          1f101            digit zero comma
🄂          1f102            digit one comma
🄃          1f103            digit two comma
🄄          1f104            digit three comma
🄅          1f105            digit four comma
🄆          1f106            digit five comma
🄇          1f107            digit six comma
🄈          1f108            digit seven comma
🄉          1f109            digit eight comma
🄊          1f10a            digit nine comma
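If you do need the value of one of these characters, unicodedata.digit() knows it even where int() does not. Below is a minimal sketch; digit_value is a hypothetical helper name. It deliberately handles one character at a time, because the Python documentation points out that some of these digits, like the Kharoshthi numbers, cannot be used to form numbers in base 10:

```python
import unicodedata


def digit_value(character):
    """Return the digit value of a single character (hypothetical helper).

    Tries int() first and falls back to unicodedata.digit(), which also
    knows superscripts, circled digits and similar characters.
    """
    if len(character) != 1:
        raise ValueError('expected exactly one character')
    try:
        return int(character)
    except ValueError:
        # unicodedata.digit() raises ValueError itself if the character
        # has no digit value at all.
        return unicodedata.digit(character)


for character in ['7', '\u00b2', '\u2460', '\u278a', '\U0001f100']:
    # Print code point and name only, so the output stays ASCII.
    name = unicodedata.name(character)
    print(f'U+{ord(character):04X} {name} -> {digit_value(character)}')
```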