.. post:: 2018-10-03
   :tags: python, unicode
   :category: Python
   :excerpt: 1

Digits `int()` doesn't dig
==========================

Some time ago I wrote in `More Than Just ASCII Digits `_ about all the
Unicode characters Python's `int()` function converts to numbers.
Recently I've discovered a discrepancy between those characters and the
`isdigit()` method: there are characters where `isdigit()` returns
`True` but `int()` does not convert them.

To list those characters, I first ported the program from the other blog
entry to Python 3 and then changed it to print every character that is a
digit according to `isdigit()` but makes `int()` raise a `ValueError`:

.. code:: python

   #!/usr/bin/env python3
   import sys
   import unicodedata


   def main():
       # Check every code point this Python build can represent.
       for i in range(sys.maxunicode + 1):
           character = chr(i)
           try:
               int(character)
           except ValueError:
               # int() refuses the character; report it if it is
               # nevertheless a digit according to isdigit().
               if character.isdigit():
                   name = unicodedata.name(character).lower()
                   print(f'{character},{i:x},{name}')
           else:
               # Everything int() converts should also be a digit.
               assert character.isdigit()


   if __name__ == '__main__':
       main()

The ``assert`` never fails, which tells us that all characters that
*can* be converted also *have* the “is digit” property. Here is the
result:

.. csv-table:: All characters that are digits but can't be converted by `int()`.
   :header: Character,Codepoint (hex),Name
   :widths: 1,1,10

   ²,b2,superscript two
   ³,b3,superscript three
   ¹,b9,superscript one
   ፩,1369,ethiopic digit one
   ፪,136a,ethiopic digit two
   ፫,136b,ethiopic digit three
   ፬,136c,ethiopic digit four
   ፭,136d,ethiopic digit five
   ፮,136e,ethiopic digit six
   ፯,136f,ethiopic digit seven
   ፰,1370,ethiopic digit eight
   ፱,1371,ethiopic digit nine
   ᧚,19da,new tai lue tham digit one
   ⁰,2070,superscript zero
   ⁴,2074,superscript four
   ⁵,2075,superscript five
   ⁶,2076,superscript six
   ⁷,2077,superscript seven
   ⁸,2078,superscript eight
   ⁹,2079,superscript nine
   ₀,2080,subscript zero
   ₁,2081,subscript one
   ₂,2082,subscript two
   ₃,2083,subscript three
   ₄,2084,subscript four
   ₅,2085,subscript five
   ₆,2086,subscript six
   ₇,2087,subscript seven
   ₈,2088,subscript eight
   ₉,2089,subscript nine
   ①,2460,circled digit one
   ②,2461,circled digit two
   ③,2462,circled digit three
   ④,2463,circled digit four
   ⑤,2464,circled digit five
   ⑥,2465,circled digit six
   ⑦,2466,circled digit seven
   ⑧,2467,circled digit eight
   ⑨,2468,circled digit nine
   ⑴,2474,parenthesized digit one
   ⑵,2475,parenthesized digit two
   ⑶,2476,parenthesized digit three
   ⑷,2477,parenthesized digit four
   ⑸,2478,parenthesized digit five
   ⑹,2479,parenthesized digit six
   ⑺,247a,parenthesized digit seven
   ⑻,247b,parenthesized digit eight
   ⑼,247c,parenthesized digit nine
   ⒈,2488,digit one full stop
   ⒉,2489,digit two full stop
   ⒊,248a,digit three full stop
   ⒋,248b,digit four full stop
   ⒌,248c,digit five full stop
   ⒍,248d,digit six full stop
   ⒎,248e,digit seven full stop
   ⒏,248f,digit eight full stop
   ⒐,2490,digit nine full stop
   ⓪,24ea,circled digit zero
   ⓵,24f5,double circled digit one
   ⓶,24f6,double circled digit two
   ⓷,24f7,double circled digit three
   ⓸,24f8,double circled digit four
   ⓹,24f9,double circled digit five
   ⓺,24fa,double circled digit six
   ⓻,24fb,double circled digit seven
   ⓼,24fc,double circled digit eight
   ⓽,24fd,double circled digit nine
   ⓿,24ff,negative circled digit zero
   ❶,2776,dingbat negative circled digit one
   ❷,2777,dingbat negative circled digit two
   ❸,2778,dingbat negative circled digit three
   ❹,2779,dingbat negative circled digit four
   ❺,277a,dingbat negative circled digit five
   ❻,277b,dingbat negative circled digit six
   ❼,277c,dingbat negative circled digit seven
   ❽,277d,dingbat negative circled digit eight
   ❾,277e,dingbat negative circled digit nine
   ➀,2780,dingbat circled sans-serif digit one
   ➁,2781,dingbat circled sans-serif digit two
   ➂,2782,dingbat circled sans-serif digit three
   ➃,2783,dingbat circled sans-serif digit four
   ➄,2784,dingbat circled sans-serif digit five
   ➅,2785,dingbat circled sans-serif digit six
   ➆,2786,dingbat circled sans-serif digit seven
   ➇,2787,dingbat circled sans-serif digit eight
   ➈,2788,dingbat circled sans-serif digit nine
   ➊,278a,dingbat negative circled sans-serif digit one
   ➋,278b,dingbat negative circled sans-serif digit two
   ➌,278c,dingbat negative circled sans-serif digit three
   ➍,278d,dingbat negative circled sans-serif digit four
   ➎,278e,dingbat negative circled sans-serif digit five
   ➏,278f,dingbat negative circled sans-serif digit six
   ➐,2790,dingbat negative circled sans-serif digit seven
   ➑,2791,dingbat negative circled sans-serif digit eight
   ➒,2792,dingbat negative circled sans-serif digit nine
   𐩀,10a40,kharoshthi digit one
   𐩁,10a41,kharoshthi digit two
   𐩂,10a42,kharoshthi digit three
   𐩃,10a43,kharoshthi digit four
   𐹠,10e60,rumi digit one
   𐹡,10e61,rumi digit two
   𐹢,10e62,rumi digit three
   𐹣,10e63,rumi digit four
   𐹤,10e64,rumi digit five
   𐹥,10e65,rumi digit six
   𐹦,10e66,rumi digit seven
   𐹧,10e67,rumi digit eight
   𐹨,10e68,rumi digit nine
   𑁒,11052,brahmi number one
   𑁓,11053,brahmi number two
   𑁔,11054,brahmi number three
   𑁕,11055,brahmi number four
   𑁖,11056,brahmi number five
   𑁗,11057,brahmi number six
   𑁘,11058,brahmi number seven
   𑁙,11059,brahmi number eight
   𑁚,1105a,brahmi number nine
   🄀,1f100,digit zero full stop
   🄁,1f101,digit zero comma
   🄂,1f102,digit one comma
   🄃,1f103,digit two comma
   🄄,1f104,digit three comma
   🄅,1f105,digit four comma
   🄆,1f106,digit five comma
   🄇,1f107,digit six comma
   🄈,1f108,digit seven comma
   🄉,1f109,digit eight comma
   🄊,1f10a,digit nine comma
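
If you need the numeric value of such a character anyway, the
`unicodedata` module can help. The following is only a minimal sketch:
the helper name `digit_value()` is made up for this example, and it
assumes that falling back to `unicodedata.digit()` whenever `int()`
gives up is acceptable for your use case. The loop also prints
`str.isdecimal()`, which, as far as I can tell, is the property CPython's
`int()` actually relies on. Results can differ between Python versions,
since each one ships its own copy of the Unicode database.

.. code:: python

   import unicodedata


   def digit_value(character):
       """Return the digit value of a single character, or None.

       Hypothetical helper: int() only converts decimal digits, so for
       characters like '²' or '①' fall back to the Unicode database.
       """
       try:
           return int(character)
       except ValueError:
           # unicodedata.digit() returns the given default (None here)
           # if the character has no digit value at all.
           return unicodedata.digit(character, None)


   for character in ['7', '²', '①', '🄂', 'x']:
       print(repr(character), character.isdigit(),
             character.isdecimal(), digit_value(character))

Trying `int()` first keeps its behaviour for everything it already
accepts; `unicodedata.digit()` is only consulted for characters `int()`
rejects.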