Regex on non-Latin alphabets

Hi there guys,

Hoping someone can help me out with the expression regex(., '^[a-zA-Z\s]+$') for capturing a name.

This condition works perfectly for Latin characters, but it doesn't work for non-Latin alphabets (e.g. the Amharic alphabet).

Any help would be greatly appreciated!

@erobinson
@Mazz
@Norman_Hooper
@Ethan_Soergel
@Simon_Kelly
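To reproduce the problem outside the form, here is a minimal Python sketch that runs the same pattern through Python's `re` module. It assumes Python treats this ASCII-only character class the same way as the engine behind the form's `regex()`, which is likely but worth confirming; the helper `matches` and the sample names are illustrative, not from the thread.

```python
import re

# The constraint from the question: only ASCII letters and whitespace.
LATIN_ONLY = r'^[a-zA-Z\s]+$'


def matches(pattern, text):
    """Return True if the whole value satisfies the anchored pattern."""
    return re.search(pattern, text) is not None


print(matches(LATIN_ONLY, "Abebe Bikila"))  # True: only a-z, A-Z and a space
print(matches(LATIN_ONLY, "አበበ ቢቂላ"))      # False: Ethiopic letters are outside a-zA-Z
```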

A Google search for an Arabic regex returned this range:
[\u0621-\u064A], so maybe something similar would work?
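As a rough illustration of how that range would be used in a name constraint, here is a minimal Python sketch. Adding `\s` for spaces and the sample names are assumptions made for the example.

```python
import re

# Arabic letters from the suggested range (U+0621 to U+064A), plus whitespace.
# The range stops before the Arabic-Indic digits (U+0660 to U+0669).
ARABIC_NAME = r'^[\u0621-\u064A\s]+$'


def matches(pattern, text):
    """Return True if the whole value satisfies the anchored pattern."""
    return re.search(pattern, text) is not None


print(matches(ARABIC_NAME, "محمد علي"))  # True
print(matches(ARABIC_NAME, "محمد ٣"))    # False: U+0663 is an Arabic-Indic digit
```

The range also stops just before the Arabic diacritics (harakat), which begin at U+064B, so fully vowelled spellings would be rejected unless the range is widened.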


@Mazz, a regex tester is a great idea!

@Calvin, if \s works (it matches any whitespace character), then \w should match a "word" character in any alphabet (Latin, Amharic, Cyrillic, etc.), provided the regex engine treats \w as Unicode-aware. Be aware that digits and the underscore also count as word characters. So [\w -]+ would match both "Charles III" and "Карл 3-й".
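In Python 3, `\w` in a `str` pattern is Unicode-aware by default, so this minimal sketch shows the behavior described above. The helper name and samples are illustrative. JavaScript's `RegExp` and Java's `Pattern` (unless compiled with the `UNICODE_CHARACTER_CLASS` flag) treat `\w` as essentially `[A-Za-z0-9_]`, so it is worth testing in whatever actually evaluates the form before relying on it.

```python
import re

# Word characters (letters, digits, underscore) plus space and hyphen.
NAME_WITH_WORD_CHARS = r'^[\w -]+$'


def matches(pattern, text):
    """Return True if the whole value satisfies the anchored pattern."""
    return re.search(pattern, text) is not None


for name in ["Charles III", "Карл 3-й", "አበበ ቢቂላ"]:
    # All three print True with Python's Unicode-aware \w.
    print(name, matches(NAME_WITH_WORD_CHARS, name))
```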

regex(., '^[\w\s]+$') should be the equivalent of what you've got, although I'd tweak that to regex(., '^[\w -]+$') to match just spaces instead of all whitespace, and to include hyphens for hyphenated names.
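The difference between the two patterns comes down to tabs, newlines and hyphens. A minimal Python sketch with illustrative sample values:

```python
import re

EQUIVALENT = r'^[\w\s]+$'  # any whitespace (space, tab, newline), no hyphen
TWEAKED = r'^[\w -]+$'     # plain spaces and hyphens only


def matches(pattern, text):
    """Return True if the whole value satisfies the anchored pattern."""
    return re.search(pattern, text) is not None


print(matches(EQUIVALENT, "Jean-Luc"), matches(TWEAKED, "Jean-Luc"))  # False True
print(matches(EQUIVALENT, "Ann\tLee"), matches(TWEAKED, "Ann\tLee"))  # True False
```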


Hi @Mazz and @Norman_Hooper, thank you so much for your great responses.
@Norman_Hooper, the only issue with regex(., '^[\w -]+$') is that it also allows digits, which we wanted to avoid; otherwise it would have worked perfectly.
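The digit problem is easy to reproduce. One Python-specific workaround, not from the thread, is the class `[^\W\d_]`, meaning "not a non-word character, not a digit, not an underscore", which leaves just letters when `\w` is Unicode-aware. It will not help in engines where `\w` is ASCII-only. A minimal sketch with illustrative names:

```python
import re

WORD_CHARS = r'^[\w -]+$'
# Letters only (via [^\W\d_]), plus space and hyphen.
LETTERS_ONLY = r'^(?:[^\W\d_]|[ -])+$'


def matches(pattern, text):
    """Return True if the whole value satisfies the anchored pattern."""
    return re.search(pattern, text) is not None


print(matches(WORD_CHARS, "12345"))            # True: digits count as word characters
print(matches(LETTERS_ONLY, "12345"))          # False
print(matches(LETTERS_ONLY, "Карл Мартелл"))   # True
print(matches(LETTERS_ONLY, "Jean-Luc"))       # True
```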

You guys inspired me to dig a little deeper, and I found a list of Unicode characters for the Ethiopic script used by Ethiopian languages, including Amharic.

I'll need to do some more testing, but this seems to work the way we need it to:
regex(., '^[\u1200-\u137Fa-zA-Z\s]+$')
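A quick Python check of the final pattern, assuming Python's `re` handles `\uXXXX` ranges the same way as the form's engine (Java and JavaScript regex syntax both accept `\uXXXX` escapes). The samples are illustrative. One thing this turns up: U+1200 to U+137F is the whole Ethiopic block, which also contains Ethiopic punctuation and numerals such as U+1369 ETHIOPIC DIGIT ONE, so those pass even though ASCII digits are rejected. If that matters, the range could be narrowed after checking the Unicode Ethiopic chart.

```python
import re

# The Ethiopic block (U+1200 to U+137F), ASCII letters, and whitespace.
ETHIOPIC_OR_LATIN = r'^[\u1200-\u137Fa-zA-Z\s]+$'


def matches(pattern, text):
    """Return True if the whole value satisfies the anchored pattern."""
    return re.search(pattern, text) is not None


# Expected: True, True, False, True ("\u1369" is ETHIOPIC DIGIT ONE).
for text in ["አበበ ቢቂላ", "Abebe Bikila", "Abebe 2", "\u1369"]:
    print(repr(text), matches(ETHIOPIC_OR_LATIN, text))
```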


I think there are situations where the character set is NOT Unicode? Or am I just parched for coffee?

I'm not sure I follow what you mean, @Mazz. Lol, maybe I'm the one parched for coffee!

I think this \u prefix tells the regex the Unicode code point of each character you want to allow.
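To make the `\u` part concrete: `\u1200` is an escape for the single character with code point U+1200 (ሀ), and `\u1200-\u137F` is every code point in between. A small, hypothetical Python helper like this one prints the code points of a sample name in the same `\uXXXX` form, which can help when working out the range a script needs:

```python
def code_points(text):
    """Return each character's code point written as a \\uXXXX escape."""
    return [f"\\u{ord(ch):04X}" for ch in text]


def in_ethiopic_block(text):
    """True if every character falls in U+1200..U+137F."""
    return all(0x1200 <= ord(ch) <= 0x137F for ch in text)


print(" ".join(code_points("አበበ")))  # \u12A0 \u1260 \u1260
print(in_ethiopic_block("አበበ"))       # True
```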

I think there are niche situations where your device's character set is NOT Unicode, but I doubt you'll have to worry about it.

This also makes me think about having a different validation condition for each selected language. You can do it in a roundabout way by picking up the lang-code label, though.
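Outside of the form, the idea of one validation rule per language can be sketched as a lookup from language code to pattern. Everything here is hypothetical: the language codes, the pattern choices, and the fallback to the Latin-only pattern are illustrative, and this does not show how a form would actually read the selected language.

```python
import re

# Hypothetical language codes mapped to the patterns discussed in this thread.
NAME_PATTERNS = {
    "en": r'^[a-zA-Z\s]+$',
    "am": r'^[\u1200-\u137Fa-zA-Z\s]+$',
    "ar": r'^[\u0621-\u064A\s]+$',
}


def is_valid_name(name, lang):
    """Validate a name with the pattern for `lang`.

    Unknown codes fall back to the Latin-only pattern (an arbitrary choice).
    """
    pattern = NAME_PATTERNS.get(lang, NAME_PATTERNS["en"])
    return re.search(pattern, name) is not None


print(is_valid_name("አበበ ቢቂላ", "am"))  # True
print(is_valid_name("አበበ ቢቂላ", "en"))  # False
print(is_valid_name("محمد علي", "ar"))  # True
```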

This is just me falling down the rabbit hole.
