Remove accented letters from a String
Tag(s): Powerscript


The following snippet replaces the accented letters in a String with their regular ASCII equivalents.

This can be useful before inserting data into a database, to make sorting easier.

String accent   = "ÈÉÊËÛÙÏÎÀÂÔÖÇèéêëûùïîàâôöç"
String noaccent = "EEEEUUIIAAOOCeeeeuuiiaaooc"

String test = "Test : à é À É ç"
String currentChar = ""
String result = ""
int i,j = 0

FOR i = 1 TO len(test)
    currentChar = mid(test,i, 1)
    j = pos(accent, currentChar)
    IF j > 0 THEN
        result += mid(noaccent,j,1)
    ELSE
        result += currentChar
    END IF
NEXT

MessageBox(test, result)

/*
result :
---------------------------
Test : à é À É ç
---------------------------
Test : a e A E c
---------------------------
OK   
---------------------------
*/

The weakness of the above HowTo is potentially poor performance when dealing with a large string or with many strings. Each concatenation to the result string creates a temporary string. That's fine if you are dealing with only a few lines, but it's possible to do better.

The next version uses a blob to hold the result string. It is more efficient because the result is written in place into the blob, so no temporary string is created for each concatenation.
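The position arithmetic used with the blob can be sketched in isolation. This is only an illustration, assuming PB10 or later (Unicode), where a character and the end-of-string marker each take 2 bytes, and assuming BlobEdit copies the end-of-string marker along with the string, which is the behaviour the function below relies on. The variable names lb_buffer and lul_next are made up for this sketch.

```powerscript
// Sketch only: shows why the write position is moved back after each BlobEdit.
// Assumes PB10+ (Unicode): 2 bytes per character, 2 bytes for the end-of-string marker.
blob  lb_buffer
ulong lul_next

// room for two characters plus one spare character for the end-of-string marker
lb_buffer = Blob("ab ")

// write "X" at position 1; BlobEdit returns the position after the data it copied
lul_next = BlobEdit(lb_buffer, 1, "X")

// step back over the end-of-string marker so the next write overwrites it
lul_next -= 2

// write "Y" right after "X"
lul_next = BlobEdit(lb_buffer, lul_next, "Y")

MessageBox("BlobEdit sketch", String(lb_buffer))
```

With PB9 or earlier (ANSI) the step back would be 1 instead of 2, which is what the ENDOFSTRING_LEN constant in the function is for.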


[function  string of_removeaccent(string readonly as_accent)]

CONSTANT String ACCENT_LIST   = "ÈÉÊËÛÙÏÎÀÂÔÖÇèéêëûùïîàâôöç"
CONSTANT String NOACCENT_LIST = "EEEEUUIIAAOOCeeeeuuiiaaooc"

CONSTANT int ENDOFSTRING_LEN   = 2  // end of string PB10 (unicode)
//CONSTANT int ENDOFSTRING_LEN = 1  // end of string PB9 or less (ansi)

blob    result
string  currentChar, currentNoAccent 
long    index, posAccent  // long, not int: a PowerScript int is 16-bit, too small for a large string

ulong currentBlobIndex = 1

IF IsNull(as_accent)    THEN 
    RETURN as_accent
END IF

IF Trim(as_accent) = "" THEN 
    RETURN as_accent
END IF


// create a blob and add an extra character to take into account the End-of-string
result = Blob(as_accent + " ")

// scan the input
FOR index = 1 TO len(as_accent)
    currentChar = mid(as_accent, index , 1)
    posAccent = pos(ACCENT_LIST, currentChar)
    IF posAccent > 0 THEN
        currentNoAccent = mid(NOACCENT_LIST, posAccent, 1) 
        currentBlobIndex= BlobEdit(result, currentBlobIndex, currentNoAccent) 
    ELSE
        currentBlobIndex= BlobEdit(result, currentBlobIndex, currentChar) 
    END IF
    currentBlobIndex -= ENDOFSTRING_LEN   
NEXT

RETURN String(result)


[powerscript]
String ls_test = "Test : à é À É ç"
String ls_result 

ls_result = of_removeaccent(ls_test)
MessageBox(ls_test, ls_result)


See also this Java HowTo