
18/11/2025

5 Categorical Feature Encoding Techniques in SAS


Editor’s Note: This article was originally published in 2021. The encoding methods discussed are foundational data science techniques and remain fully relevant for preparing data for machine learning models today.

What is Categorical Feature Encoding?

Categorical variables usually take a limited number of values, typically stored as strings. Categorical feature encoding is the process of converting these values into a numeric format that machine learning models can use.

The performance of machine learning models depends on several factors, one of which is how the data is processed and fed to the model.

Encoding is therefore a crucial step: it converts categorical variables into numeric representations that machine learning models can use. A well-chosen encoding can improve model quality and is a core part of feature engineering.

In this blog, we explore the classic encoding methods, along with an example SAS macro showing how each one works.

1. Label Encoding

Label Encoding assigns each of the N distinct levels of a categorical variable an integer from 1 to N.

For instance, if a variable “Hair color” has the values Black, Brown, and Red, Label Encoding replaces them with 1, 2, and 3. The problem is that these numbers impose an order (Black < Brown < Red) that does not reflect any real relationship between the levels.

Many machine learning algorithms will nevertheless treat the codes as ordered quantities, which can lead to misleading results.
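As a minimal sketch of the mapping itself, the steps below label-encode a small made-up version of the Hair color example without any macro code. The dataset names hair, levels, and labelled and the variables id, color, and label_code are illustrative assumptions. PROC SORT with NODUPKEY keeps one row per level in sorted order, _N_ numbers those levels, and a PROC SQL join attaches each row's code.

```sas
/* Hypothetical sample data for the Hair color example */
data hair;
  input id color $;
  datalines;
1 Black
2 Brown
3 Red
4 Black
5 Red
;
run;

/* One row per distinct level, in sorted order */
proc sort data=hair(keep=color) out=levels nodupkey;
  by color;
run;

/* Number the levels 1..N */
data levels;
  set levels;
  label_code = _n_;
run;

/* Attach each row's code back to the original data */
proc sql;
  create table labelled as
  select h.id, h.color, l.label_code
  from hair as h
  left join levels as l
    on h.color = l.color
  order by h.id;
quit;

proc print data=labelled noobs;
run;
```

With this data, Black, Brown, and Red receive the codes 1, 2, and 3, which is exactly the artificial ordering described above.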

SAS Macro for Label Encoding

Here is an example macro to perform Label Encoding in SAS:

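%macro label_encode(dataset,var);
  %local i mx;
  proc sql noprint;
    /* Store each distinct level in val1, val2, ... (sorted order) */
    select distinct &var
    into :val1-
    from &dataset;
    /* Number of distinct levels */
    select count(distinct &var) into :mx trimmed from &dataset;
  quit;
  data new;
    set &dataset;
    %do i=1 %to &mx;
      /* Compare against the i-th stored level (two ampersands, not three) */
      if &var="&&val&i" then new=&i;
    %end;
  run;
%mend;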

2. Binary Encoding

Binary Encoding converts class values into numeric values, like Label Encoding does.

However, Binary Encoding goes a step further: it converts those numeric codes into binary numbers and places each binary digit in its own column.

With n unique categories, binary encoding therefore needs only about log2(n) columns (exactly ⌈log2(n + 1)⌉ when the codes start at 1), far fewer than the n columns that One-Hot Encoding creates. For more information, visit here.
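The column count can be checked with a short sketch. Assuming label codes 1 to 3 have already been assigned (the dataset names codes and binary_encoded, the macro variable nbits, and the variables bits, bit1, bit2, ... are hypothetical), the steps compute ⌈log2(n + 1)⌉ bits from the largest code, format each code with the BINARYw. format at that width, and split the digits into numeric columns.

```sas
/* Hypothetical label codes for three levels (Black=1, Brown=2, Red=3) */
data codes;
  input color $ label_code;
  datalines;
Black 1
Brown 2
Red 3
;
run;

/* Bits needed for the largest code: ceil(log2(n + 1)) */
proc sql noprint;
  select ceil(log2(max(label_code) + 1)) into :nbits trimmed
  from codes;
quit;

data binary_encoded;
  set codes;
  length bits $&nbits;
  array bitcol{&nbits} bit1-bit&nbits;
  /* Zero-padded binary string, e.g. 1 becomes 01 when nbits is 2 */
  bits = put(label_code, binary&nbits..);
  /* One numeric 0/1 column per binary digit */
  do j = 1 to &nbits;
    bitcol{j} = input(substr(bits, j, 1), 1.);
  end;
  drop j;
run;

proc print data=binary_encoded noobs;
run;
```

For three levels this yields two columns: 1 becomes 01, 2 becomes 10, and 3 becomes 11.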

SAS Macro for Binary Encoding

Here is an example macro for Binary Encoding in SAS:

%macro binary_encoding(dataset,var);
  %local i mx;
  proc sql noprint;
    /* Store each distinct level in val1, val2, ... (sorted order) */
    select distinct &var
    into :val1-
    from &dataset;
    select count(distinct &var) into :mx trimmed from &dataset;
  quit;
  data new;
    set &dataset;
    %do i=1 %to &mx;
      if &var="&&val&i" then new=&i;
    %end;
    /* Display the integer code in binary */
    format new binary.;
  run;
%mend;

This macro creates a single variable holding the binary-formatted code. To split the binary digits into separate columns, you can write a column-splitting macro.

SAS Macro for Splitting Column

Here is an example macro for splitting columns in SAS:

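%macro split_column(data,var);
  %local i nbits;
  /* Bits needed for the largest code: ceil(log2(max + 1)) */
  proc sql noprint;
    select ceil(log2(max(&var)+1)) into :nbits trimmed from &data;
  quit;
  data &data;
    set &data;
    length cha $&nbits;
    /* Binary digits without surplus leading zeros, e.g. 3 becomes 11 */
    cha=put(&var, binary&nbits..);
    %do i=1 %to &nbits;
      /* One numeric 0/1 column per digit */
      c_&i=input(substr(cha,&i,1), 1.);
    %end;
    drop cha;
  run;
%mend;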

3. One-Hot Encoding

One-Hot Encoding converts a categorical variable into a set of 0/1 indicator columns, one per level: each row has 1 in the column for its own level and 0 in all the others.

Because these indicators do not impose an artificial order on the levels, they can be fed directly into machine learning, deep learning, and statistical models, which can help those models make better predictions.
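When the levels are known in advance, One-Hot Encoding needs nothing more than one comparison per level, because a SAS comparison evaluates to 1 when true and 0 when false. This sketch hard-codes the three Hair color levels; the dataset names hair and onehot and the variables black, brown, and red are assumptions for illustration. The macro below is the general version, which discovers the levels from the data instead.

```sas
/* Hypothetical sample data for the Hair color example */
data hair;
  input id color $;
  datalines;
1 Black
2 Brown
3 Red
4 Black
;
run;

data onehot;
  set hair;
  /* Each comparison yields 1 (true) or 0 (false) */
  black = (color = 'Black');
  brown = (color = 'Brown');
  red   = (color = 'Red');
run;

proc print data=onehot noobs;
run;
```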

SAS Macro for One-Hot Encoding

Here is an example macro to do One-Hot encoding in SAS:

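%macro hot_encoding(data,var);
  %local i len;
  proc sql noprint;
    /* Store each distinct level in val1, val2, ... (sorted order) */
    select distinct &var
    into :val1-
    from &data;
    select count(distinct &var) into :len trimmed from &data;
  quit;
  data encoded_data;
    set &data;
    %do i=1 %to &len;
      /* One 0/1 column per level, named after the level with $, blanks, - and / removed */
      /* (the resulting names must be valid SAS variable names) */
      %sysfunc(compress(&&val&i,'$ - /'))=(&var="&&val&i");
    %end;
  run;
%mend;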

4. Count/Frequency Encoding

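As the name suggests, Frequency Encoding counts how many times each class value occurs and divides that count by the total number of observations.

This gives the model a single numeric feature reflecting how common each level is, which the model can weight directly or inversely depending on how frequency relates to the target.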
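A minimal PROC SQL sketch of the calculation, using made-up data (the dataset names hair and freq_encoded, the macro variable total, and the variable freq_encode are assumptions): the total row count is stored first, then GROUP BY counts each level and PROC SQL remerges that count onto every row of the level, writing a NOTE about remerging to the log.

```sas
/* Hypothetical sample data for the Hair color example */
data hair;
  input id color $;
  datalines;
1 Black
2 Brown
3 Red
4 Black
;
run;

proc sql noprint;
  /* Total number of rows */
  select count(*) into :total trimmed from hair;
  /* Per-level count divided by the total; PROC SQL remerges the count onto each row */
  create table freq_encoded as
  select id, color, count(*) / &total as freq_encode
  from hair
  group by color
  order by id;
quit;
```

Here Black appears in 2 of 4 rows and is encoded as 0.5, while Brown and Red are each encoded as 0.25.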

SAS Macro for Count/Frequency Encoding

Here is an example of a macro for Frequency Encoding in SAS:

%macro frequency_encoding(dataset, var);
  %local total;
  proc sql noprint;
    /* Total number of rows */
    select count(*) into :total trimmed from &dataset;
    /* Number of rows for each level */
    create table freq as
      select &var as values, count(*) as number
      from &dataset
      group by &var;
    /* Each level's share of the rows, rounded to 0.01 */
    create table new as
      select a.*, round(b.number/&total, 0.01) as freq_encode
      from &dataset as a
      left join freq as b
      on a.&var=b.values;
  quit;
%mend;

5. Effect/Sum/Deviation Encoding

The Deviation Encoding technique goes by several names: some analysts call it Effect Encoding and others Sum Encoding, but the technique and its application are the same.

Deviation Encoding is similar to One-Hot Encoding, except that one level is chosen as the reference and its column is dropped. Rows of the reference level, which would otherwise have 0 in every remaining column, are coded as -1 in all of them instead.

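For example, with the Hair color levels Black, Brown, and Red, One-Hot Encoding creates three columns and codes a Red row as (0, 0, 1). If Red is the reference level, Effect/Sum/Deviation Encoding keeps only the Black and Brown columns: Black rows are (1, 0), Brown rows are (0, 1), and Red rows become (-1, -1).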
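A minimal sketch of Deviation Encoding for the Hair color levels, with Red as the reference level, written as a plain DATA step. The dataset names hair and effect and the variables black and brown are illustrative assumptions.

```sas
/* Hypothetical sample data for the Hair color example */
data hair;
  input id color $;
  datalines;
1 Black
2 Brown
3 Red
;
run;

data effect;
  set hair;
  if color = 'Red' then do;
    /* Reference level: -1 in every remaining column */
    black = -1;
    brown = -1;
  end;
  else do;
    /* Other levels: ordinary one-hot indicators */
    black = (color = 'Black');
    brown = (color = 'Brown');
  end;
run;

proc print data=effect noobs;
run;
```

Across the three levels, each remaining column holds the values 1, 0, and -1, which sum to zero.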

SAS Macro for Effect/Sum/Deviation Encoding

Here is an example macro for Deviation Encoding in SAS:

%macro sum_encoding(data,var);
  %local i x len ref;
  proc sql noprint;
    /* Store each distinct level in val1, val2, ... (sorted order) */
    select distinct &var
    into :val1-
    from &data;
    /* trimmed, so that the last level can be referenced by number below */
    select count(distinct &var) into :len trimmed from &data;
  quit;
  /* Step 1: ordinary one-hot columns */
  data encoded_data;
    set &data;
    %do i=1 %to &len;
      %sysfunc(compress(&&val&i,'$ - /'))=(&var="&&val&i");
    %end;
  run;
  /* Column name of the reference (last) level */
  %let ref=%sysfunc(compress(&&val&len,'$ - /'));
  /* Step 2: reference-level rows get -1 in every other column */
  data sum_encode;
    set encoded_data;
    if &ref=1 then do;
      %do x=1 %to %eval(&len-1);
        %sysfunc(compress(&&val&x,'$ - /'))=-1;
      %end;
    end;
    drop &ref;
  run;
%mend;

Conclusion

Data scientists are often said to spend 70-80% of their time cleaning and preparing data, so encoding categorical data is a crucial part of their work.

Selecting the right encoding technique matters for data quality, which is why it pays to understand the different encoding methods.

For more information on SAS macro definitions, refer to the official SAS documentation.

While these techniques are powerful, managing and optimising a complex SAS environment requires deep expertise. As a SAS Gold Partner, Selerity’s team provides end-to-end platform support, administration, and advisory services to help you get the most value from your data.

Contact us today to see how we can simplify your analytics platform and empower your team.
