Categorical variables

If we have a categorical variable, we must perform one hot encoding, because even though it takes on numbers on a given range, they might as well be variable names, the don’t cary the subsequent properties and scale of numbers.

Variate: transportation to school with Categories: “1” := WALK “2” := BIKE “3” := BUS

Numbers have the methods >, <, = defined, but our category only has =.

  • 2>1 dosn’t hold for the categories “BIKE” > “WALK” makes no sense. At least this is the intuitive thing, it is true that sometimes models work with very litlle feature engineering.

Real world stochastics sometimes have that flavour:

  • It is like sweeping without looking, you will randomly catch some dust.
  • It is like a goldfish picking stocks to buy and the portfolio making money. These are just random instantiations of stochastic processes.