
jfacowns t1_j550f70 wrote

XGBoost Question around One-Hot Encoding & Get_Dummies in Python

I am working on building a model for NHL (hockey) games and have a spreadsheet with a ton of advanced stats from teams, dates they played and so on.

All of the data in this spreadsheet is stored as floats. I am trying to add a few columns of categorical data, as I feel they could help the model.

The categorical columns indicate whether the home team or the away team is playing on back-to-back days.

I am trying to determine whether one-hot encoding is the best approach here, or whether I'm misunderstanding how it works as a whole.

Here is some code:

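import pandas as pd

NHLData = pd.read_excel('C:\\Temp\\NHL_ModelBuilder.xlsx')

# Drop the team name and result columns
NHLData.drop(['HomeTeam', 'AwayTeam', 'Result'],
             axis=1, inplace=True)

# One-hot encode the two back-to-back indicator columns
NHLData = pd.get_dummies(NHLData, columns=['B2B_Home', 'B2B_Away'])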

Does this make sense? Am I on the right track here?

If I do NHLData.head() I can see the one-hot encoded columns, but when I check NHLData.dtypes I see this:

B2B_Home_0              uint8
B2B_Home_1              uint8
B2B_Away_0              uint8
B2B_Away_1              uint8

Should these not be objects?
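
For anyone who wants to reproduce this without the spreadsheet, here is a minimal sketch on a made-up frame. The values and the HomeGoalsPerGame column are invented for illustration and are not from the real data. It shows the same behaviour: the columns produced by get_dummies come back as a numeric or boolean dtype (uint8 on the pandas version above; newer versions may report bool), not object.

```python
import pandas as pd

# Toy stand-in for the spreadsheet; every value here is invented for illustration.
toy = pd.DataFrame({
    "B2B_Home": [0, 1, 0, 1],
    "B2B_Away": [1, 0, 0, 1],
    "HomeGoalsPerGame": [3.1, 2.8, 3.4, 2.5],  # hypothetical float stat column
})

# Same call as in the post, applied to the toy frame.
encoded = pd.get_dummies(toy, columns=["B2B_Home", "B2B_Away"])

# dtypes is an attribute, so it is accessed without parentheses.
print(encoded.dtypes)

# Collect the generated indicator columns and check whether any has object dtype.
dummy_cols = [c for c in encoded.columns if c.startswith("B2B_")]
is_object = [encoded[c].dtype == object for c in dummy_cols]
print(dummy_cols)
print(is_object)
```

If a specific dtype is wanted, get_dummies accepts a dtype argument (for example dtype=float), which would keep the new columns consistent with the rest of the float data.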
