Submitted by AutoModerator t3_10cn8pw in MachineLearning
jfacowns t1_j550f70 wrote
XGBoost Question around One-Hot Encoding & Get_Dummies in Python
I am working on building a model for NHL (hockey) games and have a spreadsheet with a ton of advanced stats from teams, dates they played and so on.
All of the data in this spreadsheet is stored as floats. I am trying to add a few columns of categorical data, as I feel they could help the model.
The categorical columns indicate whether the home team or the away team is playing on back-to-back days.
I am trying to determine whether one-hot encoding is the best approach here, or whether I'm misunderstanding how it works as a whole.
Here is some code
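import pandas as pd

# Load the spreadsheet of team stats
NHLData = pd.read_excel('C:\\Temp\\NHL_ModelBuilder.xlsx')
# Drop the team-name and result columns (same DataFrame that was loaded above)
NHLData.drop(['HomeTeam', 'AwayTeam', 'Result'], axis=1, inplace=True)
# One-hot encode the back-to-back flags
NHLData = pd.get_dummies(NHLData, columns=['B2B_Home', 'B2B_Away'])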
Does this make sense? Am I on the right track here?
If I do NHLData.head() I can see the one-hot encoded columns, but when I check NHLData.dtypes I see this:
B2B_Home_0 uint8
B2B_Home_1 uint8
B2B_Away_0 uint8
B2B_Away_1 uint8
Should these not be objects?
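For illustration, here is a minimal sketch on made-up data (the ShotsFor column and all values are hypothetical, not taken from the real spreadsheet). It shows that get_dummies produces numeric 0/1 columns rather than object columns; object is the dtype pandas uses for strings and mixed Python values. As far as I know, XGBoost expects numeric input, so numeric dummies should be what you want here. The dtype is passed explicitly because the default dummy dtype differs between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real spreadsheet
toy = pd.DataFrame({
    "ShotsFor": [31.0, 28.0, 35.0],
    "B2B_Home": [0, 1, 0],
    "B2B_Away": [1, 0, 0],
})

# One-hot encode the two flags; dtype is set explicitly because the
# default dummy dtype is not the same across pandas versions
encoded = pd.get_dummies(toy, columns=["B2B_Home", "B2B_Away"], dtype=np.uint8)

print(encoded.dtypes)

# Check that every column is numeric (none has the generic "object" dtype)
all_numeric = all(pd.api.types.is_numeric_dtype(t) for t in encoded.dtypes)
print(all_numeric)

# For a flag that is already 0/1, the "_1" dummy repeats the original column
same_as_original = bool((encoded["B2B_Home_1"] == toy["B2B_Home"]).all())
print(same_as_original)
```

If the flags are already stored as 0/1, the last check suggests the one-hot step adds no new information for them, so passing the original columns to the model unchanged may be enough.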