Pandas .melt(): A Guide For The Perplexed

pd.melt() can be confusing because of its mediocre documentation and unintuitively-named parameters. It’s nevertheless important for getting dataframes into ‘tidy’ format for plotting purposes. 

First, we’ll discuss the id_vars and value_vars parameters. Pass into id_vars a list of columns that you DO NOT WANT CHANGED. These columns will remain as they are. Pass into value_vars a list of columns you want to be combined into a single column. pd.melt() takes the columns passed into value_vars and automatically generates a new column containing the names of those old columns. By default this column is called ‘variable‘. An additional new column ‘value‘ is generated containing the values from the old columns. 

Imagine you have a dataframe with columns 'asset‘, ‘US Dollar‘, ‘Euro‘, ‘Japanese Yen‘. The values correspond to the price of a cryptoasset like Bitcoin or Ethereum in a quote currency, like USD.

If we run df.melt(id_vars='asset', value_vars=['US Dollar', 'Euro', 'Japanese Yen']), we’ll get back a dataframe with an ‘asset’ column, a ‘variable‘ column containing ‘US Dollar‘, ‘Euro‘, ‘Japanese Yen‘, and a column ‘value‘ containing the value of each cryptoasset in each of the denominations. 

pd.melt() gives you some control over how the columns are renamed. By default pd.melt() will assume that any column not passed into id_vars is to be transformed. So instead of the above code, we could run pd.melt(id_vars='airline', var_name='asset', value_name='asset_price'), which will produce the same dataframe but with more appropriately named columns. 

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s