Imputing Missing Values

Replacing missing or null values with substituted values.

Definition

Imputing missing values is a process where missing or null values in a dataset are replaced with substituted values. This can be particularly helpful in ensuring that datasets are complete and can be used effectively in analysis or machine learning models. The substituted values are often calculated based on other data in the dataset.

Example of imputing missing values using JavaScript

Consider the following JavaScript array, in which the age for Jane is missing:

const data = [
  { firstName: "John", lastName: "Doe", age: 30 },
  { firstName: "Jane", lastName: "Smith", age: null },
  { firstName: "Bob", lastName: "Johnson", age: 40 },
];

This JavaScript snippet imputes the null "age" field by replacing it with the mean of the known ages:

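// First pass: sum the known ages and count them.
let totalAge = 0;
let count = 0;
data.forEach((item) => {
  if (item.age !== null) {
    totalAge += item.age;
    count++;
  }
});

// Second pass: fill each missing age with the mean of the known ages.
// Skip imputation when no ages are known, to avoid dividing by zero (NaN).
if (count > 0) {
  const avgAge = totalAge / count;
  data.forEach((item) => {
    if (item.age === null) {
      item.age = avgAge;
    }
  });
}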

Before

firstName  lastName  age
John       Doe       30
Jane       Smith     null
Bob        Johnson   40

After

firstName  lastName  age
John       Doe       30
Jane       Smith     35
Bob        Johnson   40

Considerations

When imputing null values, consider the following:

  1. Imputation Method: Choose a suitable method to replace the null values. This could be a constant value, a measure of central tendency such as the mean or median, or a value predicted by a model.
  2. Data Type: Ensure that the replacement value matches the data type of the field.
  3. Data Distribution: Consider the distribution of your data. The mean is sensitive to outliers, so mean imputation may not be appropriate for skewed data; the median is often a more robust choice.
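
To illustrate how the choice of method interacts with the data distribution, here is a minimal sketch that compares mean and median imputation on a skewed field. The helper names (mean, median, imputeField) and the salary data are hypothetical, made up for this example, and are not part of any library. The sketch assumes the field is numeric and treats both null and undefined as missing.

```javascript
// Hypothetical helper: arithmetic mean of a non-empty array of numbers.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Hypothetical helper: median of a non-empty array of numbers.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b); // numeric sort on a copy
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Hypothetical helper: returns a copy of `rows` in which null or undefined
// values of `field` are replaced using `strategy` ("mean" or "median").
// The input array and its objects are left unmodified.
function imputeField(rows, field, strategy) {
  const known = rows
    .map((row) => row[field])
    .filter((v) => typeof v === "number" && !Number.isNaN(v));

  // Nothing to compute a fill value from: return copies unchanged.
  if (known.length === 0) {
    return rows.map((row) => ({ ...row }));
  }

  // The fill value is a number, so it matches the field's data type.
  const fill = strategy === "median" ? median(known) : mean(known);
  return rows.map((row) =>
    row[field] == null ? { ...row, [field]: fill } : { ...row }
  );
}

// Illustrative data with one outlier that skews the distribution.
const salaries = [
  { name: "A", salary: 40000 },
  { name: "B", salary: 45000 },
  { name: "C", salary: null },
  { name: "D", salary: 50000 },
  { name: "E", salary: 1000000 },
];

const byMean = imputeField(salaries, "salary", "mean");
const byMedian = imputeField(salaries, "salary", "median");

console.log(byMean[2].salary); // 283750
console.log(byMedian[2].salary); // 47500
```

In this sample the single outlier pulls the mean far above most of the known values, while the median stays close to them. Returning a new array rather than mutating the input is a design choice made here so that the original data remains available for comparison.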