Requiring Values

Ensures specific fields in a dataset are not empty or undefined.

Definition

Requiring values to exist is a data validation procedure that ensures specific fields in a dataset are not empty or undefined. The process identifies fields that have missing or undefined values and flags them with an error message.

It's a crucial step in data cleaning, particularly when specific fields are necessary for further analysis. Failing to handle null values can lead to errors and misinterpretations in data analysis or model training stages.

Example of requiring values using JavaScript

Here's a JavaScript function that adds an error message to each record in a data array where a given field (email, in this example) is null or undefined:

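// Returns a copy of `data` in which every record whose `field` is null or
// undefined carries an `error` message. The input records are not modified.
function validateNonNull(data, field) {
  return data.map((item) => {
    if (item[field] === null || item[field] === undefined) {
      return { ...item, error: `Error: ${field} field is null or undefined` };
    }
    return { ...item };
  });
}

const data = [
  { name: "John", email: "john@email.com" },
  { name: "Jane", email: null },
  { name: "Bob", email: undefined },
];

// Validate the email field and print the flagged records.
const validatedData = validateNonNull(data, "email");
console.log(validatedData);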

Before

Name | Email
John | john@email.com
Jane | null
Bob  | undefined

After

Name | Email          | Error
John | john@email.com |
Jane | null           | Error: email field is null or undefined
Bob  | undefined      | Error: email field is null or undefined

Considerations

  • False negatives: This method does not catch empty strings, which in many contexts also constitute missing data. You might need to check for empty strings explicitly.
  • Data types: Be sure to consider the data type of the field. For example, a numeric field can legitimately hold 0, which is not null or undefined but is falsy in JavaScript and several other languages, so a truthiness check such as !value would wrongly flag it as missing.
  • Trimming fields: Before requiring non-null values, you might want to trim string fields so that values consisting only of whitespace are recognized as empty rather than accepted as valid data.
  • Imputing missing values: If a field has null or undefined values, you might decide to impute (fill in) those values with an estimate or default, such as a mean, median, or fixed fallback, rather than flagging or removing the records.
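
The considerations above can be sketched in code. The following is a minimal illustration, not a definitive implementation: the helper names isMissing, requireValue, and imputeValue are hypothetical and do not appear in the original example, and the constant fallback used for imputation is a simplifying assumption (a real pipeline might use a mean, median, or model-based estimate instead).

```javascript
// Hypothetical helper: treats null, undefined, and empty or whitespace-only
// strings as missing. Numbers such as 0 and booleans such as false are kept.
function isMissing(value) {
  if (value === null || value === undefined) return true;
  if (typeof value === "string") return value.trim() === "";
  return false;
}

// Flags records whose `field` is missing. Returns copies; input is untouched.
function requireValue(records, field) {
  return records.map((record) =>
    isMissing(record[field])
      ? { ...record, error: `Error: ${field} field is required` }
      : { ...record }
  );
}

// Alternative to flagging: fill missing values with a caller-supplied fallback.
function imputeValue(records, field, fallback) {
  return records.map((record) =>
    isMissing(record[field]) ? { ...record, [field]: fallback } : { ...record }
  );
}

// Illustrative sample data (the `visits` field is an assumption for this sketch).
const sample = [
  { name: "John", email: "john@email.com", visits: 3 },
  { name: "Jane", email: "   ", visits: 0 },
  { name: "Bob", email: "", visits: null },
];

const checkedEmails = requireValue(sample, "email");
const checkedVisits = requireValue(sample, "visits");
const filledVisits = imputeValue(sample, "visits", 0);
console.log(checkedEmails, checkedVisits, filledVisits);
```

In this sketch, Jane's whitespace-only email and Bob's empty email are both flagged, Jane's visits value of 0 is accepted as present, and Bob's null visits value is either flagged or replaced by the fallback, depending on which function is used.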