Assessment of Missingness


NMAR Analysis

I believe the column HURRICANE.NAMES is likely NMAR (Not Missing At Random). This column is 95.31% missing, and the missingness is tied to the value itself: a hurricane name is recorded only when the outage was caused by a named hurricane. A missing name is therefore informative, since it indicates that the outage was not attributed to a named hurricane.

The missingness could be treated as MAR (Missing At Random) if we had additional observed data that explains it, such as:

  • Wind speed measurements during the outage
  • Barometric pressure readings
  • Official weather service classifications
  • Storm tracking data that could indicate unnamed storm systems
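As a rough illustration of that idea, the sketch below uses made-up toy data and a hypothetical STORM_CLASSIFICATION column (not part of the original dataset) standing in for an official weather service classification. If such a column were available, the missingness rate of HURRICANE.NAMES could be compared across its values; a rate that is fully determined by the observed column would be consistent with MAR.

```python
import pandas as pd

# Toy data, invented for illustration only.
# STORM_CLASSIFICATION is a hypothetical auxiliary column.
toy = pd.DataFrame({
    "HURRICANE.NAMES": ["Ike", None, None, "Sandy", None, None],
    "STORM_CLASSIFICATION": [
        "hurricane", "other", "other", "hurricane", "other", "other",
    ],
})

# Proportion of missing hurricane names within each classification.
missing_rate = (
    toy["HURRICANE.NAMES"]
    .isna()
    .groupby(toy["STORM_CLASSIFICATION"])
    .mean()
)
print(missing_rate)
```

In this toy case the missingness is explained entirely by the observed classification, which is the situation the additional data sources above would be meant to approximate.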

Missingness Dependency

I investigated whether the missingness of CAUSE.CATEGORY.DETAIL depends on other columns using permutation testing.
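The general shape of such a test can be sketched as follows. This is a minimal sketch, not the exact code behind the numbers in this section: the test statistic here is the total variation distance (TVD) between the distribution of the other column among rows with missing detail and among rows with present detail, which is one common choice and an assumption on my part. The function names, the number of permutations, and the toy data are all illustrative.

```python
import numpy as np
import pandas as pd


def tvd_of_missingness(df, target, by):
    """TVD between the distribution of `by` when `target` is missing
    and the distribution of `by` when `target` is present."""
    is_missing = df[target].isna()
    dist = pd.crosstab(df[by], is_missing, normalize="columns")
    # With only one group (all missing or none missing) there is
    # nothing to compare; this sketch returns 0 in that case.
    if dist.shape[1] < 2:
        return 0.0
    return float(dist.iloc[:, 0].sub(dist.iloc[:, 1]).abs().sum() / 2)


def missingness_permutation_test(df, target, by, n_permutations=1000, seed=0):
    """Shuffle `by` to simulate the null hypothesis that the missingness
    of `target` is unrelated to `by`; return (observed, p_value)."""
    rng = np.random.default_rng(seed)
    observed = tvd_of_missingness(df, target, by)
    shuffled = df[[target, by]].copy()
    simulated = np.empty(n_permutations)
    for i in range(n_permutations):
        shuffled[by] = rng.permutation(df[by].to_numpy())
        simulated[i] = tvd_of_missingness(shuffled, target, by)
    # Share of shuffles at least as extreme as the observed statistic.
    p_value = float(np.mean(simulated >= observed))
    return observed, p_value


# Toy data, invented for illustration: detail is missing for exactly
# one cause category, so the dependence is as strong as it can be.
demo = pd.DataFrame({
    "CAUSE.CATEGORY": ["severe weather"] * 50 + ["intentional attack"] * 50,
    "CAUSE.CATEGORY.DETAIL": ["thunderstorm"] * 50 + [None] * 50,
})
observed, p_value = missingness_permutation_test(
    demo, "CAUSE.CATEGORY.DETAIL", "CAUSE.CATEGORY", n_permutations=200
)
print(observed, p_value)
```

Shuffling the other column rather than the missingness indicator is an arbitrary choice; either breaks any association between the two while keeping both marginal distributions fixed.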

Testing CAUSE.CATEGORY dependency:

  • Observed difference in missingness rates: 0.885
  • P-value: < 0.001 (reported as 0.000; below the 0.05 significance level)
  • Conclusion: We reject the null hypothesis that the missingness of CAUSE.CATEGORY.DETAIL is independent of CAUSE.CATEGORY. This makes intuitive sense, as some cause categories naturally have more detailed subcategorizations than others.

Testing OUTAGE_DAYOFWEEK dependency:

  • Observed difference in missingness rates: 0.122
  • P-value: 0.088 (> 0.05)
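  • Conclusion: At the 0.05 significance level, we fail to reject the null hypothesis that the missingness of CAUSE.CATEGORY.DETAIL is independent of the day of the week. The data show no significant evidence of reporting bias based on when outages occur, although this does not prove that none exists.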