Assessment of Missingness
NMAR Analysis
I believe the column HURRICANE.NAMES is likely NMAR (Not Missing At Random). The column is 95.31% missing, and whether a value is missing depends on the value itself: a hurricane name is recorded only when the outage was actually caused by a named hurricane. A missing name is therefore informative, since it indicates the outage was not hurricane-related.
For the missingness to be explainable by observed data, which would make the column MAR (Missing At Random), we would need additional data such as:
- Wind speed measurements during the outage
- Barometric pressure readings
- Official weather service classifications
- Storm tracking data that could indicate unnamed storm systems
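As a quick check on a figure like the 95.31% quoted above, the share of missing entries in a column can be computed directly. The following is a minimal sketch using only the Python standard library; the helper `missing_rate` and the toy column are hypothetical and are not taken from the outage dataset.

```python
def missing_rate(values):
    """Return the fraction of entries that are missing (None or NaN)."""
    def is_missing(v):
        # NaN is the only float that is not equal to itself
        return v is None or (isinstance(v, float) and v != v)

    values = list(values)
    if not values:
        raise ValueError("cannot compute a missing rate for an empty column")
    return sum(is_missing(v) for v in values) / len(values)


# Hypothetical toy column, NOT the real HURRICANE.NAMES data
hurricane_names = [None, None, "Storm A", None, float("nan"), None, None, "Storm B"]
print(f"{missing_rate(hurricane_names):.2%} missing")
```

On a pandas DataFrame the equivalent would typically be `df["HURRICANE.NAMES"].isna().mean()`, assuming the data is loaded into a frame named `df`.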
Missingness Dependency
I investigated whether the missingness of CAUSE.CATEGORY.DETAIL depends on other columns using permutation testing.
Testing CAUSE.CATEGORY dependency:
- Observed difference in missingness rates: 0.885
- P-value: 0.000 (< 0.05)
- Conclusion: We reject the null hypothesis at the 0.05 level: the missingness of CAUSE.CATEGORY.DETAIL depends on the cause category. This makes intuitive sense, as some cause categories naturally have more detailed subcategories than others.
Testing OUTAGE_DAYOFWEEK dependency:
- Observed difference in missingness rates: 0.122
- P-value: 0.088 (> 0.05)
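- Conclusion: We fail to reject the null hypothesis at the 0.05 level. The data do not provide evidence that the missingness of CAUSE.CATEGORY.DETAIL depends on the day of the week, so this test shows no sign of a systematic reporting bias based on when outages occur. This is not proof that the two are independent.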
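The permutation tests above can be sketched as follows. This is an illustrative version using only the Python standard library, not the exact code behind the reported numbers. It assumes total variation distance (TVD) as the test statistic, which may differ from the statistic behind the 0.885 and 0.122 values, and the function names, the default of 1000 permutations, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def tvd_by_missingness(categories, is_missing):
    """TVD between the category distribution of rows where the target
    column is missing and rows where it is present."""
    miss = Counter(c for c, m in zip(categories, is_missing) if m)
    pres = Counter(c for c, m in zip(categories, is_missing) if not m)
    n_miss, n_pres = sum(miss.values()), sum(pres.values())
    if n_miss == 0 or n_pres == 0:
        raise ValueError("need at least one missing and one present row")
    keys = set(miss) | set(pres)
    return 0.5 * sum(abs(miss[k] / n_miss - pres[k] / n_pres) for k in keys)


def permutation_test(categories, is_missing, n_permutations=1000, seed=0):
    """Shuffle the missingness flags to simulate the null hypothesis that
    missingness is unrelated to the category column."""
    rng = random.Random(seed)
    observed = tvd_by_missingness(categories, is_missing)
    shuffled = list(is_missing)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any link between flags and categories
        if tvd_by_missingness(categories, shuffled) >= observed:
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_permutations


# Hypothetical toy data: detail is missing for every "attack" row only
categories = ["weather"] * 20 + ["attack"] * 20
is_missing = [False] * 20 + [True] * 20
observed, p_value = permutation_test(categories, is_missing)
print(f"observed TVD = {observed:.3f}, p-value = {p_value:.3f}")
```

Shuffling the missingness flags rather than the categories keeps the number of missing rows fixed in every permutation, so each simulated statistic is comparable to the observed one.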