Weather By Province (NOAA -> China TPV/ATP)

This project computes province-year climate indicators for China (including Hong Kong, Macao, and Taiwan) using NOAA datasets:

  • TPV: standard deviation of the province's daily mean temperature within a calendar year.
  • ATP: annual mean of the province's daily mean temperature.

Data Sources

  • GHCN-D directory: https://www.ncei.noaa.gov/pub/data/ghcn/daily/
  • GHCN-D stations: https://www.ncei.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt
  • GHCN-D inventory: https://www.ncei.noaa.gov/pub/data/ghcn/daily/ghcnd-inventory.txt
  • GHCN-D by year: https://www.ncei.noaa.gov/pub/data/ghcn/daily/by_year/{YYYY}.csv.gz
  • GHCN-M v4: https://www.ncei.noaa.gov/pub/data/ghcn/v4/
  • GHCN-M TAVG QCU tar: https://www.ncei.noaa.gov/pub/data/ghcn/v4/ghcnm.tavg.latest.qcu.tar.gz
  • geoBoundaries CHN ADM1 API: https://www.geoboundaries.org/api/current/gbOpen/CHN/ADM1/
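
As an illustration of how the by-year files might be addressed and read, here is a minimal sketch. The helper names by_year_urls and parse_by_year_rows are hypothetical and not part of this project. The sketch assumes the headerless by_year CSV layout whose first four fields are station ID, date (YYYYMMDD), element, and value.

```python
import csv

# URL pattern taken from the "GHCN-D by year" entry above.
GHCND_BY_YEAR = "https://www.ncei.noaa.gov/pub/data/ghcn/daily/by_year/{year}.csv.gz"


def by_year_urls(start_year, end_year):
    """Return {year: url} for every year in the inclusive range."""
    if start_year > end_year:
        raise ValueError("start_year must not exceed end_year")
    return {y: GHCND_BY_YEAR.format(year=y) for y in range(start_year, end_year + 1)}


def parse_by_year_rows(lines, elements=("TAVG", "TMAX", "TMIN")):
    """Yield (station_id, date, element, raw_value) for temperature rows.

    Assumes the first four comma-separated fields are ID, YYYYMMDD date,
    element code, and an integer value; any later fields are ignored here.
    """
    for row in csv.reader(lines):
        if len(row) < 4 or row[2] not in elements:
            continue
        yield row[0], row[1], row[2], int(row[3])
```

The lines argument can be any iterable of text lines, for example a file opened with gzip.open(path, "rt").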

Coordinate-to-Province Mapping (when only station coordinates are available)

  1. Parse ghcnd-stations.txt to get each station's ID, latitude, and longitude.
  2. Build a point geometry for each station in EPSG:4326.
  3. Load the CHN ADM1 province polygons and run a point-in-polygon (within) test.
  4. For boundary points that no polygon contains, fall back to the nearest polygon, accepting the match only within a distance threshold (border_buffer_km in the config).
  5. Mark any point that is still unresolved as unassigned and exclude it from province aggregates.
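
The steps above can be sketched in plain Python. This is an illustrative approximation, not the project's implementation: every function name is hypothetical, each province is reduced to a single ring of (lon, lat) vertices, and the nearest-polygon fallback is approximated by the great-circle distance to the nearest polygon vertex. The fixed-width offsets assume the documented ghcnd-stations.txt layout (ID in columns 1-11, latitude in 13-20, longitude in 22-30).

```python
import math


def parse_station_line(line):
    """Parse one fixed-width ghcnd-stations.txt record into (id, lat, lon)."""
    return line[0:11].strip(), float(line[12:20]), float(line[21:30])


def point_in_ring(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the ring of (lon, lat) vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def assign_province(lon, lat, provinces, border_buffer_km):
    """Return a province name, or "unassigned" if nothing is close enough."""
    for name, ring in provinces.items():
        if point_in_ring(lon, lat, ring):
            return name
    # Fallback: nearest vertex as a stand-in for nearest-polygon distance.
    best_name, best_km = None, float("inf")
    for name, ring in provinces.items():
        for vx, vy in ring:
            d = haversine_km(lon, lat, vx, vy)
            if d < best_km:
                best_name, best_km = name, d
    return best_name if best_km <= border_buffer_km else "unassigned"
```

A real pipeline would more likely rely on a geometry library for multipolygons, holes, and true point-to-edge distances; the sketch only shows the order of the checks.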

Mapping output file:

  • output/station_province_mapping.csv

Temperature Rules

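  • GHCN-D temperatures are stored in tenths of a degree Celsius; divide by 10 to get °C.
  • Use TAVG when it is available.
  • If TAVG is missing and both TMAX and TMIN are available, use (TMAX + TMIN) / 2.
  • Remove outliers that fall outside the valid range set in the config.
  • The missing-value code -9999 is treated as null.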
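
A minimal sketch of these rules for a single station-day follows. The function name daily_tmean_c and the default range of -60 to 60 °C are assumptions made for illustration; the actual range lives in the config. The sketch also assumes that the range check applies to the final value and that an out-of-range TAVG is dropped rather than replaced by the TMAX/TMIN average, which the rules above do not specify.

```python
MISSING = -9999  # GHCN-D missing-value code


def daily_tmean_c(tavg=None, tmax=None, tmin=None, valid_range_c=(-60.0, 60.0)):
    """Return the daily mean temperature in degrees C, or None if unusable.

    Inputs are raw GHCN-D values in tenths of a degree C.
    valid_range_c is a placeholder; the real bounds come from the config.
    """

    def to_c(raw):
        # Treat absent values and the missing code as null.
        if raw is None or raw == MISSING:
            return None
        return raw / 10.0

    tavg_c, tmax_c, tmin_c = to_c(tavg), to_c(tmax), to_c(tmin)
    if tavg_c is not None:
        value = tavg_c  # TAVG takes priority
    elif tmax_c is not None and tmin_c is not None:
        value = (tmax_c + tmin_c) / 2  # fallback: midpoint of max and min
    else:
        return None
    low, high = valid_range_c
    return value if low <= value <= high else None
```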

Metrics

For each province-year:

  • tpv_std_daily_tmean_c: standard deviation of the province daily mean temperature over the year.
  • atp_annual_mean_tmean_c: mean of the province daily mean temperature over the year.
  • Quality fields: valid_days, valid_station_days, coverage_ratio, quality_flag.
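
The following sketch shows one way these fields could be derived for a single province and year. It rests on several assumptions not stated above: the province daily mean is the unweighted mean over stations reporting that day, the standard deviation is the population form (statistics.pstdev), and coverage_ratio is valid_days divided by the number of days in the year. The function name province_year_metrics is hypothetical, and quality_flag is omitted because its thresholds are defined in the config.

```python
import calendar
from collections import defaultdict
from statistics import mean, pstdev


def province_year_metrics(records, year):
    """Compute metrics from (date 'YYYYMMDD', station_id, tmean_c) records.

    Records are assumed to belong to one province; null temperatures and
    dates outside the requested year are skipped.
    """
    by_day = defaultdict(list)
    for date, _station, tmean_c in records:
        if tmean_c is not None and date.startswith(str(year)):
            by_day[date].append(tmean_c)

    # Province daily mean: unweighted mean over stations (an assumption).
    daily = [mean(values) for _date, values in sorted(by_day.items())]
    days_in_year = 366 if calendar.isleap(year) else 365
    valid_days = len(daily)
    return {
        "tpv_std_daily_tmean_c": pstdev(daily) if valid_days > 1 else None,
        "atp_annual_mean_tmean_c": mean(daily) if daily else None,
        "valid_days": valid_days,
        "valid_station_days": sum(len(v) for v in by_day.values()),
        "coverage_ratio": valid_days / days_in_year,
    }
```

If the project uses the sample standard deviation instead, statistics.stdev would be the drop-in replacement.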

Run

Install dependencies:

pip install -r requirements.txt

Run full pipeline:

python -m src.pipeline.run_pipeline --config config/params.yaml --start-year 2000 --end-year 2025

Outputs

  • output/province_year_tpv_atp.csv
  • output/quality_report.csv
  • output/atp_validation_vs_ghcnm.csv
  • output/station_province_mapping.csv

Notes

  • GHCN-M is used only to check ATP for consistency; it does not replace the core TPV/ATP calculation from daily data.
  • Thresholds for strict quality control are set in config/params.yaml.
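
As a rough illustration of what an ATP consistency check against GHCN-M could look like, the sketch below compares a daily-derived ATP with the mean of twelve monthly values. Both function names are hypothetical, and the comparison method is an assumption rather than the pipeline's documented behavior. It assumes GHCN-M v4 values are in hundredths of a degree Celsius with -9999 as the missing code, and it does not weight months by their length.

```python
GHCNM_MISSING = -9999  # GHCN-M missing-value code


def ghcnm_annual_mean_c(monthly_raw, min_months=12):
    """Mean of monthly TAVG values (hundredths of a degree C) in degrees C.

    Returns None when fewer than min_months values are present.
    """
    values = [v / 100.0 for v in monthly_raw if v != GHCNM_MISSING]
    if len(values) < min_months:
        return None
    return sum(values) / len(values)


def atp_difference_c(atp_daily_c, monthly_raw):
    """Daily-derived ATP minus the GHCN-M annual mean, or None if either is null."""
    reference = ghcnm_annual_mean_c(monthly_raw)
    if atp_daily_c is None or reference is None:
        return None
    return atp_daily_c - reference
```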
