Composite Datastore Tutorial#

Noisepy is a python software package to process ambient seismic noise cross correlations. This tutorial aims to introduce the use of noisepy for a toy problem on a composite data store. It can be ran locally or on the cloud.

The data is stored on AWS S3:

First, we install the noisepy-seis package

# Uncomment and run this line if the environment doesn't have noisepy already installed:
# ! pip install noisepy-seis 

Warning: NoisePy uses obspy as a core Python module to manipulate seismic data. If you use Google Colab, restart the runtime now for proper installation of obspy on Colab.

Import necessary modules#

Then we import the basic modules

%load_ext autoreload
%autoreload 2
from noisepy.seis import cross_correlate, stack_cross_correlations, __version__    # noisepy core functions
from noisepy.seis.io.asdfstore import ASDFCCStore, ASDFStackStore                  # Object to store ASDF data within noisepy
from noisepy.seis.io.compositerawstore import CompositeRawStore
from noisepy.seis.io.s3store import SCEDCS3DataStore, NCEDCS3DataStore
from noisepy.seis.io.channel_filter_store import channel_filter
from noisepy.seis.io.datatypes import Channel, CCMethod, ConfigParameters, FreqNorm, RmResp, StackMethod, TimeNorm        # Main configuration object
from noisepy.seis.io.channelcatalog import XMLStationChannelCatalog                # Required stationXML handling object
import os
import obspy
import shutil
from datetime import datetime, timezone
from datetimerange import DateTimeRange

from noisepy.seis.io.plotting_modules import plot_all_moveout

print(f"Using NoisePy version {__version__}")

path = "./composite_data" 

os.makedirs(path, exist_ok=True)
cc_data_path = os.path.join(path, "CCF")
stack_data_path = os.path.join(path, "STACK")
S3_STORAGE_OPTIONS = {"s3": {"anon": True}}
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/noisepy/seis/io/utils.py:13: TqdmExperimentalWarning: Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)
  from tqdm.autonotebook import tqdm
Using NoisePy version 0.1.dev1

We will work with a single day worth of data on SCEDC and NCEDC. The continuous data is organized with a single day and channel per miniseed. For this example, you can choose any year since 2002. We will just cross correlate a single day.

# SCEDC S3 bucket common URL characters for that day.
SCEDC_DATA = "s3://scedc-pds/continuous_waveforms/"
NCEDC_DATA = "s3://ncedc-pds/continuous_waveforms/NC/"

SCEDC_STATION_XML = "s3://scedc-pds/FDSNstationXML/CI/"  
NCEDC_STATION_XML = "s3://ncedc-pds/FDSNstationXML/NC/"

# timeframe for analysis
start = datetime(2012, 1, 1, tzinfo=timezone.utc)
end = datetime(2012, 1, 3, tzinfo=timezone.utc)
timerange = DateTimeRange(start, end)
print(timerange)
2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000

Ambient Noise Project Configuration#

We prepare the configuration of the workflow by declaring and storing parameters into the ConfigParameters() object and/or editing the config.yml file.

# Initialize ambient noise workflow configuration
config = ConfigParameters() # default config parameters which can be customized

Customize the job parameters below:

config.start_date = start
config.end_date = end
config.acorr_only = False # only perform auto-correlation or not
config.xcorr_only = True # only perform cross-correlation or not

config.inc_hours = 24 # INC_HOURS is used in hours (integer) as the 
        #chunk of time that the paralelliztion will work.
        # data will be loaded in memory, so reduce memory with smaller 
        # inc_hours if there are over 400+ stations.
        # At regional scale for SCEDC, we can afford 20Hz data and inc_hour 
        # being a day of data.

# pre-processing parameters
config.sampling_rate = 20  # (int) Sampling rate in Hz of desired processing (it can be different than the data sampling rate)
config.single_freq = False
config.cc_len = 3600  # (float) basic unit of data length for fft (sec)
config.step = 1800.0  # (float) overlapping between each cc_len (sec)

config.ncomp = 3  # 1 or 3 component data (needed to decide whether do rotation)

config.stationxml = False  # station.XML file used to remove instrument response for SAC/miniseed data
      # If True, the stationXML file is assumed to be provided.
config.rm_resp = RmResp.INV  # select 'no' to not remove response and use 'inv' if you use the stationXML,'spectrum',

############## NOISE PRE-PROCESSING ##################
config.freqmin, config.freqmax = 0.05, 2.0  # broad band filtering of the data before cross correlation
config.max_over_std  = 10  # threshold to remove window of bad signals: set it to 10*9 if prefer not to remove them

################### SPECTRAL NORMALIZATION ############
config.freq_norm = FreqNorm.RMA  # choose between "rma" for a soft whitening or "no" for no whitening. Pure whitening is not implemented correctly at this point.
config.smoothspect_N = 10  # moving window length to smooth spectrum amplitude (points)
    # here, choose smoothspect_N for the case of a strict whitening (e.g., phase_only)

#################### TEMPORAL NORMALIZATION ##########
config.time_norm = TimeNorm.ONE_BIT # 'no' for no normalization, or 'rma', 'one_bit' for normalization in time domain,
config.smooth_N = 10  # moving window length for time domain normalization if selected (points)

############ cross correlation ##############
config.cc_method = CCMethod.XCORR # 'xcorr' for pure cross correlation OR 'deconv' for deconvolution;
    # FOR "COHERENCY" PLEASE set freq_norm to "rma", time_norm to "no" and cc_method to "xcorr"

# OUTPUTS:
config.substack = True  # True = smaller stacks within the time chunk. False: it will stack over inc_hours
config.substack_windows = 1  # how long to stack over (for monitoring purpose)
    # if substack=True, substack_windows=2, then you pre-stack every 2 correlation windows.
    # for instance: substack=True, substack_windows=1 means that you keep ALL of the correlations
    # if substack=False, the cross correlation will be stacked over the inc_hour window

### For monitoring applications ####
## we recommend substacking ever 2-4 cross correlations and storing the substacks
# e.g. 
# config.substack = True 
# config.substack_windows = 4

config.maxlag = 200  # lags of cross-correlation to save (sec)

# the network list that are used here
config.networks = ['CI', 'NC']
# For this tutorial make sure the previous run is empty
#os.system(f"rm -rf {cc_data_path}")
if os.path.exists(cc_data_path):
    shutil.rmtree(cc_data_path)

Step 1: Cross-correlation#

In this instance, we read directly the data from the scedc bucket into the cross correlation code. We are not attempting to recreate a data store. Therefore we go straight to step 1 of the cross correlations.

We first declare the data and cross correlation stores

scedc_catalog = XMLStationChannelCatalog(SCEDC_STATION_XML, 
                                         storage_options=S3_STORAGE_OPTIONS)
ncedc_catalog = XMLStationChannelCatalog(NCEDC_STATION_XML, "{network}.{name}.xml", 
                                         storage_options=S3_STORAGE_OPTIONS)

scedc_store = SCEDCS3DataStore(SCEDC_DATA, scedc_catalog,  
                               channel_filter(["CI"], ["VES", "SVD", "BBR"], ["BH?", "HH?"]), 
                               timerange, storage_options=S3_STORAGE_OPTIONS)

ncedc_store = NCEDCS3DataStore(NCEDC_DATA, ncedc_catalog, 
                               channel_filter(["NC"], ["BBGB", "AFD", "GDXB"], ["BH?", "HH?"]), 
                               timerange, storage_options=S3_STORAGE_OPTIONS)

raw_store = CompositeRawStore({"CI": scedc_store, 
                               "NC": ncedc_store}) # Composite store for reading data from both SCEDC and NCEDC
cc_store = ASDFCCStore(cc_data_path)               # Store for writing CC data

get the time range of the data in the data store inventory

span = raw_store.get_timespans()
print(span)
[2012-01-01T00:00:00+0000 - 2012-01-02T00:00:00+0000, 2012-01-02T00:00:00+0000 - 2012-01-03T00:00:00+0000]

Get the channels available during a given time spane

channel_list=raw_store.get_channels(span[1])
print(channel_list)
2025-12-02 21:21:43,901 140408732842880 INFO utils.log_raw(): TIMING:  0.559 secs for Listing 1731 files from s3://scedc-pds/continuous_waveforms/2012/2012_002/
2025-12-02 21:21:43,937 140408732842880 INFO utils.log_raw(): TIMING:  0.036 secs for Init: 1 timespans and 18 channels
2025-12-02 21:21:44,067 140407027525312 INFO channelcatalog._get_inventory_from_file(): Reading StationXML file s3://scedc-pds/FDSNstationXML/CI/CI_BBR.xml
2025-12-02 21:21:44,206 140406775871168 INFO channelcatalog._get_inventory_from_file(): Reading StationXML file s3://scedc-pds/FDSNstationXML/CI/CI_VES.xml
2025-12-02 21:21:44,210 140407010744000 INFO channelcatalog._get_inventory_from_file(): Reading StationXML file s3://scedc-pds/FDSNstationXML/CI/CI_SVD.xml
2025-12-02 21:21:47,536 140408732842880 INFO s3store.get_channels(): Getting 18 channels for 2012-01-02T00:00:00+0000 - 2012-01-03T00:00:00+0000
2025-12-02 21:21:47,755 140408732842880 INFO utils.log_raw(): TIMING:  0.173 secs for Listing 804 files from s3://ncedc-pds/continuous_waveforms/NC/2012/2012.002/
2025-12-02 21:21:47,772 140408732842880 INFO utils.log_raw(): TIMING:  0.017 secs for Init: 1 timespans and 9 channels
2025-12-02 21:21:47,822 140406507435712 INFO channelcatalog._get_inventory_from_file(): Reading StationXML file s3://ncedc-pds/FDSNstationXML/NC/NC.AFD.xml
2025-12-02 21:21:47,855 140406731826880 INFO channelcatalog._get_inventory_from_file(): Reading StationXML file s3://ncedc-pds/FDSNstationXML/NC/NC.GDXB.xml
2025-12-02 21:21:47,869 140407026476736 INFO channelcatalog._get_inventory_from_file(): Reading StationXML file s3://ncedc-pds/FDSNstationXML/NC/NC.BBGB.xml
2025-12-02 21:21:48,210 140408732842880 INFO s3store.get_channels(): Getting 9 channels for 2012-01-02T00:00:00+0000 - 2012-01-03T00:00:00+0000
[CI.BBR.BHE, CI.BBR.BHN, CI.BBR.BHZ, CI.BBR.HHE, CI.BBR.HHN, CI.BBR.HHZ, CI.SVD.BHE, CI.SVD.BHN, CI.SVD.BHZ, CI.SVD.HHE, CI.SVD.HHN, CI.SVD.HHZ, CI.VES.BHE, CI.VES.BHN, CI.VES.BHZ, CI.VES.HHE, CI.VES.HHN, CI.VES.HHZ, NC.AFD.HHE, NC.AFD.HHN, NC.AFD.HHZ, NC.BBGB.HHE, NC.BBGB.HHN, NC.BBGB.HHZ, NC.GDXB.HHE, NC.GDXB.HHN, NC.GDXB.HHZ]

Perform the cross correlation#

The data will be pulled from SCEDC & NCEDC, cross correlated, and stored locally if this notebook is ran locally. If you are re-calculating, we recommend to clear the old cc_store.

# we also define a channel pair filter that limits cross-correlation to stations pairs closer than 600 km
def pair_filter(src: Channel, rec: Channel) -> bool:
    latS = src.station.lat
    lonS = src.station.lon
    latR = rec.station.lat
    lonR = rec.station.lon
    dist, _, _ = obspy.geodetics.base.gps2dist_azimuth(latS, lonS, latR, lonR)
    dist /= 1e3 # to km
    if dist <= 600:
        return True
    else:
        return False
cross_correlate(raw_store, config, cc_store, pair_filter=pair_filter)
2025-12-02 21:21:48,268 140408732842880 INFO correlate.cross_correlate(): Starting Cross-Correlation with 4 cores
2025-12-02 21:21:48,829 140408732842880 INFO utils.log_raw(): TIMING:  0.560 secs for Listing 1731 files from s3://scedc-pds/continuous_waveforms/2012/2012_001/
2025-12-02 21:21:48,864 140408732842880 INFO utils.log_raw(): TIMING:  0.035 secs for Init: 2 timespans and 36 channels
2025-12-02 21:21:48,866 140408732842880 INFO s3store.get_channels(): Getting 18 channels for 2012-01-01T00:00:00+0000 - 2012-01-02T00:00:00+0000
2025-12-02 21:21:49,015 140408732842880 INFO utils.log_raw(): TIMING:  0.138 secs for Listing 804 files from s3://ncedc-pds/continuous_waveforms/NC/2012/2012.001/
2025-12-02 21:21:49,032 140408732842880 INFO utils.log_raw(): TIMING:  0.018 secs for Init: 2 timespans and 18 channels
2025-12-02 21:21:49,034 140408732842880 INFO s3store.get_channels(): Getting 9 channels for 2012-01-01T00:00:00+0000 - 2012-01-02T00:00:00+0000
2025-12-02 21:21:49,036 140408732842880 INFO utils.log_raw(): TIMING CC Main:  0.767 secs for get 27 channels
2025-12-02 21:21:49,046 140408732842880 INFO correlate.cc_timespan(): Checking for stations already done: 17 pairs
2025-12-02 21:21:49,054 140408732842880 INFO utils.log_raw(): TIMING CC Main:  0.017 secs for check for 6 stations already done (warm up cache)
2025-12-02 21:21:49,056 140408732842880 INFO utils.log_raw(): TIMING CC Main:  0.002 secs for check for stations already done
2025-12-02 21:21:49,057 140408732842880 INFO correlate.cc_timespan(): Still need to process: 6/6 stations, 27/27 channels, 17/17 pairs for 2012-01-01T00:00:00+0000 - 2012-01-02T00:00:00+0000
2025-12-02 21:21:51,315 140408732842880 INFO correlate._filter_channel_data(): Filtered to 18/27 channels with sampling rate >= 20.0
2025-12-02 21:21:51,318 140408732842880 INFO utils.log_raw(): TIMING CC Main:  2.262 secs for Read channel data: 18 channels
2025-12-02 21:21:53,102 140406507435712 INFO noise_module.preprocess_raw(): removing response for CI.SVD..BHE | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:21:53,186 140406490654400 INFO noise_module.preprocess_raw(): removing response for CI.SVD..BHZ | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:21:53,315 140406289327808 INFO noise_module.preprocess_raw(): removing response for CI.SVD..BHN | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:21:54,837 140406731826880 INFO noise_module.preprocess_raw(): removing response for NC.BBGB..HHE | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:21:54,875 140406473873088 INFO noise_module.preprocess_raw(): removing response for NC.BBGB..HHN | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:21:54,902 140406775871168 INFO noise_module.preprocess_raw(): removing response for NC.BBGB..HHZ | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:21:54,958 140406306109120 INFO noise_module.preprocess_raw(): removing response for NC.AFD..HHN | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:21:55,054 140406272546496 INFO noise_module.preprocess_raw(): removing response for NC.AFD..HHE | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:21:59,332 140406490654400 INFO noise_module.preprocess_raw(): removing response for CI.BBR..BHE | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:21:59,345 140406507435712 INFO noise_module.preprocess_raw(): removing response for CI.BBR..BHN | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:00,722 140406289327808 INFO noise_module.preprocess_raw(): removing response for NC.AFD..HHZ | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:05,959 140406490654400 INFO noise_module.preprocess_raw(): removing response for CI.BBR..BHZ | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:05,988 140406507435712 INFO noise_module.preprocess_raw(): removing response for CI.VES..BHE | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:06,326 140406306109120 INFO noise_module.preprocess_raw(): removing response for CI.VES..BHN | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:06,403 140406473873088 INFO noise_module.preprocess_raw(): removing response for CI.VES..BHZ | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:07,749 140406272546496 INFO noise_module.preprocess_raw(): removing response for NC.GDXB..HHZ | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
 WARNING (norm_resp): computed and reported sensitivities differ by more than 5 percent. 
	 Execution continuing.
2025-12-02 21:22:08,008 140406731826880 INFO noise_module.preprocess_raw(): removing response for NC.GDXB..HHN | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:08,118 140406775871168 INFO noise_module.preprocess_raw(): removing response for NC.GDXB..HHE | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
 WARNING (norm_resp): computed and reported sensitivities differ by more than 5 percent. 
	 Execution continuing.
 WARNING (norm_resp): computed and reported sensitivities differ by more than 5 percent. 
	 Execution continuing.
2025-12-02 21:22:13,915 140408732842880 INFO utils.log_raw(): TIMING CC Main: 22.597 secs for Preprocess: 18 channels
2025-12-02 21:22:13,916 140408732842880 INFO correlate.check_memory(): Require  0.22gb memory for cross correlations
2025-12-02 21:22:15,564 140408732842880 INFO utils.log_raw(): TIMING CC Main:  1.647 secs for Compute FFTs: 18 channels
2025-12-02 21:22:15,568 140408732842880 INFO correlate.cc_timespan(): Starting CC with 17 station pairs
2025-12-02 21:22:21,000 140408732842880 INFO utils.log_raw(): TIMING CC Main:  5.431 secs for Correlate and write to store
2025-12-02 21:22:21,180 140408732842880 INFO utils.log_raw(): TIMING CC Main: 32.911 secs for Process the chunk of 2012-01-01T00:00:00+0000 - 2012-01-02T00:00:00+0000
2025-12-02 21:22:21,191 140408732842880 INFO s3store.get_channels(): Getting 18 channels for 2012-01-02T00:00:00+0000 - 2012-01-03T00:00:00+0000
2025-12-02 21:22:21,200 140408732842880 INFO s3store.get_channels(): Getting 9 channels for 2012-01-02T00:00:00+0000 - 2012-01-03T00:00:00+0000
2025-12-02 21:22:21,203 140408732842880 INFO utils.log_raw(): TIMING CC Main:  0.012 secs for get 27 channels
2025-12-02 21:22:21,212 140408732842880 INFO correlate.cc_timespan(): Checking for stations already done: 17 pairs
2025-12-02 21:22:21,214 140408732842880 INFO utils.log_raw(): TIMING CC Main:  0.011 secs for check for 6 stations already done (warm up cache)
2025-12-02 21:22:21,216 140408732842880 INFO utils.log_raw(): TIMING CC Main:  0.002 secs for check for stations already done
2025-12-02 21:22:21,217 140408732842880 INFO correlate.cc_timespan(): Still need to process: 6/6 stations, 27/27 channels, 17/17 pairs for 2012-01-02T00:00:00+0000 - 2012-01-03T00:00:00+0000
2025-12-02 21:22:23,539 140408732842880 INFO correlate._filter_channel_data(): Filtered to 18/27 channels with sampling rate >= 20.0
2025-12-02 21:22:23,542 140408732842880 INFO utils.log_raw(): TIMING CC Main:  2.325 secs for Read channel data: 18 channels
2025-12-02 21:22:25,578 140406507435712 INFO noise_module.preprocess_raw(): removing response for CI.SVD..BHN | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:25,730 140406490654400 INFO noise_module.preprocess_raw(): removing response for CI.SVD..BHZ | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:25,736 140406289327808 INFO noise_module.preprocess_raw(): removing response for CI.SVD..BHE | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:27,034 140406272546496 INFO noise_module.preprocess_raw(): removing response for NC.BBGB..HHZ | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:27,198 140406731826880 INFO noise_module.preprocess_raw(): removing response for NC.BBGB..HHE | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:27,210 140406473873088 INFO noise_module.preprocess_raw(): removing response for NC.BBGB..HHN | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:27,221 140406306109120 INFO noise_module.preprocess_raw(): removing response for NC.AFD..HHN | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:27,553 140406775871168 INFO noise_module.preprocess_raw(): removing response for NC.AFD..HHE | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:31,977 140406289327808 INFO noise_module.preprocess_raw(): removing response for CI.BBR..BHE | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:32,314 140406490654400 INFO noise_module.preprocess_raw(): removing response for CI.BBR..BHN | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:33,070 140406507435712 INFO noise_module.preprocess_raw(): removing response for NC.AFD..HHZ | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:37,655 140406272546496 INFO noise_module.preprocess_raw(): removing response for CI.BBR..BHZ | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:37,830 140406731826880 INFO noise_module.preprocess_raw(): removing response for CI.VES..BHE | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:38,514 140406490654400 INFO noise_module.preprocess_raw(): removing response for CI.VES..BHN | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:38,642 140406289327808 INFO noise_module.preprocess_raw(): removing response for CI.VES..BHZ | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:40,292 140406473873088 INFO noise_module.preprocess_raw(): removing response for NC.GDXB..HHE | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
 WARNING (norm_resp): computed and reported sensitivities differ by more than 5 percent. 
	 Execution continuing.
2025-12-02 21:22:40,692 140406775871168 INFO noise_module.preprocess_raw(): removing response for NC.GDXB..HHZ | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2025-12-02 21:22:40,734 140406306109120 INFO noise_module.preprocess_raw(): removing response for NC.GDXB..HHN | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
 WARNING (norm_resp): computed and reported sensitivities differ by more than 5 percent. 
	 Execution continuing.
 WARNING (norm_resp): computed and reported sensitivities differ by more than 5 percent. 
	 Execution continuing.
2025-12-02 21:22:46,181 140408732842880 INFO utils.log_raw(): TIMING CC Main: 22.639 secs for Preprocess: 18 channels
2025-12-02 21:22:46,182 140408732842880 INFO correlate.check_memory(): Require  0.22gb memory for cross correlations
2025-12-02 21:22:47,846 140408732842880 INFO utils.log_raw(): TIMING CC Main:  1.663 secs for Compute FFTs: 18 channels
2025-12-02 21:22:47,851 140408732842880 INFO correlate.cc_timespan(): Starting CC with 17 station pairs
2025-12-02 21:22:53,168 140408732842880 INFO utils.log_raw(): TIMING CC Main:  5.317 secs for Correlate and write to store
2025-12-02 21:22:53,347 140408732842880 INFO utils.log_raw(): TIMING CC Main: 32.157 secs for Process the chunk of 2012-01-02T00:00:00+0000 - 2012-01-03T00:00:00+0000
2025-12-02 21:22:53,357 140408732842880 INFO utils.log_raw(): TIMING CC Main: 65.088 secs for Step 1 in total with 4 cores

The cross correlations are saved as a single file for each channel pair and each increment of inc_hours. We now will stack all the cross correlations over all time chunk and look at all station pairs results.

Step 2: Stack the cross correlation#

We now create the stack stores. Because this tutorial runs locally, we will use an ASDF stack store to output the data. ASDF is a data container in HDF5 that is used in full waveform modeling and inversion. H5 behaves well locally.

Each station pair will have 1 H5 file with all components of the cross correlations. While this produces many H5 files, it has come down to the noisepy team’s favorite option:

  1. the thread-safe installation of HDF5 is not trivial

  2. the choice of grouping station pairs within a single file is not flexible to a broad audience of users.

# open a new cc store in read-only mode since we will be doing parallel access for stacking
cc_store = ASDFCCStore(cc_data_path, mode="r")
stack_store = ASDFStackStore(stack_data_path)

Configure the stacking#

There are various methods for optimal stacking. We refern to Yang et al (2022) for a discussion and comparison of the methods:

Yang X, Bryan J, Okubo K, Jiang C, Clements T, Denolle MA. Optimal stacking of noise cross-correlation functions. Geophysical Journal International. 2023 Mar;232(3):1600-18. https://doi.org/10.1093/gji/ggac410

Users have the choice to implement linear, phase weighted stacks pws (Schimmel et al, 1997), robust stacking (Yang et al, 2022), acf autocovariance filter (Nakata et al, 2019), nroot stacking. Users may choose the stacking method of their choice by entering the strings in config.stack_method.

If chosen all, the current code only ouputs linear, pws, robust since nroot is less common and acf is computationally expensive.

config.stack_method = StackMethod.LINEAR
method_list = [method for method in dir(StackMethod) if not method.startswith("__")]
print(method_list)
['ALL', 'AUTO_COVARIANCE', 'LINEAR', 'NROOT', 'PWS', 'ROBUST', 'SELECTIVE']
cc_store.get_station_pairs()
[(NC.BBGB, NC.BBGB),
 (NC.BBGB, NC.GDXB),
 (NC.AFD, NC.GDXB),
 (NC.BBGB, CI.BBR),
 (NC.AFD, NC.AFD),
 (NC.BBGB, CI.SVD),
 (NC.AFD, CI.VES),
 (NC.BBGB, CI.VES),
 (NC.BBGB, NC.AFD),
 (CI.SVD, CI.BBR),
 (CI.BBR, CI.BBR),
 (NC.GDXB, NC.GDXB),
 (CI.SVD, CI.SVD),
 (CI.VES, NC.GDXB),
 (CI.BBR, CI.VES),
 (CI.SVD, CI.VES),
 (CI.VES, CI.VES)]
stack_cross_correlations(cc_store, stack_store, config)
2025-12-02 21:22:53,504 140408732842880 INFO stack.initializer(): Station pairs: 17
2025-12-02 21:22:57,130 140247750970240 INFO stack.stack_store_pair(): Stacking CI.BBR_CI.BBR/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2025-12-02 21:22:57,154 140568821394304 INFO stack.stack_store_pair(): Stacking CI.BBR_CI.VES/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2025-12-02 21:22:57,179 139640273259392 INFO stack.stack_store_pair(): Stacking CI.SVD_CI.BBR/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2025-12-02 21:22:57,198 140247750970240 INFO utils.log_raw(): TIMING:  0.062 secs for loading CCF data
2025-12-02 21:22:57,199 140210006608768 INFO stack.stack_store_pair(): Stacking CI.SVD_CI.SVD/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2025-12-02 21:22:57,216 140247750970240 INFO utils.log_raw(): TIMING:  0.018 secs for stack/rotate all station pairs (CI.BBR, CI.BBR)
2025-12-02 21:22:57,244 140247750970240 INFO utils.log_raw(): TIMING:  0.027 secs for writing stack pair (CI.BBR, CI.BBR)
2025-12-02 21:22:57,245 140247750970240 INFO stack.stack_store_pair(): Stacking CI.SVD_CI.VES/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2025-12-02 21:22:57,249 140568821394304 INFO utils.log_raw(): TIMING:  0.089 secs for loading CCF data
2025-12-02 21:22:57,266 140568821394304 INFO utils.log_raw(): TIMING:  0.017 secs for stack/rotate all station pairs (CI.BBR, CI.VES)
2025-12-02 21:22:57,269 140210006608768 INFO utils.log_raw(): TIMING:  0.064 secs for loading CCF data
2025-12-02 21:22:57,281 139640273259392 INFO utils.log_raw(): TIMING:  0.097 secs for loading CCF data
2025-12-02 21:22:57,285 140210006608768 INFO utils.log_raw(): TIMING:  0.016 secs for stack/rotate all station pairs (CI.SVD, CI.SVD)
2025-12-02 21:22:57,299 139640273259392 INFO utils.log_raw(): TIMING:  0.018 secs for stack/rotate all station pairs (CI.SVD, CI.BBR)
2025-12-02 21:22:57,316 140210006608768 INFO utils.log_raw(): TIMING:  0.029 secs for writing stack pair (CI.SVD, CI.SVD)
2025-12-02 21:22:57,317 140210006608768 INFO stack.stack_store_pair(): Stacking CI.VES_CI.VES/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2025-12-02 21:22:57,339 140247750970240 INFO utils.log_raw(): TIMING:  0.088 secs for loading CCF data
2025-12-02 21:22:57,341 140568821394304 INFO utils.log_raw(): TIMING:  0.075 secs for writing stack pair (CI.BBR, CI.VES)
2025-12-02 21:22:57,344 140568821394304 INFO stack.stack_store_pair(): Stacking CI.VES_NC.GDXB/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2025-12-02 21:22:57,359 140247750970240 INFO utils.log_raw(): TIMING:  0.020 secs for stack/rotate all station pairs (CI.SVD, CI.VES)
2025-12-02 21:22:57,372 139640273259392 INFO utils.log_raw(): TIMING:  0.072 secs for writing stack pair (CI.SVD, CI.BBR)
2025-12-02 21:22:57,373 139640273259392 INFO stack.stack_store_pair(): Stacking NC.AFD_CI.VES/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2025-12-02 21:22:57,376 140210006608768 INFO utils.log_raw(): TIMING:  0.054 secs for loading CCF data
2025-12-02 21:22:57,390 140210006608768 INFO utils.log_raw(): TIMING:  0.014 secs for stack/rotate all station pairs (CI.VES, CI.VES)
2025-12-02 21:22:57,414 140210006608768 INFO utils.log_raw(): TIMING:  0.023 secs for writing stack pair (CI.VES, CI.VES)
2025-12-02 21:22:57,416 140210006608768 INFO stack.stack_store_pair(): Stacking NC.AFD_NC.AFD/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2025-12-02 21:22:57,430 140568821394304 INFO utils.log_raw(): TIMING:  0.081 secs for loading CCF data
2025-12-02 21:22:57,438 140247750970240 INFO utils.log_raw(): TIMING:  0.079 secs for writing stack pair (CI.SVD, CI.VES)
2025-12-02 21:22:57,439 140247750970240 INFO stack.stack_store_pair(): Stacking NC.AFD_NC.GDXB/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2025-12-02 21:22:57,450 140568821394304 INFO utils.log_raw(): TIMING:  0.021 secs for stack/rotate all station pairs (CI.VES, NC.GDXB)
2025-12-02 21:22:57,468 139640273259392 INFO utils.log_raw(): TIMING:  0.089 secs for loading CCF data
2025-12-02 21:22:57,471 140210006608768 INFO utils.log_raw(): TIMING:  0.052 secs for loading CCF data
2025-12-02 21:22:57,484 140210006608768 INFO utils.log_raw(): TIMING:  0.013 secs for stack/rotate all station pairs (NC.AFD, NC.AFD)
2025-12-02 21:22:57,486 139640273259392 INFO utils.log_raw(): TIMING:  0.017 secs for stack/rotate all station pairs (NC.AFD, CI.VES)
2025-12-02 21:22:57,510 140210006608768 INFO utils.log_raw(): TIMING:  0.025 secs for writing stack pair (NC.AFD, NC.AFD)
2025-12-02 21:22:57,512 140210006608768 INFO stack.stack_store_pair(): Stacking NC.BBGB_CI.BBR/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2025-12-02 21:22:57,521 140247750970240 INFO utils.log_raw(): TIMING:  0.077 secs for loading CCF data
2025-12-02 21:22:57,530 140568821394304 INFO utils.log_raw(): TIMING:  0.079 secs for writing stack pair (CI.VES, NC.GDXB)
2025-12-02 21:22:57,531 140568821394304 INFO stack.stack_store_pair(): Stacking NC.BBGB_CI.SVD/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2025-12-02 21:22:57,539 140247750970240 INFO utils.log_raw(): TIMING:  0.018 secs for stack/rotate all station pairs (NC.AFD, NC.GDXB)
2025-12-02 21:22:57,566 139640273259392 INFO utils.log_raw(): TIMING:  0.080 secs for writing stack pair (NC.AFD, CI.VES)
2025-12-02 21:22:57,567 139640273259392 INFO stack.stack_store_pair(): Stacking NC.BBGB_CI.VES/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2025-12-02 21:22:57,601 140210006608768 INFO utils.log_raw(): TIMING:  0.085 secs for loading CCF data
2025-12-02 21:22:57,610 140568821394304 INFO utils.log_raw(): TIMING:  0.075 secs for loading CCF data
2025-12-02 21:22:57,619 140247750970240 INFO utils.log_raw(): TIMING:  0.080 secs for writing stack pair (NC.AFD, NC.GDXB)
2025-12-02 21:22:57,621 140247750970240 INFO stack.stack_store_pair(): Stacking NC.BBGB_NC.AFD/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2025-12-02 21:22:57,622 140210006608768 INFO utils.log_raw(): TIMING:  0.021 secs for stack/rotate all station pairs (NC.BBGB, CI.BBR)
2025-12-02 21:22:57,628 140568821394304 INFO utils.log_raw(): TIMING:  0.018 secs for stack/rotate all station pairs (NC.BBGB, CI.SVD)
2025-12-02 21:22:57,657 139640273259392 INFO utils.log_raw(): TIMING:  0.085 secs for loading CCF data
2025-12-02 21:22:57,673 139640273259392 INFO utils.log_raw(): TIMING:  0.017 secs for stack/rotate all station pairs (NC.BBGB, CI.VES)
2025-12-02 21:22:57,701 140210006608768 INFO utils.log_raw(): TIMING:  0.079 secs for writing stack pair (NC.BBGB, CI.BBR)
2025-12-02 21:22:57,701 140247750970240 INFO utils.log_raw(): TIMING:  0.075 secs for loading CCF data
2025-12-02 21:22:57,702 140210006608768 INFO stack.stack_store_pair(): Stacking NC.BBGB_NC.BBGB/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2025-12-02 21:22:57,706 140568821394304 INFO utils.log_raw(): TIMING:  0.078 secs for writing stack pair (NC.BBGB, CI.SVD)
2025-12-02 21:22:57,707 140568821394304 INFO stack.stack_store_pair(): Stacking NC.BBGB_NC.GDXB/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2025-12-02 21:22:57,722 140247750970240 INFO utils.log_raw(): TIMING:  0.020 secs for stack/rotate all station pairs (NC.BBGB, NC.AFD)
2025-12-02 21:22:57,759 139640273259392 INFO utils.log_raw(): TIMING:  0.086 secs for writing stack pair (NC.BBGB, CI.VES)
2025-12-02 21:22:57,762 139640273259392 INFO stack.stack_store_pair(): Stacking NC.GDXB_NC.GDXB/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2025-12-02 21:22:57,767 140210006608768 INFO utils.log_raw(): TIMING:  0.057 secs for loading CCF data
2025-12-02 21:22:57,779 140210006608768 INFO utils.log_raw(): TIMING:  0.012 secs for stack/rotate all station pairs (NC.BBGB, NC.BBGB)
2025-12-02 21:22:57,787 140568821394304 INFO utils.log_raw(): TIMING:  0.077 secs for loading CCF data
2025-12-02 21:22:57,801 140247750970240 INFO utils.log_raw(): TIMING:  0.079 secs for writing stack pair (NC.BBGB, NC.AFD)
2025-12-02 21:22:57,803 140568821394304 INFO utils.log_raw(): TIMING:  0.016 secs for stack/rotate all station pairs (NC.BBGB, NC.GDXB)
2025-12-02 21:22:57,807 140210006608768 INFO utils.log_raw(): TIMING:  0.027 secs for writing stack pair (NC.BBGB, NC.BBGB)
2025-12-02 21:22:57,818 139640273259392 INFO utils.log_raw(): TIMING:  0.050 secs for loading CCF data
2025-12-02 21:22:57,826 139640273259392 INFO utils.log_raw(): TIMING:  0.008 secs for stack/rotate all station pairs (NC.GDXB, NC.GDXB)
2025-12-02 21:22:57,841 139640273259392 INFO utils.log_raw(): TIMING:  0.015 secs for writing stack pair (NC.GDXB, NC.GDXB)
2025-12-02 21:22:57,853 140568821394304 INFO utils.log_raw(): TIMING:  0.049 secs for writing stack pair (NC.BBGB, NC.GDXB)
2025-12-02 21:22:58,464 140408732842880 INFO utils.log_raw(): TIMING:  4.964 secs for step 2 in total

QC_1 of the cross correlations for Imaging#

We now explore the quality of the cross correlations. We plot the moveout of the cross correlations, filtered in some frequency band.

cc_store.get_station_pairs()
[(NC.BBGB, NC.BBGB),
 (NC.BBGB, NC.GDXB),
 (NC.AFD, NC.GDXB),
 (NC.BBGB, CI.BBR),
 (NC.AFD, NC.AFD),
 (NC.BBGB, CI.SVD),
 (NC.AFD, CI.VES),
 (NC.BBGB, CI.VES),
 (NC.BBGB, NC.AFD),
 (CI.SVD, CI.BBR),
 (CI.BBR, CI.BBR),
 (NC.GDXB, NC.GDXB),
 (CI.SVD, CI.SVD),
 (CI.VES, NC.GDXB),
 (CI.BBR, CI.VES),
 (CI.SVD, CI.VES),
 (CI.VES, CI.VES)]
pairs = stack_store.get_station_pairs()
print(f"Found {len(pairs)} station pairs")
sta_stacks = stack_store.read_bulk(timerange, pairs) # no timestamp used in ASDFStackStore
Found 17 station pairs
2025-12-02 21:22:59,009 140408732842880 INFO utils.log_raw(): TIMING:  0.488 secs for loading 17 stacks
plot_all_moveout(sta_stacks, 'Allstack_linear', 0.1, 0.2, 'ZZ', 1)
2025-12-02 21:22:59,036 140408732842880 INFO plotting_modules.plot_all_moveout(): Plottting: Allstack_linear, 17 station pairs
_images/f3f6ce44290f31e287430b388c3c9e34e090e0fb8b683ad81695754730be9901.png