NoisePy Composite Datastore Tutorial#

Noisepy is a python software package to process ambient seismic noise cross correlations. This tutorial aims to introduce the use of noisepy for a toy problem on a composite data store. It can be ran locally or on the cloud.

The data is stored on AWS S3:

First, we install the noisepy-seis package

# Uncomment and run this line if the environment doesn't have noisepy already installed:
# ! pip install noisepy-seis 

Warning: NoisePy uses obspy as a core Python module to manipulate seismic data. If you use Google Colab, restart the runtime now for proper installation of obspy on Colab.

Import necessary modules#

Then we import the basic modules

%load_ext autoreload
%autoreload 2
from noisepy.seis import cross_correlate, stack_cross_correlations, __version__    # noisepy core functions
from noisepy.seis.io.asdfstore import ASDFCCStore, ASDFStackStore                  # Object to store ASDF data within noisepy
from noisepy.seis.io.numpystore import NumpyCCStore, NumpyStackStore
from noisepy.seis.io.compositerawstore import CompositeRawStore
from noisepy.seis.io.s3store import SCEDCS3DataStore, NCEDCS3DataStore
from noisepy.seis.io.channel_filter_store import channel_filter
from noisepy.seis.io.datatypes import Channel, CCMethod, ConfigParameters, FreqNorm, RmResp, StackMethod, TimeNorm        # Main configuration object
from noisepy.seis.io.channelcatalog import XMLStationChannelCatalog                # Required stationXML handling object
import os
import obspy
import shutil
from datetime import datetime, timezone
from datetimerange import DateTimeRange


from noisepy.seis.io.plotting_modules import plot_all_moveout

print(f"Using NoisePy version {__version__}")

path = "./composite_data" 

os.makedirs(path, exist_ok=True)
cc_data_path = os.path.join(path, "CCF")
stack_data_path = os.path.join(path, "STACK")
S3_STORAGE_OPTIONS = {"s3": {"anon": True}}
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/noisepy/seis/io/utils.py:13: TqdmExperimentalWarning: Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)
  from tqdm.autonotebook import tqdm
Using NoisePy version 0.1.dev1

We will work with a single day worth of data on SCEDC and NCEDC. The continuous data is organized with a single day and channel per miniseed. For this example, you can choose any year since 2002. We will just cross correlate a single day.

# SCEDC S3 bucket common URL characters for that day.
SCEDC_DATA = "s3://scedc-pds/continuous_waveforms/"
NCEDC_DATA = "s3://ncedc-pds/continuous_waveforms/NC/"

SCEDC_STATION_XML = "s3://scedc-pds/FDSNstationXML/CI/"  
NCEDC_STATION_XML = "s3://ncedc-pds/FDSNstationXML/NC/"

# timeframe for analysis
start = datetime(2012, 1, 1, tzinfo=timezone.utc)
end = datetime(2012, 1, 3, tzinfo=timezone.utc)
timerange = DateTimeRange(start, end)
print(timerange)
2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000

Ambient Noise Project Configuration#

We prepare the configuration of the workflow by declaring and storing parameters into the ConfigParameters() object and/or editing the config.yml file.

# Initialize ambient noise workflow configuration
config = ConfigParameters() # default config parameters which can be customized

Customize the job parameters below:

config.start_date = start
config.end_date = end
config.acorr_only = False # only perform auto-correlation or not
config.xcorr_only = True # only perform cross-correlation or not

config.inc_hours = 24 # INC_HOURS is used in hours (integer) as the 
        #chunk of time that the paralelliztion will work.
        # data will be loaded in memory, so reduce memory with smaller 
        # inc_hours if there are over 400+ stations.
        # At regional scale for SCEDC, we can afford 20Hz data and inc_hour 
        # being a day of data.

# pre-processing parameters
config.samp_freq= 20  # (int) Sampling rate in Hz of desired processing (it can be different than the data sampling rate)
config.single_freq = False
config.cc_len= 3600  # (float) basic unit of data length for fft (sec)
config.step= 1800.0  # (float) overlapping between each cc_len (sec)

config.ncomp = 3  # 1 or 3 component data (needed to decide whether do rotation)

config.stationxml= False  # station.XML file used to remove instrument response for SAC/miniseed data
      # If True, the stationXML file is assumed to be provided.
config.rm_resp= RmResp.INV  # select 'no' to not remove response and use 'inv' if you use the stationXML,'spectrum',


############## NOISE PRE-PROCESSING ##################
config.freqmin,config.freqmax = 0.05,2.0  # broad band filtering of the data before cross correlation
config.max_over_std  = 10  # threshold to remove window of bad signals: set it to 10*9 if prefer not to remove them

################### SPECTRAL NORMALIZATION ############
config.freq_norm= FreqNorm.RMA  # choose between "rma" for a soft whitening or "no" for no whitening. Pure whitening is not implemented correctly at this point.
config.smoothspect_N = 10  # moving window length to smooth spectrum amplitude (points)
    # here, choose smoothspect_N for the case of a strict whitening (e.g., phase_only)


#################### TEMPORAL NORMALIZATION ##########
config.time_norm = TimeNorm.ONE_BIT # 'no' for no normalization, or 'rma', 'one_bit' for normalization in time domain,
config.smooth_N= 10  # moving window length for time domain normalization if selected (points)


############ cross correlation ##############

config.cc_method= CCMethod.XCORR # 'xcorr' for pure cross correlation OR 'deconv' for deconvolution;
    # FOR "COHERENCY" PLEASE set freq_norm to "rma", time_norm to "no" and cc_method to "xcorr"

# OUTPUTS:
config.substack = True  # True = smaller stacks within the time chunk. False: it will stack over inc_hours
config.substack_len = config.cc_len  # how long to stack over (for monitoring purpose): need to be multiples of cc_len
    # if substack=True, substack_len=2*cc_len, then you pre-stack every 2 correlation windows.
    # for instance: substack=True, substack_len=cc_len means that you keep ALL of the correlations
    # if substack=False, the cross correlation will be stacked over the inc_hour window

### For monitoring applications ####
## we recommend substacking ever 2-4 cross correlations and storing the substacks
# e.g. 
# config.substack = True 
# config.substack_len = 4* config.cc_len

config.maxlag= 200  # lags of cross-correlation to save (sec)

# the network list that are used here
config.net_list = ['CI', 'NC']
# For this tutorial make sure the previous run is empty
#os.system(f"rm -rf {cc_data_path}")
if os.path.exists(cc_data_path):
    shutil.rmtree(cc_data_path)

Step 1: Cross-correlation#

In this instance, we read directly the data from the scedc bucket into the cross correlation code. We are not attempting to recreate a data store. Therefore we go straight to step 1 of the cross correlations.

We first declare the data and cross correlation stores

scedc_stations = "RPV,SVD,BBR".split(",")  # SCEDC station
ncedc_stations = "KCT,KRP,KHMB".split(",") # NCEDC station

scedc_catalog = XMLStationChannelCatalog(SCEDC_STATION_XML, 
                                         storage_options=S3_STORAGE_OPTIONS)
ncedc_catalog = XMLStationChannelCatalog(NCEDC_STATION_XML, "{network}.{name}.xml", 
                                         storage_options=S3_STORAGE_OPTIONS)


scedc_store = SCEDCS3DataStore(SCEDC_DATA, scedc_catalog,  channel_filter(["CI"], scedc_stations, 
                                ["BHE", "BHN", "BHZ", "HHE", "HHN", "HHZ"]), 
                               timerange, storage_options=S3_STORAGE_OPTIONS)

ncedc_store = NCEDCS3DataStore(NCEDC_DATA, ncedc_catalog, channel_filter(["NC"], ncedc_stations, 
                                ["BHE", "BHN", "BHZ", "HHE", "HHN", "HHZ"]), 
                               timerange, storage_options=S3_STORAGE_OPTIONS)

raw_store = CompositeRawStore({"CI": scedc_store, 
                               "NC": ncedc_store}) # Composite store for reading data from both SCEDC and NCEDC
cc_store = ASDFCCStore(cc_data_path)               # Store for writing CC data

get the time range of the data in the data store inventory

span = raw_store.get_timespans()
print(span)
[2012-01-01T00:00:00+0000 - 2012-01-02T00:00:00+0000, 2012-01-02T00:00:00+0000 - 2012-01-03T00:00:00+0000]

Get the channels available during a given time spane

channel_list=raw_store.get_channels(span[1])
print(channel_list)
2024-06-24 21:34:40,234 140045541878656 INFO utils.log_raw(): TIMING: 0.5540 secs. for Loading 1743 files from s3://scedc-pds/continuous_waveforms/2012/2012_002/
2024-06-24 21:34:40,264 140045541878656 INFO utils.log_raw(): TIMING: 0.0295 secs. for Init: 1 timespans and 12 channels
2024-06-24 21:34:40,392 140043968570944 INFO channelcatalog._get_inventory_from_file(): Reading StationXML file s3://scedc-pds/FDSNstationXML/CI/CI_SVD.xml
2024-06-24 21:34:40,622 140043951789632 INFO channelcatalog._get_inventory_from_file(): Reading StationXML file s3://scedc-pds/FDSNstationXML/CI/CI_BBR.xml
2024-06-24 21:34:43,535 140045541878656 INFO s3store.get_channels(): Getting 12 channels for 2012-01-02T00:00:00+0000 - 2012-01-03T00:00:00+0000: [CI.BBR.BHE, CI.BBR.BHN, CI.BBR.BHZ, CI.BBR.HHE, CI.BBR.HHN, CI.BBR.HHZ, CI.SVD.BHE, CI.SVD.BHN, CI.SVD.BHZ, CI.SVD.HHE, CI.SVD.HHN, CI.SVD.HHZ]
2024-06-24 21:34:43,741 140045541878656 INFO utils.log_raw(): TIMING: 0.1787 secs. for Loading 804 files from s3://ncedc-pds/continuous_waveforms/NC/2012/2012.002/
2024-06-24 21:34:43,755 140045541878656 INFO utils.log_raw(): TIMING: 0.0145 secs. for Init: 1 timespans and 9 channels
2024-06-24 21:34:43,816 140043708528192 INFO channelcatalog._get_inventory_from_file(): Reading StationXML file s3://ncedc-pds/FDSNstationXML/NC/NC.KRP.xml
2024-06-24 21:34:43,838 140043968570944 INFO channelcatalog._get_inventory_from_file(): Reading StationXML file s3://ncedc-pds/FDSNstationXML/NC/NC.KCT.xml
2024-06-24 21:34:43,840 140043691746880 INFO channelcatalog._get_inventory_from_file(): Reading StationXML file s3://ncedc-pds/FDSNstationXML/NC/NC.KHMB.xml
2024-06-24 21:34:44,448 140045541878656 INFO s3store.get_channels(): Getting 9 channels for 2012-01-02T00:00:00+0000 - 2012-01-03T00:00:00+0000: [NC.KCT.HHE, NC.KCT.HHN, NC.KCT.HHZ, NC.KHMB.HHE, NC.KHMB.HHN, NC.KHMB.HHZ, NC.KRP.HHE, NC.KRP.HHN, NC.KRP.HHZ]
[CI.BBR.BHE, CI.BBR.BHN, CI.BBR.BHZ, CI.BBR.HHE, CI.BBR.HHN, CI.BBR.HHZ, CI.SVD.BHE, CI.SVD.BHN, CI.SVD.BHZ, CI.SVD.HHE, CI.SVD.HHN, CI.SVD.HHZ, NC.KCT.HHE, NC.KCT.HHN, NC.KCT.HHZ, NC.KHMB.HHE, NC.KHMB.HHN, NC.KHMB.HHZ, NC.KRP.HHE, NC.KRP.HHN, NC.KRP.HHZ]

Perform the cross correlation#

The data will be pulled from SCEDC & NCEDC, cross correlated, and stored locally if this notebook is ran locally. If you are re-calculating, we recommend to clear the old cc_store.

# we also define a channel pair filter that limits cross-correlation to stations pairs closer than 600 km
def pair_filter(src: Channel, rec: Channel) -> bool:
    latS = src.station.lat
    lonS = src.station.lon
    latR = rec.station.lat
    lonR = rec.station.lon
    dist, _, _ = obspy.geodetics.base.gps2dist_azimuth(latS, lonS, latR, lonR)
    dist /= 1e3 # to km
    if dist <= 600:
        return True
    else:
        return False
cross_correlate(raw_store, config, cc_store, pair_filter=pair_filter)
2024-06-24 21:34:44,504 140045541878656 INFO correlate.cross_correlate(): Starting Cross-Correlation with 4 cores
2024-06-24 21:34:44,905 140045541878656 INFO utils.log_raw(): TIMING: 0.4010 secs. for Loading 1743 files from s3://scedc-pds/continuous_waveforms/2012/2012_001/
2024-06-24 21:34:44,934 140045541878656 INFO utils.log_raw(): TIMING: 0.0286 secs. for Init: 2 timespans and 24 channels
2024-06-24 21:34:44,935 140045541878656 INFO s3store.get_channels(): Getting 12 channels for 2012-01-01T00:00:00+0000 - 2012-01-02T00:00:00+0000: [CI.BBR.BHE, CI.BBR.BHN, CI.BBR.BHZ, CI.BBR.HHE, CI.BBR.HHN, CI.BBR.HHZ, CI.SVD.BHE, CI.SVD.BHN, CI.SVD.BHZ, CI.SVD.HHE, CI.SVD.HHN, CI.SVD.HHZ]
2024-06-24 21:34:45,094 140045541878656 INFO utils.log_raw(): TIMING: 0.1530 secs. for Loading 804 files from s3://ncedc-pds/continuous_waveforms/NC/2012/2012.001/
2024-06-24 21:34:45,108 140045541878656 INFO utils.log_raw(): TIMING: 0.0146 secs. for Init: 2 timespans and 18 channels
2024-06-24 21:34:45,110 140045541878656 INFO s3store.get_channels(): Getting 9 channels for 2012-01-01T00:00:00+0000 - 2012-01-02T00:00:00+0000: [NC.KCT.HHE, NC.KCT.HHN, NC.KCT.HHZ, NC.KHMB.HHE, NC.KHMB.HHN, NC.KHMB.HHZ, NC.KRP.HHE, NC.KRP.HHN, NC.KRP.HHZ]
2024-06-24 21:34:45,112 140045541878656 INFO utils.log_raw(): TIMING CC Main: 0.6079 secs. for get 21 channels
2024-06-24 21:34:45,118 140045541878656 INFO correlate.cc_timespan(): Checking for stations already done: 9 pairs
2024-06-24 21:34:45,121 140045541878656 INFO utils.log_raw(): TIMING CC Main: 0.0082 secs. for check for 5 stations already done (warm up cache)
2024-06-24 21:34:45,123 140045541878656 INFO utils.log_raw(): TIMING CC Main: 0.0016 secs. for check for stations already done
2024-06-24 21:34:45,123 140045541878656 INFO correlate.cc_timespan(): Still need to process: 5/5 stations, 21/21 channels, 9/9 pairs for 2012-01-01T00:00:00+0000 - 2012-01-02T00:00:00+0000
2024-06-24 21:34:47,061 140045541878656 INFO correlate._filter_channel_data(): Filtered to 21/21 channels with sampling rate >= 20.0
2024-06-24 21:34:47,062 140045541878656 INFO utils.log_raw(): TIMING CC Main: 1.9392 secs. for Read channel data: 21 channels
2024-06-24 21:34:48,836 140043708528192 INFO noise_module.preprocess_raw(): removing response for CI.SVD..BHN | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:34:48,871 140043364591168 INFO noise_module.preprocess_raw(): removing response for CI.SVD..BHE | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:34:48,894 140043381372480 INFO noise_module.preprocess_raw(): removing response for CI.BBR..BHN | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:34:48,984 140043347809856 INFO noise_module.preprocess_raw(): removing response for CI.BBR..BHZ | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:34:49,022 140043247154752 INFO noise_module.preprocess_raw(): removing response for CI.BBR..BHE | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:34:50,230 140043691746880 INFO noise_module.preprocess_raw(): removing response for CI.BBR..HHE | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:34:50,319 140043230373440 INFO noise_module.preprocess_raw(): removing response for CI.BBR..HHZ | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:34:50,411 140043674965568 INFO noise_module.preprocess_raw(): removing response for CI.BBR..HHN | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:34:55,931 140043708528192 INFO noise_module.preprocess_raw(): removing response for CI.SVD..BHZ | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:34:57,561 140043381372480 INFO noise_module.preprocess_raw(): removing response for CI.SVD..HHN | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:34:57,684 140043347809856 INFO noise_module.preprocess_raw(): removing response for CI.SVD..HHE | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:34:58,102 140043247154752 INFO noise_module.preprocess_raw(): removing response for NC.KCT..HHE | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:34:58,470 140043364591168 INFO noise_module.preprocess_raw(): removing response for CI.SVD..HHZ | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:00,742 140043691746880 INFO noise_module.preprocess_raw(): removing response for NC.KCT..HHN | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:00,757 140043674965568 INFO noise_module.preprocess_raw(): removing response for NC.KCT..HHZ | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:00,894 140043230373440 INFO noise_module.preprocess_raw(): removing response for NC.KHMB..HHE | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:03,665 140043708528192 INFO noise_module.preprocess_raw(): removing response for NC.KHMB..HHN | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:08,151 140043381372480 INFO noise_module.preprocess_raw(): removing response for NC.KHMB..HHZ | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:08,481 140043347809856 INFO noise_module.preprocess_raw(): removing response for NC.KRP..HHE | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:09,382 140043364591168 INFO noise_module.preprocess_raw(): removing response for NC.KRP..HHN | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:17,085 140043247154752 INFO noise_module.preprocess_raw(): removing response for NC.KRP..HHZ | 2012-01-01T00:00:00.000000Z - 2012-01-01T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:24,083 140045541878656 INFO utils.log_raw(): TIMING CC Main: 37.0207 secs. for Preprocess: 21 channels
2024-06-24 21:35:24,084 140045541878656 INFO correlate.check_memory(): Require  0.26gb memory for cross correlations
2024-06-24 21:35:25,797 140045541878656 INFO utils.log_raw(): TIMING CC Main: 1.7122 secs. for Compute FFTs: 21 channels
2024-06-24 21:35:25,802 140045541878656 INFO correlate.cc_timespan(): Starting CC with 9 station pairs
2024-06-24 21:35:29,898 140045541878656 INFO utils.log_raw(): TIMING CC Main: 4.0957 secs. for Correlate and write to store
2024-06-24 21:35:30,058 140045541878656 INFO utils.log_raw(): TIMING CC Main: 45.5535 secs. for Process the chunk of 2012-01-01T00:00:00+0000 - 2012-01-02T00:00:00+0000
2024-06-24 21:35:30,070 140045541878656 INFO s3store.get_channels(): Getting 12 channels for 2012-01-02T00:00:00+0000 - 2012-01-03T00:00:00+0000: [CI.BBR.BHE, CI.BBR.BHN, CI.BBR.BHZ, CI.BBR.HHE, CI.BBR.HHN, CI.BBR.HHZ, CI.SVD.BHE, CI.SVD.BHN, CI.SVD.BHZ, CI.SVD.HHE, CI.SVD.HHN, CI.SVD.HHZ]
2024-06-24 21:35:30,076 140045541878656 INFO s3store.get_channels(): Getting 9 channels for 2012-01-02T00:00:00+0000 - 2012-01-03T00:00:00+0000: [NC.KCT.HHE, NC.KCT.HHN, NC.KCT.HHZ, NC.KHMB.HHE, NC.KHMB.HHN, NC.KHMB.HHZ, NC.KRP.HHE, NC.KRP.HHN, NC.KRP.HHZ]
2024-06-24 21:35:30,079 140045541878656 INFO utils.log_raw(): TIMING CC Main: 0.0091 secs. for get 21 channels
2024-06-24 21:35:30,085 140045541878656 INFO correlate.cc_timespan(): Checking for stations already done: 9 pairs
2024-06-24 21:35:30,086 140045541878656 INFO utils.log_raw(): TIMING CC Main: 0.0072 secs. for check for 5 stations already done (warm up cache)
2024-06-24 21:35:30,088 140045541878656 INFO utils.log_raw(): TIMING CC Main: 0.0019 secs. for check for stations already done
2024-06-24 21:35:30,089 140045541878656 INFO correlate.cc_timespan(): Still need to process: 5/5 stations, 21/21 channels, 9/9 pairs for 2012-01-02T00:00:00+0000 - 2012-01-03T00:00:00+0000
2024-06-24 21:35:32,412 140045541878656 INFO correlate._filter_channel_data(): Filtered to 21/21 channels with sampling rate >= 20.0
2024-06-24 21:35:32,413 140045541878656 INFO utils.log_raw(): TIMING CC Main: 2.3244 secs. for Read channel data: 21 channels
2024-06-24 21:35:34,029 140043381372480 INFO noise_module.preprocess_raw(): removing response for CI.BBR..BHN | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:34,108 140043364591168 INFO noise_module.preprocess_raw(): removing response for CI.BBR..BHZ | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:34,144 140043691746880 INFO noise_module.preprocess_raw(): removing response for CI.BBR..BHE | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:34,312 140043674965568 INFO noise_module.preprocess_raw(): removing response for CI.SVD..BHE | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:34,345 140043347809856 INFO noise_module.preprocess_raw(): removing response for CI.SVD..BHN | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv

2024-06-24 21:35:35,575 140043230373440 INFO noise_module.preprocess_raw(): removing response for CI.BBR..HHE | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:35,677 140043708528192 INFO noise_module.preprocess_raw(): removing response for CI.BBR..HHN | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:35,739 140043247154752 INFO noise_module.preprocess_raw(): removing response for CI.BBR..HHZ | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:40,649 140043381372480 INFO noise_module.preprocess_raw(): removing response for CI.SVD..BHZ | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:42,434 140043364591168 INFO noise_module.preprocess_raw(): removing response for CI.SVD..HHE | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:42,831 140043691746880 INFO noise_module.preprocess_raw(): removing response for CI.SVD..HHN | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:42,931 140043347809856 INFO noise_module.preprocess_raw(): removing response for CI.SVD..HHZ | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:42,994 140043674965568 INFO noise_module.preprocess_raw(): removing response for NC.KCT..HHE | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:46,270 140043708528192 INFO noise_module.preprocess_raw(): removing response for NC.KCT..HHN | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:46,486 140043247154752 INFO noise_module.preprocess_raw(): removing response for NC.KHMB..HHE | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:46,938 140043230373440 INFO noise_module.preprocess_raw(): removing response for NC.KCT..HHZ | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:48,992 140043381372480 INFO noise_module.preprocess_raw(): removing response for NC.KHMB..HHN | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:52,731 140043364591168 INFO noise_module.preprocess_raw(): removing response for NC.KHMB..HHZ | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:53,671 140043691746880 INFO noise_module.preprocess_raw(): removing response for NC.KRP..HHE | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:35:53,749 140043347809856 INFO noise_module.preprocess_raw(): removing response for NC.KRP..HHN | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:36:02,156 140043674965568 INFO noise_module.preprocess_raw(): removing response for NC.KRP..HHZ | 2012-01-02T00:00:00.000000Z - 2012-01-02T23:59:59.950000Z | 20.0 Hz, 1728000 samples using inv
2024-06-24 21:36:09,357 140045541878656 INFO utils.log_raw(): TIMING CC Main: 36.9441 secs. for Preprocess: 21 channels
2024-06-24 21:36:09,358 140045541878656 INFO correlate.check_memory(): Require  0.26gb memory for cross correlations
2024-06-24 21:36:11,038 140045541878656 INFO utils.log_raw(): TIMING CC Main: 1.6790 secs. for Compute FFTs: 21 channels
2024-06-24 21:36:11,044 140045541878656 INFO correlate.cc_timespan(): Starting CC with 9 station pairs
2024-06-24 21:36:14,809 140045541878656 INFO utils.log_raw(): TIMING CC Main: 3.7649 secs. for Correlate and write to store
2024-06-24 21:36:14,970 140045541878656 INFO utils.log_raw(): TIMING CC Main: 44.9002 secs. for Process the chunk of 2012-01-02T00:00:00+0000 - 2012-01-03T00:00:00+0000
2024-06-24 21:36:14,980 140045541878656 INFO utils.log_raw(): TIMING CC Main: 90.4766 secs. for Step 1 in total with 4 cores

The cross correlations are saved as a single file for each channel pair and each increment of inc_hours. We now will stack all the cross correlations over all time chunk and look at all station pairs results.

Step 2: Stack the cross correlation#

We now create the stack stores. Because this tutorial runs locally, we will use an ASDF stack store to output the data. ASDF is a data container in HDF5 that is used in full waveform modeling and inversion. H5 behaves well locally.

Each station pair will have 1 H5 file with all components of the cross correlations. While this produces many H5 files, it has come down to the noisepy team’s favorite option:

  1. the thread-safe installation of HDF5 is not trivial

  2. the choice of grouping station pairs within a single file is not flexible to a broad audience of users.

# open a new cc store in read-only mode since we will be doing parallel access for stacking
cc_store = ASDFCCStore(cc_data_path, mode="r")
stack_store = ASDFStackStore(stack_data_path)

Configure the stacking#

There are various methods for optimal stacking. We refern to Yang et al (2022) for a discussion and comparison of the methods:

Yang X, Bryan J, Okubo K, Jiang C, Clements T, Denolle MA. Optimal stacking of noise cross-correlation functions. Geophysical Journal International. 2023 Mar;232(3):1600-18. https://doi.org/10.1093/gji/ggac410

Users have the choice to implement linear, phase weighted stacks pws (Schimmel et al, 1997), robust stacking (Yang et al, 2022), acf autocovariance filter (Nakata et al, 2019), nroot stacking. Users may choose the stacking method of their choice by entering the strings in config.stack_method.

If chosen all, the current code only ouputs linear, pws, robust since nroot is less common and acf is computationally expensive.

config.stack_method = StackMethod.LINEAR
method_list = [method for method in dir(StackMethod) if not method.startswith("__")]
print(method_list)
['ALL', 'AUTO_COVARIANCE', 'LINEAR', 'NROOT', 'PWS', 'ROBUST', 'SELECTIVE']
cc_store.get_station_pairs()
[(NC.KCT, NC.KCT),
 (NC.KCT, NC.KRP),
 (NC.KHMB, NC.KHMB),
 (NC.KRP, NC.KRP),
 (CI.SVD, CI.SVD),
 (CI.BBR, CI.SVD),
 (NC.KHMB, NC.KRP),
 (CI.BBR, CI.BBR),
 (NC.KCT, NC.KHMB)]
stack_cross_correlations(cc_store, stack_store, config)
2024-06-24 21:36:15,113 140045541878656 INFO stack.initializer(): Station pairs: 9
2024-06-24 21:36:18,623 139866818571136 INFO stack.stack_store_pair(): Stacking CI.BBR_CI.SVD/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2024-06-24 21:36:18,660 139993639398272 INFO stack.stack_store_pair(): Stacking CI.BBR_CI.BBR/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2024-06-24 21:36:18,716 140565725358976 INFO stack.stack_store_pair(): Stacking NC.KCT_NC.KCT/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2024-06-24 21:36:18,743 139993639398272 ERROR stack.stack_store_pair(): Error stacking pair (CI.BBR, CI.BBR): more than 9 cross-component exists for 2012-01-01T00:00:00+0000 - 2012-01-02T00:00:00+0000 (CI.BBR, CI.BBR)! please double check
2024-06-24 21:36:18,746 139993639398272 INFO stack.stack_store_pair(): Stacking NC.KCT_NC.KHMB/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2024-06-24 21:36:18,762 139866818571136 ERROR stack.stack_store_pair(): Error stacking pair (CI.BBR, CI.SVD): more than 9 cross-component exists for 2012-01-01T00:00:00+0000 - 2012-01-02T00:00:00+0000 (CI.BBR, CI.SVD)! please double check
2024-06-24 21:36:18,764 139866818571136 INFO stack.stack_store_pair(): Stacking NC.KCT_NC.KRP/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2024-06-24 21:36:18,777 140565725358976 INFO utils.log_raw(): TIMING: 0.0560 secs. for loading CCF data
2024-06-24 21:36:18,779 140041246247808 INFO stack.stack_store_pair(): Stacking CI.SVD_CI.SVD/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2024-06-24 21:36:18,792 140565725358976 INFO utils.log_raw(): TIMING: 0.0152 secs. for stack/rotate all station pairs (NC.KCT, NC.KCT)
2024-06-24 21:36:18,812 140565725358976 INFO utils.log_raw(): TIMING: 0.0189 secs. for writing stack pair (NC.KCT, NC.KCT)
2024-06-24 21:36:18,815 140565725358976 INFO stack.stack_store_pair(): Stacking NC.KHMB_NC.KHMB/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2024-06-24 21:36:18,827 139993639398272 INFO utils.log_raw(): TIMING: 0.0751 secs. for loading CCF data
2024-06-24 21:36:18,830 139866818571136 INFO utils.log_raw(): TIMING: 0.0625 secs. for loading CCF data
2024-06-24 21:36:18,846 139993639398272 INFO utils.log_raw(): TIMING: 0.0182 secs. for stack/rotate all station pairs (NC.KCT, NC.KHMB)
2024-06-24 21:36:18,847 139866818571136 INFO utils.log_raw(): TIMING: 0.0176 secs. for stack/rotate all station pairs (NC.KCT, NC.KRP)
2024-06-24 21:36:18,874 140041246247808 ERROR stack.stack_store_pair(): Error stacking pair (CI.SVD, CI.SVD): more than 9 cross-component exists for 2012-01-01T00:00:00+0000 - 2012-01-02T00:00:00+0000 (CI.SVD, CI.SVD)! please double check
2024-06-24 21:36:18,874 140565725358976 INFO utils.log_raw(): TIMING: 0.0569 secs. for loading CCF data
2024-06-24 21:36:18,877 140041246247808 INFO stack.stack_store_pair(): Stacking NC.KHMB_NC.KRP/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2024-06-24 21:36:18,891 140565725358976 INFO utils.log_raw(): TIMING: 0.0165 secs. for stack/rotate all station pairs (NC.KHMB, NC.KHMB)
2024-06-24 21:36:18,901 139866818571136 INFO utils.log_raw(): TIMING: 0.0533 secs. for writing stack pair (NC.KCT, NC.KRP)
2024-06-24 21:36:18,902 139993639398272 INFO utils.log_raw(): TIMING: 0.0561 secs. for writing stack pair (NC.KCT, NC.KHMB)
2024-06-24 21:36:18,903 139866818571136 INFO stack.stack_store_pair(): Stacking NC.KRP_NC.KRP/2012-01-01T00:00:00+0000 - 2012-01-03T00:00:00+0000
2024-06-24 21:36:18,910 140565725358976 INFO utils.log_raw(): TIMING: 0.0191 secs. for writing stack pair (NC.KHMB, NC.KHMB)
2024-06-24 21:36:18,946 139866818571136 INFO utils.log_raw(): TIMING: 0.0405 secs. for loading CCF data
2024-06-24 21:36:18,948 140041246247808 INFO utils.log_raw(): TIMING: 0.0674 secs. for loading CCF data
2024-06-24 21:36:18,957 139866818571136 INFO utils.log_raw(): TIMING: 0.0115 secs. for stack/rotate all station pairs (NC.KRP, NC.KRP)
2024-06-24 21:36:18,968 140041246247808 INFO utils.log_raw(): TIMING: 0.0195 secs. for stack/rotate all station pairs (NC.KHMB, NC.KRP)
2024-06-24 21:36:18,976 139866818571136 INFO utils.log_raw(): TIMING: 0.0181 secs. for writing stack pair (NC.KRP, NC.KRP)
2024-06-24 21:36:19,002 140041246247808 INFO utils.log_raw(): TIMING: 0.0346 secs. for writing stack pair (NC.KHMB, NC.KRP)
2024-06-24 21:36:19,580 140045541878656 INFO utils.log_raw(): TIMING: 4.4699 secs. for step 2 in total
2024-06-24 21:36:19,581 140045541878656 ERROR stack.stack_cross_correlations(): Error stacking 3/9 pairs. Check the logs for more information. Failed pairs: 
(CI.BBR, CI.BBR)
(CI.BBR, CI.SVD)
(CI.SVD, CI.SVD)

QC_1 of the cross correlations for Imaging#

We now explore the quality of the cross correlations. We plot the moveout of the cross correlations, filtered in some frequency band.

cc_store.get_station_pairs()
[(NC.KCT, NC.KCT),
 (NC.KCT, NC.KRP),
 (NC.KHMB, NC.KHMB),
 (NC.KRP, NC.KRP),
 (CI.SVD, CI.SVD),
 (CI.BBR, CI.SVD),
 (NC.KHMB, NC.KRP),
 (CI.BBR, CI.BBR),
 (NC.KCT, NC.KHMB)]
pairs = stack_store.get_station_pairs()
print(f"Found {len(pairs)} station pairs")
sta_stacks = stack_store.read_bulk(timerange, pairs) # no timestamp used in ASDFStackStore
Found 6 station pairs
2024-06-24 21:36:19,752 140045541878656 INFO utils.log_raw(): TIMING: 0.1155 secs. for loading 6 stacks
plot_all_moveout(sta_stacks, 'Allstack_linear', 0.1, 0.2, 'ZZ', 1)
2024-06-24 21:36:19,778 140045541878656 INFO plotting_modules.plot_all_moveout(): Plottting: Allstack_linear, 6 station pairs
200 8001
_images/a3a70cda2060f9b685b79b8057039a5310e525b38f6980ce19608df850d91c35.png