epydemix.population package

Submodules

epydemix.population.population module

class epydemix.population.population.Population(name: str = 'population')

Bases: object

Represents a population for epidemiological modeling, including demographic data and contact matrices.

The Population class manages and stores population data, including demographic distributions and contact matrices for various layers (e.g., school, work, home, community). It provides methods to add and retrieve this data for use in simulations and analysis.

Attributes:

  • name (str) – The name of the population.

  • Nk (List) – List representing population data for different demographic groups.

  • Nk_names (List[str]) – List of demographic group names.

  • contact_matrices (Dict[str, np.ndarray]) – Dictionary mapping layer names to their corresponding contact matrices (aggregated by age groups).

Example 1: Online import (data will be fetched from GitHub)

    from epydemix.population import load_epydemix_population

    population_online = load_epydemix_population(
        population_name="United_States",
        # Specify the preferred contact data source (needed only if you
        # want to override the default primary source)
        contacts_source="mistry_2021",
        # Load contact layers (by default all layers are imported)
        layers=["home", "work", "school", "community"],
    )

Example 2: Offline import (data will be loaded from a local directory)

    from epydemix.population import load_epydemix_population

    # Ensure that the folder is downloaded locally before running this
    population_offline = load_epydemix_population(
        population_name="United_States",
        # Path to the local data folder
        path_to_data="path/to/local/epydemix_data/",
        # Specify the preferred contact data source (needed only if you
        # want to override the default primary source)
        contacts_source="mistry_2021",
        # Load contact layers (by default all layers are imported)
        layers=["home", "work", "school", "community"],
    )

add_contact_matrix(contact_matrix: ndarray, layer_name: str = 'all') -> None

Adds a contact matrix for a specified layer.

Parameters:
  • contact_matrix (np.ndarray) – The contact matrix to be added, representing contact patterns between different demographic groups.

  • layer_name (str, optional) – The name of the contact layer (e.g., “home”, “work”). Defaults to “all”. Cannot be “overall” as it’s reserved.

Raises:

ValueError – If contact_matrix is not a 2D square array or if layer_name is “overall”.

Returns:

None
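The two conditions listed under Raises can be illustrated with a small standalone sketch. The helper name check_contact_matrix is hypothetical and is not part of epydemix; it only mirrors the documented checks (a 2D square array, and a layer name other than the reserved “overall”), and the library's actual error messages may differ.

```python
import numpy as np

RESERVED_LAYER = "overall"  # documented above as a reserved layer name


def check_contact_matrix(contact_matrix, layer_name="all"):
    """Hypothetical helper mirroring the documented ValueError conditions."""
    matrix = np.asarray(contact_matrix)
    if layer_name == RESERVED_LAYER:
        raise ValueError(f"layer name '{RESERVED_LAYER}' is reserved")
    if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
        raise ValueError(f"expected a 2D square array, got shape {matrix.shape}")
    return matrix


# A 2x2 matrix for two demographic groups passes both checks.
home = check_contact_matrix([[2.0, 1.0], [1.0, 3.0]], layer_name="home")
```

A matrix that passes these checks is the kind of input add_contact_matrix is documented to accept.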

add_population(Nk: List[float], Nk_names: List[str] | None = None) -> None

Adds population data for different demographic groups.

Parameters:
  • Nk (List[float]) – A list representing the population size for each demographic group.

  • Nk_names (Optional[List[str]], optional) – A list of demographic group names. If not provided, a default list of indices is generated. Defaults to None.

Returns:

None

property layers: List[str]

Available contact matrix layers.

Returns:

Names of available contact layers (e.g., [‘home’, ‘work’, ‘school’])

Return type:

List[str]

property mean_contacts: Dict[str, float]

Mean number of contacts per person per layer.

Returns:

Dictionary mapping layer names to mean contacts per person

Return type:

Dict[str, float]

property num_groups: int

Number of demographic groups.

Returns:

Number of demographic groups in the population

Return type:

int

property total_contacts: Dict[str, float]

Total number of contacts per layer.

Returns:

Dictionary mapping layer names to total contacts

Return type:

Dict[str, float]

property total_population: float

Total population across all demographic groups.

Returns:

Sum of population in all demographic groups

Return type:

float

validate() -> None

Validate all aspects of population data consistency. Raises ValueError if any validation fails.

epydemix.population.population.aggregate_demographic(data: DataFrame, grouping: Dict[str, List[str]]) -> DataFrame

Aggregates demographic data based on a grouping dictionary.

Parameters:
  • data (pd.DataFrame) – A DataFrame containing demographic data with columns ‘group_name’ and ‘value’.

  • grouping (Dict[str, List[str]]) – A dictionary where keys are new group names and values are lists of original group names to aggregate.

Returns:

A DataFrame with two columns: ‘group_name’ and ‘value’, where ‘value’ is the sum of the ‘value’ column from the original DataFrame for each new group.

Return type:

pd.DataFrame
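As a rough illustration of the behaviour described above, the following standalone sketch sums the ‘value’ column over each set of original groups. The function name aggregate_demographic_sketch and the sample age bands are assumptions made for this example; the library's own implementation may differ in details such as row order or handling of unmapped groups.

```python
import pandas as pd


def aggregate_demographic_sketch(data, grouping):
    """Sum 'value' over the original groups listed for each new group."""
    rows = []
    for new_name, old_names in grouping.items():
        # Select the rows belonging to this new group and add up their values.
        total = data.loc[data["group_name"].isin(old_names), "value"].sum()
        rows.append({"group_name": new_name, "value": total})
    return pd.DataFrame(rows, columns=["group_name", "value"])


# Illustrative input: three fine-grained age bands (made-up numbers).
data = pd.DataFrame(
    {"group_name": ["0-4", "5-9", "10-14"], "value": [10, 20, 30]}
)
grouping = {"0-9": ["0-4", "5-9"], "10-14": ["10-14"]}
aggregated = aggregate_demographic_sketch(data, grouping)
```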

epydemix.population.population.aggregate_matrix(initial_matrix: ndarray, old_population: ndarray, new_population: ndarray, age_group_mapping: Dict[str, list], old_age_groups_idx: Dict[str, int], new_age_group_idx: Dict[str, int]) -> ndarray

Aggregates a contact matrix based on new demographic groupings.

Parameters:
  • initial_matrix (np.ndarray) – The initial contact matrix (rates) between old demographic groups.

  • old_population (np.ndarray) – The population sizes of the old demographic groups.

  • new_population (np.ndarray) – The population sizes of the new aggregated demographic groups.

  • age_group_mapping (Dict[str, list]) – A dictionary mapping new demographic group names to lists of old group names.

  • old_age_groups_idx (Dict[str, int]) – A dictionary mapping old age group names to their indices in the contact matrix.

  • new_age_group_idx (Dict[str, int]) – A dictionary mapping new age group names to their indices in the aggregated matrix.

Returns:

The aggregated contact matrix (rates) for the new demographic groups.

Return type:

np.ndarray
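One common way to aggregate a matrix of per-capita contact rates is to convert each entry to a total number of contacts (rate times the population of the contacting group), sum the totals within each new group pair, and divide by the new group's population. The sketch below assumes that convention; it is not confirmed to be the exact method used by aggregate_matrix. The function name aggregate_rates_sketch and the index-to-index mapping argument are hypothetical simplifications of the documented parameters.

```python
import numpy as np


def aggregate_rates_sketch(rates, old_pop, new_pop, old_to_new):
    """Population-weighted aggregation of a per-capita contact-rate matrix.

    old_to_new maps each old group index to its new group index.
    """
    rates = np.asarray(rates, dtype=float)
    old_pop = np.asarray(old_pop, dtype=float)
    new_pop = np.asarray(new_pop, dtype=float)
    totals = np.zeros((len(new_pop), len(new_pop)))
    for i in range(rates.shape[0]):
        for j in range(rates.shape[1]):
            # rate * population = total contacts made by old group i with old group j
            totals[old_to_new[i], old_to_new[j]] += rates[i, j] * old_pop[i]
    # Convert the summed totals back to per-capita rates for the new groups.
    return totals / new_pop[:, None]


rates = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
old_pop = [10, 30, 60]
new_pop = [40, 60]                 # first two old groups merged
old_to_new = {0: 0, 1: 0, 2: 1}
aggregated = aggregate_rates_sketch(rates, old_pop, new_pop, old_to_new)
# aggregated is [[7.5, 5.25], [15.0, 9.0]]
```

Under this convention the total number of contacts is conserved: summing rate times population over the old matrix gives the same value as over the aggregated one.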

epydemix.population.population.get_available_locations(attribute: str = 'age', data_version: str = 'v1.2.0', level: int | None = None) -> DataFrame

Returns the available locations from the epydemix-data repository as a DataFrame.

Parameters:
  • attribute (str) – The demographic attribute layer. Defaults to “age”.

  • data_version (str) – The git tag/version of the epydemix-data repository. Defaults to “v1.2.0”.

  • level (Optional[int]) – If provided, filters the result to only rows where the level column equals this value. Geographic levels are: 0 = country, 1 = state/province, 2 = county. Silently ignored for data versions that do not include a level column. Defaults to None (no filter).

Returns:

A DataFrame containing the list of available locations.

Return type:

pd.DataFrame
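The documented behaviour of the level argument (filter when a level column exists, otherwise return the data unchanged) can be sketched as follows. The helper filter_by_level and the ‘location’ column name are assumptions for illustration only; just the ‘level’ column and its 0/1/2 coding come from the description above.

```python
import pandas as pd


def filter_by_level(locations, level=None):
    """Keep rows at the requested geographic level, if that is possible."""
    # No filter requested, or an older table without a 'level' column:
    # return the table unchanged (the "silently ignored" case).
    if level is None or "level" not in locations.columns:
        return locations
    return locations[locations["level"] == level].reset_index(drop=True)


# Illustrative table; the 'location' column name is hypothetical.
locations = pd.DataFrame(
    {
        "location": [
            "United_States",
            "United_States__Alabama",
            "United_States__Alabama__Autauga_County",
        ],
        "level": [0, 1, 2],
    }
)
states = filter_by_level(locations, level=1)
legacy = filter_by_level(locations.drop(columns="level"), level=1)
```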

epydemix.population.population.get_primary_contacts_source(population_name: str, path_to_data: str, attribute: str = 'age') -> str | None

Retrieves the primary contact source for a given population name from the locations data.

Parameters:
  • population_name (str) – The name of the population whose primary contact source is to be retrieved.

  • path_to_data (str) – The path to the directory containing the data.

  • attribute (str) – The demographic attribute layer. Defaults to “age”.

Returns:

The primary contact source for the given population name, or None if the population name is not found.

Return type:

Optional[str]

Raises:

ValueError – If the population name is not found in the locations data.

epydemix.population.population.load_epydemix_population(population_name: str, contacts_source: str | None = None, path_to_data: str | None = None, layers: List[str] = ['school', 'work', 'home', 'community'], age_group_mapping: Dict[str, List[str]] | None = None, supported_contacts_sources: Dict[str, List[str]] = {'age': ['prem_2017', 'prem_2021', 'mistry_2021', 'litvinova_2025'], 'race_ethnicity': ['litvinova_2025'], 'sex': ['litvinova_2025']}, data_version: str = 'v1.2.0', attribute: str = 'age') -> Population

Loads population and contact matrix data for a specified population.

Parameters:
  • population_name (str) – The name of the population to load.

  • contacts_source (Optional[str]) – The source of contact matrices. If None, the default source is retrieved.

  • path_to_data (Optional[str]) – The local path to the data directory. If None, data is fetched from GitHub.

  • layers (List[str]) – The layers of contact matrices to load.

  • age_group_mapping (Optional[Dict[str, List[str]]]) – Mapping of age groups. If None, defaults based on contacts_source.

  • supported_contacts_sources (Dict[str, List[str]]) – Dict mapping attribute names to their supported contact sources.

  • data_version (str) – The git tag/version of the epydemix-data repository. Defaults to “v1.2.0”.

  • attribute (str) – The demographic attribute layer. Defaults to “age”.

Returns:

An instance of the Population class with the loaded data.

Return type:

Population

Raises:

ValueError – If any provided value is not valid or if there are issues with the data files.

epydemix.population.population.map_age_groups_to_idx(age_group_mapping: Dict[str, List[str]], old_age_groups_idx: Dict[str, int], new_age_group_idx: Dict[str, int]) -> Dict[int, int]

Maps old age groups to new age groups using index mappings.

Parameters:
  • age_group_mapping (Dict[str, List[str]]) – A dictionary where keys are new age groups, and values are lists of old age groups.

  • old_age_groups_idx (Dict[str, int]) – A dictionary mapping old age group names to their respective indices.

  • new_age_group_idx (Dict[str, int]) – A dictionary mapping new age group names to their respective indices.

Returns:

A dictionary mapping old age group indices to new age group indices.

Return type:

Dict[int, int]
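The mapping described above can be expressed in a few lines of plain Python. This is a sketch of the documented input/output relationship, not the library's source; the function name and the sample age bands are made up for the example.

```python
def map_age_groups_to_idx_sketch(age_group_mapping, old_age_groups_idx, new_age_group_idx):
    """Return {old group index: new group index} from name-based mappings."""
    return {
        old_age_groups_idx[old_name]: new_age_group_idx[new_name]
        for new_name, old_names in age_group_mapping.items()
        for old_name in old_names
    }


age_group_mapping = {"0-9": ["0-4", "5-9"], "10-14": ["10-14"]}
old_age_groups_idx = {"0-4": 0, "5-9": 1, "10-14": 2}
new_age_group_idx = {"0-9": 0, "10-14": 1}

index_map = map_age_groups_to_idx_sketch(
    age_group_mapping, old_age_groups_idx, new_age_group_idx
)
# index_map == {0: 0, 1: 0, 2: 1}
```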

epydemix.population.population.validate_age_group_mapping(age_group_mapping: Dict[str, List[str]], allowed_values: List[str]) -> None

Validates that all age group mapping values are within the allowed values.

Parameters:
  • age_group_mapping (Dict[str, List[str]]) – A dictionary where keys are age group names and values are lists of values for each age group.

  • allowed_values (List[str]) – A list of allowed values that the age group mapping values should be within.

Raises:

ValueError – If any value in the age group mapping is not in the list of allowed values.
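A minimal sketch of this kind of check, assuming only what is stated above (every value in the mapping must appear in allowed_values, otherwise a ValueError is raised). The function name and the error message wording are hypothetical.

```python
def check_age_group_mapping(age_group_mapping, allowed_values):
    """Raise ValueError if any mapped value is not in allowed_values."""
    allowed = set(allowed_values)
    for new_group, old_groups in age_group_mapping.items():
        unknown = [group for group in old_groups if group not in allowed]
        if unknown:
            raise ValueError(
                f"{new_group!r} contains values that are not allowed: {unknown}"
            )


allowed_values = ["0-4", "5-9", "10-14"]

# Passes silently: every listed value is allowed.
check_age_group_mapping({"0-9": ["0-4", "5-9"]}, allowed_values)

# Fails: "15-19" is not among the allowed values.
try:
    check_age_group_mapping({"10-19": ["10-14", "15-19"]}, allowed_values)
    mapping_rejected = False
except ValueError:
    mapping_rejected = True
```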

epydemix.population.population.validate_contacts_source(contacts_source: str, supported_contacts_sources: List[str]) -> None

Validates if a given contacts source is in the list of supported contact sources.

Parameters:
  • contacts_source (str) – The contact source to validate.

  • supported_contacts_sources (List[str]) – A list of supported contact sources.

Raises:

ValueError – If the contacts_source is not found in the list of supported sources.

epydemix.population.population.validate_population_name(population_name: str, path_to_data: str, attribute: str = 'age') -> None

Validates if a given population name exists in the locations data.

Location names use underscores for spaces within a name and double underscores to separate geographic hierarchy levels, e.g. United_States__Alabama__Autauga_County.

Parameters:
  • population_name (str) – The name of the population to validate.

  • path_to_data (str) – The path to the directory containing the data.

  • attribute (str) – The demographic attribute layer. Defaults to “age”.

Raises:

ValueError – If the population_name is not found in the list of locations.
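The naming convention described above (single underscores for spaces, double underscores between hierarchy levels) can be unpacked with a short helper. The function below is a hypothetical illustration and is not part of epydemix.

```python
def split_location_name(population_name):
    """Split a location name into human-readable hierarchy levels."""
    # Double underscores separate levels; single underscores stand for spaces.
    return [part.replace("_", " ") for part in population_name.split("__")]


levels = split_location_name("United_States__Alabama__Autauga_County")
# levels == ["United States", "Alabama", "Autauga County"]
```

The number of parts minus one corresponds to the depth in the hierarchy, so a country-level name such as United_States yields a single element.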

Module contents

class epydemix.population.Population(name: str = 'population')

Bases: object

Represents a population for epidemiological modeling, including demographic data and contact matrices.

The Population class manages and stores population data, including demographic distributions and contact matrices for various layers (e.g., school, work, home, community). It provides methods to add and retrieve this data for use in simulations and analysis.

Attributes:

  • name (str) – The name of the population.

  • Nk (List) – List representing population data for different demographic groups.

  • Nk_names (List[str]) – List of demographic group names.

  • contact_matrices (Dict[str, np.ndarray]) – Dictionary mapping layer names to their corresponding contact matrices (aggregated by age groups).

Example 1: Online import (data will be fetched from GitHub)

    from epydemix.population import load_epydemix_population

    population_online = load_epydemix_population(
        population_name="United_States",
        # Specify the preferred contact data source (needed only if you
        # want to override the default primary source)
        contacts_source="mistry_2021",
        # Load contact layers (by default all layers are imported)
        layers=["home", "work", "school", "community"],
    )

Example 2: Offline import (data will be loaded from a local directory)

    from epydemix.population import load_epydemix_population

    # Ensure that the folder is downloaded locally before running this
    population_offline = load_epydemix_population(
        population_name="United_States",
        # Path to the local data folder
        path_to_data="path/to/local/epydemix_data/",
        # Specify the preferred contact data source (needed only if you
        # want to override the default primary source)
        contacts_source="mistry_2021",
        # Load contact layers (by default all layers are imported)
        layers=["home", "work", "school", "community"],
    )

add_contact_matrix(contact_matrix: ndarray, layer_name: str = 'all') -> None

Adds a contact matrix for a specified layer.

Parameters:
  • contact_matrix (np.ndarray) – The contact matrix to be added, representing contact patterns between different demographic groups.

  • layer_name (str, optional) – The name of the contact layer (e.g., “home”, “work”). Defaults to “all”. Cannot be “overall” as it’s reserved.

Raises:

ValueError – If contact_matrix is not a 2D square array or if layer_name is “overall”.

Returns:

None

add_population(Nk: List[float], Nk_names: List[str] | None = None) -> None

Adds population data for different demographic groups.

Parameters:
  • Nk (List[float]) – A list representing the population size for each demographic group.

  • Nk_names (Optional[List[str]], optional) – A list of demographic group names. If not provided, a default list of indices is generated. Defaults to None.

Returns:

None

property layers: List[str]

Available contact matrix layers.

Returns:

Names of available contact layers (e.g., [‘home’, ‘work’, ‘school’])

Return type:

List[str]

property mean_contacts: Dict[str, float]

Mean number of contacts per person per layer.

Returns:

Dictionary mapping layer names to mean contacts per person

Return type:

Dict[str, float]

property num_groups: int

Number of demographic groups.

Returns:

Number of demographic groups in the population

Return type:

int

property total_contacts: Dict[str, float]

Total number of contacts per layer.

Returns:

Dictionary mapping layer names to total contacts

Return type:

Dict[str, float]

property total_population: float

Total population across all demographic groups.

Returns:

Sum of population in all demographic groups

Return type:

float

validate() -> None

Validate all aspects of population data consistency. Raises ValueError if any validation fails.

epydemix.population.get_available_locations(attribute: str = 'age', data_version: str = 'v1.2.0', level: int | None = None) -> DataFrame

Returns the available locations from the epydemix-data repository as a DataFrame.

Parameters:
  • attribute (str) – The demographic attribute layer. Defaults to “age”.

  • data_version (str) – The git tag/version of the epydemix-data repository. Defaults to “v1.2.0”.

  • level (Optional[int]) – If provided, filters the result to only rows where the level column equals this value. Geographic levels are: 0 = country, 1 = state/province, 2 = county. Silently ignored for data versions that do not include a level column. Defaults to None (no filter).

Returns:

A DataFrame containing the list of available locations.

Return type:

pd.DataFrame

epydemix.population.load_epydemix_population(population_name: str, contacts_source: str | None = None, path_to_data: str | None = None, layers: List[str] = ['school', 'work', 'home', 'community'], age_group_mapping: Dict[str, List[str]] | None = None, supported_contacts_sources: Dict[str, List[str]] = {'age': ['prem_2017', 'prem_2021', 'mistry_2021', 'litvinova_2025'], 'race_ethnicity': ['litvinova_2025'], 'sex': ['litvinova_2025']}, data_version: str = 'v1.2.0', attribute: str = 'age') -> Population

Loads population and contact matrix data for a specified population.

Parameters:
  • population_name (str) – The name of the population to load.

  • contacts_source (Optional[str]) – The source of contact matrices. If None, the default source is retrieved.

  • path_to_data (Optional[str]) – The local path to the data directory. If None, data is fetched from GitHub.

  • layers (List[str]) – The layers of contact matrices to load.

  • age_group_mapping (Optional[Dict[str, List[str]]]) – Mapping of age groups. If None, defaults based on contacts_source.

  • supported_contacts_sources (Dict[str, List[str]]) – Dict mapping attribute names to their supported contact sources.

  • data_version (str) – The git tag/version of the epydemix-data repository. Defaults to “v1.2.0”.

  • attribute (str) – The demographic attribute layer. Defaults to “age”.

Returns:

An instance of the Population class with the loaded data.

Return type:

Population

Raises:

ValueError – If any provided value is not valid or if there are issues with the data files.