epydemix.population package
Submodules
epydemix.population.population module
- class epydemix.population.population.Population(name: str = 'population')
Bases: object
Represents a population for epidemiological modeling, including demographic data and contact matrices.
The Population class manages and stores population data, including demographic distributions and contact matrices for various layers (e.g., school, work, home, community). It provides methods to add and retrieve this data for use in simulations and analysis.
- Attributes:
name (str): The name of the population.
Nk (List): List representing population data for different demographic groups.
Nk_names (List[str]): List of demographic group names.
contact_matrices (Dict[str, np.ndarray]): Dictionary mapping layer names to their corresponding contact matrices (aggregated by age groups).
- Example 1: Online import (data will be fetched from GitHub)
    from epydemix.population import load_epydemix_population

    population_online = load_epydemix_population(
        population_name="United_States",
        # Preferred contact data source (needed only to override the default primary source)
        contacts_source="mistry_2021",
        # Contact layers to load (by default all layers are imported)
        layers=["home", "work", "school", "community"],
    )
- Example 2: Offline import (data will be loaded from a local directory)
    # Ensure that the data folder is downloaded locally before running this
    population_offline = load_epydemix_population(
        population_name="United_States",
        # Path to the local data folder
        path_to_data="path/to/local/epydemix_data/",
        # Preferred contact data source (needed only to override the default primary source)
        contacts_source="mistry_2021",
        # Contact layers to load (by default all layers are imported)
        layers=["home", "work", "school", "community"],
    )
- add_contact_matrix(contact_matrix: ndarray, layer_name: str = 'all') -> None
Adds a contact matrix for a specified layer.
- Parameters:
contact_matrix (np.ndarray) – The contact matrix to be added, representing contact patterns between different demographic groups.
layer_name (str, optional) – The name of the contact layer (e.g., “home”, “work”). Defaults to “all”. Cannot be “overall” as it’s reserved.
- Raises:
ValueError – If contact_matrix is not a 2D square array or if layer_name is “overall”
- Returns:
None
- add_population(Nk: List[float], Nk_names: List[str] | None = None) -> None
Adds population data for different demographic groups.
- Parameters:
Nk (List[float]) – A list representing the population size for each demographic group.
Nk_names (Optional[List[str]], optional) – A list of demographic group names. If not provided, a default list of indices is generated. Defaults to None.
- Returns:
None
- property layers: List[str]
Available contact matrix layers.
- Returns:
Names of available contact layers (e.g., [‘home’, ‘work’, ‘school’])
- Return type:
List[str]
- property mean_contacts: Dict[str, float]
Mean number of contacts per person per layer.
- Returns:
Dictionary mapping layer names to mean contacts per person
- Return type:
Dict[str, float]
- property num_groups: int
Number of demographic groups.
- Returns:
Number of demographic groups in the population
- Return type:
int
- property total_contacts: Dict[str, float]
Total number of contacts per layer.
- Returns:
Dictionary mapping layer names to total contacts
- Return type:
Dict[str, float]
- property total_population: float
Total population across all demographic groups.
- Returns:
Sum of population in all demographic groups
- Return type:
float
- epydemix.population.population.aggregate_demographic(data: DataFrame, grouping: Dict[str, List[str]]) -> DataFrame
Aggregates demographic data based on a grouping dictionary.
- Parameters:
data (pd.DataFrame) – A DataFrame containing demographic data with columns ‘group_name’ and ‘value’.
grouping (Dict[str, List[str]]) – A dictionary where keys are new group names and values are lists of original group names to aggregate.
- Returns:
A DataFrame with two columns: ‘group_name’ and ‘value’, where ‘value’ is the sum of the ‘value’ column from the original DataFrame for each new group.
- Return type:
pd.DataFrame
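The grouping step described above can be sketched without pandas. This is a minimal illustration, not the library's implementation: plain dictionaries stand in for the DataFrame, and the helper name `aggregate_counts` and the sample age bands are hypothetical.

```python
from typing import Dict, List


def aggregate_counts(values: Dict[str, float],
                     grouping: Dict[str, List[str]]) -> Dict[str, float]:
    """Sum the values of the original groups listed under each new group."""
    return {
        new_name: sum(values[old_name] for old_name in old_names)
        for new_name, old_names in grouping.items()
    }


# Hypothetical original groups and their sizes.
values = {"0-4": 10.0, "5-9": 20.0, "10-14": 30.0, "15-19": 40.0}
# Keys are new group names, values are the original groups to merge.
grouping = {"0-9": ["0-4", "5-9"], "10-19": ["10-14", "15-19"]}

aggregated = aggregate_counts(values, grouping)
print(aggregated)
```

Each new group receives the sum of the original groups listed under it, which mirrors the 'group_name' / 'value' output described for aggregate_demographic.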
- epydemix.population.population.aggregate_matrix(initial_matrix: ndarray, old_population: ndarray, new_population: ndarray, age_group_mapping: Dict[str, list], old_age_groups_idx: Dict[str, int], new_age_group_idx: Dict[str, int]) -> ndarray
Aggregates a contact matrix based on new demographic groupings.
- Parameters:
initial_matrix (np.ndarray) – The initial contact matrix (rates) between old demographic groups.
old_population (np.ndarray) – The population sizes of the old demographic groups.
new_population (np.ndarray) – The population sizes of the new aggregated demographic groups.
age_group_mapping (Dict[str, list]) – A dictionary mapping new demographic group names to lists of old group names.
old_age_groups_idx (Dict[str, int]) – A dictionary mapping old age group names to their indices in the contact matrix.
new_age_group_idx (Dict[str, int]) – A dictionary mapping new age group names to their indices in the aggregated matrix.
- Returns:
The aggregated contact matrix (rates) for the new demographic groups.
- Return type:
np.ndarray
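One common way to aggregate per-capita contact rates is to weight each old group by its population, sum the resulting contact totals within each new group, and divide by the new group's population. The sketch below illustrates that scheme in plain Python. Whether aggregate_matrix uses exactly this weighting is an assumption, as is the reading of entry (i, j) as the mean number of contacts a person in group i has with group j. The helper name `aggregate_rates` and the sample data are hypothetical, and the new population is computed inside the helper rather than passed in.

```python
from typing import Dict, List


def aggregate_rates(
    matrix: List[List[float]],
    old_population: List[float],
    age_group_mapping: Dict[str, List[str]],
    old_age_groups_idx: Dict[str, int],
    new_age_group_idx: Dict[str, int],
) -> List[List[float]]:
    """Population-weighted aggregation of per-capita contact rates."""
    n_new = len(new_age_group_idx)
    # Old index -> new index, derived from the name-based mapping.
    old_to_new = {
        old_age_groups_idx[old]: new_age_group_idx[new]
        for new, olds in age_group_mapping.items()
        for old in olds
    }
    new_population = [0.0] * n_new
    total_contacts = [[0.0] * n_new for _ in range(n_new)]
    for i, gi in old_to_new.items():
        new_population[gi] += old_population[i]
        for j, gj in old_to_new.items():
            # Total contacts made by old group i with old group j.
            total_contacts[gi][gj] += old_population[i] * matrix[i][j]
    # Convert totals back to per-capita rates in the new groups.
    return [
        [total_contacts[gi][gj] / new_population[gi] for gj in range(n_new)]
        for gi in range(n_new)
    ]


# Hypothetical three-group example merged into two groups.
matrix = [[1.0, 2.0, 3.0], [2.0, 4.0, 1.0], [3.0, 1.0, 5.0]]
old_population = [100.0, 300.0, 200.0]
age_group_mapping = {"ab": ["a", "b"], "c": ["c"]}
old_age_groups_idx = {"a": 0, "b": 1, "c": 2}
new_age_group_idx = {"ab": 0, "c": 1}

aggregated = aggregate_rates(
    matrix, old_population, age_group_mapping,
    old_age_groups_idx, new_age_group_idx,
)
print(aggregated)  # [[5.25, 1.5], [4.0, 5.0]]
```

Under this weighting the total number of contacts (population times rate, summed over all pairs) is the same before and after aggregation.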
- epydemix.population.population.get_available_locations(attribute: str = 'age', data_version: str = 'v1.2.0', level: int | None = None) -> DataFrame
Returns a list of available locations from the epydemix-data repository.
- Parameters:
attribute (str) – The demographic attribute layer. Defaults to “age”.
data_version (str) – The git tag/version of the epydemix-data repository. Defaults to “v1.2.0”.
level (Optional[int]) – If provided, filters the result to only rows where the level column equals this value. Geographic levels are: 0 = country, 1 = state/province, 2 = county. Silently ignored for data versions that do not include a level column. Defaults to None (no filter).
- Returns:
A DataFrame containing the list of available locations.
- Return type:
pd.DataFrame
- epydemix.population.population.get_primary_contacts_source(population_name: str, path_to_data: str, attribute: str = 'age') -> str | None
Retrieves the primary contact source for a given population name from the locations data.
- Parameters:
population_name (str) – The name of the population whose primary contact source is to be retrieved.
path_to_data (str) – The path to the directory containing the data.
attribute (str) – The demographic attribute layer. Defaults to “age”.
- Returns:
The primary contact source for the given population name. Returns None if the population name is not found.
- Return type:
Optional[str]
- Raises:
ValueError – If the population name is not found in the locations data.
- epydemix.population.population.load_epydemix_population(population_name: str, contacts_source: str | None = None, path_to_data: str | None = None, layers: List[str] = ['school', 'work', 'home', 'community'], age_group_mapping: Dict[str, List[str]] | None = None, supported_contacts_sources: Dict[str, List[str]] = {'age': ['prem_2017', 'prem_2021', 'mistry_2021', 'litvinova_2025'], 'race_ethnicity': ['litvinova_2025'], 'sex': ['litvinova_2025']}, data_version: str = 'v1.2.0', attribute: str = 'age') -> Population
Loads population and contact matrix data for a specified population.
- Parameters:
population_name (str) – The name of the population to load.
contacts_source (Optional[str]) – The source of contact matrices. If None, the default source is retrieved.
path_to_data (Optional[str]) – The local path to the data directory. If None, data is fetched from GitHub.
layers (List[str]) – The layers of contact matrices to load.
age_group_mapping (Optional[Dict[str, List[str]]]) – Mapping of age groups. If None, defaults based on contacts_source.
supported_contacts_sources (Dict[str, List[str]]) – Dict mapping attribute names to their supported contact sources.
data_version (str) – The git tag/version of the epydemix-data repository. Defaults to “v1.2.0”.
attribute (str) – The demographic attribute layer. Defaults to “age”.
- Returns:
An instance of the Population class with the loaded data.
- Return type:
Population
- Raises:
ValueError – If any provided value is not valid or if there are issues with the data files.
- epydemix.population.population.map_age_groups_to_idx(age_group_mapping: Dict[str, List[str]], old_age_groups_idx: Dict[str, int], new_age_group_idx: Dict[str, int]) -> Dict[int, int]
Maps old age groups to new age groups using index mappings.
- Parameters:
age_group_mapping (Dict[str, List[str]]) – A dictionary where keys are new age groups, and values are lists of old age groups.
old_age_groups_idx (Dict[str, int]) – A dictionary mapping old age group names to their respective indices.
new_age_group_idx (Dict[str, int]) – A dictionary mapping new age group names to their respective indices.
- Returns:
A dictionary mapping old age group indices to new age group indices.
- Return type:
Dict[int, int]
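The index mapping described above can be sketched in a few lines. This is an illustration of the documented input and output shapes, not the library's source. The helper name `map_indices` and the sample age bands are hypothetical.

```python
from typing import Dict, List


def map_indices(
    age_group_mapping: Dict[str, List[str]],
    old_age_groups_idx: Dict[str, int],
    new_age_group_idx: Dict[str, int],
) -> Dict[int, int]:
    """Translate a name-based mapping into an old-index -> new-index mapping."""
    return {
        old_age_groups_idx[old]: new_age_group_idx[new]
        for new, olds in age_group_mapping.items()
        for old in olds
    }


# Hypothetical groups: two old bands merge into "0-9", one maps to "10+".
age_group_mapping = {"0-9": ["0-4", "5-9"], "10+": ["10-14"]}
old_age_groups_idx = {"0-4": 0, "5-9": 1, "10-14": 2}
new_age_group_idx = {"0-9": 0, "10+": 1}

index_map = map_indices(age_group_mapping, old_age_groups_idx, new_age_group_idx)
print(index_map)  # {0: 0, 1: 0, 2: 1}
```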
- epydemix.population.population.validate_age_group_mapping(age_group_mapping: Dict[str, List[str]], allowed_values: List[str]) -> None
Validates that all age group mapping values are within the allowed values.
- Parameters:
age_group_mapping (Dict[str, List[str]]) – A dictionary where keys are age group names and values are lists of values for each age group.
allowed_values (List[str]) – A list of allowed values that the age group mapping values should be within.
- Raises:
ValueError – If any value in the age group mapping is not in the list of allowed values.
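A check of this kind might look like the following sketch. It assumes only the behaviour stated above (raise ValueError when a mapped value is not allowed). The helper name `check_mapping`, the error message, and the sample groups are hypothetical.

```python
from typing import Dict, List


def check_mapping(age_group_mapping: Dict[str, List[str]],
                  allowed_values: List[str]) -> None:
    """Raise ValueError if any mapped value is outside the allowed values."""
    allowed = set(allowed_values)
    for new_group, old_groups in age_group_mapping.items():
        unknown = [group for group in old_groups if group not in allowed]
        if unknown:
            raise ValueError(
                f"Group {new_group!r} contains values that are not allowed: {unknown}"
            )


allowed_values = ["0-4", "5-9", "10-14"]

# A valid mapping passes silently.
check_mapping({"0-9": ["0-4", "5-9"]}, allowed_values)

# "5-10" is not an allowed value, so this mapping is rejected.
try:
    check_mapping({"0-9": ["0-4", "5-10"]}, allowed_values)
    raised = False
    message = ""
except ValueError as err:
    raised = True
    message = str(err)

print(raised, message)
```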
- epydemix.population.population.validate_contacts_source(contacts_source: str, supported_contacts_sources: List[str]) -> None
Validates if a given contacts source is in the list of supported contact sources.
- Parameters:
contacts_source (str) – The contact source to validate.
supported_contacts_sources (List[str]) – A list of supported contact sources.
- Raises:
ValueError – If the contacts_source is not found in the list of supported sources.
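The sketch below shows how such a check could be combined with the per-attribute dictionary that load_epydemix_population takes as its supported_contacts_sources default. The dictionary contents are copied from that signature. The helper name `check_source` and the way the list is selected by attribute are assumptions for illustration, not the library's code.

```python
from typing import Dict, List

# Default from the load_epydemix_population signature.
SUPPORTED: Dict[str, List[str]] = {
    "age": ["prem_2017", "prem_2021", "mistry_2021", "litvinova_2025"],
    "race_ethnicity": ["litvinova_2025"],
    "sex": ["litvinova_2025"],
}


def check_source(contacts_source: str, supported: List[str]) -> None:
    """Raise ValueError if the source is not in the supported list."""
    if contacts_source not in supported:
        raise ValueError(
            f"Unsupported contacts source {contacts_source!r}; expected one of {supported}"
        )


# Assumed usage: pick the list for the attribute, then validate against it.
check_source("mistry_2021", SUPPORTED["age"])

try:
    check_source("mistry_2021", SUPPORTED["sex"])
    rejected = False
except ValueError:
    rejected = True

print(rejected)  # True
```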
- epydemix.population.population.validate_population_name(population_name: str, path_to_data: str, attribute: str = 'age') -> None
Validates if a given population name exists in the locations data.
Location names use underscores for spaces within a name and double underscores to separate geographic hierarchy levels, e.g. United_States__Alabama__Autauga_County.
- Parameters:
population_name (str) – The name of the population to validate.
path_to_data (str) – The path to the directory containing the data.
attribute (str) – The demographic attribute layer. Defaults to “age”.
- Raises:
ValueError – If the population_name is not found in the list of locations.
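The naming convention described for validate_population_name can be unpacked with ordinary string operations. The helper `split_location_name` below is hypothetical and only illustrates the convention. It is not part of the package.

```python
from typing import List


def split_location_name(population_name: str) -> List[str]:
    """Split a location name into hierarchy levels with readable spacing."""
    # Double underscores separate levels; single underscores stand for spaces.
    return [level.replace("_", " ") for level in population_name.split("__")]


parts = split_location_name("United_States__Alabama__Autauga_County")
print(parts)           # ['United States', 'Alabama', 'Autauga County']
print(len(parts) - 1)  # 2
```

For this example the depth `len(parts) - 1` is 2, which lines up with the geographic level numbering given for get_available_locations (0 = country, 1 = state/province, 2 = county). Treat that correspondence as an observation about this example, not a documented guarantee.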
Module contents
- class epydemix.population.Population(name: str = 'population')
Bases: object
Represents a population for epidemiological modeling, including demographic data and contact matrices.
The Population class manages and stores population data, including demographic distributions and contact matrices for various layers (e.g., school, work, home, community). It provides methods to add and retrieve this data for use in simulations and analysis.
- Attributes:
name (str): The name of the population.
Nk (List): List representing population data for different demographic groups.
Nk_names (List[str]): List of demographic group names.
contact_matrices (Dict[str, np.ndarray]): Dictionary mapping layer names to their corresponding contact matrices (aggregated by age groups).
- Example 1: Online import (data will be fetched from GitHub)
    from epydemix.population import load_epydemix_population

    population_online = load_epydemix_population(
        population_name="United_States",
        # Preferred contact data source (needed only to override the default primary source)
        contacts_source="mistry_2021",
        # Contact layers to load (by default all layers are imported)
        layers=["home", "work", "school", "community"],
    )
- Example 2: Offline import (data will be loaded from a local directory)
    # Ensure that the data folder is downloaded locally before running this
    population_offline = load_epydemix_population(
        population_name="United_States",
        # Path to the local data folder
        path_to_data="path/to/local/epydemix_data/",
        # Preferred contact data source (needed only to override the default primary source)
        contacts_source="mistry_2021",
        # Contact layers to load (by default all layers are imported)
        layers=["home", "work", "school", "community"],
    )
- add_contact_matrix(contact_matrix: ndarray, layer_name: str = 'all') -> None
Adds a contact matrix for a specified layer.
- Parameters:
contact_matrix (np.ndarray) – The contact matrix to be added, representing contact patterns between different demographic groups.
layer_name (str, optional) – The name of the contact layer (e.g., “home”, “work”). Defaults to “all”. Cannot be “overall” as it’s reserved.
- Raises:
ValueError – If contact_matrix is not a 2D square array or if layer_name is “overall”
- Returns:
None
- add_population(Nk: List[float], Nk_names: List[str] | None = None) -> None
Adds population data for different demographic groups.
- Parameters:
Nk (List[float]) – A list representing the population size for each demographic group.
Nk_names (Optional[List[str]], optional) – A list of demographic group names. If not provided, a default list of indices is generated. Defaults to None.
- Returns:
None
- property layers: List[str]
Available contact matrix layers.
- Returns:
Names of available contact layers (e.g., [‘home’, ‘work’, ‘school’])
- Return type:
List[str]
- property mean_contacts: Dict[str, float]
Mean number of contacts per person per layer.
- Returns:
Dictionary mapping layer names to mean contacts per person
- Return type:
Dict[str, float]
- property num_groups: int
Number of demographic groups.
- Returns:
Number of demographic groups in the population
- Return type:
int
- property total_contacts: Dict[str, float]
Total number of contacts per layer.
- Returns:
Dictionary mapping layer names to total contacts
- Return type:
Dict[str, float]
- property total_population: float
Total population across all demographic groups.
- Returns:
Sum of population in all demographic groups
- Return type:
float
- epydemix.population.get_available_locations(attribute: str = 'age', data_version: str = 'v1.2.0', level: int | None = None) -> DataFrame
Returns a list of available locations from the epydemix-data repository.
- Parameters:
attribute (str) – The demographic attribute layer. Defaults to “age”.
data_version (str) – The git tag/version of the epydemix-data repository. Defaults to “v1.2.0”.
level (Optional[int]) – If provided, filters the result to only rows where the level column equals this value. Geographic levels are: 0 = country, 1 = state/province, 2 = county. Silently ignored for data versions that do not include a level column. Defaults to None (no filter).
- Returns:
A DataFrame containing the list of available locations.
- Return type:
pd.DataFrame
- epydemix.population.load_epydemix_population(population_name: str, contacts_source: str | None = None, path_to_data: str | None = None, layers: List[str] = ['school', 'work', 'home', 'community'], age_group_mapping: Dict[str, List[str]] | None = None, supported_contacts_sources: Dict[str, List[str]] = {'age': ['prem_2017', 'prem_2021', 'mistry_2021', 'litvinova_2025'], 'race_ethnicity': ['litvinova_2025'], 'sex': ['litvinova_2025']}, data_version: str = 'v1.2.0', attribute: str = 'age') -> Population
Loads population and contact matrix data for a specified population.
- Parameters:
population_name (str) – The name of the population to load.
contacts_source (Optional[str]) – The source of contact matrices. If None, the default source is retrieved.
path_to_data (Optional[str]) – The local path to the data directory. If None, data is fetched from GitHub.
layers (List[str]) – The layers of contact matrices to load.
age_group_mapping (Optional[Dict[str, List[str]]]) – Mapping of age groups. If None, defaults based on contacts_source.
supported_contacts_sources (Dict[str, List[str]]) – Dict mapping attribute names to their supported contact sources.
data_version (str) – The git tag/version of the epydemix-data repository. Defaults to “v1.2.0”.
attribute (str) – The demographic attribute layer. Defaults to “age”.
- Returns:
An instance of the Population class with the loaded data.
- Return type:
Population
- Raises:
ValueError – If any provided value is not valid or if there are issues with the data files.