2017

An Online Analytical Processing Multi-Dimensional Data Warehouse for Malaria Data

S M Niaz Arifin et al. Database (Oxford).

Abstract

https://dw.vecnet.org/datawarehouse/.

Figures

Figure 1.
Data taxonomy and ontology. (A) The taxonomy organizes VecNet-DW data into three broad categories: (i) historical data, which may range over several decades, are collected, modelled and stored from OSSs and the literature; (ii) predictive (or synthetic) data, which are mostly generated as outputs of different types of malaria models; and (iii) static data, which are mostly non-numeric (textual), also collected and modelled from the OSSs and the literature, and stored in the lookup tables. Both the historical and predictive categories may encompass aggregated and non-aggregated forms, while the static category may only encompass the non-aggregated form. (B) An ontology for VecNet-DW. Entities and their relationships are represented by rectangles and labelled arrows, respectively. Entities within the same ontology level are marked with the same colours. The root level (Level 1) has a single entity as the DW, VecNet-DW, which has multiple data marts and lookup tables as Level 2 entities. Each data mart can be modelled as one or more data cubes (Level 3). Each data cube usually has one or more fact tables (Level 4), and multiple dimension tables (Level 4), all of which store records (facts and dimensions, Level 5) of varying granularities. Each lookup table can be either a relational table or a dictionary (Level 3). A relational table may contain records that are mostly dimension-less. A dictionary may in turn be a relational table, or may contain semantic definitions (Level 4), which are stored as records.
Figure 2.
Constellation schema for historical data. The constellation schema, also known as the galaxy schema, is a collection of simple star schemas. It ties together all the fact tables and dimension tables contained within all data marts representing historical data. The connections link fact tables to corresponding dimension tables. The fact tables are (partially) shaded in light blue. Dimensions are shaded in brown.
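As a minimal sketch of the constellation idea in this caption, the snippet below builds two fact tables that share the same 'location' and 'date' dimension tables, using Python's built-in sqlite3 module. All table names, column names, and values are hypothetical and chosen for illustration only; they are not the actual VecNet-DW schema.

```python
import sqlite3

# Hypothetical schema and invented values; not the real VecNet-DW tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
-- Two fact tables (two stars) linked to the same dimension tables.
CREATE TABLE fact_weather (
    location_id INTEGER REFERENCES dim_location(location_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    mean_temperature REAL
);
CREATE TABLE fact_demographics (
    location_id INTEGER REFERENCES dim_location(location_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    population INTEGER
);
INSERT INTO dim_location VALUES (1, 'Kenya'), (2, 'Nigeria');
INSERT INTO dim_date VALUES (1, 2001), (2, 2002);
INSERT INTO fact_weather VALUES (1, 1, 24.0), (1, 2, 25.0), (2, 1, 27.0);
INSERT INTO fact_demographics VALUES (1, 1, 1000), (2, 1, 2000);
""")


def facts_by_country(fact_table, measure, country):
    """Join one fact table to the shared dimensions and filter by country."""
    # Only the two known fact tables are allowed, since names are interpolated.
    allowed = {"fact_weather": "mean_temperature", "fact_demographics": "population"}
    if allowed.get(fact_table) != measure:
        raise ValueError("unknown fact table or measure")
    query = f"""
        SELECT d.year, f.{measure}
        FROM {fact_table} AS f
        JOIN dim_location AS l ON l.location_id = f.location_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        WHERE l.country = ?
        ORDER BY d.year
    """
    return conn.execute(query, (country,)).fetchall()
```

Because both fact tables reference the same dimension rows, the same 'Kenya' filter works unchanged for either data mart.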
Figure 3.
Snowflake schema for predictive data. The snowflake schema for predictive data consists of the dimension tables and one single fact table, Simulations (shaded in light blue). It contains time series data produced as direct outputs of the stored models (EMOD and OpenMalaria), and is connected to several dimensions (shaded in light brown). The ‘run’ dimension is snowflaked: a run represents a specific simulation run of a particular model, with the current version stored in ‘model version’. Its spatio-temporal information is stored via the conformed dimensions ‘location’ and ‘date’ (shaded in green), respectively. Each simulation run is also associated with a ‘template’, which describes all simulation parameters, along with their respective values. The output of a simulation run is stored in multiple ‘channels’. A user submits a simulation job as an ‘experiment’. Each experiment is transformed into runs, which, in turn, are broken into ‘executions’. An execution thus represents a single realization of the run configuration. To allow replications of a simulation run, the ‘execution’ dimension is coupled with the ‘replication’ dimension. The symbol n represents the many side of a one-to-many relationship.
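The snowflaking of the 'run' dimension can be sketched as follows: the fact table points to a run, and the run in turn points to a separate model version table, so a query by model needs two joins. This is only an illustration with Python's sqlite3 module; the table names, column names, version strings, and values are assumptions, not the actual VecNet-DW definitions.

```python
import sqlite3

# Hypothetical schema and invented values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_model_version (
    model_version_id INTEGER PRIMARY KEY, model TEXT, version TEXT
);
-- The run dimension is normalized: it references model_version
-- instead of repeating the model name and version in every run row.
CREATE TABLE dim_run (
    run_id INTEGER PRIMARY KEY,
    name TEXT,
    model_version_id INTEGER REFERENCES dim_model_version(model_version_id)
);
CREATE TABLE dim_channel (channel_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE fact_simulations (
    run_id INTEGER REFERENCES dim_run(run_id),
    channel_id INTEGER REFERENCES dim_channel(channel_id),
    timestep INTEGER,
    value REAL
);
INSERT INTO dim_model_version VALUES (1, 'EMOD', 'v-a'), (2, 'OpenMalaria', 'v-b');
INSERT INTO dim_run VALUES (10, 'run A', 1), (20, 'run B', 2);
INSERT INTO dim_channel VALUES (1, 'Daily EIR'), (2, 'Parasite Prevalence');
INSERT INTO fact_simulations VALUES
    (10, 1, 0, 0.5), (10, 1, 1, 0.75), (10, 2, 0, 0.25), (20, 1, 0, 1.5);
""")


def channel_series(model, channel_title):
    """Return (run name, timestep, value) rows for one model and one channel."""
    query = """
        SELECT r.name, f.timestep, f.value
        FROM fact_simulations AS f
        JOIN dim_run AS r ON r.run_id = f.run_id
        JOIN dim_model_version AS mv ON mv.model_version_id = r.model_version_id
        JOIN dim_channel AS c ON c.channel_id = f.channel_id
        WHERE mv.model = ? AND c.title = ?
        ORDER BY r.name, f.timestep
    """
    return conn.execute(query, (model, channel_title)).fetchall()
```

In a plain star schema the model name would sit directly in the run table; the extra join through the model version table is what makes this dimension snowflaked.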
Figure 4.
VecNet-DW components. The four separate and distinct components of the DW environment: operational source systems, data staging area, data presentation area and data access tools. Each component serves specific functions, as described in Methods. The modelling phases and/or implementation technologies used are listed at the bottom in the blue-shaded boxes for all components.
Figure 5.
Illustrative example of roll up and roll down. These operations allow the user to navigate among levels of data ranging from the most summarized (up) level to the most detailed (down) level, along a specific dimension. (A) Illustrative data showing mosquito abundances for various locations (continents and countries), years, and Anopheles species. (B) The data cube, derived from the data in A, shows the mosquito abundance facts (numbers in rounded rectangles). The cube is associated with three dimensions: species, year, and continent, which are displayed along the three axes, with data labels coming from A. Fact cells with different values of the continent dimension are distinctly colour-coded for ease of visualization. (C) The roll down operation produces a more detailed view of data by rolling down one level along the hierarchical location (from continent to country). The mosquito abundances of continent Africa are rolled down to abundances for countries of Africa (Angola, Benin, Kenya, and Nigeria in this scenario). Note that when rolled up, the entire cube in C represents one row of data in B (in this case, the topmost row, representing Africa).
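The roll up and roll down operations described in this caption can be sketched in plain Python as grouping the same records at two levels of the location hierarchy. The records and abundance numbers below are invented for illustration, and the function names are hypothetical; this is not how VecNet-DW implements the operations.

```python
from collections import defaultdict

# Invented records: (continent, country, year, species, abundance).
FIELDS = ("continent", "country", "year", "species", "abundance")
RECORDS = [
    ("Africa", "Kenya", 2000, "An. gambiae", 10),
    ("Africa", "Kenya", 2001, "An. gambiae", 20),
    ("Africa", "Nigeria", 2000, "An. gambiae", 5),
    ("Africa", "Nigeria", 2000, "An. funestus", 7),
    ("Asia", "India", 2000, "An. stephensi", 3),
]


def aggregate(records, group_by):
    """Sum abundance over every combination of the group_by fields."""
    positions = [FIELDS.index(name) for name in group_by]
    totals = defaultdict(int)
    for record in records:
        key = tuple(record[i] for i in positions)
        totals[key] += record[-1]
    return dict(totals)


def roll_up(records):
    """Summarized view: location is aggregated to the continent level."""
    return aggregate(records, ("continent", "year", "species"))


def roll_down(records, continent):
    """Detailed view: one continent broken out into its countries."""
    selected = [r for r in records if r[0] == continent]
    return aggregate(selected, ("country", "year", "species"))
```

Summing all cells of `roll_down(RECORDS, "Africa")` gives the same total as the Africa cells of `roll_up(RECORDS)`, which mirrors the note that the detailed cube collapses into one row of the summarized cube.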
Figure 6.
Illustrative example of slice and dice. These operations permit users to access a DW through any of its dimensions. (A) Illustrative data showing mosquito abundances for various dates, times, and sites (mosquito collection sites). In figures (B–D), fact cells with different values of the time dimension are distinctly colour-coded for ease of visualization. (B) The data cube, derived from the data in (A). (C) The slice operation selects a rectangular subset of the cube by choosing a single value for one of its dimensions, creating a new cube with one fewer dimension. The mosquito abundances of all sites, for all dates, at 7 PM are sliced out of the data cube. (D) The dice operation produces a sub-cube by selecting specific values of multiple dimensions. The abundances of sites Ifakara and Garki, for all dates, at 8 PM and 10 PM are diced out of the data cube.
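A small sketch of slice and dice on a cube stored as a Python dictionary keyed by (date, time, site) follows. The dates, the abundance values, and the third site name are made up for illustration, and the helper names are hypothetical.

```python
# Invented cube cells keyed by (date, time, site); values are abundances.
DIMENSIONS = ("date", "time", "site")
CUBE = {
    ("2001-01-01", "7 PM", "Ifakara"): 4,
    ("2001-01-01", "8 PM", "Ifakara"): 6,
    ("2001-01-01", "8 PM", "Garki"): 2,
    ("2001-01-02", "10 PM", "Garki"): 9,
    ("2001-01-02", "7 PM", "Site C"): 1,
    ("2001-01-02", "10 PM", "Site C"): 8,
}


def slice_cube(cube, dimension, value):
    """Fix one dimension to a single value; the result has one fewer dimension."""
    axis = DIMENSIONS.index(dimension)
    return {
        key[:axis] + key[axis + 1:]: cell
        for key, cell in cube.items()
        if key[axis] == value
    }


def dice_cube(cube, **allowed):
    """Keep cells whose coordinates lie in the allowed value sets.

    The result is a sub-cube with the same dimensions as the input.
    """
    def keep(key):
        return all(
            key[DIMENSIONS.index(dimension)] in values
            for dimension, values in allowed.items()
        )
    return {key: cell for key, cell in cube.items() if keep(key)}
```

For example, `slice_cube(CUBE, "time", "7 PM")` returns cells keyed only by (date, site), while `dice_cube(CUBE, site={"Ifakara", "Garki"}, time={"8 PM", "10 PM"})` keeps full three-part keys.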
Figure 7.
Screenshot of VecNet-DW homepage. The homepage provides links (on the left navigation bar) to the Dimensional Data browser and the Lookup Tables browser. The Dimensional Data browser allows users to access all data marts, which are composed of relevant facts and dimensions. The Lookup Tables browser allows users access to all lookup tables, which serve as auxiliary tables to hold static data.
Figure 8.
Screenshot of a faceted search with aggregated data on the Weather data cube. In this example, mean temperature and total precipitation data are aggregated over time by year, using the hierarchical ‘date’ dimension. The location Kenya → Nyanza → Kisumu is selected from the hierarchical dimension ‘location’, using the ‘Data Slicer’ panel. Various properties of the generated graphs (type, title, legend, axes, series etc.) can also be modified. The tooltip, as shown in the figure, appears when hovering over a data point in a data series, showing the value (24.24 in this example) of the data point and the name of the data series.
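The aggregation in this screenshot applies a different function to each measure: a mean for temperature and a sum for precipitation, both grouped by the year level of the 'date' dimension. A minimal sketch in plain Python is shown below; the daily values and the function name are invented for illustration and do not come from the Weather data cube.

```python
from collections import defaultdict
from statistics import mean

# Invented daily records: (ISO date, temperature, precipitation).
DAILY = [
    ("2001-01-01", 24.0, 0.0),
    ("2001-01-02", 25.0, 3.5),
    ("2002-01-01", 23.0, 1.0),
    ("2002-01-02", 24.0, 2.0),
]


def aggregate_by_year(daily):
    """Group daily records by year; average temperature, total precipitation."""
    by_year = defaultdict(list)
    for date, temperature, precipitation in daily:
        year = date[:4]  # the year level of the date hierarchy
        by_year[year].append((temperature, precipitation))
    return {
        year: {
            "mean_temperature": mean(t for t, _ in rows),
            "total_precipitation": sum(p for _, p in rows),
        }
        for year, rows in sorted(by_year.items())
    }
```

Choosing the aggregate per measure matters: summing temperatures or averaging precipitation over a year would give values that are hard to interpret.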
Figure 9.
Screenshot of graphs generated from non-aggregated data. This example shows the resulting graphs of a faceted search with non-aggregated data on the Weather data cube. Temperature and precipitation data are displayed over time by day, using the hierarchical ‘date’ dimension. The location Kenya → Nyanza → Kisumu is selected from the hierarchical dimension ‘location’, using the ‘Data Slicer’ panel.
Figure 10.
Use case example. A researcher wishes to investigate the effectiveness of a proposed new mosquito control intervention in a particular malaria endemic location. VecNet’s stored model EMOD (available via the Transmission Simulator, 24) will be used to model the current transmission baseline and to estimate the impact of various levels of coverage of the proposed new intervention. As part of the simulation setup, a population profile must be selected, specifying the distribution of the population over 5-year age intervals. In the Dimensional Data browser, the data cube Demographics is selected, along with two dimensions ‘location’ and ‘date’, and all age measures (i.e. years 0–4, 5–9 through 80plus). (A) Two data slices are specified, Kenya as the ‘location’ and 2001–20 as the ‘date’. (B) The resulting age distribution graph of this query. (C) The resulting table of this query, which is then used to either build a custom profile or select from an option provided by the EMOD web interface.
Figure 11.
Screenshot of the lookup tables browser example. The screenshot depicts a partial view of the Species Bionomics lookup table, which stores bionomics parameter values for different mosquito species. Users can select or deselect all or any number of parameters (columns), and sort the tabular view by any parameter. The selected data can be copied as plain text, or downloaded in different formats (CSV, PDF etc.).
Figure 12.
Screenshot of the ‘Results Viewer’. The ‘Results Viewer’ interface is available for public viewing through the Transmission Simulator (24). In this example, output channels ‘Daily EIR’ and ‘Parasite Prevalence’ are selected to be displayed from a ‘Rift Valley’ run from the ‘Risk Mapper—Kenya’ experiment. The resulting graphs are displayed on the right. For each graph, the user can zoom in using the slider placed under the x-axis, or by dragging out a rectangle in the graph. In this example, the ‘Daily EIR’ graph is zoomed in to display a monthly view of the data series (from July 2002 to January 2005).

References

    1. World Health Organization (WHO). http://www.who.int/en/ (August 2017, date last accessed).
    2. VecNet Data Warehouse Browser. https://ci.vecnet.org/datawarehouse/ (August 2017, date last accessed).
    3. Vector-Borne Disease Network (VecNet). http://www.vecnet.org/ (August 2017, date last accessed).
    4. World Malaria Report (WMR). http://www.who.int/malaria/world_malaria_report_2010/en/ (August 2017, date last accessed).
    5. Malaria Atlas Project. http://www.map.ox.ac.uk/ (August 2017, date last accessed).
