Extract, Transform, Load (ETL)

The Extract, Transform, Load (ETL) is a process in the database usage that combines three database functions that transfer data from one database to another. The first stage, Extract, involves reading and extracting data from various source systems. The second stage, Transform, converts the data from its original format into the format that meets the requirements of the target database. The last stage, Load, saves the new data into the target database, thus finishing the process of transferring the data.

In BigFix Inventory, the Extract stage involves extracting data from the BigFix server. The data includes information about the infrastructure, installed agents, and detected software. ETL also checks whether a new software catalog is available, gathers information about the software scan and files that are present on the endpoints, and collects data from VM managers.

The extracted data is then transformed to a single format that can be loaded to the BigFix Inventory database. This stage also involves matching scan data with the software catalog, calculating processor value units (PVUs), processing the capacity scan, and converting information that is contained in the XML files. After the data is extracted and transformed, it is loaded into the database and can be used by BigFix Inventory.The hardest load on the BigFix Inventory server occurs during ETL when the following actions are performed:
  • A large number of small files is retrieved from the BigFix server (Extract).
  • Many small and medium files that contain information about installed software packages and process usage data are parsed (Transform).
  • The database is populated with the parsed data (Load).
At the same time, BigFix Inventory prunes large volumes of old data that exceeds its data retention period.

Performance of the ETL process depends on the number of scan files, usage analyses, and package analyses that are processed during a single import. The main bottleneck is storage performance because many small files must be read, processed, and written to the BigFix Inventory database in a short time. By properly scheduling scans and distributing them over the computers in your infrastructure, you can reduce the length of the ETL process and improve its performance.

An important factor that influences the duration of the ETL process is the amount of updates on the file system since the last scan. Such operations as security updates or significant system upgrades can cause ETL to run longer, because it has to process information about all modified files. For example, regular updates released by Microsoft on Tuesdays would significantly lengthen the Wednesday import in environments with many Windows platforms.