Introduction to the Log_File_Schemas Table

The log_file_schemas table is designed to store and manage schema information related to log files for different companies. This table plays a crucial role in the system by maintaining structured representations of log file formats, enabling efficient parsing and analysis of log data.

Table Structure

The log_file_schemas table is structured to support versioning and company-specific schema management. Here’s a detailed breakdown of its columns:
Column Name       | Data Type   | Constraints                          | Description
------------------|-------------|--------------------------------------|----------------------------------------------------
id                | uuid        | primary key (composite)              | Unique identifier for the schema
revision          | integer     | primary key (composite), not null    | Version number of the schema
company_id        | uuid        | foreign key, not null                | Reference to the company owning the schema
sample_file_data  | jsonb       |                                      | JSON representation of the schema structure
sample_file_name  | text        | not null                             | Name of the sample file used to generate the schema
sample_file_id    | uuid        |                                      | Optional reference to the stored sample file
created_at        | timestamptz | not null, default current_timestamp  | Timestamp of schema creation
source_dataset_id | uuid        | foreign key                          | Reference to the source dataset used for the schema
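
For orientation, the table definition can be sketched in DDL as follows. This is a minimal reconstruction from the columns above, not the production definition; in particular, the companies table referenced by company_id is an assumed name (the datasets table is described later on this page):

    CREATE TABLE log_file_schemas (
      id                uuid        NOT NULL,
      revision          integer     NOT NULL,
      company_id        uuid        NOT NULL REFERENCES companies (id),  -- assumed target table
      sample_file_data  jsonb,
      sample_file_name  text        NOT NULL,
      sample_file_id    uuid,
      created_at        timestamptz NOT NULL DEFAULT current_timestamp,
      source_dataset_id uuid        REFERENCES datasets (id),
      PRIMARY KEY (id, revision)  -- composite key supports multiple revisions per schema
    );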

Usage and Functionality

The log_file_schemas table is designed to be flexible and support evolving schema definitions. Here are some key points about its usage:
  1. Versioning System: The table implements a revision system using a composite primary key (id, revision), allowing multiple versions of a schema to be stored and tracked over time (see the query sketch after this list).
  2. Company-Specific Schemas: Each schema is associated with a specific company through the company_id foreign key. This ensures that schemas are isolated between different companies using the system.
  3. JSON Schema Storage: The sample_file_data column uses the jsonb data type, allowing for flexible and efficient storage of schema structures in JSON format.
  4. Dataset Linkage: The source_dataset_id column links each schema to its source dataset, enabling traceability and data lineage.
  5. Access Control: Row-level security policies restrict access based on user roles and company affiliation (a policy sketch follows this list):
    • Only users with the ADMIN role can insert, update, or delete schemas.
    • All authenticated users from the same company can view (select) the schemas.
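
As a sketch of how the revision system is used in practice, the queries below read the latest version of a schema and append a new revision; the parameter placeholders ($1 through $4) stand in for application-supplied values:

    -- Fetch the newest revision of a given schema.
    SELECT *
    FROM log_file_schemas
    WHERE id = $1
    ORDER BY revision DESC
    LIMIT 1;

    -- Record a change by appending a new revision rather than updating in place.
    INSERT INTO log_file_schemas (id, revision, company_id, sample_file_data, sample_file_name)
    VALUES (
      $1,
      (SELECT COALESCE(MAX(revision), 0) + 1 FROM log_file_schemas WHERE id = $1),
      $2,
      $3,
      $4
    );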
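
The access rules above could be expressed as row-level security policies along these lines. This is a sketch only: current_company_id() and current_user_role() are hypothetical helper functions standing in for however the application resolves the caller's identity, and the deployed policies may differ:

    ALTER TABLE log_file_schemas ENABLE ROW LEVEL SECURITY;

    -- Any authenticated user may read schemas belonging to their own company.
    CREATE POLICY schemas_select ON log_file_schemas
      FOR SELECT
      USING (company_id = current_company_id());  -- hypothetical helper

    -- Only ADMIN users may write, and only within their own company.
    CREATE POLICY schemas_insert ON log_file_schemas
      FOR INSERT
      WITH CHECK (current_user_role() = 'ADMIN' AND company_id = current_company_id());

    CREATE POLICY schemas_update ON log_file_schemas
      FOR UPDATE
      USING (current_user_role() = 'ADMIN' AND company_id = current_company_id());

    CREATE POLICY schemas_delete ON log_file_schemas
      FOR DELETE
      USING (current_user_role() = 'ADMIN' AND company_id = current_company_id());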

Notes

  • The table structure has evolved over time, with columns being renamed and added to better suit the application’s needs.
  • A uniqueness rule is enforced through a trigger function rather than a plain unique constraint, because revisions of the same schema (sharing an id) may legitimately point at the same source_dataset_id, while two distinct schemas may not:
    CREATE OR REPLACE FUNCTION check_unique_source_dataset_id()
    RETURNS TRIGGER AS $$
    BEGIN
      -- Reject the row if a schema with a different id already claims this
      -- source_dataset_id. Other revisions of the same schema may share it,
      -- and rows with a NULL source_dataset_id always pass the check.
      IF EXISTS (
        SELECT 1 FROM log_file_schemas
        WHERE source_dataset_id = NEW.source_dataset_id
          AND id != NEW.id
      ) THEN
        RAISE EXCEPTION 'Duplicate source_dataset_id found';
      END IF;
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER enforce_unique_source_dataset_id
    BEFORE INSERT OR UPDATE ON log_file_schemas
    FOR EACH ROW EXECUTE FUNCTION check_unique_source_dataset_id();
    
  • The table is designed to work closely with the datasets table, allowing for a clear relationship between raw data and its corresponding schema (see the join sketch below).
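
For example, a lookup that resolves each dataset to the latest revision of its schema might look like the following; the id column on datasets is an assumption for illustration:

    -- Pair each dataset with the newest revision of its schema.
    -- DISTINCT ON keeps the first row per schema id after sorting by revision.
    SELECT DISTINCT ON (s.id)
           d.id AS dataset_id,
           s.id AS schema_id,
           s.revision,
           s.sample_file_data
    FROM datasets d
    JOIN log_file_schemas s ON s.source_dataset_id = d.id
    ORDER BY s.id, s.revision DESC;
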
By leveraging the log_file_schemas table, the Serial application can efficiently manage and version log file schemas for multiple companies, enabling robust log parsing and analysis capabilities while maintaining data isolation and access control.