What is Metadata?
Metadata describes other data: it is structured reference data that records attributes of the information it describes, so that the information can be identified and sorted. Metadata is created or updated any time a document or file is modified, including when it is deleted. Basic metadata for a document file includes the author, date created, date modified, and file size. A user who needs to locate a specific document can search for it by any of those attributes. Metadata is not limited to document files; it is also used for images, relational databases, computer files, spreadsheets, video files, audio files, and web pages. Typical metadata elements include title, description, tags, categories, author, creation date/time, most recent modification date/time, and the users who have permission to access or update the data. The following are examples of typical metadata for different data types:
Image. Date/time, filename, camera settings, geolocation.
Relational database. Tables, columns, data types, constraints, table relationships.
Computer file. File name, type, size, creation date/time, last modification date/time.
Spreadsheet. Tab names, table names, column names, user comments.
Book. Title, author, publisher details, copyright details, description on back, table of contents, index, page numbers.
Email. Subject, from, to, date/time sent, sending and receiving server names and IPs, format, anti-spam software details.
Web page. Page title, page description, icons.
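To make the "Computer file" entry above concrete, the following minimal Python sketch reads a few of those attributes from the file system and then searches a collection of metadata records by one attribute. The function names read_file_metadata and find_by_attribute and the demo file report.txt are hypothetical, chosen only for illustration. Creation time is left out because the standard os.stat call does not report it consistently across operating systems.

```python
import os
import tempfile
from datetime import datetime, timezone


def read_file_metadata(path):
    """Return basic file-system metadata for a file as a plain dict."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        # File type is approximated by the extension (an assumption of this sketch).
        "type": os.path.splitext(path)[1].lstrip(".") or "unknown",
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


def find_by_attribute(records, key, value):
    """Return the metadata records whose `key` equals `value`."""
    return [r for r in records if r.get(key) == value]


# Demo on a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "report.txt")
    with open(demo_path, "w", encoding="utf-8") as fh:
        fh.write("hello")
    demo_record = read_file_metadata(demo_path)
    demo_matches = find_by_attribute([demo_record], "type", "txt")
```

The same pattern applies to the other data types in the list: only the source of the attributes changes (for example, a database catalog for tables and columns, or message headers for email).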
As the list shows, metadata holds a good deal of valuable information, and that information differs from one data type to the next. These differences make metadata management challenging: the various data types and their metadata accumulate across many systems and technologies, which creates many opportunities for data to be compromised. Keeping metadata consistent is the best way to ensure that it is accurate, and that the conclusions drawn from analyzing it are sound, regardless of the metadata’s structure or location. Consistency also helps keep sensitive data secure, because access to metadata and its management is defined and controlled. Beyond keeping your metadata secure and accessible, a metadata management plan has the following advantages:
Data quality. When metadata management is automated, data quality improves because the data is regulated and inconsistencies and errors can be identified and addressed in real time.1 Quality rules can be set for the metadata based on the data types and usage types.
Regulatory compliance. Regulations such as GDPR, CCPA, HIPAA, and BCBS carry significant weight in fields such as finance, retail, healthcare, and pharmaceuticals and life sciences. Consistent and accurate metadata makes it easier to meet the regulatory standards governing the storage, use, and transmission of sensitive data.
Speed and productivity. With automated metadata management, data scientists, analysts, and IT staff spend far less time identifying source data and resolving errors, which frees them for higher-value work.
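The idea of setting quality rules per data type, mentioned under "Data quality" above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule table QUALITY_RULES, its field names, and the function check_metadata are hypothetical and do not come from any particular metadata management product.

```python
from datetime import datetime

# Hypothetical rule set: required metadata fields and expected types, per data type.
QUALITY_RULES = {
    "image": {"filename": str, "taken_at": datetime},
    "file": {"name": str, "size_bytes": int, "modified": datetime},
}


def check_metadata(data_type, record):
    """Return a list of human-readable rule violations (empty if none)."""
    rules = QUALITY_RULES.get(data_type)
    if rules is None:
        return [f"no rules defined for data type '{data_type}'"]
    problems = []
    for field, expected in rules.items():
        if field not in record:
            problems.append(f"missing field '{field}'")
        elif not isinstance(record[field], expected):
            problems.append(f"field '{field}' should be {expected.__name__}")
    return problems


good = {"name": "report.txt", "size_bytes": 5, "modified": datetime(2020, 1, 1)}
bad = {"name": "report.txt", "size_bytes": "5"}  # wrong type, and no "modified"

good_problems = check_metadata("file", good)
bad_problems = check_metadata("file", bad)
```

Running a check like this whenever metadata is written is one way inconsistencies could be flagged as they occur rather than discovered later during analysis.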
1 Kandregula, 2020, “An Introduction to Metadata Management”