Generate valid Dataset JSON-LD structured data for data pages and research publications. Help Google Dataset Search discover your datasets with proper name, description, and distribution markup.
OneStepToRank monitors your structured data in production, alerts you when schema breaks, and tracks how your rich results change over time.
Dataset schema is structured data markup that tells search engines a page hosts or describes a dataset. Built on the Schema.org Dataset type, it provides machine-readable details about your data: the name, description, creator, license, file format, download URL, and geographic or temporal coverage. When Google reads this markup, it indexes your dataset in Google Dataset Search, a specialized search engine used by researchers, data scientists, journalists, and analysts to find publicly available data from across the web.
Without Dataset schema, your data page is essentially invisible to Google Dataset Search. Even if your dataset ranks in regular Google search, it will not appear in the dedicated dataset search experience that increasingly drives data discovery. Structured data is the only way to ensure your datasets are found by the people who need them most.
Data publishers compete for visibility in a growing ocean of publicly available datasets. Government agencies, universities, research labs, and companies publish millions of datasets, and discoverability is the key differentiator. Dataset schema gives your data a structured presence in Google Dataset Search, displaying your dataset name, creator, license, and description in a format that researchers trust and recognize.
Google Dataset Search uses Dataset structured data to power its search results and filters. Users can filter by license type, file format, update frequency, and geographic coverage, but only if your schema includes those fields. Pages with complete Dataset markup can match more of these filters and give users the confidence to download and use your data. Including a clear license is especially important, as researchers need to know whether they can legally use, modify, and redistribute the data before investing time in analysis.
Copy the generated JSON-LD script tag from this tool and paste it into the <head> section of your dataset page, or place it before the closing </body> tag. JSON-LD is Google's preferred format for structured data because it is decoupled from the visual content, making it easy to add and maintain without changing your page layout.
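As a rough illustration of the embedding step above, here is a minimal Python sketch that wraps a Dataset object in a JSON-LD script tag. The helper name `to_script_tag`, the example dataset, and the example.org URL are assumptions made for illustration; they are not output of this tool.

```python
import json


def to_script_tag(dataset: dict) -> str:
    """Serialize a Dataset dict as a JSON-LD <script> element (hypothetical helper)."""
    payload = json.dumps(dataset, indent=2, ensure_ascii=False)
    # Escape "</" so a string value cannot close the script element early.
    # "\/" is a valid JSON escape for "/", so parsers still read the same value.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Illustrative values only; replace with your own dataset metadata.
example = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example city air quality readings",
    "description": (
        "Hourly PM2.5 and ozone readings from fixed monitoring stations, "
        "published as CSV. Illustrative record for this sketch."
    ),
    "url": "https://example.org/datasets/air-quality",
}

tag = to_script_tag(example)
print(tag)
```

The printed element can be pasted into the page's head or placed before the closing body tag, as described above.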
For data portals hosting many datasets, generate JSON-LD dynamically from your metadata database. Each dataset page should have its own unique schema with accurate name, description, and distribution details. If your datasets are part of a larger catalog (like data.gov or a university repository), include the includedInDataCatalog property to establish that relationship.
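For a portal that generates markup from stored metadata, the approach might look something like the Python sketch below. The record field names (`title`, `summary`, `page_url`, `files`, `catalog_name`) are hypothetical stand-ins for whatever your metadata database exposes; the Schema.org property names (`distribution`, `DataDownload`, `encodingFormat`, `contentUrl`, `includedInDataCatalog`, `DataCatalog`) are the real ones.

```python
import json


def build_dataset_jsonld(record: dict) -> dict:
    """Map one metadata record (hypothetical field names) to a Schema.org Dataset."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": record["title"],
        "description": record["summary"],
        "url": record["page_url"],
        # One DataDownload entry per downloadable file.
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": f["format"],
                "contentUrl": f["download_url"],
            }
            for f in record.get("files", [])
        ],
    }
    # Only declare a catalog relationship when the record belongs to one.
    catalog = record.get("catalog_name")
    if catalog:
        doc["includedInDataCatalog"] = {"@type": "DataCatalog", "name": catalog}
    return doc


# Illustrative record; a real portal would load this from its database.
sample_record = {
    "title": "Example regional rainfall totals",
    "summary": (
        "Monthly rainfall totals by region, compiled from public gauge "
        "readings. Illustrative record for this sketch."
    ),
    "page_url": "https://example.org/datasets/rainfall",
    "files": [
        {"format": "CSV", "download_url": "https://example.org/files/rainfall.csv"}
    ],
    "catalog_name": "Example Open Data Portal",
}

doc = build_dataset_jsonld(sample_record)
print(json.dumps(doc, indent=2))
```

Running the builder once per dataset page keeps each page's schema unique, since every value comes from that dataset's own record.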
After deploying, validate your live page with the Rich Results Test and check Google Dataset Search directly to confirm your dataset appears. Use this generator alongside our Local Rank Checker and other free SEO tools to build a comprehensive structured data strategy.
Dataset schema markup is structured data you add to web pages that host or describe datasets. It uses the Schema.org Dataset type encoded in JSON-LD format, providing machine-readable details like the dataset name, description, creator, license, file format, and download URL. This enables your dataset to appear in Google Dataset Search, making it discoverable by researchers, data scientists, and analysts worldwide.
Google Dataset Search is a specialized search engine that indexes datasets from across the web. It relies heavily on Schema.org Dataset markup to discover and understand datasets. Pages with proper Dataset structured data are eligible to appear in Dataset Search results with rich metadata including the creator, license, format, and coverage. Without this markup, your dataset is essentially invisible to this important discovery channel.
Google requires at minimum a name and a description for Dataset schema. The description must be between 50 and 5000 characters long and should clearly explain what data the dataset contains, how it was collected, and what it can be used for. Strongly recommended fields include creator, license, datePublished, distribution (with download URL and file format), and keywords for the best visibility in Dataset Search.
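The requirements above can be checked before publishing with a small script. The following Python sketch is one possible pre-flight check, not Google's own validator: the function name `check_dataset` and the split into errors and warnings are assumptions, and the field lists simply mirror the ones named in the paragraph above.

```python
REQUIRED = ("name", "description")
RECOMMENDED = ("creator", "license", "datePublished", "distribution", "keywords")


def check_dataset(doc: dict) -> dict:
    """Return errors for required-field problems and warnings for missing recommended fields."""
    errors = [f"missing required property: {key}" for key in REQUIRED if not doc.get(key)]

    # The description length rule only applies when a description is present.
    description = doc.get("description") or ""
    if description and not 50 <= len(description) <= 5000:
        errors.append(
            f"description is {len(description)} characters; expected 50 to 5000"
        )

    warnings = [
        f"missing recommended property: {key}" for key in RECOMMENDED if not doc.get(key)
    ]
    return {"errors": errors, "warnings": warnings}


report = check_dataset({"name": "Example dataset", "description": "Too short."})
print(report["errors"])
print(report["warnings"])
```

A check like this complements, rather than replaces, validating the live page with the Rich Results Test.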
The license property should contain a URL pointing to the full text of the license under which your dataset is distributed. Common options include Creative Commons licenses such as CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/) and CC0 (https://creativecommons.org/publicdomain/zero/1.0/), as well as Open Data Commons licenses. Google Dataset Search displays the license prominently, so a well-known open license makes your dataset more accessible and attractive to potential users.
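To show how the license value ends up in the markup, here is a short Python sketch that attaches a license URL to a Dataset object. The `LICENSE_URLS` shorthand table and the `with_license` helper are hypothetical conveniences; the two URLs are the Creative Commons addresses quoted above.

```python
from urllib.parse import urlparse

# Hypothetical shorthand table; the URLs are the ones cited in the text above.
LICENSE_URLS = {
    "CC BY 4.0": "https://creativecommons.org/licenses/by/4.0/",
    "CC0 1.0": "https://creativecommons.org/publicdomain/zero/1.0/",
}


def with_license(doc: dict, license_ref: str) -> dict:
    """Return a copy of doc with `license` set to an absolute http(s) URL."""
    url = LICENSE_URLS.get(license_ref, license_ref)
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"license must be a URL or a known shorthand, got {license_ref!r}")
    return {**doc, "license": url}


licensed = with_license({"@type": "Dataset", "name": "Example dataset"}, "CC BY 4.0")
print(licensed["license"])
```

Rejecting bare names such as "MIT" up front keeps the property a resolvable URL, which is what the paragraph above calls for.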