Google Expands Indexing to Include CSV Files

Aug 26, 2023

Google has quietly updated its Search Central documentation to state that it can now index .csv files. This gives websites another type of content that can be crawled and surfaced in search. Site owners who would rather keep their .csv files out of Google should consider adjusting their robots.txt to block crawling of them.
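As a rough sketch of what such a robots.txt adjustment could look like: Google's robots.txt parser supports the `*` wildcard and the `$` end-of-URL anchor, so a single Disallow rule can target every path ending in .csv. The Python below holds an example rule and a simplified checker for trying it against sample paths. The helper names `pattern_to_regex` and `is_blocked` are hypothetical, and the checker ignores user-agent groups and Allow rules, so treat it as an illustration rather than a full robots.txt parser.

```python
import re

# Example rule a site owner might add to robots.txt (an assumption, adapt to your site).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /*.csv$
"""

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern using * and $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Each * matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if any Disallow rule in robots_txt matches the path.

    Simplified: user-agent grouping and Allow rules are not handled.
    """
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "disallow" and value.strip():
            if pattern_to_regex(value.strip()).match(path):
                return True
    return False

if __name__ == "__main__":
    for path in ["/data/export.csv", "/data/export.csv?x=1", "/index.html"]:
        print(path, "blocked" if is_blocked(path) else "allowed")
```

Because of the `$` anchor, the rule only matches URLs that end in .csv; a URL with a query string after the extension would need a broader pattern such as `/*.csv`.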

Understanding CSVs

CSV (comma-separated values) files are plain text documents that store tabular data, much like a spreadsheet. They contain only text: no styling details such as fonts, and no media or interactive elements such as pictures or clickable links.

These files are often used to move data in and out of spreadsheet tools, or to list URLs for tools like Screaming Frog to crawl.
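To make the format concrete, here is a minimal sketch of reading a small CSV of URLs with Python's standard `csv` module. The sample data, the `url` and `title` column names, and the `read_urls` helper are all made up for illustration.

```python
import csv
import io

# A tiny CSV: the first line is the header row, each later line is one record.
SAMPLE = (
    "url,title\n"
    "https://example.com/,Home\n"
    "https://example.com/about,About\n"
)

def read_urls(csv_text: str) -> list[str]:
    """Return the values of the 'url' column from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["url"] for row in reader]

if __name__ == "__main__":
    for url in read_urls(SAMPLE):
        print(url)
```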

Breaking Down Google's Newly Acquired Capability

Historically, running a "filetype" search on Google for .csv files returned no actual .csv files in the results. Example searches include:

- filetype:csv site:.gov

- filetype:csv site:.edu

- filetype:csv site:.com

While Google's move into indexing .csv files is intriguing, it's worth noting that the tech giant has, in a way, been using these files already. Google's Dataset Search has been including .csv files, with the stipulation that they are described using structured data.

Looking back, Google's former Developer documentation (available on Archive.org) indicated that .csv files were a recognized format for the Dataset Search features. This emphasis on tabular data traces back to 2018, when Google began showing such data in search results, provided it was paired with structured data.

The initial guidelines highlighted:

Datasets become more discoverable when they come with supplementary details such as their title, brief, author, and distribution modalities presented as structured data...

Recent Changes to Google's Documentation

In 2022, Google revised the above guidelines and moved them to the newer Search Central documentation. The revised guide underscores Google's reliance on structured data when it uses .csv files in its Dataset Search results.

With these new changes, questions arise: Will Google, in the future, crawl .csv files for search appearances beyond the structured data context? Current documentation provides insights:

“Datasets become easily accessible when they're supplemented with details like their title, brief, author, and distribution modalities as structured data. Google’s methodology in discovering datasets taps into schema.org and diverse metadata standards that can be appended to dataset-describing pages…”
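For readers unfamiliar with dataset structured data, the following is a minimal sketch of the kind of schema.org `Dataset` markup the documentation describes, generated with Python. The property names (`name`, `description`, `creator`, `distribution`, `encodingFormat`, `contentUrl`) are real schema.org terms. The `dataset_jsonld` helper, the example values, and the choice of `Organization` as the creator type are assumptions for illustration.

```python
import json

def dataset_jsonld(name: str, description: str, creator: str, csv_url: str) -> dict:
    """Build schema.org Dataset markup that points at a CSV download."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "creator": {"@type": "Organization", "name": creator},
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": "CSV",
                "contentUrl": csv_url,
            }
        ],
    }

# Hypothetical example values.
markup = dataset_jsonld(
    name="Example store locations",
    description="A sample table of store names and addresses.",
    creator="Example Ltd",
    csv_url="https://example.com/data/stores.csv",
)

# JSON-LD is embedded in the page that describes the dataset.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(markup, indent=2)
    + "\n</script>"
)

if __name__ == "__main__":
    print(script_tag)
```

The resulting script tag would sit in the HTML of the page describing the dataset, not in the .csv file itself.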

Tying Google's CSV Indexing to Algorithmic Adjustments?

There's speculation within the industry about whether the indexing of .csv files is connected to Google's core algorithm updates. Is it mere coincidence that the two changes appeared at around the same time? Did Google extend its crawler to handle .csv files, or did it already have the capability and only now enable it?

In the ever-changing world of search and continuous algorithm changes, only time will tell how these updates play out. As a Google Partner agency, our Search Channel team is continuously pushing boundaries in performance optimisation.