Tuesday, 13 July 2021

Extract text from multiple Google Docs into a Sheet

The following Google Apps Script is designed to iterate through Google Docs in a given Google Drive folder and extract the paragraphs of text along with a link to each file into a Google Sheet. A new column will automatically be appended for each paragraph.

This tool was designed with the vision that you may wish to centrally collate some comments/feedback written by others in Docs, into one central location so you do not need to open each file in turn.

Provide a Google Folder ID to extract text from Docs
Provide a Google Folder ID to extract text from Docs

Instructions

  1. Enter the ID of the Google Drive folder that contains all of the Google Docs into the box on the Welcome sheet.
  2. Click the Start button (or use the option from the Admin menu at the top) to run the tool.
  3. The Data sheet will then populate with file information, including the paragraphs of text. You can click the links to be taken directly to that Google Doc.
  4. The 'Log' sheet is designed to capture any error messages but these will also be displayed as a popup onscreen to help you troubleshoot any problems.

 

The Code

A file iterator is the method used to loop through each Google Doc in the Drive folder. Using this method we can 'getName()' and 'getUrl()' before then opening the file using the 'DocumentApp'.

From this point we can 'openByUrl()' then 'getBody()' which allows us to then 'getParagraphs()'. From here we need a loop as there may be multiple paragraphs from which we want to 'getText()'.

In the loop we check that the text is not empty (ie a blank paragraph) as we are not interested in adding blank columns to the Google Sheet. We also have a little counter (j in this instance) which moves along 1 column for every 1 paragraph of text added.


Download

Extract text from multiple Google Docs into a Sheet download here (please use 'File' > 'Make a copy' for your own version).


No comments:

Post a Comment