Tuesday 13 July 2021

Extract text from multiple Google Docs into a Sheet

The following Google Apps Script is designed to iterate through Google Docs in a given Google Drive folder and extract the paragraphs of text along with a link to each file into a Google Sheet. A new column will automatically be appended for each paragraph.

This tool was designed with the vision that you may wish to centrally collate some comments/feedback written by others in Docs, into one central location so you do not need to open each file in turn.

Provide a Google Folder ID to extract text from Docs
Provide a Google Folder ID to extract text from Docs

Instructions

  1. Enter the ID of the Google Drive folder that contains all of the Google Docs into the box on the Welcome sheet.
  2. Click the Start button (or use the option from the Admin menu at the top) to run the tool.
  3. The Data sheet will then populate with file information, including the paragraphs of text. You can click the links to be taken directly to that Google Doc.
  4. The 'Log' sheet is designed to capture any error messages but these will also be displayed as a popup onscreen to help you troubleshoot any problems.

 

The Code

A file iterator is the method used to loop through each Google Doc in the Drive folder. Using this method we can 'getName()' and 'getUrl()' before then opening the file using the 'DocumentApp'.

From this point we can 'openByUrl()' then 'getBody()' which allows us to then 'getParagraphs()'. From here we need a loop as there may be multiple paragraphs from which we want to 'getText()'.

In the loop we check that the text is not empty (ie a blank paragraph) as we are not interested in adding blank columns to the Google Sheet. We also have a little counter (j in this instance) which moves along 1 column for every 1 paragraph of text added.


Download

Extract text from multiple Google Docs into a Sheet download here (please use 'File' > 'Make a copy' for your own version).


7 comments:

  1. Hi Scott

    Thank you!

    This is only designed to get the Document 'body' not other areas of the Google Doc. You may be able to do this with 'getHeader()' - https://developers.google.com/apps-script/reference/document/document?hl=en#getheader

    Kind regards
    Phil

    ReplyDelete
  2. Hi Phil! I tried using this but i keep getting this error: "Problem with File: Exception: The document is inaccessible. Please try again later." I've checked the permissions on the file and its set to "Anyone with link can edit" so Im not sure what else to try.

    ReplyDelete
    Replies
    1. Hi Jeff

      Hhhmmm, you're definitely entering the ID of the Drive 'folder' into the Google Sheet right?
      It's a Google Doc and not a Word Doc?
      Have you tried creating a fresh Google Doc in a folder with a simple bit of text in and trying that?

      Ta
      Phil

      Delete
  3. Hi Phil,

    I'm using documents with dropdown menu's but theyre not being recognised as text so its just skipping them, anything i can do to fix this?

    Thanks,

    ReplyDelete
    Replies
    1. Hi James

      As in Smart Chips? To my knowledge Google are still yet to release compatibility with Smart Chips and Apps Script - as it's affecting a number of other systems I would like to develop further.

      Ta

      Delete
    2. Dropdowns dont come under smart chips in the menu but yeah perhaps they're under the same limitations.

      Delete
    3. Rightio. Sorry I've not done a lot more with extracting Google Doc text as I find it very cumbersome and try to work with Spreadsheets where possible.

      Delete