Quick-and-dirty knowledge base for ODU RCS.

Google Drive & RClone: Setting Up CLI Access to Google Drive Data
=================================================================
> This is the original draft (Nov 17, 2022).
> The published version is here:
> https://wiki.hpc.odu.edu/en/DataMgmt/cloud/grive-rclone-setup

Google Drive is a popular cloud storage platform for backing up and sharing files. This article provides step-by-step guidance for enabling access to your Drive and transferring data between the Drive and ODU HPC using the rclone command-line program. By using rclone, you will be able to automate data transfer and synchronization between the Drive and the cluster storage.

> This article assumes that rclone is already installed or otherwise available on your system. Refer to the [rclone downloads page](https://rclone.org/downloads/) if you need to download and install rclone.
>
> On Wahab HPC, run `module load rclone` to make rclone available in your shell environment.
{.is-info}

> This guide can also be used to enable access to Google Drive from Linux, Mac, and Windows desktops.{.is-info}

Setting Up Access
-----------------
> Because one of the steps requires a web browser, it is best to run this setup from the [remote desktop](https://wiki.hpc.odu.edu/GettingStarted#connecting-via-rdp) or [virtual desktop](/open-ondemand/virtual-desktop).

The first step is to issue the `rclone config` command. This starts an interactive dialog with a series of questions; because the dialog is long, it is shown here in pieces with commentary. First, we need to create a new **remote**, which is simply a user-defined name for a particular Google Drive storage area. In the following instructions, we will use `my-gdrive` as the name, but feel free to choose a name that best describes your data (it must not contain whitespace or begin with a dash [`-`]).
```
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> my-gdrive
```
Rclone waits for your response after the `>` character. Here, `n` and `my-gdrive` are the responses to the two prompts. In the illustration above, no remote has been created yet, so there are only a few options. If you have existing remote(s), you will see more options.
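
If you generate remote names in a script, it may help to check them against the naming rule above before typing them into `rclone config`. The following is a minimal sketch; `is_valid_remote_name` is a hypothetical helper (not part of rclone), and it checks only the two rules stated in this article, which may be looser than what rclone itself enforces.

```shell
# Hypothetical helper (not part of rclone). It checks only the two rules
# mentioned in this article: no whitespace, and no leading dash.
is_valid_remote_name() {
    case "$1" in
        "")            return 1 ;;  # empty name
        -*)            return 1 ;;  # begins with a dash
        *[[:space:]]*) return 1 ;;  # contains whitespace
        *)             return 0 ;;
    esac
}

# Try a few candidate names.
for name in my-gdrive "my gdrive" -gdrive; do
    if is_valid_remote_name "$name"; then
        echo "ok:  '$name'"
    else
        echo "bad: '$name'"
    fi
done
```
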
Next, we need to specify the storage type. Type in "drive" for Google Drive.
```
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
1 / 1Fichier
\ "fichier"
2 / Alias for an existing remote
\ "alias"
3 / Amazon Drive
\ "amazon cloud drive"
...
12 / Google Cloud Storage (this is not Google Drive)
\ "google cloud storage"
13 / Google Drive
\ "drive"
14 / Google Photos
\ "google photos"
...
Storage> drive
```
The next prompts ask for a "client ID" and a "client secret". It is highly recommended that you use ODU's client ID so that your rclone sessions perform better (i.e., faster):
```
Google Application Client Id
Setting your own is recommended.
See https://rclone.org/drive/#making-your-own-client-id for how to create your own.
If you leave this blank, it will use an internal key which is low performance.
Enter a string value. Press Enter for the default ("").
client_id> 605919805393-odnfmddo2v24ffodmg80j6ht4oi4kftn.apps.googleusercontent.com
OAuth Client Secret
Leave blank normally.
Enter a string value. Press Enter for the default ("").
client_secret> ### SEND EMAIL TO ITSResearchAndCloudComputing@odu.edu for this value
```
For security reasons, we do not publish the client secret. Please contact us via email to get the client secret value (it will begin with `GOCSPX`).
The next prompt asks what kind of access you want. In more than 99% of cases, you will want to use option 1 (`drive`), which gives you full read-write access to your data stored in the Drive (you can limit the write/modify access later when using `rclone`).
```
Scope that rclone should use when requesting access from drive.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
1 / Full access all files, excluding Application Data Folder.
\ "drive"
2 / Read-only access to file metadata and file contents.
\ "drive.readonly"
/ Access to files created by rclone only.
3 | These are visible in the drive website.
| File authorization is revoked when the user deauthorizes the app.
\ "drive.file"
/ Allows read and write access to the Application Data folder.
4 | This is not visible in the drive website.
\ "drive.appfolder"
/ Allows read-only access to file metadata but
5 | does not allow any access to read or download file content.
\ "drive.metadata.readonly"
scope> drive
```
Root folder: Do you want to allow access to the entire Drive, or just to a specific subfolder in your Drive? This is where you specify it. If you leave it blank, rclone will use the root folder of the Drive.
```
ID of the root folder
Leave blank normally.
Fill in to access "Computers" folders (see docs), or for rclone to use
a non root folder as its starting point.
Enter a string value. Press Enter for the default ("").
root_folder_id>
```
> ### What is my folder ID?
> The Google folder ID is shown as a series of letters and digits in the URL of the corresponding folder in the web interface. You can use the "Get link" submenu (or button), which will return a URL like this:
>
> `https://drive.google.com/drive/folders/16hY6ZurF09Ax1GzsxqJDJNNxsv-P8ihe?usp=share_link`
>
> The `16hY6ZurF09Ax1GzsxqJDJNNxsv-P8ihe` string is the root folder ID.
> *(FYI: this is a demo folder on Research Computing's Google Drive. It is safe, but most likely it does not contain anything useful to you.)*
{.is-info}
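
Picking the ID out of the link by hand is error-prone, so here is a minimal sketch that does it with plain shell parameter expansion. `extract_folder_id` is a hypothetical helper name, and the sketch assumes the link has the `/folders/<ID>` form shown above; other link formats are not handled.

```shell
# Hypothetical helper: print the folder ID found in a Drive "Get link" URL.
# Assumes the URL contains ".../folders/<ID>", optionally followed by "?..." .
extract_folder_id() {
    case "$1" in
        */folders/*) ;;             # expected form, continue
        *) return 1 ;;              # anything else: give up
    esac
    _id=${1##*/folders/}            # drop everything through "/folders/"
    _id=${_id%%[?/]*}               # drop a trailing query string or path
    printf '%s\n' "$_id"
}

extract_folder_id 'https://drive.google.com/drive/folders/16hY6ZurF09Ax1GzsxqJDJNNxsv-P8ihe?usp=share_link'
```

The printed value is what you would type at the `root_folder_id>` prompt.
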
The next prompt asks for a service account file. Skip it by leaving it blank.
```
Service Account Credentials JSON file path
Leave blank normally.
Needed only if you want use SA instead of interactive login.
Leading `~` will be expanded in the file name as will environment variables such as `${RCLONE_CONFIG_DIR}`.
Enter a string value. Press Enter for the default ("").
service_account_file>
```
The next series of prompts is important. Use the auto config to launch the web browser *on the same machine* and give rclone permission to access your data stored in the Drive storage. This will allow rclone to access your data *from this machine only*.
```
Edit advanced config? (y/n)
y) Yes
n) No (default)
y/n> n
Remote config
Use auto config?
* Say Y if not sure
* Say N if you are working on a remote or headless machine
y) Yes (default)
n) No
y/n> y
If your browser doesn't open automatically go to the following link: http://127.0.0.1:53682/auth?state=SOME_RANDOM_STRING
Log in and authorize rclone for access
Waiting for code...
```
You will need to authorize access from the browser. If you have not yet logged in to your ODU Google account in the browser, please do so now, then authorize the access request.
<!-- FIXME insert screenshots -->
At this point, the browser will show a prompt like this:

> **rclone for ODU research computing** wants access to your Google Account.
> ...
> This will allow **rclone for ODU research computing** to: See, edit, create, and delete all of your Google Drive files.
> Make sure you trust rclone for ODU research computing.

> Despite the scary-sounding warning, you need to allow access. This is what connects the `rclone` program to your data so that it can manipulate them. It is *your* invocation of the rclone program on the *remote* you specify that will "see, edit, create, and delete" the data on your Drive. You can always revoke rclone's Drive access from your Google Account settings.{.is-info}

The final steps complete the configuration:
```
Got code
Configure this as a team drive?
y) Yes
n) No (default)
y/n> n
--------------------
[my-gdrive]
type = drive
client_id = 605919805393-odnfmddo2v24ffodmg80j6ht4oi4kftn.apps.googleusercontent.com
client_secret = GOCSPX*******
scope = drive
token = {"access_token":"###REDACTED###","token_type":"Bearer","refresh_token":"###REDACTED###","expiry":"2022-11-17T05:07:15.32276879Z"}
--------------------
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:
Name Type
==== ====
my-gdrive drive
```
> If you want to access a shared Drive (sometimes called a team Drive) instead of your personal Drive storage, you will need to respond "y" to the question "Configure this as a team drive?". Do not confuse a shared Drive with a Drive location that someone has shared with you personally.{.is-info}

Voila! Your Drive setup is good to go.

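
One quick way to confirm that the new remote was saved is to look for it in the output of `rclone listremotes`, which prints one remote name per line, each followed by a colon. The sketch below assumes the remote is named `my-gdrive` as in this article; `remote_in_list` is a hypothetical helper, and the guard around the `rclone` call is there only so the snippet does not fail on a machine where rclone is not on the `PATH`.

```shell
# Hypothetical helper: succeed if remote name $1 appears on stdin, where stdin
# is in the format printed by `rclone listremotes` (one "name:" per line).
remote_in_list() {
    grep -Fxq -- "$1:"
}

if command -v rclone >/dev/null 2>&1; then
    if rclone listremotes | remote_in_list my-gdrive; then
        echo "my-gdrive is configured"
    else
        echo "my-gdrive not found; re-run 'rclone config'"
    fi
else
    echo "rclone is not on PATH (on Wahab, try 'module load rclone')"
fi
```
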
Testing the Drive Access
------------------------
Let us now test whether the access works correctly by listing the contents of the root folder. From the terminal, type (do not include the `$` shell prompt):
```
$ rclone ls --max-depth=1 my-gdrive:
```
If all is well, you should see a listing of all the files in the root folder (folders themselves are not listed).
Here is a (redacted) example listing from one of the staff members:
```
$ rclone ls --max-depth=1 wpurwant-gdrive:
-1 BLANK - Old Dominion University, Norfolk Maturity/Capabilities Model Assessment.xlsx
129915 Position Statements and Bios_2020.pdf
67430 NSF_RFI_Response_final.pdf
22627 DEAPSECURE 2.0 brainstorming
-1 DataUp response.docx
20318 DeapSECURE-module-3-MachineLearning
-1 Fabric Benchmarking 2017.docx
-1 ODU Training.docx
-1 ODU Zoom meetings.docx
-1 PEARC19 Champion Related Activities.docx
-1 Research Computing Strategy brainstorming doc.docx
-1 Restricted-data-computing-platforms-ODU-2022.d20220407.pptx
```
The first column of every row is the file size in bytes. A size of -1 indicates a native Google document (Docs, Sheets, Slides); all other files show their actual sizes.
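
The -1 convention makes it easy to pick out the native Google documents from a listing. The following is a minimal sketch using `awk`; `native_google_docs` is a hypothetical helper that reads `rclone ls`-style lines (a size, a space, then the path) from standard input. The sample rows are copied from the listing above so that the snippet runs without a configured remote.

```shell
# Hypothetical helper: read `rclone ls`-style lines ("<size> <path>") on stdin
# and print the paths whose size column is -1 (native Google documents).
native_google_docs() {
    awk '$1 == "-1" { sub(/^[ \t]*-1 /, ""); print }'
}

# Sample rows taken from the example listing in this article.
printf '%s\n' \
    '       -1 DataUp response.docx' \
    '    67430 NSF_RFI_Response_final.pdf' \
    '       -1 ODU Training.docx' \
  | native_google_docs
```

With a working remote, the same filter can be fed from `rclone ls --max-depth=1 my-gdrive:` through a pipe.
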
References
----------
* Official documentation:
https://rclone.org/drive/