
How It Works

The SharePoint connector is a self-contained integration that runs entirely inside your Azure subscription. ROOTKey publishes a Terraform module that you deploy once per site. The module wires up an Azure Function App that:
  1. Resolves your SharePoint site URL to a Graph site ID.
  2. Enumerates every document library (drive) on the site.
  3. Registers a Microsoft Graph webhook subscription per drive.
  4. On each notification, runs a delta query against the affected drive and streams every new or updated file to the ROOTKey API, authenticating with your Connector API Key.
ROOTKey stores the file and anchors it on-chain, enabling both integrity verification and full file recovery in the event of corruption, ransomware, or accidental deletion.
SharePoint Site             Microsoft Graph         Azure Function App           ROOTKey API
─────────────────           ───────────────         ──────────────────           ───────────
  File created /     →      Webhook (per drive) →   Routes by subscriptionId →   POST /api-v1/connectors/files/
  updated in any                                    Acquires per-drive lease     (Connector API Key
  document library                                  Runs delta query             from Key Vault,
                                                    Streams each file            + idempotency
                                                                                 headers)
ROOTKey’s cyber resilience guarantee includes full recovery — not just detection. For that reason the connector uploads the full file content, not only a hash. Anchoring a hash alone cannot restore a corrupted, encrypted, or deleted file.
One connector handles every document library on a site. A typical SharePoint site has multiple libraries (e.g., Documents, Contracts, Marketing Assets). A single deployment auto-discovers them all and creates one Graph subscription per library. New libraries added later are picked up automatically on the next 12-hour reconciliation cycle.
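Step 4 of the flow above relies on Microsoft Graph delta queries. The following TypeScript sketch shows the general paging pattern: follow `@odata.nextLink` until Graph returns an `@odata.deltaLink`, then keep that link as the per-drive cursor. It is not the connector's actual source. `foldPage`, `drainDelta`, `SyncState`, and the injected `fetchPage` function are hypothetical names, and the rule of skipping folders and deleted items is an assumption based on the description of "new/updated files".

```typescript
// Sketch of draining a Microsoft Graph delta query for one drive.
// Every page except the last carries "@odata.nextLink"; the last page
// carries "@odata.deltaLink", the cursor to persist for the next sync.

interface DriveItem {
  id: string;
  name?: string;
  file?: object;    // facet present on files, absent on folders
  deleted?: object; // facet present when the item was removed
}

interface DeltaPage {
  value: DriveItem[];
  "@odata.nextLink"?: string;
  "@odata.deltaLink"?: string;
}

interface SyncState {
  changedFiles: DriveItem[];
  nextUrl: string | undefined; // next page to fetch, if any
  cursor: string | undefined;  // delta link to store once paging is done
}

// Pure step: fold one page of results into the running state.
function foldPage(state: SyncState, page: DeltaPage): SyncState {
  // Assumption: only live files are forwarded; folders and deletions are skipped.
  const files = page.value.filter(
    (item) => item.file !== undefined && item.deleted === undefined,
  );
  return {
    changedFiles: state.changedFiles.concat(files),
    nextUrl: page["@odata.nextLink"],
    cursor: page["@odata.deltaLink"] ?? state.cursor,
  };
}

// `fetchPage` is a hypothetical injected function that performs the
// authenticated GET against Graph and returns the parsed JSON body.
async function drainDelta(
  startUrl: string,
  fetchPage: (url: string) => Promise<DeltaPage>,
): Promise<SyncState> {
  let state: SyncState = { changedFiles: [], nextUrl: undefined, cursor: undefined };
  let url: string | undefined = startUrl;
  while (url !== undefined) {
    state = foldPage(state, await fetchPage(url));
    url = state.nextUrl;
  }
  return state;
}
```

Keeping the page-folding step pure makes the paging logic easy to unit-test without calling Graph. On the first sync the start URL would be the drive's delta endpoint; on later syncs it would be the stored cursor.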

What This Module Creates in Your Azure Subscription

The following resources land in your subscription when you run terraform apply. Every resource name combines name_suffix with a deterministic hash of the site URL, so multiple deployments don’t collide.
| Resource | Purpose | Cost impact |
| --- | --- | --- |
| azurerm_linux_function_app | The connector itself (Node.js 22, Consumption plan, HTTPS-only, TLS 1.2 min, HTTP/2 enabled, CORS closed). | Pay-per-execution; the perpetual free tier covers 1M executions and 400,000 GB-s per month. |
| azurerm_service_plan (Y1 Consumption) | Hosting plan for the Function App. | No fixed cost; you pay only for executions above the free tier. |
| azurerm_storage_account | Function backing, per-drive delta cursors, lock blobs, and DLQ. | ~$0.05–1/month for typical loads. |
| azurerm_storage_container (connector-state) | Holds subscriptions.json (the drive→subscription registry), delta-{driveId}.txt per drive, and lock blobs (delta-sync-{driveId}.lock, subscriptions-reconciliation.lock). | Included in storage cost. |
| azurerm_storage_queue (rootkey-dlq) | Dead-letter queue for per-file failures, auto-replayed by a queue-triggered function. | Free in normal operation. |
| azurerm_key_vault (Standard, RBAC) + 3 secrets | Stores the Graph client secret, ROOTKey API key, and the random webhook clientState. Secrets are never stored as plain Function App settings. | ~$0.01/month. Purge protection is enabled by default. |
| azurerm_user_assigned_identity | The Function App’s identity. Granted least-privilege RBAC: Key Vault Secrets User, Storage Blob Data Contributor, Storage Queue Data Contributor. | None. |
| azurerm_log_analytics_workspace + azurerm_application_insights | Telemetry and logs with configurable retention (default 30 days). | $0 within the App Insights free tier (5 GB/month). |
| Role assignments | RBAC entries linking the managed identity to the Key Vault, blob, and queue scopes. | None. |
The module does not create or modify the Microsoft Entra ID App Registration: you create that yourself and pass in the Tenant ID, Client ID, and Client Secret (see Prerequisites). It also does not create the Resource Group, which must already exist.
Cost does not scale with the number of document libraries on the site, because one Function App handles them all. For a typical site with a few thousand uploads per month across all libraries, the total recurring cost added to your Azure bill is well under $5/month, dominated by egress to the ROOTKey API.

Infrastructure Impact Summary

The connector does not modify your SharePoint or Microsoft 365 tenant, apart from creating one Microsoft Graph webhook subscription per document library on the configured site. It reads files via the App Registration’s Sites.Read.All and Files.Read.All permissions and never writes back to SharePoint. No mailbox, OneDrive, or Teams resource is touched. Subscriptions are scoped to the specific site, with no tenant-wide enumeration.
The module creates new resources inside the Resource Group you specify and does not modify any pre-existing resources in it. Role assignments are scoped to the resources the module itself creates — it does not grant any permissions on resources outside its scope.
When a new document library is added to the site, it is picked up on the next 12-hour reconciliation cycle (or on the next deploy or Function App restart, which triggers an immediate reconciliation via runOnStartup: true): the connector enumerates the site’s libraries, detects the new one, and creates a Graph subscription for it. No human action is required.
When a document library is removed from the site, the reconciliation cycle detects the missing library, deletes its Graph subscription, and removes its per-drive delta cursor and lock blobs. No orphaned subscriptions are left behind. DLQ messages still in flight for the removed library are detected and dropped during replay.
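The add and remove behaviour of the reconciliation cycle amounts to a set difference between the drives currently on the site and the drives recorded in the registry. The sketch below illustrates that idea only. `planReconciliation` and `ReconciliationPlan` are hypothetical names, and representing the registry as a map from drive ID to subscription ID is an assumption, since the layout of subscriptions.json is not described here.

```typescript
// Hypothetical plan produced by one reconciliation pass.
interface ReconciliationPlan {
  toSubscribe: string[]; // drives on the site with no subscription yet
  toRemove: { driveId: string; subscriptionId: string }[]; // registry entries whose drive is gone
}

// `liveDriveIds`: drives currently enumerated on the site.
// `registry`: assumed shape of the stored registry, driveId -> subscriptionId.
function planReconciliation(
  liveDriveIds: string[],
  registry: Map<string, string>,
): ReconciliationPlan {
  const live = new Set(liveDriveIds);
  const toSubscribe = liveDriveIds.filter((driveId) => !registry.has(driveId));
  const toRemove: { driveId: string; subscriptionId: string }[] = [];
  registry.forEach((subscriptionId, driveId) => {
    if (!live.has(driveId)) {
      toRemove.push({ driveId, subscriptionId });
    }
  });
  return { toSubscribe, toRemove };
}
```

For each entry in `toRemove`, the cleanup described above would also delete the drive's delta cursor and lock blobs.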
File content does leave your Azure subscription: the whole purpose of the connector is to forward file content to ROOTKey so it can be anchored and recovered. Transport is HTTPS-only (the module rejects non-https:// API URLs at plan time). Files are streamed directly from Graph to the ROOTKey API; the Function App never writes them to local storage or to any other Azure service.
The Graph client secret you paste into Terraform, the ROOTKey API Key, and a randomly generated webhook clientState are all stored in Azure Key Vault in your own subscription, encrypted at rest with the Microsoft-managed key for Key Vault. The Function App resolves them at boot using its managed identity and Key Vault references (@Microsoft.KeyVault(SecretUri=…)). They are not stored as plain Function App settings.
Each file gets up to 3 upload attempts with exponential backoff (initial 1s, capped at 30s) before being sent to the Azure Storage DLQ. From the DLQ, a queue-triggered function automatically replays each message up to 5 more times (queue retries with backoff). Only after all those retries fail does the message land in the rootkey-dlq-poison queue for human inspection. You can configure an Azure Monitor alert on the DLQ or poison queue length.
A timer trigger runs every 12h (and on every cold start, via runOnStartup) that reconciles all subscriptions for the site and runs a safety-net delta sync per drive. So even if a notification is dropped, the missed changes are picked up — at worst, within the next 12h, or immediately on the next deploy/restart.
The webhook is protected by a 32-char random clientState value generated at apply time and stored in Key Vault. Any POST to the webhook URL without the matching clientState is rejected with 401. Notifications also carry a subscriptionId that the function maps to a known drive — unknown subscription IDs are logged and ignored.
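The two checks described above (matching clientState, then mapping subscriptionId to a known drive) can be sketched as follows. This is an illustration under stated assumptions, not the connector's code: `checkNotifications`, `constantTimeEquals`, and the `Verdict` type are hypothetical, and the constant-time comparison is a common hardening choice that the source does not confirm. The `subscriptionId` and `clientState` fields are the ones Microsoft Graph sends in each entry of a notification's `value` array.

```typescript
interface GraphNotification {
  subscriptionId: string;
  clientState?: string;
}

type Verdict =
  | { authorized: false } // caller should answer 401
  | { authorized: true; driveIds: string[]; unknownSubscriptionIds: string[] };

// Compare two strings without stopping at the first differing character.
function constantTimeEquals(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// `subscriptionToDrive` is an assumed in-memory view of the subscription registry.
function checkNotifications(
  notifications: GraphNotification[],
  expectedClientState: string,
  subscriptionToDrive: Map<string, string>,
): Verdict {
  const driveIds: string[] = [];
  const unknownSubscriptionIds: string[] = [];
  for (const n of notifications) {
    if (!constantTimeEquals(n.clientState ?? "", expectedClientState)) {
      return { authorized: false };
    }
    const driveId = subscriptionToDrive.get(n.subscriptionId);
    if (driveId === undefined) {
      unknownSubscriptionIds.push(n.subscriptionId); // logged and ignored
    } else if (driveIds.indexOf(driveId) === -1) {
      driveIds.push(driveId); // one delta sync per affected drive
    }
  }
  return { authorized: true, driveIds, unknownSubscriptionIds };
}
```

Deduplicating drive IDs means a burst of notifications for one library triggers a single delta sync rather than one per notification.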
Running terraform destroy removes every resource the module created (Function App, storage account, Key Vault, App Insights, identity, role assignments). The App Registration, Resource Group, and SharePoint site are not deleted. Note: if you kept enable_key_vault_purge_protection = true (the default), the vault and its secrets remain in a soft-deleted state for 7 days after destroy before they can be fully purged.

Prerequisites

Before starting, ensure you have:
  • An Azure subscription and a pre-existing Resource Group to host the connector.
  • Permissions to register applications in Microsoft Entra ID (Application Administrator or Global Administrator) and to grant admin consent.
  • Permissions to apply Terraform with Contributor and User Access Administrator (or equivalent) on the chosen Resource Group.
  • The SharePoint site URL of the site you want to monitor (e.g. https://contoso.sharepoint.com/sites/legal). The connector self-discovers every document library on the site — you do not need to provide library IDs.
  • Terraform v1.3 or later installed locally (or in a CI/CD pipeline that runs terraform apply).
  • Node.js v22 or later on the machine running Terraform — the Function App source is compiled at apply time.
  • Azure CLI authenticated (az login) or service principal credentials in the environment.

Required Microsoft Graph and Azure Permissions

Microsoft Graph (Application permissions)

The App Registration you create needs two Microsoft Graph application permissions with admin consent:
| Permission | Type | Why |
| --- | --- | --- |
| Sites.Read.All | Application | Resolve the site URL to a site ID and enumerate its document libraries. |
| Files.Read.All | Application | Read drive items and file content for each library. |

Azure RBAC (granted by the module to its own managed identity)

For full transparency — the module attaches these role assignments to a brand-new user-assigned managed identity it creates. None of these grant access to anything outside the resources the module itself provisions:
| Role | Scope |
| --- | --- |
| Key Vault Secrets User | The Key Vault created by the module. |
| Storage Blob Data Contributor | The Storage Account created by the module. |
| Storage Queue Data Contributor | The Storage Account created by the module. |
The Terraform principal applying the module needs Contributor (to create the resources) and User Access Administrator (to attach those role assignments) on the Resource Group.

Configuration Fields

| Field | Required | Default | Description |
| --- | --- | --- | --- |
| Connector Name | Yes | | A human-readable name to identify this connector in the dashboard. |
| Destination Vault | Yes | | The ROOTKey vault where anchored files will be stored. |
| Tenant ID | Yes | | Microsoft Entra ID tenant ID (a UUID). |
| Client ID | Yes | | Application (client) ID of the App Registration. |
| Client Secret | Yes | | A client secret generated for the App Registration. Stored in Key Vault by the module. |
| Site URL | Yes | | Full URL of the SharePoint site to monitor. The hostname must end with .sharepoint.com. |
| Azure Region | Yes | | Azure region where the connector resources will be deployed (e.g., westeurope). |
| Resource Group Name | Yes | | Pre-existing Azure Resource Group. |
| Name Suffix | Yes | | 3–12 lowercase alphanumeric chars used to namespace the resources (e.g., acme or prod). |
| Max file size (bytes) | No | 524288000 (500 MiB) | Files larger than this are skipped and sent to the DLQ. |
| Log retention (days) | No | 30 | Application Insights / Log Analytics retention (30–730). |
| Key Vault purge protection | No | true | Keep purge protection enabled for production. Set to false only during short pilots; once enabled it cannot be disabled and the Key Vault cannot be fully purged for 7 days after destroy. |
| Tags | No | {} | Extra tags applied to every module-managed resource. |

Setup

The setup has a natural ordering: the dashboard requires the Tenant/Client/Site URL to create the connector, and the Function App requires the Connector API Key to call the ROOTKey API. The dashboard resolves this by generating a ready-to-run Terraform block with all values pre-filled.
Step 1: Register an application in Microsoft Entra ID

Go to the Azure Portal → Microsoft Entra ID → App registrations → New registration.
  • Name: something descriptive, e.g., ROOTKey SharePoint Connector.
  • Supported account types: Accounts in this organizational directory only.
  • Redirect URI: leave blank.
Click Register. Note the Application (client) ID and Directory (tenant) ID — you will need both.
Step 2: Grant Microsoft Graph permissions

In your new App Registration, go to API permissions → Add a permission → Microsoft Graph → Application permissions. Add both:
  • Sites.Read.All
  • Files.Read.All
Then click Grant admin consent for [your tenant] and confirm.
Step 3: Create a client secret

Go to Certificates & secrets → New client secret.
  • Set an expiry appropriate for your rotation policy (e.g., 12 or 24 months).
  • Click Add and immediately copy the Value — it is shown only once.
Store the secret securely until you paste it into the dashboard.
Azure App Registration secrets expire. Set a calendar reminder ahead of the expiry — when the secret expires, the connector starts failing with 401 from Graph. Rotation steps are in the Troubleshooting section.
Step 4: Pre-create the Resource Group

In your subscription, create (or pick) a Resource Group to host the connector. The Terraform principal needs Contributor and User Access Administrator on that Resource Group.
Step 5: Confirm the SharePoint site URL

You only need the URL; the connector resolves it to a site ID at runtime and discovers every document library under it. Examples of acceptable values:
  • https://contoso.sharepoint.com/sites/legal
  • https://contoso.sharepoint.com/sites/marketing/
  • https://contoso.sharepoint.com (root site)
The hostname must end with .sharepoint.com.
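As an illustration of what resolving a site URL involves, the sketch below validates the hostname rule and builds the Microsoft Graph lookup URL, using Graph's documented path-based site addressing (`/sites/{hostname}:/{server-relative-path}`). `graphSiteLookupUrl` is a hypothetical helper. Whether the connector issues exactly this request is an assumption.

```typescript
// Build the Graph URL that resolves a SharePoint site URL to a site resource.
// Throws if the URL does not satisfy the rules listed above.
function graphSiteLookupUrl(siteUrl: string): string {
  const parsed = new URL(siteUrl);
  if (parsed.protocol !== "https:") {
    throw new Error("site URL must use https");
  }
  const host = parsed.hostname.toLowerCase();
  if (!host.endsWith(".sharepoint.com")) {
    throw new Error("hostname must end with .sharepoint.com");
  }
  // Drop trailing slashes so ".../sites/marketing/" and ".../sites/marketing" agree.
  const path = parsed.pathname.replace(/\/+$/, "");
  const base = "https://graph.microsoft.com/v1.0/sites/" + host;
  // The root site is addressed by hostname alone; sub-sites append ":" + path.
  return path === "" ? base : base + ":" + path;
}
```

The `id` field of the response to a GET on that URL is the Graph site ID, which can then be used to list the site's drives.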
Step 6: Create the connector in the dashboard

Go to app.rootkey.ai → Connectors → New Connector → select SharePoint. Fill in all required fields (see Configuration Fields above). Save the connector.
Step 7: Copy the Connector API Key and the Terraform block

At the end of the wizard, the dashboard displays:
  1. The Connector API Key.
  2. A ready-to-run Terraform block, pre-filled with your values.
The Connector API Key is shown only once and is already embedded in the Terraform block. Copy both now and store them securely before closing this screen. The key cannot be retrieved again.
The generated block looks like:
module "rootkey_sharepoint_connector" {
  source = "github.com/rootkey-ai/rootkey-connectors//sharepoint"

  resource_group_name = "rootkey-connectors"
  azure_location      = "westeurope"
  name_suffix         = "acme"

  graph_tenant_id     = "11111111-1111-1111-1111-111111111111"
  graph_client_id     = "22222222-2222-2222-2222-222222222222"
  graph_client_secret = "Xyz~RandomSecretFromAppRegistration"
  site_url            = "https://contoso.sharepoint.com/sites/legal"

  rootkey_api_key = "rk_conn_xxxxxxxxxxxxxxxxxxxx"

  # Optional
  max_file_size_bytes               = 524288000
  log_retention_days                = 30
  enable_key_vault_purge_protection = true
  tags = {
    "cost-center" = "security"
  }
}
Step 8: Deploy the Terraform module

Save the block into a .tf file in an empty directory, then run:
terraform init
terraform apply
The module bundles the Function App source, provisions every resource, and wires the managed identity, Key Vault, and storage roles.
Step 9: Validate the connector

Because the timer is registered with runOnStartup: true, the Function App reconciles subscriptions within seconds of deploy. You can confirm by inspecting the connector-state blob container:
az storage blob list \
  --container-name connector-state \
  --account-name $(terraform output -raw storage_account_name) \
  --query "[].name"
You should see subscriptions.json (with one entry per document library on the site) and one delta-{driveId}.txt per drive after the first sync.
Then upload a test file to any document library on the site. Within seconds it should appear in the destination vault and the connector status in the dashboard should be ACTIVE.

Reliability and observability

The connector is built for at-least-once delivery to ROOTKey with explicit handling of every failure mode.

Retry behaviour

| Layer | Retries | When |
| --- | --- | --- |
| In-function retry | 3 attempts per file (initial + 2 retries) with exponential backoff (1s → 2s, capped 30s) and jitter | On 429, 5xx, network or timeout errors from the ROOTKey API. |
| DLQ replay | The queue-triggered dlqReplay function automatically reprocesses every DLQ message, with another 3-attempt in-function budget per replay | Until either success or the queue’s maxDequeueCount (5) is exhausted. |
| Poison queue | If all DLQ replays fail with transient errors, the message lands in rootkey-dlq-poison for human inspection | Persistent or systemic failure. |
| Permanent failure short-circuit | When DLQ replay hits a PermanentError (oversize, 4xx), the message is acked immediately and a structured marker is logged | The poison queue is bypassed deliberately so it stays reserved for “we don’t know why this keeps failing”. |
| Graph-side retry | Microsoft Graph retries the webhook for up to ~4 hours with exponential backoff | The Function App returns 5xx (only when the delta query itself fails). |
| Safety-net delta sync | The reconciliation timer also runs a delta query per drive, catching up on any missed webhooks | Every 12h and on every cold start. |
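The in-function retry layer can be sketched as below: three attempts, delays doubling from 1s and capped at 30s, retrying only on 429, 5xx, and network or timeout errors. All names (`uploadWithRetry`, `backoffDelayMs`, `withJitter`, `isRetryableStatus`) are hypothetical, and the jitter formula (up to 20% added on top of the base delay) is an assumption, since the source only says that jitter is applied.

```typescript
const MAX_ATTEMPTS = 3;
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;

// 429 and 5xx are treated as transient; other statuses are not retried.
function isRetryableStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Delay before retry number `retry` (1-based): 1s, 2s, 4s, ... capped at 30s.
function backoffDelayMs(retry: number): number {
  return Math.min(BASE_DELAY_MS * Math.pow(2, retry - 1), MAX_DELAY_MS);
}

// Assumed jitter scheme: add up to 20% so parallel uploads do not retry in lockstep.
function withJitter(delayMs: number, random: () => number = Math.random): number {
  return Math.floor(delayMs * (1 + 0.2 * random()));
}

type UploadOutcome = "uploaded" | "non-retryable" | "retries-exhausted";

// `send` is a hypothetical function that performs one upload and resolves to
// the HTTP status; it rejects on network or timeout errors.
async function uploadWithRetry(
  send: () => Promise<number>,
  sleep: (ms: number) => Promise<void>,
): Promise<UploadOutcome> {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    let status: number | undefined;
    try {
      status = await send();
    } catch {
      status = undefined; // network or timeout error: treated as transient
    }
    if (status !== undefined && status >= 200 && status < 300) return "uploaded";
    if (status !== undefined && !isRetryableStatus(status)) return "non-retryable";
    if (attempt < MAX_ATTEMPTS) await sleep(withJitter(backoffDelayMs(attempt)));
  }
  return "retries-exhausted";
}
```

With three attempts only the 1s and 2s delays are ever used; the 30s cap matters only if the attempt budget is raised.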

Idempotency

Every upload to the ROOTKey API carries three headers extracted from the Graph drive item:
| Header | Source |
| --- | --- |
| x-rootkey-source-drive-id | The drive that emitted the notification. |
| x-rootkey-source-item-id | Graph driveItem.id. |
| x-rootkey-source-etag | Graph driveItem.eTag (quotes stripped). |
The ROOTKey API uses these to deduplicate redelivered events.
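A minimal sketch of assembling those three headers from a Graph drive item follows. `idempotencyHeaders` and `DriveItemRef` are hypothetical names, and stripping every double quote from the eTag (rather than only the surrounding pair) is an assumption about what "quotes stripped" means.

```typescript
// The subset of Graph driveItem fields needed for deduplication.
interface DriveItemRef {
  id: string;
  eTag: string; // Graph returns this wrapped in double quotes
}

function idempotencyHeaders(driveId: string, item: DriveItemRef): Record<string, string> {
  return {
    "x-rootkey-source-drive-id": driveId,
    "x-rootkey-source-item-id": item.id,
    // Assumption: all double quotes are removed from the eTag value.
    "x-rootkey-source-etag": item.eTag.replace(/"/g, ""),
  };
}
```

Because the eTag changes whenever the item changes, the same file version redelivered twice produces identical headers, while a genuine edit produces a new eTag and is treated as a new upload.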

Concurrent invocations

The connector serializes work at two levels:
  • Per-drive sync lease (delta-sync-{driveId}.lock): only one delta sync per drive runs at a time. Independent drives sync in parallel. Concurrent invocations for the same drive return 202 Accepted immediately.
  • Global reconciliation lease (subscriptions-reconciliation.lock): only one timer instance reconciles subscriptions at a time. Without this, two concurrent timer runs would race on subscriptions.json and create duplicate Graph subscriptions per drive (Graph allows duplicates per resource — each duplicate generates an extra notification per change).
Both leases auto-expire if the holder crashes, so the system self-recovers.
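The lease semantics described above (one holder at a time per lock name, independent locks for independent drives, automatic expiry if the holder crashes) can be modelled in memory as follows. This is only an illustration of the behaviour. How the connector implements its lock blobs is not described here, and `LeaseTable` with its methods is hypothetical.

```typescript
interface Lease {
  holder: string;
  expiresAtMs: number;
}

// In-memory model of expiring leases keyed by lock name.
class LeaseTable {
  private readonly ttlMs: number;
  private readonly leases = new Map<string, Lease>();

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns true if `holder` now owns the lease, false if another live holder has it.
  tryAcquire(name: string, holder: string, nowMs: number): boolean {
    const current = this.leases.get(name);
    const heldByOther =
      current !== undefined && current.holder !== holder && current.expiresAtMs > nowMs;
    if (heldByOther) return false;
    this.leases.set(name, { holder, expiresAtMs: nowMs + this.ttlMs });
    return true;
  }

  // Only the current holder can release; a stale release is ignored.
  release(name: string, holder: string): void {
    const current = this.leases.get(name);
    if (current !== undefined && current.holder === holder) {
      this.leases.delete(name);
    }
  }
}
```

In this model a second invocation for the same drive fails to acquire the lease and can return immediately, while a crashed holder blocks the drive only until its lease expires.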

What to monitor

| Signal | What it means | How to alert |
| --- | --- | --- |
| rootkey-dlq queue length > 0 for more than ~10 min | DLQ replay is failing repeatedly with transient errors. | Azure Monitor metric alert on the queue length. |
| rootkey-dlq-poison queue receives a message | A file failed all DLQ replay attempts for a transient-looking reason; human attention needed. | Azure Monitor metric alert on ApproximateMessageCount. |
| App Insights trace contains rootkey.event.dlq_replay_terminal_failure | A file hit a permanent error during DLQ replay (oversize, 4xx). The poison queue is bypassed deliberately. | App Insights alert on the message marker. |
| App Insights trace contains rootkey.metric.sync_lease_contention (steady-state) | Concurrent notifications for the same drive are being serialized; small numbers are healthy. A sustained high rate means a drive is being hammered. | App Insights chart on bin(timestamp, 5m). |
| App Insights trace contains rootkey.metric.reconciliation_lease_contention | Two timer instances raced on subscription reconciliation; the loser skipped. Expected at most once per cycle. | App Insights alert if it appears more than ~3×/day. |
| Function App Http5xx > 0 | The webhook is failing (Graph will retry). | Azure Monitor metric alert. |
| renewSubscription hasn’t run in > 13h | Timer is unhealthy or the Function App is stopped. | App Insights availability or platform health metric. |
| ROOTKey dashboard connector status ERROR | API rejected an upload (invalid key, vault deleted, quota). | Email/Slack via your dashboard notification settings. |
Useful Kusto queries:
// All recent errors
traces
| where cloud_RoleName == "<function_app_name>"
| where severityLevel >= 3
| order by timestamp desc
| take 100

// Terminal DLQ failures (PermanentError caught during replay)
traces
| where cloud_RoleName == "<function_app_name>"
| where message has "rootkey.event.dlq_replay_terminal_failure"
| order by timestamp desc

// Per-drive sync lease contention
traces
| where cloud_RoleName == "<function_app_name>"
| where message has "rootkey.metric.sync_lease_contention"
| summarize contentions = count() by bin(timestamp, 5m)
| render timechart

// Reconciliation lease contention (should be near zero)
traces
| where cloud_RoleName == "<function_app_name>"
| where message has "rootkey.metric.reconciliation_lease_contention"
| order by timestamp desc
To peek at DLQ contents:
az storage message peek \
  --queue-name rootkey-dlq \
  --account-name $(terraform output -raw storage_account_name) \
  --num-messages 10

Security considerations

The module ships with a defensive default posture; a few choices have intentional trade-offs that are worth understanding upfront:
  • Secrets in Key Vault, not Function App settings. Graph client secret, ROOTKey API key, and webhook clientState are all stored in Key Vault with the Function App’s managed identity granted Key Vault Secrets User (read-only) RBAC.
  • Key Vault purge protection is enabled by default. Set enable_key_vault_purge_protection = false only for short pilots — once enabled it CANNOT be disabled and the vault cannot be fully purged for 7 days after terraform destroy.
  • CORS is closed. The webhook is server-to-server (Graph); browser access is explicitly disallowed.
  • HTTPS-only, TLS 1.2 minimum, FTPS disabled, HTTP/2 enabled on the Function App.
  • shared_access_key_enabled = true on the Storage Account is a known constraint of the Azure Functions Consumption plan: the runtime requires the legacy AzureWebJobsStorage connection string to bootstrap. The connector’s own state operations use RBAC via the managed identity, not the keys. To remove the keys entirely, you need to move to a Premium / Flex Consumption / App Service plan that supports identity-based connections; that path is on the ROOTKey roadmap.

Filtering Rules

To anchor only specific files (e.g., only PDFs, or exclude temporary files), configure Filtering Rules on the connector after creation. Rules apply on the ROOTKey side — files filtered out are not stored in the vault.

Troubleshooting

Check in this order:
  1. Subscriptions registered. Inspect subscriptions.json in the connector-state container; there should be one entry per document library on the site. If it is absent, the reconciliation timer hasn’t run successfully yet; force it via the Azure Portal (Function App → renewSubscription → Code + Test → Run).
  2. DLQ contents. Use az storage message peek on rootkey-dlq. If messages are present, look at rootkey-dlq-poison too — that’s where messages land after all replays fail.
  3. App Insights traces. Run the Kusto query in Reliability and observability to find recent errors.
  4. Graph permissions. In Microsoft Entra ID → App registrations → API permissions, both Sites.Read.All and Files.Read.All must show admin consent (green check).
  5. Site access. Tenant-wide Graph permissions usually suffice, but custom site permission policies in your tenant (e.g., Sites.Selected workflows) can block the App Registration from reading the site. Check with your SharePoint admin.
Common causes:
  • Client Secret has expired. Generate a new secret in the App Registration, update the graph_client_secret Terraform variable, and run terraform apply. The new secret is written to Key Vault and the Function App picks it up on the next cold start (or restart the Function App to force it).
  • Admin consent was revoked for one of the Graph permissions. Re-grant consent in the App Registration.
  • The destination ROOTKey vault was deactivated or deleted. Reactivate it or change the connector’s vault binding.
The dashboard error panel shows the underlying message from the ROOTKey API or the Graph service.
The reconciliation timer picks up new libraries every 12h. To force an immediate pick-up, restart the Function App (e.g., az functionapp restart); runOnStartup: true on the timer triggers reconciliation on the next cold start.
If the library still doesn’t appear in subscriptions.json after a restart, check the App Insights traces for the renewSubscription invocation. The most likely cause is a permission issue with that specific library.
The Function App’s memory and max_file_size_bytes together cap the maximum file size. The default is 500 MiB on a Consumption plan instance.
To support larger files, raise max_file_size_bytes in the Terraform module input and consider moving to a Premium or App Service plan with a higher memory ceiling. Open an issue on the connector repository if you need help.
To issue a new Connector API Key:
  1. In the dashboard, delete the connector and create a new one (the App Registration and Site URL can be reused).
  2. Update the rootkey_api_key Terraform variable with the new key.
  3. Run terraform apply. The module writes the new key into Key Vault, and the Function App picks it up on the next cold start.
To rotate the Graph client secret:
  1. In Microsoft Entra ID → App registrations → your app → Certificates & secrets, create a new client secret. Copy its value immediately.
  2. Update the graph_client_secret Terraform variable.
  3. Run terraform apply. The new value goes into Key Vault.
  4. Restart the Function App (e.g., az functionapp restart) to force it to pick up the new secret immediately, instead of waiting for the cached OAuth token to expire (up to 1 h).
  5. After confirming the connector is healthy, delete the old secret in the App Registration.
To monitor multiple sites, deploy the module once per site. Each instance is fully isolated: its own Function App, Key Vault, storage account, DLQ, and identity, namespaced by name_suffix and a hash of the site URL. You can reuse the same App Registration and Resource Group across sites.
The poison queue holds messages that the queue trigger could not process after all retries. Inspect each message — it includes the original DLQ payload (driveId, itemId, fileName, size, eTag, error). Common root causes:
  • The drive item was deleted or moved by the user before retries could complete (safe — the message can be discarded).
  • The ROOTKey vault is unreachable due to a misconfiguration or a key/vault rotation gone wrong.
  • A persistent Graph permission issue affecting one library.
Once the root cause is resolved you can replay a message by copying it back to rootkey-dlq (Azure Storage Explorer makes this easy).
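When triaging poison messages in bulk, a small parser that checks for the fields listed above can help separate well-formed payloads from unexpected ones. The sketch below assumes the message body is a JSON object with exactly those field names and types. The actual encoding and schema are not documented here, and `parseDlqPayload` is a hypothetical helper.

```typescript
// Assumed shape of a DLQ payload, based on the fields listed above.
interface DlqPayload {
  driveId: string;
  itemId: string;
  fileName: string;
  size: number;
  eTag: string;
  error: string;
}

// Returns the parsed payload, or null if the text is not a well-formed payload.
function parseDlqPayload(text: string): DlqPayload | null {
  let raw: unknown;
  try {
    raw = JSON.parse(text);
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;
  const { driveId, itemId, fileName, size, eTag, error } = raw as Record<string, unknown>;
  if (
    typeof driveId !== "string" ||
    typeof itemId !== "string" ||
    typeof fileName !== "string" ||
    typeof size !== "number" ||
    typeof eTag !== "string" ||
    typeof error !== "string"
  ) {
    return null;
  }
  return { driveId, itemId, fileName, size, eTag, error };
}
```

Grouping parsed payloads by `driveId` or `error` is usually enough to tell a single-library permission problem from a vault-wide outage.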
With enable_key_vault_purge_protection = true (the default), Azure prevents the Key Vault from being fully deleted until the soft-delete retention window (7 days) elapses. If you destroy and re-apply within that window, Terraform may attempt to recover the soft-deleted Key Vault automatically (the module’s provider config enables recover_soft_deleted_key_vaults). For short-lived pilots that need to recycle freely, set enable_key_vault_purge_protection = false.

Source code

The Terraform module and Function App source live in the public rootkey-ai/rootkey-connectors repository under the sharepoint/ directory. The code is licensed under the Apache License 2.0 — you are free to fork it, audit it, or pin to a specific commit if your change-management process requires it.