Duplicate Detection

The Duplicate Detection tab scans one or more SharePoint Online sites to find redundant files or folders. It supports flexible matching criteria and produces a visual HTML report with collapsible group cards that make it easy to review and act on findings.

Scan Modes

Duplicate Files

Uses the SharePoint Search API to retrieve file metadata across the site(s). Files are grouped by matching criteria. This mode is fast because it leverages the search index rather than enumerating library contents directly.
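As a rough illustration of why this mode avoids enumerating libraries, the sketch below builds one paged request URL for the SharePoint Search REST endpoint (`/_api/search/query`). The KQL text, the selected managed properties, and the page size are assumptions chosen for illustration; the query the tool actually sends is not documented here.

```python
from urllib.parse import quote

# Assumed managed properties; the tool may request a different set.
SELECT_PROPERTIES = "Title,Path,Size,Created,LastModifiedTime"


def build_search_url(site_url: str, start_row: int = 0, row_limit: int = 500) -> str:
    """Build one page of a Search REST query scoped to a single site."""
    site = site_url.rstrip("/")
    # Hypothetical KQL: documents whose path lies under the site URL.
    kql = quote(f'IsDocument:1 AND Path:"{site}"', safe="")
    props = quote(SELECT_PROPERTIES, safe=",")
    return (
        f"{site}/_api/search/query"
        f"?querytext='{kql}'"
        f"&selectproperties='{props}'"
        f"&startrow={start_row}&rowlimit={row_limit}"
        # Requested explicitly so the index does not collapse near-duplicates.
        "&trimduplicates=false"
    )
```

Paging would then repeat the request with `start_row` increased by `row_limit` until a page comes back with fewer rows than requested.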

Duplicate Folders

Enumerates all folders across libraries using the SharePoint REST API. Folders are grouped by matching criteria. This mode is more thorough but slower on large sites, because it must traverse the full folder tree.
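The cost of a full traversal can be seen in a minimal sketch: every folder requires its own listing call before its children are known. Here `list_subfolders` is a hypothetical callable standing in for one REST request; it is not a function of the tool or of any SharePoint library.

```python
from typing import Callable, Iterator, List


def walk_folders(root: str, list_subfolders: Callable[[str], List[str]]) -> Iterator[str]:
    """Yield every folder path under root, making one listing call per folder."""
    pending = [root]
    while pending:
        current = pending.pop()
        yield current
        # Each call stands in for one REST request against the site.
        pending.extend(list_subfolders(current))
```

The number of calls equals the number of folders, which is why large sites take noticeably longer in this mode.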

Matching Criteria

Criteria are combined with AND logic: items are grouped only when every selected criterion matches. Selecting more criteria makes matching stricter, which yields fewer but more reliable duplicate groups.

Always Applied

Criterion	Description
Name	The file or folder name (always used as the base grouping key)

Optional — Files

Criterion	Description
Same size	Files must have the same byte size
Same creation date	Files must have been created on the same date (the time of day is ignored)
Same modification date	Files must have been last modified on the same date (the time of day is ignored)

Optional — Folders

Criterion	Description
Same creation date	Folders must have the same creation timestamp
Same modification date	Folders must have the same last-modified timestamp
Same sub-folder count	Folders must contain the same number of immediate sub-folders
Same file count	Folders must contain the same number of immediate files

⚠️ Note: Combining name + same size + same modification date is the most reliable way to identify true file duplicates. Using name alone produces many false positives for commonly named files such as README.docx or Template.xlsx.
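The combined matching described in this section can be sketched as grouping on a composite key: the name plus the value of every selected criterion. The field names (`name`, `size`, `modified`) and the case-insensitive name comparison are assumptions for illustration, not the tool's confirmed behavior.

```python
from collections import defaultdict


def group_duplicates(items, criteria=()):
    """Group item dicts by name plus every selected optional criterion."""
    groups = defaultdict(list)
    for item in items:
        # Name is always part of the key; lower-casing it is an assumption.
        key = (item["name"].lower(),) + tuple(item[c] for c in criteria)
        groups[key].append(item)
    # Only keys shared by two or more items form a duplicate group.
    return [members for members in groups.values() if len(members) > 1]
```

Calling `group_duplicates(items)` groups on name alone, while `group_duplicates(items, ("size", "modified"))` requires all three values to agree, so a same-named file with a different size drops out of the group.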

Version History Exclusion

Paths containing /_vti_history/ are automatically excluded from all scans. This ensures that older versions stored in version history are not misidentified as duplicates of the current version.
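A minimal sketch of this exclusion is a substring test applied before grouping. The function names are hypothetical, and the case-insensitive comparison is an assumption.

```python
def is_version_history_path(path: str) -> bool:
    """Return True when the path points into SharePoint version history."""
    # Case-insensitive comparison is an assumption, not documented behavior.
    return "/_vti_history/" in path.lower()


def exclude_version_history(paths):
    """Keep only paths that refer to current versions."""
    return [p for p in paths if not is_version_history_path(p)]
```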

Running a Scan

Connect (see Connection and Profiles).
Switch to the Duplicate Detection tab.
Select Mode: Files or Folders.
Select the criteria to apply (checkboxes).
Choose CSV or HTML output.
Click Scan.

Reading the HTML Report

The HTML report organizes duplicate groups into collapsible cards. Each card represents one group of items that match all selected criteria.

Status Badges

Identical — All items in the group are identical across every captured attribute.

Differences detected — Items share the matched criteria but differ in at least one other attribute.

For example, when matching on name and size only, two files named Report.docx with the same size but different modification dates show Differences detected.
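One way to read the badge rule is as a check over every captured attribute of the group. This is a sketch under that reading; the attribute names passed in are hypothetical, and the set of attributes the report actually captures is not listed here.

```python
def status_badge(group, attributes):
    """Return the badge text for one duplicate group of item dicts."""
    for attr in attributes:
        # More than one distinct value means the items differ on this attribute.
        if len({item[attr] for item in group}) > 1:
            return "Differences detected"
    return "Identical"
```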

Value Highlighting

Color	Meaning
Green	This value matches all other items in the group
Orange	This value differs from at least one other item
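Taken literally, the color rule compares each value with the same attribute on every other item in the group. The sketch below applies that literal reading to one column of values; the function name and the color strings are illustrative assumptions.

```python
def highlight_colors(values):
    """Return one color per value: green if it equals all others, else orange."""
    colors = []
    for i, value in enumerate(values):
        others = values[:i] + values[i + 1:]
        colors.append("green" if all(value == o for o in others) else "orange")
    return colors
```

Under this reading, a single differing item turns the whole column orange, because every other value then differs from at least one item.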

Navigation Controls

Click a card header to expand or collapse the group details.
Use Expand All / Collapse All buttons to manage all groups at once.
Use the filter box to show only groups whose name contains the typed text.

Output Formats

CSV

Semicolon-delimited, UTF-8 with BOM. Each row represents one item, with a GroupId column to link items in the same duplicate group.

Filenames: Duplicates_Files_<YYYYMMDD>.csv / Duplicates_Folders_<YYYYMMDD>.csv
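For downstream processing, a file in this format can be read with Python's standard `csv` module. The sketch below relies only on the GroupId column named above; the function name is hypothetical, and any other column names depend on the export.

```python
import csv
from collections import defaultdict


def load_groups(csv_path):
    """Read a duplicates CSV and return a dict mapping GroupId to its rows."""
    groups = defaultdict(list)
    # "utf-8-sig" strips the BOM so the first header name is read cleanly.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        for row in csv.DictReader(handle, delimiter=";"):
            groups[row["GroupId"]].append(row)
    return dict(groups)
```

Opening the file as plain `utf-8` would leave the BOM attached to the first column header, so a lookup on that column would fail.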

HTML

Filenames: Duplicates_Files_<YYYYMMDD>.html / Duplicates_Folders_<YYYYMMDD>.html
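The filename patterns for both formats can be reproduced with a small helper, for example to locate the report from a given day in a script. The function name is hypothetical, and which date the tool stamps (assumed here to be the scan date) is an assumption.

```python
from datetime import date


def report_filename(mode: str, extension: str, scan_date: date) -> str:
    """Build a report filename such as Duplicates_Files_20240307.csv."""
    # mode is "Files" or "Folders"; extension is "csv" or "html".
    return f"Duplicates_{mode}_{scan_date:%Y%m%d}.{extension}"
```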

Recommended Workflows

Quick File Duplicate Audit

Mode: Files
Criteria: Name + Same size + Same modification date
Sort the HTML report by group size (groups with the most items first)
Investigate groups flagged as Identical first; these are the strongest candidates to review for deletion
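If you work from the CSV output instead of the HTML report, the same largest-first ordering can be sketched as a sort on group size. The input is assumed to be a mapping of GroupId to the list of rows in that group; the function name is hypothetical.

```python
def largest_groups_first(groups):
    """Return (group_id, rows) pairs ordered by number of items, descending."""
    return sorted(groups.items(), key=lambda pair: len(pair[1]), reverse=True)
```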

Cross-Site Duplicate Check

Select multiple sites in the site picker
Run the scan. The SiteUrl column identifies which site each item belongs to
Groups containing items from different sites indicate cross-site duplication
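The cross-site check above amounts to counting distinct SiteUrl values per group. The sketch assumes a mapping of GroupId to row dicts that each carry a SiteUrl key; normalizing case and trailing slashes before comparing is an assumption.

```python
def cross_site_groups(groups):
    """Return only the groups whose items come from more than one site."""
    result = {}
    for group_id, rows in groups.items():
        # Normalize so the same site written two ways is not counted twice.
        sites = {row["SiteUrl"].rstrip("/").lower() for row in rows}
        if len(sites) > 1:
            result[group_id] = rows
    return result
```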