Duplicate Detection

The Duplicate Detection tab scans one or more SharePoint Online sites to find redundant files or folders. It supports flexible matching criteria and produces a visual HTML report with collapsible group cards that make it easy to review and act on findings.


Scan Modes

Duplicate Files

Uses the SharePoint Search API to retrieve file metadata across the site(s). Files are grouped by matching criteria. This mode is fast because it leverages the search index rather than enumerating library contents directly.
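
As a rough illustration of what a query against the search index can look like, the sketch below builds one page of a request to the SharePoint Search REST endpoint (`/_api/search/query`). The KQL filter, the selected properties, the paging values, and the `build_search_url` helper are assumptions made for this example; the query the tool actually issues is not documented here, and `contoso.sharepoint.com` is a placeholder tenant.

```python
from urllib.parse import quote


def build_search_url(site_url: str, start_row: int = 0, row_limit: int = 500) -> str:
    """Build the URL for one page of document results below site_url."""
    # Assumed KQL filter: documents only, restricted to the given site path.
    kql = f'IsDocument:1 AND Path:"{site_url}"'
    # Assumed property list; the tool may request different managed properties.
    props = "Filename,Path,Size,Created,LastModifiedTime"
    return (
        f"{site_url.rstrip('/')}/_api/search/query"
        f"?querytext='{quote(kql)}'"
        f"&selectproperties='{props}'"
        f"&rowlimit={row_limit}&startrow={start_row}"
    )


# Second page of results for a placeholder site.
url = build_search_url("https://contoso.sharepoint.com/sites/hr", start_row=500)
```

Each page is a single request answered from the index, which is consistent with this mode being faster than walking every library.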

Duplicate Folders

Enumerates all folders across libraries using the SharePoint REST API. Folders are grouped by matching criteria. This mode is more thorough but slower on large sites, because it must traverse the full folder tree.
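
The cost of a full traversal can be sketched as follows. The `walk_folders` function and the `list_subfolders` callable are hypothetical; in a real scan `list_subfolders` would stand for one REST request per folder, which is why the run time grows with the size of the folder tree. Here an in-memory dictionary plays that role.

```python
def walk_folders(root, list_subfolders):
    """Yield every folder path under root, depth first, without recursion."""
    stack = [root]
    while stack:
        folder = stack.pop()
        yield folder
        # One lookup per folder; a real scan would issue one request here.
        stack.extend(list_subfolders(folder))


# Hypothetical folder tree standing in for a document library.
tree = {
    "/Docs": ["/Docs/A", "/Docs/B"],
    "/Docs/A": ["/Docs/A/Old"],
}

all_folders = list(walk_folders("/Docs", lambda path: tree.get(path, [])))
```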


Matching Criteria

Criteria are combined with AND: items are grouped only when they match on every selected criterion. The more criteria you select, the stricter the matching, which yields fewer but more reliable duplicate groups.
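
One way to picture this is as grouping on a composite key: the name plus one value per selected criterion. The sketch below is a minimal illustration, not the tool's implementation; the field names (`name`, `size`, `modified`) and the case-insensitive name comparison are assumptions.

```python
from collections import defaultdict


def group_duplicates(items, criteria):
    """Group items by name plus every selected criterion; keep groups of 2+."""
    groups = defaultdict(list)
    for item in items:
        # Name is always part of the key (compared case-insensitively here,
        # which is an assumption); each selected criterion extends the key.
        key = (item["name"].lower(),) + tuple(item[c] for c in criteria)
        groups[key].append(item)
    return [group for group in groups.values() if len(group) > 1]


files = [
    {"name": "Report.docx", "size": 1024, "modified": "2024-03-01"},
    {"name": "Report.docx", "size": 1024, "modified": "2024-03-01"},
    {"name": "Report.docx", "size": 2048, "modified": "2024-05-09"},
]

by_name = group_duplicates(files, [])                    # name only
strict = group_duplicates(files, ["size", "modified"])   # name + size + date
```

With name alone, all three files land in one group; adding size and modification date narrows the group to the two files that agree on every selected value.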

Always Applied

| Criterion | Description |
| --- | --- |
| Name | The file or folder name (always used as the base grouping key) |

Optional — Files

| Criterion | Description |
| --- | --- |
| Same size | Files must have the same size in bytes |
| Same creation date | Files must have the same creation date (timestamps are compared by date only) |
| Same modification date | Files must have the same last-modified date (timestamps are compared by date only) |

Optional — Folders

| Criterion | Description |
| --- | --- |
| Same creation date | Folders must have the same creation timestamp |
| Same modification date | Folders must have the same last-modified timestamp |
| Same sub-folder count | Folders must contain the same number of immediate sub-folders |
| Same file count | Folders must contain the same number of immediate files |

⚠️ Note: Combining name + same size + same modification date is the most reliable way to identify true file duplicates. Using name alone produces many false positives for commonly named files such as README.docx or Template.xlsx.

Version History Exclusion

Paths containing /_vti_history/ are automatically excluded from all scans. This ensures that older versions stored in version history are not misidentified as duplicates of the current version.
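
The exclusion amounts to a simple path filter. The sketch below shows the idea; the function name and the case-insensitive comparison are assumptions, and the sample paths are made up.

```python
HISTORY_MARKER = "/_vti_history/"


def exclude_version_history(paths):
    """Drop every path that points into version history storage."""
    # Lower-casing before the check is an assumption about path casing.
    return [p for p in paths if HISTORY_MARKER not in p.lower()]


paths = [
    "/sites/hr/Shared Documents/Report.docx",
    "/sites/hr/_vti_history/512/Shared Documents/Report.docx",
]

current_only = exclude_version_history(paths)
```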


Running a Scan

  1. Connect (see Connection and Profiles).
  2. Switch to the Duplicate Detection tab.
  3. Select Mode: Files or Folders.
  4. Select the criteria to apply (checkboxes).
  5. Choose CSV or HTML output.
  6. Click Scan.

Reading the HTML Report

The HTML report organizes duplicate groups into collapsible cards. Each card represents one group of items that match all selected criteria.

Status Badges

Identical — All items in the group are identical across every captured attribute.

Differences detected — Items share the matched criteria but differ in at least one other attribute.

For example, if only Name and Same size are selected, two files named Report.docx with the same size but different modification dates form a group that shows Differences detected.

Value Highlighting

| Color | Meaning |
| --- | --- |
| Green | This value matches all other items in the group |
| Orange | This value differs from at least one other item |
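
Read literally, these rules mean that every value of an attribute gets the same color within a group: green when all items agree on it, orange otherwise. The status badge then follows from the colors. The sketch below illustrates that reading; the function names and the attribute names are hypothetical, and the report's actual rendering logic may differ.

```python
def column_colors(items, attributes):
    """Map each attribute to 'green' if all items agree on it, else 'orange'."""
    return {
        attr: "green" if len({item[attr] for item in items}) == 1 else "orange"
        for attr in attributes
    }


def group_status(items, attributes):
    """Return the badge text for a group based on its captured attributes."""
    colors = column_colors(items, attributes)
    if all(color == "green" for color in colors.values()):
        return "Identical"
    return "Differences detected"


group = [
    {"name": "Report.docx", "size": 1024, "modified": "2024-03-01"},
    {"name": "Report.docx", "size": 1024, "modified": "2024-05-09"},
]
attributes = ["name", "size", "modified"]

colors = column_colors(group, attributes)
status = group_status(group, attributes)
```

In this sample the name and size columns come out green and the modification date orange, so the group is reported as Differences detected.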

Navigation Controls


Output Formats

CSV

Semicolon-delimited, UTF-8 with BOM. Each row represents one item, with a GroupId column to link items in the same duplicate group.

Filenames: Duplicates_Files_<YYYYMMDD>.csv / Duplicates_Folders_<YYYYMMDD>.csv
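
For post-processing outside the tool, the export can be read with Python's standard `csv` module. The sketch below assumes only what is stated above (semicolon delimiter, UTF-8 with BOM, a GroupId column); the `Name` column and the sample rows are made up for illustration, and SiteUrl is the column mentioned under Cross-Site Duplicate Check.

```python
import csv
import io
from collections import defaultdict


def load_groups(handle):
    """Return {GroupId: [row, ...]} from a semicolon-delimited export."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter=";"):
        groups[row["GroupId"]].append(row)
    return dict(groups)


# Stand-in for a real export: UTF-8 bytes that start with a BOM.
raw = (
    "\ufeffGroupId;Name;SiteUrl\n"
    "1;Report.docx;https://contoso.sharepoint.com/sites/a\n"
    "1;Report.docx;https://contoso.sharepoint.com/sites/b\n"
    "2;Template.xlsx;https://contoso.sharepoint.com/sites/a\n"
    "2;Template.xlsx;https://contoso.sharepoint.com/sites/a\n"
).encode("utf-8")

# "utf-8-sig" strips the BOM so the first header reads "GroupId".
with io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8-sig", newline="") as handle:
    groups = load_groups(handle)
```

For a file on disk, the same function works with `open(path, encoding="utf-8-sig", newline="")`. Reading with plain `utf-8` would leave the BOM attached to the first column name.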

HTML

Filenames: Duplicates_Files_<YYYYMMDD>.html / Duplicates_Folders_<YYYYMMDD>.html


Recommended Workflows

Quick File Duplicate Audit

  1. Mode: Files
  2. Criteria: Name + Same size + Same modification date
  3. Sort the HTML report by group size (groups with the most items first)
  4. Start with groups flagged as Identical: these are the most likely true duplicates and the best candidates to review for deletion

Cross-Site Duplicate Check

  1. Select multiple sites in the site picker
  2. Run the scan — the SiteUrl column identifies which site each item belongs to
  3. Groups containing items from different sites indicate cross-site duplication
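
Step 3 can also be checked programmatically on exported rows. The sketch below is one possible approach under stated assumptions: rows are dictionaries with the GroupId and SiteUrl columns, and the sample data and function name are invented for the example.

```python
def cross_site_groups(rows):
    """Return the GroupIds whose items come from more than one site."""
    sites_by_group = {}
    for row in rows:
        sites_by_group.setdefault(row["GroupId"], set()).add(row["SiteUrl"])
    return sorted(gid for gid, sites in sites_by_group.items() if len(sites) > 1)


rows = [
    {"GroupId": "1", "SiteUrl": "https://contoso.sharepoint.com/sites/a"},
    {"GroupId": "1", "SiteUrl": "https://contoso.sharepoint.com/sites/b"},
    {"GroupId": "2", "SiteUrl": "https://contoso.sharepoint.com/sites/a"},
    {"GroupId": "2", "SiteUrl": "https://contoso.sharepoint.com/sites/a"},
]

cross_site = cross_site_groups(rows)
```

Group 1 spans two sites and is reported; group 2 stays within a single site and is not.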

See Also