Duplicate Detection
The Duplicate Detection tab scans one or more SharePoint Online sites to find redundant files or folders. It supports flexible matching criteria and produces a visual HTML report with collapsible group cards that make it easy to review and act on findings.
Scan Modes
Duplicate Files
Uses the SharePoint Search API to retrieve file metadata across the site(s). Files are grouped by matching criteria. This mode is fast because it leverages the search index rather than enumerating library contents directly.
Duplicate Folders
Enumerates all folders across libraries using the SharePoint REST API. Folders are grouped by matching criteria. This mode is more thorough but slower on large sites, because it must traverse the full folder tree.
Matching Criteria
Criteria are combinable: a group is formed only when all selected criteria match. The more criteria you select, the stricter the matching, which yields fewer but more reliable duplicate groups.
Always Applied
| Criterion | Description |
|---|---|
| Name | The file or folder name (always used as the base grouping key) |
Optional — Files
| Criterion | Description |
|---|---|
| Same size | Files must have the same byte size |
| Same creation date | Files must have the same creation date (the time of day is ignored) |
| Same modification date | Files must have the same last-modified date (the time of day is ignored) |
Optional — Folders
| Criterion | Description |
|---|---|
| Same creation date | Folders must have the same creation timestamp |
| Same modification date | Folders must have the same last-modified timestamp |
| Same sub-folder count | Folders must contain the same number of immediate sub-folders |
| Same file count | Folders must contain the same number of immediate files |
⚠️ Note: Combining name + same size + same modification date is the most reliable way to identify true file duplicates. Using name alone produces many false positives for commonly named files such as README.docx or Template.xlsx.
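The way combined criteria narrow the results can be illustrated with a composite grouping key. This is a minimal sketch, not the tool's actual implementation: the `FileItem` record, its field names, the `find_duplicates` helper, and the case-insensitive name comparison are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FileItem:
    # Hypothetical record; field names are illustrative, not the tool's schema.
    name: str
    size: int
    created: datetime
    modified: datetime


def group_key(item, same_size=False, same_created=False, same_modified=False):
    """Build the composite key: the name is always included, the rest are optional."""
    key = [item.name.lower()]  # assumption: names are compared case-insensitively
    if same_size:
        key.append(item.size)
    if same_created:
        key.append(item.created.date())  # date only, time of day ignored
    if same_modified:
        key.append(item.modified.date())
    return tuple(key)


def find_duplicates(items, **criteria):
    """Group items by the composite key and keep groups with two or more members."""
    groups = defaultdict(list)
    for item in items:
        groups[group_key(item, **criteria)].append(item)
    return [group for group in groups.values() if len(group) > 1]


items = [
    FileItem("Report.docx", 1024, datetime(2024, 1, 5, 9, 0), datetime(2024, 2, 1, 10, 0)),
    FileItem("Report.docx", 1024, datetime(2024, 1, 5, 15, 0), datetime(2024, 3, 1, 10, 0)),
    FileItem("Notes.txt", 10, datetime(2024, 1, 6, 9, 0), datetime(2024, 1, 6, 9, 0)),
]

name_only = find_duplicates(items)
strict = find_duplicates(items, same_size=True, same_modified=True)
print(len(name_only), len(strict))
```

With name alone, the two Report.docx files form a group. Adding the modification date to the key splits them apart, because each extra criterion extends the key that items must share.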
Version History Exclusion
Paths containing /_vti_history/ are automatically excluded from all scans. This ensures that older versions stored in version history are not misidentified as duplicates of the current version.
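If you post-process item paths yourself, the same exclusion can be approximated with a substring check. This is a sketch under stated assumptions: the helper names and sample paths are hypothetical, and the case-insensitive comparison is a choice made here, not something the tool is documented to do.

```python
VERSION_HISTORY_MARKER = "/_vti_history/"


def is_version_history(path):
    """Return True when the path points into version history storage."""
    # Assumption: compare case-insensitively to tolerate differently cased URLs.
    return VERSION_HISTORY_MARKER in path.lower()


def exclude_version_history(paths):
    """Keep only paths that refer to current versions."""
    return [p for p in paths if not is_version_history(p)]


# Illustrative paths only.
paths = [
    "/sites/hr/Shared Documents/Policy.docx",
    "/sites/hr/_vti_history/512/Shared Documents/Policy.docx",
]
current = exclude_version_history(paths)
print(current)
```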
Running a Scan
- Connect (see Connection and Profiles).
- Switch to the Duplicate Detection tab.
- Select Mode: Files or Folders.
- Select the criteria to apply (checkboxes).
- Choose CSV or HTML output.
- Click Scan.
Reading the HTML Report
The HTML report organizes duplicate groups into collapsible cards. Each card represents one group of items that match all selected criteria.
Status Badges
Identical — All items in the group are identical across every captured attribute.
Differences detected — Items share the matched criteria but differ in at least one other attribute.
For example, when matching on name and size only, two files named Report.docx with the same size but different modification dates show Differences detected.
Value Highlighting
| Color | Meaning |
|---|---|
| Green | This value matches all other items in the group |
| Orange | This value differs from at least one other item |
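The badge and highlighting rules above can be sketched together. This is an illustrative reading of the rules, not the report's actual code: items are modeled as plain dictionaries, and the attribute names (`Name`, `Size`, `Modified`) and function names are assumptions.

```python
def attribute_colors(items, attributes):
    """Map each attribute to 'green' (all values equal) or 'orange' (any difference)."""
    colors = {}
    for attr in attributes:
        distinct = {item[attr] for item in items}
        colors[attr] = "green" if len(distinct) == 1 else "orange"
    return colors


def group_badge(colors):
    """A group is 'Identical' only when every captured attribute is green."""
    if all(color == "green" for color in colors.values()):
        return "Identical"
    return "Differences detected"


# Hypothetical group matched on name and size only.
group = [
    {"Name": "Report.docx", "Size": 1024, "Modified": "2024-02-01"},
    {"Name": "Report.docx", "Size": 1024, "Modified": "2024-03-01"},
]
colors = attribute_colors(group, ["Name", "Size", "Modified"])
print(colors, group_badge(colors))
```

Under these rules the color is a property of the whole attribute within a group: as soon as two values differ, every item's value differs from at least one other, so all of them are orange.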
Navigation Controls
- Click a card header to expand or collapse the group details.
- Use Expand All / Collapse All buttons to manage all groups at once.
- Use the filter box to show only groups whose name contains the typed text.
Output Formats
CSV
Semicolon-delimited, UTF-8 with BOM. Each row represents one item, with a GroupId column to link items in the same duplicate group.
Filenames: Duplicates_Files_<YYYYMMDD>.csv / Duplicates_Folders_<YYYYMMDD>.csv
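For downstream processing, the CSV can be read with Python's standard `csv` module. The sketch below assumes only what is stated above (semicolon delimiter, UTF-8 with BOM, a `GroupId` column); the `Name` column and the sample rows are hypothetical. The `utf-8-sig` encoding strips the BOM so that the first header is read as `GroupId` rather than with a stray leading character.

```python
import csv
import io
from collections import defaultdict


def load_groups(stream):
    """Read semicolon-delimited rows and bucket them by their GroupId value."""
    reader = csv.DictReader(stream, delimiter=";")
    groups = defaultdict(list)
    for row in reader:
        groups[row["GroupId"]].append(row)
    return dict(groups)


# In-memory stand-in for a report file; encoding with utf-8-sig prepends the BOM.
sample_bytes = (
    "GroupId;Name;SiteUrl\r\n"
    "1;Report.docx;https://contoso.sharepoint.com/sites/a\r\n"
    "1;Report.docx;https://contoso.sharepoint.com/sites/b\r\n"
    "2;Budget.xlsx;https://contoso.sharepoint.com/sites/a\r\n"
    "2;Budget.xlsx;https://contoso.sharepoint.com/sites/a\r\n"
).encode("utf-8-sig")

# For a real file: open(path, encoding="utf-8-sig", newline="")
stream = io.TextIOWrapper(io.BytesIO(sample_bytes), encoding="utf-8-sig", newline="")
groups = load_groups(stream)
for group_id, rows in groups.items():
    print(group_id, len(rows))
```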
HTML
Filenames: Duplicates_Files_<YYYYMMDD>.html / Duplicates_Folders_<YYYYMMDD>.html
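When collecting reports from scripts, the naming pattern above can be matched with a regular expression. This is a small sketch; the `parse_report_name` helper is hypothetical and assumes the date stamp is a valid calendar date in YYYYMMDD form.

```python
import re
from datetime import datetime

# Matches Duplicates_Files_<YYYYMMDD>.csv and the Folders / html variants.
REPORT_NAME = re.compile(r"^Duplicates_(Files|Folders)_(\d{8})\.(csv|html)$")


def parse_report_name(filename):
    """Return (mode, date, extension), or None if the name does not fit the pattern."""
    match = REPORT_NAME.match(filename)
    if match is None:
        return None
    mode, stamp, extension = match.groups()
    # strptime raises ValueError for impossible dates such as 20241345.
    return mode, datetime.strptime(stamp, "%Y%m%d").date(), extension


parsed = parse_report_name("Duplicates_Folders_20240307.html")
print(parsed)
```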
Recommended Workflows
Quick File Duplicate Audit
- Mode: Files
- Criteria: Name + Same size + Same modification date
- Sort the HTML report by group size (groups with the most items first)
- Investigate groups flagged as Identical first; these are the strongest candidates for deletion after review
Cross-Site Duplicate Check
- Select multiple sites in the site picker
- Run the scan; the SiteUrl column identifies which site each item belongs to
- Groups containing items from different sites indicate cross-site duplication
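The cross-site check can be automated on exported rows by counting distinct SiteUrl values per GroupId. This is a sketch under assumptions: rows are modeled as dictionaries with `GroupId` and `SiteUrl` keys, the sample URLs are illustrative, and normalizing URLs by case and trailing slash is a choice made here rather than documented behavior.

```python
from collections import defaultdict


def cross_site_groups(rows):
    """Return the GroupId values whose items come from more than one site."""
    sites_by_group = defaultdict(set)
    for row in rows:
        # Assumption: treat URLs differing only by case or trailing slash as one site.
        site = row["SiteUrl"].rstrip("/").lower()
        sites_by_group[row["GroupId"]].add(site)
    return sorted(gid for gid, sites in sites_by_group.items() if len(sites) > 1)


# Hypothetical rows, shaped like the CSV export.
rows = [
    {"GroupId": "1", "SiteUrl": "https://contoso.sharepoint.com/sites/a"},
    {"GroupId": "1", "SiteUrl": "https://contoso.sharepoint.com/sites/b"},
    {"GroupId": "2", "SiteUrl": "https://contoso.sharepoint.com/sites/a"},
    {"GroupId": "2", "SiteUrl": "https://contoso.sharepoint.com/sites/a/"},
]
result = cross_site_groups(rows)
print(result)
```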
See Also
- File Search — Find specific files by name, type, date, or author
- Connection and Profiles — Multi-site selection
- Output Files — File naming conventions