Detecting duplicated code before you merge it could save you 10x the time compared with fixing it later. Softagram now detects new and existing copy-paste chunks and reports their locations during the code-review phase in our "Impact Report".
The report could show you, for example, the following (example from a GitHub discussion flow):
The report shows the file or location where the duplicated code was found, followed by the beginning of that code and the number of characters that are not shown.
Note: The Softagram Impact Report shows duplicate code only for the files that are modified in that specific Pull/Merge Request (or Code Review/Patch Set).
Softagram looks for copy-paste fragments using a minimum-tokens approach that operates on tokens instead of the exact source code text. This helps detect duplicate code despite whitespace differences.
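To illustrate why working on tokens makes the comparison insensitive to whitespace, here is a minimal sketch in Python. It is not Softagram's implementation: the tokenizer regex, the function names, and the simple sliding-window comparison are all assumptions made for this example, and a real detector would use a language-aware lexer and a faster matching strategy.

```python
import re
from collections import defaultdict

# Hypothetical, very rough tokenizer: identifiers, integers, and single
# punctuation characters. All whitespace is discarded.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def tokenize(source):
    """Split source text into a list of tokens, ignoring whitespace."""
    return TOKEN_RE.findall(source)


def find_duplicate_windows(sources, min_tokens=50):
    """Group positions that share an identical run of `min_tokens` tokens.

    `sources` maps a file name to its text. The result is a list of groups;
    each group is a list of (file_name, token_index) pairs where the same
    token window starts.
    """
    seen = defaultdict(list)
    for name, text in sources.items():
        tokens = tokenize(text)
        # Slide a fixed-size window over the token stream.
        for i in range(len(tokens) - min_tokens + 1):
            window = tuple(tokens[i:i + min_tokens])
            seen[window].append((name, i))
    # Keep only windows that occur more than once.
    return [locations for locations in seen.values() if len(locations) > 1]


if __name__ == "__main__":
    a = "total = price * qty + tax"
    b = "total   =   price*qty\n    + tax"  # same code, different whitespace
    for group in find_duplicate_windows({"a.py": a, "b.py": b}, min_tokens=5):
        print(group)
```

The two snippets differ as text but produce the same token stream, so every 5-token window in one is also found in the other. The example uses a small `min_tokens` value only so that it fits on a few lines.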
Hints for Softagram tool admins at on-premise customers:
The default minimum token count for a duplicate fragment is 50. It can be changed with the project-level setting analysis.duplicates.min_tokens_in_dupe_fragment
Duplicates may exist within the same file or across multiple files. Because duplicate detection runs at roughly 5-30 MLOC/h, and slows down as the code base grows, it is wise to consider disabling the feature for huge projects with more than 5 MLOC, and to create several smaller projects for independent repositories to ensure fast analysis.
This project-level key-value pair can be used to disable duplicate detection: feature.customization.disabled=duplicates
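As a rough aid for deciding whether to disable the feature, the quoted throughput can be turned into a time estimate. The sketch below is only back-of-the-envelope arithmetic: the function name is hypothetical, and it assumes a constant speed, whereas the actual speed decreases as the code base grows, so large projects may take longer than the upper bound shown here.

```python
def duplicate_scan_hours(mloc, slow_mloc_per_hour=5.0, fast_mloc_per_hour=30.0):
    """Return (best_case_hours, worst_case_hours) for scanning `mloc` MLOC.

    Assumes a constant throughput between the two quoted rates, which is a
    simplification made for this example.
    """
    if mloc < 0:
        raise ValueError("mloc must be non-negative")
    return mloc / fast_mloc_per_hour, mloc / slow_mloc_per_hour


if __name__ == "__main__":
    best, worst = duplicate_scan_hours(5.0)  # a 5 MLOC project
    print(f"{best * 60:.0f} to {worst * 60:.0f} minutes")
```

For a 5 MLOC project this gives a range of about 10 minutes to 1 hour under the constant-speed assumption.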