Script not Working for large data in C#
The arrayGroup below will be containing almost 190,000 int arrays.
so filtering them to retrieve groups of arrays in the array group that have two or more common digits is taking forever.
I am using the linq expression below, any ideas to make this more efficient?
List<int[]> arrayGroup = new List<int[]>();
var filteredGroups = arrayGroup
.SelectMany(arr1 => arrayGroup
.Where(arr2 => arr1 != arr2) // Don't compare array to itself
.Select(arr2 => new
{
Source = arr1,
Target = arr2,
Common = arr1.Intersect(arr2).Distinct().Count()
})
)
.Where(match => match.Common >= 2)
.GroupBy(match => match.Source)
.Select(group => group.Key)
.ToList();
1 answer
-
AI answer
The current query compares every array with every other array and, for each pair, computes
IntersectandDistinct().Count(). With ~190,000 arrays this is effectively an O(n²) algorithm and will be extremely slow regardless of LINQ syntax.To make it efficient, avoid pairwise comparison and instead index arrays by their digits.
Example approach:
- Build a map from each digit to the arrays that contain it.
- For each array, count how many other arrays share at least two digits with it using that map.
Sample code (no LINQ-heavy nested queries):
List<int[]> arrayGroup = new List<int[]>(); // Map: digit -> arrays containing that digit var digitIndex = new Dictionary<int, List<int[]>>(); foreach (var arr in arrayGroup) { foreach (var d in arr.Distinct()) { if (!digitIndex.TryGetValue(d, out var list)) { list = new List<int[]>(); digitIndex[d] = list; } list.Add(arr); } } var result = new List<int[]>(); foreach (var source in arrayGroup) { var sharedCounts = new Dictionary<int[], int>(ReferenceEqualityComparer<int[]>.Instance); foreach (var d in source.Distinct()) { if (!digitIndex.TryGetValue(d, out var list)) continue; foreach (var target in list) { if (ReferenceEquals(source, target)) continue; if (!sharedCounts.TryGetValue(target, out var count)) count = 0; count++; sharedCounts[target] = count; } } // Check if any target shares at least 2 digits if (sharedCounts.Values.Any(c => c >= 2)) result.Add(source); }ReferenceEqualityComparer<int[]>is a comparer that compares arrays by reference (so each array instance is a key). If needed, implement it as:public sealed class ReferenceEqualityComparer<T> : IEqualityComparer<T> where T : class { public static readonly ReferenceEqualityComparer<T> Instance = new(); private ReferenceEqualityComparer() { } public bool Equals(T x, T y) => ReferenceEquals(x, y); public int GetHashCode(T obj) => RuntimeHelpers.GetHashCode(obj); }This approach avoids the full cross-product and uses the digit index to only touch arrays that actually share digits, which is far more scalable than the original nested
SelectMany/Intersectquery.
References:
AI-generated content may be incorrect. Read our transparency notes for more information.
