Converting a list to a set changes element order


Recently I noticed that when I convert a list to a set, the order of the elements changes and appears to be sorted by character.

Consider this example:

x = [1, 2, 20, 6, 210]
print x
# [1, 2, 20, 6, 210]  # same as the initial order

set(x)
# set([1, 2, 20, 210, 6])  # the order has changed and looks sorted

My questions are -

  1. Why is this happening?
  2. How can I do set operations (especially Set Difference) without losing the initial order?

Answers:


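  1. A set is an unordered data structure, so it does not preserve insertion order. The order you see is not a sort either: it is an implementation detail of how the elements are hashed into the set's internal table, and it should not be relied on.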

  2. This depends on your requirements. If you have a normal list and want to remove some set of elements while preserving the order of the list, you can do this with a list comprehension:

    >>> a = [1, 2, 20, 6, 210]
    >>> b = set([6, 20, 1])
    >>> [x for x in a if x not in b]
    [2, 210]
    

    If you need a data structure that supports both fast membership tests and preservation of insertion order, you can use the keys of a Python dictionary, which is guaranteed to preserve insertion order starting from Python 3.7:

    >>> a = dict.fromkeys([1, 2, 20, 6, 210])
    >>> b = dict.fromkeys([6, 20, 1])
    >>> dict.fromkeys(x for x in a if x not in b)
    {2: None, 210: None}
    

    b doesn't really need to be ordered here – you could use a set as well. Note that a.keys() - b.keys() returns the set difference as a set, so it won't preserve the insertion order.

    In older versions of Python, you can use collections.OrderedDict instead:

    >>> import collections
    >>> a = collections.OrderedDict.fromkeys([1, 2, 20, 6, 210])
    >>> b = collections.OrderedDict.fromkeys([6, 20, 1])
    >>> collections.OrderedDict.fromkeys(x for x in a if x not in b)
    OrderedDict([(2, None), (210, None)])
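
If you do this often, the ideas above can be wrapped in small helpers. The following is only a minimal sketch, assuming Python 3.7+ and hashable elements; the function names ordered_difference, ordered_intersection and ordered_union are made up for illustration and are not part of the standard library.

```python
def ordered_difference(items, exclude):
    """Unique elements of `items` that are not in `exclude`, in first-seen order."""
    excluded = set(exclude)  # a set gives fast membership tests
    # dict.fromkeys drops duplicates while keeping insertion order (Python 3.7+)
    return list(dict.fromkeys(x for x in items if x not in excluded))


def ordered_intersection(items, keep):
    """Unique elements of `items` that are also in `keep`, in first-seen order."""
    kept = set(keep)
    return list(dict.fromkeys(x for x in items if x in kept))


def ordered_union(first, second):
    """Unique elements of `first` followed by the new elements of `second`."""
    return list(dict.fromkeys([*first, *second]))


a = [1, 2, 20, 6, 210]
b = [6, 20, 1]

print(ordered_difference(a, b))    # [2, 210]
print(ordered_intersection(a, b))  # [1, 20, 6]
print(ordered_union(b, a))         # [6, 20, 1, 2, 210]
```

In each case the result follows the order of the first argument. The second argument is only used for membership tests, which is why converting it to a plain set is enough.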