pyspark.pandas.Series.compare

Series.compare(other: pyspark.pandas.series.Series, keep_shape: bool = False, keep_equal: bool = False) → pyspark.pandas.frame.DataFrame

Compare to another Series and show the differences.

Parameters

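other (Series): Object to compare with.
keep_shape (bool, default False): If True, all rows are kept. Otherwise, only the rows with different values are kept.
keep_equal (bool, default False): If True, the result keeps values that are equal. Otherwise, equal values are replaced with missing values (NaN or None). For a Series this only has a visible effect together with keep_shape=True, because with keep_shape=False every kept row already contains a difference.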
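To make the interaction of keep_shape and keep_equal concrete, here is a minimal pure-Python model of which rows and values end up in the result. It is a sketch under stated assumptions, not pyspark's implementation: `compare_sketch` is a hypothetical helper, both inputs are plain lists assumed to share the default index 0..n-1, masked values are represented as None, and missing values are not special-cased (see Notes).

```python
from typing import Any, List, Tuple

# (index label, "self" value, "other" value)
Row = Tuple[int, Any, Any]


def compare_sketch(
    self_vals: List[Any],
    other_vals: List[Any],
    keep_shape: bool = False,
    keep_equal: bool = False,
) -> List[Row]:
    """Hypothetical model of Series.compare's row and value selection.

    Assumes both inputs share the default index 0..n-1 and compares
    values with plain ==, so missing values are not handled here.
    """
    if len(self_vals) != len(other_vals):
        raise ValueError("inputs must have the same length")
    rows: List[Row] = []
    for label, (a, b) in enumerate(zip(self_vals, other_vals)):
        equal = a == b
        if equal and not keep_shape:
            continue  # keep_shape=False: drop rows without a difference
        if equal and not keep_equal:
            a, b = None, None  # keep_equal=False: mask equal values
        rows.append((label, a, b))
    return rows


s1 = ["a", "b", "c", "d", "e"]
s2 = ["a", "a", "c", "b", "e"]
print(compare_sketch(s1, s2))
# [(1, 'b', 'a'), (3, 'd', 'b')]
```

With the lists from the Examples, keep_shape=True adds the equal rows back as (label, None, None), and keep_shape=True with keep_equal=True returns every row with its original values. Passing keep_equal=True alone gives the same rows as the default call, which is why the flag matters only in combination with keep_shape for a Series.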

Returns

DataFrame

Notes

Matching NaNs will not appear as a difference.
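The rule that matching NaNs are not a difference amounts to a null-safe equality check. The sketch below illustrates the idea in plain Python; `is_missing` and `is_difference` are hypothetical helpers, and treating both None and float NaN as missing is an assumption of this sketch rather than a documented detail of pyspark.

```python
import math
from typing import Any


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def is_difference(a: Any, b: Any) -> bool:
    """Return True if the pair (a, b) would be reported as a difference.

    Two missing values match, so they are not a difference; a missing
    value paired with a present one is.
    """
    if is_missing(a) or is_missing(b):
        return not (is_missing(a) and is_missing(b))
    return a != b


pairs = [(1.0, 1.0), (float("nan"), float("nan")), (None, None), (None, 2.0), (3.0, 4.0)]
print([is_difference(a, b) for a, b in pairs])
# [False, False, False, True, True]
```

The special case matters because in plain Python `float("nan") != float("nan")` evaluates to True, so a naive `!=` would report every NaN pair as a difference. Spark SQL offers a comparable null-safe equality operator (`<=>`, exposed as `Column.eqNullSafe` in PySpark).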

Examples

>>> import pyspark.pandas as ps
>>> from pyspark.pandas.config import set_option, reset_option
>>> set_option("compute.ops_on_diff_frames", True)
>>> s1 = ps.Series(["a", "b", "c", "d", "e"])
>>> s2 = ps.Series(["a", "a", "c", "b", "e"])

Align the differences on columns

>>> s1.compare(s2).sort_index()
  self other
1    b     a
3    d     b

Keep all original rows

>>> s1.compare(s2, keep_shape=True).sort_index()
   self other
0  None  None
1     b     a
2  None  None
3     d     b
4  None  None

Keep all original rows and also all original values

>>> s1.compare(s2, keep_shape=True, keep_equal=True).sort_index()
  self other
0    a     a
1    b     a
2    c     c
3    d     b
4    e     e

>>> reset_option("compute.ops_on_diff_frames")
