Apache Spark

DRAFT: This mapping definition is work in progress and subject to further change.

  • Spark data types are taken from: Link
  • Datasphere data types are taken from: Link
| CDS | Spark / Delta Lake | Datasphere | Comment |
| --- | --- | --- | --- |
| cds.Boolean | BOOLEAN | cds.Boolean | |
| cds.String (length) | STRING | cds.String | Datasphere logic: IF cds.String(length = undefined) THEN cds.String(length = 5000) |
| cds.LargeString | STRING | cds.LargeString | TODO: Check this. No length limit? |
| cds.Integer | INT | cds.Integer | |
| cds.Integer64 | BIGINT | cds.Integer64 | |
| cds.Decimal (precision, scale) | DECIMAL(p, s) | cds.Decimal | Datasphere logic: IF cds.Decimal(p < 17) THEN cds.Decimal(p = 17) |
| cds.Decimal (precision = 34, scale = floating) | not supported | cds.DecimalFloat | Decimal with scale = floating is not supported in Spark |
| Amounts with currencies: cds.Decimal (precision = 34, scale = 4) | DECIMAL(34, 4) | cds.Decimal(34, 4) | Since Spark does not support cds.DecimalFloat, we use cds.Decimal(34, 4) as a compromise for now |
| cds.Double (precision, scale) | DECIMAL(p, s) | cds.Double | Datasphere logic: IF cds.Double(precision, scale) THEN cds.Double() (precision and scale are dropped) |
| cds.Date | DATE | cds.Date | |
| cds.Time + the annotation @Semantics.time: true | STRING | cds.String(6) or cds.String(12) | For now, cds.Time must be expressed as cds.String(6) or cds.String(12), depending on the source representation. Data is in format HHmmss or HH:mm:ss.SSS; the consumer must use the function to_time() to convert it back to cds.Time |
| cds.DateTime (second precision) | TIMESTAMP | cds.Timestamp | |
| cds.Timestamp (µs precision) | TIMESTAMP | cds.Timestamp | |
| cds.UUID + the annotation @Semantics.uuid: true | STRING(36) | cds.UUID | |
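The scalar mappings and the Datasphere normalization rules above can be sketched as plain Python. This is an illustrative sketch only, not an official API; the dictionary and helper names are made up for this example.

```python
# Scalar CDS types with a fixed Spark / Delta Lake counterpart
# (illustrative table, names chosen for this sketch only).
CDS_TO_SPARK = {
    "cds.Boolean": "BOOLEAN",
    "cds.String": "STRING",
    "cds.LargeString": "STRING",
    "cds.Integer": "INT",
    "cds.Integer64": "BIGINT",
    "cds.Date": "DATE",
    "cds.DateTime": "TIMESTAMP",   # second precision
    "cds.Timestamp": "TIMESTAMP",  # microsecond precision
    "cds.UUID": "STRING",          # STRING(36) + @Semantics.uuid: true
}

def datasphere_string_length(length=None):
    """Datasphere logic: an undefined length defaults to 5000."""
    return 5000 if length is None else length

def datasphere_decimal_precision(precision):
    """Datasphere logic: precisions below 17 are raised to 17."""
    return max(precision, 17)
```

So `cds.String` without a length becomes `cds.String(5000)`, and `cds.Decimal(10, 2)` becomes `cds.Decimal(17, 2)` on the Datasphere side.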
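For the cds.Time workaround, a consumer-side conversion equivalent to the to_time() function mentioned above could look like the following. This is a hypothetical helper for illustration, not part of any shipped library; it only assumes the two string encodings the table names (HHmmss and HH:mm:ss.SSS).

```python
from datetime import time

def parse_cds_time(value: str) -> time:
    """Convert a string-encoded time back to a time value.

    Hypothetical stand-in for the consumer's to_time() conversion:
    cds.String(6) carries HHmmss, cds.String(12) carries HH:mm:ss.SSS.
    """
    if len(value) == 6:   # cds.String(6): HHmmss
        return time(int(value[0:2]), int(value[2:4]), int(value[4:6]))
    if len(value) == 12:  # cds.String(12): HH:mm:ss.SSS
        hh, mm, rest = value.split(":")
        ss, ms = rest.split(".")
        return time(int(hh), int(mm), int(ss), int(ms) * 1000)
    raise ValueError(f"unexpected time encoding: {value!r}")
```

For example, `parse_cds_time("123456")` and `parse_cds_time("12:34:56.789")` both yield a time of 12:34:56, the latter with millisecond precision preserved.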