
Apache Spark

DRAFT: This mapping definition is a work in progress and may be subject to further change.

- Spark data types are taken from here: Link
- Datasphere data types are taken from here: Link
| CDS | Spark / Delta Lake | Datasphere | Comment | Spark format |
| --- | --- | --- | --- | --- |
| cds.Boolean | BOOLEAN | cds.Boolean | | |
| cds.String (length) | STRING | cds.String | Datasphere logic: IF cds.String(length = undefined) THEN cds.String(length = 5000) | |
| cds.LargeString (length) | STRING | cds.LargeString | | |
| cds.Integer | INT | cds.Integer | | |
| cds.Integer64 | BIGINT | cds.Integer64 | | |
| cds.Decimal (precision = p, scale = s) | DECIMAL(p,s) | cds.Decimal | Datasphere logic: IF cds.Decimal(p < 17) THEN cds.Decimal(p = 17) | |
| cds.Decimal (precision = p, scale = floating) | not supported | cds.Decimal | Decimal with scale = floating is not supported in Spark | |
| Amounts with currencies: cds.Decimal (precision = 34, scale = 4) | cds.Decimal(34, 4) | cds.Decimal(34, 4) | Since Spark does not support cds.DecimalFloat, cds.Decimal(34, 4) is used as a compromise for now | |
| cds.Date | DATE | cds.Date | | "yyyyMMdd" |
| cds.Double | DOUBLE | cds.Double | | |
| cds.Time, expressed as cds.String(6) or cds.String(12) depending on the source representation for now, plus the annotation @Semantics.time: true | STRING | cds.String(6) or cds.String(12) | Data is in the format HHmmss or HH:mm:ss.SSS; the consumer must use the function to_time() to convert it to cds.Time | |
| cds.DateTime (second precision) | TIMESTAMP | cds.Timestamp | | |
| cds.Timestamp (microsecond precision) | TIMESTAMP | cds.Timestamp | | "yyyy-MM-dd'T'HH:mm:ss.SSSSSSS" |
| cds.UUID + the annotation @Semantics.uuid: true | STRING (36) | cds.UUID | | |
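The Datasphere logic in the comment column (defaulting an unspecified `cds.String` length to 5000, and raising a `cds.Decimal` precision below 17 up to 17) can be sketched in plain Python. The function names here are illustrative only and not part of any SAP or Spark API:

```python
def datasphere_string_length(length=None):
    """Datasphere defaults an unspecified cds.String length to 5000."""
    return length if length is not None else 5000

def datasphere_decimal(precision, scale):
    """Datasphere raises a cds.Decimal precision below 17 up to 17;
    the scale is kept as declared."""
    return (max(precision, 17), scale)

print(datasphere_string_length())           # unspecified length -> 5000
print(datasphere_string_length(120))        # explicit length is kept -> 120
print(datasphere_decimal(10, 2))            # precision raised -> (17, 2)
print(datasphere_decimal(34, 4))            # precision >= 17 kept -> (34, 4)
```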
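Because `cds.Time` travels as a string in one of the two formats from the table (`HHmmss` or `HH:mm:ss.SSS`), the consumer-side conversion referred to as `to_time()` might look like the following plain-Python sketch. The helper name and its behavior here are assumptions for illustration, not the actual Datasphere function:

```python
from datetime import time

def to_time(value: str) -> time:
    """Parse the two transport formats from the mapping table:
    HHmmss (cds.String(6)) or HH:mm:ss.SSS (cds.String(12))."""
    if len(value) == 6:                       # e.g. "134501"
        return time(int(value[0:2]), int(value[2:4]), int(value[4:6]))
    if len(value) == 12:                      # e.g. "13:45:01.250"
        hh, mm, rest = value.split(":")
        ss, millis = rest.split(".")
        return time(int(hh), int(mm), int(ss), int(millis) * 1000)
    raise ValueError(f"unexpected time format: {value!r}")

print(to_time("134501"))         # -> 13:45:01
print(to_time("13:45:01.250"))   # -> 13:45:01.250000
```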