
nucLeaRStarcraft t1_jcoo30z wrote

More or less the same. However, the simplest way to start, at least in my experience, is to take a random subsample of the real data. Synthetic data may be too simple or may not capture the real distribution, so it can hide bugs.

Using both is probably the ideal solution.
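
A rough sketch of the subsampling idea, assuming the real data fits in memory as a list of rows. The name `sample_fixture` and the `real_rows` stand-in are made up for illustration, not taken from any particular pipeline:

```python
import random


def sample_fixture(rows, k, seed=0):
    """Return up to k rows drawn without replacement, reproducibly."""
    # A dedicated Random instance keeps the draw independent of global state,
    # so the same seed always yields the same test fixture.
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


# Hypothetical stand-in for the real dataset.
real_rows = [{"id": i, "value": i * 0.5} for i in range(1000)]

fixture = sample_fixture(real_rows, k=20, seed=42)
```

Fixing the seed means a failing test can be reproduced against the exact same rows.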

3

gdpoc t1_jcperei wrote

It also depends on privacy constraints; sometimes you can't persist the data.

5

Fender6969 OP t1_jcrnzzg wrote

Many of the clients I support have rather sensitive data, and persisting it in a repo would be a security risk. I suppose creating synthetic data would be the next best alternative.
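
Something like the following is what I have in mind: a minimal sketch that generates the rows in memory at test time, so nothing sensitive is ever written to the repo. The column names and distributions are purely hypothetical placeholders, not a real client schema:

```python
import random


def make_synthetic_rows(n, seed=0):
    """Generate n fake rows in memory; nothing is read from or written to disk."""
    rng = random.Random(seed)
    return [
        {
            # All fields below are invented for illustration.
            "customer_id": f"cust-{i:05d}",
            "age": rng.randint(18, 90),  # inclusive on both ends
            "balance": round(rng.gauss(1000.0, 250.0), 2),
            "segment": rng.choice(["a", "b", "c"]),
        }
        for i in range(n)
    ]


rows = make_synthetic_rows(100, seed=7)
```

As noted above, data like this will not match the real distribution, so it is better suited to checking that the pipeline runs end to end than to catching data-dependent bugs.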

1