We're one step closer to DataFrame 1.0! We've been working hard to iron out bugs and improve documentation and examples throughout.
Try out this release by updating to `1.0.0-Beta3` in your project or use `%use dataframe` in your notebook!
📚 Learn more: https://kotlin.github.io/dataframe/
🌟 Examples: https://github.com/Kotlin/dataframe/tree/master/examples
See below for a complete list of changes in this release grouped by category.
1.0.0-Beta3 Highlights
- NEW: Reading data from Parquet files
- NEW: Reading data from DuckDB databases
- Our KDocs and website have received many, many additions, as you can see below, but some that stand out are:
  - Docs about the compiler plugin: https://kotlin.github.io/dataframe/compiler-plugin.html
  - A setup guide with instructions for each supported platform: https://kotlin.github.io/dataframe/setup.html
  - (Un)supported Data sources overview: https://kotlin.github.io/dataframe/data-sources.html
  - DataFrame Concepts and Principles: https://kotlin.github.io/dataframe/concepts.html
  - Frequently Asked Questions: https://kotlin.github.io/dataframe/faq.html
  - Migration away from Gradle/KSP Plugin: https://kotlin.github.io/dataframe/migration-from-plugins.html
- New examples:
  - Using the compiler plugin in a project: https://github.com/Kotlin/dataframe/tree/master/examples/kotlin-dataframe-plugin-example
  - DataFrame on Android: https://github.com/Kotlin/dataframe/tree/master/examples/android-example
  - "Unsupported data sources" examples, including Exposed, Spark, Hibernate, and Multik: https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples
  - DataFrame with Spark and Parquet files: https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/spark-parquet-dataframe
- `format` to HTML has had several improvements and bug fixes, though the API might still have breaking changes.
- We continue to improve the compiler plugin to better reflect the schemas of dataframes at compile time. You can see the compile-time schema by hovering over a function call in the IDE; the schema is displayed in the popup. If there are any unexpected results, feel free to open an issue or contact us!
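As a rough illustration of the new DuckDB support, the sketch below reads a query result from an in-memory DuckDB database through the existing JDBC reading API. It assumes the `dataframe-jdbc` module and the DuckDB JDBC driver are on the classpath; the table, its columns, and the helper name `loadPeople` are made up for this example, and the release may offer more direct entry points than the one shown here.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.io.readSqlQuery
import java.sql.DriverManager

// Hypothetical helper: builds a tiny in-memory DuckDB table and reads it back as a dataframe.
fun loadPeople(): AnyFrame =
    // "jdbc:duckdb:" opens an in-memory database (assumes the DuckDB JDBC driver is available).
    DriverManager.getConnection("jdbc:duckdb:").use { connection ->
        connection.createStatement().use { statement ->
            statement.execute("CREATE TABLE people (name VARCHAR, age INTEGER)")
            statement.execute("INSERT INTO people VALUES ('Alice', 30), ('Bob', 25)")
        }
        // The result is materialized before the connection is closed.
        DataFrame.readSqlQuery(connection, "SELECT name, age FROM people ORDER BY age")
    }

fun main() {
    loadPeople().print()
}
```

The connection is closed as soon as the dataframe has been read, so the returned dataframe does not depend on the database staying open.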
Known Issues
- IN NOTEBOOKS, use version `1.0.0-Beta3n` instead of `1.0.0-Beta3`. This uses the patch of #1435 for issue #1116, avoiding DefinitelyNotNullable errors. This is the version that will be provided when writing `%use dataframe`. Check `dataFrameConfig.version` in your notebook to see which version you are using.
- median and percentile require explicit type arguments for non-numeric columns
- `FormattedFrame` does not render formatted in notebooks
  - This has since been fixed but might require a yet to be released version of IntelliJ.
  - See #1405
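The version check from the first known issue can be sketched as two notebook cells. This only uses the `%use dataframe` magic and the `dataFrameConfig.version` property named above; splitting it into two cells is a choice made for this example.

```kotlin
// Cell 1: load the library; per the note above this should resolve to 1.0.0-Beta3n.
%use dataframe

// Cell 2: display the DataFrame version the notebook is actually using.
dataFrameConfig.version
```

If the displayed version is not the one you expect, restarting the kernel before re-running the first cell is a reasonable first step.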
Features
- `DataFrameSchema.generateCode()` by @Jolanrensen in #1230
- add nulls parse option in readExcel by @AndreiKingsley in #1247
- Kotlin 2.2 by @Jolanrensen in #1286
- Parse String to UUID #1006 by @EmmanuelBerkowicz in #1287
- Makes `prev()?.newValue()` work in Add DSL by @Jolanrensen in #1303
- Added readOnly mode by @zaleslaw in #1325
- Add support for validating SQL queries with the `WITH` clause by @zaleslaw in #1327
- Add deprecated filterNotNull for discoverability of dropNulls by @koperagen in #1334
- `parseExperimentalUuid` in ParserOptions by @Jolanrensen in #1306
- Add support for reading parquet file thanks to arrow-dataset #576 by @fb64 in #577
- duckdb support by @Jolanrensen in #1366
- Format colgroups by @Jolanrensen in #1374
- Add support for read-only mode in DuckDB and SQLite by @zaleslaw in #1383
- migration from kotlinx.datetime.Instant -> kotlin.time.Instant by @Jolanrensen in #1368
- Make AnyFrame.renderToString public by @bwjohnson92 in #1395
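To make the `prev()?.newValue()` entry above more concrete, here is a minimal sketch of a running total built with the Add DSL. The column names and the helper `withRunningTotal` are invented for this example, and it assumes `newValue<Int>()` returns the value already computed for the new column in the previous row, as in the documented `add` examples.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.add
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import org.jetbrains.kotlinx.dataframe.api.print

// Hypothetical helper: adds a cumulative sum of "amount" as a new column.
fun withRunningTotal(): AnyFrame {
    val df = dataFrameOf("amount")(10, 20, 30)
    return df.add("runningTotal") {
        // prev() is null on the first row, so the total starts from 0 there.
        val previousTotal = prev()?.newValue<Int>() ?: 0
        previousTotal + "amount"<Int>()
    }
}

fun main() {
    withRunningTotal().print()
}
```

Using the safe call keeps the first-row case explicit instead of branching on `index()`.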
Docs, Examples, and KDocs
- Website docs landing by @AndrewKis in #1170
- Website docs examples dev by @AndreiKingsley in #1179
- fix snippets iframes by @AndreiKingsley in #1190
- Website docs kotlin notebook by @AndreiKingsley in #1180
- Generate docs by @AndreiKingsley in #1194
- Rename WS docs by @AndreiKingsley in #1191
- Website docs dev by @AndreiKingsley in #1196
- fix kodex iframes by @AndreiKingsley in #1201
- Number unification docs by @Jolanrensen in #1200
- Add initial compiler plugin documentation by @koperagen in #1202
- Add few more pages about compiler plugin by @koperagen in #1204
- Remove some of the outdated info from README.md by @koperagen in #1205
- Minor fixes for the docs by @koperagen in #1208
- Update README.md by @koperagen in #1209
- Update Gradle plugin docs and fix hardcoded versions by @koperagen in #1211
- Update documentation for "create dataframe" by @koperagen in #1212
- Remove examples for deprecated overloads from tests by @koperagen in #1213
- Notebook updates by @Jolanrensen in #1224
- Adding links to column selectors in docs by @Jolanrensen in #1226
- fix: fixed German companies notebook sample by @kikoso in #1234
- fix kodex docs iframes by @AndreiKingsley in #1233
- fix escaping in iframes by @AndreiKingsley in #1237
- add pictures for guides by @AndreiKingsley in #1239
- various docs fixes to lower the number of errors by @Jolanrensen in #1238
- Plugin example by @AndreiKingsley in #1241
- KTNB-1041: Fix resource loading by @ileasile in #1250
- Extension properties docs by @AndreiKingsley in #1246
- Readme update by @AndreiKingsley in #1251
- IDE sample of "unsupported sources"->DataFrame by @Jolanrensen in #1231
- `add` kdocs by @AndreiKingsley in #1261
- Gather kdocs by @AndreiKingsley in #1272
- Corr kdocs by @AndreiKingsley in #1275
- filter kdocs by @AndreiKingsley in #1288
- columns selector type by @AndreiKingsley in #1274
- `explode` docs and tests by @AndreiKingsley in #1291
- improve concept docs structure by @AndreiKingsley in #1305
- Simplify compiler plugin setup in documentation by @koperagen in #1317
- modules topic by @AndreiKingsley in #1314
- Flatten navigation of Modify operations by @koperagen in #1312
- faq topic by @AndreiKingsley in #1316
- Docs small fixes by @AndreiKingsley in #1323
- generateCode.kt refactor, KDocs, documentation fixes, and tests by @Jolanrensen in #1311
- Data schemas docs by @AndreiKingsley in #1339
- Data sources docs by @AndreiKingsley in #1352
- Setup docs by @AndreiKingsley in #1357
- `format` kdocs++ by @Jolanrensen in #1346
- count and countDistinct kdocs by @AndreiKingsley in #1301
- clarify how to create BaseColumn by @koperagen in #1371
- add spelling conventions and name fixes in a whole project by @AndreiKingsley in #1364
- android example by @AndreiKingsley in #1349
- Android example: removing arrow from the examples by @Jolanrensen in #1426
- Add Hibernate example with H2 in-memory database to idea-examples by @zaleslaw in #1367
- Extension property name generation fix & docs & tests by @AndreiKingsley in #1380
- Add SQL-to-Kotlin DataFrame transition guide for backend developers by @zaleslaw in #1377
- [Junie]: docs: update ColumnGroup description with relevant links by @jetbrains-junie[bot] in #1404
- Added Spark-Parquet-KDF example by @zaleslaw in #1391
- Rename kdocs by @AndreiKingsley in #1402
- Insert kdocs by @AndreiKingsley in #1406
- Util functions docs by @AndreiKingsley in #1414
Fixes
- Bump apache poi to 5.4.1 by @Badya in #1198
- fixed compilation on android by @Jolanrensen in #1203
- Android compilation fix by @Jolanrensen in #1218
- IDEA goodness by @Jolanrensen in #1235
- Use `DayOfWeek.isoDayNumber` instead of `DayOfWeek.value` in `DateTests.kt` by @chaoren in #1232
- Run configs by @AndreiKingsley in #1236
- Reusage schemas fix by @Jolanrensen in #1252
- Remove mandatory mariadb dependency from jdbc by @koperagen in #1267
- GroupBy Jupyter codegen by @Jolanrensen in #1263
- Add more specific explodeLists overload that helps to avoid unnecessary cast by @koperagen in #1269
- Bump gradle to 8.14.2 by @Jolanrensen in #1285
- Removing `recursively()` remnants from public api by @Jolanrensen in #1302
- Replace static initialization of defaultTimeZone with getter by @koperagen in #1315
- Improve SQL validation logic and add related test by @zaleslaw in #1324
- Migrate publishing by @koperagen in #1331
- Hide `mapToColumn` in CS DSL in favor of `expr` by @koperagen in #1360
- Adds `tableTypes` to `DbType` by @Jolanrensen in #1283
- `format` kdocs++ by @Jolanrensen in #1346
- skipKodex gradle parameter in dataframe-csv by @koperagen in #1370
- Fix jupyter logger by @AndreiKingsley in #1361
- Initialize JetBrains Junie 🚀 by @jetbrains-junie[bot] in #1379
- Fixed reading Parquet file on Windows by @zaleslaw in #1381
- Extension property name generation fix & docs & tests by @AndreiKingsley in #1380
- Migrate usages of deprecated API in implementation & warning fixes by @koperagen in #1382
- [Junie]: fix: update deprecation message for DataRow.get method by @jetbrains-junie[bot] in #1399
- Removed `deprecatedSchemaGeneratorPlugin` Gradle plugin publication by @Jolanrensen in #1400
- migration from kotlinx.datetime.Instant -> kotlin.time.Instant by @Jolanrensen in #1368
- Columngroup map fix by @AndreiKingsley in #1396
- Adds `is_formatted` to json encoding for DF by @Jolanrensen in #1405
- Util functions fixes by @AndreiKingsley in #1412
- Add API that debugger renderer can rely on to avoid breaking changes by @koperagen in #1421
- Update Gradle to 9.0 by @koperagen in #1428
- What's the version 1727 and 1548? Fixed by @Jolanrensen in #436
Compiler Plugin
Note that many changes to the compiler plugin are made in the Kotlin repository, as this is where the compiler plugin now lives.
- Prepare Split for compiler plugin support by @koperagen in #1197
- Prepare update and implode operations for compiler plugin support by @koperagen in #1207
- Deprecate predicate argument in single, colsInGroups, colsAtAnyDepth by @koperagen in #1176
- Add missed compiler plugin annotations for Split by @koperagen in #1262
- add compiler annotations for dropNA by @koperagen in #1330
- add missed compiler annotations for update by @koperagen in #1333
- Add plugin annotations for a new batch of functions by @koperagen in #1362
- Rollback parameter name to keep compatibility with the compiler plugin by @koperagen in #1358
- cols with default argument deprecation by @AndreiKingsley in #1369
- Add basic tests that API exposed for compiler plugin works without excluded dependencies by @koperagen in #1397
- Remove annotations dependency from compiler-plugin-core by @koperagen in #1409
- Add annotations to support types extending DataRow in the compiler plugin by @koperagen in #1410
- [Compiler plugin] Support more functions in toDataFrame DSL and AddDsl by @koperagen in #1425
- Prepare utility to assert compile time column order by @koperagen in #1427
New Contributors
- @Badya made their first contribution in #1198
- @kikoso made their first contribution in #1234
- @chaoren made their first contribution in #1232
- @EmmanuelBerkowicz made their first contribution in #1287
- @jetbrains-junie[bot] made their first contribution in #1379
- @bwjohnson92 made their first contribution in #1395
Full Changelog: v1.0.0-Beta2...v1.0.0-Beta3