In the world of data warehousing and large-scale data analysis, table formats like Apache Iceberg play a pivotal role in managing massive datasets. If you're dealing with Iceberg tables, maintaining and optimizing them can lead to significant performance improvements. Let's delve into the three key ways to maintain Iceberg tables for optimized performance and data management. Partitions Partitioning your Iceberg tables is one of the simplest and most effective methods to boost performance.

Continue reading

After the release of macOS 10.15.2 two days agao I have upgraded my mac at work to latest version today. Immediately, after running pip to install some packages I was greeted with an abort. I checked the crash reporter to find the offender: Application Specific Information: /usr/lib/libcrypto.dylib abort() called Invalid dylib load. Clients should not load the unversioned libcrypto dylib as it does not have a stable ABI. After poking around for a bit I figured out it was because of the asn1crypto library.

Continue reading

When dealing with geospatial data it is sometimes useful to have a grid at hand that represents the given data. One way to create a grid like this is to use Geohashes. GeoHashes are a hierarchical spatial data structure which subdivides space into buckets of grid shape, which is one of the many applications of what is known as a Z-order curve, and generally space-filling curves. A Geohash is an encoded character string that is computed from geographic coordinates.

Continue reading

On October 5th the PostgreSQL Global Development Group announced the release of PostgreSQL 10. It comes with tremendous amount of new features like Table partitioning Logical replication Improved parallel queries Stronger password hashing Durable Hash Indexes and more. A nice list, including explanations can be found on Robert Haas’ blog. This post explains how to upgrade to the latest version of PostgreSQL on macOS using Homebrew. At the time of this writing I was using macOS 10.

Continue reading

After three great days at the PyCon US 2017 in Portland, OR Hendrik and I decided to participate in the development sprints succeeding the conferece. The code sprints are an essential part of PyCon, and a chance to meet some of the maintainers and contributors of various open source projects. For us it was the first time attending a code sprint. The day before the sprint there was a session helping people to set up Git, Python (including virtual environments) and getting familiar with version control.

Continue reading

Google Ads, the globally renowned advertising platform, empowers numerous businesses to strategically place ads, reach prospective customers, and grow their presence. The kaleidoscope of data that Google Ads provides forms the bedrock of insightful business decisions, higher return on investment, and the optimization of AdWords campaigns. While Google Ads features a user-friendly interface for data access and management, some tasks often benefit from a programmatic approach. In response to this need, Google provides the AdWords API.

Continue reading

Recently, I set up Jupyter Notebooks on a server at work. The idea was to create an enviroment where every team member could run analyses using Python and share the results with the rest. After reading the documentation, I found out that the Jupyter Notebook web application comes with a Contents API I quickly put together a little Munin script that collects some statistics about the current notebooks. The graph shows the total number of notebooks on the server as well as the currently open notebooks:

Continue reading

There are a lot of cases when we want to track time when an entity was created or updated. Here is a simple recipe to make some or all of your SQLAlchemy entities auto-timestamping. To achieve this, we will provide a mixin class. from datetime import datetime from sqlalchemy import Column, DateTime, event class TimeStampMixin(object): """ Timestamping mixin """ created_at = Column(DateTime, default=datetime.utcnow) created_at._creation_order = 9998 updated_at = Column(DateTime, default=datetime.utcnow) updated_at.

Continue reading

Everyday at work around noon the question of where to get lunch comes up. Normally, we choose between different restataurants in the vicinity of the office. One exception is the HU Mensa (university cafeteria). Despite being really cheap the food quality there varies a lot and it really depends on the daily menu whether a visit is worthwhile. To tackle this issue I decided to spent another IT Open Space putting together a little script that will help us in the future.

Continue reading

At Project-A we are using Codebase as a project management tool together with its version control. Just as with any other tools you can create tickets and organize them in sprints. Our usual (very simplified) workflow includes: Sprint planning for tickets Priotizing tickets Developer working on tickets Product managers verifying if the tickets were implemented as intended Unfortunately, sometimes your backlog keeps growing and tickets are no longer valid, outdated or, in the worst case, just forgotten.

Continue reading

Author's picture

Christian Stade-Schuldt

Data Engineer @ HERE IoT innovation lab| Full-time geek | Cyclist | Learning from data

Data Engineer @ HERE

Berlin, Germany