I Built the Same Service with Kotlin Coroutines and Java Virtual Threads — Here’s What I Found 🧪
The results were surprising — I expected the opposite.
With the introduction of Project Loom (Virtual Threads in Java), there’s been a lot of debate in the developer community: should we stick with the well-established Kotlin Coroutines or embrace Java’s newer Virtual Threads? 🧵
In this blog post, I’ve implemented the same service using both technologies and benchmarked their performance under load.
🏗️ The Task: Building a UserProfileBuilder
We’ll create a service that builds a complete user profile by:
1. Fetching the user by their id
2. Once the user is available, concurrently fetching:
— Their orders
— Their preferences
— Their notifications
3. Aggregating everything into a single `UserProfile` object
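As a rough sanity check on the steps above, the best-case latency of a single profile can be estimated before running anything. The sketch below assumes every call takes the same 500 ms that the mock services in this post use; the class and method names are hypothetical and only illustrate the arithmetic.

```java
import java.util.List;

// Hypothetical helper: estimates latency for "fetch the user, then fan out".
class ProfileLatencyEstimate {

    // Sequential: every call waits for the previous one to finish.
    static long sequentialMs(long fetchUserMs, List<Long> fanOutMs) {
        long total = fetchUserMs;
        for (long ms : fanOutMs) {
            total += ms;
        }
        return total;
    }

    // Concurrent fan-out: the user fetch, then only the slowest of the concurrent calls.
    static long concurrentMs(long fetchUserMs, List<Long> fanOutMs) {
        long slowest = 0;
        for (long ms : fanOutMs) {
            slowest = Math.max(slowest, ms);
        }
        return fetchUserMs + slowest;
    }

    public static void main(String[] args) {
        // Orders, preferences, notifications: 500 ms each (assumed, as in the mocks).
        List<Long> fanOut = List.of(500L, 500L, 500L);
        System.out.println("sequential: " + sequentialMs(500, fanOut) + " ms"); // 2000 ms
        System.out.println("concurrent: " + concurrentMs(500, fanOut) + " ms"); // 1000 ms
    }
}
```

Under these assumptions, about 1,000 ms is the floor for one profile regardless of the concurrency model, which is a useful baseline when reading the benchmark tables.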
Given the following service and builder interfaces, we’ll implement them using Kotlin Coroutines and Java Virtual Threads — each playing to their strengths.
Kotlin
interface UserService {
    suspend fun fetchUser(userId: Int): User
    suspend fun fetchOrders(userId: Int): List<Order>
    suspend fun fetchPreferences(userId: Int): Preferences
    suspend fun fetchNotifications(userId: Int): List<Notification>
}

interface UserProfileBuilder {
    suspend fun buildUserProfile(id: Int): UserProfile
}

Java
interface UserService {
    User fetchUser(int id);
    List<Order> fetchOrders(int userId);
    Preferences fetchPreferences(int userId);
    List<Notification> fetchNotifications(int userId);
}

interface UserProfileBuilder {
    UserProfile buildUserProfile(int id);
}

🧬 Domain Models
Both implementations share similar domain models, just in their respective syntaxes.
Kotlin
data class UserProfile(
    val user: User,
    val orders: List<Order>,
    val preferences: Preferences,
    val notifications: List<Notification>,
)

data class User(val id: Int, val name: String)
data class Order(val id: String, val item: String)
data class Preferences(val darkMode: Boolean, val language: String)
data class Notification(val id: String, val message: String)

Java
public record UserProfile(User user, List<Order> orders, Preferences preferences, List<Notification> notifications) {}

public record User(int id, String name) {}

public record Order(String id, String item) {}

// Primitive boolean mirrors Kotlin's non-nullable Boolean and avoids a null darkMode.
public record Preferences(boolean darkMode, String language) {}

public record Notification(String id, String message) {}

⚙️ Implementing the Services
Kotlin Coroutine Implementation
In Kotlin, the functions that simulate I/O (such as API or database calls) are marked with the suspend keyword. A suspending function can pause without blocking the underlying thread, which makes it well suited to running many operations concurrently.
To run tasks concurrently, we use async, which starts a lightweight coroutine and returns a Deferred; calling await() suspends until the result is ready instead of blocking a thread. Wrapping the calls in coroutineScope means the function returns only after all three child coroutines complete, and a failure in one cancels the others.
import kotlinx.coroutines.async
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay

class MockUserService : UserService {
    override suspend fun fetchUser(userId: Int) = delayAndReturn(500) { User(userId, "User-$userId") }

    override suspend fun fetchOrders(userId: Int) =
        delayAndReturn(500) { listOf(Order("O1", "Item A"), Order("O2", "Item B")) }

    override suspend fun fetchPreferences(userId: Int) = delayAndReturn(500) { Preferences(true, "en-US") }

    override suspend fun fetchNotifications(userId: Int) =
        delayAndReturn(500) { listOf(Notification("N1", "Welcome"), Notification("N2", "Discount")) }

    // Simulates an I/O call: delay() suspends the coroutine without blocking a thread.
    private suspend fun <T> delayAndReturn(ms: Long, block: () -> T): T {
        delay(ms)
        return block()
    }
}

class CoroutineUserProfileBuilder(private val userService: UserService) : UserProfileBuilder {
    override suspend fun buildUserProfile(id: Int): UserProfile = coroutineScope {
        val user = userService.fetchUser(id)
        // The three fetches depend only on the user, so they run concurrently.
        val orders = async { userService.fetchOrders(user.id) }
        val prefs = async { userService.fetchPreferences(user.id) }
        val notifications = async { userService.fetchNotifications(user.id) }
        UserProfile(user, orders.await(), prefs.await(), notifications.await())
    }
}

☕ Java Virtual Threads Implementation
With Java 21+, we can use Virtual Threads for scalability without changing our blocking code.
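To make "without changing our blocking code" concrete, here is a minimal sketch, assuming Java 21 or later. It is not part of the benchmark; the class name, task count, and sleep duration are hypothetical. It submits many tasks that each call plain Thread.sleep, one virtual thread per task.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: many blocking tasks, one virtual thread each (requires Java 21+).
class BlockingOnVirtualThreads {

    // Runs `tasks` blocking sleeps concurrently and returns how many completed.
    static int runBlockingTasks(int tasks, Duration sleep) {
        AtomicInteger completed = new AtomicInteger();
        // Leaving the try block calls close(), which waits for all submitted tasks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(sleep); // ordinary blocking call; only the virtual thread parks
                        completed.incrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = runBlockingTasks(10_000, Duration.ofMillis(100));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " tasks finished in " + elapsedMs + " ms");
    }
}
```

The task body is the same code one would write for a platform thread; only the executor changes. The elapsed time depends on the machine, so no specific figure is claimed here.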
import java.util.List;

public class UserServiceImpl implements UserService {
    @Override
    public User fetchUser(int id) {
        delay(500);
        return new User(id, "User-%d".formatted(id));
    }

    @Override
    public List<Order> fetchOrders(int id) {
        delay(500);
        return List.of(new Order("O1", "Item A"), new Order("O2", "Item B"));
    }

    @Override
    public Preferences fetchPreferences(int id) {
        delay(500);
        return new Preferences(true, "en-US");
    }

    @Override
    public List<Notification> fetchNotifications(int id) {
        delay(500);
        return List.of(new Notification("N1", "Welcome"), new Notification("N2", "Discount"));
    }

    // Simulates an I/O call with an ordinary blocking sleep.
    private void delay(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status for callers
            throw new RuntimeException(e);
        }
    }
}

Two ways to implement the builder using virtual threads:
➤ Using ExecutorService (stable)
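import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class UserProfileBuilderImpl implements UserProfileBuilder {
    private final UserService userService;
    private final ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();

    UserProfileBuilderImpl(UserService service) {
        this.userService = service;
    }

    // The interface method declares no checked exceptions, so failures are wrapped here.
    @Override
    public UserProfile buildUserProfile(int id) {
        var user = userService.fetchUser(id);
        // Each submit() starts a new virtual thread, so the three fetches run concurrently.
        var orders = executor.submit(() -> userService.fetchOrders(user.id()));
        var prefs = executor.submit(() -> userService.fetchPreferences(user.id()));
        var notifs = executor.submit(() -> userService.fetchNotifications(user.id()));
        try {
            return new UserProfile(user, orders.get(), prefs.get(), notifs.get());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        }
    }
}

➤ Using StructuredTaskScope (preview)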
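// Preview API in Java 21: compile and run with --enable-preview.
// Needs java.util.concurrent.StructuredTaskScope and java.util.concurrent.ExecutionException.
@Override
public UserProfile buildUserProfile(int id) {
    var user = userService.fetchUser(id);
    try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
        var orders = scope.fork(() -> userService.fetchOrders(user.id()));
        var prefs = scope.fork(() -> userService.fetchPreferences(user.id()));
        var notifs = scope.fork(() -> userService.fetchNotifications(user.id()));
        scope.join()            // wait for all subtasks
             .throwIfFailed();  // rethrow the first failure as ExecutionException
        return new UserProfile(user, orders.get(), prefs.get(), notifs.get());
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
    } catch (ExecutionException e) {
        throw new RuntimeException(e.getCause());
    }
}

🚀 Benchmarking Setup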
Benchmarks simulate concurrent requests at increasing scale:
1,000
10,000
100,000
1,000,000 concurrent requests
The test measures:
Time taken to build all user profiles
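Active platform thread count (including and excluding daemon threads). Thread.getAllStackTraces() reports platform threads only, so the virtual threads themselves are not counted.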
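Because the thread counts in this post come from Thread.getAllStackTraces(), it helps to confirm what that call can see. The following is a small sketch, assuming Java 21 or later, with a hypothetical class name. It starts one virtual thread, keeps it alive, and checks whether it appears in the returned map.

```java
import java.util.concurrent.CountDownLatch;

// Sketch: checks whether a live virtual thread appears in Thread.getAllStackTraces().
class ThreadCountCheck {

    // Returns true if the running virtual thread is listed in the map.
    static boolean virtualThreadIsListed() {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread virtual = Thread.ofVirtual().start(() -> {
            started.countDown();
            try {
                release.await(); // keep the virtual thread alive while we inspect
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            started.await();
            return Thread.getAllStackTraces().containsKey(virtual);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let the virtual thread finish
        }
    }

    public static void main(String[] args) {
        System.out.println("virtual thread listed: " + virtualThreadIsListed());
    }
}
```

On JDK 21 this is expected to report that the virtual thread is not listed, since the Javadoc for Thread.getAllStackTraces() states that the map does not include virtual threads. The reported counts therefore reflect carrier and other platform threads, not the number of in-flight virtual threads.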
🔬 Benchmark Implementation
We wrote separate benchmark runners for Kotlin Coroutines and Java Virtual Threads to simulate concurrent requests and measure execution time and thread usage.
Each benchmark:
Accepts a list of user ids
Calls buildUserProfile(id) concurrently for all of them
Measures total duration and thread count
Kotlin Benchmark Code
import kotlinx.coroutines.CancellationException
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import org.slf4j.Logger
import org.slf4j.LoggerFactory

class BenchmarkRunner(
    private val builder: UserProfileBuilder,
    private val name: String,
    private val threadDump: Boolean = true
) {
    private val logger: Logger = LoggerFactory.getLogger(BenchmarkRunner::class.java)

    suspend fun runBenchmark(userIds: List<Int>) = coroutineScope {
        logger.info("🚀 Running benchmark: $name on ${userIds.size} Concurrent Users")
        val start = System.nanoTime()
        // One coroutine per user id; failed profiles are logged and dropped.
        val userProfiles = userIds.map { id ->
            async {
                try {
                    builder.buildUserProfile(id)
                } catch (ex: CancellationException) {
                    throw ex // never swallow cancellation
                } catch (ex: Exception) {
                    logger.warn("❌ User $id failed: ${ex.message}")
                    null
                }
            }
        }.awaitAll().filterNotNull()
        val durationMs = (System.nanoTime() - start) / 1_000_000
        logger.info("✅ Completed ${userProfiles.count()} profiles in $durationMs ms")
        if (threadDump) {
            val threads = Thread.getAllStackTraces().keys
            logger.info("🧵 Active threads (total): ${threads.count { it.isAlive }}")
            logger.info("🧵 Non-daemon threads: ${threads.count { it.isAlive && !it.isDaemon }}")
        }
    }
}

Usage:
suspend fun main() {
    val builder = CoroutineUserProfileBuilder(MockUserService())
    val runner = BenchmarkRunner(builder, "CoroutineUserProfileBuilder")
    listOf(
        (1..1_000),
        (1..10_000),
        (1..100_000),
        (1..1_000_000),
    ).forEach { ids -> runner.runBenchmark(ids.toList()) }
}

☕ Java Benchmark Code
import java.util.List;
import java.util.Objects;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BenchmarkRunner {
    private static final Logger logger = LoggerFactory.getLogger(BenchmarkRunner.class);

    private final UserProfileBuilder builder;
    private final String name;
    private final boolean threadDump;
    private final ExecutorService executor;

    public BenchmarkRunner(UserProfileBuilder builder, String name, boolean threadDump, ExecutorService executor) {
        this.builder = builder;
        this.name = name;
        this.threadDump = threadDump;
        this.executor = executor;
    }

    public void runBenchmark(List<Integer> userIds) {
        logger.info("🚀 Running benchmark: {} on {} Concurrent Users", name, userIds.size());
        long start = System.nanoTime();
        // One task per user id; failed profiles are logged and dropped.
        List<Future<UserProfile>> futures = userIds.stream()
                .map(id -> executor.submit(() -> {
                    try {
                        return builder.buildUserProfile(id);
                    } catch (Exception ex) {
                        logger.warn("❌ User {} failed: {}", id, ex.getMessage());
                        return null;
                    }
                }))
                .toList();
        List<UserProfile> results = futures.stream()
                .map(future -> {
                    try {
                        return future.get();
                    } catch (Exception e) {
                        return null;
                    }
                })
                .filter(Objects::nonNull)
                .toList();
        long durationMs = (System.nanoTime() - start) / 1_000_000;
        logger.info("✅ Completed {} profiles in {} ms", results.size(), durationMs);
        if (threadDump) {
            var threads = Thread.getAllStackTraces().keySet();
            logger.info("🧵 Active threads (total): {}", threads.stream().filter(Thread::isAlive).count());
            logger.info("🧵 Non-daemon threads: {}", threads.stream().filter(t -> t.isAlive() && !t.isDaemon()).count());
        }
    }
}

Usage:
public static void main(String[] args) {
    var builder = new UserProfileBuilderImpl(new UserServiceImpl());
    var executor = Executors.newVirtualThreadPerTaskExecutor();
    var runner = new BenchmarkRunner(builder, "Virtual Thread Profile Builder", true, executor);
    Stream.of(
            IntStream.rangeClosed(1, 1_000).boxed(),
            IntStream.rangeClosed(1, 10_000).boxed(),
            IntStream.rangeClosed(1, 100_000).boxed(),
            IntStream.rangeClosed(1, 1_000_000).boxed()
    ).map(Stream::toList)
            .forEach(runner::runBenchmark);
}

📊 Benchmark Results
### ✅ Kotlin Coroutines
| Requests | Time (ms) | Threads (Active / Non-daemon) |
| --------- | --------- | ----------------------------- |
| 1,000 | 1,062 | 16 / 1 |
| 10,000 | 1,124 | 16 / 1 |
| 100,000 | 1,359 | 16 / 1 |
| 1,000,000 | 6,652 | 16 / 1 |
---
### 🧵 Java Virtual Threads (ExecutorService)
| Requests | Time (ms) | Threads (Active / Non-daemon) |
| --------- | --------- | ----------------------------- |
| 1,000 | 1,046 | 15 / 1 |
| 10,000 | 1,115 | 15 / 1 |
| 100,000 | 1,351 | 15 / 1 |
| 1,000,000 | 8,501 | 15 / 1 |
---
### 🧵 Java Virtual Threads (StructuredTaskScope)
| Requests | Time (ms) | Threads (Active / Non-daemon) |
| --------- | --------- | ----------------------------- |
| 1,000 | 1,045 | 15 / 1 |
| 10,000 | 1,109 | 15 / 1 |
| 100,000 | 1,369 | 15 / 1 |
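| 1,000,000 | 15,110 | 15 / 1 |

🧠 Note:
StructuredTaskScope adds noticeable overhead at the largest scale: at 1,000,000 requests it took 15,110 ms versus 8,501 ms with ExecutorService, while at 100,000 requests and below the two are within about 20 ms of each other. The extra cost is likely due to task-scoping logic and structured cancellation. However, it's more robust for hierarchical task lifecycles - useful for APIs and short-lived request trees.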
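To put the 1,000,000-request rows on a common footing, the durations can be converted to throughput and relative slowdown. This is only a sketch of the arithmetic using the figures from the tables above; the class and method names are hypothetical.

```java
import java.util.Locale;

// Sketch: derive throughput and relative slowdown from the 1,000,000-request rows.
class ThroughputFromResults {

    // Requests per second for a run of `requests` that took `durationMs`.
    static long requestsPerSecond(long requests, long durationMs) {
        return Math.round(requests * 1000.0 / durationMs);
    }

    // How many times slower `durationMs` is than `baselineMs`.
    static double slowdown(long durationMs, long baselineMs) {
        return (double) durationMs / baselineMs;
    }

    public static void main(String[] args) {
        long requests = 1_000_000;
        long coroutinesMs = 6_652; // Kotlin Coroutines
        long executorMs = 8_501;   // Virtual Threads (ExecutorService)
        long scopeMs = 15_110;     // Virtual Threads (StructuredTaskScope)

        System.out.println(requestsPerSecond(requests, coroutinesMs)); // 150331
        System.out.println(requestsPerSecond(requests, executorMs));   // 117633
        System.out.println(requestsPerSecond(requests, scopeMs));      // 66181

        System.out.println(String.format(Locale.ROOT, "%.2f", slowdown(executorMs, coroutinesMs))); // 1.28
        System.out.println(String.format(Locale.ROOT, "%.2f", slowdown(scopeMs, coroutinesMs)));    // 2.27
    }
}
```

In other words, at this scale the ExecutorService variant took roughly 1.28 times as long as coroutines, and the StructuredTaskScope variant roughly 2.27 times as long. These are single-run figures, so they are best read as indicative rather than precise.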
🧠 Note on Memory Usage
This post focused on performance and thread scalability, but it does not cover memory usage, which is equally important — especially at high concurrency levels.
Measuring the heap allocation, GC behavior, and per-request memory footprint of Kotlin Coroutines vs. Java Virtual Threads would be a valuable follow-up and deserves a dedicated investigation.
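As a starting point for such a follow-up, heap usage can be sampled around a workload with the JDK's own management API. The sketch below is deliberately coarse and was not used for the results in this post; the class name and the byte-array workload are hypothetical stand-ins, and a dedicated profiler would give more reliable numbers.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.List;

// Sketch: coarse heap-usage sampling around a workload (not a rigorous measurement).
class HeapUsageProbe {

    // Currently used heap in bytes, as reported by the JVM's MemoryMXBean.
    static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    // Runs the workload and returns the change in used heap.
    // The result can be negative if a garbage collection runs in between.
    static long heapDeltaBytes(Runnable workload) {
        System.gc(); // only a hint; the JVM may ignore it
        long before = usedHeapBytes();
        workload.run();
        return usedHeapBytes() - before;
    }

    public static void main(String[] args) {
        List<byte[]> retained = new ArrayList<>();
        long delta = heapDeltaBytes(() -> {
            for (int i = 0; i < 32; i++) {
                retained.add(new byte[1024 * 1024]); // stand-in for per-request state
            }
        });
        System.out.println("heap delta: " + delta / (1024 * 1024)
                + " MiB, retained blocks: " + retained.size());
    }
}
```

In a real comparison, the workload would be one benchmark run (for example, building 100,000 profiles), repeated several times for each implementation.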
💡 Beyond Concurrency: Kotlin Coroutines Are More Than Just Lightweight Threads
While this post focuses on concurrent request processing, it’s worth noting that Kotlin Coroutines truly shine in asynchronous and reactive workflows. Features like Flow and Channel enable reactive streams, backpressure handling, and pipeline-style data processing with ease. These primitives go beyond just structured concurrency — they offer a powerful, declarative way to build event-driven systems, making coroutines especially compelling for real-time, UI, or streaming APIs.
🧪 Benchmark Environment
MacBook Air M1 (16-core)
Java 21
Kotlin 2.0
💡 Conclusion
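Both Kotlin Coroutines and Java Virtual Threads scale impressively: each handled 100,000 concurrent requests in under 1.4 seconds on 15 to 16 platform threads.
That said, Kotlin Coroutines pulled ahead of Java Virtual Threads under extreme (simulated) I/O-bound load. At the 1 million request mark they finished in 6,652 ms, compared with 8,501 ms for the ExecutorService variant and 15,110 ms for StructuredTaskScope; below that scale the differences were negligible.
Honestly, I expected Java’s native virtual threads to have the advantage, so this result was surprising.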
💬 Tried something similar? Got different results? I’d love to hear your thoughts — leave a comment, share your benchmarks, or join the conversation.


