Şevval Alper
Research interests
Şevval focuses on AI coding tools, AI agents, and quantum technologies. She is part of the AIMultiple benchmark team, conducting assessments and providing insights to help readers understand various emerging technologies and their applications.
Professional experience
She helped organize and guide participants in three “CERN International Masterclasses - hands-on particle physics” events in Türkiye, working alongside faculty to facilitate learning.
Education
Şevval holds a Bachelor's degree in Physics from Middle East Technical University.
Latest Articles from Şevval
LLM Parameters: GPT-5 High, Medium, Low and Minimal
Some LLMs, such as OpenAI’s GPT-5 family, come in different versions (e.g., GPT-5, GPT-5-mini, and GPT-5-nano) and with various parameter settings, including high, medium, low, and minimal. Below, we explore the differences between these model versions by gathering their benchmark performance and the costs to run the benchmarks. Price vs. success: Key takeaways We used…
Code Execution with MCP: A New Approach to AI Agent Efficiency
Anthropic introduced a method in which AI agents interact with Model Context Protocol (MCP) servers by writing executable code rather than making direct calls to tools. The agent treats tools as files on a computer, finds what it needs, and uses them directly with code, so intermediate data doesn’t have to pass through the model’s…
Top 10 Google Colab Alternatives
Google Colaboratory is a popular platform for data scientists and machine learning scientists, but its limitations and pricing may not meet your needs. Several alternatives offer unique features and capabilities that cater to different data science needs and scenarios. Follow the links to see the top Google Colab alternatives: Why do data scientists prefer cloud-based…
E-Commerce AI Video Maker Benchmark: Veo 3 vs Kling
Product visualization plays a crucial role in e-commerce success, yet creating high-quality product videos remains a significant challenge. Recent advancements in AI video generation technology offer promising solutions. We compared the top 6 AI video makers using 12 image-and-prompt inputs to evaluate their capabilities in generating product demonstration videos: AI video maker benchmark results Check…
AI Coding Benchmark: Claude Code vs Cursor
In AI coding, the market has fragmented into two categories: agentic CLI tools and AI code editors embedded in IDEs. Each claims to automate development. Few comparisons show how they differ under identical workloads. We benchmarked each agent across 10 full-stack web development tasks, performing ~600 atomic validation checks per agent and more than 9,600…
Best AI Code Editor: Cursor vs Windsurf vs Replit
Building an app without coding skills is a major trend right now. But can these tools successfully build and deploy an app? We benchmarked 6 AI code editors across 10 real-world web development challenges. Each task required implementing components such as the backend, frontend, authentication, and state management. We evaluated backend correctness, frontend behavior, and combined performance, and analyzed…
MSP Automation: Acronis, ConnectWise Automate & Rewst
Managed service providers (MSPs) handle a constant operational load, including ticket management, patch management, onboarding, alert monitoring, billing reconciliation, and documentation updates. These are necessary but time-intensive tasks. Automation changes the equation by reducing manual workload and human error risk, enabling proactive responses through continuous system monitoring, and improving response times and consistency across client…
HALC-Bench: LLM Hallucination on Long-Context Retrieval Benchmark
HALC-Bench (LLM Hallucination on Long-Context Retrieval Benchmark) measures a large language model’s resistance to fabricating evidence for a metric that does not exist in the target document. It uses 3 haystacks placed at the beginning, middle, and end of the model’s context window, with 204 questions. Results: gpt-5.5 is the least hallucinating model in this…
Screenshot to Code: Lovable vs v0 vs Bolt
During my 20 years as a software developer, I led many front-end teams in developing pages based on designs that were inspired by screenshots. Designs can be transferred to code using AI tools. While expecting a pixel-perfect transfer is unrealistic given the current state of the tools, they can give developers a foundation to work…
VELC-Bench: Verification on Long Context Benchmark
VELC-Bench tests a model’s ability to locate a specific metric in context, compare its value to a claim, and confirm or reject it. This tests fine-grained value matching under long-context conditions: the model must both retrieve the value and perform a precise comparison. Results: The models were tested in the following context windows: openai/gpt-5.5: 1,000,000 tokens google/gemini-3.1-pro-preview:…
